# Baseline

## Set-up
- Download the [Depresjon dataset](https://datasets.simula.no/depresjon/) and unpack it so that the `data/condition` and `data/control` folders are in the current directory.

In [1]:
import numpy as np
import os
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import f1_score

In [2]:
seed = 0

## Data

We build a simple dataset with one row per recording file. The two features are the mean and standard deviation of that person's activity, and the label is 1 for the condition group and 0 for the control group.

In [3]:
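data_dir = 'data'

# One row per recording: [mean activity, std of activity, label]
n = len(os.listdir(os.path.join(data_dir, 'condition'))) + len(os.listdir(os.path.join(data_dir, 'control')))
data = np.empty((n, 3))
i = 0

# Label the condition group 1 and the control group 0
for k, v in {'condition': 1, 'control': 0}.items():
    for file in os.listdir(os.path.join(data_dir, k)):
        # Read only the activity column (index 2) of each CSV, skipping the header row
        activity = np.genfromtxt(os.path.join(data_dir, k, file), delimiter=',', skip_header=1, usecols=[2])
        data[i, :] = np.array([np.mean(activity), np.std(activity), v])
        i += 1

X = data[:, :2]  # features: mean and standard deviation
y = data[:, 2]   # labels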
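
To make the feature extraction concrete, here is a minimal sketch that applies the same `np.genfromtxt` call to a tiny in-memory CSV. The column names (`timestamp`, `date`, `activity`) and the three sample rows are assumptions made up for illustration; only the position of the activity column (index 2) is taken from the cell above.

```python
import io

import numpy as np

# Hypothetical three-row recording; the header and values are illustrative only.
csv_text = """timestamp,date,activity
2003-05-07 12:00:00,2003-05-07,0
2003-05-07 12:01:00,2003-05-07,10
2003-05-07 12:02:00,2003-05-07,20
"""

# Same call as above: skip the header row and read only column index 2.
activity = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1, usecols=[2])

# The two features for this recording: mean and (population) standard deviation.
features = np.array([np.mean(activity), np.std(activity)])
print(features)
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`), which is what the cell above computes as well.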

## Model

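Our baseline is a logistic regression model. `LogisticRegressionCV` uses 11-fold cross-validation (`n / 5` folds) to select the regularization strength and then refits on all of the data. The accuracy and F1 score below are computed on that same data, so they are training scores rather than held-out estimates.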

In [4]:
folds = n // 5  # number of cross-validation folds
clf = LogisticRegressionCV(cv=folds, random_state=seed).fit(X, y)
y_pred = clf.predict(X)
acc = clf.score(X, y)  # accuracy on the data used for fitting
f1 = f1_score(y, y_pred)
print('Acc: ', acc)
print('F1: ', f1)

Acc:  0.6363636363636364
F1:  0.47368421052631576
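
Because the scores above are measured on the data the model was fit on, one might also want an out-of-fold estimate. The following is a minimal sketch of one way to get it with scikit-learn's `cross_val_predict`, not part of the original baseline. It runs on synthetic two-feature data, and the sample size, class shift, and fold counts are arbitrary assumptions, so its numbers say nothing about the Depresjon results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in data (assumption): 30 samples per class, two features,
# with class 1 shifted so the classes are partly separable.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 30)
X_demo = rng.normal(size=(60, 2)) + 1.5 * y_demo[:, None]

# Outer folds produce the held-out predictions; the inner cv=5 inside
# LogisticRegressionCV still selects the regularization strength.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_oof = cross_val_predict(LogisticRegressionCV(cv=5), X_demo, y_demo, cv=outer_cv)

# Every prediction here comes from a model that never saw that sample.
oof_acc = accuracy_score(y_demo, y_oof)
oof_f1 = f1_score(y_demo, y_oof)
print('Out-of-fold Acc: ', oof_acc)
print('Out-of-fold F1: ', oof_f1)
```

Applying the same pattern to `X` and `y` from the Data section would give a held-out counterpart to the training scores printed above.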
