### Version 2. The CV score is still the log-likelihood

### Compare FL, SL, GEN, and logistic regressions (LASSO, ridge, and no penalty, from Sklearn)
#### The original image size is 128 x 128. Here we compress it to 32 x 32.
#### The original data has 4 classes, from non-demented to moderately demented. Here we pick 800 non-demented images, labelled "healthy" (0), and 400 mildly demented images, labelled "sick" (1), for a binary classification task. The tuning set is 40% of the data, the training set 40%, and the test set 20% (480/480/240), with p = 1024 features.

In [1]:
import cv2
import PIL
import matplotlib.pyplot as plt 
import numpy as np 
import pathlib 

In [2]:
path = 'C:/Users/sswei/Desktop/running time/AD3/'
data_dir = pathlib.Path(path)

In [3]:
sick = list(data_dir.glob('1/*'))

In [4]:
healthy = list(data_dir.glob('0/*'))

In [5]:
len(healthy)

800

### Compress image size to 32 x 32 pixels (speed up experiments)

In [6]:
# Load each "sick" image, resize it to 32 x 32, and flatten it into one row of X1_all.
X1_all = np.vstack([cv2.resize(plt.imread(str(f)), (32, 32)).flatten() for f in sick])

In [7]:
y1_all = np.ones(len(sick))

In [8]:
# Same preprocessing for the "healthy" images.
X0_all = np.vstack([cv2.resize(plt.imread(str(f)), (32, 32)).flatten() for f in healthy])

In [9]:
y0_all = np.zeros(len(healthy))

#### Make tuning, train, test sets

In [10]:
from sklearn.model_selection import train_test_split

In [11]:
X1_train, X1_test, y1_train, y1_test = train_test_split(X1_all, y1_all, test_size=0.2, random_state=42)

In [12]:
X1_train, X1_val, y1_train, y1_val = train_test_split(X1_train, y1_train, test_size=0.5, random_state=42)

In [13]:
X0_train, X0_test, y0_train, y0_test = train_test_split(X0_all, y0_all, test_size=0.2, random_state=42)

In [14]:
X0_train, X0_val, y0_train, y0_val = train_test_split(X0_train, y0_train, test_size=0.5, random_state=42)

In [15]:
X_train = np.concatenate((X1_train, X0_train))

In [16]:
y_train  = np.concatenate((y1_train, y0_train))

In [17]:
X_test = np.concatenate((X1_test, X0_test))
y_test  = np.concatenate((y1_test, y0_test))

In [18]:
X_val = np.concatenate((X1_val, X0_val))
y_val  = np.concatenate((y1_val, y0_val))
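As a sanity check on the 480/480/240 split, the two-stage `train_test_split` can be replayed on placeholder arrays of the same shape. This is a minimal sketch: `three_way_split` is a hypothetical helper introduced only for illustration, and the zero-filled arrays stand in for the real images.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(X, y, seed=42):
    """Hold out 20% for testing, then halve the remainder into train / tuning."""
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Placeholder arrays with the same shapes as the flattened 32 x 32 images.
X_sick, y_sick = np.zeros((400, 1024)), np.ones(400)
X_healthy, y_healthy = np.zeros((800, 1024)), np.zeros(800)

sick_parts = three_way_split(X_sick, y_sick)
healthy_parts = three_way_split(X_healthy, y_healthy)

# Per-class pieces are concatenated, so the set sizes simply add up.
split_sizes = [len(s[1]) + len(h[1]) for s, h in zip(sick_parts, healthy_parts)]
print(split_sizes)  # order: train, tuning, test
```

Splitting each class separately keeps the 1:2 sick-to-healthy ratio identical in all three sets.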

### Normalize each feature to have mean 0, std 1 (each set is standardized with its own statistics)

In [19]:
from sklearn import preprocessing

In [20]:
X_test = preprocessing.StandardScaler().fit(X_test).transform(X_test)

In [21]:
X_train = preprocessing.StandardScaler().fit(X_train).transform(X_train)

In [22]:
X_val = preprocessing.StandardScaler().fit(X_val).transform(X_val)
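The cells above fit a separate `StandardScaler` on each set. A common alternative is to fit the scaler on the training set only and reuse its mean and standard deviation for the other sets. The sketch below illustrates that alternative on synthetic data; it is not what this notebook does, the reported numbers were obtained with per-set scaling, and the `*_demo` arrays are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for the pixel matrices (hypothetical data, small p for brevity).
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(480, 16))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(240, 16))

scaler = StandardScaler().fit(X_train_demo)   # statistics from the training set only
X_train_std = scaler.transform(X_train_demo)
X_test_std = scaler.transform(X_test_demo)    # reuses the training mean and std

print(np.allclose(X_train_std.mean(axis=0), 0.0))  # training features are centred
print(np.abs(X_test_std.mean(axis=0)).max())       # test means are near, not exactly, 0
```

With this variant the test features are only approximately standardized, but the fitted model sees test data on the same scale it was trained on.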

#### Fit graph-based models:
#### Graph: 2-D grid graph of size 32 x 32
#### Tuning over version 2 grid: gridlogit  = {'l1': [0, 0.005, 0.01, 0.1, 0.2, 0.5], 'l2': [0, 0.005, 0.01, 0.1, 0.2, 0.5]} (modify in signals.py)

In [23]:
from signals import *
from skest import *

In [24]:
D = grid_incidence(32)
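`grid_incidence` comes from the local `signals` module, whose source is not shown here. For orientation, an edge-by-node incidence matrix of a 2-D grid graph could be built as below. This is a sketch under assumptions: `grid_incidence_sketch` is a hypothetical name, and the sign convention, edge ordering, and dense storage may all differ from the real `D`.

```python
import numpy as np

def grid_incidence_sketch(n):
    """Edge-by-node incidence matrix of an n x n grid graph (illustrative only).

    Pixels are indexed row-major, matching `.flatten()`; each row of the result
    has a -1 and a +1 on the two endpoints of one horizontal or vertical edge.
    """
    idx = np.arange(n * n).reshape(n, n)
    # Horizontal edges: (r, c) -- (r, c + 1); vertical edges: (r, c) -- (r + 1, c).
    tails = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    heads = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    D = np.zeros((len(tails), n * n))
    rows = np.arange(len(tails))
    D[rows, tails] = -1.0
    D[rows, heads] = 1.0
    return D

D_demo = grid_incidence_sketch(32)
print(D_demo.shape)  # 2 * 32 * 31 edges by 32 * 32 pixels
```

Under this convention, `D_demo @ beta` lists the differences between neighbouring pixel coefficients, which is the quantity that graph-based penalties typically act on.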

In [25]:
X_val.shape

(480, 1024)

##### Tuning
##### Note that we use the log-likelihood as the CV scorer here.
#### Caution: `GridSearchCV` does not shuffle the data by default, and the sets above are concatenated class by class, so we shuffle before tuning. In practice this seemed to make no difference.

In [26]:
from sklearn.utils import shuffle
X_val, y_val = shuffle(X_val, y_val)
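The internals of `naive_cv_logit` are not shown in this notebook. As a rough illustration of log-likelihood-scored tuning with shuffled folds, the sketch below uses plain Sklearn pieces: `LogisticRegression` is only a stand-in for the graph-based estimators, the data is synthetic, and the grid over `C` is made up. The `neg_log_loss` scorer equals the mean log-likelihood per sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic, class-ordered data: all positives first, then all negatives.
X_pos = rng.normal(loc=0.5, size=(60, 5))
X_neg = rng.normal(loc=-0.5, size=(120, 5))
X_demo = np.vstack([X_pos, X_neg])
y_demo = np.concatenate([np.ones(60), np.zeros(120)])

# Shuffled, stratified folds, so each fold mixes both classes.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",   # higher (closer to 0) is better
    cv=cv,
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

When an estimator is recognized as a classifier and `cv` is an integer, `GridSearchCV` already uses stratified (unshuffled) folds, which may be one reason shuffling appears to change little; whether that applies to the estimators from `skest` is an assumption.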

In [27]:
naive_cv_logit(Log_FL, X_val, y_val, D) 

({'l1': 0, 'l2': 0.005}, 321.8581862449646)

In [28]:
naive_cv_logit(Log_SL, X_val, y_val, D) 

({'l1': 0.005, 'l2': 0.005}, 309.5065143108368)

In [29]:
naive_cv_logit(Log_OUR, X_val, y_val, D) 

({'l1': 0, 'l2': 0.01}, 341.92374634742737)

##### Fitting graph based methods

In [30]:
X_train, y_train = shuffle(X_train, y_train)

In [31]:
clf1 = Log_FL(0, 0.005, D).fit(X_train, y_train)

In [32]:
clf2 = Log_SL(0.005, 0.005, D).fit(X_train, y_train)

In [33]:
clf3 = Log_OUR(0, 0.01, D).fit(X_train, y_train)

##### Prediction Accuracy and sensitivity

In [34]:
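def acc(clf):
    # Accuracy: fraction of test labels predicted correctly (labels are 0/1).
    return 1 - np.sum(np.abs(y_test - clf.predict(X_test)))/len(y_test)
def sen(clf):
    # Sensitivity: fraction of "sick" (label 1) test images predicted as 1.
    return 1 - np.sum(np.abs(y_test[y_test == 1] - clf.predict(X_test[y_test == 1])))/len(y_test[y_test == 1])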
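To see what the two formulas compute, here they are applied to a tiny hand-made example. The functions are restated to take label arrays directly; `accuracy`, `sensitivity`, and the `*_demo` arrays are illustrative names, not part of the notebook.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correct 0/1 predictions (same formula as `acc`)."""
    return 1 - np.sum(np.abs(y_true - y_pred)) / len(y_true)

def sensitivity(y_true, y_pred):
    """Fraction of label-1 samples that are predicted as 1 (same formula as `sen`)."""
    pos = y_true == 1
    return 1 - np.sum(np.abs(y_true[pos] - y_pred[pos])) / np.sum(pos)

y_true_demo = np.array([1, 1, 1, 0, 0])
y_pred_demo = np.array([1, 0, 1, 0, 1])
print(accuracy(y_true_demo, y_pred_demo))     # 3 of 5 predictions are correct
print(sensitivity(y_true_demo, y_pred_demo))  # 2 of 3 positives are found
```

For 0/1 labels these agree with `sklearn.metrics.accuracy_score` and `sklearn.metrics.recall_score`.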

In [35]:
X_test, y_test = shuffle(X_test, y_test)

##### FL: 
##### accuracy

In [36]:
acc(clf1)

0.8541666666666666

##### sensitivity:

In [37]:
sen(clf1)

0.675

##### SL:

In [38]:
acc(clf2)

0.8916666666666666

In [39]:
sen(clf2)

0.7375

##### GEN:

In [40]:
acc(clf3)

0.9041666666666667

In [41]:
sen(clf3)

0.7875

#### We also compare with Sklearn logistic regression (ridge, lasso, and no penalty)

In [47]:
import sklearn.linear_model

#### Without any penalty

In [48]:
# Note: scikit-learn 1.2+ deprecates the string 'none' in favour of penalty=None.
clf4 = sklearn.linear_model.LogisticRegression(penalty = 'none')

In [49]:
clf4.fit(X_train, y_train)

LogisticRegression(penalty='none')

In [50]:
acc(clf4)

0.9375

In [51]:
sen(clf4)

0.85

#### Lasso

In [52]:
clf5 = sklearn.linear_model.LogisticRegression(penalty = 'l1', solver = 'saga')

In [53]:
clf5.fit(X_train, y_train)



LogisticRegression(penalty='l1', solver='saga')

In [54]:
acc(clf5)

0.9375

In [55]:
sen(clf5)

0.8375

#### Ridge

In [64]:
clf6 = sklearn.linear_model.LogisticRegression(penalty = 'l2')

In [65]:
clf6.fit(X_train, y_train)

LogisticRegression()

In [66]:
acc(clf6)

0.9375

In [67]:
sen(clf6)

0.8375

## Conclusions:
#### The selected $l_1, l_2$ penalties are very small. We may need to modify the loss function (remove the standard $1/n$ factor from the loss term), so that $l_1, l_2 = 0.005 \cdot n \approx 2.4$ for $n = 480$.
#### In previous experiments, FL, SL, and GEN had better sensitivity than the Sklearn logistic regression methods but worse accuracy, and GEN matched the better of FL and SL.
#### In this experiment, the Sklearn logistic regression methods beat the graph-based methods on both accuracy (0.9375 vs. at most 0.9042) and sensitivity (0.8375 to 0.85 vs. at most 0.7875). Among the graph-based methods, GEN is better than FL and SL on both metrics.
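The rescaling mentioned in the first conclusion can be written out as follows. This is a sketch under an assumption: the objective is taken to be a mean logistic loss plus two weighted penalties, where $P_1$ and $P_2$ are placeholders for whatever penalty terms `signals.py` defines (their exact form is not shown in this notebook).

$$
\hat\beta = \arg\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\left(y_i, x_i^\top \beta\right) + l_1 P_1(\beta) + l_2 P_2(\beta)
$$

Multiplying the objective by $n$ does not change the minimizer, so the same $\hat\beta$ solves

$$
\hat\beta = \arg\min_{\beta}\; \sum_{i=1}^{n} \ell\left(y_i, x_i^\top \beta\right) + (n\, l_1)\, P_1(\beta) + (n\, l_2)\, P_2(\beta).
$$

With $n = 480$, a penalty of $0.005$ on the mean-loss scale therefore corresponds to $0.005 \times 480 = 2.4$ on the summed-loss scale, so the small grid values are partly an artifact of the $1/n$ factor.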