### Version 2: the CV score is still the log-likelihood

### Compare FL, SL, GEN, and logistic regression (LASSO, Ridge, no penalty)
#### The original images are 128 x 128 pixels; here we downsample them to 32 x 32.
#### The original data has 4 classes, ranging from non-demented to moderately demented. Here we pick 800 non-demented images, labelled "healthy" (0), and 400 mildly demented images, labelled "sick" (1), for a binary classification task. The tuning set is 40% of the data, the training set 40%, and the test set 20% (480/480/240 images), with p = 1024 features (pixels).

In [81]:
import cv2
import PIL
import matplotlib.pyplot as plt 
import numpy as np 
import pathlib 

In [82]:
path = 'C:/Users/sswei/Desktop/running time/AD4/'
data_dir = pathlib.Path(path)

In [83]:
sick = list(data_dir.glob('1/*'))

In [84]:
healthy = list(data_dir.glob('0/*'))

In [85]:
len(healthy)

800

### Downsample images to 32 x 32 pixels (to speed up experiments)

In [86]:
# Read each sick image, resize it to 32 x 32, and flatten it into one 1024-pixel row
X1_all = np.vstack([cv2.resize(plt.imread(str(f)), (32, 32)).flatten() for f in sick])

In [87]:
y1_all = np.ones(len(sick))

In [88]:
# Same preprocessing for the healthy images
X0_all = np.vstack([cv2.resize(plt.imread(str(f)), (32, 32)).flatten() for f in healthy])

In [89]:
y0_all = np.zeros(len(healthy))

#### Make tuning, training, and test sets (each class is split separately, so every set keeps the 1:2 sick-to-healthy ratio)

In [90]:
from sklearn.model_selection import train_test_split

In [91]:
X1_train, X1_test, y1_train, y1_test = train_test_split(X1_all, y1_all, test_size=0.2, random_state=111)

In [92]:
X1_train, X1_val, y1_train, y1_val = train_test_split(X1_train, y1_train, test_size=0.5, random_state=111)

In [93]:
X0_train, X0_test, y0_train, y0_test = train_test_split(X0_all, y0_all, test_size=0.2, random_state=111)

In [94]:
X0_train, X0_val, y0_train, y0_val = train_test_split(X0_train, y0_train, test_size=0.5, random_state=111)

In [95]:
X_train = np.concatenate((X1_train, X0_train))

In [96]:
y_train  = np.concatenate((y1_train, y0_train))

In [97]:
X_test = np.concatenate((X1_test, X0_test))
y_test  = np.concatenate((y1_test, y0_test))

In [98]:
X_val = np.concatenate((X1_val, X0_val))
y_val  = np.concatenate((y1_val, y0_val))
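Splitting each class on its own and then concatenating, as above, is a manual stratified split. scikit-learn's `train_test_split` can do the same in one call through its `stratify` argument. A minimal sketch on synthetic placeholder data with the same class sizes (the `_demo` names are hypothetical); note that even with the same `random_state` it selects different rows than the per-class calls above, so results would not match exactly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 400 "sick" (1) and 800 "healthy" (0) rows, p = 1024.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1200, 1024))
y_demo = np.concatenate([np.ones(400), np.zeros(800)])

# Hold out 20% as the test set; stratify keeps the 1:2 class ratio.
X_rest, X_te_demo, y_rest, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=111)
# Split the remaining 80% in half into tuning (validation) and training sets.
X_tr_demo, X_va_demo, y_tr_demo, y_va_demo = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=111)

print(len(y_tr_demo), len(y_va_demo), len(y_te_demo))                      # 480 480 240
print(int(y_tr_demo.sum()), int(y_va_demo.sum()), int(y_te_demo.sum()))  # 160 160 80
```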

### Standardize each feature to mean 0 and standard deviation 1 (each split is scaled with its own statistics)

In [99]:
from sklearn import preprocessing

In [100]:
X_test = preprocessing.StandardScaler().fit(X_test).transform(X_test)

In [101]:
X_train = preprocessing.StandardScaler().fit(X_train).transform(X_train)

In [102]:
X_val = preprocessing.StandardScaler().fit(X_val).transform(X_val)
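The cells above fit a separate `StandardScaler` to each split, so the test images are scaled with test-set means and standard deviations. A common alternative is to estimate the statistics on the training set only and reuse them for the other splits, so a test image is transformed exactly as a new, unseen image would be. A minimal sketch on synthetic placeholder arrays (hypothetical `_demo` names); the accuracies reported below were computed with per-split scaling and would change under this scheme.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder splits with the same shapes as the real ones.
rng = np.random.default_rng(0)
X_tr_demo = rng.normal(loc=5.0, scale=2.0, size=(480, 1024))
X_va_demo = rng.normal(loc=5.0, scale=2.0, size=(480, 1024))
X_te_demo = rng.normal(loc=5.0, scale=2.0, size=(240, 1024))

# Estimate each feature's mean and standard deviation on the training set only ...
scaler = StandardScaler().fit(X_tr_demo)
# ... and apply that same transformation to every split.
X_tr_scaled = scaler.transform(X_tr_demo)
X_va_scaled = scaler.transform(X_va_demo)
X_te_scaled = scaler.transform(X_te_demo)
```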

#### Fit graph-based models
#### Graph: 2-D grid graph of size 32 x 32 (one node per pixel)
#### Here I have removed the 1/n factor from the loss function
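Assuming each model minimizes a penalized logistic negative log-likelihood (the penalties inside `Log_FL`, `Log_SL`, and `Log_OUR` are not shown here, so $P(\beta)$ below is only a placeholder), dropping the $1/n$ factor just rescales the penalty weight:

$$\ell(\beta) = \sum_{i=1}^{n}\Big[\log\big(1+e^{x_i^\top\beta}\big) - y_i\,x_i^\top\beta\Big], \qquad y_i \in \{0, 1\}$$

$$\arg\min_{\beta}\Big\{\tfrac{1}{n}\,\ell(\beta) + \lambda P(\beta)\Big\} = \arg\min_{\beta}\Big\{\ell(\beta) + n\lambda P(\beta)\Big\}$$

So a weight $\lambda$ in the unscaled objective plays the role of $\lambda/n$ in the scaled one; with 480 training images, a weight of 2.5 without the factor corresponds to $2.5/480 \approx 0.0052$ with it. Each CV fit on the tuning set uses only part of its 480 images, so the selected weight transfers to the training fit on approximately, not exactly, the same scale.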


In [103]:
from signals import *
from skest import *

In [104]:
D = grid_incidence(32)
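`grid_incidence` comes from the author's `signals` module, whose source is not shown. A minimal sketch of what such a function typically returns, assuming the oriented edge-by-node incidence matrix of a 4-neighbour grid with row-major pixel order (matching `flatten()`); the real function may differ, e.g. be sparse, transposed, or use a different orientation. Each row is one pair of adjacent pixels $(i, j)$, so $(D\beta)_k = \beta_i - \beta_j$ is the difference between neighbouring coefficients, which is what graph penalties act on. `grid_incidence_sketch` and `D_sketch` are hypothetical names chosen so the notebook's `D` is not overwritten.

```python
import numpy as np

def grid_incidence_sketch(n):
    """Hypothetical stand-in for grid_incidence(n): edge-by-node incidence
    matrix of an n x n 4-neighbour grid. Node (r, c) has index r * n + c,
    matching a row-major flatten() of an n x n image."""
    edges = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:          # horizontal edge to the right neighbour
                edges.append((i, i + 1))
            if r + 1 < n:          # vertical edge to the neighbour below
                edges.append((i, i + n))
    D = np.zeros((len(edges), n * n))
    for k, (i, j) in enumerate(edges):
        D[k, i] = 1.0              # (D @ beta)[k] = beta[i] - beta[j]
        D[k, j] = -1.0
    return D

D_sketch = grid_incidence_sketch(32)
print(D_sketch.shape)   # (1984, 1024): 2 * 32 * 31 edges, 32 * 32 nodes
```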

In [105]:
X_test.shape

(240, 1024)

##### Tuning
##### Note that here we use the log-likelihood as the CV scorer
#### Caution! GridSearchCV does not shuffle the data by default. Shuffling matters here because `X_val` is ordered by class (all sick images first), although in practice it seems to make no difference.

In [152]:
from sklearn.utils import shuffle
X_val, y_val = shuffle(X_val, y_val)
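`naive_cv_logit` is the author's own helper and its internals are not shown. As a hedged illustration using scikit-learn's `LogisticRegression` (not the `Log_*` estimators), passing a splitter with `shuffle=True` randomizes the folds without a manual shuffle, and `scoring="neg_log_loss"` scores each held-out fold by its mean log-likelihood per image. Passing the splitter explicitly also matters because, with an integer `cv`, scikit-learn only stratifies when it recognizes the estimator as a classifier. The data below is a synthetic placeholder (`X_tune`, `y_tune` are hypothetical names).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic tuning set, ordered by class like X_val (all class-1 rows first).
rng = np.random.default_rng(0)
X_tune = rng.normal(size=(120, 20))
y_tune = np.concatenate([np.ones(40, dtype=int), np.zeros(80, dtype=int)])
X_tune[y_tune == 1, :3] += 1.0   # weak signal for class 1

# shuffle=True randomizes fold membership; stratification keeps the 1:2 ratio per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),      # L2-penalized by default; C = 1 / penalty weight
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",                 # mean held-out log-likelihood (higher is better)
    cv=cv,
)
search.fit(X_tune, y_tune)
print(search.best_params_, search.best_score_)
```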

In [153]:
naive_cv_logit(Log_LA, X_val, y_val) 

({'l1': 2.5, 'l2': 0}, 684.8825187683105)

In [154]:
naive_cv_logit(Log_EN, X_val, y_val)

({'l1': 0, 'l2': 2.5}, 646.5382959842682)

In [155]:
naive_cv_logit(Log_FL, X_val, y_val, D) 

({'l1': 0.25, 'l2': 0.5}, 859.0156228542328)

In [156]:
naive_cv_logit(Log_SL, X_val, y_val, D) 

({'l1': 0.5, 'l2': 0.5}, 909.1022973060608)

In [157]:
naive_cv_logit(Log_OUR, X_val, y_val, D) 

  -34.41608148  -33.63789794  -32.71988405  -31.91128863  -31.5582789
  -47.32367599  -39.35191591  -37.89737988  -35.78389262  -34.33593578
  -33.4799541   -32.94126242  -32.27420259           nan           nan
  -42.49299537  -38.08614374  -36.51676524  -34.82534858  -33.55486783
  -32.95344079  -32.54386844  -32.02750147  -31.57361814  -31.53366267
  -37.21773974  -35.42624332  -34.49503367  -33.58276042  -32.68915209
  -32.24097695  -31.9640274   -31.64057192  -31.47272146  -31.65262524
  -34.50592783  -33.75471665  -33.2689477   -32.44578874  -32.06502429
  -31.95457152  -31.83097879  -31.78329384  -31.72581063  -32.07490316
  -33.41166194  -33.05977764  -32.82529593  -32.3624772   -32.02310281
  -31.90975248  -31.90625297  -31.94676594  -32.18172963  -32.70885141
  -33.14067733  -33.00651179  -32.90397845  -32.61640535  -32.40079958
  -32.34670026  -32.35371424  -32.3878485   -32.64159268  -33.28285893
  -33.7273064   -33.72176868  -33.69671787  -33.60249623  -33.58396262
  -33.6

({'l1': 0.25, 'l2': 2.5}, 976.7801988124847)

In [27]:
naive_cv_cov(X_val)

({'t': 0}, 3.6826977729797363)

##### Fitting graph-based methods

In [158]:
X_train, y_train = shuffle(X_train, y_train)

In [159]:
clf1 = Log_FL(0.25, 0.5, D).fit(X_train, y_train)

In [160]:
clf2 = Log_SL(0.5, 0.5, D).fit(X_train, y_train)

In [161]:
clf3 = Log_OUR(0.25, 2.5, D).fit(X_train, y_train)

##### Prediction accuracy, sensitivity, and specificity

In [162]:
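def acc(clf):
    # Accuracy: fraction of all test images classified correctly (assumes 0/1 predictions)
    return 1 - np.sum(np.abs(y_test - clf.predict(X_test)))/len(y_test)
def sen(clf):
    # Sensitivity: fraction of sick (y = 1) test images predicted as sick
    return 1 - np.sum(np.abs(y_test[y_test == 1] - clf.predict(X_test[y_test == 1])))/len(y_test[y_test == 1])
def spec(clf):
    # Specificity: fraction of healthy (y = 0) test images predicted as healthy
    return 1 - np.sum(np.abs(y_test[y_test == 0] - clf.predict(X_test[y_test == 0])))/len(y_test[y_test == 0])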
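For reference, the same three quantities are available as scikit-learn metrics: sensitivity is the recall of class 1 and specificity is the recall of class 0. The labels below are made up purely to check the arithmetic; unlike `acc`, `sen`, and `spec`, these functions take the labels directly instead of reading the global `X_test` and `y_test`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels: 4 sick (1) and 6 healthy (0) images.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

accuracy = accuracy_score(y_true, y_pred)                  # 8 of 10 correct
sensitivity = recall_score(y_true, y_pred, pos_label=1)    # 3 of 4 sick detected
specificity = recall_score(y_true, y_pred, pos_label=0)    # 5 of 6 healthy detected
print(accuracy, sensitivity, specificity)
```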

In [163]:
X_test, y_test = shuffle(X_test, y_test)

##### FL: 
##### Accuracy

In [164]:
acc(clf1)

0.9125

##### Sensitivity

In [165]:
sen(clf1)

0.85

##### Specificity

In [166]:
spec(clf1)

0.94375

##### SL:

In [167]:
acc(clf2)

0.9208333333333334

In [168]:
sen(clf2)

0.8625

In [169]:
spec(clf2)

0.95

##### GEN:

In [170]:
acc(clf3)

0.9291666666666667

In [171]:
sen(clf3)

0.8625

In [172]:
spec(clf3)

0.9625

#### We also compare with logistic regression (ridge, lasso, and no penalty)

#### No penalty (unpenalized logistic regression, OLR)

In [173]:
clf4 = Log_OLR().fit(X_train, y_train)

In [174]:
acc(clf4)

0.8208333333333333

In [175]:
sen(clf4)

0.8875

In [176]:
spec(clf4)

0.7875
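One plausible reason the unpenalized fit (`Log_OLR`) does worst: with p = 1024 pixels and only 480 training images, the training data are very likely linearly separable. On separable data, scaling the coefficients up pushes the logistic log-likelihood arbitrarily close to 0, so the unpenalized maximum-likelihood estimate does not exist and the fitted coefficients depend on when the solver stops. The sketch below illustrates this on random data with more features than samples (sizes shrunk for speed); it does not use the author's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 48, 96                        # more features than samples, like 480 x 1024 here
X_demo = rng.normal(size=(n, p))
y_demo = rng.integers(0, 2, size=n)  # arbitrary 0/1 labels with no real signal

# Solve X w = s exactly (minimum-norm least squares), s = +1 for class 1 and -1 for class 0.
s = 2.0 * y_demo - 1.0
w, *_ = np.linalg.lstsq(X_demo, s, rcond=None)
print(np.all(np.sign(X_demo @ w) == s))  # every training point is on the correct side

def loglik(beta):
    # Logistic log-likelihood written with margins s_i * x_i^T beta
    return -np.sum(np.logaddexp(0.0, -s * (X_demo @ beta)))

for c in [1, 10, 100]:
    print(c, loglik(c * w))              # increases toward 0 as c grows
```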

#### lasso

In [177]:
clf5 = Log_LA(2.5, 0).fit(X_train, y_train)

In [178]:
acc(clf5)

0.9

In [179]:
sen(clf5)

0.8375

In [180]:
spec(clf5)

0.93125

#### EN (always degenerates to ridge: the tuned l1 is 0)

In [181]:
clf6 = Log_EN(0, 2.5).fit(X_train, y_train)

In [182]:
acc(clf6)

0.925

In [183]:
sen(clf6)

0.9

In [184]:
spec(clf6)

0.9375
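Because the test set holds 80 sick and 160 healthy images, accuracy is a fixed weighted average: accuracy = (80 · sensitivity + 160 · specificity) / 240 = sensitivity/3 + 2 · specificity/3, so specificity counts twice as much as sensitivity. The short check below converts the reported rates back to image counts and confirms that they reproduce the reported accuracies; the numbers are copied from the outputs above and nothing is re-run.

```python
# Sensitivity and specificity copied from the outputs above.
reported = {
    "FL":    (0.85,   0.94375),
    "SL":    (0.8625, 0.95),
    "GEN":   (0.8625, 0.9625),
    "OLR":   (0.8875, 0.7875),
    "Lasso": (0.8375, 0.93125),
    "EN":    (0.9,    0.9375),
}
n_sick, n_healthy = 80, 160  # test set: 20% of 400 sick and 20% of 800 healthy images

recomputed = {}
for name, (sens_rate, spec_rate) in reported.items():
    tp = round(sens_rate * n_sick)       # sick images predicted sick
    tn = round(spec_rate * n_healthy)    # healthy images predicted healthy
    recomputed[name] = (tp + tn) / (n_sick + n_healthy)
    print(f"{name:5s}  TP = {tp:3d}/80  TN = {tn:3d}/160  accuracy = {recomputed[name]:.4f}")
```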

## Conclusions:
#### All estimators are now fit with the same algorithm, instead of using sklearn for some of them; sklearn's logistic regression applies regularization by default, so using it would make the comparison unfair.

#### Since the log-likelihood is used as the CV scorer, it is fair to compare only the accuracy. If sensitivity or specificity matters more, it should be used as the cross-validation scorer instead.
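Whether `naive_cv_logit` accepts a custom scorer is not shown, so as a hedged illustration outside the author's helper: scikit-learn's `GridSearchCV` can score folds by sensitivity and specificity through `make_scorer(recall_score, pos_label=...)` and pick the penalty by whichever matters more via `refit`. The sketch uses scikit-learn's `LogisticRegression` on synthetic placeholder data, not the `Log_*` estimators.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic placeholder data: 50 class-1 ("sick") and 100 class-0 ("healthy") rows.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 10))
y_demo = np.concatenate([np.ones(50, dtype=int), np.zeros(100, dtype=int)])
X_demo[y_demo == 1, :2] += 1.0

scorers = {
    "sensitivity": make_scorer(recall_score, pos_label=1),  # recall of the sick class
    "specificity": make_scorer(recall_score, pos_label=0),  # recall of the healthy class
}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0]},
    scoring=scorers,
    refit="sensitivity",   # choose C by cross-validated sensitivity
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
print(search.cv_results_["mean_test_specificity"])
```

Setting `refit="specificity"` instead would select the penalty that best protects healthy images from being flagged.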