## Trying 'shallow' models without neural-network-based deep learning

The main idea here is to try out different shallow models, which can then be compared to the neural-network-based models. Because training on an input feature array of 224x224x3 takes a very long time (even on the Kaggle GPU kernels), no cross-validation is performed. Instead, the parameters found in the ResNet50 notebook are used, on the assumption that they should produce reasonable models.
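
For orientation, the kind of randomized parameter search that is skipped here might look roughly like the sketch below. It runs on a small synthetic array rather than the image data. The search ranges, `X_demo`, `y_demo` and `param_dist` are assumptions made for illustration and are not taken from the ResNet50 notebook.

```python
import numpy as np
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic stand-in for the flattened image features (assumption:
# 60 samples x 20 features instead of 224*224*3 pixel values per image).
rng = np.random.RandomState(0)
X_demo = rng.rand(60, 20)
y_demo = (X_demo[:, 0] > 0.5).astype(int)

# Hypothetical search ranges, chosen only for this illustration.
param_dist = {
    'max_features': sp_uniform(0.1, 0.8),  # uniform on [0.1, 0.9]
    'n_estimators': sp_randint(10, 50),    # integers 10..49
    'max_depth': sp_randint(3, 30),        # integers 3..29
}

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=3,        # number of sampled parameter settings
    cv=3,            # 3-fold cross-validation per setting
    random_state=0,
)
search.fit(X_demo, y_demo)
best_params = search.best_params_
print(best_params)
```

Each sampled setting is fitted once per fold, so the cost grows with `n_iter * cv`, which is what makes such a search expensive on the full-size feature arrays.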

In [1]:
# plotting imports and setup
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
matplotlib.rcParams['figure.figsize'] = [10,10]
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform

First we have to import the dataset.

In [2]:
from keras.preprocessing import image
from os import listdir

fdir='Data/CERTH_ImageBlurDataset/TrainingSet/Naturally-Blurred/'
files= listdir(fdir)
X=[] #feature vector
images=[]
Y=[] #class vector (1='blurred', 0='in focus')
for fn in files:
    img_path = fdir+fn
    x=image.load_img(img_path, target_size=(224, 224))
    images.append(x)
    x=image.img_to_array(x)
    X.append(x)
    Y.append(1)
    
fdir='Data/CERTH_ImageBlurDataset/TrainingSet/Undistorted/'
files= listdir(fdir)
for fn in files:
    img_path = fdir+fn
    x=image.load_img(img_path, target_size=(224, 224))
    images.append(x)
    x=image.img_to_array(x)
    X.append(x)
    Y.append(0)

Using TensorFlow backend.


In [3]:
X_train = X
y_train = Y

In [4]:
import pandas as pd
fn='Data/CERTH_ImageBlurDataset/EvaluationSet/NaturalBlurSet.xlsx'
xl = np.array(pd.read_excel(fn))
val={}
for xx in xl:
    val[xx[0]]=xx[1]

In [5]:
fdir='Data/CERTH_ImageBlurDataset/EvaluationSet/NaturalBlurSet/'
files= listdir(fdir)
X_test=[] #feature vector
images=[]
y_test=[] #class vector (1='blurred', 0='in focus')
for fn in files:
    img_path = fdir+fn
    x=image.load_img(img_path, target_size=(224, 224))
    images.append(x)
    x=image.img_to_array(x)
    X_test.append(x)
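    # the spreadsheet labels are assumed to be -1/1; map them to 0/1
    # (floor division keeps the labels integers on Python 3 as well)
    y_test.append((val[fn[:-4]]+1)//2)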
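
The label lookup in the loop above can be illustrated on a few made-up entries. This is only a sketch: the file names and rows are hypothetical, and it assumes the spreadsheet stores -1 for in-focus and 1 for blurred images, which is what the `(label + 1) / 2` mapping implies.

```python
# Hypothetical spreadsheet rows: (image name without extension, label),
# assuming -1 marks an in-focus image and 1 a blurred one.
rows = [('img_a', 1), ('img_b', -1), ('img_c', 1)]
val = {name: label for name, label in rows}

# Hypothetical file names, as listdir would return them.
files = ['img_a.jpg', 'img_b.jpg', 'img_c.jpg']

y_demo = []
for fn in files:
    key = fn[:-4]                       # strip the 4-character extension '.jpg'
    y_demo.append((val[key] + 1) // 2)  # -1 -> 0 (in focus), 1 -> 1 (blurred)

print(y_demo)
```

Note that `fn[:-4]` only works for three-letter extensions; `os.path.splitext(fn)[0]` would be a more general alternative.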

In [6]:
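# flatten every 224x224x3 image into one row of 224*224*3 = 150528 features,
# because the scikit-learn estimators expect a 2D (n_samples, n_features) array
reshape=np.shape(X_train)[1]*np.shape(X_train)[2]*np.shape(X_train)[3]
X_train__=np.array(X_train).reshape(len(X_train),reshape)
X_test__=np.array(X_test).reshape(len(X_test),reshape)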
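
The flattening step can be checked on a tiny example. The sketch below uses two dummy 4x4x3 arrays in place of the real images; `X_demo`, `n_features` and `X_flat` are illustrative names only.

```python
import numpy as np

# Two dummy "images" of shape 4x4x3 stand in for the 224x224x3 arrays.
X_demo = [np.zeros((4, 4, 3)), np.ones((4, 4, 3))]

# Product of all dimensions except the first (the sample axis).
n_features = int(np.prod(np.shape(X_demo)[1:]))  # 4 * 4 * 3 = 48
X_flat = np.array(X_demo).reshape(len(X_demo), n_features)

print(X_flat.shape)    # one row per image
print(224 * 224 * 3)   # feature count for the real image size
```

With the real image size this yields 150528 columns per sample, which explains the long training times mentioned at the top.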

### Extra Trees Classifier

In [7]:
from sklearn.ensemble import ExtraTreesClassifier

#ExtraTreesClassifier
opt_grid_etc = {'max_features': 0.38884100114920783, 'n_estimators': 41, 'max_depth': 15}

etc = ExtraTreesClassifier(**opt_grid_etc).fit(X_train__,y_train)
print('ExtraTreesClassifier stats')
print('training score: ', etc.score(X_train__,y_train))
print('test score: ',etc.score(X_test__,y_test))

ExtraTreesClassifier stats
training score:  0.9964705882352941
test score:  0.604


### Gradient Boosting Classifier

In [8]:
from sklearn.ensemble import GradientBoostingClassifier

#GradientBoostingClassifier
opt_grid_gbc = {'n_estimators': 3100, 'subsample': 0.6, 'learning_rate': 0.1, 'max_features': 0.30000000000000004}

gbc = GradientBoostingClassifier(**opt_grid_gbc).fit(X_train__,y_train)
print('GradientBoostingClassifier stats')
print('training score: ', gbc.score(X_train__,y_train))
print('test score: ',gbc.score(X_test__,y_test))

GradientBoostingClassifier stats
training score:  1.0
test score:  0.614


### KNeighbors Classifier

In [9]:
from sklearn.neighbors import KNeighborsClassifier

#KNeighborsClassifier
opt_grid_knc = {'n_neighbors': 18, 'leaf_size': 2}

knc = KNeighborsClassifier(**opt_grid_knc).fit(X_train__,y_train)
print('KNeighborsClassifier stats')
print('training score: ', knc.score(X_train__,y_train))
print('test score: ',knc.score(X_test__,y_test))

KNeighborsClassifier stats
training score:  0.7858823529411765
test score:  0.61


### Logistic Regression Classifier

In [10]:
from sklearn.linear_model import LogisticRegression

#Logistic Regression Classifier
opt_grid_lr = {'C': 3.9798556575404085, 'intercept_scaling': 9.943977937493685, 'tol': 0.007399648733766915, 'solver': 'sag'}

lr = LogisticRegression(**opt_grid_lr).fit(X_train__,y_train)
print('LogisticRegression stats')
print('training score: ', lr.score(X_train__,y_train))
print('test score: ',lr.score(X_test__,y_test))

LogisticRegression stats
training score:  0.9788235294117648
test score:  0.577


### Random Forest Classifier

In [11]:
from sklearn.ensemble import RandomForestClassifier

#RandomForestClassifier
opt_grid_rfc = {'max_features': 0.6118558223815502, 'n_estimators': 29, 'max_depth': 30}

rfc = RandomForestClassifier(**opt_grid_rfc).fit(X_train__,y_train)
print('RandomForestClassifier stats')
print('training score: ', rfc.score(X_train__,y_train))
print('test score: ',rfc.score(X_test__,y_test))

RandomForestClassifier stats
training score:  0.9929411764705882
test score:  0.597
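
To compare the five models side by side, the scores printed above can be collected in one place. The sketch below only copies the reported numbers and computes the difference between training and test score for each model; `scores`, `gaps`, `best_test` and `smallest_gap` are illustrative names.

```python
# Scores copied from the cell outputs above: (training score, test score).
scores = {
    'ExtraTreesClassifier': (0.9964705882352941, 0.604),
    'GradientBoostingClassifier': (1.0, 0.614),
    'KNeighborsClassifier': (0.7858823529411765, 0.61),
    'LogisticRegression': (0.9788235294117648, 0.577),
    'RandomForestClassifier': (0.9929411764705882, 0.597),
}

# Difference between training and test score for every model.
gaps = {name: train - test for name, (train, test) in scores.items()}

best_test = max(scores, key=lambda name: scores[name][1])
smallest_gap = min(gaps, key=gaps.get)

# Print the models ordered by test score, best first.
for name in sorted(scores, key=lambda n: scores[n][1], reverse=True):
    train, test = scores[name]
    print('{:<28s} train={:.3f} test={:.3f} gap={:.3f}'.format(
        name, train, test, gaps[name]))
```

All test scores lie between 0.577 and 0.614, while four of the five training scores are above 0.97. Only the KNeighbors classifier shows a clearly smaller difference between the two.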
