# Logistic regression
In this notebook, we train a logistic regression classifier and use grid search to tune two hyperparameters: the degree of the polynomial feature expansion and the regularization coefficient.

### CONFIG: 
* `ROOT_DIR`: path of the Data folder, which contains the training images and the test set images.
* `TO_TRAIN`: if set to `True`, the grid search over the hyperparameters is run again. Otherwise, the hyperparameters found by our earlier grid search are used.
* `SubmissionName`: name of the file in which the submission is saved.


With this code, an <b>F1-score</b> of 0.603 is obtained.

In [1]:
ROOT_DIR = '../../Data/'
TO_TRAIN = False
SubmissionName = 'Logistic.csv'

# Import

In [2]:
import sys
sys.path.insert(0,'../../src')
sys.path.insert(0,'../../src/models')

from utilities import * 
from logistic_utilities import *

# Prepare the inputs for the logistic regression

In [3]:
# Load images
imgs, gt_imgs = LoadImages(0, root_dir = ROOT_DIR, verbose = 0)

# Extract patches from input images
patch_size = 16
img_patches = [img_crop(imgs[i], patch_size, patch_size) for i in range(imgs.shape[0])]
gt_patches = [img_crop(gt_imgs[i], patch_size, patch_size) for i in range(gt_imgs.shape[0])]

# Linearize list of patches
img_patches = np.asarray([patch for patches in img_patches for patch in patches])
gt_patches = np.asarray([patch for patches in gt_patches for patch in patches])

# Get X and Y
X = np.asarray([ extract_features(img_patches[i]) for i in range(len(img_patches))])
Y = np.asarray([value_to_class(np.mean(gt_patches[i])) for i in range(len(gt_patches))])
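
The helpers `img_crop`, `extract_features` and `value_to_class` come from `utilities` and are not shown in this notebook. The sketch below is only an assumption about what such helpers might look like: non-overlapping patches, per-channel mean and standard deviation as features, and a hypothetical foreground threshold of 0.25. All `*_sketch` names and the synthetic image are illustrative, and the project's actual implementations may differ.

```python
import numpy as np

def img_crop_sketch(im, w, h):
    """Split an image (H x W or H x W x C) into non-overlapping h x w patches."""
    patches = []
    for top in range(0, im.shape[0] - h + 1, h):
        for left in range(0, im.shape[1] - w + 1, w):
            patches.append(im[top:top + h, left:left + w])
    return patches

def extract_features_sketch(patch):
    """Assumed features: per-channel mean and standard deviation (6 values for RGB)."""
    return np.concatenate([patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))])

def value_to_class_sketch(v, threshold=0.25):
    """Assumed labelling rule: a patch is foreground if its mean ground truth exceeds the threshold."""
    return int(v > threshold)

# Synthetic 64x64 RGB image whose ground truth marks the top half as foreground
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
gt = np.zeros((64, 64))
gt[:32, :] = 1.0

demo_img_patches = img_crop_sketch(img, 16, 16)
demo_gt_patches = img_crop_sketch(gt, 16, 16)
X_demo = np.asarray([extract_features_sketch(p) for p in demo_img_patches])
Y_demo = np.asarray([value_to_class_sketch(p.mean()) for p in demo_gt_patches])
print(X_demo.shape, Y_demo.shape)
```

With 16x16 patches, a 64x64 image yields 16 patches, so each image contributes 16 rows to `X` and 16 labels to `Y`.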

# Cross Validation

The original ranges of degrees and lambdas were larger. To demonstrate the procedure, we select a smaller set of lambdas and degrees here. Moreover, finer grids of lambdas and additional degrees did not yield a significantly better F1-score, so we moved on to CNNs.
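
The function `grid_search_hyperparam` lives in `logistic_utilities` and is not shown here. As a rough illustration of the procedure, a grid search over the polynomial degree and the scikit-learn parameter `C` with a cross-validated F1-score could be sketched as follows. The name `grid_search_sketch`, the number of folds, the added `StandardScaler` step, and the synthetic data are all assumptions made for this example and not the project's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def grid_search_sketch(y, x, c_values, degrees, n_folds=4):
    """Return (best_c, best_degree, best_f1) according to the mean cross-validated F1-score."""
    best_c, best_degree, best_f1 = None, None, -1.0
    for c in c_values:
        for degree in degrees:
            model = make_pipeline(
                PolynomialFeatures(int(degree), include_bias=False),
                StandardScaler(),  # assumption: added here only to help the solver converge
                LogisticRegression(C=float(c), class_weight="balanced", max_iter=1000),
            )
            f1 = cross_val_score(model, x, y, cv=n_folds, scoring="f1").mean()
            if f1 > best_f1:
                best_c, best_degree, best_f1 = float(c), int(degree), float(f1)
    return best_c, best_degree, best_f1

# Synthetic, imbalanced data standing in for the patch features
X_demo, Y_demo = make_classification(
    n_samples=300, n_features=6, weights=[0.75], random_state=0
)
demo_c, demo_degree, demo_f1 = grid_search_sketch(
    Y_demo, X_demo, np.array([1e-2, 1.0, 1e2]), np.array([1, 2])
)
print(demo_c, demo_degree, round(demo_f1, 3))
```

Wrapping the polynomial expansion in the pipeline means it is refitted inside every fold, so the score of each fold only depends on that fold's training data.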

In [4]:
degrees = np.array([2,3,4,5])
lambdas = np.array([1e5,1e6,1e7,1e8,1e9])

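if TO_TRAIN:
    # Run the grid search again over the arrays defined above
    best_lambda, best_degree, best_f1 = grid_search_hyperparam(Y, X, lambdas, degrees)
else:
    # Reuse the values found by our earlier grid search
    best_lambda, best_degree, best_f1 = 1e8, 3, 0.5903411579111377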

Grid search ====> 1/5 lambda starts...
Grid search ====> Lambda = 1.00e-05, degree = 2, F1-score = 0.578
Grid search ====> Lambda = 1.00e-05, degree = 3, F1-score = 0.582
Grid search ====> Lambda = 1.00e-05, degree = 4, F1-score = 0.583
Grid search ====> Lambda = 1.00e-05, degree = 5, F1-score = 0.582
Grid search ====> 2/5 lambda starts...
Grid search ====> Lambda = 1.00e-06, degree = 2, F1-score = 0.580
Grid search ====> Lambda = 1.00e-06, degree = 3, F1-score = 0.587
Grid search ====> Lambda = 1.00e-06, degree = 4, F1-score = 0.587
Grid search ====> Lambda = 1.00e-06, degree = 5, F1-score = 0.584
Grid search ====> 3/5 lambda starts...
Grid search ====> Lambda = 1.00e-07, degree = 2, F1-score = 0.582
Grid search ====> Lambda = 1.00e-07, degree = 3, F1-score = 0.589
Grid search ====> Lambda = 1.00e-07, degree = 4, F1-score = 0.588
Grid search ====> Lambda = 1.00e-07, degree = 5, F1-score = 0.585
Grid search ====> 4/5 lambda starts...
Grid search ====> Lambda = 1.00e-08, degree = 2, F1-

In [5]:
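from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

# Note: in scikit-learn, C is the inverse of the regularization strength
logreg = linear_model.LogisticRegression(C=best_lambda, class_weight="balanced")
poly = PolynomialFeatures(best_degree)
X = poly.fit_transform(X)  # expand the features to polynomials of degree best_degree
logreg.fit(X, Y)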

LogisticRegression(C=100000000.0, class_weight='balanced', dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=100,
          multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
          solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
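
Because `X` is overwritten by its polynomial expansion before fitting, any data passed to `logreg` later has to go through the same expansion. One possible way to keep the two steps together is a scikit-learn pipeline, sketched below on synthetic data. The degree, the value of `C`, and the arrays `X_train`, `Y_train`, and `X_test` are placeholders chosen for this example, not the notebook's values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-ins for the patch features and labels
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 6))
Y_train = (X_train[:, 0] ** 2 + X_train[:, 1] > 0.5).astype(int)
X_test = rng.normal(size=(50, 6))

# The pipeline applies the same polynomial expansion at fit and predict time
model = make_pipeline(
    PolynomialFeatures(2),
    LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000),
)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)  # raw features in, class labels out
print(Y_pred.shape)
```

With this layout the test features never need to be expanded by hand, which removes the risk of predicting with a different degree than the one used for training.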

# Submission

In [6]:
predict_and_submit_logistic(best_degree, logreg, SubmissionName, root_dir = ROOT_DIR)

Loading test images...
Generating inputs from test images...
Predicting...
Submission saved in:  Logistic.csv
