# **Classification**

    Notebook version: 1.0 (Nov 11, 2015)

    Authors: Jesús Cid Sueiro (jcid@tsc.uc3m.es)

    Changes: v.1.0 - First version. Python version

# Import some libraries that will be necessary for working with data and displaying plots

# To visualize plots in the notebook
%matplotlib inline 

#import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import scipy.io       # To read matlab files
from sklearn.preprocessing import PolynomialFeatures
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import svm, model_selection

import pylab
pylab.rcParams['figure.figsize'] = 9, 7

## 1. Introduction

In this notebook we will analyze the behavior of logistic regression and support vector machines on the `cancer` dataset, taken from the [UCI repository](https://archive.ics.uci.edu/ml/index.html). You can load it from the file `CancerDataset.mat`.

### 1.1 Data Preparation.

Load and normalize the dataset. Remember that the same transformation must be applied to the training, validation and test data.
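
As a minimal sketch of the normalization step, the statistics can be estimated from the training partition only and then reused for the other partitions. The function names `fit_normalizer` and `apply_normalizer`, the array names, and the synthetic data are assumptions made for illustration; the variable names stored in `CancerDataset.mat` are not given here, so loading is left out.

```python
import numpy as np

def fit_normalizer(X_train):
    """Return the per-feature mean and standard deviation of the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0          # avoid dividing by zero for constant features
    return mean, std

def apply_normalizer(X, mean, std):
    """Apply a previously fitted normalization to any data partition."""
    return (X - mean) / std

# Synthetic stand-in data (NOT the cancer dataset)
rng = np.random.RandomState(0)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
X_val = rng.normal(loc=5.0, scale=2.0, size=(30, 4))
X_tst = rng.normal(loc=5.0, scale=2.0, size=(30, 4))

# Statistics come from the training set and are reused everywhere else
mean_tr, std_tr = fit_normalizer(X_tr)
Xn_tr = apply_normalizer(X_tr, mean_tr, std_tr)
Xn_val = apply_normalizer(X_val, mean_tr, std_tr)
Xn_tst = apply_normalizer(X_tst, mean_tr, std_tr)
```

With this choice the training data have exactly zero mean and unit variance per feature, while the validation and test data are only approximately standardized.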

## 2. Linear Classification with Logistic Regression.

First we will analyze the behavior of logistic regression for this dataset. 

### 2.1. MAP estimator.

Implement a function to compute the MAP estimate of the parameters of a linear logistic regression model with a Gaussian prior and a given value of the inverse regularization parameter $C$. The method should return the estimated parameter vector and the negative log-likelihood, $L({\bf w})$. The syntax must be
    **`w, L = logregFitR(Z_tr, Y_tr, rho, C, n_it)`**
where

  - `Z_tr` is the input training data matrix (one instance per row)
  - `Y_tr` contains the labels corresponding to each row in the data matrix
  - `rho` is the learning step
  - `C` is the inverse regularizer
  - `n_it` is the number of iterations
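
One possible implementation of this interface is sketched below; it is not the reference solution. It assumes labels in $\{0, 1\}$, a zero-mean Gaussian prior that contributes the penalty $\|{\bf w}\|^2 / (2C)$, plain gradient descent with a random initialization, and that `L` stores the negative log-likelihood after every iteration. The synthetic data in the usage part are also an assumption.

```python
import numpy as np
from scipy.special import expit   # numerically stable logistic function

def logregFitR(Z_tr, Y_tr, rho, C, n_it):
    """MAP fit of a linear logistic regression model by gradient descent.

    Assumes labels in {0, 1} and a penalty ||w||^2 / (2 C) from the prior.
    Returns the weights and the negative log-likelihood per iteration.
    """
    y = np.asarray(Y_tr, dtype=float).reshape(-1, 1)
    w = np.random.randn(Z_tr.shape[1], 1)     # random initialization
    L = np.zeros(n_it)

    for k in range(n_it):
        p = expit(Z_tr @ w)                   # P(y = 1 | z, w)
        # Gradient step on NLL(w) + ||w||^2 / (2 C)
        w = (1.0 - rho / C) * w + rho * (Z_tr.T @ (y - p))
        # Negative log-likelihood: sum of log(1 + exp(s)) - y * s
        s = Z_tr @ w
        L[k] = np.sum(np.logaddexp(0.0, s) - y * s)
    return w, L

# Usage on synthetic data (first column of Z is the bias term)
np.random.seed(3)
X = np.random.randn(200, 2)
Z = np.hstack((np.ones((200, 1)), X))
w_true = np.array([[0.5], [2.0], [-1.0]])
Y = (Z @ w_true + 0.3 * np.random.randn(200, 1) > 0).astype(int).ravel()

w, L = logregFitR(Z, Y, rho=0.005, C=100.0, n_it=200)
acc_tr = np.mean(((Z @ w).ravel() > 0) == (Y == 1))
```

Because the gradient is summed over all samples, the step `rho` has to shrink as the training set grows; averaging the gradient instead is an equally valid design choice.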


### 2.2. Log-likelihood

Compute the MAP estimate for a polynomial logistic regression model of degree 5, for $C$ ranging from -0.05 to 10. Sample $C$ uniformly in a log scale, and plot using `plt.semilogx`. 

Plot the final value of $L$ as a function of $C$. Can you explain the qualitative behavior of $L$ as $C$ grows?

The plot may show some oscillation because of the random noise introduced by the random initialization of the learning algorithm. To smooth the results, you can reset the random seed right before each call to the `logregFitR` method, using

    np.random.seed(3)
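
The sweep over $C$ could be organized as in the following sketch. Several things here are assumptions: `fit_map` is a compact stand-in with the same signature as `logregFitR` that returns only the final negative log-likelihood, the data are synthetic, and the grid runs from 0.05 to 10 because a logarithmic grid needs strictly positive endpoints.

```python
import numpy as np
from scipy.special import expit
from sklearn.preprocessing import PolynomialFeatures

def fit_map(Z, Y, rho, C, n_it):
    """Compact stand-in for logregFitR (hypothetical); returns w and the final NLL."""
    y = np.asarray(Y, dtype=float).reshape(-1, 1)
    w = np.random.randn(Z.shape[1], 1)        # random initialization
    for _ in range(n_it):
        w = (1.0 - rho / C) * w + rho * (Z.T @ (y - expit(Z @ w)))
    s = Z @ w
    return w, np.sum(np.logaddexp(0.0, s) - y * s)

# Synthetic 2-D data with a non-linear class boundary (NOT the cancer dataset)
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(150, 2))
Y = (X[:, 0] ** 2 + X[:, 1] > 0.3).astype(int)

# Degree-5 polynomial expansion; the output includes the bias column
Z = PolynomialFeatures(degree=5).fit_transform(X)

# Log-spaced grid of C values (assumed endpoints: 0.05 and 10)
C_values = np.logspace(np.log10(0.05), 1, 8)
L_final = np.zeros(len(C_values))
for i, C in enumerate(C_values):
    np.random.seed(3)                         # same initialization for every C
    _, L_final[i] = fit_map(Z, Y, rho=0.002, C=C, n_it=300)
```

The curve can then be drawn with `plt.semilogx(C_values, L_final)`. Resetting the seed inside the loop makes every run start from the same weights, so differences between points reflect $C$ rather than the initialization.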


### 2.3. Training and test errors.

Plot the training and validation error rates as a function of $C$. Compute the value of $C$ minimizing the validation error rate.
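
The selection step can be illustrated with a small sketch. The helper `error_rate`, the placeholder weight vectors (which are not fitted) and the hand-made validation set are all hypothetical; in the exercise the weights would come from `logregFitR`, one vector per value of $C$.

```python
import numpy as np

def error_rate(w, Z, Y):
    """Fraction of rows of Z misclassified by the linear classifier w."""
    y_hat = (Z @ w).ravel() > 0               # decide class 1 for positive scores
    return np.mean(y_hat != np.asarray(Y).ravel().astype(bool))

# Hypothetical candidates: one weight vector per value of C (placeholders)
C_values = np.array([0.1, 1.0, 10.0])
weights = [np.array([[0.0], [1.0]]),
           np.array([[-1.0], [1.0]]),
           np.array([[-5.0], [1.0]])]

# Tiny hand-made validation set; the first column is the bias term
Z_val = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.0]])
Y_val = np.array([0, 1, 1, 1])

# Validation error for each candidate, then the C with the smallest error
E_val = np.array([error_rate(w, Z_val, Y_val) for w in weights])
C_opt = C_values[np.argmin(E_val)]
```

Note that `np.argmin` returns the first minimizer, so ties are resolved in favor of the smallest $C$ when the grid is sorted in increasing order.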


## 3. Non-linear classification with Support Vector Machines

In this section we will train an SVM with Gaussian kernels. In this case, we will select parameter $C$ of the SVM by cross-validation.

### 3.1. Dataset preparation.

Join the training and validation datasets in a single input matrix `X_tr2` and a single label vector `Y_tr2`
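
A minimal sketch of the merge, assuming the partitions are held in NumPy arrays named `X_tr`, `X_val`, `Y_tr` and `Y_val` (these input names and the placeholder values are assumptions):

```python
import numpy as np

# Placeholder partitions (hypothetical shapes and values)
X_tr = np.zeros((6, 3))
X_val = np.ones((4, 3))
Y_tr = np.zeros(6, dtype=int)
Y_val = np.ones(4, dtype=int)

# Stack samples along the first axis, keeping inputs and labels aligned
X_tr2 = np.concatenate((X_tr, X_val), axis=0)
Y_tr2 = np.concatenate((Y_tr, Y_val), axis=0)
```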

### 3.2. Cross validated error estimate

Apply a 10-fold cross validation procedure to estimate the average error rate of the SVM for $C=1$ and $\gamma$ (which controls the kernel width) equal to 5.
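
One way to obtain this estimate with scikit-learn is sketched below, using `cross_val_score` from `sklearn.model_selection`. The synthetic data are an assumption. In `svm.SVC` the RBF kernel is $\exp(-\gamma \|{\bf x} - {\bf x}'\|^2)$, so a larger $\gamma$ means a narrower kernel.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the merged training set (NOT the cancer dataset)
rng = np.random.RandomState(0)
X_tr2 = rng.randn(200, 2)
Y_tr2 = (X_tr2[:, 0] ** 2 + X_tr2[:, 1] ** 2 > 1.0).astype(int)

# SVM with Gaussian (RBF) kernel, C = 1 and gamma = 5
clf = svm.SVC(kernel='rbf', C=1.0, gamma=5.0)

# cross_val_score returns one accuracy value per fold
acc = cross_val_score(clf, X_tr2, Y_tr2, cv=10)
err_cv = 1.0 - acc.mean()                     # average error rate over the folds
```

For classifiers, an integer `cv` makes scikit-learn use stratified folds, so each fold keeps roughly the same class proportions.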

### 3.3. Influence of $C$.

Repeat exercise 3.2 for $\gamma=5$ and different values of $C$, ranging from $10^{-3}$ to $10^{4}$, obtained by uniform sampling in a logarithmic scale. Plot the average number of errors as a function of $C$.

Note that fitting the SVM may take some time, especially for the largest values of $C$.
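
The loop over $C$ might look like the following sketch, which reports error rates per fold on synthetic data (an assumption, as is the grid size of 8 points). Multiplying the error rate by the fold size gives the average number of errors per fold.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (NOT the cancer dataset)
rng = np.random.RandomState(0)
X_tr2 = rng.randn(150, 2)
Y_tr2 = (X_tr2[:, 0] ** 2 + X_tr2[:, 1] ** 2 > 1.0).astype(int)

# Log-spaced grid from 1e-3 to 1e4
C_values = np.logspace(-3, 4, 8)
err_cv = np.zeros(len(C_values))

for i, C in enumerate(C_values):
    clf = svm.SVC(kernel='rbf', C=C, gamma=5.0)
    acc = cross_val_score(clf, X_tr2, Y_tr2, cv=10)
    err_cv[i] = 1.0 - acc.mean()              # 10-fold average error rate

# Average number of errors per fold (each fold holds about N / 10 samples)
n_err = err_cv * len(Y_tr2) / 10.0
```

The result can be plotted with `plt.semilogx(C_values, n_err)`, and `C_values[np.argmin(err_cv)]` gives the value of $C$ with the lowest cross-validated error.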

### 3.4. Hyperparameter optimization.

Compute the value of $C$ minimizing the cross-validated error rate.

### 3.5. Test error

Evaluate the classifier performance using the test data, for the selected hyperparameter values.
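
A sketch of the final evaluation, assuming the selected value is stored in a variable `C_opt` (the value 10 used here is a placeholder, not a result) and that the test partition is held in `X_tst` and `Y_tst`. The data are synthetic.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in data, split into training and test parts
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
Y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)
X_tr2, Y_tr2 = X[:200], Y[:200]
X_tst, Y_tst = X[200:], Y[200:]

C_opt = 10.0                                  # placeholder for the selected value

# Refit on all training data with the selected hyperparameters
clf = svm.SVC(kernel='rbf', C=C_opt, gamma=5.0).fit(X_tr2, Y_tr2)

# score() returns the accuracy, so the error rate is its complement
err_tst = 1.0 - clf.score(X_tst, Y_tst)
```

The test data are used only once, after all hyperparameters are fixed, so `err_tst` is not biased by the selection of $C$.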
