# Task 1 (Supervised Learning) - Predicting Adoption and Adoption Speed

In this task you should target 3 classification tasks:
1. **Predicting Adoption (binary classification task)**: create a new target from AdoptionSpeed that is 1 if AdoptionSpeed != 4 and 0 otherwise.
2. **Predicting AdoptionSpeed (multiclass classification)**: in this task you should use the original target AdoptionSpeed, whose values are in the set {0, 1, 2, 3, 4} (5 classes). This is a very difficult problem. You might also want to consider 3 classes (for instance {0-1, 2-3, 4}, or other sets that make sense).
3. **Train specialized models for cats and dogs**: train with cat/dog instances and check whether the classification performance changes when Predicting Adoption and Predicting AdoptionSpeed.
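
The target definitions in the list above can be sketched in plain Python. This is only an illustration, assuming AdoptionSpeed takes the integer values 0 to 4; the function names `to_binary_target` and `to_three_class_target` are hypothetical, and the 3-class grouping is the example grouping {0-1, 2-3, 4} from the task statement, not the only sensible one.

```python
def to_binary_target(adoption_speed):
    """Return 1 if the pet was adopted (AdoptionSpeed != 4), else 0."""
    return 1 if adoption_speed != 4 else 0


def to_three_class_target(adoption_speed):
    """Group the 5 original classes as {0-1, 2-3, 4} -> {0, 1, 2}."""
    if adoption_speed in (0, 1):
        return 0
    if adoption_speed in (2, 3):
        return 1
    return 2


# Made-up AdoptionSpeed values, for illustration only
speeds = [0, 1, 2, 3, 4, 4, 2]
binary = [to_binary_target(s) for s in speeds]
three = [to_three_class_target(s) for s in speeds]
print(binary)
print(three)
```

With a pandas data frame, the same mapping could be applied column-wise to the AdoptionSpeed column.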

## Intro

**You should:**

* Choose **one classifier in each category**: Tree models, Rule models, Linear models, Distance-based models, and Probabilistic models.
* Use cross-validation to evaluate the results. 
* Present and discuss the results for different evaluation measures, present confusion matrices. Remember that not only overall results are important. Check what happens when learning to predict each class.
* Describe the parameters used for each classifier and whether and how their choice affected the results.
* Choose the best classifier and justify your choice.
* **Critically discuss your choices and the results!**
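
To look beyond overall accuracy, as the list above asks, per-class precision and recall can be read off a confusion matrix. The following is a minimal plain-Python sketch on made-up labels; the helper names `confusion_matrix` and `per_class_precision_recall` are hypothetical stand-ins, and in the notebook itself the equivalent scikit-learn metrics would normally be used.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_precision_recall(matrix):
    """Return one (precision, recall) pair per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Made-up labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
scores = per_class_precision_recall(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / len(y_true)
print(cm, scores, accuracy)
```

In this toy example the overall accuracy is the same as the recall of class 2, while class 0 is recovered only half of the time, which is the kind of per-class difference a single aggregate score hides.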

We have only categorical data, but most of it is binary, so we need to find the models that classify this type of data best. Because we transformed our data into mostly binary features, we expect much better performance on the binary target than on the multiclass classification problem.

Models to try:

* Logistic Regression (can be used for multiclass classification by combining multiple regression functions)
  > Coefficient selection `l1-penalized`, solver `SAGA`
* SVM Quadratic Kernel
  > HyperParameter Selection - randomized search
* MLP
* Random Forests
  > HyperParameter Selection - randomized search
* Decision Tree
  > HyperParameter Selection - randomized search
* K_means with Russell Rao distance metric
* Naive Bayes
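
The Russell-Rao distance named in the list is defined for binary vectors, which fits the mostly binary features described above. As a sketch of the usual definition, the dissimilarity is (n - n_11) / n, where n is the vector length and n_11 counts the positions where both vectors are 1. The function name `russell_rao` is hypothetical and the vectors are made up.

```python
def russell_rao(u, v):
    """Russell-Rao dissimilarity between two equal-length binary vectors.

    d(u, v) = (n - n_11) / n, where n_11 is the number of positions
    in which both u and v are 1.
    """
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    both_one = sum(1 for a, b in zip(u, v) if a and b)
    return (len(u) - both_one) / len(u)


a = [1, 0, 1, 1]
b = [1, 1, 0, 1]
d_ab = russell_rao(a, b)  # two shared 1s out of four positions
d_aa = russell_rao(a, a)  # three shared 1s out of four positions
print(d_ab, d_aa)
```

Note that the dissimilarity of a vector with itself is zero only when every entry is 1, because shared zeros do not count as agreement.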

Feature Selection:

* RFE with CV / metric - accuracy
* LassoCV with CV / metric - AIC & BIC

Model Selection:

* Test Error vs Train Error
* ROC for every fold in CV
* Learning Curves
* Precision & Recall

## 1.1. MultiClass Classification

### 1.1.1 LogitRegression

#### 1.1.1.1 Model training

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import l1_min_c
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn import linear_model

# Grid of C values, starting at the lower bound below which the L1 model is empty
cs = l1_min_c(Dx, y, loss='log') * np.logspace(0, 7, 16)

# L1-penalized multinomial logistic regression
logit = linear_model.LogisticRegression(penalty='l1', solver='saga',
                                      tol=1e-6, max_iter=int(1e6),
                                      warm_start=True, multi_class='multinomial')
# warm_start lets the model reuse already computed coefficients
# Create the RFE object and compute a cross-validated score for each C.
k_folds = 2
coefs_ = []
best = 0
cv = StratifiedKFold(k_folds)
for i, c in enumerate(cs):
    print(i, ' of ', len(cs))
    logit.set_params(C=c)
    rfecv_logit = RFECV(estimator=logit, step=1, cv=cv,
            scoring='accuracy', n_jobs=k_folds)
    rfecv_logit.fit(Dx, y)
    # Store coefficients in the full feature space (0 for eliminated features)
    # so that every row of the regularization path has the same length
    full_coef = np.zeros((rfecv_logit.estimator_.coef_.shape[0], Dx.shape[1]))
    full_coef[:, rfecv_logit.support_] = rfecv_logit.estimator_.coef_
    coefs_.append(full_coef.ravel())
    # Keep the RFECV run with the best cross-validated accuracy so far
    # (grid_scores_ was removed in scikit-learn 1.2; use cv_results_['mean_test_score'] there)
    if max(rfecv_logit.grid_scores_) > best:
        best = max(rfecv_logit.grid_scores_)
        Brfecv_logit = rfecv_logit
print(len(cs), ' of ', len(cs), ' : Finished')

#### 1.1.1.2 Model Metrics

In [None]:
# Plot the regularization path of the L1 penalization (one line per coefficient)
coefs_ = np.array(coefs_)
plt.figure(figsize=(18, 8))
plt.plot(np.log10(cs), coefs_, marker='o')
plt.xlabel('log10(C)')
plt.ylabel('Coefficients')
#plt.title('Logistic Regression Path')
plt.axis('tight')
plt.show()

In [None]:
print("Feature's Ranking \n")
for ix, cols in enumerate(Dx.columns):
  print(cols, ': ', Brfecv_logit.ranking_[ix])

In [None]:
print("Optimal number of features : %d" % Brfecv_logit.n_features_)
# Plot number of features VS. cross-validation scores
plt.figure(figsize=(18, 8));
plt.xlabel("Number of features selected")
plt.ylabel("Cross-validation score (accuracy)")
plt.plot(range(1, len(Brfecv_logit.grid_scores_) + 1), Brfecv_logit.grid_scores_)
plt.show()

In [None]:
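from sklearn.metrics import auc
from sklearn.metrics import plot_roc_curve

# Keep only the features selected by RFECV (ranking == 1)
c_del = []
for ix, cols in enumerate(Dx.columns):
    if Brfecv_logit.ranking_[ix] != 1:
        c_del.append(cols)
Logx = Dx.drop(c_del, axis=1)

tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)

# ROC analysis, one curve per cross-validation fold.
# Note: plot_roc_curve only supports binary classifiers; a multiclass target
# such as the 5-class AdoptionSpeed must first be binarized (e.g. one-vs-rest).

classifier = Brfecv_logit.estimator_
fig, ax = plt.subplots()
for i, (train, test) in enumerate(cv.split(Logx, y)):
    # Refit on the training fold so that the test fold is unseen data
    classifier.fit(Logx.iloc[train], y[train])
    viz = plot_roc_curve(classifier, Logx.iloc[test], y[test],
                         name='ROC fold {}'.format(i),
                         alpha=0.3, lw=1, ax=ax)
    # Interpolate each fold's TPR onto a common FPR grid (scipy.interp is deprecated)
    interp_tpr = np.interp(mean_fpr, viz.fpr, viz.tpr)
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(viz.roc_auc)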

#Plot ROC

ax.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r',
        label='Chance', alpha=.8)

mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(aucs)
ax.plot(mean_fpr, mean_tpr, color='b',
        label=r'Mean ROC (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc, std_auc),
        lw=2, alpha=.8)

std_tpr = np.std(tprs, axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
ax.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2,
                label=r'$\pm$ 1 std. dev.')

ax.set(xlim=[-0.05, 1.05], ylim=[-0.05, 1.05],
       title="ROC per cross-validation fold (logistic regression)")
ax.legend(loc="lower right")
plt.show()

## 1.3. Classification - Results and Discussion 