# Model Evaluation & Hyperparameter Tuning

In this exercise, we will look at implementations of several model evaluation methods covered in class and answer some questions about the plots and the learning curve.


## Pipeline Implementation VS Non-Pipeline Implementation

In this section, we provide code that classifies the "breast cancer" dataset without using Pipeline. Following the code snippet covered in lecture, complete the code that performs the same classification with Pipeline.

In [None]:
import pandas as pd
df = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases'
    '/breast-cancer-wisconsin/wdbc.data',
    header=None)
df.head()


## Label Encode the Target

In [None]:
from sklearn.preprocessing import LabelEncoder
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)
le.classes_
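
To see what the encoder does in isolation, here is a minimal sketch on a handful of made-up labels. The toy array and the name `le_demo` are assumptions for illustration only; they are not part of the exercise data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels (hypothetical), chosen to resemble a two-class string target.
labels = np.array(['M', 'B', 'B', 'M', 'B'])

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(labels)

# classes_ holds the sorted unique labels; each label is replaced by its index.
print(le_demo.classes_)                    # ['B' 'M']
print(encoded)                             # [1 0 0 1 0]
print(le_demo.inverse_transform([0, 1]))   # ['B' 'M']
```

Because the classes are sorted, the integer assigned to each label depends only on the set of labels, not on the order in which they appear.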

## Split the Data 

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = \
train_test_split(X, y, test_size = 0.2, stratify = y, random_state = 1)
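
The effect of `stratify` can be checked on a small synthetic example. This is only a sketch: the arrays `X_demo` and `y_demo` and the 60/40 class ratio are assumptions made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 60 samples of class 0 and 40 samples of class 1.
y_demo = np.array([0] * 60 + [1] * 40)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=1)

# Both subsets keep the 60/40 class ratio of the full label array.
print(np.bincount(y_tr))  # [48 32]
print(np.bincount(y_te))  # [12  8]
```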

## Implementation without Pipeline

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

stder = StandardScaler()
pca = PCA(n_components=2)

X_train_std = stder.fit_transform(X_train)
X_test_std = stder.transform(X_test)

X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

lr = LogisticRegression()
lr.fit(X_train_pca, y_train)
y_pred = lr.predict(X_test_pca)

print('Test Accuracy: %.3f' % lr.score(X_test_pca, y_test))

## Implementation with Pipeline

In [None]:
# TODO :: Implement the above task by using Pipeline. Your implementation should have the same test accuracy as the 
# implementation without Pipeline. 9 lines of code expected. 


## Cross Validation without Pipeline

In [None]:
import numpy as np
from sklearn.model_selection import StratifiedKFold


X_train_std = stder.fit_transform(X_train)
X_train_pca = pca.fit_transform(X_train_std)

# shuffle defaults to False, so random_state is omitted
# (recent scikit-learn versions raise an error if it is set without shuffle=True).
kfold = StratifiedKFold(n_splits=10).split(X_train_std, y_train)
scores = []
for k, (train, test) in enumerate(kfold):
    lr = LogisticRegression()
    lr.fit(X_train_pca[train], y_train[train])
    score = lr.score(X_train_pca[test], y_train[test])
    scores.append(score)
    print('Fold: %2d, Class dist.: %s, Acc: %.3f' % (k+1, np.bincount(y_train[train]), score))
# Report the mean and standard deviation once, after all folds have run.
print('CVaccuracy: %.3f +/-%.3f' % (np.mean(scores), np.std(scores)))

## Cross Validation with Pipeline

In [None]:
import numpy as np
from sklearn.model_selection import StratifiedKFold
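# shuffle defaults to False, so random_state is omitted
# (recent scikit-learn versions raise an error if it is set without shuffle=True).
kfold = StratifiedKFold(n_splits=10).split(X_train, y_train)
scores = []
for k, (train, test) in enumerate(kfold):
    # pipe_lr is the Pipeline built in the "Implementation with Pipeline" cell.
    pipe_lr.fit(X_train[train], y_train[train])
    score = pipe_lr.score(X_train[test], y_train[test])
    scores.append(score)
    print('Fold: %2d, Class dist.: %s, Acc: %.3f' % (k+1, np.bincount(y_train[train]), score))
# Report the mean and standard deviation once, after all folds have run.
print('CVaccuracy: %.3f +/-%.3f' % (np.mean(scores), np.std(scores)))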
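
A manual fold loop like the one above can usually be replaced by a single call to `cross_val_score`. The sketch below compares the two on synthetic data with a plain classifier; the dataset from `make_classification` and the names `X_demo`, `y_demo`, and `clf_demo` are assumptions for illustration and are unrelated to the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical synthetic data, used only to compare the two approaches.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
clf_demo = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=10)

# Manual loop: fit on the training folds, score on the held-out fold.
manual_scores = []
for train, test in cv.split(X_demo, y_demo):
    clf_demo.fit(X_demo[train], y_demo[train])
    manual_scores.append(clf_demo.score(X_demo[test], y_demo[test]))

# cross_val_score clones the estimator and runs the same fit/score steps per fold.
auto_scores = cross_val_score(clf_demo, X_demo, y_demo, cv=cv)

print(np.allclose(manual_scores, auto_scores))
print('CV accuracy: %.3f +/- %.3f' % (auto_scores.mean(), auto_scores.std()))
```

The returned array has one score per fold, so the mean and standard deviation are computed the same way as in the manual loop.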

## Question 1 

Did you notice that the final CVaccuracy values of the implementations with and without Pipeline are different? I made a common mistake in the cross-validation code without Pipeline. Can you help me fix this bug and explain why it is a problem?


In [None]:
# TODO :: Correct the implementation without pipeline, 13-20 lines of code expected.


## Cross Validation VS Nested Cross Validation

The following code uses an SVM with an RBF kernel to classify the iris dataset. It evaluates the model with both cross validation and nested cross validation, and plots the evaluation scores. Read and execute the following code section, then answer the questions below. (Hint: the sklearn documentation can help you understand the code better.)

In [None]:
from sklearn.datasets import load_iris
from matplotlib import pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
import numpy as np


# Number of random trials
NUM_TRIALS = 30

# Load the dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Set up possible values of parameters to optimize over
p_grid = {"C": [1, 10, 100],
          "gamma": [.01, .1]}

# We will use a Support Vector Classifier with "rbf" kernel
svm = SVC(kernel="rbf")

# Arrays to store scores
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)

# Loop for each trial
for i in range(NUM_TRIALS):

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=outer_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()

score_difference = non_nested_scores - nested_scores

print("Average difference of {:6f} with std. dev. of {:6f}."
      .format(score_difference.mean(), score_difference.std()))

# Plot scores on each trial for nested and non-nested CV
plt.figure()
plt.subplot(211)
non_nested_scores_line, = plt.plot(non_nested_scores, color='r')
nested_line, = plt.plot(nested_scores, color='b')
plt.ylabel("score", fontsize="14")
plt.legend([non_nested_scores_line, nested_line],
           ["Non-Nested CV", "Nested CV"],
           bbox_to_anchor=(0, .4, .5, 0))
plt.title("Non-Nested and Nested Cross Validation on Iris Dataset",
          x=.5, y=1.1, fontsize="15")

# Plot bar chart of the difference.
plt.subplot(212)
difference_plot = plt.bar(range(NUM_TRIALS), score_difference)
plt.xlabel("Individual Trial #")
plt.legend([difference_plot],
           ["Non-Nested CV - Nested CV Score"],
           bbox_to_anchor=(0, 1, .8, 0))
plt.ylabel("score difference", fontsize="14")

plt.show()
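
When reading the loop above, it may help to check what `best_score_` holds after a grid search. The following is a minimal sketch, assuming the same iris data and parameter grid; the names `gs` and `cv_demo` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)
cv_demo = KFold(n_splits=5, shuffle=True, random_state=0)

gs = GridSearchCV(estimator=SVC(kernel="rbf"),
                  param_grid={"C": [1, 10, 100], "gamma": [.01, .1]},
                  cv=cv_demo)
gs.fit(X_demo, y_demo)

# One mean cross-validated score per parameter combination (3 x 2 = 6 here).
mean_scores = gs.cv_results_["mean_test_score"]
print(len(mean_scores))

# best_score_ is the mean cross-validated score of the best combination.
print(np.isclose(gs.best_score_, mean_scores.max()))
print(gs.best_params_)
```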

## Question 2

The above plots show the scores and the score difference of cross validation versus nested cross validation. What observation can be made about the scores of the two methods? Why does that happen? Which one do you think is a better way to evaluate the performance of the model, and why?


## Question 3

Read the code above, especially the loop over trials. Explain what this line does: nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
