    # MODEL EVALUATION AND IMPROVEMENT
    
    This notebook aims to achieve the following:
        * Demonstrate how to evaluate how well a model generalizes to new data
        * Demonstrate how we can improve the model's ability to generalize
          to new data

    # Model Evaluation
    
    How well is our model doing?

`Let's build a model to classify the different species in the iris dataset`

In [1]:
import numpy as np

In [2]:
from sklearn.datasets import load_iris  # Load iris dataset
from sklearn.linear_model import LogisticRegression  # Log_Reg model

# Instantiate iris dataset into features and ground truth labels
features, labels = load_iris(return_X_y=True)

# Instantiate Log-Reg Model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(features, labels)

LogisticRegression(max_iter=1000)

In [3]:
# Well done... We've trained a Logistic Regression model on the iris dataset.
# Let's try it out

# Use this data to test the new model
sample_data = np.array([5.2, 3.0, 0.9, 0.4])
class_prediction = log_reg.predict(sample_data.reshape(1, -1))

print("Model Classification")
print("==================== \n")
print("Predicted Class: {}".format(load_iris().target_names[class_prediction][0]))

Model Classification

Predicted Class: setosa


    
    Wait a minute!
    
        * Can we trust this prediction?
        * Do we know how well the model is doing in classifying the plant?
        
    What we have just done is the equivalent of teaching a particular subject to students, and allowing
    them to graduate and apply the subject matter without testing how well they know the content :)
    
    Obviously, we want to test the students on preliminary work before allowing them to graduate.

In [4]:
from sklearn.model_selection import train_test_split  # Split the data into train and test sets

# Split data into training and test set
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.25, random_state=0)

log_reg = LogisticRegression(max_iter=1000)  # Instantiate model
log_reg.fit(X_train, y_train)  # Fit the model on train set

acc_score = log_reg.score(X_test, y_test)  # Accuracy Score

`Main Question: How well is our model doing?`

In [5]:
print("Model Evaluation\n================")
print("\nModel Accuracy: {}".format(acc_score))

Model Evaluation

Model Accuracy: 0.9736842105263158


    
    Well done!
    
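    Now we have evidence that our students can handle the subject matter: they answered about 97% of the
    exam questions correctly on material they had not seen during training. Keep in mind that this is an
    estimate from a single, small test set, not a guarantee for every task in the real world.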
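
    To see where the accuracy score comes from, the sketch below recomputes it by hand: it counts how many
    test predictions match the true labels and divides by the size of the test set. With the split used
    above, the test set holds 38 samples, so a score of 0.9737 corresponds to 37 correct predictions. The
    names predictions, n_correct and manual_accuracy are illustrative, and the confusion matrix is an extra
    that the notebook does not otherwise use.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Same data, split and model as in the cells above
features, labels = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Accuracy is the fraction of test samples that were classified correctly
predictions = log_reg.predict(X_test)
n_correct = int(np.sum(predictions == y_test))
manual_accuracy = n_correct / len(y_test)

print("Correct predictions: {} of {}".format(n_correct, len(y_test)))
print("Manual accuracy: {}".format(manual_accuracy))

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, predictions))
```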

`Cross Validation`

    Evaluating how well our model generalizes to unseen data with a single train_test_split is a good start,
    but the result depends on which samples happen to land in each set. For example:
    
        * The training set gets the hard-to-classify samples and the test set gets the easy ones; hence, the
          model scores unrealistically well.
        * The training set gets the easy samples and the test set gets the hard-to-classify ones; hence, the
          model scores badly.
          
    Cross-validation reduces this dependence on one particular split by training and evaluating the model
    several times, so that every sample in the dataset is used for testing exactly once.
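
    How much a single split can matter is easy to check. The sketch below repeats the train/test split with
    a few different values of random_state and records the accuracy each time. The choice of five seeds and
    the name split_scores are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

features, labels = load_iris(return_X_y=True)

# Repeat the same experiment with different random splits
split_scores = []
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    split_scores.append(model.score(X_test, y_test))

for seed, score in enumerate(split_scores):
    print("random_state={}: accuracy={:.3f}".format(seed, score))
```

    If the scores differ from seed to seed, the difference comes only from how the data was divided, since
    the model and its settings stay the same.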
    
    i) k-fold cross-validation
        
       This method splits the dataset into k parts of (roughly) equal size, called folds, and repeatedly
       trains k separate models, with each part serving once as the test set and the other k-1 parts
       serving as the training set.
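
    The procedure can be written out by hand. This is a minimal sketch, assuming scikit-learn's KFold
    splitter with k=5; shuffle=True and random_state=0 are choices added here because the iris samples are
    stored in class order, and the name fold_scores is illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

features, labels = load_iris(return_X_y=True)

# Shuffle before splitting because the iris samples are ordered by class
k_fold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in k_fold.split(features):
    # Train a fresh model on k-1 folds, evaluate it on the remaining fold
    model = LogisticRegression(max_iter=1000)
    model.fit(features[train_idx], labels[train_idx])
    fold_scores.append(model.score(features[test_idx], labels[test_idx]))

print("Fold accuracies: {}".format(np.round(fold_scores, 3)))
```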

In [None]:
import mglearn

mglearn.plots.plot_cross_validation()

`Student Notion`
    
    What we aim to do with cross-validation is to evaluate our students on several exams (or tests) rather
    than on just one, to see how they perform on different assessments drawn from the same material.

In [None]:
from sklearn.model_selection import cross_val_score  # Returns scores of models trained and tested using cv

features, target = load_iris(return_X_y=True)
log_reg = LogisticRegression(max_iter=1000)

model_acc_scores = cross_val_score(log_reg, features, target, cv=5)

In [None]:
print("Model accuracies: {}".format(model_acc_scores))

    To answer the question, "how well is our model doing in generalizing to new data?", we observe the
    following using cross-validation:
    
        Reserving each part of the data in turn as the test set,
        - the models produce an accuracy of 100% in the best case
        - and an accuracy of 93% in the worst case

In [None]:
# AVG Performance
print("Expected AVG Performance")
print("========================")
print("AVG Score: {}".format(np.mean(model_acc_scores)))
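
    The average on its own hides how far the individual folds are from one another. As a small sketch, the
    standard deviation of the fold scores can be reported next to the mean; the names mean_score and
    std_score are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

features, target = load_iris(return_X_y=True)
log_reg = LogisticRegression(max_iter=1000)

# One accuracy score per fold
model_acc_scores = cross_val_score(log_reg, features, target, cv=5)

# The mean summarizes expected performance; the standard deviation shows the spread
mean_score = np.mean(model_acc_scores)
std_score = np.std(model_acc_scores)
print("AVG Score: {:.3f} (+/- {:.3f})".format(mean_score, std_score))
```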

    # MODEL IMPROVEMENT
    
    With model improvement, we are more concerned with finding the parameter settings that boost the model's
    performance. One such technique that this notebook covers is grid search.
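
    As a first impression of the idea, here is a minimal sketch using scikit-learn's GridSearchCV. It tries
    each candidate value, scores it with 5-fold cross-validation on the training set, and keeps the best one.
    Tuning the regularization strength C of LogisticRegression, and the particular candidate values, are
    assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

features, labels = load_iris(return_X_y=True)

# Hold out a test set that the search never sees
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# Candidate values for C (illustrative, not a recommendation)
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid_search.fit(X_train, y_train)

print("Best parameters: {}".format(grid_search.best_params_))
print("Best cross-validation accuracy: {:.3f}".format(grid_search.best_score_))
print("Test set accuracy: {:.3f}".format(grid_search.score(X_test, y_test)))
```

    The test set is kept out of the search so that the final score is still measured on data that played no
    part in choosing the parameters.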