# Model Evaluation
After training a machine learning model, we need to verify its quality. We measure the trained model's performance on new, previously unseen data. Model evaluation helps answer questions such as:
- How well does our model work?
- Is the model accurate enough to use in production?
- Will the performance of our model improve if we feed it a larger dataset?
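
One way to probe the last question is to train the same model on increasing amounts of data and watch the validation score. The following is a minimal sketch, assuming scikit-learn's `learning_curve` helper, the iris dataset, and Logistic Regression; the training fractions (30%, 60%, 100%) are arbitrary choices for illustration, and `cv=5` refers to the K-fold cross-validation covered later in this chapter.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_iris(return_X_y=True)

# Train on 30%, 60% and 100% of the available training data.
# shuffle=True matters here because the iris samples are ordered by class.
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=200),
    X,
    y,
    train_sizes=[0.3, 0.6, 1.0],
    cv=5,
    shuffle=True,
    random_state=0,
)

# test_scores has one row per training size and one column per CV fold
for n_samples, fold_scores in zip(train_sizes, test_scores):
    print(f"{n_samples} training samples: mean validation accuracy {fold_scores.mean():.3f}")
```

If the validation accuracy is still rising at the largest training size, more data may help; if it has flattened, additional data is less likely to improve this model.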


## Train-Test Split 


In a train-test split, we split the whole dataset into a training set and a test set (shown in Figure 5.1). The training set has known outputs (labels), and the model is trained on it. We then use the test set to evaluate the trained model.
![Figure_5_1.png](images/Figure_5_1.png)

In the example below, we use the popular iris dataset and split it into training and test data.

In [6]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris=load_iris() 
X_train,X_test,y_train,y_test=train_test_split(iris.data,iris.target,test_size=0.1) # test size is 10 % of whole dataset


# Now let us check the shape of X_train,X_test,y_train,y_test and the whole dataset
print(f"The shape of X_train is {X_train.shape}")
print(f"The shape of X_test is {X_test.shape}")
print(f"The shape of y_train is {y_train.shape}")
print(f"The shape of y_test is {y_test.shape}")
print(f"The shape of whole data set is {iris.data.shape}")

The shape of X_train is (135, 4)
The shape of X_test is (15, 4)
The shape of y_train is (135,)
The shape of y_test is (15,)
The shape of whole data set is (150, 4)


As the example above shows, the whole dataset is divided into a training set and a test set: 90% of the samples (135) are training data, and the remaining 10% (15) are test data.
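
The split is only useful once a model is trained on one part and scored on the other. The following is a minimal sketch of that step, assuming Logistic Regression as the model; `random_state=42` and `stratify` are additions made here so that the split is reproducible and keeps the class proportions, and they are not part of the example above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()

# Hold out 10% of the data; stratify keeps the three classes balanced in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1, random_state=42, stratify=iris.target
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)  # the model only ever sees the training set

# score() returns the mean accuracy on the data it is given
test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on the held-out test set: {test_accuracy:.3f}")
```

Because the test set contains only 15 samples, this single accuracy value can change noticeably from one split to another.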

## K-Fold Cross Validation (K-Fold CV)

Now that we know how to split data into training and test sets, K-Fold CV applies the same operation iteratively. The dataset is divided into K parts (folds) of roughly equal size; the samples may optionally be shuffled before they are assigned to folds. In each iteration, one fold serves as the test set and the remaining K-1 folds form the training set, so every fold is used for testing exactly once. Figure 5.2 shows this for K = 5: the whole dataset is split into training and test sets in 5 different combinations, so the model is trained and tested 5 times. K is a user-specified number, usually 5 or 10.
![Figure_5_2.png](images/Figure_5_2.png)
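
To make the fold structure concrete, here is a minimal sketch that uses scikit-learn's `KFold` on a toy array of 10 samples (the toy array is an assumption for illustration, not part of the iris example). It prints which indices land in the training and test sets in each iteration and counts how often each sample is used for testing.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(10, 1)  # 10 toy samples with one feature each

kf = KFold(n_splits=5)  # K = 5, no shuffling
test_counts = np.zeros(len(X_toy), dtype=int)

for fold, (train_idx, test_idx) in enumerate(kf.split(X_toy), start=1):
    test_counts[test_idx] += 1  # record that these samples were tested in this fold
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")

print(f"Times each sample was in a test set: {test_counts}")
```

Every sample appears in a test set exactly once and in a training set K-1 times.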


In [6]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris=load_iris()
X=iris.data  # load features to X
y=iris.target # load target to y 
logreg= LogisticRegression(max_iter=200) # instantiate the class LogisticRegression
scores=cross_val_score(logreg,X,y,cv=5) # Here we choose K =5 (cv=5)
print(f"cross-val score is {scores}")
print("------------------------------------------------------------------")
print(f"The average score is {scores.mean()}")

cross-val score is [0.96666667 1.         0.93333333 0.96666667 1.        ]
------------------------------------------------------------------
The average score is 0.9733333333333334


Here we get a different accuracy in each of the 5 iterations. The average accuracy is 97.33%, so we estimate that the model (Logistic Regression) is about 97.33% accurate on this dataset (iris).
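
To see what `cross_val_score` does internally, the loop can be written out by hand. The sketch below assumes the scikit-learn default for classifiers, where an integer `cv` is turned into a `StratifiedKFold` split without shuffling; it also reports the standard deviation across folds, which indicates how much the estimate varies.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
logreg = LogisticRegression(max_iter=200)

manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(logreg)  # fresh, unfitted copy for every fold
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

library_scores = cross_val_score(logreg, X, y, cv=5)

print(f"manual loop:     {manual_scores}")
print(f"cross_val_score: {library_scores}")
print(f"mean = {manual_scores.mean():.4f}, std = {manual_scores.std():.4f}")
```

Reporting the mean together with the standard deviation gives a more honest picture than a single accuracy number.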

# Exercise For Students

In this exercise, we apply K-fold cross-validation to both the linear and logistic regression examples from Chapter 2, without a separate train-test split.

## K-fold CV Linear Regression

In [None]:
from sklearn.datasets import make_regression # please refer: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
from sklearn.linear_model import LinearRegression

# TO DO by students:
# Generate a dataset with make_regression (use random_state=100)
# Instantiate LinearRegression
# Use cross_val_score with k=10 (cv=10)
# Compare the results
# START YOUR CODE BELOW IN THIS CELL

## K-Fold CV Logistic Regression

In [None]:
from sklearn.datasets import load_breast_cancer  # like iris in example we are going to use breast cancer data
                                                 # Refer sklearn documentation https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression



# TO DO by students:
# Call load_breast_cancer and get the feature and target values
# Instantiate LogisticRegression
# Use cross_val_score with k=5 (cv=5)
# Compare the results
# START YOUR CODE BELOW IN THIS CELL

