# **Part 4: Evaluating the Model**

## Objectives

Machine learning models are parameterized so that their behavior can be tuned for a given problem. A model can have many parameters, and finding the best combination of them can be treated as a search problem. In this part, we aim to tune the parameters of the SVM classification model using scikit-learn.


## Inputs

* Write here which data or information you need to run the notebook 

## Outputs

* Write here which files, code or artefacts you generate by the end of the notebook 

## Additional Comments

* In case you have any additional comments that don't fit in the previous bullets, please state them here. 


---

# Change working directory

* We assume the notebooks are stored in a subfolder, so when running a notebook in the editor you will need to change the working directory.

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with `os.getcwd()`

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/Breast-Cancer-Prediction/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* `os.path.dirname()` gets the parent directory
* `os.chdir()` sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/workspace/Breast-Cancer-Prediction'

# Load Libraries and Data

In [4]:
%matplotlib inline
import matplotlib.pyplot as plt

#Load libraries for data processing
import pandas as pd #data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np
from scipy.stats import norm

## Supervised learning.
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
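from sklearn.metrics import confusion_matrix
from sklearn.metrics import mean_squared_error
from sklearn import metrics, preprocessing
from sklearn.metrics import classification_report
from sklearn.feature_selection import SelectKBest, f_regression

# Classifiers used later in this notebook
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier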

# visualization
import seaborn as sns 
plt.style.use('fivethirtyeight')
sns.set_style("white")

plt.rcParams['figure.figsize'] = (8,4) 


**I. Confusion Matrix**

In [None]:
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred))
rmse

In [None]:
matrix = confusion_matrix(Y_test, Y_pred)
# Rows of the matrix are the actual classes, columns are the predicted classes
sns.heatmap(matrix, annot=True, fmt='d')

plt.title("Confusion Matrix")
plt.xlabel('Predicted')
plt.ylabel('Actual')

The matrix above summarizes the model's performance on the test set, for which the true labels are known. Taking class 1 (Malignant) as the positive class, the following conclusions can be drawn:

1. True negatives (TNs = 65): Benign cases (class 0) correctly classified as Benign.
2. False negatives (FNs = 4): Malignant cases (class 1) incorrectly predicted as Benign.
3. False positives (FPs = 2): Benign cases incorrectly predicted as Malignant.
4. True positives (TPs = 43): Malignant cases correctly classified as Malignant.
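The four counts can be turned into the usual summary metrics with a little arithmetic. The sketch below takes the values quoted above (TN = 65, FN = 4, FP = 2, TP = 43) and assumes class 1 (Malignant) is the positive class; the variable names are illustrative only.

```python
# Counts quoted in the text above; class 1 (Malignant) is assumed to be positive.
tn, fn, fp, tp = 65, 4, 2, 43

total = tn + fn + fp + tp  # size of the test set

accuracy = (tp + tn) / total          # share of all predictions that are correct
precision = tp / (tp + fp)            # share of predicted Malignant that are Malignant
recall = tp / (tp + fn)               # share of actual Malignant that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.947 precision=0.956 recall=0.915 f1=0.935
```

For a binary problem, `confusion_matrix(Y_test, Y_pred).ravel()` returns the counts in the order `tn, fp, fn, tp`, so the same arithmetic can be applied directly to the matrix computed in the notebook.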

**II. Classification Report**

In [None]:
print(classification_report(Y_test, Y_pred))
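The report is easier to read when the classes are named, and it can also be returned as a dictionary for further processing. The sketch below uses a small set of made-up labels (not the notebook's data) and assumes the encoding 0 = Benign, 1 = Malignant used in the text.

```python
from sklearn.metrics import classification_report

# Hypothetical labels for illustration only: 0 = Benign, 1 = Malignant.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# target_names replaces the numeric class labels in the printed report.
print(classification_report(y_true, y_pred, target_names=["Benign", "Malignant"]))

# output_dict=True returns the same numbers as a nested dictionary.
report = classification_report(
    y_true, y_pred, target_names=["Benign", "Malignant"], output_dict=True
)
malignant_recall = report["Malignant"]["recall"]
print("Malignant recall:", malignant_recall)
```

In this screening setting, the recall of the Malignant class is often the number to watch, since a false negative means a malignant case was missed.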

### II. Random Forest Classifier 

In [None]:
assert X_train.shape[0] == Y_train.shape[0]

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)

RFC_model = RandomForestClassifier()
RFC_model.fit(X_train, Y_train)

In [None]:
Y_pred = RFC_model.predict(X_test)

In [None]:
metrics.accuracy_score(Y_test,Y_pred)

In [None]:
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred))
print(rmse)

At this stage, the accuracy of the model was recorded without any hyperparameter tuning.

In [None]:
matrix = confusion_matrix(Y_test, Y_pred)
# Rows of the matrix are the actual classes, columns are the predicted classes
sns.heatmap(matrix, annot=True, fmt='d')

plt.title("Confusion Matrix")
plt.xlabel('Predicted')
plt.ylabel('Actual')

Taking class 1 (Malignant) as the positive class, the following conclusions can be drawn from the matrix above:

1. True negatives (TNs = 65): Benign cases (class 0) correctly classified as Benign.
2. False negatives (FNs = 4): Malignant cases (class 1) incorrectly predicted as Benign.
3. False positives (FPs = 3): Benign cases incorrectly predicted as Malignant.
4. True positives (TPs = 42): Malignant cases correctly classified as Malignant.

#### Displaying The Classification Report

In [None]:
# True labels come first, predictions second
print(classification_report(Y_test, Y_pred))
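scikit-learn's metric functions, including `classification_report`, take the true labels as the first argument and the predictions as the second. The order matters: swapping the two exchanges precision and recall. The sketch below demonstrates this on made-up labels (not the notebook's data).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for illustration only: 0 = Benign, 1 = Malignant.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]

# Correct order: (y_true, y_pred).
precision_ok = precision_score(y_true, y_pred)
recall_ok = recall_score(y_true, y_pred)

# Swapped order: what is reported as "precision" is actually the recall.
precision_swapped = precision_score(y_pred, y_true)

print("precision:", precision_ok)
print("recall:", recall_ok)
print("precision with swapped arguments:", precision_swapped)
```

Accuracy is symmetric and is unaffected by the argument order, which is why a swapped call can go unnoticed when only accuracy is checked.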

### III. K Nearest Neighbors Classifier

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)

KNNI = KNeighborsClassifier(n_neighbors = 3).fit(X_train, Y_train)
Y_pred = KNNI.predict(X_test)

metrics.accuracy_score(Y_test,Y_pred)

Initially, all of these models were built without any hyperparameter tuning or feature selection.
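As a possible next step, the parameter search mentioned in the objectives could be sketched with `GridSearchCV` as follows. This is a minimal illustration, not the notebook's actual tuning code: it assumes scikit-learn's bundled breast cancer dataset as a stand-in for `X` and `Y`, and the parameter grid is a hypothetical choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled dataset stands in for the notebook's X and Y.
# Note that it encodes 0 = malignant, 1 = benign, unlike the text above.
X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=2, stratify=Y
)

# Scaling sits inside the pipeline so each CV fold is scaled
# using only its own training portion.
pipe = make_pipeline(StandardScaler(), SVC())

# Hypothetical grid; make_pipeline names the SVC step "svc".
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", 0.01],
}

# Exhaustive search over the grid with 5-fold cross-validation.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, Y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(X_test, Y_test), 3))
```

The test set is touched only once, after the search has finished, so the reported test accuracy is not biased by the parameter selection.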

---