## **BREAST CANCER CLASSIFICATION**
### **Via SuperDataScience Team**

*   Breast cancer is the most common cancer among women worldwide, accounting for 25% of all cancer cases; it affected 2.1 million people in 2015.
*   Early diagnosis significantly increases the chances of survival.
*   The key challenge in cancer detection is how to classify tumours as *malignant* or *benign*. Machine learning techniques can dramatically improve the accuracy of diagnosis.
*   Research indicates that most experienced physicians can diagnose cancer with 79% accuracy. 


**First stage**: A procedure that extracts some of the cells from the tumour.

A benign tumour is not spreading through the body, so the patient is relatively safe. A malignant tumour is cancerous,
which means we need to intervene and stop the cancer's growth.

---



Here is what we do on the machine learning side:
* We process all these images.
* We want to determine whether the tumour in each image is malignant or benign.

To do that, we extract features from the images. Features are characteristics of the image, such as:
* radius
* cells
* texture
* perimeter
* area
* smoothness

We feed all these features into our machine learning model.

**MAIN PART:**  We want to teach the machine how to classify the data and tell us whether a tumour is malignant or benign.

**IMPORTING DATA**

In [None]:
import pandas 

In [None]:
# import libraries 
import pandas as pd # Import Pandas for data manipulation using dataframes
import numpy as np # Import Numpy for data statistical analysis 
import matplotlib.pyplot as plt # Import matplotlib for data visualisation
import seaborn as sns # Statistical data visualization
# %matplotlib inline

In [None]:
# Import the breast cancer data from the scikit-learn library
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

In [None]:
cancer

In [None]:
# Which keys does the dataset object have?
cancer.keys()

In [None]:
# print them one by one
print(cancer['DESCR'])

In [None]:
print(cancer['target'])

In [None]:
print(cancer['target_names'])

In [None]:
print(cancer['feature_names'])

In [None]:
print(cancer['data'])

In [None]:
cancer['data'].shape

In [None]:
df_cancer = pd.DataFrame(np.c_[cancer['data'], cancer['target']], columns = np.append(cancer['feature_names'], ['target']))
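
The cell above packs two NumPy idioms into one line. As a minimal sketch on made-up toy arrays (the names `data`, `target` and `feature_names` here are stand-ins, not the real dataset), `np.c_` glues the label vector onto the feature matrix as an extra column, and `np.append` adds the matching column name:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for cancer['data'], cancer['target'] and cancer['feature_names']
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
target = np.array([0, 1, 0])
feature_names = np.array(['f1', 'f2'])

# np.c_ stacks the 1-D target as one more column next to the 2-D data
combined = np.c_[data, target]

# np.append returns a new array with 'target' added after the feature names
columns = np.append(feature_names, ['target'])

df = pd.DataFrame(combined, columns=columns)
print(df)
```

Note that the combined array has a single dtype, so the integer labels end up stored as floats (0.0 and 1.0).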

In [None]:
df_cancer.head(5)

In [None]:
df_cancer.tail(5)

**VISUALIZING THE DATA**

In [None]:
sns.pairplot(df_cancer,vars= ['mean radius','mean texture', 'mean area', 'mean perimeter', 'mean smoothness'])

The only problem is that this plot doesn't show the target class. It doesn't show which of these samples are malignant and which are benign.

In [None]:
sns.pairplot(df_cancer,hue = 'target', vars= ['mean radius','mean texture', 'mean area', 'mean perimeter', 'mean smoothness'])

The blue points are the malignant cases (target 0). The orange points are the benign cases (target 1).
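
Which number stands for which class comes from the label encoding in the scikit-learn dataset. A quick sketch to check that encoding and to count the samples per class (the `class_counts` dictionary is just an illustrative helper):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# The position of each name in target_names is the integer code used in target
class_counts = {}
for code, name in enumerate(cancer['target_names']):
    class_counts[str(name)] = int(np.sum(cancer['target'] == code))
    print(code, name, class_counts[str(name)])
```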

In [None]:
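# Count the samples in each class; naming the column explicitly works across seaborn versions
sns.countplot(x='target', data=df_cancer)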

Let's take one of these pairwise plots and look at it more closely.

In [None]:
sns.scatterplot(x='mean area', y='mean smoothness', hue='target', data=df_cancer)

Let's check the correlation between the variables.

In [None]:
plt.figure(figsize=(20,10))
sns.heatmap(df_cancer.corr(), annot=True)
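
With 31 columns, the annotated heatmap is dense. As a complementary sketch (not part of the original walkthrough), we can pull out just the `target` column of the correlation matrix and rank the features by the absolute strength of their correlation with the label. The frame `df` below is rebuilt from scratch so the snippet runs on its own:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df['target'] = cancer['target']

# Correlation of every feature with the label, excluding the label itself
corr_with_target = df.corr()['target'].drop('target')

# Rank by absolute value so strong negative correlations are not overlooked
ranked = corr_with_target.abs().sort_values(ascending=False)
print(ranked.head(10))
```

A high correlation with the label only hints that a feature may be useful; it does not account for redundancy between features.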

**MODEL TRAINING (FINDING A PROBLEM SOLUTION)**

In [None]:
# Let's drop the target label column
x = df_cancer.drop(['target'],axis=1)

In [None]:
x

In [None]:
y = df_cancer['target']
y

Recall how we train the model: we use a subset of our data for training, and once the model is trained, we test it on the testing dataset, which is data that the model has never seen before.
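
As a small self-contained sketch of that idea, here is the same 80/20 split applied directly to the arrays returned by `load_breast_cancer(return_X_y=True)`, just to show how many samples land on each side (the upper-case names `X_train`, `X_test` are used only in this snippet):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=5
)

print(X_train.shape, X_test.shape)
```

If the class balance matters, `train_test_split` also accepts `stratify=y` to keep the malignant/benign proportions similar in both subsets; the original notebook does not use it.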

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state= 5)

In [None]:
x_train

In [None]:
x_train.shape

In [None]:
x_test

In [None]:
x_test.shape

In [None]:
y_train

In [None]:
y_train.shape

In [None]:
y_test

In [None]:
y_test.shape

In [None]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
svc_model = SVC()

In [None]:
svc_model.fit(x_train, y_train)

**EVALUATING THE MODEL**

We evaluate the model on the testing data, which it has never seen before.

In [None]:
y_predict = svc_model.predict(x_test)

In [None]:
y_predict

We're going to plot a confusion matrix. It compares the true values against the predicted values.
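
To make the layout concrete, here is a toy sketch with five made-up labels (0 = malignant, 1 = benign, as in the dataset). In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

cm_toy = confusion_matrix(y_true, y_pred)
print(cm_toy)

# Flatten row by row: first the true-malignant row, then the true-benign row
(malignant_correct, malignant_as_benign,
 benign_as_malignant, benign_correct) = cm_toy.ravel()
print(malignant_as_benign, "malignant sample(s) were predicted as benign")
```

In a medical setting the `malignant_as_benign` cell is usually the one to watch most closely, because it counts cancers the model missed.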

In [None]:
cm = confusion_matrix(y_test, y_predict)

In [None]:
sns.heatmap(cm, annot=True)

In [None]:
print(classification_report(y_test, y_predict))
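
The report lists precision and recall per class plus the overall accuracy. As a hedged illustration on made-up labels (not the notebook's results), the same numbers can be computed one at a time, here for the malignant class (label 0):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels for illustration only: 0 = malignant, 1 = benign
y_true_toy = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred_toy = [0, 0, 1, 1, 1, 1, 1, 0]

# Share of all predictions that are correct
accuracy = accuracy_score(y_true_toy, y_pred_toy)

# Of the samples predicted malignant, how many really are malignant
precision_malignant = precision_score(y_true_toy, y_pred_toy, pos_label=0)

# Of the truly malignant samples, how many the model caught
recall_malignant = recall_score(y_true_toy, y_pred_toy, pos_label=0)

print(accuracy, precision_malignant, recall_malignant)
```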

**IMPROVING THE MODEL**

In [None]:
min_train = x_train.min()
min_train

In [None]:
range_train = (x_train - min_train).max()
range_train

In [None]:
x_train_scaled = (x_train - min_train)/range_train
x_train_scaled
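
The cells above implement min-max normalization by hand: each feature becomes (x - min) / (max - min), so the training values fall between 0 and 1. As a sketch on a made-up toy frame, scikit-learn's `MinMaxScaler` gives the same result, and it remembers the training minimum and range so the identical transformation can be applied to unseen data:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy values for illustration only
train = pd.DataFrame({'mean area': [200.0, 600.0, 1000.0],
                      'mean smoothness': [0.05, 0.10, 0.15]})
test = pd.DataFrame({'mean area': [400.0],
                     'mean smoothness': [0.20]})

# Manual version, mirroring the notebook cells
min_train = train.min()
range_train = (train - min_train).max()
manual = (train - min_train) / range_train

# Library version: fit on the training data, then reuse for the test data
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(np.allclose(manual.to_numpy(), train_scaled))
print(test_scaled)
```

Test values can fall outside 0 to 1 when they lie outside the range seen during training; that is expected with this approach.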

In [None]:
sns.scatterplot(x = x_train['mean area'], y= x_train['mean smoothness'], hue= y_train)

In [None]:
sns.scatterplot(x= x_train_scaled['mean area'], y= x_train_scaled['mean smoothness'], hue= y_train)

In [None]:
# Scale the test set with the training-set minimum and range,
# so both sets go through exactly the same transformation
x_test_scaled = (x_test - min_train) / range_train

In [None]:
svc_model.fit(x_train_scaled, y_train)

In [None]:
y_predict = svc_model.predict(x_test_scaled)

In [None]:
cm = confusion_matrix(y_test, y_predict)

In [None]:
sns.heatmap(cm, annot=True, fmt = 'd')

In [None]:
print(classification_report(y_test, y_predict))

**IMPROVING THE MODEL - PART 2**

In [None]:
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001], 'kernel': ['rbf']} 

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
grid = GridSearchCV(SVC(), param_grid, refit= True, verbose= 4)
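
To get a feel for how much work the grid search does, here is a small sketch that enumerates the candidate settings with `ParameterGrid`. The fold count of 5 is an assumption based on the default `cv` in recent scikit-learn versions; the notebook does not set it explicitly:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [1, 0.1, 0.01, 0.001],
              'kernel': ['rbf']}

# Every combination of C, gamma and kernel that the search will try
candidates = list(ParameterGrid(param_grid))

# Assumption: 5-fold cross-validation (default in recent scikit-learn)
n_folds = 5
total_fits = len(candidates) * n_folds

print(len(candidates), "candidates,", total_fits, "model fits")
```

With `refit=True`, one more fit on the whole training set follows, using the best combination found.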

In [None]:
grid.fit(x_train_scaled, y_train)

In [None]:
grid.best_params_

In [None]:
grid.best_estimator_

In [None]:
grid_prediction = grid.predict(x_test_scaled)

In [None]:
cm = confusion_matrix(y_test, grid_prediction)

In [None]:
sns.heatmap(cm, annot=True)

In [None]:
print(classification_report(y_test,grid_prediction ))

**CONCLUSION**
* A machine learning technique (SVM) was able to classify tumours as malignant or benign with 97% accuracy.
* The technique can rapidly evaluate breast masses and classify them in an automated fashion.
* Early breast cancer detection can dramatically save lives, especially in the developing world.
* The technique can be further improved by combining Computer Vision/ ML techniques to directly classify cancer using tissue images.