# Breast Cancer Classification
scikit-learn ships a breast cancer dataset whose target has two classes:
<br>
0 - Malignant
<br>
1 - Benign
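
The label encoding can be checked directly on the loaded dataset. The following is a minimal sketch that relies on the `target_names` attribute of the object returned by `load_breast_cancer`, whose entries are ordered by label value.

```python
from sklearn import datasets

data = datasets.load_breast_cancer()

# target_names is ordered by label value: index 0 names label 0, index 1 names label 1
for label, name in enumerate(data.target_names):
    print("{} - {}".format(label, name))
```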

## 1. Import Packages

In [1]:
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegressionCV
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

## 2. Load Breast Cancer Dataset

In this section:
1. Load breast cancer data into <b>data</b> variable
2. Read Data and Labels into <b>X</b> and <b>y</b> variables
3. Split data (<b>X</b>, <b>y</b>) into training (<b>X_train</b>, <b>y_train</b>) and test (<b>X_test</b>, <b>y_test</b>) sets

### 2.1 Load Data

In [2]:
data = datasets.load_breast_cancer()

### 2.2 Read Data and Labels

In [3]:
X = data.data
y = data.target

In [4]:
print("Data has {} records and {} features".format(X.shape[0], X.shape[1]))

Data has 569 records and 30 features
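
Beyond the shape, it can help to look at what the columns are and how the records are distributed across the two classes. This sketch reloads the dataset so it runs on its own; the variable `counts` is introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets

data = datasets.load_breast_cancer()
X, y = data.data, data.target

# Names of the first few feature columns of X
print(list(data.feature_names[:5]))

# Number of records per class label (index = label value)
counts = np.bincount(y)
for label, count in enumerate(counts):
    print("{} ({}): {} records".format(label, data.target_names[label], count))
```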


### 2.3 Split Data into Training and Test sets
<b>test_size</b> is set to 0.4 (40%), and <b>stratify=y</b> keeps the class proportions the same in both sets

In [12]:
np.random.seed(0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.4)

In [13]:
print("Training Set has {} records".format(X_train.shape[0]))
print("Test Set has {} records".format(X_test.shape[0]))

Training Set has 341 records
Test Set has 228 records
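
To see what stratification does, one can compare the share of label 1 in the full data, the training set, and the test set. This is a sketch under one assumption: it passes `random_state=0` to `train_test_split` for reproducibility, whereas the cell above seeds NumPy's global generator instead.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_breast_cancer(return_X_y=True)

# random_state=0 is an assumption made for this sketch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.4, random_state=0
)

# Labels are 0/1, so the mean is the fraction of records with label 1
overall_share = y.mean()
train_share = y_train.mean()
test_share = y_test.mean()

print("Share of label 1 overall:  {:.3f}".format(overall_share))
print("Share of label 1 in train: {:.3f}".format(train_share))
print("Share of label 1 in test:  {:.3f}".format(test_share))
```

With `stratify=y`, the three shares should be nearly identical; without it, they can drift apart by chance.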


## 3. Classifier

In this section:
1. Create classifier object, <b>clf</b>
2. Train classification model
3. Predict target value for both training and test sets
4. Compare predicted target with actual target for training and test sets

### 3.1 Create Classifier

In [14]:
clf = LogisticRegressionCV()

### 3.2 Train Classification Model

In [15]:
clf.fit(X_train, y_train)

LogisticRegressionCV(Cs=10, class_weight=None, cv=None, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=1, penalty='l2', random_state=None,
           refit=True, scoring=None, solver='lbfgs', tol=0.0001, verbose=0)
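
`LogisticRegressionCV` chooses the regularization strength `C` by cross-validation over a grid of candidates (`Cs=10` in the output above). With unscaled features and `max_iter=100`, the `lbfgs` solver may stop before converging, so a common variation is to standardize the features first. The sketch below is one such variation, not the configuration used in this notebook: the pipeline, `cv=5`, `max_iter=1000`, and `random_state=0` are all assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.4, random_state=0
)

# Assumed variation: scale features, then cross-validate over 10 values of C
scaled_clf = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, max_iter=1000),
)
scaled_clf.fit(X_train, y_train)

# The last pipeline step is the fitted LogisticRegressionCV
cv_model = scaled_clf[-1]
best_C = float(np.ravel(cv_model.C_)[0])
test_acc = scaled_clf.score(X_test, y_test)

print("Candidate C values tried:", len(cv_model.Cs_))
print("Selected C:", best_C)
print("Test accuracy: {:.4f}".format(test_acc))
```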

### 3.3 Predict Target
Apply the trained model to both the training and test sets

In [16]:
pred_train = clf.predict(X_train)
pred_test = clf.predict(X_test)
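
`predict` returns hard class labels. When the confidence behind each label matters, `predict_proba` returns one probability per class, with columns in the order given by `classes_`. The sketch below is self-contained and, as an assumption, fits a scaled pipeline with `random_state=0` rather than reusing the `clf` object from the notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.4, random_state=0
)

# Assumed setup for this sketch: standardized features, more solver iterations
model = make_pipeline(StandardScaler(), LogisticRegressionCV(max_iter=1000))
model.fit(X_train, y_train)

sample = X_test[:5]
pred = model.predict(sample)          # hard labels (0 or 1)
proba = model.predict_proba(sample)   # one column per entry of model.classes_

print("Class order:", model.classes_)
for label, p in zip(pred, proba):
    print("predicted {} with probabilities {}".format(label, np.round(p, 3)))
```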

### 3.4 Training and Test Accuracy
1. Compare predicted target value with actual value for both training and test sets
2. Print training and test accuracies

In [18]:
train_accuracy = round(100*accuracy_score(y_train, pred_train),2)
test_accuracy = round(100*accuracy_score(y_test, pred_test),2)
print("Training Accuracy: {}%".format(train_accuracy))
print("Test Accuracy: {}%".format(test_accuracy))

Training Accuracy: 97.95%
Test Accuracy: 95.18%
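
Accuracy is simply the fraction of predictions that match the actual labels. The toy example below uses made-up labels (not the notebook's data) to show that `accuracy_score` agrees with a manual computation, and adds a confusion matrix, which breaks the errors down by class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Fraction of positions where prediction and actual label agree (4 of 5)
manual = np.mean(y_true == y_pred)
library = accuracy_score(y_true, y_pred)

# Rows are actual labels, columns are predicted labels
cm = confusion_matrix(y_true, y_pred)

print("Manual accuracy: ", manual)
print("accuracy_score:  ", library)
print(cm)
```

Here one record with actual label 1 was predicted as 0, which appears as the off-diagonal entry in the second row of the matrix.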
