## Breast cancer prediction using scikit-learn

In [33]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

### Data exploration

Scikit-learn accepts lists, NumPy arrays, SciPy sparse matrices, and pandas DataFrames, so converting the dataset to a DataFrame is not required to train this model. A DataFrame does, however, make tasks such as data munging easier, so let's practice building a classifier from a pandas DataFrame.
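To illustrate the point about input types, here is a minimal sketch that fits the same one-neighbor classifier once on the raw NumPy array and once on a DataFrame built from it. The names `knn_array`, `knn_frame`, `features_df`, and `same` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# Fit once on the raw NumPy feature array.
knn_array = KNeighborsClassifier(n_neighbors=1).fit(cancer.data, cancer.target)

# Fit again on the same values wrapped in a DataFrame.
features_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
knn_frame = KNeighborsClassifier(n_neighbors=1).fit(features_df, cancer.target)

# Both models were given identical numbers, so their predictions should agree.
same = np.array_equal(knn_array.predict(cancer.data),
                      knn_frame.predict(features_df))
print(same)
```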



First, convert the scikit-learn dataset `cancer` (a `Bunch` object) to a DataFrame.

*The resulting DataFrame should have shape `(569, 31)`, with*

*columns =*

    ['mean radius', 'mean texture', 'mean perimeter', 'mean area',
    'mean smoothness', 'mean compactness', 'mean concavity',
    'mean concave points', 'mean symmetry', 'mean fractal dimension',
    'radius error', 'texture error', 'perimeter error', 'area error',
    'smoothness error', 'compactness error', 'concavity error',
    'concave points error', 'symmetry error', 'fractal dimension error',
    'worst radius', 'worst texture', 'worst perimeter', 'worst area',
    'worst smoothness', 'worst compactness', 'worst concavity',
    'worst concave points', 'worst symmetry', 'worst fractal dimension',
    'target']

*and index =*

    RangeIndex(start=0, stop=569, step=1)

### Adding a target column for labelling

In [25]:
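# Append the target vector as an extra column next to the 30 feature columns.
# np.c_ produces a single float array, so the labels are stored as 0.0 / 1.0.
newData = np.c_[cancer.data, cancer.target]
newColumns = np.append(cancer.feature_names, ["target"])
df_cancer = pd.DataFrame(newData, columns=newColumns)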
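As a quick sanity check against the shape, columns, and index described above, the following sketch rebuilds the frame under a separate, illustrative name (`df_check`) and asserts the expected properties.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Rebuild the DataFrame the same way as the cell above.
df_check = pd.DataFrame(
    np.c_[cancer.data, cancer.target],
    columns=np.append(cancer.feature_names, ["target"]),
)

# 569 samples, 30 features plus the appended target column.
assert df_check.shape == (569, 31)
assert df_check.columns[-1] == "target"
assert df_check.index.equals(pd.RangeIndex(start=0, stop=569, step=1))

# The target column is float because np.c_ upcasts the integer labels.
print(df_check["target"].dtype)
```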

### Class distribution

How many instances are malignant (encoded 0) and how many are benign (encoded 1)?

In [28]:
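# Sort by label (0.0, then 1.0) so the names line up with the encoding,
# rather than relying on malignant happening to be the smaller class.
counts = df_cancer.target.value_counts().sort_index()
counts.index = ["malignant", "benign"]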
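The same tally can be cross-checked directly from the dataset's integer labels, without going through the DataFrame. This is a small sketch; `per_class` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# np.bincount counts how often each integer label occurs: label 0, then label 1.
label_counts = np.bincount(cancer.target)

# Pair each count with its class name from the dataset itself.
per_class = {str(name): int(n)
             for name, n in zip(cancer.target_names, label_counts)}
print(per_class)  # {'malignant': 212, 'benign': 357}
```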

### Training and test data split 

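Split the DataFrame into `X` (the features) and `y` (the labels), then split both into training and test sets with `train_test_split`. With no `test_size` given, scikit-learn holds out 25% of the rows for testing; `random_state=0` makes the split reproducible.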

In [31]:
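# All columns except the last one ("target") are features.
X = df_cancer[df_cancer.columns[:-1]]
y = df_cancer.target
# Default split: 75% training, 25% test.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)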
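To see what the default split produces, the sketch below repeats it on the raw arrays and prints the resulting shapes. The variable names (`X_all`, `X_tr`, and so on) are illustrative and chosen so they do not overwrite the notebook's own variables.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# return_X_y=True gives the feature matrix and label vector directly.
X_all, y_all = load_breast_cancer(return_X_y=True)

# Same call as in the cell above: default test size, fixed random_state.
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, random_state=0)

print(X_tr.shape, X_te.shape)  # (426, 30) (143, 30)
```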

### Classification

Fit a k-nearest neighbors (knn) classifier on `X_train` and `y_train` with `KNeighborsClassifier`, using a single nearest neighbor (`n_neighbors=1`).

In [34]:
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=1, p=2,
           weights='uniform')

### Prediction

Using the knn classifier, predict the class labels for the test set X_test.
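One way this step might look is sketched below. The block re-creates the earlier cells so that it runs on its own; `predictions` is an illustrative name, and the labels come back as floats because the `target` column was stored as float.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Re-create the DataFrame, the split, and the fitted model from the cells above.
cancer = load_breast_cancer()
df_cancer = pd.DataFrame(
    np.c_[cancer.data, cancer.target],
    columns=np.append(cancer.feature_names, ["target"]),
)
X = df_cancer[df_cancer.columns[:-1]]
y = df_cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# predict returns one class label per row of X_test.
predictions = model.predict(X_test)
print(predictions.shape)
print(predictions[:10])
```

Each entry of `predictions` is either 0.0 (malignant) or 1.0 (benign) and can be compared element-wise with `y_test`.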