Make your own Machine Learning Tutorial IPython notebook page outlining an implementation of machine learning.

It could investigate setting up an ML technology stack:

1. Azure: azure.microsoft.com/Services/MachineLearning

2. H2O ML: https://www.h2o.ai/

3. TensorFlow


Implementing a demonstration page in IPython for a Deep Learning tutorial:

1. http://deeplearning.net/tutorial/

2. https://www.datacamp.com/community/tutorials/deep-learning-python

3. https://pythonspot.com/en/machine-learning/

Deep learning refers to neural networks with multiple hidden layers that can learn increasingly abstract representations of the input data. Keras is a simple, modular library for deep learning in Python that is especially suitable for beginners. <br/>
I used Keras to implement a neural network, with the TensorFlow machine learning library as the backend.

In [25]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

##### About Data
The Wisconsin Breast Cancer dataset has 699 observations obtained by fine-needle aspiration of tissue from a mass under the skin. It has 11 variables: an ID, nine predictor variables (cytological characteristics used to classify a mass as benign or malignant), and a class variable (2 for benign, 4 for malignant). 458 of the samples are benign and 241 are malignant. <br/>

There are 16 samples with missing data. The data file has no column names; they are listed in a separate file. <br/>
The 'class' variable is the outcome (2=benign, 4=malignant).

In [26]:
data = pd.read_csv('data/breast-cancer-wisconsin.data.txt', sep=",", header=None)
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,1000025,5,1,1,1,2,1,3,1,1,2
1,1002945,5,4,4,5,7,10,3,2,1,2
2,1015425,3,1,1,1,2,2,3,1,1,2
3,1016277,6,8,8,1,3,4,3,7,1,2
4,1017023,4,1,1,3,2,1,3,1,1,2


###### Adding column headers to data

In [27]:
data.columns = ["ID", "clumpThickness", "sizeUniformity",
"shapeUniformity", "maginalAdhesion",
"singleEpithelialCellSize", "bareNuclei",
"blandChromatin", "normalNucleoli", "mitosis", "class"]
data.head(10)

Unnamed: 0,ID,clumpThickness,sizeUniformity,shapeUniformity,maginalAdhesion,singleEpithelialCellSize,bareNuclei,blandChromatin,normalNucleoli,mitosis,class
0,1000025,5,1,1,1,2,1,3,1,1,2
1,1002945,5,4,4,5,7,10,3,2,1,2
2,1015425,3,1,1,1,2,2,3,1,1,2
3,1016277,6,8,8,1,3,4,3,7,1,2
4,1017023,4,1,1,3,2,1,3,1,1,2
5,1017122,8,10,10,8,7,10,9,7,1,4
6,1018099,1,1,1,1,2,10,3,1,1,2
7,1018561,2,1,2,1,2,1,3,1,1,2
8,1033078,2,1,1,1,2,1,1,1,5,2
9,1033078,4,2,1,1,2,1,2,1,1,2


In [28]:
data.shape

(699, 11)

###### Counting the different classes of 'class' variable

In [29]:
data['class'].value_counts()

2    458
4    241
Name: class, dtype: int64

###### Converting classes '2','4' to binary '0'(benign) and '1'(malignant) (left commented out; LabelEncoder does this mapping later)

In [30]:
#data['class'].replace(2, 0, inplace=True)
#data['class'].replace(4, 1, inplace=True)
#data['class'].value_counts()

In [31]:
data.dtypes

ID                           int64
clumpThickness               int64
sizeUniformity               int64
shapeUniformity              int64
maginalAdhesion              int64
singleEpithelialCellSize     int64
bareNuclei                  object
blandChromatin               int64
normalNucleoli               int64
mitosis                      int64
class                        int64
dtype: object

######  Converting 'class' variable from int to string

In [32]:
data['class'] = data['class'].astype(str)

In [33]:
#Converting 'object' to numeric data type
data['bareNuclei'] = pd.to_numeric(data['bareNuclei'], errors='coerce')
#data['bareNuclei'] = data['bareNuclei'].astype(str).astype(int)
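
The effect of errors='coerce' is easiest to see on a toy column. This is a minimal sketch, assuming the missing entries are stored as a placeholder string such as '?'; the sample values are made up for illustration and are not taken from the data file.

```python
import pandas as pd

# Toy column mixing numeric strings with a '?' placeholder (illustrative values)
raw = pd.Series(["1", "10", "?", "4"])

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)           # float dtype, because NaN is a float
print(converted.isnull().sum())  # number of values that could not be parsed
```

This is why 'bareNuclei' ends up as float64 rather than int64 after the conversion.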

In [34]:
data.dtypes

ID                            int64
clumpThickness                int64
sizeUniformity                int64
shapeUniformity               int64
maginalAdhesion               int64
singleEpithelialCellSize      int64
bareNuclei                  float64
blandChromatin                int64
normalNucleoli                int64
mitosis                       int64
class                        object
dtype: object

In [35]:
data.isnull().sum()

ID                           0
clumpThickness               0
sizeUniformity               0
shapeUniformity              0
maginalAdhesion              0
singleEpithelialCellSize     0
bareNuclei                  16
blandChromatin               0
normalNucleoli               0
mitosis                      0
class                        0
dtype: int64

###### Replacing missing values with mean value of column

In [36]:
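# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to the original dataframe
data["bareNuclei"] = data["bareNuclei"].fillna(data["bareNuclei"].mean())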
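
As a small illustration of mean imputation, here is a sketch on a toy column (the values are invented for the example). Note that mean() skips NaN by default, so the fill value is computed from the observed entries only.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (illustrative values)
col = pd.Series([1.0, 3.0, np.nan, 8.0])

# mean() ignores NaN, so the fill value is (1 + 3 + 8) / 3 = 4.0
filled = col.fillna(col.mean())

print(filled.tolist())
```

On a 1 to 10 integer scale the mean is usually not an integer, so the imputed rows carry a fractional value; filling with the median would be one alternative if that matters.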

In [37]:
#Now check for missing values in dataframe
data.isnull().sum()

ID                          0
clumpThickness              0
sizeUniformity              0
shapeUniformity             0
maginalAdhesion             0
singleEpithelialCellSize    0
bareNuclei                  0
blandChromatin              0
normalNucleoli              0
mitosis                     0
class                       0
dtype: int64

In [38]:
data = data.drop(['ID'], axis=1)
data.columns

Index(['clumpThickness', 'sizeUniformity', 'shapeUniformity',
       'maginalAdhesion', 'singleEpithelialCellSize', 'bareNuclei',
       'blandChromatin', 'normalNucleoli', 'mitosis', 'class'],
      dtype='object')

In [39]:
data.describe()

Unnamed: 0,clumpThickness,sizeUniformity,shapeUniformity,maginalAdhesion,singleEpithelialCellSize,bareNuclei,blandChromatin,normalNucleoli,mitosis
count,699.0,699.0,699.0,699.0,699.0,699.0,699.0,699.0,699.0
mean,4.41774,3.134478,3.207439,2.806867,3.216023,3.544656,3.437768,2.866953,1.589413
std,2.815741,3.051459,2.971913,2.855379,2.2143,3.601852,2.438364,3.053634,1.715078
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,2.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0
50%,4.0,1.0,1.0,1.0,2.0,1.0,3.0,1.0,1.0
75%,6.0,5.0,5.0,4.0,4.0,5.0,5.0,4.0,1.0
max,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0


In [40]:
data['mitosis'].value_counts()

1     579
2      35
3      33
10     14
4      12
7       9
8       8
5       6
6       3
Name: mitosis, dtype: int64

In [41]:
#np.ravel(data['class'])

###### Splitting data into input and output variables

In [42]:
X = data.iloc[:,0:9].values
y = data.iloc[:,9].values

##### Encoding Output Variable
Encoding reshapes the output variable from a vector into a matrix with one boolean column per class value, indicating whether a given instance belongs to that class.<br/>

Here the 'class' strings are first encoded as integers using LabelEncoder() from scikit-learn, and the integers are then one-hot encoded with pd.get_dummies().

In [45]:
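# Encode class values as integers ('2' -> 0, '4' -> 1)
encoder = LabelEncoder()
y1 = encoder.fit_transform(y)
# One-hot encode; dtype=float keeps the targets numeric on newer pandas, where the default is bool
Y = pd.get_dummies(y1, dtype=float).values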
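
The two encoding steps can also be written out with NumPy alone, which makes it clear what the resulting matrix looks like. This is only a sketch of the idea on four made-up labels, not a replacement for the scikit-learn and pandas calls used in this notebook.

```python
import numpy as np

# Class labels as strings, as in the 'class' column (illustrative sample)
labels = np.array(["2", "4", "4", "2"])

# Step 1: map each distinct label to an integer index (the job LabelEncoder does)
classes, codes = np.unique(labels, return_inverse=True)

# Step 2: one-hot encode by selecting rows of an identity matrix
one_hot = np.eye(len(classes))[codes]

print(classes)   # sorted distinct labels
print(codes)     # integer code per sample
print(one_hot)   # one row per sample, one column per class
```

Column 0 corresponds to class '2' (benign) and column 1 to class '4' (malignant), because the labels are sorted before being numbered.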

###### Splitting data into train and test variables

In [46]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.30, random_state=0) 

In [47]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(489, 9)
(489, 2)
(210, 9)
(210, 2)
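
The 489/210 split follows from the 699 rows and test_size=0.30. The sketch below reproduces the arithmetic, assuming (as I understand scikit-learn's behaviour) that a fractional test size is rounded up and the remainder goes to the training set.

```python
import math

n_samples = 699
test_size = 0.30

# Fractional test size is rounded up to a whole number of samples
n_test = math.ceil(test_size * n_samples)
# The training set gets whatever is left
n_train = n_samples - n_test

print(n_train, n_test)
```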


###### Neural Networks
We start building the neural network model with Sequential(). Keras treats a model as a stack of layers, and each layer is added with add().

Each Dense layer takes the number of neurons and an activation function as parameters, and the first layer also takes the input shape. The activation function can be, among others, 'sigmoid', 'tanh' or 'relu'. Recent papers on neural networks suggest that the 'relu' activation function performs better than the other two in most cases, so I used relu as the activation here. <br/>

Since my data has 9 features, I set input_shape to (9,). <br/>

The first hidden layer has 12 neurons and the second hidden layer has 8 neurons with the same activation function. The output layer has 2 neurons and uses softmax activation, which gives class probabilities.

In [48]:
model = Sequential()

model.add(Dense(12, input_shape=(9,), activation='relu'))
model.add(Dense(8, activation='relu'))
#model.add(Dense(6, activation='relu'))
model.add(Dense(2, activation='softmax'))
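
To show what this stack of layers computes, here is a rough NumPy sketch of a forward pass through a 9 -> 12 -> 8 -> 2 network with relu and softmax. The weights are random and the inputs are fake, so the numbers mean nothing; the point is only the shapes and the fact that each output row sums to 1. The helper names (relu, softmax, forward) are my own and not part of Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Keep positive values, replace negatives with zero
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Random weights and zero biases for each pair of consecutive layers
sizes = [9, 12, 8, 2]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    h = relu(x @ weights[0] + biases[0])         # first hidden layer, 12 units
    h = relu(h @ weights[1] + biases[1])         # second hidden layer, 8 units
    return softmax(h @ weights[2] + biases[2])   # output layer, 2 class probabilities

x = rng.integers(1, 11, size=(4, 9)).astype(float)  # 4 fake samples, features in 1..10
probs = forward(x)

print(probs.shape)        # one row per sample, one column per class
print(probs.sum(axis=1))  # each row sums to 1
```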

##### Compilation
Before training the model on our data, we have to compile it using compile(), a method of the Keras model. It takes an optimizer, a loss function and metrics as arguments.

The optimizer is the algorithm that updates the weights of the network, the loss function is the objective that training minimizes, and the metrics are the measures we use to analyze the performance of our model.<br/>

I used the Adam algorithm as the optimizer and categorical cross-entropy as the loss function. As this is a classification problem, I specified accuracy as the metric.

In [49]:
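# 'learning_rate' is the current name of the argument; older Keras releases called it 'lr'
model.compile(optimizer=Adam(learning_rate=0.04), loss='categorical_crossentropy', metrics=['accuracy'])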
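
For intuition about the loss, categorical cross-entropy can be sketched in a few lines of NumPy. This is a simplified stand-in, not the Keras implementation; the function name, the eps clipping value and the example probabilities are my own assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 so that log() stays finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # For one-hot targets this picks -log(probability of the true class), averaged over samples
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])     # one-hot targets
confident = np.array([[0.9, 0.1], [0.2, 0.8]])  # mostly correct predictions
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])     # coin-flip predictions

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident, loss_unsure)
```

The loss is lower when the predicted probability of the true class is higher, which is what the optimizer pushes towards during training.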

In [50]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 12)                120       
_________________________________________________________________
dense_2 (Dense)              (None, 8)                 104       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 18        
Total params: 242.0
Trainable params: 242
Non-trainable params: 0.0
_________________________________________________________________
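
The parameter counts in the summary can be checked by hand: each Dense layer has one weight per (input, neuron) pair plus one bias per neuron.

```python
# Layer widths: 9 inputs, then 12, 8 and 2 neurons
sizes = [9, 12, 8, 2]

# (inputs + 1 bias) * neurons for each Dense layer
params = [(n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

print(params, sum(params))
```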


In [51]:
model.fit(X_train, y_train, epochs=350)

Epoch 1/350
Epoch 2/350
Epoch 3/350
...
Epoch 334/350
...


<keras.callbacks.History at 0x1e4dd403438>

##### Evaluate and Predict Model
After training the neural network model on the training data, I evaluated the model on my test data. evaluate() returns the loss followed by the metrics (accuracy in this case), computed over all test samples.<br/>

I got an accuracy of about 95% (94.76%) on the test data.

In [55]:
scores = model.evaluate(X_test, y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

 32/210 [===>..........................] - ETA: 0s
acc: 94.76%


In [56]:
y_pred = model.predict(X_test)

y_test_class = np.argmax(y_test, axis=1)
y_pred_class = np.argmax(y_pred, axis=1)
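
The argmax step turns each row of class probabilities (or each one-hot row) into a single class index. A tiny sketch with made-up probabilities:

```python
import numpy as np

# Each row holds [P(benign), P(malignant)] for one sample (illustrative values)
probs = np.array([[0.90, 0.10],
                  [0.30, 0.70],
                  [0.55, 0.45]])

# Index of the largest value in each row is the predicted class
pred_class = np.argmax(probs, axis=1)

print(pred_class)
```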

In [57]:
from sklearn.metrics import classification_report, confusion_matrix

print(classification_report(y_test_class, y_pred_class))
print(confusion_matrix(y_test_class, y_pred_class))

             precision    recall  f1-score   support

          0       0.97      0.95      0.96       135
          1       0.91      0.95      0.93        75

avg / total       0.95      0.95      0.95       210

[[128   7]
 [  4  71]]
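
The figures in the classification report can be recomputed from the confusion matrix printed above, which is a useful sanity check. In this sketch the rows are taken to be the true classes and the columns the predicted classes, which is the layout scikit-learn uses.

```python
import numpy as np

# Rows are true classes, columns are predicted classes (values copied from the output above)
cm = np.array([[128, 7],
               [4, 71]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions over all samples
precision = np.diag(cm) / cm.sum(axis=0)  # per class: correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # per class: correct / actually that class
f1 = 2 * precision * recall / (precision + recall)

print(round(float(accuracy), 4))
print(precision.round(2), recall.round(2), f1.round(2))
```

Here 4 malignant samples were predicted as benign and 7 benign samples were predicted as malignant.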
