# Use of ANN with Scikit-learn

In [1]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

This object behaves like a dictionary. It holds a description of the dataset together with the feature matrix and the target labels:


In [2]:
cancer.keys()


dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

In [3]:
cancer['data'].shape

(569, 30)
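To get a feel for what the 30 columns and the targets represent, the other keys of the object can be inspected as well. The following is a minimal sketch; the `class_counts` name is only an illustration and not part of the original notebook.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Each column of 'data' corresponds to one entry of 'feature_names'.
print(len(cancer['feature_names']))
print(list(cancer['feature_names'][:3]))

# 'target' holds integer class labels; 'target_names' maps them to strings.
print(list(cancer['target_names']))

# Count how many samples fall into each class (illustrative helper variable).
class_counts = Counter(int(label) for label in cancer['target'])
print(class_counts)
```

Checking the class balance early is useful because it affects how the precision and recall figures reported later should be read.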

In [4]:
X = cancer['data']
y = cancer['target']

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [7]:
X_train.shape
X_test.shape

(143, 30)
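When neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the samples, which is why the test set above has 143 rows. The split above is also random, so the numbers reported further down change from run to run. A possible variant, assuming a fixed seed and a stratified split are acceptable (both are choices made for this sketch, not part of the original notebook), could look like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# random_state=0 is an arbitrary seed chosen for this sketch.
# stratify=y keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```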

The neural network may have difficulty converging before the maximum number of iterations is reached if the data is not normalized. The multi-layer perceptron is sensitive to feature scaling, so scaling the data is highly recommended. The same scaling must be applied to the test set for the results to be meaningful. There are many ways to normalize data; here we use the built-in StandardScaler for standardization.

In [8]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [9]:
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
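To make the "fit on the training data only" step concrete, here is a small sketch of what the standardization amounts to: subtract the training mean of each feature and divide by the training standard deviation. The tiny arrays are made up for illustration and are not taken from the breast cancer data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up arrays standing in for the training and test features.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
test = np.array([[2.0, 30.0]])

scaler = StandardScaler().fit(train)

# The statistics come from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# The test set is transformed with the training statistics, not its own.
manual_train = (train - mean) / std
manual_test = (test - mean) / std

print(np.allclose(manual_train, scaler.transform(train)))
print(np.allclose(manual_test, scaler.transform(test)))
```

Reusing the training statistics for the test set keeps information about the test data from leaking into the preprocessing.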

Now train the model. Scikit-learn exposes models as estimator objects. We import our estimator, the multi-layer perceptron classifier (MLPClassifier), from Scikit-learn's neural_network module:

In [10]:
from sklearn.neural_network import MLPClassifier

Next, create an instance of the model. There are many parameters available for customizing it; here we set only hidden_layer_sizes. This parameter takes a tuple in which the nth entry is the number of neurons in the nth hidden layer of the MLP. There are many ways to choose these numbers; for simplicity we use 3 hidden layers, each with as many neurons as there are features in the data set (30):

In [11]:
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

Now that the model has been created, fit it to the training data. Remember that the data has already been preprocessed and scaled:

In [12]:
mlp.fit(X_train,y_train)


MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(30, 30, 30), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
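One way to see how `hidden_layer_sizes=(30, 30, 30)` translates into an actual network is to look at the shapes of the fitted weight matrices. The sketch below repeats the earlier steps so that it runs on its own. The values `random_state=0` and `max_iter=1000` are assumptions made here for reproducibility and to give the optimizer more room to converge; they are not settings from the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale using statistics from the training set only.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)

# Three hidden layers of 30 neurons each, as in the text.
mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

# coefs_[i] connects layer i to layer i + 1; intercepts_[i] are the biases of layer i + 1.
weight_shapes = [w.shape for w in mlp.coefs_]
bias_shapes = [b.shape for b in mlp.intercepts_]

print(weight_shapes)
print(bias_shapes)
print(mlp.n_layers_)  # counts the input layer, the hidden layers and the output layer
```

For this binary problem the last weight matrix has a single output column, since one output unit is enough to separate two classes.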

We now have a trained model. Use the predict() method of the fitted model to get predictions for the test set:

In [14]:
predictions = mlp.predict(X_test)
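Besides hard class labels, the fitted classifier can also report class probabilities through `predict_proba()`, which helps when judging how confident a prediction is. The following self-contained sketch repeats the preparation steps. As before, `random_state=0` and `max_iter=1000` are assumptions made for this example only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training data, then apply it to both subsets.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(30, 30, 30), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

predictions = mlp.predict(X_test)
proba = mlp.predict_proba(X_test)  # one column per class, ordered as in mlp.classes_

print(mlp.classes_)
print(proba[:3])

# Picking the most probable class for each row reproduces predict().
same = np.array_equal(mlp.classes_[proba.argmax(axis=1)], predictions)
print(same)

# score() returns the mean accuracy on the given data.
print(mlp.score(X_test, y_test))
```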


Use Scikit-learn's built-in metrics, such as the classification report and the confusion matrix, to evaluate how well the model performed:

In [15]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, predictions))

[[49  1]
 [ 2 91]]


In [16]:
print(classification_report(y_test,predictions))

             precision    recall  f1-score   support

          0       0.96      0.98      0.97        50
          1       0.99      0.98      0.98        93

avg / total       0.98      0.98      0.98       143
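The per-class numbers in the report follow directly from the confusion matrix printed earlier, where rows are the true classes and columns are the predicted classes. As a worked check, the sketch below recomputes them in plain Python. The helper `per_class_metrics` is a hypothetical name introduced for this illustration.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[49, 1],
      [2, 91]]


def per_class_metrics(cm, k):
    """Return (precision, recall, f1, support) for class k."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # predicted k, but true class differs
    fn = sum(cm[k]) - tp                             # true k, but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, sum(cm[k])


report = {}
for k in range(len(cm)):
    p, r, f1, support = per_class_metrics(cm, k)
    report[k] = (round(p, 2), round(r, 2), round(f1, 2), support)
    print(k, report[k])

# Overall accuracy: correct predictions on the diagonal divided by all samples.
accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
print(round(accuracy, 3))
```

The rounded values agree with the two class rows of the report: three test samples out of 143 were misclassified.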

