# Foundations of Artificial Intelligence and Machine Learning
## A Program by IIIT-H and TalentSprint
#### To be done in the Lab


The objective of this experiment is to understand the MultiLayer Perceptron (MLP).

In this experiment we will use the famous Iris data set. This is perhaps the best-known database in the pattern recognition literature. R. A. Fisher's 1936 paper on it is a classic in the field and is referenced frequently to this day.

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class (Iris Setosa) is linearly separable from the other two; the latter are NOT linearly separable from each other.
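
The separability of Setosa can be checked directly. A minimal sketch, assuming scikit-learn's bundled copy of the data (`load_iris`, where class 0 is Setosa and column 2 is petal length): a single threshold on petal length, which is a linear decision boundary, already separates Setosa from the other two classes. The variable names are illustrative and chosen so they do not overwrite the notebook's own `X` and `y`.

```python
import numpy as np
from sklearn.datasets import load_iris

X_all, y_all = load_iris(return_X_y=True)
petal_length = X_all[:, 2]   # column 2 is petal length in cm
is_setosa = y_all == 0       # class 0 is Iris Setosa

# Largest Setosa petal vs. smallest petal among the other two classes
setosa_max = petal_length[is_setosa].max()
others_min = petal_length[~is_setosa].min()
print("Setosa max petal length (cm):", setosa_max)
print("Other classes min petal length (cm):", others_min)

# Any threshold strictly between the two values is a perfect linear separator
threshold = (setosa_max + others_min) / 2
predicted_setosa = petal_length < threshold
print("Perfect separation:", np.array_equal(predicted_setosa, is_setosa))
```

The same single-feature check fails for Versicolour vs. Virginica because their petal-length ranges overlap; overlap on one feature is only a hint, not a proof, that the two classes are not linearly separable in all four dimensions.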


#### Data Attributes

  1. sepal length in cm 
  2. sepal width in cm 
  3. petal length in cm 
  4. petal width in cm 
  5. class: 
     -- Iris Setosa  
     -- Iris Versicolour 
     -- Iris Virginica

**MultiLayer Perceptron**

An MLP can be viewed as a logistic regression classifier whose input is first transformed by a learnt non-linear transformation $\Phi$. This transformation projects the input data into a space where, ideally, the classes become linearly separable. This intermediate layer is referred to as a hidden layer. A single hidden layer is sufficient to make MLPs universal approximators: given enough hidden units and a suitable non-linear activation, one hidden layer can approximate any continuous function on a bounded input domain to arbitrary accuracy.
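
Written out with samples as rows and a logistic activation $\sigma$, the hidden layer computes $\Phi(X) = \sigma(X W_1 + b_1)$ and the output layer is a (multinomial) logistic regression on top of it, $P(y \mid X) = \mathrm{softmax}(\Phi(X) W_2 + b_2)$. A minimal NumPy sketch of this forward pass, assuming random, untrained weights and the Iris shapes (4 inputs, 7 hidden units, 3 classes); the names `W1`, `b1`, `W2`, `b2`, `phi` and `forward` are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 7, 3

# Randomly initialised (untrained) parameters, for illustration only
W1 = rng.normal(scale=0.5, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def phi(X):
    """Hidden layer: the learnt non-linear transformation Phi."""
    return sigmoid(X @ W1 + b1)

def forward(X):
    """Multinomial logistic regression applied to Phi(X)."""
    return softmax(phi(X) @ W2 + b2)

X_demo = rng.normal(size=(5, n_features))  # five fake samples, one per row
probs_demo = forward(X_demo)
print(probs_demo.shape)        # (5, 3)
print(probs_demo.sum(axis=1))  # each row sums to 1
```

Training adjusts `W1`, `b1`, `W2` and `b2` together, so the hidden representation is learnt so that the final softmax layer can separate the classes.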

In [None]:
# Load required libraries
import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

# Local helper module (expected to provide make_meshgrid and plot_contours, used in the last cell)
from Utils import *


Let us load the Iris dataset from scikit-learn's `datasets` package.

In [None]:
# Load the iris dataset
iris = datasets.load_iris()

# Create our X and y data
X = iris.data
y = iris.target

In [None]:
# View ten observations of our y data
y[45:55]

In [None]:
# View the corresponding x data.
X[45:55]

Split the data into 70% training data and 30% test data.
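
`train_test_split` draws a different random split on every call, which is one reason the results below vary between runs. A sketch, assuming you want a reproducible and class-balanced split: the `random_state` and `stratify` parameters fix the shuffle and keep the equal class proportions in both parts. The seed value 42 is arbitrary, and the variable names are chosen so they do not overwrite the notebook's own split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)

# stratify=y_all keeps the class proportions; random_state makes the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, stratify=y_all, random_state=42)

print(X_tr.shape, X_te.shape)  # (105, 4) (45, 4)
print(np.bincount(y_te))       # [15 15 15]
```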

In [None]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

Now let us fit the scaler on the training data. It standardizes every feature to zero mean and unit variance, using statistics computed from the training set only.
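
Standardization maps each feature value $x_j$ to $z_j = (x_j - \mu_j)/\sigma_j$, where $\mu_j$ and $\sigma_j$ are that feature's mean and standard deviation on the training data. A short check on a small made-up matrix (the name `X_toy` is hypothetical) that `StandardScaler` computes exactly this, using the population standard deviation (`ddof=0`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up training matrix: 4 samples, 2 features
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 60.0]])

toy_scaler = StandardScaler().fit(X_toy)

# Manual version: per-column mean and population standard deviation (ddof=0)
mu = X_toy.mean(axis=0)
sigma = X_toy.std(axis=0)
Z_manual = (X_toy - mu) / sigma

print(np.allclose(toy_scaler.transform(X_toy), Z_manual))                   # True
print(np.allclose(toy_scaler.mean_, mu), np.allclose(toy_scaler.scale_, sigma))  # True True

# After scaling, each column has mean ~0 and standard deviation ~1
Z = toy_scaler.transform(X_toy)
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))
```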

In [None]:

sc = StandardScaler()
sc.fit(X_train)

In [None]:
# Apply the scaler to the X training data
X_train_std = sc.transform(X_train)

# Apply the SAME scaler to the X test data
X_test_std = sc.transform(X_test)
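
The important detail is that the test data is transformed with the mean and variance learnt from the training data only. One way to make this automatic is scikit-learn's `make_pipeline`: the pipeline fits the scaler on whatever is passed to `fit` and reuses that fitted scaler in `predict` and `score`. A sketch, assuming the MLP settings of the classifier defined in the next cell (only the options the `lbfgs` solver actually uses) and an arbitrary split seed; the names are chosen so they do not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=0)

# The pipeline fits StandardScaler on X_tr only, then feeds the scaled data to the MLP
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(7, 3), activation='logistic', alpha=1e-05,
                  solver='lbfgs', max_iter=300, random_state=1))
model.fit(X_tr, y_tr)

# score() re-applies the already-fitted scaler to X_te before the MLP sees it
print('Accuracy: %.2f' % model.score(X_te, y_te))
pipe_scaler = model.named_steps['standardscaler']
print(np.round(pipe_scaler.mean_, 2))  # per-feature means learnt from X_tr
```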

In [None]:
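# Create the MLP classifier: two hidden layers with 7 and 3 logistic units.
# Note: with solver='lbfgs', the options batch_size, learning_rate, learning_rate_init,
# momentum, nesterovs_momentum, power_t, shuffle, beta_1, beta_2, epsilon,
# early_stopping and validation_fraction are ignored (they apply to 'sgd' and/or 'adam').
clf = MLPClassifier(activation='logistic', alpha=1e-05, batch_size=6, beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(7, 3), learning_rate='constant',
       learning_rate_init=0.001, max_iter=300, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)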
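
With `solver='lbfgs'`, scikit-learn documents options such as `batch_size`, `learning_rate_init`, `momentum` and `beta_1`/`beta_2` as applying only to the stochastic solvers (`'sgd'` and/or `'adam'`). A quick sketch that checks this empirically, under the assumption that the same `random_state` gives the same initial weights: two `lbfgs` models that differ only in those options should end up with the same weights. The names `common`, `clf_a` and `clf_b` are illustrative.

```python
import warnings
import numpy as np
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_all_std = StandardScaler().fit_transform(X_all)

common = dict(hidden_layer_sizes=(7, 3), activation='logistic', alpha=1e-05,
              solver='lbfgs', max_iter=300, tol=1e-4, random_state=1)

# Model A: lbfgs with some stochastic-solver options set as in the cell above
clf_a = MLPClassifier(batch_size=6, learning_rate_init=0.001, momentum=0.9,
                      beta_1=0.9, beta_2=0.999, **common)
# Model B: same lbfgs settings, stochastic-solver options left at their defaults
clf_b = MLPClassifier(**common)

with warnings.catch_warnings():
    # lbfgs may stop at max_iter before converging; that is not what is checked here
    warnings.simplefilter('ignore', ConvergenceWarning)
    clf_a.fit(X_all_std, y_all)
    clf_b.fit(X_all_std, y_all)

same = all(np.allclose(a, b) for a, b in zip(clf_a.coefs_, clf_b.coefs_))
print('Same weights:', same)
```

To see those options take effect, switch `solver` to `'sgd'` or `'adam'`.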


In [None]:
# Train the MLP on the standardized training data
clf.fit(X_train_std, y_train)

In [None]:
# Use the trained MLP to predict labels for the standardized test data
y_pred = clf.predict(X_test_std)

In [None]:
# View the accuracy of the model, which is: 1 - (observations predicted wrong / total observations)
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
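
Accuracy is simply the fraction of predictions that match the true labels. A tiny hand-checkable example with made-up labels (3 of 4 predictions correct); the `_toy` names are hypothetical and avoid overwriting the notebook's `y_pred`:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true_toy = np.array([0, 1, 2, 2])  # made-up ground-truth labels
y_pred_toy = np.array([0, 2, 2, 2])  # made-up predictions: the second one is wrong

acc_manual = np.mean(y_true_toy == y_pred_toy)  # 1 - (wrong / total) = 1 - 1/4
acc_sklearn = accuracy_score(y_true_toy, y_pred_toy)
print(acc_manual, acc_sklearn)                  # 0.75 0.75

# The confusion matrix shows which classes get mixed up (rows: true, columns: predicted)
print(confusion_matrix(y_true_toy, y_pred_toy))
```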

In [None]:
# Weight matrices, one per layer: shapes (4, 7), (7, 3) and (3, 3)
clf.coefs_

In [None]:
# Class labels: 0 = Setosa, 1 = Versicolour, 2 = Virginica
clf.classes_
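
`coefs_[i]` is the weight matrix between layer i and layer i+1, and `intercepts_[i]` is the matching bias vector. A sketch, assuming a freshly fitted model with the same settings as above (logistic hidden units; for more than two classes scikit-learn uses a softmax output layer, reported as `out_activation_`), that rebuilds `predict_proba` by hand from these arrays. The names `mlp` and `manual_predict_proba` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_all_std = StandardScaler().fit_transform(X_all)
mlp = MLPClassifier(hidden_layer_sizes=(7, 3), activation='logistic', alpha=1e-05,
                    solver='lbfgs', max_iter=300, random_state=1).fit(X_all_std, y_all)

print([W.shape for W in mlp.coefs_])  # [(4, 7), (7, 3), (3, 3)]
print(mlp.out_activation_)            # softmax (used for more than two classes)

def manual_predict_proba(model, X):
    """Forward pass: logistic hidden layers, then a softmax output layer."""
    a = X
    n_weight_layers = len(model.coefs_)
    for i, (W, b) in enumerate(zip(model.coefs_, model.intercepts_)):
        z = a @ W + b
        if i < n_weight_layers - 1:
            a = 1.0 / (1.0 + np.exp(-z))          # logistic hidden activation
        else:
            z = z - z.max(axis=1, keepdims=True)  # numerical stability
            a = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax
    return a

print(np.allclose(manual_predict_proba(mlp, X_all_std), mlp.predict_proba(X_all_std)))  # True
```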

In [None]:
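# Repeat the experiment on 20 random train/test splits, this time with a single hidden layer of 7 units
for trial in range(20):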
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    sc = StandardScaler()
    sc.fit(X_train)
    X_train_std = sc.transform(X_train)
    X_test_std = sc.transform(X_test)
    clf = MLPClassifier(activation='logistic', alpha=1e-05, batch_size=6, beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(7, ), learning_rate='constant',
       learning_rate_init=0.001, max_iter=300, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)
    clf.fit(X_train_std, y_train)
    y_pred = clf.predict(X_test_std)
    print(y_pred)
    print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

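As you can see, this network (a single hidden layer of 7 units) generally performs better than the two-hidden-layer network above. However, in a few trials the accuracy drops noticeably, most likely because the network gets stuck in a poor local minimum (the random train/test split also contributes to the variation between trials).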
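
A common remedy for poor local minima is random restarts: train several networks from different random initial weights and keep the one with the lowest training loss (exposed as `loss_` after fitting). A sketch, assuming the single-hidden-layer settings of the loop above; the seeds 0 to 9 and the split seed are arbitrary, and the names avoid overwriting the notebook's `clf` and `sc`. Selecting by training loss rather than test accuracy keeps the test set untouched.

```python
import warnings
import numpy as np
from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.3, random_state=0)
restart_scaler = StandardScaler().fit(X_tr)
X_tr_std, X_te_std = restart_scaler.transform(X_tr), restart_scaler.transform(X_te)

candidates = []
with warnings.catch_warnings():
    warnings.simplefilter('ignore', ConvergenceWarning)
    for seed in range(10):  # each seed gives different initial weights
        candidate = MLPClassifier(hidden_layer_sizes=(7,), activation='logistic',
                                  alpha=1e-05, solver='lbfgs', max_iter=300,
                                  random_state=seed)
        candidates.append(candidate.fit(X_tr_std, y_tr))

losses = [m.loss_ for m in candidates]
best = candidates[int(np.argmin(losses))]
print('Training losses:', np.round(losses, 4))
print('Best restart test accuracy: %.2f' % best.score(X_te_std, y_te))
```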

In [None]:
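# Evaluate P(y = 1) on a regular 4-D grid (20 values per feature, 0 to 9.5 cm).
# The model was trained on standardized features, so scale the grid with the same scaler.
xx, yy, zz, aa = np.mgrid[0:10:0.5, 0:10:0.5, 0:10:0.5, 0:10:0.5]
grid = np.c_[xx.ravel(), yy.ravel(), zz.ravel(), aa.ravel()]
probs = clf.predict_proba(sc.transform(grid))[:, 1].reshape(xx.shape)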

In [None]:
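f, ax = plt.subplots(figsize=(8, 6))
xx, yy = np.mgrid[0:10:0.5, 0:10:0.5]
# 2-D slice of the 4-D grid: sepal length vs. sepal width, with petal length = petal width = 0
contour = ax.contourf(xx, yy, probs[:, :, 0, 0], 25, cmap="RdYlBu",
                      vmin=0, vmax=1)
ax_c = f.colorbar(contour)
ax_c.set_label("$P(y = 1)$")
ax_c.set_ticks([0, .25, .5, .75, 1])

# Overlay the training points on the same two features as the contour (columns 0 and 1);
# vmax=2.2 gives each of the three classes a distinct colour
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=50,
           cmap="RdYlBu", vmin=-.2, vmax=2.2,
           edgecolor="white", linewidth=1)
ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Sepal width (cm)")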
plt.show()
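
The slice plotted above fixes petal length and petal width at 0 cm, a region with no real flowers, so the probabilities there say little about the data. A sketch of an alternative, assuming a model trained on all 150 flowers purely for visualization (the names `slice_scaler`, `slice_model`, `sl`, `sw` and `p_versicolour` are illustrative): vary sepal length and sepal width over their observed range and hold the two petal features at their means, scaling the grid with the same scaler before calling `predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
slice_scaler = StandardScaler().fit(X_all)
slice_model = MLPClassifier(hidden_layer_sizes=(7,), activation='logistic', alpha=1e-05,
                            solver='lbfgs', max_iter=300, random_state=1)
slice_model.fit(slice_scaler.transform(X_all), y_all)

# Grid over sepal length (feature 0) and sepal width (feature 1), in cm
sl, sw = np.mgrid[4:8.5:0.1, 1.5:5:0.1]
grid_2d = np.c_[sl.ravel(), sw.ravel()]

# Hold petal length and petal width (features 2 and 3) at their means
petal_means = np.tile(X_all[:, 2:].mean(axis=0), (grid_2d.shape[0], 1))
grid_4d = np.hstack([grid_2d, petal_means])

# Scale with the same scaler the model was trained with, then take P(y = 1) (Versicolour)
p_versicolour = slice_model.predict_proba(slice_scaler.transform(grid_4d))[:, 1]
p_versicolour = p_versicolour.reshape(sl.shape)
```

The result can be drawn like `probs` above, e.g. `ax.contourf(sl, sw, p_versicolour, 25, cmap="RdYlBu", vmin=0, vmax=1)`.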

In [None]:

# Take the first two features (sepal length and sepal width). TODO - Try combinations of two features
X = iris.data[:, :2]
y = iris.target

# Re-fit the MLP on these two raw features. We do not scale the data,
# so the plot axes stay in the original units (cm).
clf.fit(X, y)

# Title for the plot
titles = 'Classification with MLP'

# Set up a grid over the two features for plotting the decision regions.

X0, X1 = X[:, 0], X[:, 1]
xx, yy = make_meshgrid(X0, X1)


f, ax = plt.subplots(figsize=(8, 6))

plot_contours(ax, clf, xx, yy, cmap=plt.cm.coolwarm, alpha=0.8)
ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors='k')
ax.set_xlim(xx.min(), xx.max())
ax.set_ylim(yy.min(), yy.max())
ax.set_xlabel('Sepal length')
ax.set_ylabel('Sepal width')
ax.set_xticks(())
ax.set_yticks(())
ax.set_title(titles)

plt.show()
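
The last cell relies on `make_meshgrid` and `plot_contours` from the `Utils` module, which is not included here. A plausible sketch of these helpers, modeled on the plotting helpers in scikit-learn's SVM-on-Iris example; the actual `Utils` implementation (padding, default step `h`) may differ, so treat these as hypothetical stand-ins:

```python
import numpy as np

def make_meshgrid(x, y, h=0.02):
    """Build a grid covering the range of x and y, padded by 1 unit on each side.

    h is the grid step: a smaller h gives a smoother plot but more points to predict.
    """
    x_min, x_max = x.min() - 1, x.max() + 1
    y_min, y_max = y.min() - 1, y.max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    return xx, yy

def plot_contours(ax, clf, xx, yy, **params):
    """Colour each grid point by the class the fitted classifier predicts there."""
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    return ax.contourf(xx, yy, Z, **params)
```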