

#  <span style="color:#0b186c;">Introduction to DL: Perceptrons</span>

---

“Deep learning is a subset of machine learning that's based on artificial neural networks. The learning process is deep because the structure of artificial neural networks consists of multiple input, output, and hidden layers. Each layer contains units that transform the input data into information that the next layer can use for a certain predictive task. Thanks to this structure, a machine can learn through its own data processing.” - Microsoft, 2022



<br></br>

## <span style="color:#0b186c;">Table of Contents:</span>
* [History of Perceptrons](#first-bullet)
* [Dataset Information](#second-bullet)
* [Multilayer Perceptron (MLP)](#third-bullet)
* [Conclusion](#fourth-bullet)

#  <span style="color:#0b186c;">History of Perceptrons</span><a class="anchor" id="first-bullet"></a>

---


Perceptron learning is a fundamental concept in neural networks and forms the basis for the more complex artificial neural networks used in deep learning. The mathematical modeling of a perceptron dates back as far as the 1940s, with the first implementation introduced in the late 1950s by the American psychologist Frank Rosenblatt. The original intent of the perceptron was to create a physical machine capable of mimicking the learning behavior and process of a human neuron. The perceptron consisted of one or more numerical input features, a weight associated with each input, and a bias that was added to the weighted sum of the inputs. Together, these components produce an output similar to the dependent variable y discussed in traditional machine learning. However, the original perceptron was only able to perform well on data points that could be linearly separated.
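
The components described above can be sketched in a few lines of NumPy. This is a minimal illustration of the classic perceptron update rule, not a reproduction of Rosenblatt's original machine; the function names, the learning rate, and the toy AND/XOR data are all assumptions made for the example.

```python
import numpy as np

def perceptron_predict(X, weights, bias):
    """Return 1 where the weighted sum plus bias is positive, else 0."""
    return (X @ weights + bias > 0).astype(int)

def perceptron_train(X, y, learning_rate=1.0, epochs=20):
    """Classic perceptron rule: adjust the weights after every mistake."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            # error is 0 for a correct guess, +1 or -1 for a wrong one
            error = y_i - perceptron_predict(x_i, weights, bias)
            weights += learning_rate * error * x_i
            bias += learning_rate * error
    return weights, bias

# Toy inputs: every combination of two binary features
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Logical AND is linearly separable, so the perceptron can learn it
y_and = np.array([0, 0, 0, 1])
w_and, b_and = perceptron_train(X_toy, y_and)
and_predictions = perceptron_predict(X_toy, w_and, b_and)
print("AND predictions:", and_predictions)

# Logical XOR is not linearly separable, so at least one point stays wrong
y_xor = np.array([0, 1, 1, 0])
w_xor, b_xor = perceptron_train(X_toy, y_xor)
xor_predictions = perceptron_predict(X_toy, w_xor, b_xor)
print("XOR predictions:", xor_predictions)
```

The XOR case shows the linear-separability limitation in practice: no single straight line separates the two classes, so a single perceptron cannot classify all four points correctly.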

Advances on the original single-layer perceptron included the expansion into multiple layers and the addition of activation functions that apply non-linear transformations to the neuron outputs at each layer. Over the last few decades, these advances have brought us to where we are now in the world of Deep Learning with `Multi-layer Perceptrons` (MLPs) and `Artificial Neural Networks` (ANNs). These foundational algorithms are at the core of what has driven the rapid expansion and development of cutting-edge technologies, like our work on `Arcas`, since the late 1990s.


## <span style="color:#0b186c;">Required Imports:</span>

<div class="alert alert-warning">

<b>Note:</b> If you have not previously installed these `packages`, you can use the cell below to perform the required `pip` installs.

</div>

In [None]:
# In case you still need to perform some pip installs:
! pip install --user pandas -q
! pip install --user numpy -q
! pip install --user scikit-learn -q
! pip install --user matplotlib -q
! pip install --user seaborn -q

In [None]:
# Dataframe and array libraries
import pandas as pd
import numpy as np

# Libraries for visualizing data
import matplotlib.pyplot as plt
import seaborn as sns

# Retrieves the dataset from Scikit-learn
from sklearn.datasets import load_iris

# Required for performing standardization
from sklearn.preprocessing import StandardScaler

# Required for training and validating a model
from sklearn.model_selection import train_test_split

# Required for instantiating and running the MLP neural network
from sklearn.neural_network import MLPClassifier

# Classification metrics and confusion matrix
from sklearn.metrics import confusion_matrix, accuracy_score, ConfusionMatrixDisplay

# Filters out warning messages
import warnings
warnings.filterwarnings('ignore')

#  <span style="color:#0b186c;">Dataset Information</span><a class="anchor" id="second-bullet"></a>

---

We will be using a dataset containing three species of the Iris genus, namely Iris Setosa, Iris Versicolor, and Iris Virginica, found in the Gaspé Peninsula. For the purposes of an integral study, the collected Iris samples were, "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus." The dataset contains 150 rows of data, 50 for each species of Iris flower. Each column name represents the feature of the flower that was measured and recorded.

Our target dataset can be found in the Scikit-learn library, so we will import it directly from the library and store it in a Pandas dataframe.

https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html

In [None]:
# Import the iris dataset
iris = load_iris(as_frame=True)

# Place the dataset into a dataframe
df = iris.frame 

# View the first 5 records in the dataset
df.head()



A sepal is a part of the flower of angiosperms. Usually green, sepals typically function as protection for the flower in bud, and often as support for the petals when in bloom. Petals are modified leaves that surround the reproductive parts of flowers. They are often brightly colored or unusually shaped to attract pollinators. 



The petal and sepal measurement values allow us to look for patterns related to the specific species of Iris flower.


<div class="alert alert-info">
   
We can use the `.info()` method on our dataframe to view a concise summary of its contents. This includes the number of observations, the column names and data types, and the non-null count for each column, which reveals any missing values.
    
 </div>

In [None]:
df.info()

<div class="alert alert-info">
   
For numerical features in the dataframe, we can use the `.describe()` method to view relevant statistical information about each of the features. Understanding these values can assist in identifying the presence of outliers.
    
 </div>

In [None]:
df.describe()

<div class="alert alert-info">
   
Additionally, we can use the `pairplot()` function from the `seaborn` library to visualize a scatter matrix of the independent variables. We can color code the plotted points based on the `target` feature to identify any discernible patterns in the measurement values.
    
 </div>

In [None]:
# Set the figure size
sns.set(rc={'figure.figsize':(12,8),'ytick.labelsize':(12)})

# Create a pairplot
sns.pairplot(df, hue = "target", palette = "Set2")

<div class="alert alert-info">
   
Our dataset contains an equal number of observations for each of the Iris flowers. We can visualize the target variable distributions with a pie chart:
    
</div>

In [None]:
# Create a pie chart for the target variable
df.target.value_counts().plot(kind='pie', figsize=(8, 8), fontsize=10, autopct='%1.0f%%')
plt.title("Target Variable Distribution", fontsize = 20)
plt.show()

<div class="alert alert-info">
   
Lastly, we can use the `.corr()` method on our dataframe to identify linear relationships between the independent variables and the dependent variable. This also helps identify collinearity that may exist among the independent variables. The correlation matrix can be enhanced with the `heatmap()` function from the `seaborn` library, which scales the specified hue based on the strength of the linear relationship.
    
</div>
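
To make the numbers in a correlation matrix less opaque, the sketch below computes the Pearson correlation coefficient by hand and compares it with the value from `.corr()`. The `pearson_r` helper and the small `toy` dataframe are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

def pearson_r(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator

# Hypothetical measurements with a strong positive linear relationship
toy = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width": [2.1, 3.9, 6.2, 8.0, 9.8],
})

manual = pearson_r(toy["length"].to_numpy(), toy["width"].to_numpy())
from_pandas = toy.corr().loc["length", "width"]

print(f"manual: {manual:.4f}, pandas: {from_pandas:.4f}")
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or no linear relationship.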

In [None]:
# Set the figure size
sns.set(rc={'figure.figsize':(12,8),'ytick.labelsize':(12)})

# Use the corr method to create the correlation matrix
correlation_matrix = df.corr().round(2)

# Create a heatmap shaded by the strength of the linear relationship
sns.heatmap(data = correlation_matrix, annot = True, cmap = "Blues")
plt.title("Variable Correlation Heatmap\n", fontsize = 20)
plt.show()

#  <span style="color:#0b186c;">Multilayer Perceptron (MLP)</span><a class="anchor" id="third-bullet"></a>

---


A Multilayer Perceptron (MLP) network is a fully connected, feedforward ANN consisting of an input layer, one or more hidden layers, and an output layer. The term is sometimes used loosely to refer to *any* type of ANN; however, it can also refer to specific ones (e.g., specific activation functions, specific perceptron algorithm variations, etc.).
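
As a rough illustration of what "fully connected, feedforward" means, the sketch below pushes a small batch of inputs through one hidden layer and one output layer using NumPy. The layer sizes, the random weights, and the choice of ReLU and softmax are assumptions made for the example; this is not how `scikit-learn` implements its MLP internally.

```python
import numpy as np

def relu(z):
    """Non-linear activation applied to the hidden layer."""
    return np.maximum(0.0, z)

def softmax(z):
    """Convert each row of scores into probabilities that sum to 1."""
    shifted = z - z.max(axis=1, keepdims=True)  # subtract max for stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(seed=0)

# Hypothetical sizes: 4 input features, 8 hidden units, 3 output classes
n_features, n_hidden, n_classes = 4, 8, 3

# Every input connects to every hidden unit, and every hidden unit
# connects to every output unit (fully connected)
W_hidden = rng.normal(scale=0.5, size=(n_features, n_hidden))
b_hidden = np.zeros(n_hidden)
W_output = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b_output = np.zeros(n_classes)

def forward(X):
    """Data flows in one direction only: input -> hidden -> output."""
    hidden = relu(X @ W_hidden + b_hidden)
    return softmax(hidden @ W_output + b_output)

X_demo = rng.normal(size=(5, n_features))
probabilities = forward(X_demo)

print(probabilities.shape)        # one row per sample, one column per class
print(probabilities.sum(axis=1))  # each row sums to 1
```

Training consists of adjusting `W_hidden`, `b_hidden`, `W_output`, and `b_output` so that the predicted probabilities match the known labels.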

The `scikit-learn` library includes two variations of an MLP model: a regressor and a classifier. The primary differences between them are the loss function and the output activation function. Since the iris dataset has a discrete, categorical target variable, we will be using the `MLPClassifier()`.

As seen on the documentation page, the `MLPClassifier` has a predefined loss function (log-loss) and a configurable optimizer, set through the `solver` parameter, that together determine how it learns. Since the model doesn't initially know the relationship between the x and y values, it has to guess. The loss function defines how each guess is scored by comparing the predicted answers with the known correct answers in the dataset. After each guess, the optimizer defines the logic used to update the model's weights so that the loss is reduced.

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
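
The guess, score, and update cycle described above can be sketched with NumPy. To keep the example short, it assumes a model with no hidden layer and uses plain gradient descent, which is simpler than the solvers `MLPClassifier` offers; the toy clusters, learning rate, and step count are all made up for illustration.

```python
import numpy as np

def softmax(z):
    shifted = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def log_loss(probabilities, y_true):
    """Mean negative log-probability assigned to the correct class."""
    rows = np.arange(len(y_true))
    return -np.mean(np.log(probabilities[rows, y_true] + 1e-12))

# Toy data: three clusters of 20 points each, one cluster per class
rng = np.random.default_rng(seed=0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
y_toy = np.repeat(np.arange(3), 20)
X_toy = centers[y_toy] + rng.normal(scale=0.5, size=(60, 2))

# Start with zero weights, so the first guess is uniform across classes
W = np.zeros((2, 3))
b = np.zeros(3)
learning_rate = 0.1
losses = []

for step in range(200):
    probs = softmax(X_toy @ W + b)          # 1. guess
    losses.append(log_loss(probs, y_toy))   # 2. score the guess

    # 3. gradient of the log-loss with respect to the class scores
    grad_scores = probs.copy()
    grad_scores[np.arange(len(y_toy)), y_toy] -= 1.0
    grad_scores /= len(y_toy)

    # 4. the optimizer step: move the weights against the gradient
    W -= learning_rate * (X_toy.T @ grad_scores)
    b -= learning_rate * grad_scores.sum(axis=0)

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

The first loss equals ln(3), the score for guessing all three classes with equal probability, and it falls as the weights are updated.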

<div class="alert alert-success">
    
First, split the data into independent variables (X) and the dependent target variable (y):

</div>

In [None]:
# Split the independent (X) and dependent (y) variables
X = df.iloc[:, :-1]
y = df.iloc[:, -1].values

# Review the input variables without a target label
X

<div class="alert alert-info">
    
**Note:** It is imperative that the training and test subsets are representative of the whole dataset.
A convenient way to create them is the built-in function `train_test_split()`.

</div>

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
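
One way to check that a split is representative is to compare the class counts in each subset. The sketch below uses the `stratify` and `random_state` parameters of `train_test_split()`, which the cell that follows does not use; the variable names and the seed value are arbitrary choices for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the features and labels as NumPy arrays
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the class proportions the same in both subsets;
# random_state makes the shuffle reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Count how many samples of each class landed in each subset
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)

print("train:", train_counts)
print("test: ", test_counts)
```

Without `stratify`, the split is purely random, so the class counts in a small test set can drift away from the balanced 50/50/50 distribution of the full dataset.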

In [None]:
# Split the independent (X) and dependent (y) variables
X = df.iloc[:, :-1]
y = df.iloc[:, -1].values

# Split the data into an 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Output the shape of the training set
X_train.shape

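Using **standardization**, we can rescale each feature to have a mean of 0 and a standard deviation of 1. This does not change the shape of a feature's distribution, but it puts all features on a comparable scale, which makes it easier for the model to learn appropriate feature weights.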

The `StandardScaler()` from `scikit-learn` standardizes each feature independently by subtracting its mean and dividing by its standard deviation. First, the scaler has to be fit on the training data to learn these statistics. Using the `.fit_transform()` method, we can fit the scaler and transform the training data in a single line of code. The test data is then transformed with the `.transform()` method, which reuses the statistics learned from the training data.

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
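
The arithmetic behind the scaler can be reproduced by hand, which also shows why the test data is scaled with the training statistics. The synthetic `train` and `test` arrays below are made up for this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data: two features on very different scales
rng = np.random.default_rng(seed=0)
train = rng.normal(loc=[5.0, 50.0], scale=[1.0, 10.0], size=(100, 2))
test = rng.normal(loc=[5.0, 50.0], scale=[1.0, 10.0], size=(20, 2))

# Learn the statistics from the training data only
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Apply the same training statistics to both subsets
train_manual = (train - train_mean) / train_std
test_manual = (test - train_mean) / train_std

# Compare against StandardScaler fit on the training data
scaler = StandardScaler().fit(train)
train_matches = np.allclose(train_manual, scaler.transform(train))
test_matches = np.allclose(test_manual, scaler.transform(test))

print(train_matches, test_matches)
print(train_manual.mean(axis=0).round(6), train_manual.std(axis=0).round(6))
```

The scaled training features have a mean of 0 and a standard deviation of 1. The scaled test features will be close to those values but not exact, because they were scaled with statistics learned from the training data.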

In [None]:
# Instantiate the standard scaler
sc = StandardScaler()

# Fit the scaler on the training set and transform the training set
X_train = sc.fit_transform(X_train)

# Transform the test set using the scaler fit on the training set
X_test = sc.transform(X_test)

<div class="alert alert-info">
    
**Note:** The output layer activation function is determined automatically by the `MLPClassifier` internally. Binary classification problems will use the `Logistic` activation function and multi-class classification problems will use the `Softmax` activation function.

</div>
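
This behavior can be checked by fitting one classifier on all three iris classes and another on only two of them, then reading the fitted `out_activation_` attribute. The small `max_iter`, the `random_state` value, and the variable names below are arbitrary choices to keep the sketch quick; neither model is meant to be well trained.

```python
import warnings
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Keep only classes 0 and 1 to create a binary problem
binary_mask = y_demo < 2

with warnings.catch_warnings():
    # Short training runs trigger convergence warnings; silence them here
    warnings.simplefilter("ignore")
    multi_clf = MLPClassifier(max_iter=50, random_state=0).fit(X_demo, y_demo)
    binary_clf = MLPClassifier(max_iter=50, random_state=0).fit(
        X_demo[binary_mask], y_demo[binary_mask]
    )

print("three classes:", multi_clf.out_activation_)
print("two classes:  ", binary_clf.out_activation_)
```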

In [None]:
# Instantiate the MLP ANN
MLP = MLPClassifier()

# Fit the ANN on the training data
MLP.fit(X_train, y_train)

# Verify the output activation function is softmax
MLP.out_activation_

In [None]:
# Make predictions based on the X values in the test set
y_pred = MLP.predict(X_test)

# Calculate the accuracy score of the test set
score = round((accuracy_score(y_test, y_pred) * 100), 2)

# Plot the confusion matrix for the predictions
cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(cm)
cm_display = cm_display.plot(include_values=True, cmap='Blues', ax=None, xticks_rotation='horizontal')
plt.grid(False)
plt.show()

# Print the accuracy score on the test data
print(f"Accuracy = {score}%")

#  <span style="color:#0b186c;">Conclusion</span><a class="anchor" id="fourth-bullet"></a>

---