# Introduction to Multi-Layer Perceptrons
This notebook will take you through the basics of creating multi-layer perceptrons, one of the oldest models in machine learning and still one of the best universal function approximators in computer science. Below is an illustration of an extremely simplified MLP that takes in 3 inputs, has one hidden layer with 4 nodes, and outputs a single prediction. We'll build a very similar MLP to classify irises into their species based on the petal and sepal qualities. This notebook borrows heavily from [this Iris ML notebook](https://www.kaggle.com/code/ash316/ml-from-scratch-with-iris/data) and [this one as well](https://www.kaggle.com/code/mohitchaitanya/simple-iris-dataset-classification-using-pytorch).

Learning outcomes:
1. Pull in data on Kaggle
2. Inspect and explore the data
3. Split the dataset into train/test
4. Train classical machine learning models on the data
5. Build a simple ANN with PyTorch for classification


    <img src="https://upload.wikimedia.org/wikipedia/commons/4/41/Iris_versicolor_3.jpg" alt="Iris versicolor" style="width:40%;">
    <img src="https://isaiahlg.com/portfolio/csci5922/mod2/nn1.png" alt="Basic MLP" style="width:37%;">

## Introducing the Iris Dataset

From [Wikipedia](https://en.wikipedia.org/wiki/Iris_flower_data_set): "The Iris flower data set or Fisher's Iris data set is a multivariate data set used and made famous by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula 'all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus.'

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish each species. Fisher's paper was published in the Annals of Eugenics (today the Annals of Human Genetics)."

## Importing Data into Kaggle Notebooks
We want to bring our iris data into this notebook. To do this:
- Go to the "Input" section on the right
- Click "Add Input"
- Click "Datasets" to search for just datasets
- Search "iris"
- Import the first result called "Iris Species" by clicking the +

The dataset should appear in the right-hand menu under datasets.

Note: this notebook can be run with the standard CPU since the dataset is quite simple. We also shouldn't need "internet on" for this one.

## Import Necessary Libraries
Run this code to import our data manipulation and visualization libraries. The Iris dataset files in the directory "/kaggle/input" are printed by the working-directory cell that follows.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # plotting
import matplotlib.pyplot as plt # plotting

## Configure the Current Working Directory
By default, our notebook is looking in `/kaggle/working/` for files. But the dataset we just imported is in `/kaggle/input/`. Let's set the working directory to just `/kaggle/` so we can use relative file paths for the rest of the notebook.

In [None]:
import os

print("The default working directory is:", os.getcwd())

# Let's set the working directory to just kaggle
os.chdir('/kaggle') # if you are not using Kaggle, change this to your actual working directory
print("Now the working directory is:", os.getcwd())

# Now let's look at the input subdirectory of our working directory for our iris dataset
for dirname, _, filenames in os.walk('input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


## Import and Inspect the Dataset

In [None]:
iris = pd.read_csv("input/iris/Iris.csv") #load the dataset with pandas
iris.head(2) #show the first 2 rows from the dataset

In [None]:
iris.info()  #checking if there is any inconsistency in the dataset
#as we see there are no null values in the dataset, so the data can be processed

In [None]:
# drop the "Id" column that we won't use
iris.drop('Id', axis=1, inplace=True)  # axis=1 drops a column (not a row); inplace=True modifies the dataframe directly

## Exploratory Data Analysis with the Dataset
Let's start by plotting some of the input variables and coloring them by the label (species). This helps give us a sense of what the input data look like, and whether this is going to be an easy classification task.

In [None]:
# DataFrame.plot returns a matplotlib Axes, so we call it ax and reuse it for each species
ax = iris[iris.Species=='Iris-setosa'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='orange', label='setosa')
iris[iris.Species=='Iris-versicolor'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='blue', label='versicolor', ax=ax)
iris[iris.Species=='Iris-virginica'].plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm',color='green', label='virginica', ax=ax)
ax.set_xlabel("Sepal Length")
ax.set_ylabel("Sepal Width")
ax.set_title("Sepal Length versus Sepal Width by Iris Species")
fig = plt.gcf()  # grab the current Figure to resize it
fig.set_size_inches(10,6)
plt.show()

The graph above shows the relationship between sepal length and sepal width. Now let's check the relationship between petal length and petal width.

In [None]:
# plot.scatter also returns a matplotlib Axes, reused here for each species
ax = iris[iris.Species=='Iris-setosa'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='orange', label='setosa')
iris[iris.Species=='Iris-versicolor'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='blue', label='versicolor', ax=ax)
iris[iris.Species=='Iris-virginica'].plot.scatter(x='PetalLengthCm',y='PetalWidthCm',color='green', label='virginica', ax=ax)
ax.set_xlabel("Petal Length")
ax.set_ylabel("Petal Width")
ax.set_title("Petal Length versus Petal Width by Iris Species")
fig = plt.gcf()  # grab the current Figure to resize it
fig.set_size_inches(10,6)
plt.show()

As we can see, the petal features separate the species into cleaner clusters than the sepal features do. This suggests that petal measurements may give more accurate predictions than sepal measurements. We will check that later.

### Now let us see how the lengths and widths are distributed

In [None]:
iris.hist(edgecolor='black', linewidth=1.2)
fig=plt.gcf()
fig.set_size_inches(12,6)
plt.show()

### Now let us see how the length and width vary according to the species

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(2,2,1)
sns.violinplot(x='Species',y='PetalLengthCm',data=iris)
plt.subplot(2,2,2)
sns.violinplot(x='Species',y='PetalWidthCm',data=iris)
plt.subplot(2,2,3)
sns.violinplot(x='Species',y='SepalLengthCm',data=iris)
plt.subplot(2,2,4)
sns.violinplot(x='Species',y='SepalWidthCm',data=iris)

The violin plots show the density of each length and width measurement within each species. Thinner sections indicate lower density (fewer flowers with that value), while wider sections indicate higher density.
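A violin's outline is a mirrored kernel density estimate (KDE) of the values in each group. The sketch below is a minimal NumPy-only illustration of that idea, assuming a Gaussian kernel and a hand-picked bandwidth. The function `gaussian_kde_1d` and the sample values are made up for this example and are not seaborn's actual implementation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Estimate density on `grid` by averaging a Gaussian bump centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Made-up measurements: four values clustered near 1.15 and one lone value at 3.0
values = np.array([1.0, 1.1, 1.2, 1.3, 3.0])
grid = np.array([1.15, 2.0, 3.0])

density = gaussian_kde_1d(values, grid, bandwidth=0.2)
for point, d in zip(grid, density):
    print(f"density at {point}: {d:.3f}")
```

The density is highest inside the cluster (a fat part of the violin), lower at the lone point, and close to zero in the gap between them (a thin part).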

## Setting Up Classical Machine Learning

**Classification**: samples belong to two or more classes, and we want to learn from already labeled data how to predict the class of unlabeled data. Predicting an iris's species from its measurements is a classification problem.

**Regression**: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.

Before we start, let's define some ML terms:

**Attribute**: a property of an instance that may be used to determine its classification. In this dataset, the attributes are the petal and sepal length and width. An attribute is also known as a **feature**.

**Target variable**: in the machine learning context, the variable that the model should output. Here, the target variable is the flower species.
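To make these terms concrete, here is a minimal sketch that separates the features from the target. It assumes scikit-learn's bundled copy of the Iris data (`sklearn.datasets.load_iris`) rather than the Kaggle CSV used in this notebook, so the feature names differ from the columns in `Iris.csv`.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy stands in for the Kaggle CSV
data = load_iris()

X = data.data    # attributes (features): one row per flower, one column per measurement
y = data.target  # target variable: the species, encoded as the integers 0, 1, 2

print("Feature names:", data.feature_names)
print("Feature matrix shape:", X.shape)
print("Species names:", list(data.target_names))
print("Samples per species:", np.bincount(y))
```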

In [None]:
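# scikit-learn is preinstalled in Kaggle notebooks; uncomment the line below if you are running elsewhere
# %pip install scikit-learn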

In [None]:
# importing all the necessary packages to use the various classification algorithms
from sklearn.linear_model import LogisticRegression  # for the Logistic Regression algorithm
from sklearn.model_selection import train_test_split  # to split the dataset for training and testing
from sklearn.neighbors import KNeighborsClassifier  # for K-Nearest Neighbors
from sklearn import svm  # for the Support Vector Machine (SVM) algorithm
from sklearn import metrics  # for checking the model accuracy
from sklearn.tree import DecisionTreeClassifier  # for the Decision Tree algorithm

In [None]:
iris.shape #get the shape of the dataset

### Steps To Be followed When Applying an Algorithm

 1. Split the dataset into a training set and a testing set. The testing set should be smaller than the training set, since the model learns better from more training data. A good rule of thumb is 80/20 train/test, or 60/20/20 if you have train/validate/test. 80/10/10 can be better for large datasets.
 2. Select an appropriate model that fits your task (i.e., classification, regression, etc.).
 3. Then pass the training dataset to the algorithm to train it. We use the **.fit()** method.
 4. Then pass the testing data to the trained algorithm to predict the outcome. We use the **.predict()** method.
 5. We then check the accuracy by passing **the predicted outcome and the actual output** to a metric function such as `metrics.accuracy_score`.
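The five steps above can be sketched end to end as follows. This is a self-contained illustration under a few assumptions: it uses scikit-learn's bundled Iris copy instead of the Kaggle CSV, and it adds `stratify` and `random_state` to the split, which the notebook's own cells do not use.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled Iris copy instead of the Kaggle CSV
X, y = load_iris(return_X_y=True)

# Step 1: hold out 20% of the rows for testing.
# stratify=y keeps the species proportions equal in both sets;
# random_state=0 makes the split repeatable (both are optional choices).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 2: select a model suited to classification
model = KNeighborsClassifier(n_neighbors=3)

# Step 3: train on the training set
model.fit(X_train, y_train)

# Step 4: predict labels for the unseen test set
y_pred = model.predict(X_test)

# Step 5: compare the true labels with the predictions
accuracy = accuracy_score(y_test, y_pred)
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Accuracy:", accuracy)
```

Any of the classifiers in this notebook can be swapped in at step 2, because they all share the same `.fit()` and `.predict()` interface.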

### Splitting The Data into Training And Testing Dataset

In [None]:
train, test = train_test_split(iris, test_size = 0.3)  # our main data is split into train and test
# test_size=0.3 splits the data in a 70/30 ratio: train=70% and test=30%
print("Training data shape:", train.shape)
print("Testing data shape:", test.shape)

In [None]:
# let's set the features into a variable X and the label into variable y
train_X = train[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']]# taking the training data features
train_y=train.Species# output of our training data
test_X= test[['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm']] # taking test data features
test_y =test.Species   #output value of test data

Let's peek at the train input features and label...

In [None]:
train_X.head()

In [None]:
train_y.head()  ##output of the training data

### Support Vector Machine (SVM)
A Support Vector Machine (SVM) is a powerful machine learning algorithm widely used for both linear and nonlinear classification, as well as regression and outlier detection tasks. SVMs are highly adaptable, making them suitable for various applications such as text classification, image classification, spam detection, handwriting identification, gene expression analysis, face detection, and anomaly detection.

SVMs are particularly effective because they look for the maximum-margin separating hyperplane: the boundary that leaves the widest possible gap between the classes in the target feature. This makes them robust for both binary and multiclass classification.

One place to learn more is [here](https://www.geeksforgeeks.org/support-vector-machine-algorithm/), but there are so many places to learn!
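To see the maximum-margin idea on a small scale, here is a minimal sketch on six made-up 2-D points (not iris data). It assumes a linear kernel with a very large `C`, which approximates a hard-margin SVM. Note that `svm.SVC()` defaults to an RBF kernel, where the hyperplane lives in a transformed feature space and `coef_` is not available.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable points: class 0 on the left, class 1 on the right
X = np.array([[0.0, 0.0], [0.0, 2.0], [-1.0, 1.0],
              [2.0, 0.0], [2.0, 2.0], [3.0, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a very large C approximates a hard-margin SVM
model = SVC(kernel="linear", C=1e6)
model.fit(X, y)

w = model.coef_[0]        # normal vector of the separating hyperplane
b = model.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # gap between the two classes

print("Hyperplane normal w:", w)
print("Intercept b:", b)
print("Margin width:", margin_width)
print("Support vectors:")
print(model.support_vectors_)
```

The support vectors are the training points closest to the boundary; they alone determine where the hyperplane sits.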

In [None]:
model = svm.SVC() #select the algorithm
model.fit(train_X,train_y) # we train the algorithm with the training data and the training output
prediction=model.predict(test_X) #now we pass the testing data to the trained algorithm
print('The accuracy of the SVM is:', metrics.accuracy_score(test_y, prediction))  # now we check the accuracy; accuracy_score takes (true labels, predictions)

The SVM does pretty well! Let's check some other models.

### Logistic Regression
Logistic regression is a supervised machine learning algorithm for classification that predicts the probability that an instance belongs to a given class. In the binary case, it passes a weighted sum of the input features through the sigmoid function, which produces a probability between 0 and 1. For example, with two classes, Class 0 and Class 1, an input is assigned to Class 1 if the value of the logistic function is greater than 0.5 (the threshold value) and to Class 0 otherwise. It is called regression because it extends linear regression, but it is mainly used for classification problems. With more than two classes, as with our three iris species, scikit-learn's `LogisticRegression` handles the multiclass case for us.

Learn more about logistic regression [here](https://www.geeksforgeeks.org/understanding-logistic-regression/).
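The sigmoid-and-threshold rule for the binary case can be sketched in a few lines of NumPy. The weights, bias, and sample values below are hypothetical numbers chosen for illustration; a trained model would learn its weights from data.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for a model with two input features
weights = np.array([1.5, -0.5])
bias = -1.0

# Three made-up samples, one per row
samples = np.array([[2.0, 1.0],
                    [0.0, 2.0],
                    [1.0, 1.0]])

z = samples @ weights + bias                         # weighted sum of the inputs
probabilities = sigmoid(z)                           # probability of Class 1
predicted_class = (probabilities > 0.5).astype(int)  # apply the 0.5 threshold

print("Weighted sums:", z)
print("Probabilities:", probabilities)
print("Predicted classes:", predicted_class)
```

The third sample lands exactly on the threshold (probability 0.5), so under the "greater than 0.5" rule it is assigned to Class 0.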

In [None]:
model = LogisticRegression(max_iter = 1000)
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the Logistic Regression is', metrics.accuracy_score(test_y, prediction))  # accuracy_score takes (true labels, predictions)

### Decision Tree
A decision tree is a flowchart-like structure used to make decisions or predictions. It consists of nodes representing decisions or tests on attributes, branches representing the outcome of these decisions, and leaf nodes representing final outcomes or predictions. Each internal node corresponds to a test on an attribute, each branch corresponds to the result of the test, and each leaf node corresponds to a class label or a continuous value.

Learn more about decision trees [here](https://www.geeksforgeeks.org/decision-tree/). 
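To see the flowchart structure directly, scikit-learn can print a fitted tree as nested if/else rules. This is a minimal sketch assuming the bundled Iris copy (`load_iris`) and a tree limited to depth 2 so the output stays short; the notebook's own model uses the default settings instead.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: bundled Iris copy instead of the Kaggle CSV
data = load_iris()

# max_depth=2 keeps the printed tree small; random_state makes ties repeatable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# Each indented line is a test on an attribute; "class:" lines are leaf nodes
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)
```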

In [None]:
model=DecisionTreeClassifier()
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the Decision Tree is', metrics.accuracy_score(test_y, prediction))  # accuracy_score takes (true labels, predictions)

### K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method used for classification and regression problems. Evelyn Fix and Joseph Hodges developed this algorithm in 1951, and Thomas Cover later expanded it. To classify a new sample, KNN finds the k closest training samples and assigns the most common class among them.

Learn more about how K-Nearest Neighbors works [here](https://www.geeksforgeeks.org/k-nearest-neighbours/).
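The core of KNN classification fits in a few lines. The sketch below is a from-scratch illustration, assuming Euclidean distance and a simple majority vote. The helper `knn_predict` and the toy points are made up for this example and are not how scikit-learn implements `KNeighborsClassifier` internally.

```python
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Return the most common label among the k training points closest to `query`."""
    distances = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)         # count labels among the neighbors
    return votes.most_common(1)[0][0]

# Made-up training data: class "A" near (1, 1), class "B" near (5, 5)
train_X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(train_X, train_y, np.array([1.1, 1.0])))
print(knn_predict(train_X, train_y, np.array([4.9, 5.0])))
```

An odd `k` such as 3 avoids tied votes when there are two classes; with three classes, as in the iris data, ties are still possible.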

In [None]:
model=KNeighborsClassifier(n_neighbors=3) #this examines 3 neighbours for putting the new data into a class
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the KNN is', metrics.accuracy_score(test_y, prediction))  # accuracy_score takes (true labels, predictions)

## Findings
It seems that these classical machine learning models are pretty good at distinguishing between the various flower types based on the sepal and petal attributes. Perhaps this task is a little bit too easy. As a follow-up task, we'll try using less information (either sepal or petal dimensions) and see how well the models do. After that, we'll move on to a more challenging dataset.

## Task

Repeat the above analysis, but use either just the petal dimensions or just the sepal dimensions as input features.

In [None]:
## get coding here (though you'll likely need more cells.)

## Deliverable
Write a short paragraph that addresses these questions. Which input features do you expect to perform better, sepal or petal dimensions? Which one actually performs better? Why do you think that is? Is there a way you could have known this without running it? How does it compare to using all of the features as predictors? Why do you think this machine learning task is relatively easy?


**Your response:**


Now download and submit your notebook on Discord!
