<p align="center">
  <img src="https://miro.medium.com/max/1400/1*7bnLKsChXq94QjtAiRn40w.png">
</p>

### <div align="center">Machine Learning and Statistics: Project</div>
#### <div align="center">Author: Sean Elliott</div>


### <div align="center">Iris Fisher Dataset</div>

The Iris Fisher Dataset, made famous by British statistician Ronald Fisher in 1936, is a multivariate dataset that explores the relationships between three species of Iris flower. Two of the species were collected on the Gaspé Peninsula. The peninsula lies along the southern shore of the St. Lawrence River and extends from the Matapedia Valley in Quebec, Canada, into the Gulf of St. Lawrence.

The dataset consists of 50 samples from each of the three species of Iris: Iris setosa, Iris virginica and Iris versicolor.

Four distinct features were measured from each sample: sepal length, sepal width, petal length and petal width. All measurements are in centimetres.

Fisher combined these four features into a linear discriminant model that distinguishes the three species from one another.
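The linear discriminant idea can be sketched with scikit-learn's `LinearDiscriminantAnalysis`. This is a modern relative of Fisher's method, not a reproduction of his 1936 calculation. The sketch also uses the copy of the data bundled with scikit-learn (`load_iris`) rather than the UCI CSV loaded later, so that it runs offline.

```python
# A minimal sketch, assuming scikit-learn's LDA as a stand-in for Fisher's original method.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Bundled copy of the Iris data (swapped in for the UCI CSV so no network is needed).
X, y = load_iris(return_X_y=True)

# Fit a linear discriminant model on all four measurements.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Project the 4-D measurements onto the discriminant axes:
# with three species there are at most 3 - 1 = 2 such axes.
X_projected = lda.transform(X)
print(X_projected.shape)  # (150, 2)

# Share of samples assigned to the correct species (on the data it was fitted to).
training_accuracy = lda.score(X, y)
print(f"Training accuracy: {training_accuracy:.3f}")
```

The two discriminant axes are the linear combinations of the four measurements that best separate the species. Plotting `X_projected` would show how cleanly the three groups pull apart.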

As the dataset has grown in popularity, it has become a common example for statistical classification techniques in machine learning, such as SVM (support vector machines), among others.

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
```

```python
# Start by importing the CSV file for the dataset
csv_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
col_names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']
iris = pd.read_csv(csv_url, names=col_names)
print(iris)
```


```
     Sepal_Length  Sepal_Width  Petal_Length  Petal_Width           Class
0             5.1          3.5           1.4          0.2     Iris-setosa
1             4.9          3.0           1.4          0.2     Iris-setosa
2             4.7          3.2           1.3          0.2     Iris-setosa
3             4.6          3.1           1.5          0.2     Iris-setosa
4             5.0          3.6           1.4          0.2     Iris-setosa
..            ...          ...           ...          ...             ...
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica

[150 rows x 5 columns]
```


### <div align="center">Project Purpose</div>

The purpose of this project is to explore classification algorithms using the Fisher Iris Dataset: what they are, what they are for, and where they are best used.
The hope is that the outcome will be a model, trained on the dataset using classification algorithms, that can accurately categorise new observations into their correct classes or groups.

### <div align="center">What is Supervised Learning?</div>

Supervised machine learning is a subcategory of Machine Learning and Artificial Intelligence. It is defined by its use of labelled datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed into the model, the model adjusts its 'weights' until it fits the data appropriately, and this fit is checked as part of the validation process. Supervised Learning helps to solve a variety of real-world problems at scale. For example, a supervised learning algorithm can be trained to identify and classify spam emails, so that suspected spam is moved into a separate 'spam' folder.
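A perceptron is one of the simplest supervised learners, and it shows the idea of a model adjusting its weights on labelled data. The sketch below is only an illustration: it is not the method used later in this project. It also assumes a two-species subset of the bundled scikit-learn Iris data (setosa vs. versicolor, petal measurements only), where the classes can be separated by a straight line.

```python
# Illustrative perceptron, assuming a setosa vs. versicolor subset of the Iris data.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
mask = y < 2                            # keep setosa (0) and versicolor (1) only
X_pair = X[mask][:, 2:4]                # petal length and petal width
y_pair = np.where(y[mask] == 0, -1, 1)  # labels encoded as -1 / +1

weights = np.zeros(X_pair.shape[1])
bias = 0.0
learning_rate = 0.1

for epoch in range(1000):
    mistakes = 0
    for features, label in zip(X_pair, y_pair):
        # A point is misclassified when the sign of its score disagrees with its label.
        if label * (features @ weights + bias) <= 0:
            # Adjust the weights towards the correct answer for this labelled example.
            weights += learning_rate * label * features
            bias += learning_rate * label
            mistakes += 1
    if mistakes == 0:  # every labelled example is now classified correctly
        break

predictions = np.where(X_pair @ weights + bias > 0, 1, -1)
accuracy = (predictions == y_pair).mean()
print(f"Stopped after {epoch + 1} epochs, training accuracy {accuracy:.2f}")
```

Each pass over the labelled data nudges `weights` and `bias` only when a prediction is wrong. Training stops once the model fits every example.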

Supervised Learning can be split into two types of problems when data mining:

- Classification
- Regression

<p align="center"><img src="https://www.simplilearn.com/ice9/free_resources_article_thumb/Regression_vs_Classification.jpg"></p>


#### <div align="center">Classification</div>

Classification uses an algorithm to accurately categorise test data. It does this by recognising specific patterns within the data and then making a judgement based on its findings. The conclusion the algorithm reaches determines how each data point is labelled.

Some common instances of classifiers are:

- linear classifiers
- support vector machines (SVM)
- decision trees
- k-nearest neighbour
- random forest
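The classifiers listed above can be compared on the same data. The following is a rough sketch using scikit-learn's default settings, except for the stated hyperparameters (`max_iter=1000`, `n_neighbors=5`, `random_state=0`), which are assumptions chosen for illustration. It scores each model with 5-fold cross-validation on the bundled Iris data.

```python
# Rough comparison of common classifiers; hyperparameters are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "Linear classifier (logistic regression)": LogisticRegression(max_iter=1000),
    "Support vector machine": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(random_state=0),
}

mean_scores = {}
for name, clf in classifiers.items():
    # 5-fold cross-validation: train on 4/5 of the data and test on the other 1/5, five times.
    scores = cross_val_score(clf, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

On a small, well-separated dataset like Iris these models tend to score similarly. Differences between them matter more on larger or noisier data.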

#### <div align="center">Regression</div>

Regression is used to understand the relationship between dependent and independent variables within datasets. It is also commonly used to predict future data points and their likely trajectory (a common example is sales projections).

Some common examples of regression are:

- linear regression
- logistic regression
- polynomial regression


-----

<p align="center"><img src="https://d1rwhvwstyk9gu.cloudfront.net/2021/03/Flowchart.png"></p>

### <div align="center">What is Classification in Machine Learning?</div>

Classification algorithms are a supervised learning technique used to categorise new observations. In other words, a classifier is a program that is taught to recognise patterns in a labelled dataset. When it is given new, unseen data, it can find the same or similar patterns and assign the appropriate class.

There are 3 types of Classification:

1) Binary Classification
2) Multi-class Classification
3) Multi-Label Classification

<p align="center"><img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/05/image-6.png"></p>

### <p align="center">Binary Classification</p>

In binary classification, each data point is described by a set of features, and the model assigns it one of exactly two class labels, for example true or false, or positive or negative. Examples of binary classification algorithms are Logistic Regression and SVM (Support Vector Machine).
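A binary problem can be made from the Iris data by asking only "is this flower Iris setosa or not?". The sketch below makes that assumption and uses a linear-kernel SVM from scikit-learn. The split settings (`random_state=5`, stratified) are illustrative choices.

```python
# Binary classification sketch: setosa vs. not setosa, using a linear SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Collapse three species into two classes: 1 = setosa, 0 = any other species.
y_binary = (y == 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, random_state=5, stratify=y_binary
)

svm = SVC(kernel="linear")
svm.fit(X_train, y_train)

print(svm.classes_)  # the two labels the model can output: [0 1]
binary_accuracy = svm.score(X_test, y_test)
print(f"Setosa vs. rest accuracy: {binary_accuracy:.2f}")
```

Setosa is well separated from the other two species, so this is an easy binary problem. Distinguishing versicolor from virginica would be a harder test.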

### <p align="center">Multi-class Classification</p>

In multi-class classification there are more than two possible classes. Such problems can be handled in two ways: by splitting them into several binary problems (strategies such as one-vs-rest, also called one-vs-all), or by using algorithms that natively support multiple classes. Multi-class classification assumes that each sample is assigned one and only one label. Examples of multi-class classification algorithms are Random Forest, Naive Bayes and k-NN (k-Nearest Neighbours). Note that some of these can be used for both binary and multi-class classification!
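The one-vs-rest strategy can be sketched with scikit-learn's `OneVsRestClassifier`, which trains one binary model per species. Logistic regression as the base model and `max_iter=1000` are illustrative assumptions.

```python
# One-vs-rest sketch: three binary "this species vs. the others" models.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

print(len(ovr.estimators_))  # 3: one binary model per species

# Each test sample still receives exactly one label (the most confident binary model wins).
predictions = ovr.predict(X_test)
ovr_accuracy = ovr.score(X_test, y_test)
print(f"One-vs-rest accuracy: {ovr_accuracy:.2f}")
```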

<p align="center"><img src="https://miro.medium.com/v2/resize:fit:688/1*bcLAJfWN2GpVQNTVOCrrvw.png"></p>

### <p align="center">Multi-Label Classification</p>

In multi-label classification, each sample can be assigned more than one label at the same time (for example, a film tagged as both 'comedy' and 'drama'). Decision Trees are one family of algorithms that can be used for this. A Decision Tree is a non-parametric supervised learning algorithm for classification and regression tasks. It uses a hierarchical structure consisting of a root node, branches, internal nodes and leaf nodes, and it depicts decisions and their potential outcomes. It can also incorporate chance events into its output model. The algorithm uses conditional control statements (if/else rules) to form its decisions: the tree starts at the root node, and the final decision is made at a leaf node. Some examples of multi-label classification algorithms include Multi-Label Decision Trees and Multi-Label Gradient Boosting.
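The root, internal and leaf nodes described above become visible when a fitted tree is printed as text. The sketch below fits scikit-learn's `DecisionTreeClassifier` to the bundled Iris data. It assumes `max_depth=3` to keep the printout short.

```python
# Decision tree sketch; max_depth=3 is an illustrative assumption.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris_data.data, iris_data.target)

# export_text shows the tree as nested if/else rules:
# the first split is the root node, indented splits are internal nodes,
# and lines ending in "class: ..." are leaf nodes holding the final decision.
rules = export_text(tree, feature_names=list(iris_data.feature_names))
print(rules)
print("Depth:", tree.get_depth(), "Leaves:", tree.get_n_leaves())
```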

Within these classification types there are two types of learners:

#### <div align="center">Lazy Learners</div>

A Lazy Learner first stores the training dataset and then waits for the test data to 'arrive'. Classification is carried out using the most relevant (most similar) data from the stored training set. A Lazy Learner therefore spends less time training and more time predicting. It is most appropriate when the dataset is small, when more data is expected to arrive, or when the cost of learning up front is high. It can, however, be less accurate than Eager Learner algorithms, because it builds only a local approximation around each query rather than a single model of the whole training set.

Lazy Learners delay the learning process until new data is available, which can reduce the amount of data that needs to be processed and save time and resources. This can also improve accuracy in some situations, because the data used is more likely to be representative of the real-world case being predicted. It can also help to prevent overfitting, because only data relevant to each query is used.

##### Advantages of Lazy Learners:

- Very useful when not all the data is available.
- Lazy Learning is not prone to suffering from data interference - meaning that collecting data about an operating regime won't affect the modelling performance.
- Lazy learning's problem-solving capabilities increase with every newly presented case.
- Lazy Learners can be simultaneously applied to multiple problems.

##### Disadvantages of Lazy Learners:

- Possibility of high memory requirements (depending on the dataset), since the training data must be stored and every query requires the model to build a local model from scratch.
- Lazy Learners tend to be slower to evaluate - this could be offset by lowering the size of the training information but could compromise accuracy.
- If the data is 'noisy' then the case-base gets increased (as no abstraction occurs during the training phase).
- Increased computational cost at prediction time, since each query has to be compared against the stored training data points.

Some examples of 'Lazy Learners' are: 

- k-NN (k-Nearest Neighbours)
- Lazy Bayesian Rules
- Case-based Reasoning
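k-NN shows the lazy pattern clearly: "training" only stores the data, and all the work happens when a prediction is requested. The class below is a hypothetical from-scratch sketch (`LazyKNN` is not a library class). It uses Euclidean distance, a majority vote among `k=5` neighbours, and the bundled Iris data.

```python
# Hypothetical from-scratch k-NN, written only to illustrate lazy learning.
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


class LazyKNN:
    """Minimal k-nearest-neighbours classifier (illustrative, not optimised)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorising the data: no model is built yet.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        predictions = []
        for point in np.asarray(X, dtype=float):
            # All the real work happens at query time: distance to every stored sample.
            distances = np.linalg.norm(self.X_train - point, axis=1)
            nearest = np.argsort(distances)[: self.k]
            # Majority vote among the k nearest neighbours.
            label = Counter(self.y_train[nearest]).most_common(1)[0][0]
            predictions.append(label)
        return np.array(predictions)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

knn = LazyKNN(k=5).fit(X_train, y_train)
knn_accuracy = (knn.predict(X_test) == y_test).mean()
print(f"k-NN accuracy: {knn_accuracy:.2f}")
```

Every call to `predict` scans all stored training rows. This is the cost behind the slower evaluation and higher memory requirements listed above.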

#### <div align="center">Eager Learners</div>

Eager Learners construct a classification model from the given training data before receiving any data to classify. The predictive model is built during the training phase, so the learning process is complete before the prediction phase begins. An Eager Learner must commit to a single hypothesis (one global model) that covers the entire training set. As a result, the training phase can take longer, but the prediction phase is usually shorter. Common uses for Eager Learners include image recognition, spam detection and time series forecasting.


##### Advantages of Eager Learners:

- Much faster than Lazy Learner algorithms in the prediction phase.
- Often more accurate than Lazy Learner algorithms.
- Ideal for real-time or time-sensitive applications where immediate predictions are needed.

##### Disadvantages of Eager Learners:

- May be affected by irrelevant attributes in the data, making it 'noisy'.
- Slower in the training phase than Lazy Learner algorithms.
- Requires the entire training dataset to be available to build a prediction model.

Some examples of 'Eager Learners' are:

- Decision Tree
- Naive Bayes
- ANN (Artificial Neural Networks)
- SVM (Support Vector Machines)
- Linear Regression
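Gaussian Naive Bayes shows the eager pattern. All the learning happens in `fit`, which reduces the training set to a few summary statistics per class, and prediction only uses those statistics. The sketch below assumes scikit-learn's `GaussianNB` and the bundled Iris data.

```python
# Eager learner sketch: Gaussian Naive Bayes summarises the training data up front.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

nb = GaussianNB()
nb.fit(X_train, y_train)  # all the learning happens here

# The fitted model is a compact summary: one mean per class per feature.
print(nb.theta_.shape)  # (3, 4): 3 species x 4 measurements

# The stored means are simply the per-class averages of the training rows.
setosa_means = X_train[y_train == 0].mean(axis=0)
print(np.allclose(nb.theta_[0], setosa_means))

# Prediction only needs these summary statistics, not the training rows themselves.
nb_accuracy = nb.score(X_test, y_test)
print(f"Gaussian Naive Bayes accuracy: {nb_accuracy:.2f}")
```

Compared with k-NN, the work has moved from prediction time to training time. This is the trade-off described in the advantages and disadvantages above.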

----

<p align="center"><img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/05/image-7.png"></p>

### <div align="center">What is Regression in Machine Learning?</div>


Regression algorithms predict an outcome from continuous values in the provided dataset. A supervised regression algorithm uses real-world values to predict quantitative data such as income, weight, height or probability. It learns a mapping from the labelled inputs to a continuous output.

There are three common types of Regression:

1) Linear Regression 
2) Polynomial Regression 
3) Logistic Regression 

### <p align="center">Linear Regression</p>

This is the simplest and most widely used of the regression models. Linear regression fits a linear equation to the input data. In the simplest case, it takes two quantitative variables, plots one against the other, and finds the straight line that best describes the relationship between them.
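A straight-line fit between two quantitative Iris measurements, petal length and petal width, illustrates this. The sketch below assumes scikit-learn's `LinearRegression` and the bundled Iris data. The 4.0 cm query is an arbitrary example value.

```python
# Simple linear regression sketch: predict petal width from petal length.
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

X, _ = load_iris(return_X_y=True)
petal_length = X[:, [2]]  # 2-D array (150, 1): the independent variable
petal_width = X[:, 3]     # 1-D array (150,): the dependent variable

reg = LinearRegression()
reg.fit(petal_length, petal_width)

# The fitted straight line: width = slope * length + intercept
slope, intercept = reg.coef_[0], reg.intercept_
r_squared = reg.score(petal_length, petal_width)  # share of variance explained by the line
print(f"petal_width = {slope:.3f} * petal_length {intercept:+.3f}")
print(f"R^2 = {r_squared:.3f}")
print(reg.predict([[4.0]]))  # predicted width for a 4.0 cm petal (arbitrary example)
```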

### <p align="center">Polynomial Regression</p>

Polynomial regression is used to model the relationship between two quantitative variables that have a non-linear relationship. It is particularly useful for datasets with curved trends, in fields such as social science, biology and economics. A polynomial function of the input is fitted, and the degree of the polynomial controls the model's complexity (how curved the fitted line can be). In Machine Learning, Polynomial Regression can be used, for example, to predict customers' lifetime value or stock prices, and to track economies.
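The sketch below contrasts a straight line with a degree-2 polynomial on curved data. The data is synthetic, an assumption made for illustration (y = 2x² − 3x + 1 plus noise, fixed random seed), because the Iris measurements are not strongly curved. It uses scikit-learn's `PolynomialFeatures` followed by `LinearRegression`.

```python
# Polynomial regression sketch on synthetic curved data (illustrative assumption).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 2 * x.ravel() ** 2 - 3 * x.ravel() + 1 + rng.normal(scale=0.5, size=60)

straight_line = LinearRegression().fit(x, y)
# Degree 2: each input x becomes [1, x, x^2], and a linear model is fitted to those columns.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

linear_r2 = straight_line.score(x, y)
poly_r2 = quadratic.score(x, y)
print(f"Straight line R^2: {linear_r2:.3f}")
print(f"Degree-2 polynomial R^2: {poly_r2:.3f}")
```

The polynomial model is still "linear" in its coefficients. Only the inputs are expanded, which is why an ordinary `LinearRegression` can fit it. Raising the degree further adds flexibility but also increases the risk of overfitting.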

### <p align="center">Logistic Regression</p>

Logistic Regression, also known as the Logit Model, models the probability of an event occurring. It uses datasets comprising one or more independent variables and, despite its name, is most commonly used in predictive analytics and classification.
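Logistic regression turns a linear score into a probability using the logistic (sigmoid) function, p = 1 / (1 + e^(−z)), where z = b0 + b1·x1 + … + bn·xn. The sketch below checks this on a binary version of the Iris task ("is it Iris virginica?"), which is an assumption for illustration. It compares the manually computed sigmoid with scikit-learn's `predict_proba`.

```python
# Logistic regression sketch: probabilities are the sigmoid of a linear score.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Binary task for clarity: is the flower Iris virginica (1) or not (0)?
is_virginica = (y == 2).astype(int)

logit = LogisticRegression(max_iter=1000)
logit.fit(X, is_virginica)


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


# z = b0 + b1*x1 + ... + b4*x4 (the linear score, also called the log-odds)
z = X @ logit.coef_[0] + logit.intercept_[0]
manual_probabilities = sigmoid(z)

# scikit-learn's probability for class 1 should match the sigmoid of the score.
sklearn_probabilities = logit.predict_proba(X)[:, 1]
print(np.allclose(manual_probabilities, sklearn_probabilities))
```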

-----

```python
# count the number of samples of each species in the dataset.
iris['Class'].value_counts()
```

```
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Class, dtype: int64
```

```python
# store the values of the independent features (the four measurements) in variable x.
x = iris.iloc[:, :4]
# store the dependent feature (the species label) in variable y.
y = iris.iloc[:, 4]
x
```

```
     Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
0             5.1          3.5           1.4          0.2
1             4.9          3.0           1.4          0.2
2             4.7          3.2           1.3          0.2
3             4.6          3.1           1.5          0.2
4             5.0          3.6           1.4          0.2
..            ...          ...           ...          ...
145           6.7          3.0           5.2          2.3
146           6.3          2.5           5.0          1.9
147           6.5          3.0           5.2          2.0
148           6.2          3.4           5.4          2.3
149           5.9          3.0           5.1          1.8

[150 rows x 4 columns]
```


### <p align="center">Logistic Regression</p>

We start with one of the most basic classification algorithms: Logistic Regression, used here to predict the species of each flower from its four measurements.

```python
# import train_test_split to split the data into training and test sets.
from sklearn.model_selection import train_test_split

# split the dataset into 4 arrays: 2 for training the algorithm (independent and dependent features)
# and 2 for testing it (data the algorithm won't see during training).
# by default, 25% of the rows (38 of 150) are held out for testing.
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=5)
```

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

# train the model using the .fit() method with x_train and y_train as arguments.
model.fit(x_train, y_train)

# predict the species for x_test and store the predictions in y_pred.
y_pred = model.predict(x_test)

# print out the predictions.
y_pred
```

```
array(['Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
       'Iris-virginica', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-versicolor', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica'],
      dtype=object)
```

```python
# import two functions: accuracy_score and confusion_matrix.
from sklearn.metrics import accuracy_score, confusion_matrix

# calculate the confusion matrix from two arguments: y_test (actual results) and y_pred (predictions).
confusion_matrix(y_test, y_pred)
```

```
array([[12,  0,  0],
       [ 0, 13,  1],
       [ 0,  0, 12]], dtype=int64)
```

```python
# calculate the accuracy by comparing the actual results with the predictions,
# then multiply by 100 to give a percentage.
accuracy = accuracy_score(y_test, y_pred) * 100

# print the result to 2 decimal places.
print("Accuracy of the Logistic Regression model is {:.2f}".format(accuracy))
```

```
Accuracy of the Logistic Regression model is 97.37
```


#### <p align="center">Results of Logistic Regression - Explained</p>
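In scikit-learn's `confusion_matrix`, rows are the actual species and columns are the predicted species. Both are in sorted label order: Iris-setosa, Iris-versicolor, Iris-virginica. Reading the matrix above, all 12 setosa and all 12 virginica test samples were classified correctly. Of the 14 versicolor samples, 13 were correct and one was mislabelled as virginica. That gives 37 correct predictions out of 38, which is where the 97.37% accuracy comes from. The short sketch below recomputes these figures from the printed matrix, copied by hand, so it does not rerun the model.

```python
# Recompute the reported figures from the confusion matrix printed above.
import numpy as np

labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Rows = actual class, columns = predicted class.
cm = np.array([[12, 0, 0],
               [0, 13, 1],
               [0, 0, 12]])

correct = np.trace(cm)  # diagonal = correct predictions
total = cm.sum()        # all test samples
accuracy = correct / total * 100
print(f"{correct}/{total} correct -> accuracy {accuracy:.2f}%")  # 37/38 correct -> accuracy 97.37%

# Per-class recall: the share of each actual species that was predicted correctly.
recall = np.diag(cm) / cm.sum(axis=1)
for label, r in zip(labels, recall):
    print(f"{label}: {r:.2%}")
```

The only error is between versicolor and virginica, the two species whose measurements overlap. Setosa is separated perfectly.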



### References:

https://en.wikipedia.org/wiki/Iris_flower_data_set

https://en.wikipedia.org/wiki/Gasp%C3%A9_Peninsula

https://www.simplilearn.com/tutorials/machine-learning-tutorial/classification-in-machine-learning#:~:text=Based%20on%20training%20data%2C%20the,into%20various%20classes%20or%20groups. - Date Accessed: 04/11/2023

https://builtin.com/machine-learning/classification-machine-learning - Date Accessed: 04/11/2023 

https://www.autoblocks.ai/glossary/lazy-learning - Date Accessed: 04/11/2023  

https://www.engati.com/glossary/lazy-learning#:~:text=set%20of%20attributes.-,What%20are%20some%20examples%20of%20lazy%20learning%3F,some%20examples%20of%20lazy%20learning - Date Accessed: 04/11/2023  

https://www.analyticsvidhya.com/blog/2023/02/lazy-learning-vs-eager-learning-algorithms-in-machine-learning/ - Date Accessed: 04/11/2023

https://www.datacamp.com/blog/what-is-eager-learning - Date Accessed: 04/11/2023

https://www.ibm.com/topics/supervised-learning - Date Accessed: 04/11/2023

https://www.analyticsvidhya.com/blog/2023/05/regression-vs-classification/ - Date Accessed: 04/11/2023

https://www.analyticsvidhya.com/blog/2021/08/decision-tree-algorithm/ - Date Accessed: 04/11/2023

https://www.datacamp.com/blog/classification-machine-learning - Date Accessed: 04/11/2023