# Iris Flowers Exercise Notebook

> This notebook contains the solution for the Iris Flowers Exercise.

In this notebook, we build a model that classifies irises into 3 types (Setosa, Versicolour, and Virginica) based on the length and width of their petals and sepals.

The dataset that we are using in this notebook is the iris flowers dataset taken from scikit-learn datasets <b><a href="https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html">found here.</a></b>

![image.png](https://miro.medium.com/max/720/1*YYiQed4kj_EZ2qfg_imDWA.png)

## Import the libraries
Import the data science and machine learning libraries used in this notebook:

* Pandas as pd
* NumPy as np
* Matplotlib (the pyplot module) as plt
* Scikit-learn

In [1]:
# Import libraries
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

# scikit-learn is not imported here; we import its functions only where we need them

## Importing the data

Since the dataset ships with scikit-learn, we can load it directly from `sklearn.datasets`.

In [2]:
# Load the dataset and assign it to the variable df
# (load_iris() returns a scikit-learn Bunch object, not a pandas DataFrame)
from sklearn.datasets import load_iris
df = load_iris()

> **Note:** According to the <a href='https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html'>dataset page</a>, the loaded dataset includes:

* feature_names (that contains the names of the dataset columns)
* data (which has the petal and sepal length and width data)
* target (which has the target/type of flower associated with each row of the data)
* target_names (The names of target classes)
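
The attributes listed above can also be combined into a single pandas DataFrame, which is often easier to inspect. This is only a minimal sketch and not part of the exercise solution; the names `iris`, `iris_df`, and the `species` column are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the dataset under a separate (hypothetical) name
iris = load_iris()

# One column per feature, named after feature_names
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Translate the integer targets into species names by indexing target_names
iris_df["species"] = iris.target_names[iris.target]

print(iris_df.shape)
print(iris_df.head())
```

Indexing `target_names` with the integer `target` array works because both are NumPy arrays, so each integer picks out the matching name.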

In [4]:
# Check for the length of the dataset
len(df.target)

150

## Let's have a look at the attributes we have

In [8]:
# Have a look at the feature_names attribute
df.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [6]:
# Have a look at the first 10 items of the data attribute
df.data[:10]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

In [7]:
# Have a look at the first 10 items of the target attribute
df.target[:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
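
The first 10 targets are all 0 because the samples are stored grouped by class. To see how many samples each class has, one option is to count the labels with `np.unique`. This is a small sketch; the variable names `iris`, `classes`, and `counts` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Unique class labels and how often each one occurs
classes, counts = np.unique(iris.target, return_counts=True)

for label, count in zip(classes, counts):
    print(f"{label} ({iris.target_names[label]}): {count} samples")
```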

In [9]:
# Have a look at the target_names attribute
df.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

## Create the X and y variables

In [10]:
# Make the X and y variables
X = df.data
y = df.target

## Create the training and testing datasets

In [12]:
# Import the train_test_split() function from scikit-learn
from sklearn.model_selection import train_test_split

# Use the train_test_split() function to split the dataset (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [17]:
# Check the length of the training and testing sets
print(f"The length of the training dataset = {len(X_train)}")
print(f"The length of the test dataset = {len(X_test)}")

The length of the training dataset = 120
The length of the test dataset = 30
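
The split above is random, so the rows that end up in each set change on every run. If a repeatable split with the same share of each class in both sets is wanted, one possible variation is sketched below. The `random_state=42` value and the use of `stratify` are assumptions for illustration and are not part of the original solution.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# random_state fixes the shuffle; stratify=y keeps the class proportions
# the same in the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Number of samples per class in each set
print(np.bincount(y_train))
print(np.bincount(y_test))
```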


## Make the model

As we have the training and testing datasets ready, it is time to make the model.

As this is a classification problem, we will use the `RandomForestClassifier()` model.

In [18]:
# Import the model from scikit-learn library 
from sklearn.ensemble import RandomForestClassifier

# Make the model
model = RandomForestClassifier()

# Train the model
model.fit(X_train, y_train)

# Evaluate the model (score() returns the mean accuracy on the test set)
model.score(X_test, y_test)

1.0
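
For a classifier, `score()` reports the fraction of test samples that were predicted correctly. The sketch below checks this against `accuracy_score` from `sklearn.metrics`. It retrains a model from scratch so that it runs on its own, and the fixed `random_state=42` values are assumptions added for repeatability, so the score it prints may differ from the one above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy as reported by the model itself
score_from_model = model.score(X_test, y_test)

# The same accuracy computed from the predictions
score_from_metric = accuracy_score(y_test, model.predict(X_test))

print(score_from_model, score_from_metric)
```

With only 30 test samples, each wrong prediction changes the score by about 0.033, so the exact value can vary from one random split to another.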

At this stage, we have a trained model that can predict target values for unseen data.

## Predict values of unseen data

0 = setosa
<br>
1 = versicolor
<br>
2 = virginica

In [19]:
# Predict the class label (0, 1 or 2) of every sample in the test set
model.predict(X_test)

array([2, 2, 0, 2, 0, 2, 1, 2, 2, 0, 1, 1, 1, 1, 2, 0, 2, 2, 2, 2, 0, 1,
       2, 0, 2, 1, 0, 0, 0, 1])

In [20]:
# Look up the class names that the predicted integers refer to
df.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
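
Instead of reading the integers by hand, the predictions can be turned into species names by indexing `target_names` with them. This is a minimal standalone sketch, so it retrains a model first; the names `iris`, `predicted_labels`, `predicted_names`, and `new_flower`, the `random_state=42` values, and the measurements of the new flower are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Integer predictions for the test set, then the matching species names
predicted_labels = model.predict(X_test)
predicted_names = iris.target_names[predicted_labels]
print(predicted_names[:5])

# Predict a single hypothetical flower:
# sepal length, sepal width, petal length, petal width (all in cm)
new_flower = [[5.0, 3.4, 1.5, 0.2]]
new_flower_name = iris.target_names[model.predict(new_flower)][0]
print(new_flower_name)
```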