## Case Study with the Iris dataset

![Iris Flowers](../../assets/images/iris.png)

### Iris Flower:
Iris is a genus of flowering plants that contains several species, such as Iris setosa, Iris versicolor, and Iris virginica.

Things covered in this notebook:
- Create the dataset
- Build the model
- Train the model
- Make predictions

### Imports

In [16]:
from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

import matplotlib.pyplot as plt

### Load Data

In [17]:
iris = datasets.load_iris()

### Data Exploration

In [3]:
print("Keys:", iris.keys())
print()
print("Our model will classify every data point to one of these classes: ")
print("Target Names:", iris.target_names)
print()
print("These are the features that our model will use to make those predictions: ")
print("Feature Names:", iris.feature_names)
print()
print("Describes the physical property that each of our features measures:")
print(iris.DESCR)
print()
print("First 3 rows of data:")
print(iris.data[:3])
print()
print("First 3 rows of target:")
print(iris.target[:3])
print()
print("Data shape:", iris.data.shape)
print()
print("Target shape:", iris.target.shape)

Keys: dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

Our model will classify every data point to one of these classes: 
Target Names: ['setosa' 'versicolor' 'virginica']

These are the features that our model will use to make those predictions: 
Feature Names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Describes the physical property that each of our features measures:
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    

## Build Model

### Assign X and y to hold data and target

In [4]:
X = iris.data
y = iris.target

### Split X and y into training and test sets

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
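
As a quick sanity check on the split, the sketch below reloads the data and confirms the resulting sizes: with 150 samples and `test_size=0.15`, scikit-learn rounds the test set up to 23 samples, leaving 127 for training. The `stratify=y` variant is not used in this notebook; it is shown only as an optional way to keep class proportions similar in both sets.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same split as in the notebook: 15% of the 150 samples are held out.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)
print(X_train.shape, X_test.shape)  # (127, 4) (23, 4)

# Optional variant (an assumption, not part of the original notebook):
# stratify on y so each class keeps roughly the same share in both sets.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.15, random_state=42, stratify=y
)
print(len(ys_train), len(ys_test))  # 127 23
```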

### Create Model

In [6]:
model = LogisticRegression()

### Fit Model to Training Data

In [7]:
model.fit(X_train, y_train)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
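
The `'warn'` values shown for `solver` and `multi_class` suggest the output above came from an older scikit-learn release, so the printed defaults may differ on a current install. The following is a minimal sketch of inspecting the fitted model, assuming the same split as above; `max_iter=1000` is an assumption added here to give the solver room to converge and is not part of the original notebook.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=42
)

# max_iter=1000 is an assumption for this sketch; the notebook uses the default.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print(model.classes_)          # class labels seen during training
print(model.coef_.shape)       # one row of weights per class, one column per feature
print(model.intercept_.shape)  # one intercept per class
```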

### Make Predictions Against Test Set

In [8]:
predictions = model.predict(X_test)
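
`predict` returns integer class labels. As a hedged illustration, the sketch below also calls `predict_proba` to see the per-class probabilities behind each prediction and maps the integer labels back to species names via `iris.target_names`. It rebuilds the model so that it runs on its own; `max_iter=1000` is again an assumption, not a setting from the notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

predictions = model.predict(X_test)            # integer labels, shape (23,)
probabilities = model.predict_proba(X_test)    # shape (23, 3), one column per class

# Each row of probabilities is a distribution over the three classes.
print(np.round(probabilities[:3], 3))

# Translate the integer labels into species names.
print(iris.target_names[predictions][:3])
```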

### Evaluate Model

#### `model.score(X_test, y_test)` is equivalent to `metrics.accuracy_score(y_test, predictions)`: both return the fraction of test samples classified correctly

In [9]:
score = model.score(X_test, y_test)
print(score)

1.0


In [10]:
accuracy_score = metrics.accuracy_score(y_test, predictions)
print(accuracy_score)

1.0
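
To show why the two calls agree, here is a small sketch that computes accuracy by hand as the fraction of matching labels and compares it with both library calls. The setup repeats the notebook's split; `max_iter=1000` is an assumption for this sketch, and the exact accuracy value may vary with the scikit-learn version.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions = model.predict(X_test)

# Accuracy is the share of test samples whose predicted label matches the true label.
manual_accuracy = np.mean(predictions == y_test)

print(manual_accuracy)
print(model.score(X_test, y_test))
print(metrics.accuracy_score(y_test, predictions))
```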


### Classification Report

In [11]:
classification_report = metrics.classification_report(y_test, predictions)
print(classification_report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         8
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00         6

    accuracy                           1.00        23
   macro avg       1.00      1.00      1.00        23
weighted avg       1.00      1.00      1.00        23



#### precision = # true positives / (# true positives + # false positives)
#### recall = # true positives / (# true positives + # false negatives)
#### F1 score = harmonic mean of precision and recall = 2 * precision * recall / (precision + recall)
#### support = # of true samples of each class in the test set
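
The definitions above can be worked through by hand. The sketch below uses hypothetical counts for a single class (they are not taken from this notebook's results, where every score is 1.00), and `precision_recall_f1` is an illustrative helper, not a scikit-learn function.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 1 false negative.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=1)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.889 f1=0.842
```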

### Confusion Matrix

In [12]:
confusion_matrix = metrics.confusion_matrix(y_test, predictions)
print(confusion_matrix)

# Each row represents the true class, and each column represents the predicted class.

[[8 0 0]
 [0 9 0]
 [0 0 6]]


#### All 3 classes were predicted correctly! Class 0 had all 8 of its data points predicted as class 0, class 1 all 9, and class 2 all 6.
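
Because a perfect result only fills the diagonal, it does not show how mistakes appear. The sketch below uses a small set of hypothetical labels (not data from this notebook) to illustrate scikit-learn's convention: entry `[i, j]` counts samples whose true class is `i` and whose predicted class is `j`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels chosen to include two mistakes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]   one class-0 sample was predicted as class 1
#  [0 2 0]   both class-1 samples were predicted correctly
#  [1 0 1]]  one class-2 sample was predicted as class 0

# Correct predictions lie on the diagonal.
correct = cm.trace()
print(correct, "of", cm.sum(), "predictions are correct")  # 4 of 6 predictions are correct
```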