# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
## Not for grading

## Learning Objective

The objective of this experiment is to understand the decision tree classifier and to visualize the trees it learns at different depths.

## Dataset

#### History

This is a multivariate dataset introduced by R. A. Fisher (Father of Modern Statistics) to showcase linear discriminant analysis. It is arguably the best-known dataset in the pattern recognition literature.


The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 
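
The separability claim can be checked with a single threshold. The sketch below assumes scikit-learn's bundled copy of the dataset (`load_iris`) rather than the `Iris.csv` file used later, and it picks petal length as the separating feature purely for illustration.

```python
from sklearn.datasets import load_iris

iris_bunch = load_iris()
petal_length = iris_bunch.data[:, 2]   # column 2 is petal length in cm
is_setosa = iris_bunch.target == 0     # class 0 is setosa in this copy

setosa_max = petal_length[is_setosa].max()
others_min = petal_length[~is_setosa].min()

# If the largest setosa value is below the smallest value of the other two
# classes, any threshold in between separates setosa perfectly.
print(setosa_max, others_min, setosa_max < others_min)
```

No such single threshold exists between versicolor and virginica, which is why the tree needs more than one split to tell them apart.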

#### Description
The Iris dataset consists of 150 data instances in 3 classes (Iris Versicolor, Iris Setosa and Iris Virginica), each with 50 instances.


For each flower we have the following attributes:

- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
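
The counts and attributes described above can be verified quickly. This is a minimal sketch that assumes scikit-learn's bundled copy of the dataset (`load_iris`), whose column names differ slightly from those in `Iris.csv`.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_bunch = load_iris()
n_samples, n_features = iris_bunch.data.shape
class_counts = np.bincount(iris_bunch.target)  # number of instances per class

print(n_samples, n_features)       # instances and attributes
print(iris_bunch.feature_names)    # the four measurements listed above
print(class_counts)                # one count per class
```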

To keep the experiment simple, we refer to the classes by number:

    "0": setosa
    "1": versicolor
    "2": virginica

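If the class column holds species names rather than numbers, the renaming can be done with a dictionary lookup. The sketch below is illustrative only: the species strings and the names `class_to_number` and `numeric_labels` are assumptions, and the actual values in `Iris.csv` may differ.

```python
import pandas as pd

# Hypothetical species strings; check the real values in Iris.csv before use.
species = pd.Series(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

class_to_number = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# map() looks up every entry in the dictionary; unknown names would become NaN.
numeric_labels = species.map(class_to_number)
print(numeric_labels.tolist())
```
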
### Challenges

When the data has a large number of features (high dimensionality), models often struggle because

    1. Training time can grow rapidly with the number of features.
    2. The risk of overfitting increases with the number of features.
    
To avoid these problems while learning data analysis, we use simple, well-behaved data. This reduces the cognitive load and makes debugging easier, because we can more readily comprehend the data we are working with.

With only four features and 150 instances, Iris is a good dataset to work on.

## Domain Information



Iris plants are flowering plants with showy flowers. They are very popular among movie directors because they make an excellent backdrop.

They are predominantly found in dry, semi-desert, or colder rocky mountainous areas in Europe and Asia. They have long, erect flowering stems and can produce white, yellow, orange, pink, purple, lavender, blue or brown colored flowers. There are 260 to 300 types of iris.

![alt text](https://cdn-images-1.medium.com/max/1275/1*7bnLKsChXq94QjtAiRn40w.png)

As you can see, the flowers have 3 sepals and 3 petals. The sepals usually spread or droop downwards, and the petals stand upright, partly behind the sepal bases. However, the length and width of the sepals and petals vary for each type.


In [None]:
! apt-get install graphviz
! pip install graphviz
! wget https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/Iris.csv

### Importing Required Packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, export_graphviz # export_graphviz exports the decision tree in Graphviz DOT format
import graphviz  # graphviz is a Python interface for rendering Graphviz graphs
import os

### Loading the Iris data 

In [None]:
iris = pd.read_csv("Iris.csv")

In [None]:
print(iris.head())

### Get the features and labels of the dataset

In [None]:
labels = iris.Species
features = iris.drop(['Species', 'Id'], axis=1)

In [None]:
print(features)

### Training a  Classifier

* Iterate over maximum depths from 2 to 6
* Fit a decision tree classifier at each depth and save the resulting decision graph

In [None]:
for depth in range(2, 7):
    # max_depth regularizes the tree by limiting how deep it can grow, which helps prevent over-fitting
    # random_state fixes the random seed so that every run of the code produces the same tree
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # Fitting the data
    clf.fit(features, labels)
    saveAs = "iris_dt_" + str(depth)
    save_dot = saveAs + ".dot"
    save_png = saveAs + ".png"
    # Exporting the model into a dot file (export_graphviz returns None when out_file is given,
    # so its result is not wrapped in graphviz.Source)
    export_graphviz(clf, out_file=save_dot,
                    feature_names=list(features.columns), class_names=['0', '1', '2'],
                    filled=True)
    # Converting the dot file into a png image with the Graphviz "dot" command
    os.system("dot -T png -o " + save_png + " " + save_dot)
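
To see what `max_depth` changes in numbers rather than pictures, the same loop can record the size of each tree. This is a minimal sketch that assumes scikit-learn's bundled `load_iris` data instead of `Iris.csv`; the accuracy shown is measured on the training data, so it says nothing about how well each tree generalizes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

summary = []
for depth in range(2, 7):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X, y)
    # get_depth() is the depth actually reached, which may be less than max_depth.
    # score() here is accuracy on the training data, not on held-out data.
    summary.append((depth, clf.get_depth(), clf.get_n_leaves(), clf.score(X, y)))

for max_depth, actual_depth, n_leaves, train_acc in summary:
    print(max_depth, actual_depth, n_leaves, round(train_acc, 3))
```

A tree that stops growing before it reaches `max_depth` has already separated the training data as far as it can, so raising the limit further would not change it.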

In [None]:
%ls *dot

In [None]:
%ls *png
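
If no `.png` files are listed, the Graphviz `dot` command may not be installed. One possible fallback, sketched below on scikit-learn's bundled `load_iris` data rather than `Iris.csv`, is `export_text`, which prints the learned rules as plain text and needs no external program.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_bunch = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris_bunch.data, iris_bunch.target)

# export_text returns the tree as an indented string of split rules and leaf classes.
tree_text = export_text(clf, feature_names=list(iris_bunch.feature_names))
print(tree_text)
```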

### Now let us see how the tree looks at each depth

In [None]:
plt.figure(figsize=(10,10))
plt.imshow(plt.imread("iris_dt_2.png"))
plt.grid(False)

In [None]:
plt.figure(figsize=(15,15))
plt.imshow(plt.imread("iris_dt_3.png"))
plt.grid(False)

In [None]:
plt.figure(figsize=(20,20))
plt.imshow(plt.imread("iris_dt_4.png"))
plt.grid(False)

In [None]:
plt.figure(figsize=(20,20))
plt.imshow(plt.imread("iris_dt_5.png"))
plt.grid(False)

In [None]:
plt.figure(figsize=(20,20))
plt.imshow(plt.imread("iris_dt_6.png"))
plt.grid(False)
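
Each node in the plotted trees reports a `gini` value, because Gini impurity is the default splitting criterion of `DecisionTreeClassifier`. The sketch below shows how that number follows from the class counts at a node; the helper `gini_impurity` is written here for illustration and is not part of scikit-learn.

```python
def gini_impurity(class_counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

root_gini = gini_impurity([50, 50, 50])  # root node: three balanced classes
pure_gini = gini_impurity([50, 0, 0])    # a leaf holding a single class

print(round(root_gini, 3), pure_gini)
```

A value of 0 means the node contains a single class, so impurity falls as you move from the root towards the leaves.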