# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
## Not for grading

## Learning Objective

The objective of this experiment is to understand the Decision Tree classifier and visualize it.

## Dataset

The dataset chosen for this experiment is a handmade fruit dataset with 69 records. Each record contains the following details of a fruit:

* Weight - the mass of the fruit, measured in grams.

* Sphericity - a measure of how closely the shape of the fruit approaches that of a mathematically perfect sphere.

* Color - the color of the fruit, which changes as the fruit goes through different stages.

* Label - the type of fruit. For simplicity, only two fruits are considered: apple and orange.
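The Dataset section does not say how sphericity was measured for these fruits. One common definition (Wadell's) divides the surface area of a sphere with the same volume as the object by the object's actual surface area, $\Psi = \pi^{1/3}(6V)^{2/3}/A$. This equals 1 for a perfect sphere and is smaller for any other shape. A minimal sketch, assuming that definition:

```python
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Wadell's sphericity (assumed definition): surface area of a sphere with
    the same volume, divided by the object's actual surface area."""
    return (math.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area


# A perfect sphere of radius r scores 1
r = 3.0
sphere = sphericity(4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2)

# A cube of side a scores (pi / 6) ** (1 / 3), roughly 0.806
a = 2.0
cube = sphericity(a ** 3, 6 * a ** 2)

print(round(sphere, 3), round(cube, 3))
```

A round orange should therefore score closer to 1 than an apple with a pronounced dip at the stem, which is what makes sphericity a useful feature here.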




In [None]:
!wget https://cdn.talentsprint.com/aiml/Experiment_related_data/fruits_weight_sphercity.csv

### Importing Required Packages

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz # export_graphviz function creates a GraphViz representation of the decision tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import graphviz    # graphviz is a Python interface for rendering graphs described in the DOT language
import os
import warnings
warnings.filterwarnings("ignore")

### Loading the data

In [None]:
fruits_data = pd.read_csv("fruits_weight_sphercity.csv")
fruits_data.head()

In [None]:
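# Replace labels with integer values: apple -> 1, orange -> 0
# astype(int) guarantees an integer dtype, which plot_decision_regions requires for the labels
fruits_data['labels'] = fruits_data['labels'].replace(['apple', 'orange'], [1, 0]).astype(int)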

In [None]:
# Get Features and Labels from the dataset
data = fruits_data[["Sphericity", "Weight"]].values
labels = fruits_data["labels"].values

In [None]:
# Verify the shape of features and labels
data.shape, labels.shape

### Splitting the data into train and test sets 

In [None]:
# Training and testing set ratio is 67 : 33
# random_state seeds the random shuffle, so the same split is produced every time the code runs
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.33, random_state=42)

In [None]:
# Verify the shape of train and test sets
X_train.shape, X_test.shape, y_train.shape, y_test.shape
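The shapes checked above depend on how `train_test_split` rounds `test_size`: scikit-learn rounds the test-set size up and gives the remaining rows to the training set. As a sketch, assuming the 69 records stated in the Dataset section and using dummy arrays in place of the real features, the expected sizes are:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins with the same number of rows as the fruits dataset (69 records, 2 features);
# the values themselves are placeholders, not the real data
X = np.arange(69 * 2).reshape(69, 2)
y = np.arange(69) % 2

# test_size=0.33 -> the test set gets ceil(0.33 * 69) = 23 rows, the train set the remaining 46
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_tr.shape, X_te.shape)

# With the same random_state, a second call produces exactly the same split
X_tr2, X_te2, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
print(np.array_equal(X_te, X_te2))
```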

### Training a Decision Tree Classifier 

In [None]:
# Creating an object of DecisionTreeClassifier
# random_state fixes the seed the tree uses internally, so the fitted tree is reproducible
decision_tree = DecisionTreeClassifier(random_state=42)

# Training or fitting the model with the train data
clf = decision_tree.fit(X_train, y_train)

# Predicting the test data using the trained model
pred = clf.predict(X_test)

# Calculating the accuracy on the test set
accuracy_score(y_test, pred)
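`DecisionTreeClassifier` uses `criterion='gini'` by default: at each node it chooses the feature and threshold whose split minimizes the weighted Gini impurity of the two child nodes, where the Gini impurity of a node is $1 - \sum_k p_k^2$ and $p_k$ is the fraction of the node's samples in class $k$. The sketch below is a simplified illustration of one such split search on a single feature, not scikit-learn's actual implementation; the toy weights are hypothetical and not taken from the fruits dataset.

```python
import numpy as np


def gini(y: np.ndarray) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a label array (0 for a pure node)."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return float(1.0 - np.sum(p ** 2))


def best_threshold(x: np.ndarray, y: np.ndarray):
    """Try midpoints between consecutive distinct feature values and return the
    (threshold, weighted Gini) pair with the lowest weighted impurity."""
    values = np.unique(x)
    best = (None, float("inf"))
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (float(t), float(score))
    return best


# Hypothetical toy weights in grams: label 1 = apple, 0 = orange
weight = np.array([120.0, 130.0, 140.0, 160.0, 170.0, 180.0])
label = np.array([1, 1, 1, 0, 0, 0])

print(gini(label))                    # impurity of the unsplit node
print(best_threshold(weight, label))  # threshold and weighted impurity of the best split
```

Here the split at 150 g separates the two classes perfectly, so both children are pure and the weighted impurity drops from 0.5 to 0.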

### Visualize the data to get a better sense of it


Matplotlib provides a number of built-in colormaps, selected with the `cmap` argument. For more details, refer to the following [link](https://matplotlib.org/tutorials/colors/colormaps.html)

In [None]:
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='RdYlBu', s=70, label="Train")  # s is the marker size
plt.scatter(X_test[:,0], X_test[:,1], c=y_test, cmap='RdYlBu', marker="*", s=150, label="Test")
plt.xlabel("Sphericity")
plt.ylabel("Weight")
plt.legend(loc='upper left')
plt.show()
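In the scatter plot, `c=y_train` passes the integer labels to `scatter`, which by default scales them to [0, 1] between the minimum and maximum of `c` before looking each value up in the `RdYlBu` colormap. A minimal sketch of that lookup, assuming matplotlib 3.5 or later for the `matplotlib.colormaps` registry, shows that label 0 (orange) lands on the red end and label 1 (apple) on the blue end:

```python
import matplotlib
from matplotlib.colors import Normalize, to_hex

# Scale the labels to [0, 1] the same way scatter does for c values between 0 and 1
cmap = matplotlib.colormaps["RdYlBu"]
norm = Normalize(vmin=0, vmax=1)  # labels are 0 (orange) and 1 (apple)

orange_color = to_hex(cmap(norm(0)))  # low end of RdYlBu: red
apple_color = to_hex(cmap(norm(1)))   # high end of RdYlBu: blue
print(orange_color, apple_color)
```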

### Visualize the Decision Tree boundaries at each depth

* `graphviz` is a Python interface to Graphviz, an open-source tool for drawing graphs described in the DOT language.
* The `export_graphviz` function creates a GraphViz representation of the decision tree in the DOT format, which can be written to a `.dot` file.
* The `os` module is part of Python's standard library and provides operating-system-dependent functionality. Here, `os.system` runs the Graphviz `dot` command, which converts the `.dot` file into a PNG image.
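Besides writing a `.dot` file, `export_graphviz` returns the DOT source as a string when `out_file=None`, and scikit-learn's `export_text` prints the same tree as plain text without needing the Graphviz binaries installed. A minimal sketch on hypothetical toy data (not the fruits dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Hypothetical toy data (sphericity, weight in grams), not the real fruits dataset
X = np.array([[0.95, 120.0], [0.92, 130.0], [0.97, 160.0], [0.99, 170.0]])
y = np.array([1, 1, 0, 0])  # 1 = apple, 0 = orange

toy_tree = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source instead of writing a file
dot_source = export_graphviz(toy_tree, out_file=None, feature_names=["Sphericity", "Weight"],
                             class_names=["orange", "apple"], filled=True)
print(dot_source.splitlines()[0])

# export_text gives a plain-text view of the same tree
text_view = export_text(toy_tree, feature_names=["Sphericity", "Weight"])
print(text_view)
```

The DOT string can be passed straight to `graphviz.Source(...)` for display in a notebook, which avoids the intermediate file.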


In [None]:
for depth in range(1, 4):

    # max_depth regularizes the tree: it limits how deep the tree can grow, which helps prevent over-fitting
    # random_state fixes the seed the tree uses internally, so the result is the same every time the code runs
    dt = DecisionTreeClassifier(criterion='gini', max_depth=depth, random_state=42)

    # Fitting the data and measuring accuracy (accuracy_score takes y_true first, then y_pred)
    dt = dt.fit(X_train, y_train)
    training_acc = accuracy_score(y_train, dt.predict(X_train))
    testing_acc = accuracy_score(y_test, dt.predict(X_test))
    print("Training Accuracy:", round(training_acc, 2), " ", "Testing Accuracy:", round(testing_acc, 2), " ", "at Depth:", depth)

    # plt.subplots() returns a figure and an array of axes, one per subplot
    fig, ax = plt.subplots(1, 2, figsize=(14, 4))

    # File names for the exported tree
    saveAs = "Fruits_" + str(depth)
    save_dot = saveAs + ".dot"
    save_png = saveAs + ".png"

    # Export the tree as a DOT file (with a file name as out_file, export_graphviz writes the file and returns None)
    # class_names follow the encoded labels in ascending order: 0 = orange, 1 = apple
    export_graphviz(dt, out_file=save_dot, feature_names=["Sphericity", "Weight"], class_names=['orange', 'apple'], filled=True)

    # Convert the DOT file to PNG with the Graphviz `dot` command and show it in the left panel
    os.system("dot -T png -o " + save_png + " " + save_dot)
    ax[0].imshow(plt.imread(save_png))
    ax[0].axis("off")

    # Plot the decision boundary in the right panel
    plot_decision_regions(X_train, y_train, clf=dt, legend=2, ax=ax[1])
    plt.show()
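Each node of a decision tree tests a single feature against a single threshold, which is why the decision regions plotted for each depth are built from axis-aligned rectangles. As a sketch on hypothetical toy data (not the fruits dataset), the fitted tree's root split and size can be read from `tree_.feature`, `tree_.threshold`, `get_depth()` and `get_n_leaves()`. The example also shows that `max_depth` is only an upper bound: a node stops splitting once it is pure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (sphericity, weight in grams), not the real fruits dataset.
# Sphericity separates the classes perfectly; weight does not.
X = np.array([[0.80, 120.0], [0.82, 150.0], [0.85, 128.0],
              [0.95, 140.0], [0.97, 165.0], [0.96, 135.0]])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = apple, 0 = orange

feature_names = ["Sphericity", "Weight"]
for depth in range(1, 4):
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X, y)
    # Node 0 is the root; tree_.feature holds the index of the feature tested there
    root_feature = feature_names[dt.tree_.feature[0]]
    root_threshold = dt.tree_.threshold[0]
    # get_depth() reports how deep the fitted tree actually grew, which can be less than max_depth
    print(f"max_depth={depth}: depth={dt.get_depth()}, leaves={dt.get_n_leaves()}, "
          f"root split: {root_feature} <= {root_threshold:.2f}")
```

Because one sphericity threshold already produces two pure leaves, every setting of `max_depth` yields the same one-split tree, and its boundary is a single vertical line in a Sphericity vs Weight plot.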