# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
## Not for grading

## Learning Objective

The objective of this experiment is to: 

* understand the decision tree classifier
* visualize the decision boundaries
* understand overfitting
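To make the first objective concrete, here is a simplified sketch of how a decision tree could choose a split on a single feature using Gini impurity, the `criterion='gini'` used later in this notebook. It is a toy illustration, not scikit-learn's actual implementation. The helper names `gini` and `best_threshold` and the toy data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def best_threshold(x, y):
    """Try midpoints between sorted unique values of one feature x and return
    the threshold with the lowest weighted Gini impurity of the two children."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    values = np.unique(x)
    best_t, best_score = None, np.inf
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        # Weight each child's impurity by the fraction of samples it receives
        score = (left.size * gini(left) + right.size * gini(right)) / y.size
        if score < best_score:
            best_t, best_score = t, score
    if best_t is None:  # constant feature: no split is possible
        return None, gini(y)
    return float(best_t), float(best_score)

# Toy example: one feature that separates the two classes perfectly at x = 2.5
x = [1, 2, 3, 4]
y = [0, 0, 1, 1]
print(gini(y))               # 0.5 for a 50/50 node
print(best_threshold(x, y))  # (2.5, 0.0): both children are pure
```

A real tree repeats this search over every feature at every node and recurses into the two children, which is why limiting `max_depth` limits how finely the feature space can be carved up.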

### Dataset

Two handmade datasets, A and B, are used in this experiment.

In [None]:
! wget https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/A.csv
! wget https://cdn.talentsprint.com/aiml/Experiment_related_data/week1/B.csv.zip
! unzip B.csv.zip

### Importing the required packages

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from mlxtend.plotting import plot_decision_regions

### Using A as training data and B as test data

In [None]:
A_train = pd.read_csv("A.csv")
A_train.head()

In [None]:
B_test = pd.read_csv("B.csv", names=['X','Y','Label'])
B_test.head()
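`A.csv` is read with its first row as the header, while `B.csv` is read with `names=['X','Y','Label']`. This suggests that `B.csv` has no header row (an assumption inferred from the code above). The sketch below uses small, made-up in-memory CSV strings to show what `names=` changes.

```python
import io
import pandas as pd

# Hypothetical CSV contents, for illustration only
with_header = "X,Y,Label\n1.0,2.0,0\n3.0,4.0,1\n"
without_header = "1.0,2.0,0\n3.0,4.0,1\n"

# Default behaviour: the first row becomes the column names
a = pd.read_csv(io.StringIO(with_header))

# names= supplies the column names, so every row is treated as data
b = pd.read_csv(io.StringIO(without_header), names=['X', 'Y', 'Label'])

# Without names=, the first data row would be consumed as the header
wrong = pd.read_csv(io.StringIO(without_header))

print(list(a.columns), len(a))          # ['X', 'Y', 'Label'] 2
print(list(b.columns), len(b))          # ['X', 'Y', 'Label'] 2
print(list(wrong.columns), len(wrong))  # ['1.0', '2.0', '0'] 1
```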

#### Visualize the data to get a better sense of it


Matplotlib provides a number of built-in colormaps, selected with the `cmap` argument. For more details, refer to the following [link](https://matplotlib.org/tutorials/colors/colormaps.html).
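When `c` is given numeric labels, `scatter` scales them to the range [0, 1] using their minimum and maximum and then looks each scaled value up in the colormap. A minimal sketch of that mapping follows, assuming Matplotlib 3.5 or later (for `matplotlib.colormaps`); the label array is made up.

```python
import numpy as np
import matplotlib
from matplotlib.colors import Normalize

labels = np.array([0, 1, 1, 0])  # hypothetical class labels

# Step 1: rescale the label values to [0, 1] using their min and max
norm = Normalize(vmin=labels.min(), vmax=labels.max())

# Step 2: look each scaled value up in the 'RdYlBu' colormap
cmap = matplotlib.colormaps['RdYlBu']
colors = cmap(norm(labels))

print(colors.shape)  # (4, 4): one RGBA row per point
```

Points that share a label get the same colour, and with `RdYlBu` the lowest label lands at the red end and the highest at the blue end.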

In [None]:
plt.scatter(A_train.X, A_train.Y, c=A_train.Label, cmap='RdYlBu', s=50, label="Train") # s is marker size
plt.scatter(B_test.X, B_test.Y, c=B_test.Label, cmap='RdYlBu', marker="*", s=150, label="Test")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()

In [None]:
# Get the train and test sets from the data 
X_train, y_train = A_train[['X','Y']].values, A_train['Label'].values
X_test, y_test = B_test[['X','Y']].values, B_test['Label'].values 
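Keeping B as a separate test set is what makes overfitting visible: a tree can fit its training data ever more closely as it grows deeper without improving on unseen data. The sketch below illustrates this on synthetic `make_moons` data, which is a stand-in and not datasets A or B. No particular accuracies are claimed, since they depend on the random seeds.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic, noisy two-class data split into train and test halves
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

results = {}
for depth in [1, 2, 4, 8, None]:  # None lets the tree grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, clf.predict(X_tr))
    test_acc = accuracy_score(y_te, clf.predict(X_te))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.2f} test={test_acc:.2f}")
```

An unrestricted tree (`max_depth=None`) reaches 100% training accuracy on data like this because it keeps splitting until every leaf is pure. The gap between its training and test accuracy is the overfitting that `max_depth` is meant to control.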

#### Train and visualize a decision tree classifier at each depth

In [None]:
# Train a decision tree classifier at each max_depth from 1 to 4
for depth in range(1, 5):
    # max_depth regularizes the tree: it limits how deep the tree can grow, which helps prevent overfitting
    clf = DecisionTreeClassifier(criterion='gini', max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    # accuracy_score expects (y_true, y_pred)
    training_acc = accuracy_score(y_train, clf.predict(X_train))
    testing_acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"Training Accuracy: {training_acc:.2f} Testing Accuracy: {testing_acc:.2f} At Depth: {depth}")
    # Shade the regions assigned to each class and overlay the training points
    plot_decision_regions(X_train, y_train, clf=clf, legend=2)
    plt.title(f"Decision regions at max_depth={depth}")
    plt.show()
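`plot_decision_regions` from mlxtend produces the shaded plots above. The general idea behind such a plot can be sketched with plain NumPy and Matplotlib: predict a class for every point of a dense grid and colour the grid by the prediction. This is not mlxtend's actual code, and the `make_moons` data is a hypothetical stand-in for `A.csv`.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 2-D data standing in for A.csv
X, y = make_moons(n_samples=200, noise=0.25, random_state=1)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# 1. Build a dense grid covering the data range, with a small margin
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))

# 2. Predict a class for every grid point, then reshape back to the grid
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = clf.predict(grid).reshape(xx.shape)

# 3. Colour the grid by predicted class and overlay the training points
plt.contourf(xx, yy, zz, alpha=0.3, cmap='RdYlBu')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu', edgecolor='k', s=30)
plt.title("Decision regions of a depth-3 tree")
plt.show()
```

Because every split tests a single feature against a threshold, the regions of a tree are unions of axis-aligned rectangles, and a tree of depth 3 has at most 2^3 = 8 leaves. This is why the boundaries in the plots above look like staircases that grow more intricate as `max_depth` increases.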