<a href="https://colab.research.google.com/github/data-enthusiast-shubhs/Sparks_foundation/blob/main/Sparks_Iris.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **The Sparks Foundation Graduate Rotational Internship Program (GRIP)**

## **DATA SCIENCE**

## **NAME - SHUBHAM OLI**

## **TASK - 2**

## **Prediction using Decision Tree Algorithm**

## **TOPIC - IRIS FLOWER CLASSIFICATION**

In [None]:
from IPython.display import Image
Image(url='https://editor.analyticsvidhya.com/uploads/51518iris%20img1.png', width=1050)

### **PROJECT SUMMARY**-
This project classifies iris flowers by species using the decision tree algorithm. It uses NumPy and Pandas for data handling, Matplotlib and Seaborn for visualization, and scikit-learn for modeling.

The dataset used for this project contains various attributes of iris flowers, such as sepal length, sepal width, petal length, and petal width. The goal is to train a decision tree model to predict the correct species of the iris flower based on these attributes.

The data was checked for duplicate and missing values, the species labels were encoded as integers, and the dataset was split into training and testing sets. Exploratory data analysis used a scatter plot, a missing-value heatmap, and a pair plot to examine how the attributes relate to each other and to the target variable.

A decision tree classifier was then trained, and the `criterion` and `max_depth` hyperparameters were explored interactively to see their effect on the model. Finally, the trained tree was rendered as a graph to visualize its decision rules.

Overall, the project produces a decision tree model that predicts the species of an iris flower from its four measurements, and the exploratory plots show how those measurements relate to the species.

## **GITHUB LINK**-
**https://github.com/data-enthusiast-shubhs/Sparks_foundation.git**

### **PROBLEM STATEMENT**-
Create a Decision Tree Classifier and visualize it graphically.\
The goal is that, given any new data, the classifier predicts the correct class.

## ***Let's Begin -:***

## **Import the Libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn import metrics

%matplotlib inline


In [None]:
import warnings 
warnings.filterwarnings('ignore')

## **Loading the Dataset**

In [None]:
iris = pd.read_csv('/content/Iris.csv')
iris

In [None]:
# First five rows of the dataset
iris.head()

In [None]:
# Last five rows of the dataset
iris.tail()

## **Dataset Rows & Columns count**

In [None]:
# Dataset Rows & Columns count
iris.shape

## **EXPLORATORY DATA ANALYSIS**

## **Dataset Information**

In [None]:
# Dataset Info
iris.info()

## **Duplicate Values**

In [None]:
# Count fully duplicated rows
duplicated_values_count = iris.duplicated().sum()

print("Number of duplicated rows:", duplicated_values_count)

## **Missing Values/Null Values**

In [None]:
# Missing Values/Null Values Count
iris.isnull().sum()

## **Renaming complex names**

In [None]:
# Rename the measurement columns to shorter names
iris = iris.rename(columns={'SepalLengthCm': 'Sepal_Length',
                            'SepalWidthCm': 'Sepal_Width',
                            'PetalLengthCm': 'Petal_Length',
                            'PetalWidthCm': 'Petal_Width'})

In [None]:
iris.head()

In [None]:
# Visualizing the missing values
sns.heatmap(iris.isnull(), cmap='viridis', cbar=True)

**The heatmap above shows no yellow lines. It is fully clean, which means there are no null values.**

### **What did you know about your dataset?**

### **Observations About Dataset**

1. **The dataset contains 150 rows and 6 columns.**

2. **No duplicate rows are present in the dataset.**

3. **No missing values are present in the dataset.**
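
One caveat on the duplicate check: a unique `Id` column makes every full row distinct, so `duplicated()` on the whole frame cannot flag repeated measurements. The sketch below illustrates this on a small, hypothetical toy frame (not the real dataset); the column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy frame: rows 1 and 3 share the same measurements,
# but the Id column makes every full row unique.
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "Sepal_Length": [5.1, 4.9, 5.1],
    "Petal_Length": [1.4, 1.4, 1.4],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

# Duplicates counted over all columns, including Id
full_row_duplicates = int(toy.duplicated().sum())

# Duplicates counted over the measurements and label only
measurement_duplicates = int(toy.drop(columns="Id").duplicated().sum())

print(full_row_duplicates, measurement_duplicates)
```

Whether repeated measurements should be dropped is a separate judgment call; two different flowers can legitimately have identical measurements.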

## **Visualization**

### **Scatterplot**

In [None]:
# Scatter plot of petal length vs. sepal width, colored by species
sns.FacetGrid(iris, hue="Species",height=6).map(plt.scatter,"Petal_Length","Sepal_Width").add_legend()

### **Pairplot**

In [None]:
# visualize the whole dataset
sns.pairplot(iris[['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width','Species']], hue="Species",diag_kind='kde')     

## **Dataset Description**

In [None]:
# checking statistical features
iris.describe().T

In [None]:
# converting categorical variables into numbers
flower_mapping = {'Iris-setosa':0,'Iris-versicolor':1,'Iris-virginica':2}
iris['Species']=iris['Species'].map(flower_mapping)

In [None]:
iris.head()

## **SEPARATING INPUT COLUMNS AND THE OUTPUT COLUMNS**

In [None]:
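# Separate features and target
data = iris.values

# Column 0 is Id, columns 1-4 are the measurements, column 5 is Species.
# Skip the Id column so it is not used as a feature.
X = data[:, 1:5]
Y = data[:, 5]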
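
Positional slicing depends on the column order of the CSV, including the leading `Id` column. As an alternative, the features can be selected by name. This is a minimal sketch assuming the renamed columns used in this notebook; the three-row frame and the names `toy`, `feature_columns`, `X_named`, and `y_named` are hypothetical.

```python
import pandas as pd

# Hypothetical three-row frame with the renamed columns from this notebook
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "Sepal_Length": [5.1, 7.0, 6.3],
    "Sepal_Width": [3.5, 3.2, 3.3],
    "Petal_Length": [1.4, 4.7, 6.0],
    "Petal_Width": [0.2, 1.4, 2.5],
    "Species": [0, 1, 2],
})

feature_columns = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"]

# Selecting by name leaves out Id regardless of where it sits in the frame
X_named = toy[feature_columns].to_numpy()
y_named = toy["Species"].to_numpy()

print(X_named.shape, y_named.shape)
```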


In [None]:
print(X.shape)

In [None]:
print(Y.shape)

## **SPLITTING DATA INTO TRAINING AND TESTING**

In [None]:
# Split the data into training (80%) and test (20%) sets.
# random_state fixes the shuffle so the split is reproducible.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
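
With only 150 samples, a purely random split can leave the three species unevenly represented in the test set. A possible refinement is a stratified split, sketched here on scikit-learn's built-in copy of the iris data (an assumption made so the example is self-contained; the notebook itself reads a CSV). The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Built-in iris data: 150 samples, 50 per class
X_all, y_all = load_iris(return_X_y=True)

# stratify=y_all keeps the class proportions equal in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0
)

# Number of test samples per class
class_counts_test = np.bincount(y_te)
print(X_tr.shape, X_te.shape, class_counts_test)
```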

In [None]:
print(X_train.shape)

In [None]:
print(X_test.shape)

In [None]:
print(y_train.shape)

In [None]:
print(y_test.shape)

## **APPLYING ALGORITHM**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix

In [None]:
model=DecisionTreeClassifier(criterion='entropy',max_depth=3)
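
With `criterion='entropy'`, the tree prefers splits that most reduce the entropy of the class labels. The sketch below computes that quantity by hand for an idealized split that isolates one of three equally sized classes. The helper names `entropy` and `information_gain` are hypothetical, and the code illustrates the measure itself, not scikit-learn's internal implementation.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Three balanced classes of 50 samples each, as in the iris data
parent = np.array([0] * 50 + [1] * 50 + [2] * 50)

# Idealized split: class 0 goes left, classes 1 and 2 go right
left = parent[parent == 0]
right = parent[parent != 0]

gain = information_gain(parent, left, right)
print(entropy(parent), gain)
```

The parent entropy is log2(3) bits, the left child is pure, and the right child holds two balanced classes (1 bit), so the gain is log2(3) - 2/3.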

In [None]:
model.fit(X_train,y_train)

In [None]:
model.score(X_train,y_train)

In [None]:
model.score(X_test,y_test)
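
Accuracy alone does not show which species get confused with each other. The following sketch, which assumes scikit-learn's built-in iris data rather than the CSV, adds a confusion matrix on the held-out set and then predicts the class of one new sample. The measurements in `new_flower` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=0
)

# Same settings as the model in this notebook
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_tr, y_tr)

# Rows of the matrix are true classes, columns are predicted classes
y_pred = tree.predict(X_te)
cm = confusion_matrix(y_te, y_pred)
test_accuracy = accuracy_score(y_te, y_pred)
print(cm)
print(test_accuracy)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.0, 3.4, 1.5, 0.2]]
predicted_class = int(tree.predict(new_flower)[0])
print(predicted_class)
```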

## **TUNING PARAMETERS**

In [None]:
# Load scikit-learn's built-in copy of the iris dataset into a DataFrame
# (this replaces the CSV-based `iris` DataFrame used above)
iris_data = datasets.load_iris()
iris = pd.DataFrame(data=np.c_[iris_data['data'], iris_data['target']],
                    columns=iris_data['feature_names'] + ['target'])


In [None]:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
from IPython.display import display

def Decision_Tree(Type, Depth):
    # import some data to play with
    iris = datasets.load_iris()
    X = iris.data[:, :2]  # we only take the first two features.
    Y = iris.target
    h = .02  # step size in the mesh

    # we create an instance of DecisionTreeClassifier and fit the data.
    model = DecisionTreeClassifier(criterion=Type, max_depth=Depth)  
    model.fit(X, Y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure(1, figsize=(4, 3))
    plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())
    plt.show()

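    # Refit on all four features and report metrics on the same data
    # (training-set performance, not a held-out estimate)
    model.fit(iris.data, iris.target)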
    expected = iris.target
    predicted = model.predict(iris.data)
    print(metrics.classification_report(expected, predicted))
    print(metrics.confusion_matrix(expected, predicted))


i = interact(Decision_Tree, Type=['gini', 'entropy'], Depth=(1, 10))


0 -> **SETOSA** \
1 -> **VERSICOLOR** \
2 -> **VIRGINICA**
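
The interactive widget explores `criterion` and `max_depth` by hand and reports metrics on the training data. A non-interactive alternative is a cross-validated grid search over the same two parameters. This is a sketch under stated assumptions: 5-fold cross-validation, accuracy as the score, and a fixed `random_state` are choices made here, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X_all, y_all = load_iris(return_X_y=True)

# Same parameter ranges as the interactive widget
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(1, 11)),
}

# Each combination is scored by 5-fold cross-validated accuracy
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_all, y_all)

print(search.best_params_)
print(search.best_score_)
```

Because each score comes from held-out folds, the selected depth is less likely to reflect overfitting than a choice based on training-set metrics.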


## **DECISION TREE GRAPH**

In [None]:
from sklearn.tree import export_graphviz
import pydotplus
from IPython.display import Image
import io
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()

# Train a decision tree classifier
clf = DecisionTreeClassifier()
clf.fit(iris.data, iris.target)

# Visualize the graph
dot_data = io.StringIO()
export_graphviz(clf, out_file=dot_data,  
                feature_names=iris.feature_names,  
                filled=True, rounded=True,
                special_characters=True,
                class_names=iris.target_names)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  
Image(graph.create_png())


0 -> **SETOSA** \
1 -> **VERSICOLOR** \
2 -> **VIRGINICA**
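
If Graphviz or `pydotplus` is not installed, the same decision rules can be inspected as plain text with `sklearn.tree.export_text`. The sketch below limits the tree to `max_depth=3` to keep the output short; that limit and the variable names are assumptions made for illustration. The leaves are printed as class indices, which follow the mapping above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_bunch = load_iris()

# A shallow tree keeps the printed rules readable
text_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
text_tree.fit(iris_bunch.data, iris_bunch.target)

# Render the fitted tree as indented if/else rules
rules = export_text(text_tree, feature_names=list(iris_bunch.feature_names))
print(rules)
```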
