<a href="https://colab.research.google.com/github/Sairaj-97/Machine-Learning/blob/main/DecisionTree.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Sure! Here’s a clean and concise version you can directly copy into your notes. 📚

---

## ✅ Decision Tree Splitting Criteria – Definitions & Formulas

---

### 🔹 **Entropy**

* **Definition:**
  Entropy measures the **impurity** or **uncertainty** in a dataset.
  A node is **pure** if all elements belong to the same class (Entropy = 0).

* **Formula:**

$$
\text{Entropy}(S) = -\sum_{i=1}^{C} p_i \cdot \log_2(p_i)
$$

Where:

* $C$ = number of classes
* $p_i$ = proportion of samples in class $i$

---

### 🔹 **Information Gain (IG)**

* **Definition:**
  Information Gain measures the **reduction in entropy** after a dataset is split on a feature.
  Higher IG means a **better split**.

* **Formula:**

$$
\text{Information Gain} = \text{Entropy(Parent)} - \sum_{j=1}^{k} \frac{n_j}{n} \cdot \text{Entropy(Child}_j)
$$

Where:

* $n$ = total samples in parent
* $n_j$ = samples in child $j$
* $k$ = number of child nodes

---

### 🔹 **Gini Impurity**

* **Definition:**
  Gini Impurity measures the **probability of misclassifying** a sample if randomly labeled according to the distribution in the node.
  Gini = 0 means all samples belong to one class (pure).

* **Formula:**

$$
\text{Gini}(S) = 1 - \sum_{i=1}^{C} p_i^2
$$

Where:

* $C$ = number of classes
* $p_i$ = proportion of samples in class $i$

---



In [4]:
#Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import pandas as pd
import matplotlib.pyplot as plt


In [12]:
iris=load_iris()
df=pd.DataFrame(iris.data,columns=iris.feature_names)
df['target']=iris.target
df.head()
X=df.drop('target',axis='columns')
Y=df['target']
X_test,X_train,Y_test,Y_train=train_test_split(X,Y,test_size=0.2,random_state=42)
model=DecisionTreeClassifier()
model.fit(X_train,Y_train)
predictions=model.predict(X_train)
print("accuuracy of train data:",accuracy_score(predictions,Y_train))

accuuracy of train data: 1.0


In [13]:
#test data
test_predictions=model.predict(X_test)
print("accuracy of test data:",accuracy_score(test_predictions,Y_test))

accuracy of test data: 0.9416666666666667
