<h1 align=middle style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Decision Tree
</font>
</h1>

<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
First, Classification
</font>
</h1>

In [25]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier



iris = load_iris()
X = iris.data[:, 2:]
y = iris.target

In [26]:
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

In [27]:
from sklearn.tree import export_graphviz

# Raw string so the backslashes in the Windows path are not read as escape sequences
dot_file = r"D:\Sharif University of Tech\Data\Hands-on ML\Season 6\Data\iris_tree_clf.dot"
export_graphviz(
    tree_clf,
    out_file=dot_file,
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)

![image.png](attachment:image.png)

The `gini` value shown in each node of a classification tree is that node's Gini impurity.

Gini impurity measures how often a randomly chosen element of the subset would be mislabeled if it were labeled at random according to the subset's class distribution.

- The Gini impurity ranges from 0 to \( 1 - 1/n \) for \( n \) classes: 0 means every element in the subset belongs to the same class (perfect purity), and the maximum (about 0.667 for the three iris classes) is reached when all classes are equally represented.
- At each node, the tree picks the split that most reduces the weighted Gini impurity of the resulting subsets.
- A Gini impurity of 0 means the node contains a single class, while a higher value indicates a mixture of classes.

In our decision tree visualization, each node shows the Gini impurity of the training samples that reach it. Lower values are better: they mean the splits above that node have separated the classes well.


Scikit-learn's decision trees are trained with the CART algorithm, which produces only binary trees: every non-leaf node has exactly two children.

Let's see the trained tree in action:

In [28]:
tree_clf.predict_proba([[5, 1.5]])

array([[0.        , 0.90740741, 0.09259259]])

In [29]:
tree_clf.predict([[5, 1.5]])

array([1])

Good enough: the tree predicts class 1 (versicolor), the class with the highest estimated probability.
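The probabilities returned by `predict_proba` are simply the class ratios of the training samples in the leaf that the input falls into. The sketch below checks this by locating that leaf with `apply` and counting the training labels inside it. The `random_state=42` seed and the variable names are assumptions added for reproducibility; they are not part of the original cells.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target

# Same model as above; the seed is an assumption so that ties break the same way every run
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

sample = np.array([[5, 1.5]])

# apply() returns the index of the leaf each sample ends up in
leaf_id = tree_clf.apply(sample)[0]
in_same_leaf = tree_clf.apply(X) == leaf_id

# Count the training labels in that leaf and turn the counts into ratios
leaf_counts = np.bincount(y[in_same_leaf], minlength=3)
leaf_ratios = leaf_counts / leaf_counts.sum()

proba = tree_clf.predict_proba(sample)[0]
print("class counts in leaf:", leaf_counts)
print("ratios from counts:  ", leaf_ratios)
print("predict_proba:       ", proba)
```

The predicted class is the one with the largest ratio, which is why `predict` and `predict_proba` always agree.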

<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
CART Algorithm
</font>
</h1>

The cost function \( J(k, t_k) \) that CART minimizes when splitting a node on feature \( k \) at threshold \( t_k \) is given by:

$$
J(k, t_k) = \frac{m_{\text{left}}}{m} G_{\text{left}} + \frac{m_{\text{right}}}{m} G_{\text{right}}
$$

where:
- \( m \) is the total number of samples at the current node,
- \( m_{\text{left}} \) and \( m_{\text{right}} \) are the number of samples in the left and right subsets created by the split,
- \( G_{\text{left}} \) and \( G_{\text{right}} \) are the Gini impurities of the left and right subsets, respectively.
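To make the cost function concrete, here is a minimal sketch that evaluates \( J(k, t_k) \) for every candidate threshold of a single feature (petal width) at the root node. The helper names `gini` and `split_cost` are hypothetical, and the brute-force scan over midpoints is an illustration only: it is not scikit-learn's implementation, which also searches over all features.

```python
import numpy as np
from sklearn.datasets import load_iris


def gini(labels):
    """Gini impurity of a 1-D array of integer class labels."""
    if len(labels) == 0:
        return 0.0
    proportions = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(proportions ** 2)


def split_cost(feature, labels, threshold):
    """Weighted impurity J(k, t_k) of the split `feature <= threshold`."""
    go_left = feature <= threshold
    m = len(labels)
    m_left = go_left.sum()
    m_right = m - m_left
    return (m_left / m) * gini(labels[go_left]) + (m_right / m) * gini(labels[~go_left])


iris = load_iris()
petal_width = iris.data[:, 3]
y = iris.target

# Candidate thresholds: midpoints between consecutive distinct feature values
values = np.unique(petal_width)
candidates = (values[:-1] + values[1:]) / 2

costs = np.array([split_cost(petal_width, y, t) for t in candidates])
best_threshold = candidates[np.argmin(costs)]
best_cost = costs.min()
print("best threshold:", best_threshold, "cost:", best_cost)
```

The lowest cost is reached by the threshold that separates all setosa samples from the rest, leaving one pure subset and one subset split evenly between the two remaining classes.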


# Impurity Measures


Gini impurity and entropy are measures used to quantify the impurity or purity of a node in a decision tree.

### Gini Impurity
The Gini impurity is calculated as:

$$
Gini = 1 - \sum_{i=1}^{n} p_i^2
$$

where \( p_i \) is the proportion of samples at the node that belong to class \( i \), and \( n \) is the number of classes. Gini impurity is slightly faster to compute than entropy because it requires no logarithms. A Gini impurity of 0 represents a perfectly pure node, where all instances belong to a single class.

### Entropy
Entropy, on the other hand, is a measure from information theory and is given by the formula:

$$
Entropy = -\sum_{i=1}^{n} p_i \log_2(p_i)
$$

where \( p_i \) again represents the proportion of samples of class \( i \), and terms with \( p_i = 0 \) are taken to be 0. Entropy is a measure of disorder or unpredictability. A node with an entropy of 0 is perfectly pure.
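Both formulas are easy to check by hand. The sketch below computes them from raw class counts; the function names are hypothetical, and the counts `[0, 49, 5]` are an assumption chosen to be consistent with the 49/54 ≈ 0.907 probability returned by `predict_proba` earlier.

```python
import numpy as np


def gini_impurity(counts):
    """Gini impurity computed from per-class sample counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(counts):
    """Entropy in bits computed from per-class sample counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # skip empty classes: 0 * log2(0) is treated as 0
    return -np.sum(p * np.log2(p))


leaf_counts = [0, 49, 5]  # assumed class counts for one leaf
print("gini:   ", round(gini_impurity(leaf_counts), 3))
print("entropy:", round(entropy(leaf_counts), 3))

# A pure node scores 0 under both measures
print(gini_impurity([10, 0, 0]), entropy([10, 0, 0]))
```

Note that the two measures live on different scales (entropy can exceed 1 with more than two classes), so their values should not be compared with each other directly.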

### Comparison
Both Gini impurity and entropy provide similar results when it comes to the quality of the splits. However, they have some differences:
- Entropy tends to produce slightly more balanced trees, whereas Gini impurity tends to isolate the most frequent class in its own branch.
- Gini impurity is less computationally intensive because it doesn't use logarithms.

In practice, the choice between using Gini impurity or entropy often does not have a significant effect on the performance of the decision tree. The choice may be based on computational efficiency (favoring Gini) or a slight preference for balanced trees (favoring entropy).
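One way to see how little the choice matters on a given dataset is to train the same tree with both criteria and compare cross-validated accuracy. This is a rough sketch: the 5-fold setup and the `random_state=42` seed are assumptions, and results on other datasets may differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target

scores = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=2, random_state=42)
    # Mean accuracy over 5 cross-validation folds
    scores[criterion] = cross_val_score(clf, X, y, cv=5).mean()
    print(criterion, round(scores[criterion], 3))
```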


<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Regularization Hyperparameters
</font>
</h1>

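### Nonparametric Models

The number of parameters in these models is not fixed before training, so the model structure is free to follow the training data closely. Decision trees belong to this group, and if left unconstrained they often overfit.

### Parametric Models

As the name suggests, these are the opposite: they have a predetermined number of parameters (a linear model, for example), which limits their degrees of freedom. That lowers the risk of overfitting but raises the risk of underfitting.

So, to avoid overfitting, we restrict the decision tree's freedom during training, a.k.a. **regularization**.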
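The effect of restricting a tree can be sketched on a noisy toy dataset. Everything specific here is an assumption made for illustration: the `make_moons` data, the 80/20 split, and the particular values `max_depth=4` and `min_samples_leaf=5` (both are real `DecisionTreeClassifier` hyperparameters, but the values are not tuned).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy two-class toy data (assumed for illustration)
X_moons, y_moons = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X_moons, y_moons, test_size=0.2, random_state=42
)

# No restrictions: the tree keeps splitting until every leaf is pure
free_tree = DecisionTreeClassifier(random_state=42)

# Regularized: limited depth and a minimum number of samples per leaf
regularized_tree = DecisionTreeClassifier(
    max_depth=4, min_samples_leaf=5, random_state=42
)

for name, model in [("unrestricted", free_tree), ("regularized", regularized_tree)]:
    model.fit(X_train, y_train)
    print(
        name,
        "| leaves:", model.get_n_leaves(),
        "| train acc:", round(model.score(X_train, y_train), 3),
        "| test acc:", round(model.score(X_test, y_test), 3),
    )
```

The unrestricted tree memorizes the training set with many leaves; comparing the train and test accuracies of the two models shows how much of that fit carries over to unseen data.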

<h1 align=left style="line-height:200%;font-family:vazir;color:#0099cc">
<font face="vazir" color="#0099cc">
Second, Regression
</font>
</h1>

In [30]:
from sklearn.tree import DecisionTreeRegressor


tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)

In [31]:
from sklearn.tree import export_graphviz

# Raw string so the backslashes in the Windows path are not read as escape sequences
dot_file = r"D:\Sharif University of Tech\Data\Hands-on ML\Season 6\Data\iris_tree_reg.dot"
export_graphviz(
    tree_reg,
    out_file=dot_file,
    feature_names=iris.feature_names[2:],  # class_names is omitted: it only applies to classifiers
    rounded=True,
    filled=True
)

![iris_tree_visualization_reg.png.png](attachment:iris_tree_visualization_reg.png.png)

The cost function \( J(k, t_k) \) for a regression tree has the same form as for classification, with the mean squared error (MSE) in place of the Gini impurity:

$$
J(k, t_k) = \frac{m_{\text{left}}}{m} MSE_{\text{left}} + \frac{m_{\text{right}}}{m} MSE_{\text{right}}
$$

where:
- \( m \) is the total number of samples at the current node,
- \( m_{\text{left}} \) and \( m_{\text{right}} \) are the number of samples in the left and right subsets created by the split,
- \( MSE_{\text{left}} \) and \( MSE_{\text{right}} \) are the mean squared errors of the target values in the left and right subsets, each measured around that subset's mean.
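A regression tree predicts the mean target value of the training samples in the matching leaf, and the MSE of a node is measured around that mean. The sketch below verifies both on the same iris data used above; the `random_state=42` seed and the variable names are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeRegressor

iris = load_iris()
X = iris.data[:, 2:]  # petal length and petal width
y = iris.target       # class index used as a numeric target, as in the cell above

tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X, y)

sample = np.array([[5, 1.5]])

# Find the leaf for the sample and collect the training targets that share it
leaf_id = tree_reg.apply(sample)[0]
in_same_leaf = tree_reg.apply(X) == leaf_id
leaf_targets = y[in_same_leaf]

leaf_mean = leaf_targets.mean()
leaf_mse = np.mean((leaf_targets - leaf_mean) ** 2)
prediction = tree_reg.predict(sample)[0]

print("prediction:", prediction)
print("leaf mean: ", leaf_mean)
print("leaf MSE:  ", leaf_mse)
```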
