### Chapter 6: Decision Tree 

In [None]:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

In [10]:
iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

In [11]:
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

| Parameter | Value |
|---|---|
| criterion | 'gini' |
| splitter | 'best' |
| max_depth | 2 |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | None |
| random_state | None |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |


In [12]:
# Visualize the trained Decision Tree: first use export_graphviz()
# to write a graph definition file called iris_tree.dot.

from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)

## Decision Tree Visualization
Here's what the trained tree looks like:

![Decision Tree Image](iris_tree.png)
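
If Graphviz is not at hand to turn `iris_tree.dot` into an image, a plain-text view of the same tree can serve as a quick alternative. The following is a minimal sketch, assuming the installed scikit-learn version provides `export_text`. The `random_state=42` value is an arbitrary choice added here for reproducibility and is not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# random_state is an assumption added for reproducibility
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Render the fitted tree as indented text rules, one line per node
tree_text = export_text(tree_clf, feature_names=iris.feature_names[2:])
print(tree_text)
```

Each indented line corresponds to one node, so the printout should mirror the boxes in the image above.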


### Estimating Class Probabilities

A Decision Tree can also estimate the probability that an instance belongs to a particular class.

### 🌳 Decision Tree Prediction Made Simple

When you give a Decision Tree a flower’s petal length and width, it follows a path and lands on a final decision point called a **leaf**.

At that leaf, it checks how many training examples from each flower type ended up there. For example:
- 🌱 **Iris setosa**: 0 out of 54 samples → 0%
- 🌿 **Iris versicolor**: 49 out of 54 samples → 90.7%
- 🌼 **Iris virginica**: 5 out of 54 samples → 9.3%

The Decision Tree then:
- Gives you **probabilities** for each type of flower.
- Picks the one with the **highest probability** as the predicted class.

✅ In this case, the flower is most likely an **Iris versicolor**.


In [16]:
# A flower whose petals are 5 cm long and 1.5 cm wide
probabilities = tree_clf.predict_proba([[5, 1.5]])
print(probabilities)
prediction = tree_clf.predict([[5, 1.5]])
print(prediction)

[[0.         0.90740741 0.09259259]]
[1]
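
The leaf counts described above can be recovered directly from the training data. The sketch below is one way to check them, assuming the same two petal features and `max_depth=2`. It uses the estimator's `apply()` method to find which leaf each instance lands in. The names `leaf_of_sample`, `class_counts`, and `manual_proba` are illustrative, and `random_state=42` is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# random_state is an assumption added for reproducibility
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

sample = [[5, 1.5]]

# apply() returns the index of the leaf each instance ends up in
leaf_of_sample = tree_clf.apply(sample)[0]
leaf_of_training = tree_clf.apply(X)

# Count the training instances of each class that share the sample's leaf
in_same_leaf = leaf_of_training == leaf_of_sample
class_counts = np.bincount(y[in_same_leaf], minlength=len(iris.target_names))

# Dividing by the leaf size gives the per-class ratios
manual_proba = class_counts / class_counts.sum()

print(class_counts)
print(manual_proba)
print(tree_clf.predict_proba(sample)[0])
```

The last two printed rows should agree, which suggests that `predict_proba` simply reports the class ratios of the leaf the instance falls into.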


## The CART Training Algorithm

Scikit-Learn uses the Classification and Regression Tree (CART) algorithm to train Decision Trees (also called "growing" trees).

#### 🌳 How Decision Trees Choose Where to Split

When training a Decision Tree, the algorithm looks at each feature (like petal length, petal width, etc.) and tries different values, called **thresholds**, to split the data.

📌 For example:  
It might ask, “Does **petal length ≤ 2.45 cm** help separate the flower types clearly?”

---

### 🧠 Choosing the Best Feature and Threshold (`k`, `tₖ`)

To pick the best pair `(k, tₖ)`:
- The tree **tries many combinations** of feature and threshold.
- It calculates how **pure** each group is after splitting.  
  → Purer groups = better separation of classes.
- It **weights the groups** by their size (larger groups matter more).

✅ The algorithm chooses the feature and threshold that produce the **purest subsets, weighted by their size**.

This process helps the tree make accurate predictions as it grows.
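
The search described above can be sketched as a brute-force loop over features and candidate thresholds. This is an illustrative sketch, not Scikit-Learn's actual implementation. The helper names `gini`, `split_cost`, and `best_split` are hypothetical, and using midpoints between consecutive distinct values as candidate thresholds is an assumption made here for simplicity.

```python
import numpy as np
from sklearn.datasets import load_iris


def gini(labels):
    """Gini impurity of a set of integer class labels."""
    if len(labels) == 0:
        return 0.0
    proportions = np.bincount(labels) / len(labels)
    return 1.0 - np.sum(proportions ** 2)


def split_cost(feature_values, labels, threshold):
    """Size-weighted impurity of the two subsets produced by a split."""
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    m = len(labels)
    return len(left) / m * gini(left) + len(right) / m * gini(right)


def best_split(X, y):
    """Try every feature k and candidate threshold t; keep the cheapest pair."""
    best = (None, None, np.inf)  # (feature index, threshold, cost)
    for k in range(X.shape[1]):
        values = np.unique(X[:, k])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            cost = split_cost(X[:, k], y, t)
            if cost < best[2]:  # strict "<" keeps the first of any tied splits
                best = (k, t, cost)
    return best


iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

k, t, cost = best_split(X, y)
print(iris.feature_names[2:][k], t, cost)
```

On the two petal features, this search should land on the petal length ≤ 2.45 cm split mentioned earlier. A split on petal width separates the first class equally well, so which of the two tied splits is reported depends only on the order in which features are tried.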



As you can see, the CART algorithm is a greedy algorithm: it greedily searches for an optimum split at the top level, then repeats the process at each subsequent level. It does not check whether or not the split will lead to the lowest possible impurity several levels down. A greedy algorithm often produces a solution that’s reasonably good but not guaranteed to be optimal.

Unfortunately, finding the optimal tree is known to be an NP-Complete problem: it requires O(exp(m)) time, making the problem intractable even for small training sets. This is why we must settle for a “reasonably good” solution.

## Gini Impurity or Entropy?

By default, the Gini impurity measure is used, but you can select the entropy impurity measure instead by setting the `criterion` hyperparameter to `"entropy"`.

#### What should we use?

Most of the time it does not make a big difference: both lead to similar trees. **Gini impurity** is slightly faster to compute, so it is a good default.

##### When do they differ?

When they do differ, Gini impurity tends to isolate the most frequent class in its own branch of the tree, while entropy tends to produce slightly more balanced trees.
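
To make the two measures concrete, the sketch below computes both for the leaf used in the prediction example earlier (0 setosa, 49 versicolor, 5 virginica). It uses the standard definitions, with entropy in base 2. The function names `gini` and `entropy` are illustrative helpers and not part of Scikit-Learn.

```python
import math


def gini(counts):
    """Gini impurity: 1 minus the sum of squared class ratios."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits; empty classes are skipped (0 * log 0 is taken as 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


leaf_counts = [0, 49, 5]  # setosa, versicolor, virginica
leaf_gini = gini(leaf_counts)
leaf_entropy = entropy(leaf_counts)

print(round(leaf_gini, 3), round(leaf_entropy, 3))
```

Both measures are zero for a perfectly pure node and grow as the classes become more mixed. They use different scales, so the two numbers are not directly comparable with each other.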