# Decision Tree Exercises - Part 1
## Summary
These exercises focus on understanding how a decision tree is built and interpreted.
## Exercise 1
- Sketch an example of your own invention: a partition of a two-dimensional feature space that could result from
recursive binary splitting. Your example should contain at least six regions.

- Draw a decision tree corresponding to this partition. Label all aspects of your two figures, including the regions,
the cutpoints, etc.

- Can you express your decision tree as a set of nested rules for classification: if .. (if .. else ..) .. else .. (if .. else ..) ..?

*Solution*:  
1. ![image.png](attachment:image.png)

2. ![image-2.png](attachment:image-2.png)

3.  
```py
def region(x1, x2):
    if x1 < 5:
        if x2 > 10:
            return "R1"
        else:
            return "R2"
    else: # Could use elifs for better readability, but this keeps more of the tree structure.
        if x2 > 13:
            if x1 < 8:
                return "R4"
            else:
                if x2 > 18:
                    return "R5"
                else:
                    return "R6"
        else:
            return "R3"
```

## Exercise 2
Consider the following regression tree:  
![image.png](attachment:image.png)  
  
Sketch the region $[0,2]\times[0,3]$ of a two-dimensional feature space ($X_1$ and $X_2$), divide it into the regions corresponding to the tree above, and indicate the mean (the predicted output) for each region.

*Solution*:  
![image.png](attachment:image.png)

## Exercise 3
Explain in your own words: What is a Decision Tree? How does it work? What are some similarities or differences to
other algorithms you have used in this course?


*Solution*:  
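A decision tree splits the training data into regions using a sequence of binary tests, each comparing one feature to a cutpoint. A point either passes or fails each test, which sends it down one branch, and the leaf it ends up in defines its region. The prediction for a new point is the majority class (or, for regression, the mean) of the training points in that region. This is similar to kNN in that the prediction depends on nearby training points, since points that are close together usually end up in the same region. The difference is that the regions are fixed, axis-aligned boxes learned during training, so two close points on opposite sides of a cutpoint can get different predictions.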
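
As a minimal sketch of the "pass or fail a binary test" idea, a tree can be stored as nested dictionaries and a point routed from the root to a leaf. The tree below is hypothetical: its thresholds, labels, and the names `tree` and `predict` are made up for illustration and do not come from the exercises.

```python
# Hypothetical hand-built tree; thresholds and labels are illustrative only.
tree = {
    "feature": 0, "threshold": 5.0,        # test: x[0] < 5.0
    "left": {"label": "A"},                # branch taken when the test passes
    "right": {                             # branch taken when the test fails
        "feature": 1, "threshold": 13.0,   # test: x[1] < 13.0
        "left": {"label": "B"},
        "right": {"label": "C"},
    },
}

def predict(node, x):
    """Follow the binary tests from the root until a leaf is reached."""
    while "label" not in node:
        passed = x[node["feature"]] < node["threshold"]
        node = node["left"] if passed else node["right"]
    return node["label"]

print(predict(tree, (2.0, 20.0)))  # A
print(predict(tree, (7.0, 10.0)))  # B
print(predict(tree, (7.0, 15.0)))  # C
```

Each leaf corresponds to one rectangular region of the feature space, so every point in that region gets the same prediction.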

## Exercise 4
Consider the following simple classification setting with two classes, and S1 and S2 as two out of many possible first
splits of the data set.  
![image.png](attachment:image.png)  
- What are the other possible (first) splits? (note that we have a 2D space)  

Here we want to try three different impurity functions: a) classification error rate, b) Gini index, and c) entropy.  

Use all the three impurity functions, and compute by hand the **weighted average of impurities** of the child nodes, if:  
  
- we split the data by S1  
- we split the data by S2  

- Are the results different for S1 and S2 when we use:  
  - Classification error rate?  
  - Gini index?  
  - Entropy?  

One possible stopping condition is to stop when further splitting does not reduce impurity
(i.e., there is no information gain).  

Use the above impurity functions on the whole dataset (before splitting).  

- Based on each impurity function, is there any impurity reduction due to splitting the data by S1 or S2?  

- Which of the impurity functions do you prefer to use? Why?
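
For reference, the three impurity functions are usually defined as follows. The notation is an assumption, not taken from the exercise sheet: $\hat{p}_{mk}$ is the proportion of training points in node $m$ that belong to class $k$.

$$E_m = 1 - \max_k \hat{p}_{mk}, \qquad G_m = \sum_k \hat{p}_{mk}\,(1-\hat{p}_{mk}), \qquad D_m = -\sum_k \hat{p}_{mk}\log \hat{p}_{mk}$$

For a split into a left child with $n_L$ points and a right child with $n_R$ points, the weighted average of an impurity $I$ is

$$\frac{n_L}{n_L+n_R}\,I_L + \frac{n_R}{n_L+n_R}\,I_R .$$

The base of the logarithm in the entropy only rescales the values, and terms with $\hat{p}_{mk}=0$ are taken to be $0$.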

*Solution*:  
1. At x2=1.5 or x1=2.5.  
  
2.  
**S1**:  
    _Classification error_: 2 correct 0 incorrect vs. 12 correct 2 incorrect.  
    Error rate: left: $0/2=0$, right: $2/14\approx0.143$  
    Weighted average: $(2/16)\cdot0+(14/16)\cdot0.143\approx0.125$  
    _Gini index_:  
    Left: $(2/2)(1-(2/2))+(0/2)(1-(0/2))=0$  
    Right: $(2/14)(1-(2/14))+(12/14)(1-(12/14))\approx0.245$  
    Weighted average: $(2/16)\cdot0+(14/16)\cdot0.245\approx0.214$  
    _Entropy_ (using $\log_{10}$ and the convention $0\log0=0$):  
    Left: $-((2/2)\log(2/2)+(0/2)\log(0/2))=0$  
    Right: $-((2/14)\log(2/14)+(12/14)\log(12/14))\approx0.178$  
    Weighted average: $(2/16)\cdot0+(14/16)\cdot0.178\approx0.156$  

**S2**:  
    _Classification error_: 4 correct 2 incorrect vs. 10 correct 0 incorrect.  
    Error rate: left: $2/6\approx0.333$, right: $0/10=0$  
    Weighted average: $(6/16)\cdot0.333+(10/16)\cdot0\approx0.125$  
    _Gini index_:  
    Left: $(4/6)(1-(4/6))+(2/6)(1-(2/6))\approx0.444$  
    Right: $(0/10)(1-(0/10))+(10/10)(1-(10/10))=0$  
    Weighted average: $(6/16)\cdot0.444+(10/16)\cdot0\approx0.167$  
    _Entropy_ (using $\log_{10}$ and the convention $0\log0=0$):  
    Left: $-((4/6)\log(4/6)+(2/6)\log(2/6))\approx0.276$  
    Right: $-((0/10)\log(0/10)+(10/10)\log(10/10))=0$  
    Weighted average: $(6/16)\cdot0.276+(10/16)\cdot0\approx0.104$  

Both the Gini index and the entropy are lower for S2 than for S1, while the classification error rate is the same (0.125) for both splits.  
  
3. I would use the Gini index: it is easy to calculate, and unlike the classification error rate it distinguishes between S1 and S2 (the error rate is 0.125 for both).  
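
The hand calculations in part 2 can be checked with a few lines of Python. This is only a sketch: the function names are made up for this example, and the class counts per child node are read off the written solution above rather than from the figure. The order of the two counts within a child does not affect its impurity. The entropy uses base-10 logarithms by default because that is what the hand-computed values correspond to; pass `base=2` for bits.

```python
import math

def error_rate(counts):
    """Classification error rate of a node, given its class counts."""
    return 1 - max(counts) / sum(counts)

def gini_index(counts):
    """Gini index of a node, given its class counts."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)

def entropy(counts, base=10):
    """Entropy of a node; empty classes are skipped (convention 0 * log 0 = 0)."""
    n = sum(counts)
    # p * log(1/p) is the same as -p * log(p).
    return sum((c / n) * math.log(n / c, base) for c in counts if c > 0)

def weighted_impurity(children, impurity):
    """Average of the children's impurities, weighted by child size."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * impurity(c) for c in children)

# Class counts per child node, as used in the written solution (an assumption
# about the figure): S1 gives 2+0 and 2+12 points, S2 gives 4+2 and 0+10.
splits = {"S1": [(2, 0), (2, 12)], "S2": [(4, 2), (0, 10)]}

for name, children in splits.items():
    for impurity in (error_rate, gini_index, entropy):
        value = weighted_impurity(children, impurity)
        print(f"{name} {impurity.__name__}: {value:.3f}")
# S1 error_rate: 0.125
# S1 gini_index: 0.214
# S1 entropy: 0.156
# S2 error_rate: 0.125
# S2 gini_index: 0.167
# S2 entropy: 0.104
```

The same functions can be applied to the class counts of the whole data set to check whether a split reduces impurity at all.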

## Exercise 5
Consider the Gini index **or** entropy, **and** classification error, in the following simple classification setting with two
classes.  
![image.png](attachment:image.png)  
  
- Sketch a decision tree by hand for classifying this data.  

- Now compute the impurity of the whole data (based on one of the impurity functions above) - you can do it by
hand, but it might be easier to automate it with a few lines of code :)  
  
- Try every possible split, compute the impurity of each region and the weighted average of the children (or the
information gain) for the split. What is the best split? Is it the same as the first split in your hand-made decision
tree?  
  
- Continue splitting based on the information gain until there is no classification error.  

In [None]:
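# Solution
# 1. Hand-made tree: horizontal split at x2=2.5, then vertical splits in the
#    top region at x1=1.5 and x1=2.5, and a vertical split in the bottom
#    region at x1=1.5.
import numpy as np

# One row per point: x1, x2, class label.
data = np.array([
    [1.00, 3.00, 0],
    [2.00, 3.00, 1],
    [3.00, 3.00, 0],
    [2.00, 2.00, 0],
    [3.00, 2.00, 0],
    [1.00, 1.00, 1],
])

def gini(data):
    """Gini index of the class labels stored in the last column of data."""
    if len(data) == 0:
        return 0.0
    _, counts = np.unique(data[:, -1], return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

# 2. Impurity of the whole data set: 4 points of class 0, 2 of class 1.
print(gini(data))  # 2 * (4/6) * (2/6) = 0.444...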
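
One way to automate the remaining parts (try every split, pick the best one, keep splitting) is sketched below in plain Python. It assumes the Gini index as the impurity function, cutpoints at the midpoints between neighbouring feature values, and ties broken in favour of the first candidate found. The names `gini_of`, `candidate_splits`, `best_split`, and `grow` are made up for this example.

```python
from collections import Counter

# Same six points as in the exercise: (x1, x2, class label).
points = [
    (1.0, 3.0, 0), (2.0, 3.0, 1), (3.0, 3.0, 0),
    (2.0, 2.0, 0), (3.0, 2.0, 0), (1.0, 1.0, 1),
]

def gini_of(rows):
    """Gini index of the labels (last entry of each row); 0 for an empty set."""
    if not rows:
        return 0.0
    n = len(rows)
    counts = Counter(r[-1] for r in rows).values()
    return sum((c / n) * (1 - c / n) for c in counts)

def candidate_splits(rows):
    """Yield (feature index, cutpoint) at midpoints between sorted unique values."""
    for j in range(len(rows[0]) - 1):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            yield j, (lo + hi) / 2

def best_split(rows):
    """Return (gain, feature, cutpoint) of the split with the largest Gini reduction."""
    parent = gini_of(rows)
    best = (0.0, None, None)
    for j, t in candidate_splits(rows):
        left = [r for r in rows if r[j] < t]
        right = [r for r in rows if r[j] >= t]
        weighted = (len(left) * gini_of(left) + len(right) * gini_of(right)) / len(rows)
        # Small tolerance so floating-point noise is not counted as a gain.
        if parent - weighted > best[0] + 1e-12:
            best = (parent - weighted, j, t)
    return best

def grow(rows, depth=0):
    """Split recursively until no split reduces impurity; return the leaves' labels."""
    gain, j, t = best_split(rows)
    indent = "  " * depth
    if j is None:
        labels = sorted(r[-1] for r in rows)
        print(f"{indent}leaf: labels {labels}")
        return [labels]
    print(f"{indent}x{j + 1} < {t} (gain {gain:.3f})")
    left = grow([r for r in rows if r[j] < t], depth + 1)
    right = grow([r for r in rows if r[j] >= t], depth + 1)
    return left + right

leaves = grow(points)  # first line printed: x2 < 1.5 (gain 0.178)
```

On this data the first split chosen is at x2 = 1.5, not the x2 = 2.5 split from the hand-made tree: the weighted Gini index after splitting at x2 = 2.5 equals the impurity of the whole data set, so that split has no gain. Stopping when no split has positive gain does not guarantee pure leaves in general, but here every leaf ends up pure, so there is no classification error on the training data.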