# AI for Trading | Module 7 | L16: Decision Trees
Learn about machine learning from a bird's-eye view.

## 1. Welcome
![image.png](attachment:6231ffea-b576-4b86-b02d-c2b47ceef91d.png)

Welcome to this lesson on decision trees. We'll start with decision trees because they underlie the more complicated machine learning techniques you'll learn to apply on financial datasets. For several decision-tree-related topics, you'll learn from one of Udacity's best instructors, Luis Serrano. Luis was formerly a machine learning engineer at Google. He holds a PhD in mathematics from the University of Michigan.

## 2. Recommending Apps 1
- https://youtu.be/l34ijtQhVNk
- ![image.png](attachment:36587c67-ba00-446a-a3f9-28f9c8a5ebd8.png)


## 4. Recommending Apps 2
- https://youtu.be/uI_yNrqqKVg


## 5. Recommending Apps 3
- https://youtu.be/nEvW8B1HNq4
- ![image.png](attachment:380489c6-a500-486c-b3e7-469047f3b12e.png)


## 6. Tree Anatomy
- ![image.png](attachment:53cf3e70-c4a2-4e92-ba5b-a79fbd3d399e.png)
- ![image.png](attachment:8ada7708-37ce-4f81-83d6-82db39aae236.png)

## 7. Quiz: Student Admissions
- https://youtu.be/MOa335cQGI4
- ![image.png](attachment:4de7c204-fad5-421a-a502-479ea52df04c.png)


## 8. Solution: Student Admissions
- https://youtu.be/TdgBi6LtOB8
- At 0:55 and 1:10, the lines added to the decision tree should be horizontal, not vertical.

![image.png](attachment:a0e56eb3-2b52-4066-ba19-da979c85a22a.png)

## 9. Entropy
- https://youtu.be/piLpj1V1HEk

![image.png](attachment:2f6d5c31-737a-4350-a630-bf240730b4b5.png)

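Entropy can be understood by counting how many distinct ways the balls in a bucket can be rearranged (balls of the same color are interchangeable):
- Low entropy: the left bucket's 4 balls are all orange. The arrangement is rigid; however we shuffle the balls, there is only 1 distinct arrangement.
- Medium entropy: the middle bucket has 3 orange balls and 1 blue ball, which can be arranged in 4 distinct ways.
- High entropy: the right bucket has 2 orange and 2 blue balls, which can be arranged in 6 distinct ways.

Fundamentally: the more rigid the set (the fewer distinct arrangements it allows), the lower its entropy.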
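The arrangement counts above can be checked with a short sketch. Because same-colored balls are interchangeable, the number of distinct arrangements is just the number of ways to choose which of the 4 positions hold the blue balls, i.e. a binomial coefficient. The function name `num_arrangements` is illustrative, and the bucket compositions are the ones listed above.

```python
from math import comb

def num_arrangements(n_orange: int, n_blue: int) -> int:
    """Count distinct color orderings of a bucket of balls.

    Same-colored balls are interchangeable, so we only choose which of the
    n_orange + n_blue positions hold the blue balls.
    """
    return comb(n_orange + n_blue, n_blue)

buckets = {"low": (4, 0), "medium": (3, 1), "high": (2, 2)}
for level, (orange, blue) in buckets.items():
    print(f"{level} entropy: {num_arrangements(orange, blue)} arrangement(s)")
# low entropy: 1 arrangement(s)
# medium entropy: 4 arrangement(s)
# high entropy: 6 arrangement(s)
```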


## 10. Entropy Formula 1
- https://youtu.be/iZiSYrOKvpo

![image.png](attachment:f5075725-b0d4-4d18-9f48-cffdb2ad4d29.png)


## 11. Entropy Formula 2
- https://youtu.be/6GHg70hrSJw

![image.png](attachment:67e29d3b-796c-4620-8feb-8d2b00960fb7.png)


## 12. Entropy Formula 3
- https://youtu.be/w73JTBVeyjE
  - At 0:15, the fourth log should be log(0.25) instead of log(0.75). The final sum, -3.245, is still correct for log(0.75) + log(0.75) + log(0.75) + log(0.25), using a base of 2.
  - At 0:29, the value in the first row for P(Blue) should be 0, not 1.
  - At 1:22, the formula has a denominator of m-n. It should be m+n.

![image.png](attachment:d7e05795-dd24-4cae-8cb1-29074c0f4968.png)
![image.png](attachment:1ed1538f-9be5-443d-b10d-670de686db8a.png)

![image.png](attachment:1218bead-ad70-4138-a963-f7f7dca13026.png)
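A sketch of the two-class entropy formula, written with the corrected denominator m + n: for a bucket with m balls of one color and n of the other, entropy = -(m/(m+n)) * log2(m/(m+n)) - (n/(m+n)) * log2(n/(m+n)). It also checks the corrected 0:15 arithmetic: summing the base-2 logs of the four probabilities gives -3.245, and dividing by the 4 balls and negating gives 0.811, the same value the formula gives for 3 balls of one color and 1 of the other. The helper name `entropy_two_class` is hypothetical.

```python
from math import log2

def entropy_two_class(m: int, n: int) -> float:
    """Entropy in bits of a bucket with m balls of one color and n of another.

    Uses the denominator m + n. A color with zero balls contributes 0,
    the limit of p * log2(p) as p -> 0.
    """
    total = m + n
    result = 0.0
    for count in (m, n):
        if count > 0:
            p = count / total
            result -= p * log2(p)
    return result

# 3 balls of one color and 1 of the other: sum the base-2 logs of each ball's probability
log_sum = 3 * log2(0.75) + log2(0.25)
print(round(log_sum, 3))                  # -3.245
print(round(-log_sum / 4, 3))             # 0.811
print(round(entropy_two_class(3, 1), 3))  # 0.811
print(entropy_two_class(4, 0))            # 0.0 (all one color)
```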

## 13. Quiz: Do You Know Your Entropy?

```python
import numpy as np

# 4 red balls, 10 blue balls
m = 4
n = 10

# My translation of the two-class entropy formula from concept 12
results_duane = -((m / (m + n)) * np.log2(m / (m + n)) + (n / (m + n)) * np.log2(n / (m + n)))

# Answer from the course
results_udacity = -((4 / 14) * np.log2(4 / 14) + (10 / 14) * np.log2(10 / 14))

print(f'My Answer: {results_duane}')
print(f'From Course: {results_udacity}')
```

```
My Answer: 0.863120568566631
From Course: 0.863120568566631
```


## 14. Multiclass Entropy
![image.png](attachment:5b3afda0-3fc6-461a-b453-04be6e61ac50.png)



```python
import numpy as np

# 14. Multiclass Entropy | Question
# If we have a bucket with eight red balls, three blue balls, and two yellow balls, what
# is the entropy of the set of balls? Input your answer to at least three decimal places.
red = 8
blue = 3
yellow = 2
total = red + blue + yellow

# Probability of drawing each color
p_red = red / total
p_blue = blue / total
p_yellow = yellow / total

# Multiclass entropy: -sum(p_i * log2(p_i)) over all colors
results_duane = -(p_red * np.log2(p_red)) - (p_blue * np.log2(p_blue)) - (p_yellow * np.log2(p_yellow))

print(f'My Results: {results_duane}')
```

```
My Results: 1.3346791410515946
```
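The cell above writes out one term per color. A sketch that generalizes it to any number of classes by counting labels with `collections.Counter`; the function name `entropy` and the list-of-labels input are illustrative choices. Writing each term as p * log2(1/p) keeps every term non-negative, so a single-class bucket returns 0.0.

```python
from collections import Counter
from math import log2

def entropy(labels) -> float:
    """Shannon entropy in bits of a sequence of class labels (any number of classes).

    Each term is p * log2(1 / p), which equals -p * log2(p).
    """
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in counts.values())

bucket = ["red"] * 8 + ["blue"] * 3 + ["yellow"] * 2
print(round(entropy(bucket), 4))                       # 1.3347
print(round(entropy(["red"] * 4 + ["blue"] * 10), 4))  # 0.8631 (the section 13 quiz)
print(entropy(["red"] * 5))                            # 0.0
```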


## 15. Quiz: Information Gain
- https://youtu.be/tVLOLPEtLFw
- ![image.png](attachment:6459b003-98b3-406a-8625-38d8cc165932.png)
- ![image.png](attachment:19bf02e7-8130-4ff6-b5d2-d63b6e73378a.png)


## 16. Solution: Information Gain
- https://youtu.be/k9iZL53PAmw
- ![image.png](attachment:1a820ff4-fd12-4da5-bc7c-34ff18fcc8bc.png)
- ![image.png](attachment:e7258ac7-76ea-4796-b757-15c85991bb3e.png)
- ![image.png](attachment:302c063a-f662-4b83-a90a-b9d98a5d10c2.png)


## 17. Maximizing Information Gain
- https://youtu.be/3FgJOpKfdY8
- ![image.png](attachment:9d87225f-cf03-4d82-b75b-b96ca891872c.png)
- ![image.png](attachment:81b5bbd1-ccde-424d-98a3-7c6c4efa2a42.png)
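When a tree is grown with entropy as the criterion, each node is split on the candidate with the largest information gain. A minimal sketch for a single numeric feature, assuming candidate thresholds at the midpoints between consecutive distinct values and splits of the form `value < threshold`. The function names and toy data are hypothetical, and real libraries differ in the details of their split search.

```python
from collections import Counter
from math import log2

def entropy(labels) -> float:
    """Shannon entropy in bits of a sequence of class labels."""
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value < threshold` with maximum
    information gain, trying midpoints between consecutive distinct values.
    Returns (None, 0.0) if no split has positive gain."""
    parent_entropy = entropy(labels)
    n = len(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        # Information gain = parent entropy - size-weighted child entropy
        gain = parent_entropy - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical toy data: one numeric feature and two classes
lengths = [12, 14, 15, 16, 18, 19, 21, 22]
species = ["A", "A", "A", "B", "B", "B", "B", "B"]
threshold, gain = best_threshold(lengths, species)
print(threshold, round(gain, 3))  # 15.5 0.954 (both children are pure)
```

Splitting at 15.5 separates the classes perfectly, so the gain equals the parent's entropy, the largest gain possible.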


## 18. Calculating Information Gain on a Dataset
![image.png](attachment:22f07dc1-4187-4beb-8c6c-a46aa2b1736a.png)
![image.png](attachment:26c615d5-78c4-4005-92d4-866be4618476.png)

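```python
import numpy as np

# 18. Calculating Information Gain on a Dataset | Quiz
def two_group_ent(first, tot):
    """Entropy (in bits) of a group of `tot` samples, `first` of which belong to one class."""
    return -(
        first / tot * np.log2(first / tot) +
        (tot - first) / tot * np.log2((tot - first) / tot)
    )

# Entropy of the full dataset: 10 of 24 samples in one class
tot_ent = two_group_ent(10, 24)

# Size-weighted entropy of the two groups after the split (15 and 9 samples)
g17_ent = 15 / 24 * two_group_ent(11, 15) + 9 / 24 * two_group_ent(6, 9)

# Information gain = parent entropy - weighted child entropy
answer = tot_ent - g17_ent

print(answer)
```

```
0.11260735516748976
```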
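The quiz cell hard-codes the group sizes. A sketch of the same computation on raw labels: information gain is the parent's entropy minus the size-weighted entropy of the child groups. The quiz dataset itself is not reproduced here; the labels `"A"` and `"B"` are hypothetical placeholders chosen only to match the counts used above (a parent of 24 with 10 in one class, and children of 15 and 9 samples whose majority classes have 11 and 6 members).

```python
from collections import Counter
from math import log2

def entropy(labels) -> float:
    """Shannon entropy in bits of a sequence of class labels."""
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())

def information_gain(parent, children) -> float:
    """Entropy of `parent` minus the size-weighted average entropy of `children`.

    `children` is a list of label sequences that together partition `parent`.
    """
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels matching the quiz counts: 10 "A" and 14 "B" in the parent
parent = ["A"] * 10 + ["B"] * 14
left = ["A"] * 4 + ["B"] * 11   # 15 samples, 11 in the majority class
right = ["A"] * 6 + ["B"] * 3   # 9 samples, 6 in the majority class
print(round(information_gain(parent, [left, right]), 4))  # 0.1126
```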


## 19. Gini Impurity
![image.png](attachment:1b46388f-69db-45ee-9c2e-0c9b1878c5e4.png)
![image.png](attachment:fec58355-c592-4c1e-8f1c-ad7829ebafd0.png)

![image.png](attachment:e043da7e-31c0-4ee2-90b6-0658022e38d7.png)

![image.png](attachment:d05ce24a-5e64-4024-af23-233e0033fc1e.png)

### Resources
- [Serrano.Academy - Gini Impurity Index explained in 8 minutes](https://youtu.be/u4IxOk2ijSs)
- [scikit-learn `DecisionTreeClassifier` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html)


```python
# 19. Gini Impurity: Gini = 1 - sum(p_k^2) over the classes k

# Parent node
blue = 10
red = 10
Gp = 1 - (blue / (blue + red))**2 - (red / (blue + red))**2
print(f'Gini Parent = {Gp}')

# Left child
blue_1 = 8
red_1 = 2
G1 = 1 - (blue_1 / (blue_1 + red_1))**2 - (red_1 / (blue_1 + red_1))**2
print(f'Gini 1 = {G1}')

# Right child
blue_2 = 2
red_2 = 8
G2 = 1 - (blue_2 / (blue_2 + red_2))**2 - (red_2 / (blue_2 + red_2))**2
print(f'Gini 2 = {G2}')

# Gini gain: parent impurity minus the size-weighted impurity of the children
# (both children hold 10 samples here, so this equals the simple average)
n1 = blue_1 + red_1
n2 = blue_2 + red_2
G_gain = Gp - (n1 / (n1 + n2) * G1 + n2 / (n1 + n2) * G2)
print(f'Gini Gain = {G_gain}')
```

```
Gini Parent = 0.5
Gini 1 = 0.31999999999999984
Gini 2 = 0.31999999999999984
Gini Gain = 0.18000000000000016
```
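The cell above works with two children of equal size. A sketch of the general version, assuming per-class counts as input: Gini impurity is 1 minus the sum of squared class proportions, and the Gini gain weights each child by its share of the parent's samples. The function names are hypothetical, and the last example uses made-up children of unequal size to show where the weighting matters.

```python
def gini(counts) -> float:
    """Gini impurity 1 - sum(p_k ** 2) from a list of per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent_counts, children_counts) -> float:
    """Parent Gini minus the size-weighted average Gini of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * gini(child) for child in children_counts)
    return gini(parent_counts) - weighted

# [blue, red] counts from the Gini cell above
print(round(gini([10, 10]), 4))                         # 0.5
print(round(gini_gain([10, 10], [[8, 2], [2, 8]]), 4))  # 0.18

# Hypothetical unequal split: children of 13 and 7 samples
print(round(gini_gain([10, 10], [[9, 4], [1, 6]]), 4))  # 0.1374 (a plain average would give 0.1645)
```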


## 20. Hyperparameters for Decision Trees
To build decision trees that generalize well to new data, we can tune several aspects of how the tree is grown. These settings are called "hyperparameters". Some of the most important hyperparameters for decision trees are:

### Maximum Depth
![image.png](attachment:921c117c-ec08-40d6-88a9-d9e06d7429d0.png)

### Minimum number of samples to split
![image.png](attachment:34d3f4e7-89e1-4249-b365-d68012663efc.png)

### Minimum number of samples per leaf
![image.png](attachment:d71a94e1-bb9b-45a7-80f8-3c60a562de92.png)
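A minimal from-scratch sketch (not scikit-learn's implementation) of how these three hyperparameters act as stopping rules while a tree is grown greedily with Gini impurity. The function names, the dict-based node format, the default `max_depth=3`, and the toy data are hypothetical; the extra rule that a node splits only if the best split lowers its impurity is a choice made for this sketch.

```python
from collections import Counter

def gini(labels) -> float:
    """Gini impurity 1 - sum(p_k ** 2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def make_leaf(y):
    """Leaf node: predicts the majority label and records how many samples reached it."""
    return {"leaf": Counter(y).most_common(1)[0][0], "samples": len(y)}

def build_tree(X, y, max_depth=3, min_samples_split=2, min_samples_leaf=1, depth=0):
    """Greedily grow a binary tree on rows X (lists of numbers) with labels y."""
    # max_depth / min_samples_split: stop growing this branch; also stop if the node is pure
    if depth >= max_depth or len(y) < min_samples_split or len(set(y)) == 1:
        return make_leaf(y)

    best = None  # (weighted child Gini, feature index, threshold, left rows, right rows)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighboring values
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            if min(len(left), len(right)) < min_samples_leaf:
                continue  # min_samples_leaf: this split would leave a child too small
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)

    # No admissible split, or none that lowers impurity: make a leaf
    if best is None or best[0] >= gini(y):
        return make_leaf(y)

    _, f, t, left, right = best
    params = dict(max_depth=max_depth, min_samples_split=min_samples_split,
                  min_samples_leaf=min_samples_leaf, depth=depth + 1)
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], **params),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], **params),
    }

def tree_depth(node) -> int:
    """Number of split levels below this node (0 for a leaf)."""
    if "leaf" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

def leaf_sizes(node) -> list:
    """Sample counts of all leaves under this node."""
    if "leaf" in node:
        return [node["samples"]]
    return leaf_sizes(node["left"]) + leaf_sizes(node["right"])

# Hypothetical toy data: one numeric feature, two classes
X = [[x] for x in range(10)]
y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

for settings in ({"max_depth": 10}, {"max_depth": 1},
                 {"min_samples_leaf": 3}, {"min_samples_split": 11}):
    tree = build_tree(X, y, **settings)
    print(settings, "depth:", tree_depth(tree), "leaf sizes:", leaf_sizes(tree))
```

In scikit-learn, the same controls are the `max_depth`, `min_samples_split`, and `min_samples_leaf` arguments of `DecisionTreeClassifier`; its split search and tie-breaking differ from this sketch.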
