# Dataset

![image.png](attachment:6313adc2-c4bb-4d4e-8e31-3e3a35f320fb.png)

# Root Node

* The first task is to determine the root node: which attribute, w.r.t. the attribute we wish to make a decision about (`Played?`), maximizes the information gain, i.e. leaves `Played?` with the lowest entropy? General entropy is as follows:

$$
S_X = -\sum_{i=1}^{c} p_i \log_2(p_i)
$$

So this is the Entropy of a given random variable $X$, where $i$ indexes the values $X$ can take (in our demonstration, the random variables are the attributes, and the values are the features!). $X$ takes $c$ distinct values, and $p_i$ is the probability of the $i$-th value occurring in $X$.
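The formula translates almost directly into code. A minimal Python sketch, assuming the probabilities are already known; the function name `entropy` is illustrative. Zero probabilities are skipped, which implements the usual convention $0\log_2 0 = 0$.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: S = -sum_i p_i * log2(p_i).

    Zero probabilities are skipped, following the convention 0 * log2(0) = 0
    (math.log2(0) would raise a ValueError).
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Two equally likely values give exactly 1 bit; a certain outcome gives 0 bits.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([1.0, 0.0]))  # 0.0
```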

1) To find the root node, first calculate the Entropy of the `Played?` attribute. Recall that Entropy is a measure of uncertainty: the more uncertainty (Entropy) an attribute leaves, the worse it is at making a decision.

$$
S_\text{Played?} = -\frac{10}{14}\log_2(\frac{10}{14})-\frac{4}{14}\log_2(\frac{4}{14}) = 0.863~\text{bits}
$$
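In code, the $p_i$ are just relative frequencies of each label. A minimal Python sketch that estimates them with `collections.Counter`; the function name is illustrative, and the labels below assume the majority class (10 of 14) is `Yes`. Entropy is unchanged if the labels are swapped.

```python
import math
from collections import Counter

def entropy_of_labels(labels):
    """Entropy in bits of a list of class labels, estimating p_i as count / total."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

played = ["Yes"] * 10 + ["No"] * 4
print(round(entropy_of_labels(played), 3))  # 0.863
```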

2) Entropy of `Temperature` w.r.t. `Played?`

* `Temperature` takes three values: `Mild`, `Hot`, and `Cool`. For each value, compute the entropy of `Played?` over the subset of rows with that `Temperature`.

$$
S_\text{Mild, Played?} = -\frac{5}{7}\log_2(\frac{5}{7})-\frac{2}{7}\log_2(\frac{2}{7}) = 0.863~\text{bits}
$$

$$
S_\text{Hot, Played?} = -\frac{2}{3}\log_2(\frac{2}{3})-\frac{1}{3}\log_2(\frac{1}{3}) = 0.918~\text{bits}
$$

$$
S_\text{Cool, Played?} = -\frac{3}{4}\log_2(\frac{3}{4})-\frac{1}{4}\log_2(\frac{1}{4}) = 0.811~\text{bits}
$$

* Now, the Entropy of `Temperature` w.r.t. `Played?` combines these three entropies, since `Temperature` can be `Mild` OR `Hot` OR `Cool`. But rather than a plain sum, each term is weighted by the fraction of rows that take that value, so larger subsets count for more. `Mild` covers 7 of the 14 rows, so it carries the most weight. Our equation ends up looking like...

$$
S_\text{Temperature, Played?} = \frac{7}{14}S_\text{Mild, Played?} + \frac{3}{14}S_\text{Hot, Played?} + \frac{4}{14}S_\text{Cool, Played?} = 0.860~\text{bits}
$$
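The weighting can be sketched in Python as below. `weighted_entropy` and the count dictionary are illustrative names, not from the original; each tuple holds the two `Played?` class counts for one `Temperature` value, read off the fractions above.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a class distribution given as raw counts (zeros skipped)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(partition):
    """Average the per-value entropies, weighting each by the share of rows it covers."""
    n = sum(sum(counts) for counts in partition.values())
    return sum(sum(counts) / n * entropy_from_counts(counts)
               for counts in partition.values())

temperature = {"Mild": (5, 2), "Hot": (2, 1), "Cool": (3, 1)}
print(round(weighted_entropy(temperature), 3))  # 0.86
```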

3) Entropy of `Outlook` w.r.t. `Played?`

* Using the values `Sunny`, `Rain`, and `Overcast`.

$$
S_\text{Sunny, Played?} = -\frac{2}{5}\log_2(\frac{2}{5})-\frac{3}{5}\log_2(\frac{3}{5}) = 0.971~\text{bits}
$$

$$
S_\text{Rain, Played?} = -\frac{3}{5}\log_2(\frac{3}{5})-\frac{2}{5}\log_2(\frac{2}{5}) = 0.971~\text{bits}
$$

* 0 bits! All `Overcast` measurements result in Yes, so there is no uncertainty here. (By convention, $0\log_2 0 = 0$.)
$$
S_\text{Overcast, Played?} = -\frac{4}{4}\log_2(\frac{4}{4})-\frac{0}{4}\log_2(\frac{0}{4}) = 0.000~\text{bits}
$$

$$
S_\text{Outlook, Played?} = \frac{5}{14}S_\text{Sunny, Played?} + \frac{5}{14}S_\text{Rain, Played?} + \frac{4}{14}S_\text{Overcast, Played?} = 0.694~\text{bits}
$$
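A minimal Python sketch of this step, mainly to show how the `Overcast` term is handled: a zero count is skipped, which implements the convention $0\log_2 0 = 0$ (calling `math.log2(0)` directly raises a `ValueError`). The count dictionary is an illustrative encoding of the fractions above; each tuple holds the two `Played?` class counts for one value, and their order does not affect entropy. Computed exactly, the weighted entropy is about 0.6935 bits.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits; zero counts are skipped (0 * log2(0) is taken as 0)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

outlook = {"Sunny": (2, 3), "Rain": (3, 2), "Overcast": (4, 0)}
n = sum(sum(c) for c in outlook.values())  # 14 rows in total

# Entropy of Played? within each Outlook value, then the size-weighted average.
per_value = {value: entropy_from_counts(c) for value, c in outlook.items()}
weighted = sum(sum(c) / n * per_value[value] for value, c in outlook.items())

print(per_value["Overcast"])  # 0.0
print(round(weighted, 4))     # 0.6935
```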

4) Entropy of `Humidity` w.r.t. `Played?`

* This is a numerical attribute, so we need a slightly different approach. If we choose a threshold, we can represent this random variable as having 2 possible values: above or below the threshold (with values equal to it assigned to one side). Either way, the math then works the same as before.
* From a Piazza post: humidity ≤ 75 is low humidity, and humidity > 75 is high humidity.
* And just like that, we have 2 values to choose from. Their entropies are below.

$$
S_\text{Low, Played?} = -\frac{6}{7}\log_2(\frac{6}{7})-\frac{1}{7}\log_2(\frac{1}{7}) = 0.592~\text{bits}
$$

$$
S_\text{High, Played?} = -\frac{4}{7}\log_2(\frac{4}{7})-\frac{3}{7}\log_2(\frac{3}{7}) = 0.985~\text{bits}
$$

$$
S_\text{Humidity, Played?} = \frac{7}{14}S_\text{Low, Played?} + \frac{7}{14}S_\text{High, Played?} = 0.788~\text{bits}
$$
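The thresholding step can be sketched in Python as follows. `humidity_level` is a hypothetical helper, the threshold of 75 comes from the Piazza post, and the readings are made-up values chosen only to show which side of the boundary equality falls on.

```python
def humidity_level(humidity, threshold=75):
    """Map a numeric humidity reading onto the two-valued attribute: <= threshold is Low."""
    return "Low" if humidity <= threshold else "High"

# Made-up readings, only to show where the boundary falls.
readings = [65, 75, 76, 90]
print([humidity_level(h) for h in readings])  # ['Low', 'Low', 'High', 'High']
```

Once every row is mapped this way, `Humidity` is handled exactly like the categorical attributes.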

5) Entropy of `Windy` w.r.t. `Played?`

$$
S_\text{No, Played?} = -\frac{7}{8}\log_2(\frac{7}{8})-\frac{1}{8}\log_2(\frac{1}{8}) = 0.544~\text{bits}
$$

$$
S_\text{Yes, Played?} = -\frac{3}{6}\log_2(\frac{3}{6})-\frac{3}{6}\log_2(\frac{3}{6}) = 1.000~\text{bits}
$$

$$
S_\text{Windy, Played?} = \frac{8}{14}S_\text{No, Played?} + \frac{6}{14}S_\text{Yes, Played?} = 0.739~\text{bits}
$$
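Steps 2 through 5 all follow the same recipe: group the rows by the attribute's value, take the entropy of `Played?` within each group, and weight by group size. A minimal Python sketch of that recipe working on raw (attribute value, label) pairs; the function names are illustrative, and the `Windy` rows are rebuilt from the fractions above under the assumption that the first count in each fraction is `Yes`.

```python
import math
from collections import Counter, defaultdict

def entropy_of_labels(labels):
    """Entropy in bits of a list of labels, using relative frequencies as p_i."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def conditional_entropy(rows):
    """Group (value, label) rows by value, then size-weight each group's label entropy."""
    groups = defaultdict(list)
    for value, label in rows:
        groups[value].append(label)
    n = len(rows)
    return sum(len(labels) / n * entropy_of_labels(labels) for labels in groups.values())

# Hypothetical (Windy, Played?) rows reconstructed from the counts 7/1 and 3/3.
rows = ([("No", "Yes")] * 7 + [("No", "No")] * 1
        + [("Yes", "Yes")] * 3 + [("Yes", "No")] * 3)
print(round(conditional_entropy(rows), 3))  # 0.739
```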


6) Comparing Information Gains

* We can now compute the Information Gain: the Entropy of the attribute we are trying to decide (`Played?`) minus the weighted Entropy of `Played?` after splitting on a candidate attribute. Whichever difference is greatest has the Maximum Gain, and that attribute should be used for the node.

$$
IG_\text{Played?, Temperature} = S_\text{Played?} - S_\text{Temperature, Played?} = 0.003~\text{bits}
$$

$$
IG_\text{Played?, Outlook} = S_\text{Played?} - S_\text{Outlook, Played?} = 0.170~\text{bits}
$$

$$
IG_\text{Played?, Humidity} = S_\text{Played?} - S_\text{Humidity, Played?} = 0.075~\text{bits}
$$

$$
IG_\text{Played?, Windy} = S_\text{Played?} - S_\text{Windy, Played?} = 0.124~\text{bits}
$$

7) `Outlook` has the largest information gain (0.170 bits), so it is the root node. The decision tree so far is below.


![image.png](attachment:8b64a981-7718-4e3e-a458-a5929c4fd2e2.png)
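As a check on the root-node choice, here is a minimal Python sketch that recomputes every information gain from the class counts in the fractions above and picks the attribute with the largest gain. The helper names and count dictionaries are illustrative, not part of the original notebook; each tuple holds the two `Played?` class counts for one value (order does not affect entropy), and the parent counts (10, 4) are passed in explicitly.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a class distribution given as counts; zero counts are skipped."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, partition):
    """IG = S(parent) - sum over values v of (|D_v| / |D|) * S(D_v)."""
    n = sum(parent_counts)
    weighted = sum(sum(c) / n * entropy_from_counts(c) for c in partition.values())
    return entropy_from_counts(parent_counts) - weighted

# Hypothetical encoding of the counts behind the fractions in this section.
# Humidity is already binarized at 75 into Low/High.
played = (10, 4)
attributes = {
    "Temperature": {"Mild": (5, 2), "Hot": (2, 1), "Cool": (3, 1)},
    "Outlook": {"Sunny": (2, 3), "Rain": (3, 2), "Overcast": (4, 0)},
    "Humidity": {"Low": (6, 1), "High": (4, 3)},
    "Windy": {"No": (7, 1), "Yes": (3, 3)},
}

gains = {name: information_gain(played, part) for name, part in attributes.items()}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gain:.3f} bits")
root = max(gains, key=gains.get)
print("Root node:", root)  # Root node: Outlook
```

The same `information_gain` call can be reused for each child node by passing that branch's subset counts.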

# Sunny Node