<h1 style="font-size: 1.6rem; font-weight: bold">ITO 5047: Fundamentals of Artificial Intelligence</h1>
<h1 style="font-size: 1.6rem; font-weight: bold">Tutorial 4 - Decision Trees</h1>
<p style="margin-top: 5px; margin-bottom: 5px;">Monash University Australia</p>
<p style="margin-top: 5px; margin-bottom: 5px;">Jupyter Notebook by: Tristan Sim Yook Min</p>

---

### **1) Decision Tree**

Consider the following data that deals with playing ball in different weather conditions (it’s a reduced version of what we saw in the topic book videos). The data has been sorted in descending order based on the temperature attribute. All attributes are categorical other than Temperature.

In this question, we will see an approach for incorporating continuous attributes into the decision tree algorithm. We need to define a split point p for the continuous attribute so that the data is split into the instances whose attribute value is ≤ p and those whose value is > p.

| Outlook   | Temperature | Humidity | Play Ball |
|-----------|-------------|----------|-----------|
| Sunny     | 30          | High     | No        |
| Rain      | 22          | High     | No        |
| Sunny     | 20          | Normal   | Yes       |
| Overcast  | 18          | High     | Yes       |
| Overcast  | 9           | Normal   | Yes       |
| Rain      | 6           | Normal   | No        |

With this, the continuous attribute can be turned into a categorical attribute. The optimal split can be found by considering all possible split points and taking the one with the highest information gain. With the data sorted on the continuous attribute, we can find the possible split points by identifying adjacent values between which the target class changes. In this data, for example, the day with temperature 6 has Play Ball = No and the day with temperature 9 has Play Ball = Yes. The value for p is set to the mean of the two attribute values, so the split between 6 and 9 has p = 7.5. The information gain for that split point can then be computed by partitioning the data into the set with temperatures less than or equal to 7.5 and the set with temperatures greater than 7.5.

Use this approach to identify all potential split points for the Temperature attribute in this data set. Compute the information gain for each possible split point and determine the best choice.

<br>

#### **Step 1: Calculate Original Entropy**

**Original dataset**: [3+, 3-] (3 Yes, 3 No)

$$H(S) = -\frac{3}{6} \times \log_2\left(\frac{3}{6}\right) - \frac{3}{6} \times \log_2\left(\frac{3}{6}\right)$$
$$H(S) = -0.5 \times \log_2(0.5) - 0.5 \times \log_2(0.5)$$
$$H(S) = -0.5 \times (-1.0) - 0.5 \times (-1.0) = 1.0$$
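
The entropy calculation above can be checked with a few lines of Python. This is a minimal sketch, not part of the original tutorial: the function name `entropy` and the list `play_ball` (the Play Ball column in table order) are assumptions introduced for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the classes that actually occur.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Play Ball column, in the same order as the table (illustrative variable name).
play_ball = ["No", "No", "Yes", "Yes", "Yes", "No"]

h_s = entropy(play_ball)
print(f"H(S) = {h_s:.2f}")  # H(S) = 1.00
```

A pure subset, such as `["No", "No"]`, gives an entropy of 0 with the same function.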

#### **Step 2: Identify Potential Split Points**

Looking at each pair of consecutive temperature values (in ascending order) and checking whether Play Ball changes:

1. **Between 6 and 9**: Play Ball changes from No to Yes → p = $\frac{6+9}{2} = 7.5$
2. **Between 9 and 18**: Play Ball stays Yes → No change, skip
3. **Between 18 and 20**: Play Ball stays Yes → No change, skip
4. **Between 20 and 22**: Play Ball changes from Yes to No → p = $\frac{20+22}{2} = 21$
5. **Between 22 and 30**: Play Ball stays No → No change, skip

**Valid split points**: p = 7.5 and p = 21
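
The search for candidate split points can be sketched in Python as follows. The helper name `candidate_split_points` and the two lists are assumptions made for illustration, and the sketch assumes that no temperature value appears twice with different labels.

```python
def candidate_split_points(values, labels):
    """Return midpoints between adjacent sorted values where the label changes."""
    # Sort the (value, label) pairs in ascending order of the attribute.
    pairs = sorted(zip(values, labels))
    points = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        # A boundary exists only where the class label changes.
        if y1 != y2 and v1 != v2:
            points.append((v1 + v2) / 2)
    return points


# Temperature and Play Ball columns in table order (illustrative names).
temperature = [30, 22, 20, 18, 9, 6]
play_ball = ["No", "No", "Yes", "Yes", "Yes", "No"]

split_points = candidate_split_points(temperature, play_ball)
print(split_points)  # [7.5, 21.0]
```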

#### **Step 3: Calculate Information Gain for Each Split Point**

**Split Point p = 7.5 (Temperature ≤ 7.5 vs > 7.5)**

**Left subset (≤ 7.5)**: Temperature 6
- Data: [0+, 1-] (0 Yes, 1 No)
- Size: 1

$H(S_{\le 7.5}) = 0$ (pure subset)

**Right subset (> 7.5)**: Temperatures 9, 18, 20, 22, 30
- Data: [3+, 2-] (3 Yes, 2 No)
- Size: 5

$H(S_{>7.5}) = -\frac{2}{5} \times \log_2\left(\frac{2}{5}\right) - \frac{3}{5} \times \log_2\left(\frac{3}{5}\right) = 0.97$

**Information Gain**:
$$IG(S, p=7.5) = H(S) - \frac{1}{6} \times H(S_{\le 7.5}) - \frac{5}{6} \times H(S_{>7.5})$$
$$IG(S, p=7.5) = 1 - \frac{1}{6} \times 0 - \frac{5}{6} \times 0.97$$
$$IG(S, p=7.5) = 1 - 0.81 = 0.19$$

**Split Point p = 21 (Temperature ≤ 21 vs > 21)**

**Left subset (≤ 21)**: Temperatures 6, 9, 18, 20
- Data: [3+, 1-] (3 Yes, 1 No)
- Size: 4

$H(S_{\le 21}) = -\frac{1}{4} \times \log_2\left(\frac{1}{4}\right) - \frac{3}{4} \times \log_2\left(\frac{3}{4}\right) = 0.81$

**Right subset (> 21)**: Temperatures 22, 30
- Data: [0+, 2-] (0 Yes, 2 No)
- Size: 2

$H(S_{>21}) = 0$ (pure subset)

**Information Gain**:
$$IG(S, p=21) = H(S) - \frac{4}{6} \times H(S_{\le 21}) - \frac{2}{6} \times H(S_{>21})$$
$$IG(S, p=21) = 1 - \frac{4}{6} \times 0.81 - \frac{2}{6} \times 0$$
$$IG(S, p=21) = 1 - 0.54 = 0.46$$

#### **Step 4: Determine Best Split Point**

Comparing the information gains:
- **p = 7.5**: IG = 0.19
- **p = 21**: IG = 0.46
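
The two information gains can be reproduced with a short, self-contained Python sketch. The function names `entropy` and `information_gain`, and the variables `temperature`, `play_ball`, `gains`, and `best_p`, are assumptions chosen for illustration rather than anything defined in the tutorial.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, p):
    """Information gain of splitting into value <= p and value > p."""
    left = [y for v, y in zip(values, labels) if v <= p]
    right = [y for v, y in zip(values, labels) if v > p]
    n = len(labels)
    # Weighted average entropy of the two subsets after the split.
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Temperature and Play Ball columns in table order (illustrative names).
temperature = [30, 22, 20, 18, 9, 6]
play_ball = ["No", "No", "Yes", "Yes", "Yes", "No"]

# Evaluate the two candidate split points identified in Step 2.
gains = {p: information_gain(temperature, play_ball, p) for p in (7.5, 21)}
best_p = max(gains, key=gains.get)

for p, gain in gains.items():
    print(f"p = {p}: IG = {gain:.2f}")
print(f"Best split point: p = {best_p}")
```

Rounded to two decimals, this gives 0.19 for p = 7.5 and 0.46 for p = 21, matching the hand calculation.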

#### **Conclusion**

**The best split point is p = 21** with an information gain of 0.46.

This split creates:
- **Left branch (Temperature ≤ 21)**: Contains temperatures 6, 9, 18, 20 with outcomes [3+, 1-]
- **Right branch (Temperature > 21)**: Contains temperatures 22, 30 with outcomes [0+, 2-]

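Because 0.46 > 0.19, the split at p = 21 is the optimal choice for the continuous Temperature attribute. It produces a pure right branch (both days are No) and a left branch with only one No among four days, whereas p = 7.5 isolates a single day and leaves the remaining five almost evenly mixed.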
