# Tree Lab (10 pts)

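In this lab, you will implement a decision tree and validate it on a simple dataset.

## Preparation

### Environment Setup

Make sure the following dependencies are installed, then run the code below to import and verify them.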

In [1]:
import pandas as pd
import numpy as np
from autograder import *
from utils import *
from codes import *
import random

np.random.seed(0)
random.seed(0)

# reload modifications on imported modules
%load_ext autoreload
%autoreload 2

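### Dataset Preparation
We will build the decision tree on the following dataset. It contains 7 features and one label, "suitable for a PhD or not". The features describe conditions for being suited to a PhD, such as `I love doing research` and `I absolutely want to be a college professor`.

Run the code below to load the dataset.

(Attribution note) Adapted from https://zhuanlan.zhihu.com/p/372884253. The dataset was generated by GPT-4, and the resulting decision tree may not match the reference exactly.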

In [2]:
# Load the datasets for training and testing
train_data = pd.read_csv('train_phd_data.csv')
test_data = pd.read_csv('test_phd_data.csv')

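## Building the Decision Tree
In this part, you will learn about and implement decision tree construction. Note: pruning is not considered. Construction stops when all instances in the data belong to the same class, or when the features can no longer be split (that is, every feature takes a single value across the data).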
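
The two stopping conditions can be checked directly with pandas. The following is a minimal sketch, not the lab's reference code: the helper name `should_stop`, its `label_col` parameter, and the toy column names are assumptions made for illustration.

```python
import pandas as pd

def should_stop(data: pd.DataFrame, label_col: str) -> bool:
    """Return True when a node should become a leaf (hypothetical helper)."""
    # Condition 1: every instance carries the same label.
    if data[label_col].nunique() <= 1:
        return True
    # Condition 2: no feature can split the data further,
    # i.e. every feature column holds a single distinct value.
    features = data.drop(columns=[label_col])
    return bool((features.nunique() <= 1).all())

# Labels differ, but both features are constant, so no split is possible.
toy = pd.DataFrame({"a": [1, 1], "b": [0, 0], "label": [1, 0]})
print(should_stop(toy, "label"))  # True
```

When the second condition triggers on a node with mixed labels, a common choice is to label the leaf with the majority class.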

We use the **information gain ratio** as the splitting criterion.
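
For reference, the standard C4.5-style definitions behind this criterion are sketched below. The notation and the base-2 logarithm are assumptions; the lab's autograder may follow a different convention.

$$
\begin{aligned}
H(D) &= -\sum_{k} p_k \log_2 p_k \\
\mathrm{Gain}(D, A) &= H(D) - \sum_{v} \frac{|D_v|}{|D|}\, H(D_v) \\
\mathrm{SplitInfo}(D, A) &= -\sum_{v} \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|} \\
\mathrm{GainRatio}(D, A) &= \frac{\mathrm{Gain}(D, A)}{\mathrm{SplitInfo}(D, A)}
\end{aligned}
$$

Here $D$ is the set of instances at a node, $p_k$ is the fraction of instances in $D$ with label $k$, $A$ is a candidate feature, and $D_v$ is the subset of $D$ where $A$ takes the value $v$. The gain ratio is undefined when $\mathrm{SplitInfo}(D, A) = 0$, which happens exactly when $A$ takes a single value on $D$.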

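Complete the following functions in `codes.py`:

1. **Compute the information entropy of the data, `getInfoEntropy()` (3.33 pts)**
2. **Find the best feature by information gain ratio, `find_best_feature()` (3.33 pts)**

You may need `pandas` library functions; see the [official pandas documentation](https://pandas.pydata.org/docs/reference/). We also provide helper functions in `utils.py` that you may need while completing the functions in `codes.py`, so be sure to read and understand them.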
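
To make the two quantities concrete, here is a minimal sketch on a toy table. It is not the expected solution: the function names `entropy` and `gain_ratio`, their parameters, and the base-2 logarithm are assumptions, and the actual signatures of `getInfoEntropy()` and `find_best_feature()` in `codes.py` may differ.

```python
import numpy as np
import pandas as pd

def entropy(values: pd.Series) -> float:
    """Shannon entropy (base 2) of the value distribution in a column."""
    probs = values.value_counts(normalize=True).to_numpy()
    probs = probs[probs > 0]  # guard against zero-count categories
    return float((probs * np.log2(1.0 / probs)).sum())

def gain_ratio(data: pd.DataFrame, feature: str, label_col: str) -> float:
    """Information gain of `feature` divided by its split information."""
    weights = data[feature].value_counts(normalize=True)
    # Weighted entropy of the labels within each branch of the split.
    conditional = sum(
        w * entropy(data.loc[data[feature] == v, label_col])
        for v, w in weights.items()
    )
    split_info = entropy(data[feature])
    if split_info == 0:  # constant feature: nothing to split on
        return 0.0
    return float((entropy(data[label_col]) - conditional) / split_info)

toy = pd.DataFrame({
    "x": [1, 1, 0, 0],
    "y": [1, 0, 1, 0],
    "label": [1, 1, 0, 0],
})
print(entropy(toy["label"]))          # 1.0
print(gain_ratio(toy, "x", "label"))  # 1.0 (x determines the label)
print(gain_ratio(toy, "y", "label"))  # 0.0 (y carries no information)
```

Returning 0 for a constant feature is one way to avoid dividing by zero; excluding such features before scoring is an equally reasonable choice.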

In [3]:
run_get_info_entropy_tests()
run_find_best_feature_tests()

.


----------------------------------------------------------------------
Ran 1 test in 0.007s

OK
...
----------------------------------------------------------------------
Ran 3 tests in 0.046s

OK


Final Score of Info Entropy: 3.3333333/3.33
Final Score of Find Best Feature: 3.3333333/3.33


Next, use the functions you just completed to implement the following function in `codes.py`:

3. **Build the tree, `create_Tree()` (3.33 pts)**
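
The recursion behind tree construction can be sketched as follows. This is an illustration under stated assumptions, not the structure `create_Tree()` must return: the nested-dict representation, the names `build_tree` and `first_candidate`, and integer-coded columns are all hypothetical. The feature-selection step is passed in as a function, and the example uses a deliberately trivial placeholder where the lab would use the gain ratio.

```python
import pandas as pd

def build_tree(data, label_col, choose_feature):
    """Recursively partition `data`.

    Leaves are labels; internal nodes are {feature: {value: subtree}}.
    Assumes integer-coded columns (e.g. 1/0 answers).
    """
    labels = data[label_col]
    # Only features that still take more than one value can split the data.
    candidates = [c for c in data.columns
                  if c != label_col and data[c].nunique() > 1]
    # Stop when the node is pure or nothing can be split; use the majority label.
    if labels.nunique() == 1 or not candidates:
        return int(labels.mode().iloc[0])
    best = choose_feature(data, candidates, label_col)
    branches = {}
    for value in data[best].unique():
        subset = data[data[best] == value].drop(columns=[best])
        branches[int(value)] = build_tree(subset, label_col, choose_feature)
    return {best: branches}

def first_candidate(data, candidates, label_col):
    """Placeholder criterion: take the first splittable feature.

    The lab instead selects the feature with the highest gain ratio.
    """
    return candidates[0]

toy = pd.DataFrame({
    "x": [1, 1, 0, 0],
    "y": [1, 0, 1, 0],
    "label": [1, 1, 0, 0],
})
tree = build_tree(toy, "label", first_candidate)
print(tree)  # {'x': {1: 1, 0: 0}}
```

Passing the criterion as an argument keeps the recursion independent of how the best feature is scored, so swapping in a gain-ratio scorer does not change the rest of the function.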

When you are done, run the visualization code below and open `decision_tree.png` to inspect the tree you built.

In [4]:
Tree = create_Tree(train_data)

# plot decision tree
tree_graph = Digraph(comment='Decision Tree')
plot_tree(Tree, 'Root', 0, tree_graph)
tree_graph.render('decision_tree', format='png', cleanup=True)

'decision_tree.png'

To receive full credit for `create_Tree`, your tree must reach a classification accuracy `>0.9` on the training set and `>0.8` on the test set.
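
Classification accuracy here means the fraction of rows whose predicted label matches the true label. The sketch below shows the idea on a hand-written toy tree; the nested-dict tree format, the names `predict` and `accuracy`, and the toy columns are assumptions and may not match the provided `classify()` and `test()` helpers.

```python
import pandas as pd

# Hypothetical tree: an internal node is {feature: {value: subtree}}, a leaf is a label.
toy_tree = {"likes_research": {1: {"needs_9_to_5": {0: 1, 1: 0}}, 0: 0}}

def predict(tree, sample):
    """Walk the nested dict until a leaf (a non-dict value) is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][sample[feature]]
    return tree

def accuracy(data, tree, label_col):
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(predict(tree, row) == row[label_col]
                  for _, row in data.iterrows())
    return correct / len(data)

toy_data = pd.DataFrame({
    "likes_research": [1, 1, 0, 0],
    "needs_9_to_5":   [0, 1, 0, 1],
    "label":          [1, 0, 0, 1],
})
print(accuracy(toy_data, toy_tree, "label"))  # 0.75 (the last row is misclassified)
```

Note that this `predict` raises a `KeyError` when a sample has a feature value that never appeared on that branch during training; a fuller implementation would need a fallback such as the majority label.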

In [5]:
acc_train = test(train_data, Tree)
acc_test = test(test_data, Tree)

print("Train acc:", acc_train)
print("Test acc:", acc_test)

Train acc: 1.0
Test acc: 0.8571428571428571



Run the following test code to see your score (full credit is awarded when the accuracy requirements are met):

In [6]:
run_tree_tests()

.
----------------------------------------------------------------------
Ran 1 test in 0.128s

OK


Final Score of Tree: 3.333333/3.33


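Next comes a small quiz to see whether you are suited to pursuing a PhD. Please note: this is only a model built on hypothetical data, and it cannot accurately predict real outcomes. Treat it as a lighthearted experiment for entertainment only; it is no substitute for careful reflection on your own situation and a considered decision.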


In [8]:
# You can input your profile to predict your phd admission result
# input your profile about "I love doing research,I absolutely want to be a college professor,Money is important to me,I can deal with extreme stress and competition,I am OK being with judged all the time,I need a clear target and immediate feedback,I work 9-5 Mon-Fri"
loving = input('Do you love research? (1/0)')
professor = input('Do you want to be a professor? (1/0)')
money = input('Is money important to you? (1/0)')
stress = input('Can you deal with stress? (1/0)')
judge = input('Can you deal with being judged all the time? (1/0)')
feedback = input('Do you need a clear target and immediate feedback? (1/0)')
work = input('Do you work 9-5 Mon-Fri? (1/0)')

# Combine the user's responses into a single pandas Series
# (named user_profile so the test_data DataFrame loaded earlier is not overwritten)
user_profile = pd.Series({
    'I love doing research': int(loving),
    'I absolutely want to be a college professor': int(professor),
    'Money is important to me': int(money),
    'I can deal with extreme stress and competition': int(stress),
    'I am OK being with judged all the time': int(judge),
    'I need a clear target and immediate feedback': int(feedback),
    'I work 9-5 Mon-Fri': int(work)
})

# Use the decision tree to predict the result
result = classify(Tree, user_profile)

# Print the result to the user
if result == 1:
    print("Congratulations! According to the model, you are likely to gain admission for Ph.D.")
elif result == 0: 
    print("Unfortunately, according to the model, you are unlikely to gain admission for Ph.D.")


Congratulations! According to the model, you are likely to gain admission for Ph.D.
