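# Implementing a Decision Tree

Lab tasks:  
Using the LendingClub Safe Loans dataset:
1. Implement three splitting criteria: information gain, information gain ratio, and the Gini index
2. Train a decision tree with each of the three criteria on the given training set
3. With the maximum depth set to 6, compute the accuracy, precision, recall, and F1 score of each of the three trees on the test set

In this part we implement a very simple binary decision tree.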

## 1. Load the data

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import json

In [2]:
# Load the data
loans = pd.read_csv('data/lendingclub/lending-club-data.csv', low_memory=False)

In [3]:
# Preprocess: use safe_loans as the label (+1 when bad_loans is 0, -1 otherwise)
loans['safe_loans'] = loans['bad_loans'].apply(lambda x : +1 if x==0 else -1)
del loans['bad_loans']

We use only four columns as features (grade, term, home_ownership, and emp_length) and safe_loans as the label, so we keep just these five columns of loans.

In [4]:
features = ['grade', 'term','home_ownership','emp_length']
target = 'safe_loans'
loans = loans[features + [target]]

## 2. Split into training and test sets

In [6]:
from sklearn.utils import shuffle
loans = shuffle(loans, random_state = 34)
split_line = int(len(loans) * 0.6)
train_data = loans.iloc[: split_line]
test_data = loans.iloc[split_line:]

## 3. Feature preprocessing

In [7]:
def one_hot_encoding(data, features_categorical):
    for cat in features_categorical:
        one_encoding = pd.get_dummies(data[cat], prefix = cat)
        data = pd.concat([data, one_encoding],axis=1)
        del data[cat]
    return data

In [8]:
train_data = one_hot_encoding(train_data, features)

Get the names of all one-hot features

In [10]:
one_hot_features = train_data.columns.tolist()
one_hot_features.remove(target)
one_hot_features

['grade_A',
 'grade_B',
 'grade_C',
 'grade_D',
 'grade_E',
 'grade_F',
 'grade_G',
 'term_ 36 months',
 'term_ 60 months',
 'home_ownership_MORTGAGE',
 'home_ownership_OTHER',
 'home_ownership_OWN',
 'home_ownership_RENT',
 'emp_length_1 year',
 'emp_length_10+ years',
 'emp_length_2 years',
 'emp_length_3 years',
 'emp_length_4 years',
 'emp_length_5 years',
 'emp_length_6 years',
 'emp_length_7 years',
 'emp_length_8 years',
 'emp_length_9 years',
 'emp_length_< 1 year']

Next, one-hot encode the test set, keeping only the features that appear in one_hot_features.

In [11]:
test_data_tmp = one_hot_encoding(test_data, features)

In [12]:
# Create an empty DataFrame with the same columns as the training set
test_data = pd.DataFrame(columns = train_data.columns)
for feature in train_data.columns:
    # If this training-set column also appears in test_data_tmp, copy it into test_data
    if feature in test_data_tmp.columns:
        test_data[feature] = test_data_tmp[feature].copy()
    else:
        # Otherwise fill the column with zeros
        test_data[feature] = np.zeros(test_data_tmp.shape[0], dtype = 'uint8')
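
The column alignment above can also be sketched with `DataFrame.reindex`, which keeps exactly the requested columns and fills missing ones with a chosen value. This is only an illustrative alternative on made-up toy data (the `train` and `test` frames below are hypothetical), not the approach the notebook uses.

```python
import pandas as pd

# Hypothetical toy data: the test split lacks grade "C" and contains an unseen grade "D".
train = pd.DataFrame({"grade": ["A", "B", "C"]})
test = pd.DataFrame({"grade": ["A", "D"]})

train_encoded = pd.get_dummies(train["grade"], prefix="grade")
test_encoded = pd.get_dummies(test["grade"], prefix="grade")

# Keep exactly the training columns, in training order.
# Columns missing from the test split are filled with 0; unseen ones are dropped.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(list(test_aligned.columns))
```

Columns that exist only in the test split (here `grade_D`) are dropped, which matches the rule of keeping only the features seen during training.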

In [13]:
test_data.head()

| |safe_loans|grade_A|grade_B|grade_C|grade_D|grade_E|grade_F|grade_G|term_ 36 months|term_ 60 months|...|emp_length_10+ years|emp_length_2 years|emp_length_3 years|emp_length_4 years|emp_length_5 years|emp_length_6 years|emp_length_7 years|emp_length_8 years|emp_length_9 years|emp_length_< 1 year|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|37225|1|False|False|False|False|True|False|False|True|False|...|False|True|False|False|False|False|False|False|False|False|
|101585|-1|False|True|False|False|False|False|False|True|False|...|False|False|False|True|False|False|False|False|False|False|
|31865|1|True|False|False|False|False|False|False|True|False|...|False|False|False|False|False|False|False|False|False|False|
|97692|1|False|False|True|False|False|False|False|True|False|...|True|False|False|False|False|False|False|False|False|False|
|88181|1|True|False|False|False|False|False|False|True|False|...|False|False|False|False|False|True|False|False|False|False|


In [14]:
train_data.shape

(73564, 25)

In [15]:
test_data.shape

(49043, 25)

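**After preprocessing, every feature takes the value 0 or 1 and the label is 1 or -1.** This completes the data preprocessing pipeline.

## 4. Implement the three splitting criteria

Decision trees can use several common splitting criteria, such as information gain, information gain ratio, and the Gini index.

We need functions that, given the labels of all samples inside a tree node, compute the value of the corresponding splitting criterion.

Below we implement the three criteria listed above.

**Convention: samples whose feature value is 0 go to the left subtree, and samples whose feature value is 1 go to the right subtree.**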
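
For reference, the three criteria can be written out for a binary split. The notation is an assumption of this note rather than something the notebook defines: $D$ is the set of samples in the node, $p_k$ is the proportion of samples in $D$ with label $k$, and $D_0$, $D_1$ are the samples whose feature $a$ equals 0 and 1.

$$
\begin{aligned}
\mathrm{Ent}(D) &= -\sum_{k \in \{+1,-1\}} p_k \log_2 p_k \\
\mathrm{Gain}(D, a) &= \mathrm{Ent}(D) - \sum_{v \in \{0,1\}} \frac{|D_v|}{|D|}\,\mathrm{Ent}(D_v) \\
\mathrm{IV}(a) &= -\sum_{v \in \{0,1\}} \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|} \\
\mathrm{GainRatio}(D, a) &= \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)} \\
\mathrm{Gini}(D) &= 1 - \sum_{k \in \{+1,-1\}} p_k^2 \\
\mathrm{GiniIndex}(D, a) &= \sum_{v \in \{0,1\}} \frac{|D_v|}{|D|}\,\mathrm{Gini}(D_v)
\end{aligned}
$$

A larger information gain or gain ratio indicates a better split, while a smaller Gini index indicates a better split.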

### 4.1 Information gain

In [16]:
def information_entropy(labels_in_node):
    '''
    Compute the information entropy of the current node
    '''
    # Total number of samples
    num_of_samples = labels_in_node.shape[0]

    if num_of_samples == 0:
        return 0
    # Number of samples labeled 1
    num_of_positive = len(labels_in_node[labels_in_node == 1])
    # Number of samples labeled -1
    num_of_negative = len(labels_in_node[labels_in_node == -1])
    # Proportion of positive samples
    prob_positive = num_of_positive / num_of_samples
    # Proportion of negative samples
    prob_negative = num_of_negative / num_of_samples
    # By convention, 0 * log2(0) is treated as 0
    if prob_positive == 0:
        positive_part = 0
    else:
        positive_part = prob_positive * np.log2(prob_positive)

    if prob_negative == 0:
        negative_part = 0
    else:
        negative_part = prob_negative * np.log2(prob_negative)

    return - ( positive_part + negative_part )
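
As a quick sanity check of the entropy definition, the following standalone sketch recomputes it on plain Python lists. The helper name `binary_entropy` is hypothetical and is not part of the notebook; it mirrors the logic of `information_entropy` without NumPy or pandas.

```python
import math

def binary_entropy(labels):
    """Entropy (in bits) of a list of +1/-1 labels; an empty list counts as 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    entropy = 0.0
    for cls in (1, -1):
        p = labels.count(cls) / n
        if p > 0:  # treat 0 * log2(0) as 0
            entropy -= p * math.log2(p)
    return entropy

balanced = binary_entropy([1, 1, -1, -1])   # two classes in equal proportion
pure = binary_entropy([1, 1, 1, 1])         # a single class
skewed = binary_entropy([1, 1, 1, -1])      # 3:1 class ratio
print(balanced, pure, skewed)
```

A balanced node has the maximum entropy of 1 bit, a pure node has entropy 0, and a skewed node falls in between.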

In [18]:
def compute_information_gains(data, features, target, annotate = False):
    '''
    Compute the information gain of every feature
    '''
    information_gains = dict()
    for feature in features:
        left_split_target = data[data[feature] == 0][target]
        right_split_target =  data[data[feature] == 1][target]
        # Entropy and weight of the left subtree
        left_entropy = information_entropy(left_split_target)
        left_weight = len(left_split_target) / (len(left_split_target) + len(right_split_target))

        # Entropy of the right subtree
        right_entropy = information_entropy(right_split_target)
        # Weight of the right subtree
        right_weight = len(right_split_target) / (len(left_split_target) + len(right_split_target))
        # Entropy of the current node
        current_entropy = information_entropy(data[target])
        # Information gain of splitting on this feature
        gain = current_entropy - (left_weight * left_entropy + right_weight * right_entropy)
        # Store the feature name and its gain as a key-value pair in information_gains
        information_gains[feature] = gain
        if annotate:
            print(" ", feature, gain)
    return information_gains
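
To make the gain computation concrete, here is a small sketch on a four-sample toy node. The helpers `entropy` and `information_gain` and the toy lists are hypothetical and written only for illustration; they follow the same left (0) and right (1) convention as the notebook.

```python
import math

def entropy(labels):
    """Entropy in bits of a list of +1/-1 labels (0 for an empty list)."""
    n = len(labels)
    probs = [labels.count(c) / n for c in (1, -1)] if n else []
    return 0.0 - sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(feature_values, labels):
    """Gain of splitting `labels` by a 0/1 feature: 0 goes left, 1 goes right."""
    left = [y for x, y in zip(feature_values, labels) if x == 0]
    right = [y for x, y in zip(feature_values, labels) if x == 1]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted

labels = [1, 1, -1, -1]
perfect_feature = [1, 1, 0, 0]   # separates the two classes exactly
useless_feature = [1, 0, 1, 0]   # each branch stays half positive, half negative

gain_perfect = information_gain(perfect_feature, labels)
gain_useless = information_gain(useless_feature, labels)
print(gain_perfect, gain_useless)
```

The perfectly separating feature recovers the full parent entropy as gain, while the uninformative feature yields no gain at all.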

### 4.2 Information gain ratio

In [20]:
def compute_information_gain_ratios(data, features, target, annotate = False):
    '''
    Compute and store the information gain ratio of every feature
    '''
    gain_ratios = dict()
    for feature in features:
        left_split_target = data[data[feature] == 0][target]
        right_split_target =  data[data[feature] == 1][target]
        left_entropy = information_entropy(left_split_target)

        # Weight of the left subtree
        left_weight = len(left_split_target) / (len(left_split_target) + len(right_split_target))
        # Entropy of the right subtree
        right_entropy = information_entropy(right_split_target)
        # Weight of the right subtree
        right_weight = len(right_split_target) / (len(left_split_target) + len(right_split_target))
        # Entropy of the current node
        current_entropy = information_entropy(data[target])
        # Information gain of splitting on this feature
        gain = current_entropy - (left_weight * left_entropy + right_weight * right_entropy)

        # Term of the IV formula for feature value 0
        if left_weight == 0:
            left_IV = 0
        else:
            left_IV = left_weight * np.log2(left_weight)

        # Term of the IV formula for feature value 1
        if right_weight == 0:
            right_IV = 0
        else:
            right_IV = right_weight * np.log2(right_weight)

        IV = - (left_IV + right_IV)
        # Add machine epsilon so that IV == 0 does not cause a division by zero
        gain_ratio = gain / (IV + np.finfo(np.longdouble).eps)
        gain_ratios[feature] = gain_ratio
        if annotate:
            print(" ", feature, gain_ratio)

    return gain_ratios
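
The denominator IV depends only on how the samples are divided between the two branches. The sketch below (the helper `intrinsic_value`, the branch sizes, and the gain value are all hypothetical) shows how IV behaves for balanced, lopsided, and degenerate splits, and why a small epsilon is added before dividing.

```python
import math

def intrinsic_value(n_left, n_right):
    """IV of a binary split, computed from the two branch sizes."""
    total = n_left + n_right
    iv = 0.0
    for n in (n_left, n_right):
        w = n / total
        if w > 0:  # treat 0 * log2(0) as 0
            iv -= w * math.log2(w)
    return iv

balanced = intrinsic_value(50, 50)     # branches of equal size
lopsided = intrinsic_value(99, 1)      # almost everything on one side
degenerate = intrinsic_value(100, 0)   # one branch is empty, so IV is 0

# Hypothetical gain value, used only to show the effect of the denominator.
gain = 0.01
eps = 2.0 ** -52  # float64 machine epsilon; the notebook uses np.finfo(np.longdouble).eps
ratio_balanced = gain / (balanced + eps)
ratio_lopsided = gain / (lopsided + eps)
print(balanced, lopsided, degenerate)
```

For the same gain, the lopsided split receives a much larger ratio than the balanced one because its IV is far smaller, and without the epsilon a degenerate split would divide by zero.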

### 4.3 Gini index

In [22]:
def gini(labels_in_node):
    '''
    Compute the Gini value of the samples in a node
    '''
    num_of_samples = labels_in_node.shape[0]
    if num_of_samples == 0:
        return 0
    # Number of samples labeled 1
    num_of_positive = len(labels_in_node[labels_in_node == 1])
    # Number of samples labeled -1
    num_of_negative = len(labels_in_node[labels_in_node == -1])
    # Proportion of positive samples
    prob_positive = num_of_positive / num_of_samples
    # Proportion of negative samples
    prob_negative = num_of_negative / num_of_samples
    # Gini value (named gini_value so it does not shadow the function name)
    gini_value = 1 - prob_positive**2 - prob_negative**2

    return gini_value

In [24]:
def compute_gini_indices(data, features, target, annotate = False):
    '''
    Compute the Gini index obtained when splitting on each feature
    '''

    gini_indices = dict()
    for feature in features:
        left_split_target = data[data[feature] == 0][target]
        right_split_target =  data[data[feature] == 1][target]
        left_gini = gini(left_split_target)

        # Weight of the left subtree
        left_weight = len(left_split_target) / (len(left_split_target) + len(right_split_target))
        # Gini value of the right subtree
        right_gini = gini(right_split_target)
        # Weight of the right subtree
        right_weight = len(right_split_target) / (len(left_split_target) + len(right_split_target))
        # Gini index of the split: the weighted sum of the two subtrees' Gini values
        gini_index = left_weight * left_gini + right_weight * right_gini

        gini_indices[feature] = gini_index

        if annotate:
            print(" ", feature, gini_index)

    return gini_indices
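
The same four-sample toy node illustrates the Gini index. The helpers `gini_impurity` and `gini_index` below are hypothetical stand-ins written on plain lists, assuming the notebook's 0 goes left and 1 goes right convention.

```python
def gini_impurity(labels):
    """Gini value of a list of +1/-1 labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_pos = labels.count(1) / n
    p_neg = labels.count(-1) / n
    return 1 - p_pos ** 2 - p_neg ** 2

def gini_index(feature_values, labels):
    """Weighted Gini value of the two branches produced by a 0/1 feature."""
    left = [y for x, y in zip(feature_values, labels) if x == 0]
    right = [y for x, y in zip(feature_values, labels) if x == 1]
    n = len(labels)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

labels = [1, 1, -1, -1]
index_perfect = gini_index([1, 1, 0, 0], labels)   # both branches are pure
index_useless = gini_index([1, 0, 1, 0], labels)   # both branches stay mixed
print(gini_impurity(labels), index_perfect, index_useless)
```

Unlike the two gain-based criteria, a lower value is better here, which is why the feature selection step takes the minimum for this criterion.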

## 5. Select the best splitting feature

In [26]:
def best_splitting_feature(data, features, target, criterion = 'gini', annotate = False):
    '''
    Find the best splitting feature for the given data and splitting criterion
    '''
    if criterion == 'information_gain':
        if annotate:
            print('using information gain')
        information_gains = compute_information_gains(data, features, target, annotate)
        # The best feature is the one with the largest information gain
        best_feature = max(information_gains.items(), key = lambda x: x[1])[0]
        return best_feature

    elif criterion == 'gain_ratio':
        if annotate:
            print('using information gain ratio')
        gain_ratios = compute_information_gain_ratios(data, features, target, annotate)
        # The best feature is the one with the largest information gain ratio
        best_feature = max(gain_ratios.items(), key=lambda item: item[1])[0]
        return best_feature

    elif criterion == 'gini':
        if annotate:
            print('using gini')
        gini_indices = compute_gini_indices(data, features, target, annotate)
        # The best feature is the one with the smallest Gini index
        best_feature = min(gini_indices.items(), key=lambda item: item[1])[0]
        return best_feature
    else:
        raise Exception("Invalid criterion!", criterion)
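
One detail worth knowing about this selection step is how ties are resolved. Python's `max` and `min` return the first extremal item they encounter, and dictionaries preserve insertion order, so the feature listed earliest wins a tie. The scores below are hypothetical values chosen only to show that behavior.

```python
# Hypothetical criterion values; two features tie for the best score in each dict.
gains = {"grade_A": 0.02, "grade_B": 0.05, "term_ 36 months": 0.05}
gini_indices = {"grade_A": 0.30, "grade_B": 0.28, "term_ 36 months": 0.28}

# Larger is better for information gain and gain ratio.
best_by_gain = max(gains.items(), key=lambda item: item[1])[0]
# Smaller is better for the Gini index.
best_by_gini = min(gini_indices.items(), key=lambda item: item[1])[0]

print(best_by_gain, best_by_gini)
```

Because of this, the order of the feature list can influence which tree is built when several features score equally.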

## 6. Check whether all samples in a node belong to the same class

In [27]:
def intermediate_node_num_mistakes(labels_in_node):
    '''
    Return the number of samples in the minority class of a node.
    For example, the input [1, 1, -1, -1, 1] returns 2.
    A return value of 0 means all samples share the same label.
    '''
    if len(labels_in_node) == 0:
        return 0
    # Number of samples labeled 1
    num_of_one = len(labels_in_node[labels_in_node == 1])
    # Number of samples labeled -1
    num_of_minus_one = len(labels_in_node[labels_in_node == -1])
    return num_of_one if num_of_minus_one > num_of_one else num_of_minus_one

## 7. Create a leaf node

In [29]:
def create_leaf(target_values):
    '''
    Determine the label of the current leaf node and store the leaf's information in a dict
    '''
    # Create the leaf node
    leaf = {'splitting_feature' : None,'left' : None,'right' : None,'is_leaf': True}
    num_ones = len(target_values[target_values == +1])
    num_minus_ones = len(target_values[target_values == -1])
    # Predict the majority class; a tie is resolved as -1
    if num_ones > num_minus_ones:
        leaf['prediction'] = 1
    else:
        leaf['prediction'] = -1
    return leaf

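## 8. Build the decision tree recursively
The recursion stops when any of three conditions holds:
1. All samples in the node share the same label, so the node needs no further splitting and becomes a leaf
2. Every feature has already been used earlier on the path, so no feature is left to split the samples in this node and the node becomes a leaf
3. The depth of the current node has reached the maximum tree depth we allow, so the node becomes a leaf

**Five parts need to be filled in here**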

In [30]:
def decision_tree_create(data, features, target, criterion = 'gini', current_depth = 0, max_depth = 10, annotate = False):
    if criterion not in ['information_gain', 'gain_ratio', 'gini']:
        raise Exception("Invalid criterion!", criterion)
    remaining_features = features[:]
    target_values = data[target]
    print("-" * 50)
    print("Subtree, depth = %s (%s data points)." % (current_depth, len(target_values)))
    # Stopping condition 1: all samples in the node share the same label
    if intermediate_node_num_mistakes(target_values) == 0:
        print("Stopping condition 1 reached.")
        return create_leaf(target_values)
    # Stopping condition 2: no features are left to split on, i.e. remaining_features is empty
    if len(remaining_features) == 0:
        print("Stopping condition 2 reached.")
        return create_leaf(target_values)
    # Stopping condition 3: the current depth has reached the maximum depth
    if current_depth >= max_depth:
        print("Reached maximum depth. Stopping for now.")
        return create_leaf(target_values)
    # Choose the best feature to split on (annotate is forwarded so its output can be enabled)
    splitting_feature = best_splitting_feature(data, features, target, criterion, annotate)
    # Data for the left subtree
    left_split = data[data[splitting_feature] == 0]
    # Data for the right subtree
    right_split = data[data[splitting_feature] == 1]

    remaining_features.remove(splitting_feature)
    print("Split on feature %s. (%s, %s)" % (
                      splitting_feature, len(left_split), len(right_split)))
    # If every sample falls into the left split, the split separates nothing, so create a leaf
    if len(left_split) == len(data):
        print("Creating leaf node.")
        return create_leaf(left_split[target])

    # Likewise, if every sample falls into the right split, create a leaf
    if len(right_split) == len(data):
        print("Creating leaf node.")
        return create_leaf(right_split[target])

    # Recursively build the two subtrees
    left_tree = decision_tree_create(left_split, remaining_features, target, criterion, current_depth + 1, max_depth, annotate)
    right_tree = decision_tree_create(right_split, remaining_features, target, criterion, current_depth + 1, max_depth, annotate)
    return {'is_leaf': False,'prediction': None,'splitting_feature': splitting_feature,'left': left_tree,'right': right_tree}

Train a model

In [31]:
my_decision_tree = decision_tree_create(train_data, one_hot_features, target, 'gini', max_depth = 6, annotate = False)

--------------------------------------------------
Subtree, depth = 0 (73564 data points).
Split on feature term_ 36 months. (14831, 58733)
--------------------------------------------------
Subtree, depth = 1 (14831 data points).
Split on feature grade_F. (13003, 1828)
--------------------------------------------------
Subtree, depth = 2 (13003 data points).
Split on feature grade_E. (9818, 3185)
--------------------------------------------------
Subtree, depth = 3 (9818 data points).
Split on feature home_ownership_RENT. (6796, 3022)
--------------------------------------------------
Subtree, depth = 4 (6796 data points).
Split on feature grade_G. (6507, 289)
--------------------------------------------------
Subtree, depth = 5 (6507 data points).
Split on feature grade_D. (4368, 2139)
--------------------------------------------------
Subtree, depth = 6 (4368 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 

Split on feature emp_length_2 years. (7209, 1000)
--------------------------------------------------
Subtree, depth = 6 (7209 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 (1000 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 5 (5855 data points).
Split on feature emp_length_10+ years. (3802, 2053)
--------------------------------------------------
Subtree, depth = 6 (3802 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 (2053 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 3 (20502 data points).
Split on feature home_ownership_MORTGAGE. (10775, 9727)
--------------------------------------------------
Subtree, depth = 4 (10775 data points).
Split on feature home_ownership_OTHER. (10741,

The model is now trained.
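
Since the trained tree is a nested dict, its size can be inspected with two short recursive helpers. This is a minimal sketch on a hand-built toy tree: the helpers `leaf`, `node`, `count_leaves`, and `tree_depth` are hypothetical and not part of the notebook, but the dict keys match the ones the notebook's tree uses.

```python
def leaf(prediction):
    """Build a leaf dict with the same keys as the notebook's leaves."""
    return {'is_leaf': True, 'prediction': prediction,
            'splitting_feature': None, 'left': None, 'right': None}

def node(feature, left, right):
    """Build an internal-node dict with the same keys as the notebook's nodes."""
    return {'is_leaf': False, 'prediction': None,
            'splitting_feature': feature, 'left': left, 'right': right}

def count_leaves(tree):
    """Number of leaf nodes in the (sub)tree."""
    if tree['is_leaf']:
        return 1
    return count_leaves(tree['left']) + count_leaves(tree['right'])

def tree_depth(tree):
    """Number of splits on the longest root-to-leaf path."""
    if tree['is_leaf']:
        return 0
    return 1 + max(tree_depth(tree['left']), tree_depth(tree['right']))

# Hand-built toy tree, not the trained model.
toy_tree = node('term_ 36 months', leaf(-1), node('grade_A', leaf(-1), leaf(1)))
n_leaves = count_leaves(toy_tree)
depth = tree_depth(toy_tree)
print(n_leaves, depth)
```

Applied to a trained tree, `tree_depth` should never exceed the `max_depth` passed to `decision_tree_create`.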

## 9. Prediction

Next we need to complete the prediction function

In [32]:
def classify(tree, x, annotate = False):
    '''
    Predict recursively; only one sample can be predicted per call
    '''
    if tree['is_leaf']:
        if annotate:
            print("At leaf, predicting %s" % tree['prediction'])
        return tree['prediction']
    else:
        split_feature_value = x[tree['splitting_feature']]
        if annotate:
            print("Split on %s = %s" % (tree['splitting_feature'], split_feature_value))
        # Feature value 0 goes to the left subtree, 1 goes to the right subtree
        if split_feature_value == 0:
            return classify(tree['left'], x, annotate)
        else:
            return classify(tree['right'], x, annotate)
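
The same traversal can be written as a loop, which makes the left (0) and right (1) convention easy to see. The sketch below is an illustrative variant, not the notebook's function: `classify_iterative`, the toy tree, and the dict-based samples are all hypothetical.

```python
def classify_iterative(tree, x):
    """Walk from the root to a leaf: value 0 goes left, anything else goes right."""
    current = tree
    while not current['is_leaf']:
        value = x[current['splitting_feature']]
        current = current['left'] if value == 0 else current['right']
    return current['prediction']

# Hand-built toy tree using the same dict keys as the notebook's tree.
toy_tree = {
    'is_leaf': False, 'prediction': None, 'splitting_feature': 'term_ 36 months',
    'left': {'is_leaf': True, 'prediction': -1,
             'splitting_feature': None, 'left': None, 'right': None},
    'right': {
        'is_leaf': False, 'prediction': None, 'splitting_feature': 'grade_A',
        'left': {'is_leaf': True, 'prediction': -1,
                 'splitting_feature': None, 'left': None, 'right': None},
        'right': {'is_leaf': True, 'prediction': 1,
                  'splitting_feature': None, 'left': None, 'right': None},
    },
}

# Boolean features work too, because False == 0 and True == 1 in Python.
pred_a = classify_iterative(toy_tree, {'term_ 36 months': True, 'grade_A': False})
pred_b = classify_iterative(toy_tree, {'term_ 36 months': True, 'grade_A': True})
pred_c = classify_iterative(toy_tree, {'term_ 36 months': False, 'grade_A': True})
print(pred_a, pred_b, pred_c)
```

This also explains why the one-hot columns can stay as `True`/`False` values, as they appear in the test set preview, without breaking the comparison against 0.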

We test the function on the first sample of the test set

In [33]:
test_sample = test_data.iloc[0]
print(test_sample)

safe_loans                     1
grade_A                    False
grade_B                    False
grade_C                    False
grade_D                    False
grade_E                     True
grade_F                    False
grade_G                    False
term_ 36 months             True
term_ 60 months            False
home_ownership_MORTGAGE     True
home_ownership_OTHER       False
home_ownership_OWN         False
home_ownership_RENT        False
emp_length_1 year          False
emp_length_10+ years       False
emp_length_2 years          True
emp_length_3 years         False
emp_length_4 years         False
emp_length_5 years         False
emp_length_6 years         False
emp_length_7 years         False
emp_length_8 years         False
emp_length_9 years         False
emp_length_< 1 year        False
Name: 37225, dtype: object


In [34]:
print('True class: %s ' % (test_sample['safe_loans']))
print('Predicted class: %s ' % classify(my_decision_tree, test_sample))

True class: 1 
Predicted class: 1 


Print the decision path the tree follows for this sample

In [35]:
classify(my_decision_tree, test_sample, annotate=True)

Split on term_ 36 months = True
Split on grade_A = False
Split on grade_B = False
Split on grade_C = False
Split on home_ownership_MORTGAGE = True
Split on grade_G = False
At leaf, predicting 1


1

## 10. Evaluate our model on the test set

In [36]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

First we write a batch prediction function. It takes a pd.DataFrame such as the whole test set and returns an np.ndarray holding the model's predictions.  

In [37]:
def predict(tree, data):
    '''
    Iterate over the rows of data, predict each sample, store the results in predictions, and return them as an np.ndarray
    '''
    predictions = np.zeros(len(data))
    for i in range(len(data)):
        predictions[i] = classify(tree,data.iloc[i])
    return predictions

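## 11. Compute the four metrics for the model trained with each splitting criterion and fill in the table below
**The maximum tree depth is 6**  
**One part needs to be filled in here**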

In [43]:

my_decision_tree_list = ['gini','information_gain','gain_ratio']
accuracy_scores = []
precision_scores = []
recall_scores = []
f1_scores = []
for term in my_decision_tree_list:
    # Build the decision tree
    my_decision_tree = decision_tree_create(train_data, one_hot_features, target, term, max_depth=6, annotate=False)
    # Predict on the test set
    predictions = predict(my_decision_tree, test_data)
    # Compute the four metrics
    accuracy = accuracy_score(test_data[target], predictions)
    precision = precision_score(test_data[target], predictions)
    recall = recall_score(test_data[target], predictions)
    f1 = f1_score(test_data[target], predictions)
    
    accuracy_scores.append(accuracy)
    precision_scores.append(precision)
    recall_scores.append(recall)
    f1_scores.append(f1)

print("Accuracy Scores:", accuracy_scores)
print("Precision Scores:", precision_scores)
print("Recall Scores:", recall_scores)
print("F1 Scores:", f1_scores)

--------------------------------------------------
Subtree, depth = 0 (73564 data points).
Split on feature term_ 36 months. (14831, 58733)
--------------------------------------------------
Subtree, depth = 1 (14831 data points).
Split on feature grade_F. (13003, 1828)
--------------------------------------------------
Subtree, depth = 2 (13003 data points).
Split on feature grade_E. (9818, 3185)
--------------------------------------------------
Subtree, depth = 3 (9818 data points).
Split on feature home_ownership_RENT. (6796, 3022)
--------------------------------------------------
Subtree, depth = 4 (6796 data points).
Split on feature grade_G. (6507, 289)
--------------------------------------------------
Subtree, depth = 5 (6507 data points).
Split on feature grade_D. (4368, 2139)
--------------------------------------------------
Subtree, depth = 6 (4368 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 

Split on feature home_ownership_MORTGAGE. (5830, 7271)
--------------------------------------------------
Subtree, depth = 3 (5830 data points).
Split on feature emp_length_3 years. (5283, 547)
--------------------------------------------------
Subtree, depth = 4 (5283 data points).
Split on feature emp_length_1 year. (4705, 578)
--------------------------------------------------
Subtree, depth = 5 (4705 data points).
Split on feature emp_length_7 years. (4467, 238)
--------------------------------------------------
Subtree, depth = 6 (4467 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 (238 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 5 (578 data points).
Split on feature home_ownership_OTHER. (576, 2)
--------------------------------------------------
Subtree, depth = 6 (576 data points).
Reached maximum depth. Stopping for now.
-

Split on feature emp_length_7 years. (1081, 71)
--------------------------------------------------
Subtree, depth = 6 (1081 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 (71 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 5 (93 data points).
Split on feature grade_C. (93, 0)
Creating leaf node.
--------------------------------------------------
Subtree, depth = 3 (20502 data points).
Split on feature home_ownership_MORTGAGE. (10775, 9727)
--------------------------------------------------
Subtree, depth = 4 (10775 data points).
Split on feature emp_length_1 year. (9785, 990)
--------------------------------------------------
Subtree, depth = 5 (9785 data points).
Split on feature home_ownership_OTHER. (9754, 31)
--------------------------------------------------
Subtree, depth = 6 (9754 data points).
Reached maximum depth. Stopping fo

Split on feature home_ownership_OTHER. (3184, 1)
--------------------------------------------------
Subtree, depth = 6 (3184 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 (1 data points).
Stopping condition 1 reached.
--------------------------------------------------
Subtree, depth = 5 (2219 data points).
Split on feature emp_length_1 year. (2011, 208)
--------------------------------------------------
Subtree, depth = 6 (2011 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 6 (208 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------
Subtree, depth = 3 (637 data points).
Split on feature emp_length_3 years. (590, 47)
--------------------------------------------------
Subtree, depth = 4 (590 data points).
Split on feature emp_length_2 years. (541, 49)
--------------------------------

Split on feature home_ownership_OWN. (25, 4)
--------------------------------------------------
Subtree, depth = 4 (25 data points).
Split on feature home_ownership_MORTGAGE. (13, 12)
--------------------------------------------------
Subtree, depth = 5 (13 data points).
Split on feature grade_A. (13, 0)
Creating leaf node.
--------------------------------------------------
Subtree, depth = 5 (12 data points).
Split on feature grade_A. (12, 0)
Creating leaf node.
--------------------------------------------------
Subtree, depth = 4 (4 data points).
Split on feature grade_A. (4, 0)
Creating leaf node.
Accuracy Scores: [0.8117366392757376, 0.8122056154802928, 0.8117774198152642]
Precision Scores: [0.8128399100388468, 0.8122387390142941, 0.8124897892501225]
Recall Scores: [0.9980168193799422, 0.9999497928956947, 0.9987699259445212]
F1 Scores: [0.8959603357935657, 0.8963724740087312, 0.8960508090942874]


Maximum tree depth: 6  

Splitting criterion|Accuracy|Precision|Recall|F1
-|-|-|-|-
Information gain|0.8122056154802928|0.8122387390142941|0.9999497928956947|0.8963724740087312
Information gain ratio|0.8117774198152642|0.8124897892501225|0.9987699259445212|0.8960508090942874
Gini index|0.8117366392757376|0.8128399100388468|0.9980168193799422|0.8959603357935657
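
To see how the four metrics relate to each other, the sketch below computes them by hand on a toy prediction vector and then checks that the F1 value printed first in the score lists (the `'gini'` run, since `my_decision_tree_list` starts with `'gini'`) is the harmonic mean of the precision and recall printed for the same run. The helper `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Toy case: 80% of the labels are +1 and the model predicts +1 for every sample.
y_true = [1, 1, 1, 1, -1]
y_pred = [1, 1, 1, 1, 1]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)

# Consistency check on the values printed for the 'gini' run.
reported_precision = 0.8128399100388468
reported_recall = 0.9980168193799422
reported_f1 = 0.8959603357935657
recomputed_f1 = 2 * reported_precision * reported_recall / (reported_precision + reported_recall)
print(accuracy, precision, recall, f1, recomputed_f1)
```

In the toy case, accuracy and precision both equal the share of positive labels and recall is 1. The reported scores show a similar pattern (accuracy and precision near 0.81, recall near 1), which suggests, though the notebook does not verify it, that all three trees predict +1 for almost every test sample.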