# <font color='blue'>Mean Machine: Decision Tree Part 2</font>

In this notebook, you will implement your own binary decision tree classifier. You will:
    
* Use pandas to do some feature engineering.
* Transform categorical variables into binary variables.
* Write a function to compute the number of misclassified examples in an intermediate node.
* Write a function to find the best feature to split on.
* Build a binary decision tree from scratch.
* Make predictions using the decision tree.
* Evaluate the accuracy of the decision tree.
* Visualize the decision at the root node.

**Important Note**: In this notebook, we will only focus on building decision trees where the data contain **only binary (0 or 1) features**. This allows us to avoid dealing with:
* Multiple intermediate nodes in a split
* The thresholding issues of real-valued features.

# <font color='red'>Import all relevant packages</font>

In [2]:
import numpy as np     # for numerical computation
import pandas as pd    # for working with data tables
from sklearn.model_selection import train_test_split # for validation: splits the data into training and test sets

# <font color='red'>Load Lending Club dataset</font>

We will use the same [LendingClub](https://www.lendingclub.com/) dataset as in the previous notebook.

In [3]:
loans = pd.read_csv('lending-club-data.csv', low_memory=False)
pd.concat([loans.head(3), loans.tail(3)])  # DataFrame.append was removed in pandas 2.0

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,...,sub_grade_num,delinq_2yrs_zero,pub_rec_zero,collections_12_mths_zero,short_emp,payment_inc_ratio,final_d,last_delinq_none,last_record_none,last_major_derog_none
0,1077501,1296599,5000,5000,4975,36 months,10.65,162.87,B,B2,...,0.4,1.0,1.0,1.0,0,8.1435,20141201T000000,1,1,1
1,1077430,1314167,2500,2500,2500,60 months,15.27,59.83,C,C4,...,0.8,1.0,1.0,1.0,1,2.3932,20161201T000000,1,1,1
2,1077175,1313524,2400,2400,2400,36 months,15.96,84.33,C,C5,...,1.0,1.0,1.0,1.0,0,8.25955,20141201T000000,1,1,1
122604,9695736,11547808,8525,8525,8525,60 months,18.25,217.65,D,D3,...,0.6,0.0,1.0,1.0,0,6.95812,20190101T000000,0,1,0
122605,9684700,11536848,22000,22000,22000,60 months,19.97,582.5,D,D5,...,1.0,1.0,0.0,1.0,0,8.96154,20190101T000000,1,0,1
122606,9604874,11457002,2000,2000,2000,36 months,7.9,62.59,A,A4,...,0.8,0.0,1.0,1.0,0,0.904916,20170101T000000,0,1,1


Reassign the labels to have +1 for a safe loan, and -1 for a risky (bad) loan.

In [4]:
loans['safe_loans'] = loans['bad_loans'].apply(lambda x : +1 if x==0 else -1)
loans = loans.drop('bad_loans', axis=1)

In this notebook, we will just use 4 categorical features: 

1. grade of the loan 
2. the length of the loan term
3. the home ownership status: own, mortgage, rent
4. number of years of employment.

Since we are building a binary decision tree, we will have to convert these categorical features to a binary representation in a subsequent section using one-hot encoding.

In [5]:
features = ['grade',              # grade of the loan
            'term',               # the term of the loan
            'home_ownership',     # home_ownership status: own, mortgage or rent
            'emp_length',         # number of years of employment
           ]
target = 'safe_loans'
loans = loans[features + [target]]

Let's explore what the dataset looks like.

In [6]:
pd.concat([loans.head(3), loans.tail(3)])

Unnamed: 0,grade,term,home_ownership,emp_length,safe_loans
0,B,36 months,RENT,10+ years,1
1,C,60 months,RENT,< 1 year,-1
2,C,36 months,RENT,10+ years,1
122604,D,60 months,MORTGAGE,5 years,-1
122605,D,60 months,MORTGAGE,10+ years,-1
122606,A,36 months,OWN,3 years,1


## Subsample dataset to make sure classes are balanced

We will undersample the larger class (safe loans) in order to balance out our dataset. This means we are throwing away many data points. We use `random_state=1` so everyone gets the same results.

In [7]:
safe_loans_raw = loans[loans[target] == +1]
risky_loans_raw = loans[loans[target] == -1]

ratio = len(risky_loans_raw)/float(len(safe_loans_raw))

risky_loans = risky_loans_raw
safe_loans = safe_loans_raw.sample(frac=ratio, random_state=1)

# Combine the risky_loans with the downsampled version of safe_loans
loans_data = pd.concat([risky_loans, safe_loans])

N1 = len(safe_loans)
N2 = len(risky_loans)
N = N1 + N2
print( "%% of safe loans  : %.2f%%" %(N1/N*100.0) )
print( "%% of risky loans : %.2f%%" %(N2/N*100.0) )
print( "Total number of loans in our new dataset :", N )

% of safe loans  : 50.00%
% of risky loans : 50.00%
Total number of loans in our new dataset : 46300


## Transform categorical data into binary features

We will implement **binary decision trees** (decision trees for binary features, a specific case of categorical variables taking on two values, e.g., true/false). Since all of our features are currently categorical features, we want to turn them into binary features. 

For instance, the **home_ownership** feature represents the home ownership status of the borrower, such as `own`, `mortgage` or `rent`. If a data point has the feature 
```
   {'home_ownership': 'RENT'}
```
we want to turn this into three features: 
```
 { 
   'home_ownership = OWN'      : 0, 
   'home_ownership = MORTGAGE' : 0, 
   'home_ownership = RENT'     : 1
 }
```
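
For comparison with the explicit loop in the next cell, here is a minimal sketch of the same idea using `pd.get_dummies`. The toy frame `toy` is hypothetical and is not the LendingClub data; `prefix_sep='.'` is chosen only to mimic the `feature.value` column names that the notebook's own loop produces.

```python
import pandas as pd

# Hypothetical toy frame, used only to illustrate one-hot encoding.
toy = pd.DataFrame({'home_ownership': ['RENT', 'OWN', 'MORTGAGE', 'RENT']})

# One 0/1 column per category; dtype=int keeps 0/1 integers instead of booleans.
toy_encoded = pd.get_dummies(toy, columns=['home_ownership'],
                             prefix_sep='.', dtype=int)
print(toy_encoded)
```

Each row has exactly one 1 among the new columns, which is what makes the encoding "one-hot".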

In [8]:
categorical_variables = []
for feat_name, feat_type in zip(loans_data.columns.values,loans_data.dtypes):
    if feat_type == object:
        categorical_variables.append(feat_name)

for feature in categorical_variables:
    feat_value = loans_data[feature].unique()
    loans_data_one_hot_encoded = pd.DataFrame()
    for val in feat_value:
        label = feature + '.' + val
        loans_data_one_hot_encoded[label] = loans_data[feature].apply(lambda x: 1 if x == val else 0)
    loans_data = pd.concat([loans_data, loans_data_one_hot_encoded], axis=1)
loans_data = loans_data.drop(categorical_variables,axis=1)

pd.concat([loans_data.head(3), loans_data.tail(3)])

Unnamed: 0,safe_loans,grade.C,grade.F,grade.B,grade.D,grade.A,grade.E,grade.G,term. 60 months,term. 36 months,...,emp_length.3 years,emp_length.10+ years,emp_length.1 year,emp_length.9 years,emp_length.2 years,emp_length.8 years,emp_length.7 years,emp_length.5 years,emp_length.n/a,emp_length.6 years
1,-1,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
6,-1,0,1,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
7,-1,0,0,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
90431,1,0,0,1,0,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0
115727,1,0,1,0,0,0,0,0,1,0,...,1,0,0,0,0,0,0,0,0,0
105752,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,0,0


Let's see what the feature columns look like now:

In [9]:
features = loans_data.columns.values
features = features[features != target]
features

array(['grade.C', 'grade.F', 'grade.B', 'grade.D', 'grade.A', 'grade.E',
       'grade.G', 'term. 60 months', 'term. 36 months',
       'home_ownership.RENT', 'home_ownership.OWN',
       'home_ownership.MORTGAGE', 'home_ownership.OTHER',
       'emp_length.< 1 year', 'emp_length.4 years', 'emp_length.3 years',
       'emp_length.10+ years', 'emp_length.1 year', 'emp_length.9 years',
       'emp_length.2 years', 'emp_length.8 years', 'emp_length.7 years',
       'emp_length.5 years', 'emp_length.n/a', 'emp_length.6 years'], dtype=object)

In [10]:
print( "# of features (after one-hot encoding) = %s" % len(features) )

# of features (after one-hot encoding) = 25


## Train-test split

We split the data into a training set containing 80% of the data and a test set containing the remaining 20%. We use `random_state=0` so that everyone gets the same result.

In [11]:
(train_data, test_data) = train_test_split( loans_data, 
                          train_size=0.8, random_state=0 )
print( train_data.shape, test_data.shape )

(37040, 26) (9260, 26)


# <font color='red'>Decision tree implementation</font>

In this section, we will implement binary decision trees from scratch. There are several steps involved in building a decision tree. For that reason, we have split the entire assignment into several sections.

## Function to count number of mistakes while predicting majority class

Prediction at an intermediate node works by predicting the **majority class** for all data points that belong to this node.

Now, we will write a function that calculates the number of **misclassified examples** when predicting the **majority class**. This will be used to help determine which feature is the best to split on at a given node of the tree.

**Note**: Keep in mind that in order to compute the number of mistakes for a majority classifier, we only need the label (y values) of the data points in the node. 

**Steps to follow:**
* **Step 1:** Calculate the number of safe loans and risky loans.
* **Step 2:** Since we are assuming majority class prediction, all the data points that are **not** in the majority class are considered **mistakes**.
* **Step 3:** Return the number of **mistakes**.


Now, let us write the function `count_num_mistakes` which computes the number of misclassified examples of an intermediate node given the set of labels (y values) of the data points contained in the node.

In [18]:
def count_num_mistakes( labels_in_node ):
    # Corner case: If labels_in_node is empty, return 0
    if len(labels_in_node) == 0:
        return 0
    
    # Count the number of 1's (safe loans)
    counter1 = (labels_in_node == 1).sum()
    
    # Count the number of -1's (risky loans)
    counter_1 = (labels_in_node == -1).sum()
                
    # Return the number of mistakes by majority rules
    return counter_1 if counter1 >= counter_1 else counter1 

Because there are several steps in this assignment, we have introduced some stopping points where you can check your code before proceeding. To test your `count_num_mistakes` function, run the following code. Proceed once all three test cases print **Test passed!**; otherwise, spend some time figuring out where things went wrong.

In [19]:
# Test case 1
example_labels = np.array([-1, -1, 1, 1, 1])
if count_num_mistakes(example_labels) == 2:
    print( 'Test passed!' )
else:
    print( 'Test 1 failed... try again!' )

# Test case 2
example_labels = np.array([-1, -1, 1, 1, 1, 1, 1])
if count_num_mistakes(example_labels) == 2:
    print( 'Test passed!' )
else:
    print( 'Test 2 failed... try again!' )
    
# Test case 3
example_labels = np.array([-1, -1, -1, -1, -1, 1, 1])
if count_num_mistakes(example_labels) == 2:
    print( 'Test passed!' )
else:
    print( 'Test 3 failed... try again!' )

Test passed!
Test passed!
Test passed!


## Function to pick best feature to split on

The function **best_splitting_feature** takes 3 arguments: 
1. The data (a dataframe which includes all of the feature columns and label column)
2. The features to consider for splits (a list of strings of column names to consider for splits)
3. The name of the target/label column (string)

The function will loop through the list of possible features, and consider splitting on each of them. It will calculate the classification error of each split and return the feature that had the smallest classification error when split on.

Recall that the **classification error** is defined as follows:
$$
\mbox{classification error} = \frac{\mbox{# mistakes}}{\mbox{# total examples}}
$$

Follow these steps: 
* **Step 1:** Loop over each feature in the feature list
* **Step 2:** Within the loop, split the data into two groups: one group where all of the data has feature value 0 or False (we will call this the **left** split), and one group where all of the data has feature value 1 or True (we will call this the **right** split). Make sure the **left** split corresponds with 0 and the **right** split corresponds with 1 to ensure your implementation fits with our implementation of the tree building process.
* **Step 3:** Calculate the number of misclassified examples in both groups of data and use the above formula to compute the **classification error**.
* **Step 4:** If the computed error is smaller than the best error found so far, store this **feature and its error**.


**Note:** Remember that since we are only dealing with binary features, we do not have to consider thresholds for real-valued features. This makes the implementation of this function much easier.
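
The steps above can be traced by hand on a tiny example. The sketch below uses a hypothetical six-row frame with one binary feature (`feature_a`) and a compact stand-in for `count_num_mistakes` called `majority_mistakes`; both names are illustrative assumptions, not part of the assignment.

```python
import pandas as pd

# Hypothetical toy data: one binary feature and +1/-1 labels.
toy = pd.DataFrame({
    'feature_a':  [0, 0, 0, 1, 1, 1],
    'safe_loans': [1, 1, -1, -1, -1, 1],
})

def majority_mistakes(labels):
    """Number of labels that disagree with the majority class."""
    if len(labels) == 0:
        return 0
    num_pos = (labels == 1).sum()
    num_neg = (labels == -1).sum()
    return min(num_pos, num_neg)

# Left split: feature value 0; right split: feature value 1.
left = toy[toy['feature_a'] == 0]['safe_loans']
right = toy[toy['feature_a'] == 1]['safe_loans']

# Classification error = (# mistakes left + # mistakes right) / (# data points).
toy_error = (majority_mistakes(left) + majority_mistakes(right)) / len(toy)
no_split_error = majority_mistakes(toy['safe_loans']) / len(toy)
print(toy_error, no_split_error)
```

Here each side of the split makes one mistake, so the error is 2/6, compared with 3/6 when predicting the majority class without splitting.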

In [14]:
def best_splitting_feature( data, features, target ):
    
    best_feature = None # Keep track of the best feature 
    best_error = 10     # Keep track of the best error so far 
    # Note: the error is always <= 1, so any initial value > 1 works.

    # Convert to float to make sure error gets computed correctly.
    num_data_points = float(len(data))  
    
    # Loop through each feature to consider splitting on that feature
    for feature in features:
        
        # The left split will have all data points where the feature value is 0
        left_split = data[ data[feature] == 0 ]
        
        # The right split will have all data points where the feature value is 1
        right_split = data[ data[feature] == 1 ] 
            
        # Calculate the number of misclassified examples in the left split.
        left_mistakes = count_num_mistakes( left_split[target] )            

        # Calculate the number of misclassified examples in the right split.
        right_mistakes = count_num_mistakes( right_split[target] ) 
            
        # Compute the classification error of this split.
        # Error = (# of mistakes (left) + # of mistakes (right)) / (# of data points)
        error = (left_mistakes + right_mistakes) / num_data_points

        # If this is the best error we have found so far, 
        # store the feature as best_feature and the error as best_error
        if error < best_error:
            best_feature = feature
            best_error = error
    
    return best_feature # Return the best feature we found

## Building the tree

With the above functions implemented correctly, we are now ready to build our decision tree. Each node in the decision tree is represented as a dictionary which contains the following keys and possible values:

    { 
       'is_leaf'            : True/False.
       'prediction'         : Prediction at the leaf node.
       'left'               : (dictionary corresponding to the left tree).
       'right'              : (dictionary corresponding to the right tree).
       'splitting_feature'  : The feature that this node splits on.
    }

First, we will write a function that creates a leaf node given a set of target values.

In [20]:
def create_leaf( target_values ):

    leaf = {'splitting_feature' : None,
            'left' : None,
            'right' : None,
            'is_leaf': True }
    
    # Count the number of data points that are +1 and -1 in this node.
    num_ones = len( target_values[target_values == +1] )
    num_minus_ones = len( target_values[target_values == -1] )
    
    # For the leaf node, set the prediction to be the majority class.
    # Store the predicted class (1 or -1) in leaf['prediction']
    leaf['prediction'] = 1 if num_ones > num_minus_ones else -1
     
    return leaf

We have provided a function that learns the decision tree recursively and implements 3 stopping conditions:
1. **Stopping condition 1:** All data points in a node are from the same class.
2. **Stopping condition 2:** No more features to split on.
3. **Additional stopping condition:** Set **max_depth** of the tree. By not letting the tree grow too deep, we will save computational effort in the learning process. 

Now, we will write down the skeleton of the learning algorithm.

In [21]:
def decision_tree( data, features, target, current_depth = 0, max_depth = 10 ):
    remaining_features = features[:] # Make a copy of the features.
    
    target_values = data[target]
    print( "--------------------------------------------------------------------" )
    print( "Subtree, depth = %s (%s data points)." % (current_depth, len(target_values)) )
    
    # Stopping condition 1: Check if there are mistakes at current node.
    if count_num_mistakes(target_values) == 0:
        print( "Stopping condition 1 reached." )     
        # If there are no mistakes at the current node, make it a leaf node
        return create_leaf(target_values)
    
    # Stopping condition 2: Check if there are remaining features to consider splitting on
    if len(remaining_features) == 0:
        print( "Stopping condition 2 reached." )    
        # If there are no remaining features to consider, make current node a leaf node
        return create_leaf(target_values)    
    
    # Additional stopping condition (limit tree depth)
    if current_depth >= max_depth:
        print( "Reached maximum depth. Stopping for now." )
        # If the max tree depth has been reached, make current node a leaf node
        return create_leaf(target_values)

    # Find the best splitting feature
    splitting_feature = best_splitting_feature( data, features, target )
    
    # Split on the best feature that we found. 
    left_split = data[data[splitting_feature] == 0]
    right_split = data[data[splitting_feature] == 1]
    remaining_features = remaining_features[remaining_features != splitting_feature]
    print( "Split on feature %s. (%s, %s)" % (\
                      splitting_feature, len(left_split), len(right_split)) )
    
    # Create a leaf node if the split is "perfect"
    if len(left_split) == len(data):
        print( "Creating leaf node." )
        return create_leaf(left_split[target])
    if len(right_split) == len(data):
        print( "Creating leaf node." )
        return create_leaf(right_split[target])
     
    # Repeat on left and right subtrees
    left_tree = decision_tree( left_split, remaining_features, target, current_depth + 1, max_depth )        
    right_tree = decision_tree( right_split, remaining_features, target, current_depth + 1, max_depth )  

    return {'is_leaf'          : False, 
            'prediction'       : None,
            'splitting_feature': splitting_feature,
            'left'             : left_tree, 
            'right'            : right_tree}

Here is a recursive function to count the nodes in your tree:

In [22]:
def count_nodes( tree ):
    if tree['is_leaf']:
        return 1
    return 1 + count_nodes(tree['left']) + count_nodes(tree['right'])
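
As a sanity check on node counts, a binary tree in which every branch reaches depth `d` has 2^(d+1) - 1 nodes, so a tree built with `max_depth = 3` can have at most 15 nodes. The sketch below verifies this with a hypothetical helper, `make_full_tree`, that builds such a tree with placeholder contents; `count_nodes` is repeated so the block runs on its own.

```python
def count_nodes(tree):
    if tree['is_leaf']:
        return 1
    return 1 + count_nodes(tree['left']) + count_nodes(tree['right'])

def make_full_tree(depth):
    """Hypothetical helper: a complete binary tree with placeholder contents."""
    if depth == 0:
        return {'is_leaf': True, 'prediction': 1,
                'splitting_feature': None, 'left': None, 'right': None}
    return {'is_leaf': False, 'prediction': None,
            'splitting_feature': 'placeholder',
            'left': make_full_tree(depth - 1),
            'right': make_full_tree(depth - 1)}

# Print the node count for complete trees of depth 0 through 3.
for depth in range(4):
    print(depth, count_nodes(make_full_tree(depth)))
```

A learned tree has fewer nodes than this bound whenever some branch stops early, for example when a split leaves one side empty and the node becomes a leaf.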

Run the following test code to check your implementation. Make sure you get **'Test passed'** before proceeding.

In [23]:
small_data_decision_tree = decision_tree( train_data, features, 'safe_loans', max_depth = 3 )
if count_nodes(small_data_decision_tree) == 13:
    print( 'Test passed!' )
else:
    print( 'Test failed... try again!' )
    print( 'Number of nodes found                :', count_nodes(small_data_decision_tree) )
    print( 'Number of nodes that should be there : 13' )

--------------------------------------------------------------------
Subtree, depth = 0 (37040 data points).
Split on feature term. 60 months. (27703, 9337)
--------------------------------------------------------------------
Subtree, depth = 1 (27703 data points).
Split on feature grade.D. (23068, 4635)
--------------------------------------------------------------------
Subtree, depth = 2 (23068 data points).
Split on feature grade.E. (21809, 1259)
--------------------------------------------------------------------
Subtree, depth = 3 (21809 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------------------------
Subtree, depth = 3 (1259 data points).
Reached maximum depth. Stopping for now.
--------------------------------------------------------------------
Subtree, depth = 2 (4635 data points).
Split on feature grade.C. (4635, 0)
Creating leaf node.
--------------------------------------------------------------------
Subtree, dept

## Build the tree!

Now that all the tests are passing, we will train a tree model on the **train_data**. Limit the depth to 6 (**max_depth = 6**) to make sure the algorithm doesn't run for too long. Call this tree **my_decision_tree**. 

In [26]:
my_decision_tree = decision_tree( train_data, features, 'safe_loans', max_depth = 6 )

--------------------------------------------------------------------
Subtree, depth = 0 (37040 data points).
Split on feature term. 60 months. (27703, 9337)
--------------------------------------------------------------------
Subtree, depth = 1 (27703 data points).
Split on feature grade.D. (23068, 4635)
--------------------------------------------------------------------
Subtree, depth = 2 (23068 data points).
Split on feature grade.E. (21809, 1259)
--------------------------------------------------------------------
Subtree, depth = 3 (21809 data points).
Split on feature grade.C. (14639, 7170)
--------------------------------------------------------------------
Subtree, depth = 4 (14639 data points).
Split on feature grade.F. (14265, 374)
--------------------------------------------------------------------
Subtree, depth = 5 (14265 data points).
Split on feature grade.G. (14164, 101)
--------------------------------------------------------------------
Subtree, depth = 6 (14164 data 

## Making predictions with a decision tree

We can make predictions from the decision tree with a simple recursive function. Below, we call this function `classify`, which takes in a learned `tree` and a test point `x` to classify.  We include an option `annotate` that describes the prediction path when set to `True`.

In [28]:
def classify( tree, x, annotate = False ):   
    # if the node is a leaf node.
    if tree['is_leaf']:
        if annotate: 
            print( "At leaf, predicting %s" % tree['prediction'] )
        return tree['prediction'] 
    else:
        # split on feature.
        split_feature_value = x[tree['splitting_feature']]
        if annotate: 
            print( "Split on %s = %s" % (tree['splitting_feature'], split_feature_value) )
        if split_feature_value == 0:
            return classify( tree['left'], x, annotate )
        else:
            return classify( tree['right'], x, annotate )
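
Because each step only moves from a node to one of its children, the same traversal can also be written as a loop. The sketch below is an illustrative variant, not the notebook's `classify`: the names `classify_iterative`, `leaf` and `toy_tree` are assumptions, and the tree is built by hand rather than learned.

```python
def classify_iterative(tree, x):
    """Walk from the root to a leaf without recursion."""
    node = tree
    while not node['is_leaf']:
        # Feature value 0 goes left, anything else goes right, as in classify.
        if x[node['splitting_feature']] == 0:
            node = node['left']
        else:
            node = node['right']
    return node['prediction']

def leaf(prediction):
    """Hypothetical helper: a leaf node in the notebook's dictionary format."""
    return {'is_leaf': True, 'prediction': prediction,
            'splitting_feature': None, 'left': None, 'right': None}

# Hypothetical hand-built tree; not the tree learned above.
toy_tree = {
    'is_leaf': False, 'prediction': None,
    'splitting_feature': 'term. 60 months',
    'left': leaf(+1),
    'right': {'is_leaf': False, 'prediction': None,
              'splitting_feature': 'grade.A',
              'left': leaf(-1), 'right': leaf(+1)},
}

x = {'term. 60 months': 1, 'grade.A': 0}
toy_prediction = classify_iterative(toy_tree, x)
print(toy_prediction)
```

For this point the walk goes right at the root (`term. 60 months` is 1) and then left (`grade.A` is 0), ending at a leaf that predicts -1.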

Now, let's consider the first example of the test set and see what `my_decision_tree` model predicts for this data point.

In [29]:
test_data.iloc[0]

safe_loans                -1
grade.C                    0
grade.F                    0
grade.B                    0
grade.D                    0
grade.A                    0
grade.E                    1
grade.G                    0
term. 60 months            1
term. 36 months            0
home_ownership.RENT        0
home_ownership.OWN         1
home_ownership.MORTGAGE    0
home_ownership.OTHER       0
emp_length.< 1 year        0
emp_length.4 years         0
emp_length.3 years         1
emp_length.10+ years       0
emp_length.1 year          0
emp_length.9 years         0
emp_length.2 years         0
emp_length.8 years         0
emp_length.7 years         0
emp_length.5 years         0
emp_length.n/a             0
emp_length.6 years         0
Name: 16626, dtype: int64

In [30]:
classify( my_decision_tree, test_data.iloc[0], annotate=True )

Split on term. 60 months = 1
Split on grade.A = 0
Split on grade.C = 0
Split on grade.F = 0
Split on grade.B = 0
Split on grade.D = 0
At leaf, predicting -1


-1

## Evaluating your decision tree

Now, we will write a function to evaluate a decision tree by computing the classification error of the tree on the given dataset.

Again, recall that the **classification error** is defined as follows:
$$
\mbox{classification error} = \frac{\mbox{# mistakes}}{\mbox{# total examples}}
$$

Now, write a function called `misclassify_error` that takes in as input:
1. `tree` (as described above)
2. `data` (a dataframe)

This function should compute a prediction (class label) for each row in `data` using the decision `tree`, compare the predictions with the true labels, and return the resulting classification error.

In [24]:
def misclassify_error( tree, data ):
    # Apply the classify(tree, x) to each row in your data
    prediction = data.apply( lambda x: classify(tree, x), axis=1 )
    true_label = data["safe_loans"]
    return (prediction!=true_label).sum() / float(len(prediction))

Now, let's use this function to evaluate the classification error on the test set.

In [31]:
misclassify_error( my_decision_tree, test_data )

0.39146868250539957
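
To put this number in context, the error can be converted back into a mistake count and an accuracy. The short calculation below uses only two values reported in this notebook: the test error above and the test set size of 9260 from the train-test split. The variable names are illustrative.

```python
test_error = 0.39146868250539957   # classification error reported above
num_test_points = 9260             # number of rows in test_data

# Error = mistakes / total, so mistakes = error * total.
num_mistakes = round(test_error * num_test_points)

# Accuracy is the complement of the classification error.
accuracy = 1.0 - test_error
print(num_mistakes, accuracy)
```

So the tree misclassifies 3625 of the 9260 test loans, which corresponds to an accuracy of roughly 60.9%.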

## Printing out a decision stump

We can print out a single decision stump, that is, one node of the tree together with a summary of its two children:

In [33]:
def print_stump( tree, name = 'root' ):
    split_name = tree['splitting_feature']
    if split_name is None:
        print( "(leaf, label: %s)" % tree['prediction'] )
        return None
    split_feature, split_value = split_name.split('.')
    print( '                       %s' % name          )
    print( '         |---------------|----------------|')
    print( '         |                                |')
    print( '         |                                |')
    print( '         |                                |')
    print( '  [{0} == 0]               [{0} == 1]    '.format(split_name))
    print( '         |                                |')
    print( '         |                                |')
    print( '         |                                |')
    print( '    (%s)                         (%s)' \
        % (('leaf, label: ' + str(tree['left']['prediction']) \
            if tree['left']['is_leaf'] else 'subtree'),
           ('leaf, label: ' + str(tree['right']['prediction']) \
            if tree['right']['is_leaf'] else 'subtree')) )

In [34]:
print_stump( my_decision_tree )

                       root
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [term. 60 months == 0]               [term. 60 months == 1]    
         |                                |
         |                                |
         |                                |
    (subtree)                         (subtree)


### Exploring the left subtree

The tree is a recursive dictionary, so we do have access to all the nodes! We can use
* `my_decision_tree['left']` to go left
* `my_decision_tree['right']` to go right
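
Chained lookups such as `tree['left']['right']` can also be expressed as a path of steps. The sketch below assumes a hypothetical helper, `get_node`, and a small hand-built tree in the same dictionary format; neither is part of the assignment.

```python
def get_node(tree, path):
    """Hypothetical helper: follow a sequence of 'left'/'right' steps from the root."""
    node = tree
    for step in path:
        if node['is_leaf']:
            raise ValueError("cannot go %s from a leaf" % step)
        node = node[step]
    return node

def leaf(prediction):
    """Hypothetical helper: a leaf node in the notebook's dictionary format."""
    return {'is_leaf': True, 'prediction': prediction,
            'splitting_feature': None, 'left': None, 'right': None}

# Hypothetical hand-built tree; not the tree learned above.
toy_tree = {
    'is_leaf': False, 'prediction': None,
    'splitting_feature': 'feature_a',
    'left': leaf(+1),
    'right': {'is_leaf': False, 'prediction': None,
              'splitting_feature': 'feature_b',
              'left': leaf(-1), 'right': leaf(+1)},
}

node = get_node(toy_tree, ['right', 'left'])
print(node['is_leaf'], node['prediction'])
```

An empty path returns the root itself, and stepping past a leaf raises an error instead of failing on a `None` child.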

In [149]:
print_stump( my_decision_tree['left'], my_decision_tree['splitting_feature'] )

                       term. 60 months
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [grade.D == 0]               [grade.D == 1]    
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)


### Exploring the left of left subtree

In [150]:
print_stump( my_decision_tree['left']['left'], my_decision_tree['left']['splitting_feature'] )

                       grade.D
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [grade.E == 0]               [grade.E == 1]    
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)


### Exploring the right of left subtree

In [153]:
print_stump( my_decision_tree['left']['right'], my_decision_tree['left']['splitting_feature'] )

(leaf, label: -1)


### Exploring the right subtree

In [151]:
print_stump( my_decision_tree['right'], my_decision_tree['splitting_feature'] )

                       term. 60 months
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [grade.A == 0]               [grade.A == 1]    
         |                                |
         |                                |
         |                                |
    (subtree)                         (subtree)


### Exploring the right of right subtree

In [152]:
print_stump( my_decision_tree['right']['right'], my_decision_tree['right']['splitting_feature'] )

                       grade.A
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [emp_length.n/a == 0]               [emp_length.n/a == 1]    
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)


### Exploring the left of right subtree

In [154]:
print_stump( my_decision_tree['right']['left'], my_decision_tree['right']['splitting_feature'] )

                       grade.A
         |---------------|----------------|
         |                                |
         |                                |
         |                                |
  [grade.C == 0]               [grade.C == 1]    
         |                                |
         |                                |
         |                                |
    (subtree)                         (leaf, label: -1)
