# Module 8 - Programming Assignment

## Directions

1. Change the name of this file to be your JHED id as in `jsmith299.ipynb`. Because sure you use your JHED ID (it's made out of your name and not your student id which is just letters and numbers).
2. Make sure the notebook you submit is cleanly and fully executed. I do not grade unexecuted notebooks.
3. Submit your notebook back in Blackboard where you downloaded this file.

*Provide the output **exactly** as requested*

In [1079]:
from copy import deepcopy

## Decision Trees

For this assignment you will be implementing and evaluating a Decision Tree using the ID3 Algorithm (**no** pruning or normalized information gain). Use the provided pseudocode. The data is located at (copy link):

http://archive.ics.uci.edu/ml/datasets/Mushroom

**Just in case** the UCI repository is down, which happens from time to time, I have included the data and name files on Blackboard.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        No Pandas. The only acceptable libraries in this class are those contained in the `environment.yml`. No OOP, either. You can used Dicts, NamedTuples, etc. as your abstract data type (ADT) for the the tree and nodes.
    </p>
</div>

One of the things we did not talk about in the lectures was how to deal with missing values. There are two aspects of the problem here. What do we do with missing values in the training data? What do we do with missing values when doing classifcation?

For the first problem, C4.5 handled missing values in an interesting way. Suppose we have identifed some attribute *B* with values {b1, b2, b3} as the best current attribute. Furthermore, assume there are 5 observations with B=?, that is, we don't know the attribute value. In C4.5, those 5 observations would be added to *all* of the subsets created by B=b1, B=b2, B=b3 with decreased weights. Note that the observations with missing values are not part of the information gain calculation.

This doesn't quite help us if we have missing values when we use the model. What happens if we have missing values during classification? One approach is to prepare for this advance. When you train the tree, you need to add an implicit attribute value "?" at every split. For example, if the attribute was "size" then the domain would be ["small", "medium", "large", "?"]. The "?" value gets all the data (because ? is now a wildcard). However, there is an issue with this approach. "?" becomes the worst possible attribut value because it has no classification value. What to do? There are several options:

1. Never recurse on "?" if you do not also recurse on at least one *real* attribute value.
2. Limit the depth of the tree.

There are good reasons, in general, to limit the depth of a decision tree because they tend to overfit.
Otherwise, the algorithm *will* exhaust all the attributes trying to fulfill one of the base cases.

You must implement the following functions:

`train` takes training_data and returns the Decision Tree as a data structure. There are many options including namedtuples and just plain old nested dictionaries. **No OOP**.

```
def train(training_data, depth_limit=None):
   # returns the Decision Tree.
```

The `depth_limit` value defaults to None. (What technique would we use to determine the best parameter value for `depth_limit` hint: Module 3!)

`classify` takes a tree produced from the function above and applies it to labeled data (like the test set) or unlabeled data (like some new data).

```
def classify(tree, observations, labeled=True):
    # returns a list of classifications
```

`evaluate` takes a data set with labels (like the training set or test set) and the classification result and calculates the classification error rate:

$$error\_rate=\frac{errors}{n}$$

Do not use anything else as evaluation metric or the submission will be deemed incomplete, ie, an "F". (Hint: accuracy rate is not the error rate!).

`cross_validate` takes the data and uses 10 fold cross validation (from Module 3!) to `train`, `classify`, and `evaluate`. **Remember to shuffle your data before you create your folds**. I leave the exact signature of `cross_validate` to you but you should write it so that you can use it with *any* `classify` function of the same form (using higher order functions and partial application).

Following Module 3's discussion, `cross_validate` should print out the fold number and the evaluation metric (error rate) for each fold and then the average value (and the variance). What you are looking for here is a consistent evaluation metric cross the folds. You should print the error rates in terms of percents (ie, multiply the error rate by 100 and add "%" to the end).

```
def pretty_print_tree(tree):
    # pretty prints the tree
```

This should be a text representation of a decision tree trained on the entire data set (no train/test).

To summarize...

Apply the Decision Tree algorithm to the Mushroom data set using 10 fold cross validation and the error rate as the evaluation metric. When you are done, apply the Decision Tree algorithm to the entire data set and print out the resulting tree.

**Note** Because this assignment has a natural recursive implementation, you should consider using `deepcopy` at the appropriate places.

-----

In [1080]:
import math
import random
from typing import List, Dict, Tuple, Callable

In [1081]:
def parse_data(file_name: str) -> List[List]:
    data = []
    file = open(file_name, "r")
    for line in file:
        datum = [value for value in line.rstrip().split(",")]
        data.append(datum)
    random.shuffle(data)
    return data

In [1082]:
data = parse_data("agaricus-lepiota.data")

In [1083]:
len(data[0])

23

In [1084]:
len(data)

8124

In [1085]:
def create_folds(xs: List, n: int) -> List[List[List]]:
    k, m = divmod(len(xs), n)
    # be careful of generators...
    return list(xs[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))

In [1086]:
folds = create_folds(data, 10)

In [1087]:
len(folds)

10

In [1088]:
def create_train_test(folds: List[List[List]], index: int) -> Tuple[List[List], List[List]]:
    training = []
    test = []
    for i, fold in enumerate(folds):
        if i == index:
            test = fold
        else:
            training = training + fold
    return training, test

In [1089]:
training, test = create_train_test(folds, 0)

In [1090]:
len(training)

7311

In [1091]:
len(test)

813

### <a id="count_total"></a> count_total

Formal Parameters:

**data** the training set

**picked_attributes** The relevant attributes

**picked_vals** The relevant values of those attributes

**returns** Total, an int

This function counts the amount of rows in **data** whose attributes in **picked_attributes** are the values in **picked_vals**. This allows us to compare the entropy of the next potential attributes at any level in the tree in [build_entropy_dict](#build_entropy_dict).

In [1092]:
def count_total(data,picked_attributes=[],picked_vals=[]):
    total = 0
    for row in range(len(data)):
        to_continue = False
        for i in range(len(picked_attributes)):
            if data[row][picked_attributes[i]] != picked_vals[i]:
                to_continue = True
                break
        if to_continue:
            continue
        total += 1
    return total

In [1093]:
data1 = [
    ['a','b','c'],
    ['a','b','d'],
    ['a','e','b']
]
picked_attributes=[1]
picked_vals=['b']
assert(count_total(data1,picked_attributes,picked_vals) == 2)
picked_attributes=[2]
assert(count_total(data1,picked_attributes,picked_vals) == 1)
picked_attributes=[1,2]
picked_vals=['b','d']
assert(count_total(data1,picked_attributes,picked_vals) == 1)

### <a id="calculate_entropy"></a> calculate_entropy

Formal Parameters:

**x** a float in [0,1] or a list of such floats


**returns** entropy, a float in [0,1)

This function calculates the entropy of a list.  If **x** is a float, it calculates the entropy of **x** and its complement. This allows us to calculate the entropy of the next potential attributes at any level in the tree in [build_entropy_dict](#build_entropy_dict).

In [1094]:
def calculate_entropy(x):
    entropy = 0
    if type(x) == list:
        for i in x:
            try:
                entropy -= i*math.log2(i)
            except ValueError:
                pass
    else:
        try:
            entropy = -x*math.log2(x)-(1-x)*math.log2(1-x)
        except ValueError:
            return 0
    return entropy

In [1095]:
assert(calculate_entropy(1/2)==1)
assert(calculate_entropy(1)==0)
assert(calculate_entropy(0)==0)

### <a id="build_attribute_dict"></a> build_attribute_dict

Formal Parameters:

**data** the training set

**picked_attributes** The relevant attributes

**picked_vals** The relevant values of those attributes

**returns** attribute_dict, a dict with key:value pairs, attribute:{'p':`count of p`, 'e': `count of e`}

This function maps each attribute to the counts of each classification for the rows in **data** whose attributes in **picked_attributes** are the values in **picked_vals**. This allows us to compare the entropy of the next potential attributes at any level in the tree in [build_entropy_dict](#build_entropy_dict).

In [1096]:
def build_attribute_dict(data,picked_attributes=[],picked_vals=[]):
    total = count_total(data,picked_attributes,picked_vals)
    
    attribute_dict = {i:{} for i in range(1,len(data[0])) if i not in picked_attributes}
    for i in range(1,len(data[0])):
        if i in picked_attributes:
            continue
        for row in range(len(data)):
            to_continue = False
            for j in range(len(picked_attributes)):
                if data[row][picked_attributes[j]] != picked_vals[j]:
                    to_continue = True
                    break
            if to_continue:
                continue

            if data[row][i] not in attribute_dict[i].keys():
                attribute_dict[i][data[row][i]] = {'p':0,'e':0}

            if data[row][0] == 'p':
                attribute_dict[i][data[row][i]]['p']+=1
            else:
                attribute_dict[i][data[row][i]]['e']+=1
                    
    return attribute_dict

In [1097]:
test_data1 =[['p','a1','a2'],
             ['p','b1','b2'],
             ['e','c1','a2'],
             ['p','c1','a2']
             ]
test_att_dict = build_attribute_dict(test_data1)
assert(test_att_dict == {1:{'a1':{'p':1,'e':0},'b1':{'p':1,'e':0},
                            'c1':{'p':1,'e':1}},2:{'a2':{'p':2,'e':1},'b2':{'p':1,'e':0}}})

picked_attributes=[1]
picked_vals=['c1']

test_att_dict2 = build_attribute_dict(test_data1,picked_attributes,picked_vals)
print(test_att_dict2)
assert(test_att_dict2 == {2:{'a2':{'p':1,'e':1}}})

{2: {'a2': {'p': 1, 'e': 1}}}


### <a id="build_entropy_dict"></a> build_entropy_dict

Formal Parameters:

**attribute_dict** the map built in [build_attribute_dict](#build_attribute_dict)

**total** The total amount of remaining rows in the subtree

**returns** entropy_dict, a dict with key:value pairs, attribute:entropy

This function maps each attribute to its entropy at each level in the subtree. It can then be used to [pick the best attribute](#pick_best_attribute).

In [1098]:
def build_entropy_dict(attribute_dict,total):
    entropy_dict = {}
    for k in attribute_dict.keys():
        entropy = 0
        for c in attribute_dict[k].keys():
            t = attribute_dict[k][c]['p']+attribute_dict[k][c]['e']
            e = calculate_entropy(attribute_dict[k][c]['p']/t)
            entropy += e*t/total
        entropy_dict[k] = entropy
    return entropy_dict

In [1099]:
ed = build_entropy_dict(test_att_dict,4)
print(ed)

{1: 0.5, 2: 0.6887218755408672}


### <a id="build_attribute_dict"></a> build_attribute_dict

Formal Parameters:

**data** the training set

**picked_attributes** The relevant attributes

**picked_vals** The relevant values of those attributes

**returns** best_attribute, the int of the column in data containing the next best attribute to decide on.

This picks the next best attribute (the one with the lowest entropy) in **data** whose attributes in **picked_attributes** are the values in **picked_vals**. This allows us to pick the next best attribute to build the subtree out of in [id3](#id3).

In [1100]:
def pick_best_attribute(data,picked_attributes=[],picked_vals=[]):
    best_attribute = None
    best_value = 2
    attribute_dict = build_attribute_dict(data,picked_attributes,picked_vals)

    total = count_total(data,picked_attributes,picked_vals)
    entropy_dict = build_entropy_dict(attribute_dict,total)
    for k in entropy_dict.keys():
        if entropy_dict[k] < best_value:
            best_attribute = k
            best_value = entropy_dict[k]
                
    return best_attribute
                
        
    

In [1101]:
assert(pick_best_attribute(test_data1) == 1)
picked_attributes=[1]
picked_vals=['c1']
assert(pick_best_attribute(test_data1,picked_attributes,picked_vals) == 2)

### <a id="check_homogeneous"></a> check_homogeneous

Formal Parameters:

**data** the training set

**picked_attributes** The relevant attributes

**picked_vals** The relevant values of those attributes

**returns** A tuple of form (boolean,string) where the boolean determines if the remaining data for the subtree is homogeneous, and the string is the majority classification, regardless of if the subtree is homogeneous

This checks to see if the classifications in **data** whose attributes in **picked_attributes** are the values in **picked_vals**. This allows us to end recursion in a subtree in [id3](#id3).  This function is also used when the depth limit is reached or there are no more attributes remaining.  So, the majority classification will be returned to serve the pseudocode.  I disagree with this approach, because we are trying to classify mushrooms as poisonous or edible, and any uncertainty should default to poisonous.  By switching the `'e'` to a `'p'` in the final return statement, we can see the effects of this approach.  With this particular data set, shallow trees will have much greater error rates, but at a depth level of at least 4, the error rate drops to 0, regardless of the approach.  This is simply due to the fact that a decision tree can be made out of finite data, and is not an endorsement of the approach.

In [1102]:
def check_homogeneous(data,picked_attributes,picked_vals):
    p_count = 0
    s_count = 0
    homogeneous = True
    is_first_row = True
    first_row = []
    for row in data:
        to_continue = False
        for i in range(len(picked_attributes)):
            if row[picked_attributes[i]]!=picked_vals[i]:
                to_continue = True
        if to_continue:
            continue
        if is_first_row:
            first_row = deepcopy(row)
            for v in picked_vals:
                first_row.remove(v)
            is_first_row = False
            first_row = first_row[1:]
        else:
            next_row = deepcopy(row)
            for v in picked_vals:
                next_row.remove(v)
            next_row = next_row[1:]
            if next_row != first_row:
                homogeneous = False
        if row[0] == 'p':
            p_count+=1
        else:
            s_count+=1
    if p_count ==0:
        return True, 'e'
    if s_count ==0:
        return True, 'p'
    if p_count >= s_count:
        return homogeneous, 'p'
    return homogeneous, 'e'

In [1103]:
data1 = [
    ['p','b','c'],
    ['p','b','d'],
    ['p','e','b']
]
picked_attributes=[]
picked_vals=[]

assert(check_homogeneous(data1,picked_attributes,picked_vals) == (True,'p'))
data1 = [
    ['p','b','c'],
    ['p','b','b'],
    ['s','e','b']
]

assert(check_homogeneous(data1,picked_attributes,picked_vals) == (False,'p'))
picked_attributes=[1]
picked_vals=['b']
assert(check_homogeneous(data1,picked_attributes,picked_vals) == (True,'p'))
picked_attributes=[2]
assert(check_homogeneous(data1,picked_attributes,picked_vals) == (False,'p'))

### <a id="id3"></a> id3

Formal Parameters:

**data** the training set

**picked_attributes** The relevant attributes

**picked_vals** The relevant values of those attributes

**depth** The remaining allowable recursive depth of the tree

**returns** subtree, a dict with key:value pairs `(attribute,value):x`, where `x` is a subtree of that subtree, or `'p'` or `'e'`.

This recursively builds the decision tree that will be used to [classify](#classify) a mushroom as poisonous (p) or edible (e).  All defaults or missing values map to `'p'`.

In [1104]:
def id3(data, picked_attributes=[], picked_vals=[],depth = float('inf')):
    subtree={}
    if len(data) == 0:
        return 'p'
    homogeneous = check_homogeneous(data, picked_attributes, picked_vals)
    
    if homogeneous[0] or depth == 0:
        return homogeneous[1]
        
    if len(data[0]) == len(picked_attributes) + 1:
        return homogeneous[1]

    best_attribute = pick_best_attribute(data, picked_attributes, picked_vals)
    for row in data:
        to_continue = False
        for i in range(len(picked_attributes)):
            if row[picked_attributes[i]] != picked_vals[i]:
                to_continue = True
                break
        if to_continue:
            continue
        if not (best_attribute,row[best_attribute]) in subtree.keys():
            p = deepcopy(picked_attributes)
            v = deepcopy(picked_vals)
            p.append(best_attribute)
            v.append(row[best_attribute])
            if row[best_attribute] == '?':
                subtree[(best_attribute,row[best_attribute])] = 'p'
            else:
                subtree[(best_attribute,row[best_attribute])] = id3(data, p, v,depth-1)
    return subtree

In [1105]:
print(id3(test_data1))
test_data2 =[['p','a1','a2','a3'],
             ['p','b1','b2','a3'],
             ['e','c1','a2','b3'],
             ['p','c1','a2','a3'],
             ['e','d1','c2','b3']
             ]
assert(id3(test_data2) == {(3,'a3'):'p',(3,'b3'):'e'})

test_data3 =[['p','a1','a2','a3','a4'],
             ['p','c1','b2','a3','d4'],
             ['e','a1','a2','b3','a4'],
             ['e','c1','a2','a3','d4'],
             ['p','c1','c2','b3','e4'],
             ['e','c1','c2','b3','e4']
             ]
assert(id3(test_data3,depth=2)!=id3(test_data3,depth=3))
assert(id3(test_data3,depth=2)!=id3(test_data3))
print(id3(test_data3))

{(1, 'a1'): 'p', (1, 'b1'): 'p', (1, 'c1'): 'p'}
{(2, 'a2'): {(1, 'a1'): {(3, 'a3'): 'p', (3, 'b3'): 'e'}, (1, 'c1'): 'e'}, (2, 'b2'): 'p', (2, 'c2'): 'p'}


### <a id="tree_fixer"></a> tree_fixer

Formal Parameters:

**tree** A decision tree, with `type(tree) == dict` evaluates to `True`

**returns** A fixed version of **tree**

This helper function fixes an issue with homogeneity that isn't addressed in [id3](#id3).  Specifically, it turns subtrees of the form `{(1,'a'):{(2,'b'):'p',(2,'c'):'p'}, (1,'d'):'e'}` to the form `{(1,'a'):'p',(1,'d'):'e'}`, since [check_homogeneous](#check_homogeneous) doesn't account for homogeneity of remaining values.  This issue only seemed to arise at the first depth of the tree, so it is likely that **depth_limit** in [train](#train) will produce trees that are 1 level shallower than intended.

In [1106]:
def tree_fixer(tree):
    p_count = 0
    s_count = 0
    for k in tree.keys():
        if type(tree[k]) == dict:
            tree[k] = tree_fixer(tree[k])
        if tree[k] == 'p':
            p_count +=1
        if tree[k] == 'e':
            s_count +=1
    for k in tree.keys():
        if type(tree[k]) == dict:
            return tree
    if s_count == 0:
        return 'p'
    if p_count == 0:
        return 'e'
    return tree

In [1107]:
assert(tree_fixer(id3(test_data1))=='p')
bad_tree1 = {'a': {'b': 'p','e':'p'}, 'c':{'d':'e','h':'e'}}
assert(tree_fixer(bad_tree1)=={'a':'p','c':'e'})

### <a id="pretty_print_tree"></a> pretty_print_tree

Formal Parameters:

**tree** A decision tree, with `type(tree) == dict` evaluates to `True`

**returns** `None`

**prints** A string version of **tree**

This function prints the decision tree in a way that I think is pretty.  Child trees are located below and to the right of their parent tree.  Siblings are located directly vertical from each other.  Leaves are located to the very left.

In [1108]:
def pretty_print_tree(tree):
    tree_str = str(tree)
    space = " "
    c = 0
    while c < len(tree_str):
        if tree_str[c] == '{':
            c+=1
            continue
        elif tree_str[c] ==':':
            print()
            space += " "*5
        elif tree_str[c] =='}':
            space = space[:len(space)-5]
            print()
        elif tree_str[c] =='(':
            print(space,end = '') != ':'
            while tree_str[c] != ':':
                print(tree_str[c],end='')
                c+=1
            print(tree_str[c],end='')
            print()
            space += " "*5
        elif tree_str[c] =='\'':
            print(tree_str[c],end='')
            c+=1
            print(tree_str[c],end='')
            c+=1
            print(tree_str[c],end='')
            print()
            
        c+=1
        
    
    

In [1109]:
pretty_print_tree(tree_fixer(id3(data1)))



 (1, 'b'):
'p'
      (1, 'e'):
'e'



### <a id="train"></a> train

Formal Parameters:

**training_data** A list of lists of data to train a tree on

**depth_limit**  `None`, or the maximum depth of the tree 

**returns** A decision tree

This function creates the tree from the training data.  [tree_fixer](#tree_fixer) may cause the recursion to be off by one.

In [1110]:
def train(training_data,depth_limit = None):
    if depth_limit == None:
        return tree_fixer(id3(training_data))
    return tree_fixer(id3(training_data,depth = depth_limit))
    
    

### <a id="classify"></a> classify

Formal Parameters:

**tree** A decision tree as a nested dict

**observations**  a list of attribute values observed

**labeled** Whether the observations are labeled with a classification.

**returns** `'p'` or `'e'`, if the mushroom is poisonous or edible, based on the observations.

This classifies observations based on a decision tree.  The labeled parameter helps determine if we are testing or not

In [1111]:
def classify(tree, observations, labeled=True):
    observations_copy = deepcopy(observations)
    if not labeled:
        observations_copy = [''] + observations_copy
    classification = tree
    while type(classification) != str:
        broken = False
        for i in range(1, len(observations_copy)):
            for o in classification.keys():
                if o[0] != i:
                    break
                if o[1] == observations_copy[i]:
                    classification = classification[o]
                    broken = True
                    break
            if broken:
                break
        if not broken:
            return 'p'
    return classification

In [1112]:
tree_test1 = {'a':{'b':'c','d':{'e':'f'}},'g':'h'}
observation1 = ['a','d','e']
print(classify(tree_test1,observation1,False))

p


### <a id="evaluate"></a> evaluate

Formal Parameters:

**tree** A decision tree as a nested dict

**test**  A list of lists of test data

**model** A higher order function, but actually just [classify](#classify)

**returns** The error rate: amount of errors/total

This determines the error rate of the tree on the test data.

In [1113]:
def evaluate(tree,test,model):
    total = len(test)
    error = 0
    for data_point in test:
        prediction = model(tree,data_point)
        actual = data_point[0]
        if actual!=prediction:
            error+=1
    
    rate = error/total
    return rate


### <a id="evaluate"></a> evaluate

Formal Parameters:

**folds** A decision tree as a nested dict

**depth_limit**  The maximum depth of the tree

**labeled** Whether the data is labeled or not.  Used by [classify](#classify)

**returns** The average error rate of 10 folds of cross validation

**prints** Average error rate, variance

This determines the average error rate and variance of the different trees over 10 folds of cross validation

In [1114]:
def cross_validate(folds,depth_limit=None,labeled=True):
    print("tree evaluation with "+ str(depth_limit)+ " depth limit")
    total = 0
    rates = []
    variance = 0
    for i in range(10):
        training, test = create_train_test(folds, i)
        tree = train(training,depth_limit)
        error = evaluate(tree,test,classify)
        print("fold "+str(i)+ " error rate: " + str(100*error)+"%")
        total += error
        rates.append(error)
      
    mean = total/10
    for r in rates:
        variance += (r-mean)**2
        
    variance /=9
        
        

    print("mean error rate: " + str(100*mean)+"%")
    print("variance: " + str(100*variance) +"%")
    return 100*total/10

In [1115]:
folds = create_folds(data,10)
cross_validate(folds)

tree evaluation with None depth limit
fold 0 error rate: 0.0%
fold 1 error rate: 0.0%
fold 2 error rate: 0.0%
fold 3 error rate: 0.0%
fold 4 error rate: 0.0%
fold 5 error rate: 0.0%
fold 6 error rate: 0.0%
fold 7 error rate: 0.0%
fold 8 error rate: 0.0%
fold 9 error rate: 0.0%
mean error rate: 0.0%
variance: 0.0%


0.0

### <a id="depth_tuner"></a> depth_tuner

Formal Parameters:

**upper_bound** The max bound to tune the max depth of the tree to

**folds** The training data

**returns** The depth with the least average error. Breaks ties by minimum max depth

**prints** A summary of what is going on

This tunes the depth hyperparameter to make the best tree.

In [1116]:
def depth_tuner(upper_bound,folds):
    print("Classification evaluation for trees of different depths")
    best_depth = 0
    smallest_error_rate = float('inf')

    for j in range(1,upper_bound):
        print("depth = " + str(j))
        k = cross_validate(folds,j)
        if k < smallest_error_rate:
            smallest_error_rate = k
            best_depth = j
            
    print("best depth is " + str(best_depth)+ " with error rate " + str(smallest_error_rate) +"%")
    return best_depth
        

        

folds = create_folds(data,10)
depth_tuner(23,folds)

Classification evaluation for trees of different depths
depth = 1
tree evaluation with 1 depth limit
fold 0 error rate: 1.4760147601476015%
fold 1 error rate: 0.984009840098401%
fold 2 error rate: 1.3530135301353015%
fold 3 error rate: 1.968019680196802%
fold 4 error rate: 1.2315270935960592%
fold 5 error rate: 1.600985221674877%
fold 6 error rate: 0.9852216748768473%
fold 7 error rate: 1.7241379310344827%
fold 8 error rate: 1.2315270935960592%
fold 9 error rate: 2.2167487684729066%
mean error rate: 1.4771205593829337%
variance: 0.001650611825984985%
depth = 2
tree evaluation with 2 depth limit
fold 0 error rate: 0.4920049200492005%
fold 1 error rate: 0.6150061500615006%
fold 2 error rate: 0.12300123001230012%
fold 3 error rate: 0.984009840098401%
fold 4 error rate: 0.12315270935960591%
fold 5 error rate: 0.7389162561576355%
fold 6 error rate: 0.6157635467980296%
fold 7 error rate: 0.6157635467980296%
fold 8 error rate: 0.3694581280788177%
fold 9 error rate: 1.2315270935960592%
mean er

4

In [1117]:
pretty_print_tree(tree_fixer(id3(data,depth=4)))

 (5, 'n'):
      (20, 'k'):
'e'
           (20, 'n'):
'e'
                (20, 'w'):
                     (22, 'g'):
'e'
                          (22, 'p'):
'e'
                               (22, 'l'):
                                    (3, 'y'):
'p'
                                         (3, 'n'):
'e'
                                              (3, 'c'):
'e'
                                                   (3, 'w'):
'p'

                                                   (22, 'w'):
'e'
                                                        (22, 'd'):
                                                             (8, 'n'):
'p'
                                                                  (8, 'b'):
'e'


                                                             (20, 'b'):
'e'
                                                                  (20, 'r'):
'p'
                                                                       (20, 'o'):
'e'
                                

## Before You Submit...

1. Did you provide output exactly as requested?
2. Did you re-execute the entire notebook? ("Restart Kernel and Rull All Cells...")
3. If you did not complete the assignment or had difficulty please explain what gave you the most difficulty in the Markdown cell below.
4. Did you change the name of the file to `jhed_id.ipynb`?

Do not submit any other files.