# Module 8 - Programming Assignment

## Directions

1. Change the name of this file to be your JHED ID as in `jsmith299.ipynb`. Be sure to use your JHED ID (it is made out of your name) and not your student ID, which is just letters and numbers.
2. Make sure the notebook you submit is cleanly and fully executed. I do not grade unexecuted notebooks.
3. Submit your notebook back in Blackboard where you downloaded this file.

*Provide the output **exactly** as requested*

In [1]:
from copy import deepcopy
from typing import List, Dict, Tuple, Callable
import random
import math
import json

## Decision Trees

For this assignment you will be implementing and evaluating a Decision Tree using the ID3 Algorithm (**no** pruning or normalized information gain). Use the provided pseudocode. The data is located at (copy link):

http://archive.ics.uci.edu/ml/datasets/Mushroom

**Just in case** the UCI repository is down, which happens from time to time, I have included the data and name files on Blackboard.

<div style="background: lemonchiffon; margin:20px; padding: 20px;">
    <strong>Important</strong>
    <p>
        No Pandas. The only acceptable libraries in this class are those contained in the `environment.yml`. No OOP, either. You can use Dicts, NamedTuples, etc. as your abstract data type (ADT) for the tree and nodes.
    </p>
</div>

One of the things we did not talk about in the lectures was how to deal with missing values. There are two aspects of the problem here. What do we do with missing values in the training data? What do we do with missing values when doing classification?

For the first problem, C4.5 handled missing values in an interesting way. Suppose we have identified some attribute *B* with values {b1, b2, b3} as the best current attribute. Furthermore, assume there are 5 observations with B=?, that is, we don't know the attribute value. In C4.5, those 5 observations would be added to *all* of the subsets created by B=b1, B=b2, B=b3 with decreased weights. Note that the observations with missing values are not part of the information gain calculation.
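
A rough sketch of that redistribution follows. The text above only says the weights are decreased, so the exact scaling used here (each missing-value observation is scaled by the share of known-value weight in each subset) is an assumption. `split_with_missing` and the `(row, weight)` pairs are hypothetical names for illustration, not part of the required functions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: an observation paired with its weight.
Weighted = Tuple[List[str], float]

def split_with_missing(rows: List[Weighted], col_index: int, missing: str = "?") -> Dict[str, List[Weighted]]:
    known = [(row, w) for row, w in rows if row[col_index] != missing]
    unknown = [(row, w) for row, w in rows if row[col_index] == missing]
    if not known:
        return {}  # nothing to split on

    # Group the observations whose value is known.
    subsets: Dict[str, List[Weighted]] = defaultdict(list)
    for row, w in known:
        subsets[row[col_index]].append((row, w))

    # Assumption: each subset's share of the known weight sets the reduced weight.
    total_known = sum(w for _, w in known)
    shares = {value: sum(w for _, w in members) / total_known
              for value, members in subsets.items()}

    # Add every missing-value observation to every subset with a reduced weight.
    for value, share in shares.items():
        for row, w in unknown:
            subsets[value].append((row, w * share))
    return dict(subsets)

rows = [(["b1", "yes"], 1.0), (["b1", "no"], 1.0), (["b2", "yes"], 1.0),
        (["b3", "no"], 1.0), (["?", "yes"], 1.0)]
subsets = split_with_missing(rows, 0)
total_weight = sum(w for members in subsets.values() for _, w in members)
```

In this toy data, half of the known observations have B=b1, so the missing-value observation joins that subset with weight 0.5, and the total weight across all subsets stays equal to the original 5 observations.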

This doesn't quite help us if we have missing values when we use the model. What happens if we have missing values during classification? One approach is to prepare for this in advance. When you train the tree, you need to add an implicit attribute value "?" at every split. For example, if the attribute was "size" then the domain would be ["small", "medium", "large", "?"]. The "?" value gets all the data (because ? is now a wildcard). However, there is an issue with this approach. "?" becomes the worst possible attribute value because it has no classification value. What to do? There are several options:

1. Never recurse on "?" if you do not also recurse on at least one *real* attribute value.
2. Limit the depth of the tree.

There are good reasons, in general, to limit the depth of a decision tree because they tend to overfit.
Otherwise, the algorithm *will* exhaust all the attributes trying to fulfill one of the base cases.
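
The wildcard split described above can be sketched as a single step. This is only an illustration under assumptions: `split_with_wildcard` is a hypothetical helper, not one of the required functions, and rows are plain lists of strings with no header row.

```python
from typing import Dict, List

def split_with_wildcard(rows: List[List[str]], col_index: int) -> Dict[str, List[List[str]]]:
    """Group rows by the value in col_index and add a '?' branch holding every row."""
    subsets: Dict[str, List[List[str]]] = {}
    for row in rows:
        subsets.setdefault(row[col_index], []).append(row)
    # The wildcard branch sees all of the data, so it cannot separate the classes.
    subsets["?"] = list(rows)
    return subsets

rows = [["small", "no"], ["medium", "yes"], ["large", "yes"], ["small", "no"]]
branches = split_with_wildcard(rows, 0)
for value, members in branches.items():
    print(value, len(members))
```

Because the "?" branch is as large as its parent, recursing on it makes no progress by itself, which is why a depth limit (or the rule about real attribute values) is needed to stop the recursion.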

You must implement the following functions:

`train` takes training_data and returns the Decision Tree as a data structure. There are many options including namedtuples and just plain old nested dictionaries. **No OOP**.

```
def train(training_data, depth_limit=None):
   # returns the Decision Tree.
```

The `depth_limit` value defaults to None. (What technique would we use to determine the best parameter value for `depth_limit`? Hint: Module 3!)

`classify` takes a tree produced from the function above and applies it to labeled data (like the test set) or unlabeled data (like some new data).

```
def classify(tree, observations, labeled=True):
    # returns a list of classifications
```

`evaluate` takes a data set with labels (like the training set or test set) and the classification result and calculates the classification error rate:

$$error\_rate=\frac{errors}{n}$$

Do not use anything else as the evaluation metric or the submission will be deemed incomplete, i.e., an "F". (Hint: accuracy rate is not the error rate!)

`cross_validate` takes the data and uses 10 fold cross validation (from Module 3!) to `train`, `classify`, and `evaluate`. **Remember to shuffle your data before you create your folds**. I leave the exact signature of `cross_validate` to you but you should write it so that you can use it with *any* `classify` function of the same form (using higher order functions and partial application).

Following Module 3's discussion, `cross_validate` should print out the fold number and the evaluation metric (error rate) for each fold and then the average value (and the variance). What you are looking for here is a consistent evaluation metric across the folds. You should print the error rates as percentages (i.e., multiply the error rate by 100 and add "%" to the end).
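
One possible shape for the higher-order version is sketched below with stand-in functions. `run_folds`, `train_stub`, `classify_stub`, and `error_rate` are hypothetical names used only for illustration, and the stub learner simply predicts the majority label. The point is that `functools.partial` fixes `depth_limit` ahead of time, so the fold loop only ever calls `train_fn(training)`.

```python
from collections import Counter
from functools import partial
from typing import Callable, List

def train_stub(rows: List[List[str]], depth_limit=None) -> dict:
    # Stand-in learner: remembers the majority label and the limit it was given.
    labels = [row[-1] for row in rows]
    return {"label": Counter(labels).most_common(1)[0][0], "depth_limit": depth_limit}

def classify_stub(model: dict, rows: List[List[str]]) -> List[str]:
    return [model["label"] for _ in rows]

def error_rate(actual: List[str], predicted: List[str]) -> float:
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)

def run_folds(folds, train_fn: Callable, classify_fn: Callable, evaluate_fn: Callable) -> List[float]:
    rates = []
    for i, test_fold in enumerate(folds):
        # Every fold except fold i becomes training data.
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train_fn(training)
        predictions = classify_fn(model, test_fold)
        actual = [row[-1] for row in test_fold]
        rates.append(evaluate_fn(actual, predictions))
    return rates

folds = [
    [["a", "yes"], ["b", "yes"], ["c", "no"]],
    [["d", "yes"], ["e", "yes"], ["f", "no"]],
]
# Partial application: depth_limit is bound here, not inside run_folds.
rates = run_folds(folds, partial(train_stub, depth_limit=2), classify_stub, error_rate)
for i, rate in enumerate(rates):
    print(f"Fold {i}: {rate * 100:.2f}%")
```

Swapping in a different learner then only requires passing different `train_fn` and `classify_fn` arguments.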

```
def pretty_print_tree(tree):
    # pretty prints the tree
```

This should be a text representation of a decision tree trained on the entire data set (no train/test).

To summarize...

Apply the Decision Tree algorithm to the Mushroom data set using 10 fold cross validation and the error rate as the evaluation metric. When you are done, apply the Decision Tree algorithm to the entire data set and print out the resulting tree.

**Note** Because this assignment has a natural recursive implementation, you should consider using `deepcopy` at the appropriate places.
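
A small illustration of why the copy matters in recursive code: removing an attribute from a list that is shared between sibling branches changes it for all of them. The helper names `drop_shared` and `drop_copy` are hypothetical and exist only for this sketch.

```python
from copy import deepcopy
from typing import List

def drop_shared(attributes: List[str], name: str) -> List[str]:
    attributes.remove(name)  # mutates the caller's list
    return attributes

def drop_copy(attributes: List[str], name: str) -> List[str]:
    remaining = deepcopy(attributes)  # the caller's list is left untouched
    remaining.remove(name)
    return remaining

attrs = ["size", "shape", "color"]
safe = drop_copy(attrs, "size")
after_copy = list(attrs)   # snapshot: still all three attributes
drop_shared(attrs, "size")  # now the original list has lost "size"
```

For a flat list of strings a shallow copy would be enough; `deepcopy` is the safer default when the structure being passed down is nested.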

-----

In [2]:
# getting data ready
mushroom_cols = [
    "cap-shape"
    ,"cap-surface"
    ,"cap-color"
    ,"bruises"
    ,"odor"
    ,"gill-attachment"
    ,"gill-spacing"
    ,"gill-size"
    ,"gill-color"
    ,"stalk-shape"
    ,"stalk-root"
    ,"stalk-surface-above-ring"
    ,"stalk-surface-below-ring"
    ,"stalk-color-above-ring"
    ,"stalk-color-below-ring"
    ,"veil-type"
    ,"veil-color"
    ,"ring-number"
    ,"ring-type"
    ,"spore-print-color"
    ,"population"
    ,"habitat"
    ,"edibility"
]

self_check =[[ 'Shape', 'Size', 'Color', 'Safe?'],
 ['round', 'large', 'blue', 'no'],
 [ 'square', 'large', 'red', 'no'],
 ['round', 'large', 'green', 'yes'],
 ['square', 'large', 'green', 'yes'],
 [ 'square', 'large', 'green', 'yes'],
 [ 'square', 'large', 'green', 'yes'],
 ['round', 'large', 'red', 'yes'],
 [ 'round', 'large', 'red', 'yes'],
 [ 'round', 'small', 'blue', 'no'],
 ['square', 'small', 'blue', 'no'],
 ['round', 'small', 'green', 'no'],
 [ 'square', 'small', 'green', 'no'],
 ['square', 'small', 'red', 'no'],
 [ 'square', 'small', 'red', 'no'],
 ['round', 'small', 'red', 'yes']]

<a id="parse_data"></a>
## parse_data

- Reads a comma-separated file into a nested list
- Shuffles the rows
- Moves the label column to the very last column
- Function mostly reused from Module 3

* **file_name** str: path to where file is located
* **class_index** int: index of where label field is in the file

**returns** List[List[]]: data stored in a nested list

In [3]:
def parse_data(file_name: str, class_index:int) -> List[List]:
    data = []
    # the context manager closes the file even if reading fails
    with open(file_name, "r") as file:
        for line in file:
            data.append(line.rstrip().split(","))
    random.shuffle(data)
    for row in data:
        # move the label from class_index to the end of the row
        row.append(row.pop(class_index))

    return data

In [4]:
# unit tests
data = parse_data("agaricus-lepiota.data",0)

# verify all observations are present
assert len(data) ==8124 

#verify all attributes and class cols are present
assert len(data[0]) == 23

# verify moved class/label col is last column
for row in data:
    assert row[0] not in ['p','e'] # first col is cap-shape, which never takes the values e or p
    assert row[-1] in ['p','e'] # label/class only takes e or p value

<a id="create_folds"></a>
## create_folds

- Reused from Module 3
- Splits the data into `n` folds of near-equal size

* **xs** List[List[]]: data set to split for cross validation
* **n** int: number of folds

**returns** List[List[List]]: list of folds

In [5]:
def create_folds(xs: List[List], n: int) -> List[List[List]]:
    k, m = divmod(len(xs), n)
    # be careful of generators...
    return list(xs[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n))
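
To see what the slice arithmetic above produces, the fold sizes can be computed directly. `fold_sizes` is a hypothetical helper written for this check; it mirrors the `divmod` logic rather than calling `create_folds`.

```python
from typing import List

def fold_sizes(total: int, n: int) -> List[int]:
    # k is the base fold size; the first m folds each take one extra item.
    k, m = divmod(total, n)
    return [k + 1 if i < m else k for i in range(n)]

sizes = fold_sizes(8124, 10)
print(sizes)
```

With the 8124 mushroom observations and 10 folds, the first four folds hold 813 rows and the remaining six hold 812.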

In [6]:
# make unit tests
folds = create_folds(data, 10)

#verify a list is returned
assert type(folds) == list

# no other unit tests since this was used in mod 3. 

<a id="create_train_test"></a>
## create_train_test

- Mostly reused from Module 3
- Creates training and test data based on folds
- For both the training and test sets, the column names are inserted as the first row (index 0)

* **folds**: List[List[List]]: data to split
* **index** : index of fold for splitting
* **cols_names**: list of column names

**returns** Tuple[List[List], List[List]]: returns training and test data

In [7]:
def create_train_test(folds: List[List[List]], index: int, cols_names:List[str]) -> Tuple[List[List], List[List]]:
    training = []
    test = []
    for i, fold in enumerate(folds):
        if i == index:
            test = fold
        else:
            training = training + fold
    # add column names
    training.insert(0,cols_names)
    test_copy = deepcopy(test)
    test_copy.insert(0,cols_names)
    return training, test_copy

In [8]:
# avoid the names train/test, which would clash with the train function defined below
train_set, test_set = create_train_test(folds, 0,mushroom_cols)

# verify first row is the column names
assert train_set[0] == test_set[0] == mushroom_cols

# verify the training set is 9/10 of the data
assert len(train_set) == math.floor(len(data) * 9/10) +1 # +1 for the col name row

# train and test together should be the same size as data
assert len(test_set) + len(train_set) == len(data) +2 # 2 col name rows


<a id="entropy"></a>
## entropy

- Calculates the entropy of a given data set
- Assumes that the first row in the data set contains only column names, not data values
- Assumes that the label field is in the last column of the data

* **data** List[List[]]: data used for classifying 

**returns** float: returns the entropy
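
As a worked check, assuming the standard Shannon entropy with base-2 logarithms: the `self_check` table has 7 "yes" rows and 8 "no" rows out of 15, so

$$H(S) = -\sum_i p_i \log_2 p_i = -\tfrac{7}{15}\log_2\tfrac{7}{15} - \tfrac{8}{15}\log_2\tfrac{8}{15} \approx 0.997$$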

In [9]:
def entropy(data:List[List])->float:
    label_data = [ row[-1] for row in data[1:]]

    n = len(label_data)
    p_vals = set(label_data) # can work for more than just binary data
    e = 0

    for p in p_vals:
        p_cnt = len([ v for v in label_data if v ==p])
        p_i = p_cnt/n

        e = e - p_i * math.log(p_i,2)
    return e

In [10]:
test_e = entropy(self_check)
test_e

# verify the entropy is correct 
assert round(test_e) == 1

# verify entropy can handle non-binary data
self_check_mod = deepcopy(self_check)
self_check_mod.append(['round', 'small', 'red', 'unknown'])
test_e2 = entropy(self_check_mod)
assert test_e2 > 0
assert test_e2 != test_e

# if empty dataset, 0 is returned
test_e3 = entropy([])
assert test_e3 == 0

<a id="info_gain"></a>
## info_gain

- Calculates the information gain for a given attribute

* **data** List[List[]]: data used for classifying 
* **col_index** int: index for the attribute to calculate info gain on

**returns** float: returns the information gain
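
As a worked example, assuming the usual ID3 definition of gain, take the Size column of `self_check`. The "large" subset has 8 rows (6 yes, 2 no) with entropy of about 0.811, and the "small" subset has 7 rows (1 yes, 6 no) with entropy of about 0.592:

$$Gain(S, A) = H(S) - \sum_{v \in A} \frac{|S_v|}{|S|} H(S_v)$$

$$Gain(S, Size) \approx 0.997 - \left(\tfrac{8}{15}(0.811) + \tfrac{7}{15}(0.592)\right) \approx 0.288$$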

In [11]:
def info_gain(data:List[List],col_index:int)->float:
    e = entropy(data)
    
    col_vals = [ row[col_index] for row in data]
    col_unique = set(col_vals[1:])
    col_name = data[0][col_index]
    n = len(data[1:])
    gain_info = e

    for c in col_unique:
        data_col = [row for row in data if row[col_index]== c or row[col_index]==col_name]
        col_e = entropy(data_col)
        rel_freq =(len(data_col) -1) / n # subtract col name row
        gain_info += - rel_freq * col_e
    
    return gain_info

In [12]:
shape_gain = info_gain(self_check,0)
size_gain =  info_gain(self_check,1)
label_gain = info_gain(self_check,3) # splitting on the label column gives pure subsets
e = entropy(self_check)

# verify info gain is different from entropy
assert shape_gain != e

# verify gain differs between the two attributes
assert shape_gain != size_gain

# verify no error is thrown when every subset is homogeneous
# entropy only iterates over labels that are present, so log of 0 never occurs
assert label_gain >0

<a id="best_attribute"></a>
## best_attribute

- Finds the attribute with the highest gain
- This is done by finding the info gain for each remaining col
- Returns the col name and col index with the highest gain

* **data** List[List[]]: data used for classifying 
* **attributes** List[str]: list of column names available to select from

**returns** Tuple[str, int]: column name and index with highest gain

In [13]:
def best_attribute(data:List[List], attributes:List[str])->Tuple[str, int]:
    gains= [] # gain value in order of attribute index value
    if len(data) == 0:
        return None
    #list of indexes of attributes   
    attr_index = [data[0].index(col) for col in data[0] if col in attributes]

    for col_index in attr_index:
        col_gain = info_gain(data,col_index)
        gains.append(col_gain)
    
    # find max
    best_gain_index = attr_index[gains.index(max(gains))]
    return data[0][best_gain_index], best_gain_index


In [14]:
root_self_check, root_index = best_attribute(self_check, ["Size","Shape","Color"])

#verify col name not index is returned
assert isinstance(root_self_check, str)

# verify index is returned
assert root_index == 1

# verify size is returned
assert root_self_check =="Size"

# verify None is returned if data is empty, error checking
empty_val = best_attribute([],[])
assert empty_val == None

#verify best attribute found if only have two cols
root_self_check1, root_index1 = best_attribute(self_check, ["Shape","Color"])
assert root_self_check1 =="Color"

<a id="homogen"></a>
## homogen

- Indicates whether a data set is homogeneous or not
- Returns the value the label takes if homogeneous.
- If data is not homogeneous, then None is returned

* **data** List[List[]]: data used for classifying 

**returns** str: if homogeneous, then value of label else None


In [15]:
def homogen(data:List[List])->str:
    if len(data) <=1 :
        return None
    label_data = [ row[-1] for row in data[1:]]
    unique_labels = set(label_data)

    if len(unique_labels) ==1:
        return unique_labels.pop()
    else:
        return None

In [16]:
h1 = homogen(self_check)

# verify returns None when not homogeneous
assert h1 == None

#verify returns the value for homogenous set
h2 = homogen(self_check[:2])
assert h2 == 'no'

# verify none returned if data set only contains columns
h3 = homogen(self_check[:1])
assert h3 == None

<a id="majority_label"></a>
## majority_label

- Finds the label with the highest frequency in the data set

* **data** List[List[]]: data used for classifying 

**returns** str: value of majority label

In [17]:
def majority_label(data:List[List])->str:
    if len(data) <=1:
        return None
    labels = [ row[-1] for row in data[1:]]
    unique_labels = set(labels)
    max_cnt =0
    max_label = None

    for l in unique_labels:
        current_cnt = labels.count(l)
        if current_cnt > max_cnt:
            max_cnt = current_cnt
            max_label = l
    return max_label  

In [18]:
self_check_major = majority_label(self_check)

#verify value not index is returned
assert isinstance(self_check_major,str)

#verify the correct answer is returned
assert self_check_major == 'no'

#verify none is returned if no real data
self_check_empty = majority_label(self_check[:1])
assert self_check_empty == None

<a id="get_domains"></a>
## get_domains

- Gets the domain of a given attribute
- These are the unique values that a column can take on

* **data** List[List[]]: data used for classifying 
* **col_index** int: attribute index in data

**returns** List[str]: returns list of unique values

In [19]:
def get_domains(data:List[List], col_index:int)->List[str]:
    if len(data) <= 1:
        return []
    values = [row[col_index] for row in data[1:]]
    # de-duplicate, then convert back to a list to match the declared return type
    return list(set(values))

In [20]:
shapes = get_domains(self_check, 0 )

#verify a list of unique values is returned. No dups
assert len(shapes) == 2

#verify the correct values are present
for s in shapes:
    assert s in ["round","square"]

# verify empty list is returned if data contains no data
empty_test = get_domains([],4)
assert empty_test == []

<a id="id3"></a>
## id3

- Produces a decision tree using the ID3 algorithm
- This is a recursive way to create a decision tree

* **data** List[List[]]: data used for classifying 
* **attributes** List[str]: attribute names
* **current_depth** int: current depth of the tree
* **depth_limit** int: number of levels that the tree can take on
* **default** str: default value that the tree will have when making a leaf

**returns** dict: nested dictionary of a decision tree

In [21]:
def id3(data:List[List], attributes:List[str], current_depth:int, depth_limit:int, default:str)->dict:
    # base case: only the column-name row is left, so fall back to the parent's majority label
    if len(data) == 1:
        return default
    # base case: every observation has the same label
    label = homogen(data)
    if label is not None:
        return label
    # base case: no attributes left to split on, or the depth limit is exceeded
    if len(attributes) == 0 or current_depth > depth_limit:
        return majority_label(data)
    node_name, node_index = best_attribute(data, attributes)
    children = {}
    default = majority_label(data)
    # copy so the caller's attribute list is not modified
    remaining_att = deepcopy(attributes)
    remaining_att.remove(node_name)
    for value in get_domains(data, node_index):
        # keep the column-name row plus the rows matching this value
        subset = [ row for row in data if row[node_index] == value or row[node_index] == node_name]
        children[value] = id3(subset,remaining_att, current_depth +1, depth_limit, default)
    # the implicit "?" branch is trained on all of the data at this node
    children["?"] = id3(data,remaining_att, current_depth +1, depth_limit, default)
    return {node_name: children}

In [22]:
self_check_cols = ["Size","Shape","Color"]
self_check_tree = id3(self_check, self_check_cols, 1, 3, "no")

# verify a dict is returned
assert isinstance(self_check_tree, dict)

#verify a nested dict is returned. The root should make the dict length one
assert len(self_check_tree) == 1 

# verify root is size
assert list(self_check_tree.keys()) == ["Size"]

<a id="train"></a>
## train

- Trains a model on a dataset by creating a decision tree

* **training_data** List[List[]]: data used for classifying 
* **depth_limit** int: number of levels that the tree can take on


**returns** dict: nested dictionary of a decision tree

In [23]:
def train(training_data:List[List], depth_limit=None)->dict:
    default = majority_label(training_data)
    attributes = training_data[0][:-1] # every column except the label

    # with no limit given, allow the tree to use every attribute
    if depth_limit is None:
        depth_limit = len(attributes)
    tree = id3(training_data, attributes, 1, depth_limit, default)

    return tree

In [24]:
tree_train = train(self_check)

#verify a dictionary is returned
assert isinstance(tree_train,dict)

# verify tree match the test from id3
assert tree_train == self_check_tree

#verify depth limit works
tree_train_dep1 = train(self_check,1)
assert tree_train_dep1 == {'Size': {'small': 'no', 'large': 'yes', '?': 'no'}} # only one attribute in tree

<a id="get_label"></a>
## get_label

- Classifies a single data point by recursively walking a decision tree

* **data** List[]: data used for classifying. A single record
* **cols** List[str]: attribute names
* **tree_part** dict: part of a decision tree

**returns** str: label to classify the data point

In [25]:
def get_label(data:List, cols:List[str], tree_part:dict)->str:
    # a leaf (plain label) is returned as-is
    if not isinstance(tree_part, dict):
        return tree_part
    # each node is a single-key dict: {attribute_name: {value: subtree_or_label}}
    att_name, att_mapping = next(iter(tree_part.items()))
    data_val = data[cols.index(att_name)]
    # unseen or missing values fall through to the "?" branch
    branch = att_mapping[data_val] if data_val in att_mapping else att_mapping["?"]
    return get_label(data, cols, branch)

In [26]:
d_test =  [ 'round', 'large', 'green', 'no']
test_label = get_label(d_test, [ 'Shape', 'Size', 'Color', 'Safe?'], tree_train)

# verify a string is returned
assert isinstance(test_label, str)

# verify a value from label field is returned
assert test_label in ["yes","no"]

# verify can handle missing values 
d_test2=   [ 'round', 'med', 'green', 'no']
test_label2 = get_label(d_test2, [ 'Shape', 'Size', 'Color', 'Safe?'], tree_train)
assert isinstance(test_label2,str)

<a id="classify"></a>
## classify

- Classifies each data point by calling a recursive function to read through a decision tree

* **tree** dict: decision tree
* **observations** List[List[]]: data used for classifying. Multiple records
* **labeled** bool: indicates whether the observations include a label column. When True, the last entry of the header row is treated as the label name and excluded from the attribute names

**returns** List[str]: labels for each data point

In [27]:
def classify(tree:dict, observations:List[List], labeled=True)->List[str]:
    if len(observations) == 0:
        return []
    # the first row holds the column names; drop the label name when one is present
    cols = observations[0][:-1] if labeled else observations[0]
    labels = []
    for ob in observations[1:]:
        label = get_label(ob, cols, tree)
        labels.append(label)
    return labels

In [28]:
self_check_test_data =[[ 'Shape', 'Size', 'Color', 'Safe?'],
 ['round', 'small', 'blue', 'yes'],
 [ 'round', 'large', 'green', 'no']]

l = classify(tree_train, self_check_test_data, labeled=True)

#verify each observation is labeled
assert len(l) == 2

#verify labels are valid labels
for lab in l:
    assert lab in ["yes","no"]

#verify header-only and empty test data can be handled
l2 = classify(tree_train, self_check_test_data[:1], labeled=True)
assert l2 == []
l3 = classify(tree_train, [], labeled=True)
assert l3 == []

<a id="evaluate"></a>
## evaluate

- returns the error rate of the predictions

* **label** List[str]: list of actual labels for the data points
* **prediction** List[str]: list of prediction labels from the model


**returns** float: returns error rate

In [29]:
def evaluate(label:List[str], prediction:List[str])->float:
    n = len(label)
    false_vals = 0

    for i in range(n):
        if label[i] != prediction[i]:
            false_vals += 1
    return false_vals/n


In [30]:
l_test = [1,1,1,0]
p_test = [0,0,0,0]

# verify error rate is 0 if all match
zero_rate = evaluate(l_test,l_test)
assert zero_rate == 0

#verify error rate doesnt match accuracy
e_test = evaluate(l_test, p_test)
assert e_test != 1/4

# verify error rate is correct
e_test = evaluate(l_test, p_test)
assert e_test == 3/4

# verify accuracy and error rate when half data is correct
p_2 = [1,1,0,1]
e_test2 = evaluate(l_test, p_2)
accuracy = 2/4 # two correct predictions over 4 records
assert e_test2 == accuracy 

<a id="cross_validate"></a>
## cross_validate

- Performs cross validation to classify data using a decision tree
- First splits the data into 10 folds
- Then trains the model to build the decision tree
- Then makes predictions on the test set
- Then evaluates the model
- Prints out the results for each fold

* **data** List[List[]]: data used for classifying. Data is parsed
* **col_names** List[str]: attribute names
* **depth_limit** int: number of levels that the tree can take on



In [31]:
def cross_validate(data:List[List],col_names:List[str], depth_limit:int=None):
    folds = create_folds(data, 10)
    for i in range(10):
        train_data, test_data = create_train_test(folds, i,col_names)
        tree = train(train_data, depth_limit)

        pred_labels = classify(tree, test_data, labeled=True)
        actual_labels = [row[-1] for row in test_data[1:]]

        error_rate = evaluate(actual_labels, pred_labels)
        error_rate = error_rate*100
        print("Fold", i, "Error rate:", error_rate, "%")


In [32]:
cross_validate(data,mushroom_cols,4)

Fold 0 Error rate: 0.0 %
Fold 1 Error rate: 0.0 %
Fold 2 Error rate: 0.0 %
Fold 3 Error rate: 0.0 %
Fold 4 Error rate: 0.0 %
Fold 5 Error rate: 0.0 %
Fold 6 Error rate: 0.0 %
Fold 7 Error rate: 0.0 %
Fold 8 Error rate: 0.0 %
Fold 9 Error rate: 0.0 %
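
The directions also ask for the average and variance across folds. A minimal sketch of that summary step is below, assuming the per-fold error rates have been collected into a list. `summarize` is a hypothetical helper, the sample rates are made up for illustration, and using the population variance (dividing by n) is an assumption.

```python
from typing import List, Tuple

def summarize(error_rates: List[float]) -> Tuple[float, float]:
    n = len(error_rates)
    mean = sum(error_rates) / n
    # Population variance (divide by n); this choice is an assumption.
    variance = sum((rate - mean) ** 2 for rate in error_rates) / n
    return mean, variance

rates = [0.0, 2.5, 5.0]  # made-up percentages, not results from this notebook
mean, variance = summarize(rates)
print(f"Average error rate: {mean:.2f}%")
print(f"Variance: {variance:.4f}")
```

A small variance indicates that the error rate is consistent from fold to fold.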


<a id="pretty_print_tree"></a>
## pretty_print_tree

- Prints the tree
- Uses recursion to print the tree
- Each level is indented

* **tree** dict: decision tree

In [33]:
def pretty_print_tree(tree:dict):
    def pretty_print_tree_recursive(tree, depth=0):
        for k,v in tree.items():
            if isinstance(v, dict):
                # internal node: print the key and indent its children one more level
                print('\t'*depth + k+":")
                pretty_print_tree_recursive(v, depth+1)
            else:
                # leaf: print the attribute value and its label on one line
                print('\t'*depth + k + ":" + v )
    return pretty_print_tree_recursive(tree)

In [34]:
# test out printing tree
pretty_print_tree(self_check_tree)

Size:
	small:
		Shape:
			round:
				Color:
					blue:no
					green:no
					red:yes
					?:no
			square:no
			?:
				Color:
					blue:no
					green:no
					red:no
					?:no
	large:
		Color:
			blue:no
			green:yes
			red:
				Shape:
					round:yes
					square:no
					?:yes
			?:
				Shape:
					round:yes
					square:yes
					?:yes
	?:
		Color:
			blue:no
			green:
				Shape:
					round:no
					square:yes
					?:yes
			red:
				Shape:
					round:yes
					square:no
					?:no
			?:
				Shape:
					round:yes
					square:no
					?:no


In [35]:
full_data_train = deepcopy(data) 
full_data_train.insert(0,mushroom_cols)
full_tree = train(full_data_train, 4)
pretty_print_tree(full_tree)

odor:
	s:p
	c:p
	f:p
	l:e
	m:p
	p:p
	a:e
	y:p
	n:
		spore-print-color:
			k:e
			b:e
			o:e
			r:p
			y:e
			n:e
			h:e
			w:
				habitat:
					g:e
					l:
						cap-color:
							n:e
							c:e
							w:p
							y:p
							?:e
					p:e
					d:
						gill-size:
							b:e
							n:p
							?:p
					w:e
					?:
						gill-size:
							b:e
							n:p
							?:e
			?:
				cap-color:
					b:
						stalk-root:
							b:p
							?:e
					c:e
					g:e
					p:
						habitat:
							p:e
							g:p
							w:e
							m:p
							?:e
					e:e
					r:e
					y:p
					n:
						stalk-surface-above-ring:
							s:e
							f:e
							y:e
							k:p
							?:e
					u:e
					w:
						bruises:
							f:e
							t:p
							?:e
					?:
						gill-color:
							k:e
							h:e
							g:e
							o:e
							p:e
							e:e
							r:p
							y:e
							n:e
							u:e
							w:e
							?:e
	?:
		spore-print-color:
			k:
				gill-size:
					b:e
					n:
						population:
							s:p
							y:e
							v:p
							?:p
			

## Before You Submit...

1. Did you provide output exactly as requested?
2. Did you re-execute the entire notebook? ("Restart Kernel and Run All Cells...")
3. If you did not complete the assignment or had difficulty please explain what gave you the most difficulty in the Markdown cell below.
4. Did you change the name of the file to `jhed_id.ipynb`?

Do not submit any other files.