# Decision Tree
* Decision trees are among the most mature and traditional algorithms in predictive analytics.
* They are typically used to solve classification problems through a visual, explicit representation of decisions and decision making.
* Think of them as a map: you follow a path according to each decision, and each path leads to a new choice until you reach the end.
* They mimic the way you probably make decisions in your daily life.


## Terminology
* Root: The starting point of the tree. Note that a decision tree is drawn upside down, with its root at the top.
    - Alone Or With Friends is the root in the above example
* Branch: Also known as an edge, a branch leads from one condition to the next, down to the results.
    - Sunny or Rainy are branches in the above example
* Condition: Also known as an internal node, this is the choice that has to be made to determine which branch to take.
    - Weather Outside? is our condition in the above example
* Leaf: Also known as a decision, a leaf is a final result that gives the classification of the data. Branches only lead into a leaf; none come out of it.
    - video games, soccer and movies are all examples of a leaf
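
To make the terminology concrete, here is a minimal sketch of a tree stored as nested dictionaries. The labels come from the terminology list, but the exact arrangement of branches and leaves is an assumption made for illustration, and the `decide` helper is hypothetical.

```python
# Hypothetical layout: labels are taken from the terminology list,
# but how they connect is an assumption for illustration only.
tree = {
    "condition": "Alone Or With Friends",        # root
    "branches": {
        "Alone": "video games",                  # leaf
        "With Friends": {
            "condition": "Weather Outside?",     # condition (internal node)
            "branches": {
                "Sunny": "soccer",               # leaf
                "Rainy": "movies",               # leaf
            },
        },
    },
}


def decide(node, answers):
    """Follow branches from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        choice = answers[node["condition"]]   # answer to the current condition
        node = node["branches"][choice]       # take the matching branch
    return node


print(decide(tree, {"Alone Or With Friends": "With Friends",
                    "Weather Outside?": "Rainy"}))
```

Each dictionary is a condition, each key under `branches` is a branch, and each plain string is a leaf.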

## Calculate the entropy for a fair coin

Entropy measures the uncertainty about a random variable.

Show that a fair coin has the largest entropy (uncertainty) by comparing it with a biased coin.

In [13]:
import numpy as np

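def entropy(probability):
    # Shannon entropy in bits; zero probabilities are skipped because 0 * log2(0) is taken as 0
    H = np.array([-i*np.log2(i) for i in probability if i > 0]).sum()
    return H


probability = [0.5, 0.5]
entropy(probability)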

1.0

In [14]:
probability = [0.9, 0.1]
entropy(probability)

0.4689955935892812
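
Comparing two coins only hints at the claim. As a rough check, the sketch below evaluates the entropy over a grid of head probabilities and looks for the maximum. The helper name `binary_entropy` and the grid size are assumptions made for this example.

```python
import numpy as np


def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    # Skip zero probabilities so that p = 0 and p = 1 do not produce nan
    terms = [-q * np.log2(q) for q in (p, 1 - p) if q > 0]
    return float(sum(terms))


# Evaluate the entropy on an evenly spaced grid of head probabilities
grid = np.linspace(0, 1, 101)
values = [binary_entropy(p) for p in grid]

# Probability at which the entropy is largest
best_p = grid[int(np.argmax(values))]
print(best_p, max(values))
```

On this grid the maximum sits at the fair coin, `p = 0.5`, where the entropy is 1 bit, and the entropy falls to 0 as the coin becomes fully predictable.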

## Let's Build a Decision Tree for Tennis Data

The table below lists the factors behind the decision to play tennis outside, recorded over 14 days under different weather conditions.

### Activity: Write a function that takes Wind conditions (Weak or Strong) and returns the Tennis Player Decision

In [None]:
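# Note: df is loaded from tennis.txt in a later cell; run that cell first.
def conditional_prob(df, feature, decision, condition):
    # Keep only the rows where the feature takes the given value
    condition_df = df[df[feature] == condition]

    # Relative frequency of each decision value within those rows
    return dict(condition_df[decision].value_counts() / len(condition_df))

conditional_prob(df, 'Wind', 'Decision', 'Weak')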

In [19]:
# Entropy of the decision to play tennis: 9 of the 14 days are Yes and 5 are No
probability = [9/14, 5/14]
entropy(probability)

0.9402859586706311

In [28]:
import pandas as pd

df = pd.read_csv('tennis.txt', delimiter="\t", header=None, names=['Outlook', 'Temp', 'Humidity', 'Wind', 'Decision'])
df

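| Unnamed: 0 | Outlook | Temp | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 1 | Sunny | Hot | High | Weak | No |
| 2 | Sunny | Hot | High | Strong | No |
| 3 | Overcast | Hot | High | Weak | Yes |
| 4 | Rain | Mild | High | Weak | Yes |
| 5 | Rain | Cool | Normal | Weak | Yes |
| 6 | Rain | Cool | Normal | Strong | No |
| 7 | Overcast | Cool | Normal | Strong | Yes |
| 8 | Sunny | Mild | High | Weak | No |
| 9 | Sunny | Cool | Normal | Weak | Yes |
| 10 | Rain | Mild | Normal | Weak | Yes |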


### H(Decision) = 0.9402859586706311

### What is H(Decision | Wind == Weak)?

0.8112781244591328

In [30]:
dict(df[df['Wind'] == 'Weak']['Decision'].value_counts())

{'Yes': 6, 'No': 2}

In [33]:
probability = [6/8, 2/8]
entropy(probability)

0.8112781244591328

### What is H(Decision | Wind == Strong)?

1.0

In [34]:
dict(df[df['Wind'] == 'Strong']['Decision'].value_counts())

{'No': 3, 'Yes': 3}

In [36]:
probability = [3/6, 3/6]
entropy(probability)

1.0

In [37]:
# How many days have weak wind and how many have strong wind?
dict(df['Wind'].value_counts())

{'Weak': 8, 'Strong': 6}

## Obtain the Information Gain Between PlayTennis (Decision) and Wind

- What is the probability that the wind is weak? Hint: count the days with weak wind and divide by the total number of samples.

`p(Wind = Weak) = 8 / 14`

`p(Wind = Strong) = 6 / 14`

- Information Gain(Decision, Wind) =

`Entropy(Decision) - p(Wind = Weak) * Entropy(Decision | Wind = Weak) - p(Wind = Strong) * Entropy(Decision | Wind = Strong)`

`= 0.940 - (8 / 14) * 0.811 - (6 / 14) * 1.0 = 0.048`

In [41]:
(0.94 - ((8/14*0.811) + (6/14*1)))

0.04800000000000004
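
The cell above plugs in rounded entropies. As a cross-check, the following sketch recomputes the same gain directly from the counts reported earlier (9 Yes / 5 No overall, 6 Yes / 2 No for weak wind, 3 Yes / 3 No for strong wind). The helper name `entropy_from_counts` is an assumption made for this example.

```python
from math import log2


def entropy_from_counts(counts):
    """Entropy in bits of a distribution given as raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


h_decision = entropy_from_counts([9, 5])   # all 14 days
h_weak = entropy_from_counts([6, 2])       # the 8 days with weak wind
h_strong = entropy_from_counts([3, 3])     # the 6 days with strong wind

# Subtract the weighted conditional entropies from the overall entropy
gain_wind = h_decision - (8 / 14) * h_weak - (6 / 14) * h_strong
print(round(gain_wind, 4))
```

Without intermediate rounding the gain comes out at about 0.0481, in line with the 0.048 obtained above.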

## Other factors on Decision column

Applying the same calculation to the other features (columns) gives:

1 - Gain(Decision, Wind) = 0.048

2 - Gain(Decision, Outlook) = 0.246

3 - Gain(Decision, Temperature) = 0.029

4 - Gain(Decision, Humidity) = 0.151
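
These four gains can be reproduced with a short, self-contained sketch. Two things here are assumptions: the helper names `column_entropy` and `gain`, and row 14 of the data, which does not appear in the outputs shown in this notebook and is taken from the standard PlayTennis dataset.

```python
import numpy as np
import pandas as pd

rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),  # assumed: row 14 is not shown in the notebook
]
tennis = pd.DataFrame(rows, columns=["Outlook", "Temp", "Humidity", "Wind", "Decision"])


def column_entropy(series):
    """Entropy in bits of the values in a column."""
    probs = series.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())


def gain(frame, feature, target="Decision"):
    """Information gain of `target` when the data is split on `feature`."""
    weights = frame[feature].value_counts(normalize=True)
    conditional = sum(
        weights[value] * column_entropy(frame.loc[frame[feature] == value, target])
        for value in weights.index
    )
    return column_entropy(frame[target]) - conditional


gains = {f: gain(tennis, f) for f in ["Outlook", "Temp", "Humidity", "Wind"]}
best_feature = max(gains, key=gains.get)
print(gains, best_feature)
```

Outlook has the largest gain, which makes it the natural choice for the first split of the tree.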

In [61]:
# Instructor's Solution:

def info_gain(df, feature, decision):
    # feature and decision are column names, e.g. 'Wind' and 'Decision'

    # 1. Entropy of the decision column
    dict_decision = dict(df[decision].value_counts())
    total = sum(dict_decision.values())
    prob_decision = [q / total for (p, q) in dict_decision.items()]
    entropy_decision = entropy(prob_decision)
    # print(entropy_decision)

    # 2. Obtain the probability of each value of the feature
    dict_feature = dict(df[feature].value_counts())
    dict_prob_feature = {}
    for (p, q) in dict_feature.items():
        dict_prob_feature[p] = q / sum(dict_feature.values())
    # print(dict_prob_feature)

    # 3. Obtain the conditional probabilities of the decision for each value of the feature
    conditions = df[feature].unique()
    dict_ = {}
    for condition in conditions:
        dict_[condition] = conditional_prob(df, feature, decision, condition)
    # print(dict_)

    # 4. Calculate the information gain from the formula
    S = 0
    for (i, j) in dict_.items():
        prob_condition = list(j.values())
        S = S + dict_prob_feature[i]*entropy(prob_condition)
    print(entropy_decision - S)


In [66]:
# Pass the name of the target column, not the column itself;
# passing df['Decision'] makes df[decision] raise a KeyError
info_gain(df, 'Wind', 'Decision')
info_gain(df, 'Humidity', 'Decision')
info_gain(df, 'Temp', 'Decision')

In [59]:
df[df['Outlook'] == 'Overcast']

| Unnamed: 0 | Outlook | Temp | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 3 | Overcast | Hot | High | Weak | Yes |
| 7 | Overcast | Cool | Normal | Strong | Yes |
| 12 | Overcast | Mild | High | Strong | Yes |
| 13 | Overcast | Hot | Normal | Weak | Yes |


In [60]:
df[df['Outlook'] == 'Sunny']

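| Unnamed: 0 | Outlook | Temp | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 1 | Sunny | Hot | High | Weak | No |
| 2 | Sunny | Hot | High | Strong | No |
| 8 | Sunny | Mild | High | Weak | No |
| 9 | Sunny | Cool | Normal | Weak | Yes |
| 11 | Sunny | Mild | Normal | Strong | Yes |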


In [63]:
# The target column is named 'Decision'; asking for 'Play' raises KeyError: 'Play'
info_gain(df[df['Outlook'] == 'Sunny'], 'Temp', 'Decision')
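
The Overcast rows are all Yes, so that branch is already a leaf, while the Sunny rows still mix Yes and No and need another split. The sketch below repeats the gain calculation on the five Sunny rows listed above to see which remaining feature separates them best. The helper names `entropy_of` and `subset_gain` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# The five Sunny rows from the filtered output above
sunny = pd.DataFrame(
    [
        ("Hot", "High", "Weak", "No"),
        ("Hot", "High", "Strong", "No"),
        ("Mild", "High", "Weak", "No"),
        ("Cool", "Normal", "Weak", "Yes"),
        ("Mild", "Normal", "Strong", "Yes"),
    ],
    columns=["Temp", "Humidity", "Wind", "Decision"],
)


def entropy_of(labels):
    """Entropy in bits of the values in a column."""
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())


def subset_gain(frame, feature, target="Decision"):
    """Information gain of `target` when `frame` is split on `feature`."""
    # Weighted average of the target entropy inside each group of the feature
    remainder = sum(
        len(group) / len(frame) * entropy_of(group[target])
        for _, group in frame.groupby(feature)
    )
    return entropy_of(frame[target]) - remainder


sunny_gains = {f: subset_gain(sunny, f) for f in ["Temp", "Humidity", "Wind"]}
print(sunny_gains)
```

Humidity separates the Sunny rows perfectly (High is always No, Normal is always Yes), so its gain equals the full entropy of the subset, about 0.971, compared with about 0.571 for Temp and about 0.020 for Wind.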

In [74]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import preprocessing
from sklearn.tree import export_graphviz
import pydotplus  # third-party package: install with "pip install pydotplus"; writing a PNG also requires Graphviz


data = pd.read_csv('tennis.txt', delimiter="\t", header=None, names=['a', 'b', 'c', 'd', 'e'])
print(data)

# Encode the data so we can use it with our decision tree
# by converting categories into numbers
data_encoded = data.apply(preprocessing.LabelEncoder().fit_transform)
print(data_encoded)

# Create our decision tree classifier with entropy as the split criterion
clf = DecisionTreeClassifier(criterion='entropy', max_depth=3)

# one_hot_data = pd.get_dummies(data[['a', 'b', 'c', 'd']], drop_first=True)
# print(one_hot_data)

#provide our feature array and target array (1-item),
# and train the model using a decision tree
clf.fit(data_encoded[['a', 'b', 'c', 'd']], data_encoded['e'])

# export our decision tree into data that can be plotted
dot_data = export_graphviz(clf, out_file=None, feature_names=['Outlook', 'Temp.', 'Humidity', 'Wind'])

# Draw graph
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('tennis_tree.png')

ModuleNotFoundError: No module named 'pydotplus'
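
When `pydotplus` is not available, one option is to print the fitted tree as text with scikit-learn's `export_text`, which needs no extra packages. This is a sketch rather than a replacement for the plot: the data is embedded so the block runs on its own, row 14 is assumed from the standard PlayTennis dataset because it is not shown in this notebook, and `random_state=0` is added only to make the result repeatable.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),  # assumed: row 14 is not shown in the notebook
]
features = ["Outlook", "Temp", "Humidity", "Wind"]
frame = pd.DataFrame(rows, columns=features + ["Decision"])

# Convert each categorical column into integer codes, one encoder per column
encoded = frame.apply(LabelEncoder().fit_transform)

# Same settings as the cell above, plus a fixed seed for repeatability
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(encoded[features], encoded["Decision"])

# Render the fitted tree as indented text instead of a PNG
report = export_text(clf, feature_names=features)
print(report)
```

Because the categories are label-encoded, the splits appear as numeric thresholds such as `Outlook <= 0.50` rather than as named categories.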