# Decision Tree
Decision trees can solve both classification and regression problems. In this assignment we implement the ID3 algorithm, which builds a classification tree. All features are guaranteed to be discrete.

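- The ID3 algorithm uses entropy to measure the uncertainty in a data set and information gain to measure the quality of a split.
- Entropy: \\(H(S) = -\sum_{x \in X} p(x) \log_2 p(x)\\), where \\(X\\) is the set of classes and \\(p(x)\\) is the proportion of samples in \\(S\\) that belong to class \\(x\\).
- Information gain: \\(IG(S, A) = H(S) - \sum_{t \in T} p(t) H(t) = H(S) - H(S \mid A)\\), where \\(T\\) is the set of subsets created by splitting \\(S\\) on attribute \\(A\\) and \\(p(t) = |t| / |S|\\).
- See more details in [ID3 Algorithm](https://en.wikipedia.org/wiki/ID3_algorithm).

In this section, you need to implement the `Information_Gain` function in utils.py.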
```python
def Information_Gain(S, branches):
    """Calculate the information gain of a split on one feature.

    Input:
        S: float, entropy of the current state
        branches: List[List[int]], for a specific attribute, the number of
            cases belonging to each (attribute value, class) pair;
            shape num_attribute_values * num_classes
    Return: float
    """
    raise NotImplementedError  # TODO: implement
```
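A minimal sketch of one possible `Information_Gain`, assuming `branches` holds one row of per-class counts for each attribute value, as described above. The `entropy` helper is a hypothetical name and not necessarily part of the provided utils.py; classes with zero count contribute nothing, following the convention \\(0 \log_2 0 = 0\\).

```python
import math
from typing import List


def entropy(counts: List[int]) -> float:
    """Base-2 entropy of a class-count vector (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:  # 0 * log2(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h


def Information_Gain(S: float, branches: List[List[int]]) -> float:
    """IG = H(S) - sum_t p(t) * H(t); each row of `branches` is one attribute value."""
    total = sum(sum(row) for row in branches)
    if total == 0:
        return 0.0
    conditional = sum(sum(row) / total * entropy(row) for row in branches)
    return S - conditional


# Hypothetical counts: 4 samples (2 per class) split into three branches.
root_entropy = entropy([2, 2])
gain = Information_Gain(root_entropy, [[1, 0], [1, 1], [0, 1]])
print(root_entropy, gain)  # 1.0 0.5
```

Passing `S` in from the caller, as the given signature does, lets the node compute its own entropy once and reuse it for every candidate attribute.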
- At each node, ID3 splits the set S on the attribute with the largest information gain. See page 23 of the Lecture 2 notes.


- Implement the `TreeNode.split` and `TreeNode.predict` functions in hw1_dt.py:
    - `TreeNode.split`

    In the `TreeNode` class, the `features` variable holds all the data points in the current node, and the `labels` variable holds their corresponding labels. The `children` variable is the list of child `TreeNode`s created by splitting the current node on the best attribute. Splitting is recursive: once `split` is called, the node keeps splitting its children until the whole tree is built.

    **Note: when two attributes tie on information gain, choose the attribute with more distinct attribute values. If they also tie on that, choose the one with the smaller index. Build the children list in increasing order of attribute value.**
    - `TreeNode.predict`

    This function is called after `split` has built the tree. It takes a single data point, routes it from the root down to a leaf, and returns the predicted label.
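The following is a minimal sketch, not the reference hw1_dt.py: the constructor `TreeNode(features, labels)` and the attributes `dim_split`, `feature_uniq_split`, `cls_max` and `splittable` are hypothetical names. It applies the tie-breaking rules above, removes the used attribute from each child as standard ID3 does (so `dim_split` indexes the node's remaining features, not the original columns), and, as an assumption, falls back to the node's majority class for an attribute value never seen in training. Data points are expected as Python lists.

```python
import math
from collections import Counter
from typing import List, Optional


def entropy_of(labels: List[int]) -> float:
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


class TreeNode:
    def __init__(self, features: List[List[int]], labels: List[int]):
        self.features = features
        self.labels = labels
        self.children: List["TreeNode"] = []
        self.dim_split: Optional[int] = None      # index into this node's remaining features
        self.feature_uniq_split: List[int] = []   # attribute value for each child, ascending
        counts = Counter(labels)
        # Majority class; ties go to the smaller label.
        self.cls_max = min(counts, key=lambda c: (-counts[c], c))
        # Stop when the node is pure or no attributes are left.
        self.splittable = len(counts) > 1 and len(features[0]) > 0

    def split(self) -> None:
        if not self.splittable:
            return
        h_s = entropy_of(self.labels)
        best = None
        for dim in range(len(self.features[0])):
            values = sorted({x[dim] for x in self.features})
            cond = 0.0
            for v in values:
                sub = [y for x, y in zip(self.features, self.labels) if x[dim] == v]
                cond += len(sub) / len(self.labels) * entropy_of(sub)
            # Larger gain wins; ties -> more attribute values; then smaller index.
            key = (round(h_s - cond, 12), len(values), -dim)
            if best is None or key > best:
                best = key
        d = self.dim_split = -best[2]
        self.feature_uniq_split = sorted({x[d] for x in self.features})
        for v in self.feature_uniq_split:
            rows = [(x, y) for x, y in zip(self.features, self.labels) if x[d] == v]
            # Drop the used attribute so each child has one fewer feature.
            child = TreeNode([x[:d] + x[d + 1:] for x, _ in rows], [y for _, y in rows])
            child.split()  # recurse until leaves are reached
            self.children.append(child)

    def predict(self, feature: List[int]) -> int:
        if not self.children:
            return self.cls_max
        v = feature[self.dim_split]
        if v not in self.feature_uniq_split:
            return self.cls_max  # unseen attribute value: use the majority class
        child = self.children[self.feature_uniq_split.index(v)]
        return child.predict(feature[:self.dim_split] + feature[self.dim_split + 1:])


X = [[0, 0], [1, 0], [1, 1], [2, 0]]
y = [0, 0, 1, 1]
root = TreeNode(X, y)
root.split()
print(root.dim_split, [root.predict(x) for x in X])  # 0 [0, 0, 1, 1]
```

Encoding the tie-breaking rules as a tuple key lets a single `>` comparison handle all three rules; rounding the gain guards against floating-point noise turning a true tie into a spurious win.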

## Sanity Test
Follow these steps as a simple check that your algorithm works:
- Load the training data (features and labels) with `data.sample_decision_tree_data`.
- Build a decision tree from the training data.
- Load the test data with `data.sample_decision_tree_test`.
- Test your algorithm's prediction function.

```python
import data
import hw1_dt as decision_tree
import utils as Utils
from sklearn.metrics import accuracy_score

features, labels = data.sample_decision_tree_data()

# build the tree
dTree = decision_tree.DecisionTree()
dTree.train(features, labels)

# print
Utils.print_tree(dTree)
```

```
branch 0{
	deep: 0
	num of samples for each class: 2 : 2
	split by dim 0
	branch 0->0{
		deep: 1
		num of samples for each class: 1
		class: 0
	}
	branch 0->1{
		deep: 1
		num of samples for each class: 1 : 1
		split by dim 1
		branch 0->1->0{
			deep: 2
			num of samples for each class: 1
			class: 0
		}
		branch 0->1->1{
			deep: 2
			num of samples for each class: 1
			class: 1
		}
	}
	branch 0->2{
		deep: 1
		num of samples for each class: 1
		class: 1
	}
}
```
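As a hand check of the formulas against the sanity-test tree printed above, reading each leaf's single count as one sample of the listed class: the root holds two samples of each class, and splitting on dim 0 yields branches with class counts (1, 0), (1, 1) and (0, 1). The information gain of that split works out as:

```latex
\begin{aligned}
H(S) &= -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1 \\
IG(S, \text{dim } 0) &= H(S) - \sum_{t \in T} p(t) H(t)
  = 1 - \left(\tfrac{1}{4}\cdot 0 + \tfrac{2}{4}\cdot 1 + \tfrac{1}{4}\cdot 0\right) = 0.5
\end{aligned}
```

This only shows the gain of the chosen split; confirming that dim 0 is the maximum would need the class counts for the other attributes, which the printout does not include.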


# Train and Predict
- Load the data (features and labels) with `data.load_decision_tree_data`.
- Train your decision tree.

```python
# load data
X_train, y_train, X_test, y_test = data.load_decision_tree_data()

# set classifier
dTree = decision_tree.DecisionTree()

# training
dTree.train(X_train.tolist(), y_train.tolist())

# print
Utils.print_tree(dTree)
```

```
branch 0{
	deep: 0
	num of samples for each class: 845 : 260 : 2
	split by dim 5
	branch 0->0{
		deep: 1
		num of samples for each class: 369
		class: 0
	}
	branch 0->1{
		deep: 1
		num of samples for each class: 273 : 96
		split by dim 3
		branch 0->1->0{
			deep: 2
			num of samples for each class: 123
			class: 0
		}
		branch 0->1->1{
			deep: 2
			num of samples for each class: 78 : 45
			split by dim 4
			branch 0->1->1->0{
				deep: 3
				num of samples for each class: 40 : 1
				split by dim 0
				branch 0->1->1->0->0{
					deep: 4
					num of samples for each class: 16
					class: 0
				}
				branch 0->1->1->0->1{
					deep: 4
					num of samples for each class: 8 : 1
					split by dim 1
					branch 0->1->1->0->1->0{
						deep: 5
						num of samples for each class: 4
						class: 0
					}
					branch 0->1->1->0->1->1{
						deep: 5
						num of samples for each class: 1
						class: 1
					}
					branch 0->1->1->0->1->2{
						deep: 5
						num of samples for each c
```

```python
# testing
y_est_test = dTree.predict(X_test)
test_accu = accuracy_score(y_test, y_est_test)  # sklearn order: (y_true, y_pred)
print('test_accuracy', test_accu)
```

```
test_accuracy 0.6357267950963222
```


# Pruning The Tree

To prevent overfitting, we often need to prune the decision tree. There are two broad approaches to avoiding overfitting when building decision trees:

- Pre-pruning, which stops growing the tree early, before it perfectly classifies the training set.
- Post-pruning, which lets the tree perfectly classify the training set and then prunes it back.

In practice, post-pruning tends to be more successful, because it is hard to estimate precisely when to stop growing the tree.
In this part we use Reduced Error Pruning, a post-pruning method.
```
Reduced Error Pruning
0. Split data into training and validation sets.
1. Do until further pruning is harmful:
   2. Evaluate the impact on the validation set of pruning each possible node (plus the subtree below it).
   3. Greedily remove the one whose removal most improves validation-set accuracy.
- Produces the smallest version of the most accurate subtree.
- Requires that a lot of data be available.
```
For more on pruning decision trees, see [Reduced Error Pruning](http://jmvidal.cse.sc.edu/talks/decisiontrees/reducederrorprun.html?style=White) and page 69 of the textbook *Machine Learning* by Tom Mitchell.
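The loop above can be sketched as follows. This is a minimal, self-contained illustration built on a hypothetical `Node` class (not the assignment's `TreeNode`) that stores its majority training class; the provided `Utils.reduced_error_prunning` may be implemented differently. Following the "until further pruning is harmful" rule, a prune is kept whenever validation accuracy does not decrease.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    majority: int                                     # majority training class here
    dim: Optional[int] = None                         # splitting attribute (None for a leaf)
    values: List[int] = field(default_factory=list)   # attribute value for each child
    children: List["Node"] = field(default_factory=list)

    def predict(self, x: List[int]) -> int:
        if not self.children or x[self.dim] not in self.values:
            return self.majority
        return self.children[self.values.index(x[self.dim])].predict(x)


def accuracy(root: Node, X: List[List[int]], y: List[int]) -> float:
    return sum(root.predict(x) == t for x, t in zip(X, y)) / len(y)


def internal_nodes(node: Node) -> List[Node]:
    """All non-leaf nodes, root first."""
    if not node.children:
        return []
    out = [node]
    for child in node.children:
        out.extend(internal_nodes(child))
    return out


def reduced_error_prune(root: Node, X_val, y_val) -> None:
    """Greedily turn internal nodes into leaves while validation accuracy does not drop."""
    while True:
        base = accuracy(root, X_val, y_val)
        best_node, best_acc = None, -1.0
        for node in internal_nodes(root):
            saved = node.children
            node.children = []              # tentatively prune: node predicts its majority
            acc = accuracy(root, X_val, y_val)
            node.children = saved           # restore
            if acc > best_acc:
                best_node, best_acc = node, acc
        if best_node is None or best_acc < base:
            return                          # further pruning would lower accuracy
        best_node.children = []
        best_node.dim, best_node.values = None, []


# Usage on a toy tree whose right subtree overfits.
root = Node(0, dim=0, values=[0, 1],
            children=[Node(0), Node(1, dim=1, values=[0, 1], children=[Node(1), Node(0)])])
X_val, y_val = [[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 1, 1]
print(accuracy(root, X_val, y_val))  # 0.75
reduced_error_prune(root, X_val, y_val)
print(accuracy(root, X_val, y_val))  # 1.0
```

Each accepted prune removes at least one internal node, so the loop always terminates; the cost is one full validation pass per candidate node per round.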

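```python
# Prune in place. Note: this prunes against the test set itself, so the test
# accuracy measured afterwards is optimistic; a held-out validation split avoids that.
Utils.reduced_error_prunning(dTree, X_test, y_test)
```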

```python
y_est_test = dTree.predict(X_test)
test_accu = accuracy_score(y_test, y_est_test)  # sklearn order: (y_true, y_pred)
print('test_accu', test_accu)
```

```
test_accu 0.7950963222416813
```


```python
# decision tree after pruning
Utils.print_tree(dTree)
```

```
branch 0{
	deep: 0
	num of samples for each class: 845 : 260 : 2
	split by dim 5
	branch 0->0{
		deep: 1
		num of samples for each class: 369
		class: 0
	}
	branch 0->1{
		deep: 1
		num of samples for each class: 273 : 96
		split by dim 3
		branch 0->1->0{
			deep: 2
			num of samples for each class: 123
			class: 0
		}
		branch 0->1->1{
			deep: 2
			num of samples for each class: 78 : 45
			split by dim 4
			branch 0->1->1->0{
				deep: 3
				num of samples for each class: 40 : 1
				class: 1
			}
			branch 0->1->1->1{
				deep: 3
				num of samples for each class: 26 : 15
				class: 1
			}
			branch 0->1->1->2{
				deep: 3
				num of samples for each class: 12 : 29
				class: 1
			}
		}
		branch 0->1->2{
			deep: 2
			num of samples for each class: 72 : 51
			class: 1
		}
	}
	branch 0->2{
		deep: 1
		num of samples for each class: 203 : 164 : 2
		split by dim 3
		branch 0->2->0{
			deep: 2
			num of samples for each class: 123
			class: 0
		}
		branch 0->2->1{
			deep: 2
```
