### Decision Trees

Decision trees are versatile machine learning algorithm that tcan perform both classificaiton and regression task, andn even the multioutput tasks. They are fundamental component of random forest, which are one of the most powerful machine learning algorithms avaialble today. 

__Training and Visualizing a decision tree__

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

##Selecting petal length and width
X = iris.data[:,:2]
y = iris.target

In [2]:
tree_clf = DecisionTreeClassifier(max_depth = 2)
#max_depth is the size of the tree 
tree_clf.fit(X,y)

DecisionTreeClassifier(max_depth=2)

In [3]:
from joblib import dump

dump(tree_clf, 'models/ch_06/tree_clf_iris.pkl')

['models/ch_06/tree_clf_iris.pkl']

In [6]:
##Visualizing the tree using export_graphviz() method

from sklearn.tree import export_graphviz

export_graphviz(
tree_clf,
out_file = "visualizations/iris_tree.dot",
feature_names = iris.feature_names[2:],
class_names = iris.target_names,
rounded = True,
filled = True)

We can use the dot command-line tool to convert this .dot file into png/pdf. 

```$ dot -Tpng iris_tree.dot -o iris_tree.png```

Or use pydot

In [21]:
import os
os.system('dot -Tpng visualizations/iris_tree.dot -o visualizations/iris_tree.png')

1

Checking the tree 

![DT](visualizations/iris_tree.png)

### Making Predictions

When we are making a classification, we start at the root node and check the petal length, if it's smaller or greater than 5.45 cm. Then we move to the second node and compare the petal width and petal length again based on if we are on left or right side of the tree.

The node's gini attribute measures its impurity: a node is pure when the gini = 0. If all the instances it applies to belong to that class. 

Formula for Gini impurity : $ G_i = 1 - \sum ^n _ {k=1} p_i k^2$ 
where, $p_{i,k}$ is the ratio of the class k instance among the training instances in the $i^{th}$ node.

One of the many qualities of Decision Trees is that they require very little data preparation. In fact, they don’t require feature scaling or centering at all. The nuber of children depend on the number of classification classes.

#### White box vs the Black box

- _white box models_ : which are easy to intrepret, like decision trees 
- _black box models_ : like random forest, the are great predictors and can be calculated but hard to explain the the reason of prediction since they are mor complex. 

### Estimating Class Probabilties

Instead of classifying into the classes, the DT can also output the probability of belonging to that class. Fot this, it traverses the tree to find the correct leaf node and then returns the ratio of training instance of class k in this node. 

In [23]:
tree_clf.predict_proba([[5,1.5]])

array([[0.14285714, 0.71428571, 0.14285714]])

In [24]:
tree_clf.predict([[5,1.5]])

array([1])

### CART Training Algorithm

```sklearn``` uses CART (classification and regression tree) algorithm to train decision trees (also called growing trees). The algorithm works by first splitting the training set into two subsets using a single feature k and a threshold $t_k$ by searching for a pair ($k,t_k$) that produces the purest subsets (wghtd by the size)

\begin{equation*} J(k, t_k) =  \frac {m_{left}} m G_{left} + \frac {m_{right}} m G_{right}\end{equation*}


where 
Gleft/right measures the impurity of the left/right subset,
mleft/right is the number of instances in the left/right subset.

After splitting the training set into two, it splits the subsets using same logic, then further down, recursively. It stops on reaching the maximum depth (defined by max_depth), or if it cannot find a split that reduces the impurity. 

Other hyperparamters for tuning are
- min_samples_split
- min_samples_leaf
- min_weight_fraction_leaf
- max_leaf_nodes

CART algorithm is a greedy algorithm, it greedily searches for an optimum split at the top level and then repeats the process at each subsequent level. It doesn't check if the split will lead to lowest possible impurity at later stages. Hence the tree is good, but not always optimal. Finding an optimal tree is a NP-Complete problem it requires O(exp(m)) time making the problem intractable even for small training sets. 

### Computional Complexity

Making predictions requires traversing the Decision Tree from the root to a leaf. Decision Trees generally are approximately balanced, so traversing the Decision Tree requires going through roughly O(log (m)) nodes. Since each node only requires checking the value of one feature, the overall prediction complexity is O(log (m)), independent of the number of features. So predictions are very fast, even when dealing with large training sets.

Comparing all features on all samples at each node results in a training complexity of O(n × m log (m)). For small training sets (less than a few thousand instances), Scikit-Learn can speed up training by presorting the data (set presort=True), but doing that slows down training considerably for larger training sets

### Gini Impurity or Entropy

By default the DT use gini impurity for spliting the node, but we can set it to entropy usign the ```criterion``` hyperparameter to ```"entropy"```. The concept of entropy is derived from thermodynamics. 

$H_i = - \sum ^n _{k=1} p_{i,k} log_2 (p_{i,k})$

Gini impurity or entropy do not make a big difference, they often lead to similar splits. Gini impurity is slightlty faster to compute, hence it's defualt. When they differ, gini impurity tends to isolate the most frequent class in its owen branch of tree, while entropy tends to produce slightly more balanced tree. 

### Regularization Hyperparameters

Decision tree make very few asusmptions about the training data (contrary to the linear algorithms). If left unconstrained, the tree structure will adapt itself to the training data, fitting it very closely - most likely overfitting it. Such a model is called non paramteric model, not because it does not have any parameters (it often has lot) but because the number of paramters is not determined prior to training, so the model structure id free to stick close to the data. In contrast, a parateric model, like linear model, has fixed (predetermined) number of paramters, so its degree of freedom is limited, reducting the risk of overffiting (but increasing the risk of underfitting)


To avoid overfitting the trianing data, you need to restrict the descision trees's freedom during the triaining. it's called regularization. Regularization paramters depend on the algorithm used, but generally we can at least restrict the maximum depth of the decision tree. Reducing the max_depth with contraint the model and reduce overfitting. 

Other algorithms work by first trianing the decision tree without restrictions and then pruning the unnecessary nodes. If a node has all unnecesary leafs based on gini impurity and other statistical test like chi-square test and p test, then the node is entirely removed. the pruning continues until all such nodes are removed. 


### Regression

DT are capable of performing regression tasks. We can do it in ```sklearn``` using ```DecisionTreeRegressor```

In [25]:
from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(max_depth = 2)
tree_reg.fit(X,y)

DecisionTreeRegressor(max_depth=2)

In [26]:
dump(tree_reg, 'models/ch_06/tree_reg_iris.pkl')

['models/ch_06/tree_reg_iris.pkl']

In [29]:
##Visualizing the tree using export_graphviz() method

from sklearn.tree import export_graphviz

export_graphviz(
tree_reg,
out_file = "visualizations/iris_tree_reg.dot",
rounded = True,
filled = True)

In [30]:
import os
os.system('dot -Tpng visualizations/iris_tree_reg.dot -o visualizations/iris_tree_reg.png')

1

Checking the graph

![Graph](visualizations/iris_tree_reg.png)


This model’s predictions are represented as a polynomial graph with each depth set. the predicted value for each region is always the average target value of the instances in that region. The algorithm splits each region in a way that makes most training instances as close as possible to that predicted value.

The CART algorithm works mostly the same way as earlier, except that instead of trying to split the training set in a way that minimizes impurity, it now tries to split the training set in a way that minimizes the MSE.

Just like for classification tasks, Decision Trees are prone to overfitting when dealing with regression tasks

### Instability

DT are versatile and powerful algorithms, but they do have some limitations. The divide the space in orthogonal patterns, making them sensitive to trainign set rotations. Putting limitations to generalization capactiy. This problem can be solved using the PCA (principle component analysis).

Another issue with DT is that they are sensitive to small variations to the data. Random Forests can limit this instability by averaging predictions over many trees

## Exercises

__1. What is the approximate depth of a Decision Tree trained (without restrictions) on a training set with one million instances?__

Considering a balanced binary tree for the questions, the depth of a binary tree containing m leaves is $log_2 (m)$. Here, m = 1 million. so  $log_2 (10^6) = 6*log_2 (10) = 6*3.32 = 19.9 $  This will be higher if we have an unbalanced tree. 

__2. Is a node’s Gini impurity generally lower or greater than its parent’s? Is it generally lower/greater, or always lower/greater?__

As we move down a tree, it starts to refine the classes criteria, hence the purity of a class in lower nodes is expected to be high, which sets the gini impurity of the child node to be lower than the parent. However, this might not be the case always, the gini impurity can increase at a child node before decreasing for the next child node.  

__3. If a Decision Tree is overfitting the training set, is it a good idea to try decreasing max_depth?__

Yes, since the DT is a non parametric model, it has a high degree of freedom which results in high likelihood of overfitting. To decrease the overfitting we can regularize the model by decreasing the max_depth, which limits its degree of freedom.

__4. If a Decision Tree is underfitting the training set, is it a good idea to try scaling the input features?__

Decision tree is indifferent to scaling of the input features. Hence, scaling the variables has no impact on underfitting or overfitting. 

__5. If it takes one hour to train a Decision Tree on a training set containing 1 million instances, roughly how much time will it take to train another Decision Tree on a training set containing 10 million instances?__

The computational complexity for a DT is defined by $O(n * m log(m))$ where n is the number of features and m is the number of instances. Since we know   $O(n * m log(m))$ = 1hr here, changign m to 10m will be  $O(n * 10* m log(10*m))$ ~ 12 hrs. 

__6. If your training set contains 100,000 instances, will setting presort=True speed up training?__

presort helps increase the trianing speed in small number of samples, with 100k samples, the presort will slow down the trianing due to large sorting task.

__7. Train and fine-tune a Decision Tree for the moons dataset by following these steps:__

__a. Use make_moons(n_samples=10000, noise=0.4) to generate a moons dataset.__

__b. Use train_test_split() to split the dataset into a training set and a test set.__

__c. Use grid search with cross-validation (with the help of the GridSearchCV class) to find good hyperparameter values for a DecisionTreeClassifier. Hint: try various values for max_leaf_nodes.__

__d. Train it on the full training set using these hyperparameters, and measure your model’s performance on the test set. You should get roughly 85% to 87% accuracy.__

In [70]:
from sklearn.datasets import make_moons

X,y = make_moons(n_samples=10000, noise=0.4) 

In [71]:
from sklearn.model_selection import train_test_split 

X_train, X_test, y_train , y_test = train_test_split(X,y,test_size = 0.25, random_state = 42)

In [78]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

params = [{'max_leaf_nodes':[2,3,4,5,6,7]}]
tree_moon_clf = GridSearchCV(DecisionTreeClassifier(), param_grid = params, cv = 5,scoring='accuracy') 

In [79]:
tree_moon_clf.fit(X_train,y_train)

GridSearchCV(cv=5, estimator=DecisionTreeClassifier(),
             param_grid=[{'max_leaf_nodes': [2, 3, 4, 5, 6, 7]}],
             scoring='accuracy')

In [80]:
tree_moon_clf.best_params_

{'max_leaf_nodes': 4}

In [81]:
from joblib import dump

dump(tree_moon_clf,'models/ch_06/tree_moon_clf.pkl')

['models/ch_06/tree_moon_clf.pkl']

In [82]:
y_pred = tree_moon_clf.predict(X_test)

In [83]:
##Calculating accuracy
sum(y_pred == y_test)/len(y_pred)

0.854

__8. Grow a forest by following these steps:__

__a. Continuing the previous exercise, generate 1,000 subsets of the training set, each containing 100 instances selected randomly. Hint: you can use Scikit-
Learn’s ShuffleSplit class for this.__

__b. Train one Decision Tree on each subset, using the best hyperparameter
values found in the previous exercise. Evaluate these 1,000 Decision Trees
on the test set. Since they were trained on smaller sets, these Decision Trees
will likely perform worse than the first Decision Tree, achieving only about
80% accuracy.__

__c. Now comes the magic. For each test set instance, generate the predictions
of the 1,000 Decision Trees, and keep only the most frequent prediction
(you can use SciPy’s mode() function for this). This approach gives you
majority-vote predictions over the test set.__

__d. Evaluate these predictions on the test set: you should obtain a slightly
higher accuracy than your first model (about 0.5 to 1.5% higher). Congratulations, you have trained a Random Forest classifier!__

In [84]:
from sklearn.datasets import make_moons

X,y = make_moons(n_samples=10000, noise=0.4) 

In [85]:
from sklearn.model_selection import ShuffleSplit

moons = ShuffleSplit(n_splits=1000, test_size=.25, random_state=42)

In [86]:
moons.get_n_splits(X)

1000

In [87]:
for train_index, test_index in moons.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)

TRAIN: [4901 4375 6698 ... 5390  860 7270] TEST: [6252 4684 1731 ... 7647 7161   73]
TRAIN: [8272 8183 4794 ... 7523 5830 1741] TEST: [6366 9628 5916 ... 2645 4344  706]
TRAIN: [4143 6934 5777 ... 4139 7944 2156] TEST: [8740 7755 1092 ...  474 9159 4707]
TRAIN: [9794 3720 2103 ... 6485 6249 8591] TEST: [4449 1468 6973 ... 1075 8382    4]
TRAIN: [1240 1334 9287 ... 6040 2088 1478] TEST: [8718 1765 3011 ... 4612 7133  610]
TRAIN: [6946 6654 4725 ... 2689 3367 7095] TEST: [1035 1792 8328 ...  606 6180 7114]
TRAIN: [8574 4552 5640 ... 1925   89 9213] TEST: [4378 8822 7307 ... 2285 2920 8659]
TRAIN: [5566 3090 5193 ... 6727 7699 9391] TEST: [4833 7004 1205 ... 1598 6870  187]
TRAIN: [6285 1303 9763 ... 2080 6112  403] TEST: [8924 2398 3630 ... 2145 8763 8512]
TRAIN: [3978 4851 1299 ... 9296  713 8780] TEST: [6169 7142  928 ... 4064 4129 2037]
TRAIN: [3715 9910 9796 ... 4878 6983 4554] TEST: [5891 7577 8085 ... 6271 6308 7185]
TRAIN: [4761 5931 4318 ... 2902 8812 3664] TEST: [4427 5691 5963 

TRAIN: [8387 7900 4861 ... 7143 8898  177] TEST: [3771 9720 1489 ... 5675 7180 3457]
TRAIN: [4477  583 3567 ... 5102 5043 9643] TEST: [8164 1562 2740 ... 7940 3569 8619]
TRAIN: [2880 9828 6970 ...  946 5357 7284] TEST: [1576 9832 1143 ... 2216  301 3590]
TRAIN: [4786   11 4857 ... 5053 7676 9662] TEST: [3118 3494 4746 ... 5523 4183 5457]
TRAIN: [2264 9909  690 ... 5288  607  453] TEST: [9706  233 8343 ... 6558    1   26]
TRAIN: [4327 4340 9525 ... 4061  617 5409] TEST: [4604 5789 1666 ... 4291 7819 2797]
TRAIN: [1380 2513 4297 ...  426 6439 8414] TEST: [ 890 8013 7226 ...  340 1140 9533]
TRAIN: [5540 5467 1699 ... 8115 4614 9810] TEST: [  87 1535 8308 ... 6558  569 9703]
TRAIN: [7413 4880 8504 ... 9136 8308 4383] TEST: [1293 6033 6956 ... 2181 4690 2308]
TRAIN: [1974 2947 3515 ... 7765 7401 2589] TEST: [7555 2267  530 ... 1430 4698 7553]
TRAIN: [2802  678 7712 ... 7349 8049 7499] TEST: [ 290 2767 8625 ... 5166 2346 4430]
TRAIN: [ 674  955 4079 ... 5936 5276 9398] TEST: [8482 8386 1856 

TRAIN: [7528 5744 2981 ...  249 6161 6373] TEST: [8285 1874 9187 ... 2725 3337 4039]
TRAIN: [5447 4488 2664 ... 2509 7275 8039] TEST: [8140  470 1568 ... 6023 1025 6863]
TRAIN: [6822 4409 3402 ... 9735  218 4291] TEST: [8848 5973 8378 ... 7098 6334 6082]
TRAIN: [9087 9286 1710 ... 2707 5173 5137] TEST: [1371 4608 5746 ... 2345 4842 1795]
TRAIN: [6006 9964 1832 ... 6422 5008 5581] TEST: [4090 1960 8049 ...  372 5033 2705]
TRAIN: [ 939   54  963 ...  827 6591 2099] TEST: [8697 1705 3672 ... 8963 8776 7609]
TRAIN: [9958 6916 2650 ... 2056 6714 9394] TEST: [9606 9186  350 ... 8453  368 8716]
TRAIN: [5081 4327 3895 ... 7090 1091 3314] TEST: [ 737 8332 8708 ... 1546 2483 5006]
TRAIN: [7642 7661 2374 ... 6086 9679 7211] TEST: [9004 3015 2364 ... 9328 5277 5807]
TRAIN: [6842  367 6363 ... 8327 7220 3932] TEST: [7931 3851 1913 ... 4686 8672 6685]
TRAIN: [6749 4155 4027 ... 2428 6216   50] TEST: [8752 8325 8545 ... 6532 8582 5416]
TRAIN: [1438 2764 2952 ... 8784 8699 9973] TEST: [3234 7027 8489 

TRAIN: [8003 8873 2860 ... 7477 8105 6196] TEST: [9372 7782 5331 ... 9307 4322 7476]
TRAIN: [2144 1446 1606 ... 1799 9131 6020] TEST: [3296 3775 5605 ... 9151 4747 6611]
TRAIN: [1435 5237 5318 ... 7935 3206 5382] TEST: [3793 5838 1927 ... 7199 5769 9145]
TRAIN: [7397 3989 8545 ... 1316  877 4033] TEST: [4262  128 7569 ... 7380 2787 7808]
TRAIN: [4408 3691 9321 ...  756 1539 9721] TEST: [5150 6506 8310 ... 4805 1224  811]
TRAIN: [3273 4247 4348 ...  335 8683 9779] TEST: [2622 6342 5255 ... 7371 5245 7465]
TRAIN: [1407 1102 9933 ... 3456 8599 4057] TEST: [9248  705  653 ... 8654 9772 5180]
TRAIN: [ 792  386 7067 ... 4476 7721  543] TEST: [5534 4589 5137 ... 5805 3782 6476]
TRAIN: [1568 7654 6928 ... 9578  504 1840] TEST: [3251 9551 8291 ... 1666  212 8561]
TRAIN: [3267 5735  335 ... 4934 9206 3008] TEST: [8131 3970 6527 ... 7451 2347 7719]
TRAIN: [5634 6024 6712 ... 5295 4393 6577] TEST: [4752 8718 9764 ... 2172 2787 6009]
TRAIN: [4136 3299 8958 ... 6395 3402 9142] TEST: [5541 4595  798 

TRAIN: [1965  646  788 ... 9754 1447 1607] TEST: [8779  517  738 ... 2514 7466 3448]
TRAIN: [2471 7334 9047 ...  576 5934 1113] TEST: [ 749 2164 1837 ... 8965 4951  263]
TRAIN: [4504  288 9940 ... 1125 4686 5979] TEST: [4945 5939 5861 ... 5559 9168 9836]
TRAIN: [4529 7697 8402 ... 4443 3315 9509] TEST: [6117 6639 3612 ... 8070 8362 8422]
TRAIN: [ 114 9350 9015 ... 3956 7926 8092] TEST: [7841 1568 5149 ... 4072 8207 6549]
TRAIN: [8958  131 8622 ... 3737 9729 8455] TEST: [4070 2972 4492 ... 4611 1499 2850]
TRAIN: [9218 5838 9395 ... 6486  124 1096] TEST: [1179 3150 2039 ... 7754 5336 5231]
TRAIN: [3908 9011 1110 ... 1232 1511 2752] TEST: [3878  828 5542 ... 1120 2856 6461]
TRAIN: [7573  846 3086 ... 2365 4683 2517] TEST: [1463 9419 9019 ... 7782 7979 2540]
TRAIN: [9536 2212 7378 ... 9751 1128 1227] TEST: [3696 2054  255 ... 6585 4947 5797]
TRAIN: [2181 2460 3932 ... 2734 1684 3981] TEST: [9095 9765 8201 ... 7010 3304 9081]
TRAIN: [5272 8745 1744 ... 6088 8008 5607] TEST: [6110 6209 9261 

TRAIN: [1526 6974  783 ... 4464 4617 3473] TEST: [3855 9099 3590 ... 9419 2448  577]
TRAIN: [2199 3423  706 ... 4872 8622 3720] TEST: [2504  273 1183 ... 1061 5180 2826]
TRAIN: [3708 5435 7274 ... 5357 8038   38] TEST: [1301 2656 4876 ...  776 9028 1546]
TRAIN: [9825 2709 2153 ... 3068 3724 3278] TEST: [7678  533 5100 ... 4878 6001 2542]
TRAIN: [8437 2642 2994 ...  274 8456 8731] TEST: [7476 8899 1915 ... 9366 9169 8942]
TRAIN: [4804 1146 4482 ...  231 4946 2458] TEST: [2089 1848 1659 ... 2312 4602 8538]
TRAIN: [8335 2599 4848 ...  205 5688 2056] TEST: [1363 8922 1974 ... 7619 2861 5709]
TRAIN: [9862 8798 2873 ...  494 5064 2057] TEST: [6408 3151 5588 ... 9035 3671 1186]
TRAIN: [5241 6534 7444 ... 1467 6523 8937] TEST: [4741 4960 3676 ... 8865 6988 2290]
TRAIN: [1766 5960  952 ... 1928 4680 2211] TEST: [8577 3074 3581 ... 3255   62 8942]
TRAIN: [4473 9099 5329 ... 8231 5935 1868] TEST: [2164  725 4065 ... 2514  152 9242]
TRAIN: [9309 8656 4591 ... 2593 9418 9892] TEST: [1630  344   98 

TRAIN: [3123 9844 5197 ... 4896 5685 6182] TEST: [5316 6410 4162 ... 1350 9714 7180]
TRAIN: [6396 8182  203 ... 5815 6496 2028] TEST: [6249  223 8562 ... 7294  882 2417]
TRAIN: [2113 6362 4324 ... 6173 7879 2088] TEST: [1886 9144 4467 ...  767  504 8291]
TRAIN: [8272 9429 3784 ... 1292 3475 1488] TEST: [7652 1975 6308 ... 7716 7921 4051]
TRAIN: [7566 9679 1150 ... 5006 5183 5911] TEST: [2208 4223 6658 ... 7238 7002 4580]
TRAIN: [ 345 8189 4939 ... 8260 5707 9326] TEST: [4057 9914 9535 ... 7903 4290  477]
TRAIN: [6639 8481 5737 ... 4370 9230 6151] TEST: [5328 8401 3894 ... 1146 2128 6824]
TRAIN: [4955 2207 6151 ... 6649 2590 2744] TEST: [6978 5206 5695 ... 8150 1211 6899]
TRAIN: [9341 6900  388 ... 4631 6719  633] TEST: [1115 7373 2018 ... 1747 5326 6639]
TRAIN: [6544 5550 5447 ... 2885 4329 8717] TEST: [2975 1935 3322 ... 8350 9395 7350]
TRAIN: [9090 2551 8929 ... 7706 3781 2189] TEST: [2303  816 5120 ... 3529 1563 9654]
TRAIN: [7005 1110 1373 ... 1051 6352 5468] TEST: [5163 8884 3005 

In [88]:
decision_trees = list()
dt_test_scores = list()
count = 0

for train_index, test_index in moons.split(X):
    count = count + 1
    ##Spliting the test and train
    X_train = X[train_index]
    y_train = y[train_index]
    X_test = X[test_index]
    y_test = y[test_index]
    
    ##Initializing the model
    split_tree_clf = DecisionTreeClassifier(max_leaf_nodes = 4)
    
    ##trainign the model
    split_tree_clf.fit(X_train,y_train)
    decision_trees.append(split_tree_clf)
    
    y_pred = tree_moon_clf.predict(X_test)
    print('Tree ',count, ', Accuracy',sum(y_pred == y_test)/len(y_pred))
    dt_test_scores.append(tree_moon_clf.score(X_test, y_test))

Tree  1 , Accuracy 0.8548
Tree  2 , Accuracy 0.8548
Tree  3 , Accuracy 0.8488
Tree  4 , Accuracy 0.846
Tree  5 , Accuracy 0.8492
Tree  6 , Accuracy 0.8496
Tree  7 , Accuracy 0.8504
Tree  8 , Accuracy 0.858
Tree  9 , Accuracy 0.8592
Tree  10 , Accuracy 0.848
Tree  11 , Accuracy 0.856
Tree  12 , Accuracy 0.8732
Tree  13 , Accuracy 0.8636
Tree  14 , Accuracy 0.8552
Tree  15 , Accuracy 0.8524
Tree  16 , Accuracy 0.8336
Tree  17 , Accuracy 0.8512
Tree  18 , Accuracy 0.8536
Tree  19 , Accuracy 0.8472
Tree  20 , Accuracy 0.8544
Tree  21 , Accuracy 0.8568
Tree  22 , Accuracy 0.8424
Tree  23 , Accuracy 0.8488
Tree  24 , Accuracy 0.8516
Tree  25 , Accuracy 0.862
Tree  26 , Accuracy 0.858
Tree  27 , Accuracy 0.85
Tree  28 , Accuracy 0.8568
Tree  29 , Accuracy 0.8484
Tree  30 , Accuracy 0.8568
Tree  31 , Accuracy 0.8512
Tree  32 , Accuracy 0.8624
Tree  33 , Accuracy 0.8468
Tree  34 , Accuracy 0.858
Tree  35 , Accuracy 0.858
Tree  36 , Accuracy 0.8708
Tree  37 , Accuracy 0.8572
Tree  38 , Accuracy 

Tree  305 , Accuracy 0.8524
Tree  306 , Accuracy 0.854
Tree  307 , Accuracy 0.848
Tree  308 , Accuracy 0.8568
Tree  309 , Accuracy 0.8628
Tree  310 , Accuracy 0.85
Tree  311 , Accuracy 0.8572
Tree  312 , Accuracy 0.8644
Tree  313 , Accuracy 0.8556
Tree  314 , Accuracy 0.852
Tree  315 , Accuracy 0.8592
Tree  316 , Accuracy 0.8624
Tree  317 , Accuracy 0.852
Tree  318 , Accuracy 0.8668
Tree  319 , Accuracy 0.8516
Tree  320 , Accuracy 0.8664
Tree  321 , Accuracy 0.85
Tree  322 , Accuracy 0.8548
Tree  323 , Accuracy 0.8548
Tree  324 , Accuracy 0.8464
Tree  325 , Accuracy 0.8568
Tree  326 , Accuracy 0.8564
Tree  327 , Accuracy 0.8504
Tree  328 , Accuracy 0.8532
Tree  329 , Accuracy 0.8536
Tree  330 , Accuracy 0.8516
Tree  331 , Accuracy 0.8532
Tree  332 , Accuracy 0.86
Tree  333 , Accuracy 0.852
Tree  334 , Accuracy 0.8632
Tree  335 , Accuracy 0.858
Tree  336 , Accuracy 0.8468
Tree  337 , Accuracy 0.8548
Tree  338 , Accuracy 0.8452
Tree  339 , Accuracy 0.852
Tree  340 , Accuracy 0.8596
Tree 

Tree  606 , Accuracy 0.8572
Tree  607 , Accuracy 0.86
Tree  608 , Accuracy 0.86
Tree  609 , Accuracy 0.854
Tree  610 , Accuracy 0.848
Tree  611 , Accuracy 0.8548
Tree  612 , Accuracy 0.8596
Tree  613 , Accuracy 0.8552
Tree  614 , Accuracy 0.85
Tree  615 , Accuracy 0.8488
Tree  616 , Accuracy 0.8592
Tree  617 , Accuracy 0.8528
Tree  618 , Accuracy 0.8624
Tree  619 , Accuracy 0.8552
Tree  620 , Accuracy 0.8592
Tree  621 , Accuracy 0.8452
Tree  622 , Accuracy 0.8576
Tree  623 , Accuracy 0.8604
Tree  624 , Accuracy 0.8564
Tree  625 , Accuracy 0.8492
Tree  626 , Accuracy 0.8588
Tree  627 , Accuracy 0.8656
Tree  628 , Accuracy 0.8552
Tree  629 , Accuracy 0.8572
Tree  630 , Accuracy 0.8528
Tree  631 , Accuracy 0.86
Tree  632 , Accuracy 0.846
Tree  633 , Accuracy 0.8596
Tree  634 , Accuracy 0.8476
Tree  635 , Accuracy 0.8468
Tree  636 , Accuracy 0.8604
Tree  637 , Accuracy 0.852
Tree  638 , Accuracy 0.8612
Tree  639 , Accuracy 0.8516
Tree  640 , Accuracy 0.8644
Tree  641 , Accuracy 0.8476
Tree

Tree  905 , Accuracy 0.8568
Tree  906 , Accuracy 0.8572
Tree  907 , Accuracy 0.8644
Tree  908 , Accuracy 0.8572
Tree  909 , Accuracy 0.8456
Tree  910 , Accuracy 0.8664
Tree  911 , Accuracy 0.8464
Tree  912 , Accuracy 0.8568
Tree  913 , Accuracy 0.8644
Tree  914 , Accuracy 0.846
Tree  915 , Accuracy 0.8436
Tree  916 , Accuracy 0.8516
Tree  917 , Accuracy 0.85
Tree  918 , Accuracy 0.8472
Tree  919 , Accuracy 0.8412
Tree  920 , Accuracy 0.8528
Tree  921 , Accuracy 0.8536
Tree  922 , Accuracy 0.8596
Tree  923 , Accuracy 0.8472
Tree  924 , Accuracy 0.8552
Tree  925 , Accuracy 0.858
Tree  926 , Accuracy 0.8608
Tree  927 , Accuracy 0.8596
Tree  928 , Accuracy 0.8648
Tree  929 , Accuracy 0.8548
Tree  930 , Accuracy 0.8492
Tree  931 , Accuracy 0.8428
Tree  932 , Accuracy 0.8572
Tree  933 , Accuracy 0.8536
Tree  934 , Accuracy 0.8528
Tree  935 , Accuracy 0.8508
Tree  936 , Accuracy 0.8452
Tree  937 , Accuracy 0.8604
Tree  938 , Accuracy 0.864
Tree  939 , Accuracy 0.8488
Tree  940 , Accuracy 0.86

In [89]:
all_predictions = list()
for tree in decision_trees:
    all_predictions.append(tree.predict(X_test).tolist())

In [90]:
import numpy as np

trees_preds = np.array(all_predictions)
trees_preds.shape

(1000, 2500)

In [91]:
from scipy.stats import mode
final_preds, _ = mode(trees_preds, axis=0)

In [92]:
##Checking the accuracy using final predictions

sum(final_preds.squeeze() == y_test)/len(y_test)

0.8516