# Homework - 2 
***
**Name**: Connor Larson 

***

This assignment is due on Canvas by **5pm on Friday October 2nd**. Submit only this Jupyter notebook to Canvas.  Do not compress it using tar, rar, zip, etc. Your solutions to analysis questions should be done in Markdown directly below the associated question.  Remember that you are encouraged to discuss the problems with your classmates and instructors, but **you must write all code and solutions on your own**, and list any people or sources consulted.

## Boosting - Extra Credit [5-points]
***

In this problem, we slightly modify the AdaBoost algorithm to better explore some properties of the algorithm. Specifically, we no longer normalize the weights on the training examples after each iteration. The modified algorithm, which is set to run for $T$ iterations, is shown in Algorithm I.

Note that in the modified version, the weights associated with the training examples are no longer guaranteed to sum to one after each iteration (and therefore can not be viewed as a "distribution"), but the algorithm is still valid. Let us denote the sum of weights at the start of iteration $t$ by $Z_t = \sum_{i=1}^{n}w_t^{(t)}$. At the start of the first iteration of boosting, $Z_1 = n$. Let us now investigate the behavior of $Z_t$, as a function of t

![image](fig-1.png)

**A:** At the $i^{th}$ iteration, we found a weak classifier that achieves a weighted training error $\epsilon_t$. Show that the choice, $\alpha_t = \frac{1}{2}\log\frac{1 - \epsilon_t}{\epsilon_t}$ is the optimal in the sense that it minimizes $Z_{t+1}$

*Hint: Look at $Z_{t+1}$ as function of $\alpha$ and find the value of $\alpha$ for which the function achieves the maximum. You may also find the following notational shorthand useful:

$$W_t = \sum_{i=1}^{n}w_i^{(t)}(1 - \delta(y_i, h_t(x_i)))$$
$$W_c = \sum_{i=1}^{n}w_i^{(t)}(\delta(y_i, h_t(x_i)))$$

where $W_c$ is the total weight of the points classified correctly by $h_t$ and $W_t$ is the total weight of the misclassified points. $\delta(y, h_t(x)) = 1$ whenever the label predicted by $h_t$ is correct and zero otherwise. The weights here are those available at the start of iteration $t$

**B:** Show that the sum of weights $Z_t$ is monotonically decreasing as a function of $t$.

## Training Data
***
Please do not change this class

In [1]:
import numpy as np
from sklearn.base import clone

In [4]:
class ThreesAndEights:
    """
    Class to store MNIST data
    """

    def __init__(self, location):

        import pickle, gzip

        # Load the dataset
        f = gzip.open(location, 'rb')

        # Split the data set 
#         X_train, y_train, X_valid, y_valid = pickle.load(f)
        train_set, valid_set, test_set = pickle.load(f)
    
        X_train, y_train = train_set
        X_valid, y_valid = valid_set

        # Extract only 3's and 8's for training set 
        self.X_train = X_train[np.logical_or( y_train==3, y_train == 8), :]
        self.y_train = y_train[np.logical_or( y_train==3, y_train == 8)]
        self.y_train = np.array([1 if y == 8 else -1 for y in self.y_train])
        
        # Shuffle the training data 
        shuff = np.arange(self.X_train.shape[0])
        np.random.shuffle(shuff)
        self.X_train = self.X_train[shuff,:]
        self.y_train = self.y_train[shuff]

        # Extract only 3's and 8's for validation set 
        self.X_valid = X_valid[np.logical_or( y_valid==3, y_valid == 8), :]
        self.y_valid = y_valid[np.logical_or( y_valid==3, y_valid == 8)]
        self.y_valid = np.array([1 if y == 8 else -1 for y in self.y_valid])
        
        f.close()

In [5]:
data = ThreesAndEights("data/mnist.pklz")

Feel free to explore this data and get comfortable with it before proceeding further.

## Bagging
Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach.

Given a standard training set $D$ of size n, bagging generates $N$ new training sets $D_i$, roughly each of size n * ratio, by sampling from $D$ uniformly and with replacement. By sampling with replacement, some observations may be repeated in each $D_i$ The $N$ models are fitted using the above $N$ bootstraped samples and combined by averaging the output (for regression) or voting (for classification). 

-Source [Wiki](https://en.wikipedia.org/wiki/Bootstrap_aggregating)

## Implementing Bagging [5-points]
***

We've given you a skeleton of the class `BaggingClassifier` below which will train a classifier based on the decision trees as implemented by sklearn. Your tasks are as follows, please approach step by step to understand the code flow:
* Implement `bootstrap` method which takes in two parameters (`X_train, y_train`) and returns a bootstrapped training set ($D_i$)
* Implement `fit` method which takes in two parameters (`X_train, y_train`) and trains `N` number of base models on different bootstrap samples. You should call `bootstrap` method to get bootstrapped training data for each of your base model
* Implement `voting` method which takes the predictions from learner trained on bootstrapped data points `y_hats` and returns final prediction as per majority rule. In case of ties, return either of the class randomly.
* Implement `predict` method which takes in multiple data points and returns final prediction for each one of those. Please use the `voting` method to reach consensus on final prediction.

In [57]:
from sklearn.tree import DecisionTreeClassifier

class BaggingClassifier:
    def __init__(self, ratio = 0.73, N = 20, base=DecisionTreeClassifier(max_depth=4)):
        """
        Create a new BaggingClassifier
        
        Args:
            base (BaseEstimator, optional): Sklearn implementation of decision tree
            ratio: ratio of number of data points in subsampled data to the actual training data
            N: number of base estimator in the ensemble
        
        Attributes:
            base (estimator): Sklearn implementation of decision tree
            N: Number of decision trees
            learners: List of models trained on bootstrapped data sample
        """
        
        assert ratio <= 1.0, "Cannot have ratio greater than one"
        self.base = base
        self.ratio = ratio
        self.N = N
        self.learners = []
        
    def fit(self, X_train, y_train):
        """
        Train Bagging Ensemble Classifier on data
        
        Args:
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data 
        """
        #TODO: Implement functionality to fit models on the bootstrapped samples
        # cloning sklearn models:
        # from sklearn.base import clone
        # h = clone(self.base)
        for x in range(self.N):
            h = clone(self.base)
            new_X, new_y = self.bootstrap(X_train, y_train)
            h = h.fit(new_X,new_y)
            self.learners.append(h)
        
    def bootstrap(self, X_train, y_train):
        """
        Args:
            n (int): total size of the training data
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data 
        """
        indexes = np.random.choice(X_train.shape[0], int(self.ratio * X_train.shape[0]))
        return X_train[indexes], y_train[indexes]
            
    
    def predict(self, X):
        """
        BaggingClassifier prediction for data points in X
        
        Args:
            X (ndarray): [n_samples x n_features] ndarray of data 
            
        Returns:
            yhat (ndarray): [n_samples] ndarray of predicted labels {-1,1}
        """
        #TODO: Using the individual classifiers trained predict the final prediction using voting mechanism
        yhat = np.zeros(len(self.learners))
        for x in range(len(self.learners)):
            yhat[x]= self.learners[x].predict(X)
        
        final_predict= self.voting(yhat)
        return final_predict
        
    def voting(self, y_hats):
        """
        Args:
            y_hats (ndarray): [N] ndarray of data
        Returns:
            y_final : int, final prediction of the 
        """
        #TODO: Implement majority voting scheme and incase of ties return random label
        unique, counts = np.unique(y_hats, return_counts=True)
        count_dict= dict(zip(unique, counts))
        if (len(unique) == 1):
            return unique[0]
        else : 
            if (count_dict[1]> count_dict[-1]):
                return 1
            elif (count_dict[-1]> count_dict[1]):
                return -1 
            elif (count_dict[1] == count_dict[-1]):
                rand_label = np.random.choice([-1,1], 1)
                return rand_label[0]
        

## BaggingClassifier for Handwritten Digit Recognition [5-points]
***

After you've successfully completed `BaggingClassifier` find the optimal values of `N` and `depth` using k-fold cross validation. You are allowed to use sklearn library to split your training data in folds. Keep the other hyperparameters unchanged. Use the data from `ThreesAndEights` class initialized variable `data`. 

Justify why those values are optimal

Report accuracy on the validation data using the optimal parameter values.

What is the most deciding hyperparameter and why?

Hint:  Vary `depth` up to 10, `N` up to 40. The number of decision trees `N` is generally a trade-off between 'improvement in accuracy' vs 'computation time'.

In [71]:
#ratio = 0.73, N = 20, base=DecisionTreeClassifier(max_depth=4)
from sklearn.model_selection import KFold
depth_loop =[4,5,6,7,8,9,10]
N_loop =[20,23,25,27,30,33,35,37,40]
for depth in depth_loop:
    for N_n in N_loop:
        print('depth:', depth, 'N:', N_n)
        kf = KFold(n_splits=3)
        kf.get_n_splits(data.X_train)
        correct_list=[]
        for train_index, test_index in kf.split(data.X_train):
            x_tr, x_tes = data.X_train[train_index], data.X_train[test_index]
            y_tr, y_tes = data.y_train[train_index], data.y_train[test_index]
            BC= BaggingClassifier(ratio = 0.73, N = N_n, base=DecisionTreeClassifier(max_depth=depth))
            BC.fit(x_tr, y_tr)
            correct=0
            for index in range(x_tes.shape[0]):
                prediction = BC.predict([x_tes[index]])
                if prediction == y_tes[index]:
                    correct+=1
            correct_list.append(correct/y_tes.shape[0])
        total=0
        for item in correct_list:
            total+= item
        print(total/3)

depth: 4 N: 20
0.9547418466016925
depth: 4 N: 23
0.9546413239018584
depth: 4 N: 25
0.9555466957220657
depth: 4 N: 27
0.9550437788039406
depth: 4 N: 30
0.9557477714636292
depth: 4 N: 33
0.9551446656065208
depth: 4 N: 35
0.9557478321474203
depth: 4 N: 37
0.955445839261381
depth: 4 N: 40
0.9556472184218997
depth: 5 N: 20
0.9609776219418024
depth: 5 N: 23
0.9594687801617404
depth: 5 N: 25
0.9609774398904293
depth: 5 N: 27
0.9606759628166138
depth: 5 N: 30
0.9590666893624045
depth: 5 N: 33
0.9598715384827777
depth: 5 N: 35
0.9616813415244314
depth: 5 N: 37
0.9616815539177
depth: 5 N: 40
0.9604748263912594
depth: 6 N: 20
0.9652015475580388
depth: 6 N: 23
0.966509131545164
depth: 6 N: 25
0.9650003807907886
depth: 6 N: 27
0.9651011462257868
depth: 6 N: 30
0.9648998277490591
depth: 6 N: 33
0.9650005325002663
depth: 6 N: 35
0.9658048051246247
depth: 6 N: 37
0.9662073510523935
depth: 6 N: 40
0.9667102679705186
depth: 7 N: 20
0.9681181622642093
depth: 7 N: 23
0.968219018724894
depth: 7 N: 25
0.967

In [63]:
from sklearn.model_selection import KFold
loop =[20,23,25,27,30,33,35,37,40]
for x in loop:
    print (x)
    kf = KFold(n_splits=3)
    kf.get_n_splits(data.X_train)
    for train_index, test_index in kf.split(data.X_train):
        x_tr, x_tes = data.X_train[train_index], data.X_train[test_index]
        y_tr, y_tes = data.y_train[train_index], data.y_train[test_index]
        BC= BaggingClassifier(ratio = 0.73, N = x, base=DecisionTreeClassifier(max_depth=4))
        BC.fit(x_tr, y_tr)
        correct=0
        for index in range(x_tes.shape[0]):
            prediction = BC.predict([x_tes[index]])
            if prediction == y_tes[index]:
                correct+=1
        print(correct/y_tes.shape[0])
        
    

20
0.9541478129713424
0.9574532287266143
0.9493059746529873
23
0.9532428355957768
0.9574532287266143
0.952021726010863
25
0.9541478129713424
0.9565479782739892
0.9553409776704889
27
0.9556561085972851
0.9571514785757392
0.9517199758599879
30
0.955052790346908
0.9565479782739892
0.9532287266143633
33
0.9577677224736049
0.955642727821364
0.9499094749547374
35
0.951131221719457
0.9574532287266143
0.9550392275196138
37
0.9583710407239819
0.9580567290283645
0.9523234761617381
40
0.9577677224736049
0.9586602293301146
0.9526252263126131


In [74]:
kf = KFold(n_splits=3)
kf.get_n_splits(data.X_train)
for train_index, test_index in kf.split(data.X_train):
    x_tr, x_tes = data.X_train[train_index], data.X_train[test_index]
    y_tr, y_tes = data.y_train[train_index], data.y_train[test_index]
    BC= BaggingClassifier(ratio = 0.73, N = 25, base=DecisionTreeClassifier(max_depth=10))
    BC.fit(x_tr, y_tr)
    correct=0
    for index in range(x_tes.shape[0]):
        prediction = BC.predict([x_tes[index]])
        if prediction == y_tes[index]:
            correct+=1
    print(correct/y_tes.shape[0])

0.971342383107089
0.9740494870247435
0.9740494870247435


In [75]:
correct=0
for index in range(data.X_valid.shape[0]):
    prediction = BC.predict([data.X_valid[index]])
    if prediction == data.y_valid[index]:
        correct+=1
print(correct/y_tes.shape[0])


0.6004828002414001


# Random Decision Tree [10-points]

In this assignment you are going to implement a random decision tree using random vector method as discussed in the lecture.

Best split: One that achieves maximum reduction in gini index across multiple candidate splits. (decided by `candidate_splits` attribute of the class `RandomDecisionTree`)

Use `TreeNode` class as node abstraction to build the tree

You are allowed to add new attributes in the `TreeNode` and `RandomDecisionTree` class - if that helps.

Your tasks are as follows:
* Implement `gini_index` method which takes in class labels as parameter and returns the gini impurity as measure of uncertainty

* Implement `majority` method which picks the most frequent class label. In case of tie return any random class label

* Implement `find_best_split` method which finds the random vector/hyperplane which causes most reduction in the gini index. 

* Implement `build_tree` method which uses `find_best_split` method to get the best random split vector for current set of training points. This vector partitions the training points into two sets, and you should call `build_tree` method on two partitioned sets and build left subtree and right subtree. Use `TreeNode` as abstraction for a node.

> The method calls itself recursively to the generate left and right subtree till the point either `max_depth` is reached or no good random split is found.  When either of two cases is encountered, you should make that node as leaf and identify the label for that leaf to be the most frequent class (use `majority` method). Go through lecture slides for better understanding

* Implement `predict` method which takes in multiple data points and returns final prediction for each one of those using the tree built. (`root` attribute of the class)

In [181]:
import math 
class TreeNode:
    def __init__(self):
        self.left = None
        self.right = None
        self.isLeaf = False
        self.label = None
        self.split_col = None
        self.split_val = None

    def getLabel(self):
        if not self.isLeaf:
            raise Exception("Should not do getLabel on a non-leaf node")
        return self.label
    
class RandomDecisionTree:
            
    def __init__(self, candidate_splits = 100, depth = 10, t_val= .4):
        """
        Args:
            candidate_splits (int) : number of random decision splits to test
            depth (int) : maximum depth of the random decision tree
        """
        self.candidate_splits = candidate_splits
        self.depth = depth
        self.threshold_val = t_val
        self.root = None
    
    def fit(self, X_train, y_train):
        """
        Args:
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data
            
        """
        self.root = self.build_tree(X_train[:], y_train[:], 0)
        return self
        
    def build_tree(self, X_train, y_train, height, node = TreeNode()):
        """
        Args:
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data
            
       Return:
            split_vector: random vector which gives most reduction in uncertainty
            feature_indices: indices of the random sub-features used
            lindices: indices of training example which should be in left subtree
            rindices: indices of training example which should be in right subtree
            
        """
        
        # your logic here
            
        if self.gini_index(y_train) ==0: 
            # if data is pure 
            print("pure data found")
            node.isLeaf = true 
            node.label = y_train[0] # since all the values are the same, I can pick the first value 
            return node

        elif height == self.depth : 
            # if max height is reached
            print("Max height reached")
            node.isLeaf= true
            node.label= self.majority(y_train)
            return node
        elif self.gini_index(y_train) <self.threshold_val:
            node.isLeaf= true
            node.label= self.majority(y_train)
            return node
        else:
            print("splitting: height", height)
            split_col, split_val =self.find_best_split(X_train, y_train)
            left_x, left_y, right_x, right_y = self.split_data(split_col, split_val, X_train, y_train)
            node.split_col = split_col
            node.split_val = split_val
            print("splitting: Right")
            node.left = self.build_tree(left_x,left_y,height+1,node)
            print("splitting: left")
            node.right = self.build_tree(right_x,right_y,height+1,node)
            
            return node

    def split_data(self, col, val, X_train, y_train):
        """
        Args:
            col (int)  : Column to split the data on 
            val (float): value to split the data on in said column 
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data
        Return:
            data_below      (ndarray): the X values below the split 
            data_below_vals (ndarray): the Y vals that go with the X vals below
            data_above      (ndarray): the x values above the split 
            data_above_vals (ndarray): the Y vals that go with the X vals above 
            
        """
        data_below=[] 
        data_below_vals=[] 
        data_above=[] 
        data_above_vals=[]
        for item_index in range(X_train.shape[0]):
            if X_train[item_index][col] < val: 
                data_below.append(X_train[item_index])
                data_below_vals.append(y_train[item_index])
            else: 
                data_above.append(X_train[item_index])
                data_above_vals.append(y_train[item_index])
        # converts to np array 
        data_below=      np.array(data_below)
        data_below_vals= np.array(data_below_vals)
        data_above=      np.array(data_above)
        data_above_vals= np.array(data_above_vals)
        
        return data_below, data_below_vals, data_above, data_above_vals
        
        
    def find_best_split(self, X_train, y_train):
        """
        Args:
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data
            
        """
        min_gini = 1
        split_col=None
        split_val=None
        # gets the colmns to test splits on 
#         random_splits = np.random.choice(X_train.shape[1], self.candidate_splits, replace=False)        
#         for col in random_splits:

        for col in range(X_train.shape[1]):
            if col%100 == 0:
                print('col: ', col)
            # gets the values to test splits on for the specefied col 
            potential_split_vals = np.random.rand(100)
            for partition_value in potential_split_vals:
                less, less_label, greater, greater_label = self.split_data(col, partition_value, X_train, y_train)     
                # calc Gini for each side 
                left_score=0
                right_score=0
               
                if less_label.shape[0]>0:
                    left_score = self.gini_index(less_label)
                if greater_label.shape[0]> 0:
                    right_score= self.gini_index(greater_label)

                total_score = (left_score * (less_label.shape[0]/X_train.shape[0])) + (right_score * (greater_label.shape[0]/X_train.shape[0]))

                if total_score < min_gini:
                    min_gini = total_score
                    split_col = col
                    split_val = partition_value
        print(min_gini)
        return split_col, split_val
            
        
    def gini_index(self, y):
        """
        Args:
            y (ndarray): [n_samples] ndarray of data
        """
        unique, counts = np.unique(y, return_counts=True)
        count_dict= dict(zip(unique, counts))
        if (len(unique) == 1):
            return 0
        else: 
            ratio_one = count_dict[1]/ y.shape[0]
            return(-1*((ratio_one*math.log2(ratio_one))+((1-ratio_one)*math.log2(1-ratio_one))))
            
    
    def majority(self, y):
        """
        Return the major class in ndarray y
        """
        unique, counts = np.unique(y_hats, return_counts=True)
        count_dict= dict(zip(unique, counts))
        if (len(unique) == 1):
            return unique[0]
        else : 
            if (count_dict[1]> count_dict[-1]):
                return 1
            elif (count_dict[-1]> count_dict[1]):
                return -1 
            elif (count_dict[1] == count_dict[-1]):
                rand_label = np.random.choice([-1,1], 1)
                return rand_label[0]
    
    def predict(self, X):
        """
        BaggingClassifier prediction for new data points in X
        
        Args:
            X (ndarray): [n_samples x n_features] ndarray of data 
            
        Returns:
            yhat (ndarray): [n_samples] ndarray of predicted labels {-1,1}
        """

In [182]:
RD = RandomDecisionTree()
RD.fit(data.X_train, data.y_train)

splitting: height 0
col:  0
col:  100
col:  200
col:  300


KeyboardInterrupt: 

In [184]:
RD = RandomDecisionTree()

X_train=data.X_train
y_train=data.y_train
def find_best_split( X_train, y_train):
        """
        Args:
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data
            
        """
        min_gini = 1
        split_col=None
        split_val=None
        # gets the colmns to test splits on 
#         random_splits = np.random.choice(X_train.shape[1], self.candidate_splits, replace=False)        
#         for col in random_splits:

        for col in range(X_train.shape[1]):
            if col%100 == 0:
                print('col: ', col)
            # gets the values to test splits on for the specefied col 
            potential_split_values =[]
            for item in X_train:
                potential_split_values.append(item[col])
            potential_split_values= np.array(potential_split_values)
            unique, counts = np.unique(potential_split_values, return_counts=True)
            if unique.shape[0]> self.candidate_splits:
                unique = np.random.choice(unique, self.candidate_splits, replace = False)
#             print('num_unique',unique.shape[0])
            for partition_value in unique:
                less, less_label, greater, greater_label = RD.split_data(col, partition_value, X_train, y_train)     
                # calc Gini for each side 
                left_score=0
                right_score=0
               
                if less_label.shape[0]>0:
                    left_score = RD.gini_index(less_label)
                if greater_label.shape[0]> 0:
                    right_score= RD.gini_index(greater_label)

                total_score = (left_score * (less_label.shape[0]/X_train.shape[0])) + (right_score * (greater_label.shape[0]/X_train.shape[0]))

                if total_score < min_gini:
                    min_gini = total_score
                    split_col = col
                    split_val = partition_value
        print(min_gini)
        return split_col, split_val
    
t = np.random.rand(100)
print(t.shape[0], t)


100 [0.99103293 0.73493051 0.77328569 0.29197335 0.11751074 0.31090922
 0.47390915 0.24234842 0.63655838 0.31416762 0.09516273 0.78096486
 0.04069096 0.5328597  0.76286195 0.48365198 0.46413188 0.8497956
 0.72319886 0.06152411 0.91652561 0.88020049 0.09937733 0.90701467
 0.8760287  0.05697669 0.80382833 0.96500583 0.04156106 0.34821466
 0.41396195 0.59063842 0.31806584 0.43054594 0.60676649 0.27758135
 0.8275886  0.81717303 0.1602623  0.76933576 0.54811974 0.12569193
 0.8496947  0.2707646  0.77895879 0.64372012 0.16023451 0.8410038
 0.71380677 0.02073296 0.72172966 0.32757411 0.44415531 0.21022716
 0.0148484  0.84204034 0.88845447 0.51972173 0.20269109 0.90131831
 0.12499979 0.01661075 0.21007203 0.21460979 0.97438732 0.38968209
 0.60131622 0.23538271 0.07022862 0.33160898 0.35934958 0.9557119
 0.23733623 0.70999763 0.50965862 0.72959178 0.80622442 0.63577066
 0.21346794 0.83456683 0.72645712 0.83001456 0.95838558 0.9864673
 0.22095382 0.38203977 0.76811211 0.22450809 0.90291008 0.8892

In [165]:

X_train=data.X_train
y_train=data.y_train
RD = RandomDecisionTree()
min_gini = 10
split_col=None
split_val=None
# gets the colmns to test splits on 
random_splits = np.random.choice(X_train.shape[1], RD.candidate_splits, replace=False)
print(random_splits.shape[0])
for col in random_splits:
    # gets the values to test splits on for the specefied col 
    potential_split_values =[]
    for item in X_train:
        potential_split_values.append(item[col])
    potential_split_values= np.array(potential_split_values)
    unique, counts = np.unique(potential_split_values, return_counts=True)
    print('num_unique',unique.shape[0])
    for partition_value in unique:
        less = []
        less_label=[]
        greater=[]
        greater_label=[]

        for item_index in range(X_train.shape[0]):
            if X_train[item_index][col]< partition_value: 
                less.append(X_train[item_index])
                less_label.append(y_train[item_index])
            else:
                greater.append(X_train[item_index])
                greater_label.append(y_train[item_index])        
        # calc Gini for each side 
        left_score=0
        right_score=0
        less_label= np.array(less_label)
        greater_label= np.array(greater_label)
        if less_label.shape[0]>0:
            left_score = RD.gini_index(less_label)
        if greater_label.shape[0]> 0:
            right_score= RD.gini_index(greater_label)

        total_score = (left_score * (less_label.shape[0]/X_train.shape[0])) + (right_score * (greater_label.shape[0]/X_train.shape[0]))
        print(total_score)
        if total_score < min_gini:
            min_gini = total_score
            split_col = col
            split_val = partition_value
print(min_gini)
print(split_col)
print(split_val)

100
num_unique 193
0.9995104936653652
0.9589879441892597
0.9592775511351255
0.9593353155184261
0.9590996632835795
0.9600669192064305
0.959926260705038
0.9604098289838524
0.9607961359893804
0.9605567559873653
0.9610402144008791
0.9606471517820936
0.9608412896876587
0.9614229900113266
0.9615198355277174
0.9614640871774776
0.961755057131931
0.9622394112479011
0.9627230151945548
0.9625688394048502
0.9628950158893643
0.9633795722395594
0.9637666805776389
0.9639600546641723
0.96379974113013
0.963896638847011
0.964090345125043
0.9641871535785325
0.9641194390456056
0.9643134298200425
0.964240884556714
0.963980729801378
0.9640784311616833
0.9641761045682586
0.964371367324879
0.9645665176769191
0.9647615552241589
0.9649564795603722
0.9654432923813371
0.9656378171906527
0.9659293884238892
0.966123624612932
0.9662206991488733
0.9664147608182042
0.9667056339464325
0.9666218038118428
0.9667189676760165
0.9670102858745915
0.9671073339481295
0.9667259235713384
0.9668233008094766
0.9669206497042504
0.9

0.9995104936653652
0.9309911039563474
0.9310929373215417
0.9316984091060078
0.9320009787955333
0.9324009584000986
0.9326999891711636
0.9328908017640254
0.9334801026681766
0.9340852093743965
0.9344880445445801
0.9349819561126482
0.9353841044241472
0.9361870285123794
0.936567821876351
0.9372693142876161
0.9374578533784015
0.9381581813007984
0.9386575547547114
0.9391562082294294
0.9397536386221783
0.9401513455227781
0.9405382011641393
0.9409353622392707
0.9407249594244639
0.94039706628318
0.9409776789559154
0.9413758616797225
0.9415747809718558
0.9417735853262245
0.9424512721196376
0.9428481482528023
0.9425315149272365
0.9426099854119467
0.9427094051079663
0.9425621447904803
0.9429605355901438
0.9433584735092733
0.9435572721998235
0.9442521706444106
0.9444255229496279
0.9447230825528924
0.9450203852032684
0.945119428875388
0.9452184439018999
0.9457130885008772
0.9460095299484618
0.945530381114592
0.9459271405892733
0.9457957717030262
0.9459598675044182
0.9465554914833124
0.946753807921931

0.9965826726730047
0.9963834896027532
0.996487713840883
0.9964777691756044
0.9963163012716147
0.9962687123579114
0.9965639780755631
0.9964151989041391
0.9962897903496893
0.9961220572388388
0.996007545756481
0.9961714998818045
0.9961004599722383
0.9956009081162948
0.9953790427099549
0.9953602615111774
0.9951855245068446
0.9950303765150101
0.99503589783283
0.9950607341388118
0.9952412873807437
0.9952418290229887
0.9948645763346149
0.9946734870353325
0.9944919718215337
0.994580620792896
0.9939780765459296
0.9938626772469701
0.9937776640900555
0.9937108195467163
0.9931811060882703
0.9927057939178496
0.9932184353219594
0.9951262191414674
0.999022507606748
0.9990276661406801
num_unique 1
0.9995104936653652
num_unique 256
0.9995104936653652
0.9659626595743089
0.9656293239016197
0.9658140416858549
0.9660845411761013
0.9658862522429708
0.9665777575442411
0.9665527143055636
0.9666025089700598
0.9668682516825025
0.9671443712915859
0.9674834965945058
0.9671862190867029
0.9677284608161006
0.9678143

0.9093814706254211
0.9087587459462643
0.908663283483278
0.9080022471252747
0.90836861298606
0.908800078213754
0.9089126634523239
0.9087749712924214
0.9089034900649213
0.9090488126015467
0.9095171959626093
0.9097408802342885
0.910490449741707
0.9106747146381822
0.910815008560969
0.9110523122591685
0.9114047973170738
0.9111162793893581
0.9113136285588828
0.911412396348361
0.9106925501000697
0.9107312986978675
0.9112758919495418
0.9114107975915646
0.9108217193624024
0.9113233646457981
0.9109648110857946
0.9111936374861545
0.9112828283937184
0.9107464645195285
0.9105960276301622
0.910628077196716
0.9109288417921536
0.9106774176759
0.9110503253564916
0.9111598855064416
0.9107323876143627
0.9106399728244867
0.910643720959683
0.9110357722092208
0.9110372577237915
0.9101188850063269
0.9105299821506974
0.9105038735803437
0.9108131797776425
0.9116296585562236
0.9116001938384517
0.9117163406654838
0.9118573879105722
0.9120197672299827
0.9120631080062229
0.9119476567095908
0.912127821811238
0.9123

0.9995104931934757
0.9995085355837923
0.9995023162447011
0.9994914634667218
0.9995014012441161
0.9994896504012788
0.9994724720215289
0.9994493028979251
0.9994689773878753
0.9994438515407674
0.9994649804833202
0.9994825598225701
0.9994962339357282
0.9995055890469231
0.9995047533992257
0.9994921905983566
0.9994713889953586
0.9994410143896255
0.9993993517661776
0.9994317198437059
0.9994596088073681
0.9994823401304602
0.9994990636557847
0.9995086925735535
0.9994965463524318
0.9995080445623912
0.9995099020424106
0.9995071242651173
0.9994883239882708
0.9995057490116749
0.9994809752998974
0.9995035249006746
0.9993167817347592
num_unique 4
0.9995104936653652
0.9994894683660106
0.9995103951607622
0.9994060830685106
num_unique 1
0.9995104936653652
num_unique 249
0.9995104936653652
0.9517619422168395
0.9525858442800572
0.9526757652523075
0.9530349479798665
0.9539073278512347
0.9540941665891638
0.9543147397671937
0.9542052440071228
0.9547946164145491
0.9552055408686795
0.9556725968355373
0.9559874

0.9492689444094308
0.9490704924213123
0.9489497070294439
0.9490488314211417
0.9489529042817698
0.9493603620755946
0.9495558675127429
0.9493121364995203
0.9495953894273044
0.9491704208268009
0.9491092015082658
0.9492048717660285
0.9499734912614338
0.9501236378960132
0.9494249138304711
0.9499385526623366
0.949474853999495
0.9493116456442248
0.9494220224097301
0.9500233753769027
0.9501865727567118
0.9504217396705631
0.9497281856305476
0.9493230981771443
0.9492280301836934
0.9494443030775434
0.9492748865876044
0.9491239692272029
0.9490995941864122
0.9490010806361975
0.9495813167447569
0.9497211166778394
0.9496013553775053
0.9499060278121134
0.9501168083520192
0.9497561646136965
0.9493304388253886
0.9491492298210601
0.9487118951635426
0.9485801581711846
0.9488903033697138
0.949044699145954
0.9496100620285142
0.9498650756996645
0.949572973029156
0.9492613269100552
0.948901829258535
0.948711328108944
0.948543227068258
0.9487261063821284
0.9492213001217911
0.9494950140940555
0.9496932495094237

0.9676046181017975
0.9681031179351357
0.968351280754918
0.9685361089472525
0.9687199289721122
0.9687197927816604
0.9687194804970216
0.9687794511755525
0.9689611613389475
0.9690796539874356
0.969320585370808
0.969016929191622
0.9691953382990668
0.9693747143543727
0.9691922246371114
0.9696101178849406
0.9694886693566691
0.969122688955103
0.9691212717913373
0.9693566442902231
0.9694139590155801
0.9691038954287384
0.9689750382142834
0.9689086865963293
0.9690254598185166
0.9693221043912201
0.9691309690225278
0.9690608966452436
0.9695920677926104
0.9696425607189423
0.9695153133108443
0.9699184853505102
0.9697272522905738
0.9696598003096022
0.9697007266855948
0.9693853683568523
0.969613191410551
0.9697164006461005
0.9695779052058966
0.9693839494258263
0.9694856565623383
0.9690589327954338
0.9692251484518971
0.969150271825547
0.9692596658815217
0.9696664210704551
0.9697163169207015
0.9698241669400988
0.9697488565304444
0.9699692237567502
0.9705331340901101
0.9708666196673605
0.970968056383648


0.9994242924923308
0.9994348759069605
0.9994559804565603
0.9994660537726748
0.9994648021393066
0.9994741078585324
0.9994635062080873
0.9994815122733349
0.9994718840056478
0.9994805771879288
0.9994796064745967
0.9994694916781909
0.99947859825077
0.9994682267163034
0.9994775504985385
0.9994669125649255
0.9994921891275701
0.9994914702064757
0.9994907198695949
0.9994972071236862
0.9995024508305964
0.9995020337315874
0.9994882610728189
0.9995052555243983
0.9995084449433183
0.9995101744803543
0.9995045706450358
0.9994990395827815
0.9994916025901731
0.9994821901247553
0.9994809930286244
0.9994898163852362
0.9994657257295525
0.9994507224133945
0.999475521159076
0.9994619175310293
0.9994722948344188
0.9994922444627855
0.9994912382135241
0.9995079195945422
0.999420123987168
0.9993140215408978
num_unique 256
0.9995104936653652
0.9975094078406945
0.9975943336569804
0.9979496881174564
0.9981321295399799
0.9980914332511446
0.998032171932642
0.9982139071986961
0.998390533711448
0.9984637239314343
0.9

0.9942666989881835
0.9944026493641993
0.994469965335589
0.9945780016237586
0.9946151650204836
0.9946870633639489
0.9947945944473509
0.9949216884027166
0.9949555495680218
0.9952201368527823
0.9951395246578909
0.9951632211425703
0.9951482426833895
0.9953157927827003
0.9953892214651455
0.9953996593633638
0.9954063281591556
0.9954203463703994
0.9954908798651647
0.9952922485393101
0.9954414821996893
0.9954979361855958
0.9956208288211725
0.9957400623234243
0.9959015939447057
0.995953151441713
0.9960440940621083
0.9960877071328855
0.9962611605486686
0.9962123805720525
0.9961631559453925
0.9960453135197427
0.9960597838977245
0.9960904084975132
0.9960163809807092
0.996066977859885
0.9960180248294315
0.9960269340612752
0.9960774136124732
0.9961490314993714
0.9963025628796613
0.9963511430386596
0.9962830040447117
0.9962776254195708
0.9963089513881298
0.99630177132743
0.9961038802685013
0.9960430449164066
0.9962310763748436
0.996378765012407
0.9964022045972948
0.996547325425472
0.9965197076612684


0.9990072513967897
0.999069053133531
0.9991286099395741
0.9991856744282434
0.9992399551729615
0.9992911054697843
0.9993387082241079
0.9993822552152104
0.9994211179850724
0.9993673817085498
0.9994104644754841
0.9994780068835529
0.9995100994089143
0.9991972157356805
0.9993016570935239
num_unique 1
0.9995104936653652
num_unique 9
0.9995104936653652
0.9991198939525662
0.9992035831212204
0.9992837600752587
0.9993590016764385
0.9994267820534102
0.9994819052843283
0.9995103951607622
0.9994136446280635
num_unique 4
0.9995104936653652
0.9992199049813407
0.9993167817347592
0.9994136446280635
num_unique 44
0.9995104936653652
0.9980133046846382
0.9980814614101632
0.9981489376991965
0.9982157022193099
0.9983469590358117
0.9982784190830403
0.998345363488508
0.9980394623690595
0.9978571906554763
0.9979342397959385
0.9980106961880365
0.9980865256036232
0.9982361505172351
0.9983098602310903
0.9981267495725107
0.9979114967987287
0.9979936525988957
0.998075256712095
0.9979114476725996
0.9979972137075812


0.9691853950584963
0.9693347848817045
0.9697245958561342
0.9698481901257787
0.9694627365161084
0.9697252156058014
0.9705114824004916
0.9704390335431318
0.9699361808121926
0.9703849680578205
0.9702719094775348
0.9701369156998163
0.9708215510021807
0.9709524129308602
0.9714095946532407
0.9714147054960237
0.9711254451265939
0.971099657041146
0.9710249847382956
0.9710755193152056
0.9709840452097851
0.9712532414998576
0.9713484062940825
0.9715689579007252
0.9715074805082429
0.9717429672741928
0.9722117735604592
0.9724767994441891
0.9725246739064923
0.9725407162606383
0.9725566049885754
0.972524777524063
0.9727734565785425
0.9730056604934475
0.9731748755414804
0.9730342478469133
0.9729383977087072
0.9734179810614472
0.9733701544137356
0.9734777253405129
0.9736467719916515
0.9739065363492385
0.9742107664746398
0.9744234531815449
0.974799771957604
0.9745982304628582
0.9745947086252648
0.97460706391269
0.9745415392479385
0.9742573342992485
0.9739870246356575
0.974587207044935
0.9750158264346123

0.7803427259955812
0.7810962904319905
0.7814390206391839
0.781671769098294
0.7823933714529976
0.7824722812608148
0.7837473032644587
0.784084149879083
0.7849049252245425
0.7855005722461879
0.7856099229209725
0.786240066801261
0.7874614350719301
0.7880532310076391
0.7877127987062067
0.7881415337970294
0.7884692001256663
0.789269894271371
0.7903305591090173
0.7906852551957331
0.7922092988804187
0.792559319628438
0.793490787473048
0.793955138471802
0.7937445486480906
0.7939675215634903
0.7945985904366167
0.7940897600444293
0.7926251961236276
0.7932290635511483
0.7937142928013876
0.7937317189944976
0.7942168252151794
0.7951168159474016
0.7957098913089083
0.7961574505045236
0.7967481894450597
0.7983301084074352
0.7986436039794973
0.7986685549572436
0.7995399100709825
0.7998260302262885
0.8004199744707412
0.8018661052209601
0.8014752503661068
0.8017680067285012
0.8020742105889688
0.8025128698757107
0.8032222838348733
0.8043550376185529
0.8038084753572442
0.8042329064172835
0.8043844476667583


0.9896841761367443
0.9897045893884752
0.9896773965843382
0.9897800837296961
0.9897156278636674
0.9899663586336604
0.9901602377352033
0.9902799025976604
0.9903714960869713
0.9903649811154721
0.9904926423684077
0.9905562306947219
0.9906829211742825
0.9907460226875111
0.9908446648454292
0.9908545460020532
0.9908459239165609
0.9907579033200363
0.9909108443405595
0.9911507224634096
0.9913794811257687
0.9913543277689706
0.9913205400928742
0.9916861554924421
0.991772839775421
0.9916876550111956
0.991593557849657
0.9914467210816555
0.9915514902766643
0.9916472882543168
0.991682979469064
0.9921035121540938
0.9921630613842511
0.9921306882831239
0.9921319825023774
0.992334307739181
0.9923357607633246
0.9925027534146948
0.9926351305644339
0.9927105209286575
0.9927027593831097
0.9926133216156943
0.9926321088661059
0.992829656440506
0.9929198257705798
0.9930095312483891
0.9930190198771849
0.9930760536496169
0.9930608211843657
0.9931895849639696
0.993285424752858
0.9933754553757015
0.9934313808717871

0.9925434695164036
0.9925169986128511
0.99249516815922
0.9927580676157244
0.9926759738658029
0.9928499607280866
0.992965772068008
0.9929417028747756
0.9929162798657714
0.9927490987153278
0.9927802164013152
0.9928965192946486
0.993070484487421
0.9929332522931944
0.9931055606475777
0.9932452941311539
0.9932210515575215
0.9933048474125579
0.9933890913742627
0.9934454304702489
0.9935296892371221
0.9935043681235605
0.9934242121024135
0.9933162864697176
0.9934278423206908
0.9934839132836992
0.9937281452336066
0.9938081750500943
0.9937832927781027
0.9938902537512564
0.9939958170622942
0.9941292298460127
0.9939761836977696
0.9940810090397534
0.9942869613075886
0.9944141075091958
0.9943659375076171
0.9943170750959154
0.9944468784664031
0.9944737879212245
0.9944493729889168
0.9944751909970166
0.9945508843577531
0.9945269714741944
0.9945771345385657
0.9946029674706607
0.9946033014007978
0.9946787333501248
0.9946797106448864
0.9947058940134388
0.9947071423018297
0.9946834663706032
0.99475945422953

0.9622433612974348
0.9624539381095987
0.9627695694632783
0.9629798337957706
0.9637147652441024
0.9638196284702423
0.964029259076995
0.9637598657174072
0.9644948135568718
0.9648093159334081
0.9641484369125098
0.9643593307864495
0.9643864020705448
0.9649143369539948
0.9650408651910466
0.9652522816282233
0.9657803025971944
0.966307574651308
0.9668340901667487
0.967359841183431
0.9670623761837072
0.9671678040042685
0.9673785695183812
0.9674839070833934
0.9680101403648678
0.9686406113725459
0.968850521364054
0.9690747661693979
0.9698101554897876
0.9702347808366023
0.9706422826389783
0.9710648711338337
0.9720140083669598
0.9729607660350995
0.9804518598549836
0.99339392142643
0.9972641477320401
num_unique 47
0.9995104936653652
0.9982115548519606
0.998273477486278
0.9981516352096155
0.9982786521126537
0.998341032166127
0.9984026156391282
0.9984633664071668
0.9985232457706359
0.9985822122027005
0.9986402210653776
0.9986972242888289
0.9987531700079676
0.9988080021493181
0.9988616599596765
0.9987

0.9990927589902537
num_unique 1
0.9995104936653652
num_unique 256
0.9995104936653652
0.9752807252555237
0.9755702171176739
0.9759285724033941
0.9764359766843622
0.9766908120236455
0.9770979769291732
0.9778929432818387
0.9785281812775446
0.9786052909653946
0.9792271794051068
0.9793983931735304
0.9797668828025127
0.9792038381726136
0.9795636921525364
0.979530046550235
0.9798365520808753
0.9798507648265102
0.9802470574926745
0.980636682334828
0.9808360758420649
0.9808577829270833
0.9810116534397254
0.9813459159694198
0.9812567121462499
0.9817762915975439
0.9821518122483571
0.9819741953653147
0.9818443893425399
0.9820340492557659
0.9815327087034724
0.9813506605317984
0.9816790703199936
0.981495787959578
0.9816832835191001
0.9819617753698311
0.9821004071795597
0.9821468198721104
0.9823758307307069
0.9824669992083663
0.982557712244333
0.9827387939463768
0.9821882319225897
0.9822325731363035
0.9822756888740961
0.9824124705801549
0.9829093156941786
0.9827719685275098
0.982540369254743
0.982763

0.9978416951851727
0.9978140250269505
0.9978088462679982
0.9978534872400678
0.9978283371633904
0.9978257419182882
0.9978002892498341
0.9978451686987638
0.9979630129878014
0.9979435805934296
0.9979725421491585
0.9979700842534612
0.9979481606096989
0.9979601918159094
0.9979746694521645
0.9980558397908379
0.9979911881505525
0.9979521811742675
0.997910049229948
0.9979343361591761
0.9979661509964081
0.9979187868829567
0.9977753996911221
0.9977543617874198
0.9977879103396278
0.9977128778718136
0.9977720434095024
0.9977604940913827
0.9977546738242196
0.997748822830092
0.99768391797213
0.9976180760908389
0.9976341192143575
0.9976117573569613
0.997722738868408
0.9977726848400083
0.9978642516561879
0.9978220828013823
0.9978007781382019
0.9976889250972303
0.9976185728435643
0.9975958640422403
0.9975893145715111
0.997698831723406
0.9977683097017187
0.9977021475732802
0.9977497803323381
0.9977843662570516
0.9978527118425276
0.9979017798605576
0.9979474085702399
0.997948127866845
0.997960384940267
0

0.970577501701803
0.9709020148839878
0.970402035809593
0.9705568455663334
0.9703580336973443
0.9704484137539282
0.9704138162616585
0.970917443845176
0.9708377299559707
0.9705730519757121
0.9706024263249908
0.9703278296331264
0.9702189117660845
0.9698704261273139
0.9698881599955909
0.9699006591714251
0.9695708365248752
0.9697798681917156
0.9696919810304221
0.9696476945875713
0.9695931454574993
0.9697904527756191
0.96974311289875
0.9696274796974962
0.9698121065145404
0.969548141499361
0.9694954605525381
0.9692635496814208
0.968664695835995
0.9689792606246393
0.9690244382052213
0.9686592363002451
0.9690342715399767
0.9695952684230194
0.9698461038842022
0.9697326830655788
0.9699845415525823
0.9697872936670682
0.9698936759576439
0.969634341855526
0.9702173069982449
0.9705505717216806
0.9705729419399116
0.9702540244859448
0.9707016279406062
0.9710158449790007
0.9707393437030281
0.9707046318607244
0.9704420853383282
0.9696087105701898
0.9703166731821942
0.971664290653407
0.9922077728575038
0.

0.9973412700521957
0.9972873874017023
0.9970268192093581
0.9969317011194483
0.996856442877766
0.9968528966511885
0.9968638343088114
0.9968430616497188
0.996768260476973
0.9967091492123323
0.996628676601635
0.9966596049915939
0.9965641441896984
0.9964988072464269
0.9963613617277516
0.9963925387268506
0.9965657326882664
0.9963677181755711
0.9962976383068602
0.9963111087438089
0.9963046277395347
0.9962178023274357
0.9961126240833731
0.9960657188750108
0.9960450141078987
0.9960334145280678
0.99612153385759
0.9961338886636422
0.9962453906903467
0.9962147420855259
0.9962624515254609
0.9960763843103103
0.9961309700785099
0.9960237866594244
0.9960188281529856
0.9959878826102766
0.9962146018847295
0.9962514519966539
0.9962428938172954
0.9962923442255744
0.9962672932626109
0.9962951907709844
0.9962889785283804
0.9963216565022749
0.9963454271401275
0.9963076971155613
0.9963213961003432
0.9962935688939574
0.9962740216977799
0.9963227403077013
0.9963497570637607
0.9963029207279351
0.996352631556668

0.9786619982707504
0.9789615542852372
0.9791611845183019
0.9792609775154478
0.9794605192935976
0.9795602680836474
0.9799591160224486
0.980158451722203
0.9802580975204522
0.9805569467627359
0.980656533807677
0.9808556638708097
0.9809552068979868
0.9810547352615577
0.9811542489660104
0.9812537480158301
0.9814527021695016
0.981751023602274
0.9818504348183699
0.9819498314111711
0.9820492133851457
0.9822479334944788
0.9823472716387632
0.9824465951820737
0.9828437434346975
0.9829429940399615
0.9832406584279774
0.9834390285412657
0.9835381917676097
0.9836373404463333
0.9841328657807158
0.9842319272667281
0.9844300066932613
0.9845290246426153
0.98462802808858
0.9847270170355668
0.9848259914879852
0.9849249514502431
0.985023896926745
0.9851228279218944
0.9852217444400927
0.9855184071769608
0.9856172658313246
0.9857161100307114
0.9859137550821083
0.9860125559428895
0.9863088719181476
0.9865063437728607
0.9867037579653614
0.9872956549189513
0.987492838811801
0.9875914091981344
0.9876899652168736


0.9916717770622384
0.9918214602262748
0.9919442158386832
0.9920109512080242
0.9920628699847387
0.9921034618232065
0.9922102476075781
0.9923566834130977
0.992218193068081
0.9922293372003024
0.9922146000879827
0.9920321387268468
0.9921809140793884
0.9922216950198727
0.9920072771660503
0.9920331729459054
0.9921415677380651
0.9922495048016865
0.992121085742379
0.9921055276816956
0.9921516415718628
0.9922756997100781
0.9923579575988792
0.9923682292201236
0.9922803288504453
0.9920849696125348
0.9921688927717407
0.9926749115057246
0.9927003564599665
0.9927257842612237
0.9929236761779408
0.9931645385882858
0.993244061584177
0.9933338126857296
0.9933588534051467
0.9933298014812963
0.9934875724490398
0.9935516813307012
0.993444379308803
0.9934693407237276
0.9934402349394087
0.993479772651333
0.9935296244320548
0.9935794070058869
0.9936435844894421
0.9937466776589022
0.9937714310177383
0.9937571344569339
0.9937488129322627
0.9936899108154801
0.9937542592789221
0.9937095942010468
0.993774033673673

0.9739700231128017
0.9742866565955178
0.97431687549966
0.9744874101780623
0.9746227592107125
0.9749743152953209
0.9752912156821022
0.9755366648899704
0.9759922408996897
0.976652314192251
0.97717187592866
0.9771675688898813
0.9773037075505906
0.9775059716402653
0.9778129786650149
0.9782241768067568
0.9784635299137493
0.9790789394960441
0.9794846751488653
0.9804251270874548
0.9807895634406354
0.9874737436513893
0.9951170362459442
0.9993720902827623
num_unique 256
0.9995104936653652
0.9724776873936476
0.9722920673633328
0.9726267209075452
0.9727627077094692
0.9735360158791346
0.9744401075402102
0.9745304355915844
0.9754792270922303
0.9754107096907244
0.9761492573443881
0.9759588203464337
0.9757673047427433
0.9760098412570514
0.9769581255960752
0.977491354943312
0.9773613616540029
0.977516955227076
0.977822669611046
0.9780587983302709
0.9787147675518008
0.9789707628277223
0.9791978993068224
0.9794921660344433
0.9789066256923011
0.9790270710443753
0.9797367419545807
0.9800399629330209
0.980

0.9981242741653993
0.9980857252848336
0.998134577259159
0.9981828688041163
0.9981149164128418
0.9982121352625664
0.9982413808767651
0.9981732194079933
0.9982222123164665
0.9982023299380826
0.9982513763959241
0.9981816923612602
0.99823141214289
0.9983581498571531
0.998453194634084
0.9985454768939974
0.9986348649525174
0.998678429791189
0.9986196410512943
0.9986638940892778
0.9987073647967998
0.9987500332516427
0.9986328663184114
0.9986777800780487
0.9987218814558655
0.9986619597418247
0.9987068188677937
0.9987508384910013
0.998793995340167
0.9987797701572537
0.9988228100560657
0.9988086756243751
0.9988515912852567
0.9988935524243298
0.9988375539374181
0.9987781498004812
0.9989911954119888
0.9989791272907873
0.9989240139899292
0.9988650966873777
0.9990358933083555
0.9990641747338975
0.9991784017915691
0.9992730339286464
0.9993043784066664
0.9993339481129013
0.9993287939372195
0.9992847048014635
0.9993171423906256
0.9992212294287017
0.9991842208339379
0.9987638690572201
num_unique 256
0.9

0.9887783624118841
0.9889697739521541
0.988915908821649
0.9892790306060358
0.9891834337124634
0.9892562148003828
0.9893026399135635
0.989422383132909
0.9894620657890298
0.9895352019065589
0.9896820473286327
0.9896225175562041
0.9894603951164441
0.9894027063233592
0.9894840215321087
0.9896612820100907
0.9897065826536062
0.9897822674935024
0.9897245909302319
0.9900714467942651
0.990210583238086
0.9903186659273966
0.990295589584622
0.990405126752447
0.9904139981675837
0.9903871497555873
0.9905597796525607
0.9906320617299533
0.9906021620247178
0.9906371732838873
0.9905762289043353
0.9907119115760412
0.9906520677325074
0.9907572096167803
0.9906004162800887
0.9904429709418123
0.9906095748027994
0.9908732929295069
0.9906877081631411
0.9906607713250164
0.9906023102094398
0.9908047993862725
0.9908443410753776
0.9908517816275619
0.9909245830462389
0.9908969461679872
0.990806564081662
0.9910051314041461
0.9910088346907964
0.9910774393368631
0.9911776421813296
0.9914948146337934
0.9912787715897202

0.9756971064417032
0.9756133565125926
0.9760902086393418
0.9761854779022261
0.9766613118739661
0.9767563753425917
0.9764771962865713
0.976858952939396
0.977049629198488
0.9773353882140146
0.9770486659894907
0.9771443295126492
0.9775266530375942
0.977717614726391
0.977813045109413
0.9779084416789842
0.9776130803248962
0.9778047715016993
0.9781877631174144
0.9782834288850973
0.9784746609549035
0.9785702269932653
0.9787612584058605
0.9791429147158657
0.9792382432795707
0.9793335373487589
0.9795240214116026
0.979128718232029
0.9793202150984276
0.9794159140051341
0.9795115797085207
0.9797028109379754
0.9792933657291768
0.9793895217209296
0.9791538415727491
0.9792505027450429
0.9789939273219942
0.9794795615589631
0.9795766053228702
0.9791018696973186
0.979297130241329
0.9793947222474855
0.9794922886264068
0.9800771436568654
0.9801745280829176
0.9802718861341667
0.9806610525811416
0.9808554749023023
0.9809526454649569
0.9810497888163627
0.9811469048450505
0.9812439934379176
0.9813410544801994

0.9899765976847942
0.9897572895625837
0.9899729253289837
0.9899529638762952
0.9899189106890727
0.9896785665955256
0.989764380006544
0.9901971461439657
0.9901684232255283
0.990264142264392
0.9898463952464969
0.990081090474406
0.9899903732039548
0.9901617220593417
0.9903034855701569
0.9903163177221723
0.9903395637241186
0.9902690370761027
0.9903622344986103
0.9901640358965869
0.9904365978031001
0.9904793261366106
0.9901988825711983
0.9902534400710603
0.9903508362519049
0.9905534053222587
0.9905317320261074
0.9904607260593403
0.990405486371271
0.990601759849518
0.9907454281039964
0.990762347097804
0.9910703217002703
0.9909410127364007
0.9911117763890178
0.9909495017845452
0.9909058355051047
0.9909150401591776
0.9910891042853871
0.9911509555641729
0.9913793509305127
0.9915421919553942
0.9916321475075707
0.9916907065056733
0.9917228424956812
0.9916231596405181
0.9915845738548079
0.9914702887833602
0.991414132499248
0.9913611834904603
0.9910941912701878
0.9910227134129246
0.9912038825122971


0.9564056024507304
0.9568121876836482
0.9568747654040064
0.9570135832964902
0.9575161691510234
0.9574429285883748
0.9580975960656392
0.9580800895761457
0.9585221410527128
0.9587321723750538
0.9588461341342064
0.9595349651872388
0.9598959055267714
0.9599107930627429
0.959496814631024
0.9602211941244992
0.960211774203502
0.9605072276920223
0.9606926691962049
0.9606789864579613
0.9611852982123626
0.9609962194603485
0.9611153470313966
0.961387618765901
0.9615309284814242
0.9613427939093662
0.9627699427098919
0.9740757884352591
0.9920389501228797
0.9948099346385055
num_unique 1
0.9995104936653652
num_unique 256
0.9995104936653652
0.9846322501904541
0.9844218359577963
0.9845279335833361
0.9841967681737325
0.9841630244424093
0.9835559793200328
0.9834817148245745
0.9835387066381938
0.9832790052296355
0.9832431290435871
0.9831653286327044
0.9832160946542876
0.9834888452482264
0.9834478749535065
0.983405092366194
0.9836719028837313
0.9835414302975052
0.9837179955841806
0.9837181295859767
0.98389

0.999448421575834
0.9994647545571355
0.9994559306785706
0.9994549598647042
0.9994539685570375
0.9994612592925857
0.9994519220413438
0.9994681349820085
0.9994818395442219
0.9994799102500489
0.9994860869434661
0.9994855125902745
0.9994785463103607
0.99949061874376
0.9994947797050964
0.9994992480600037
0.9994943423835337
0.9994885156536081
0.9994985397315063
0.9995055932077349
0.9995079758012315
0.9995103896027786
0.9995095801758725
0.9995104139906152
0.9995092441792583
0.9995072261537472
0.9995002530406986
0.9994952610434533
0.9994995342684239
0.9995031636507556
0.9995063163774871
0.9994860859836738
0.9994782657716964
0.9994853966285144
0.9994774138532689
0.9994579160063766
0.9994671985394457
0.9994898784992442
0.9994824162081622
0.9994892404580051
0.999488581114602
0.9994878995276165
0.9994798984769834
0.9994540906155828
0.9994056310129104
0.7173481471986892
486
0.00390625


In [None]:
# old random 100 values for splitting 
            potential_split_values =[]
            for item in X_train:
                potential_split_values.append(item[col])
            potential_split_values= np.array(potential_split_values)
            unique, counts = np.unique(potential_split_values, return_counts=True)
            if unique.shape[0]> self.candidate_splits:
                unique = np.random.choice(unique, self.candidate_splits, replace = False)
#             print('num_unique',unique.shape[0])

## RandomDecisionTree for Handwritten Digit Recognition

a) After you've successfully completed `RandomDecisionTree`, and train using the default values in the constructor and report accuracy on the `valid_set`. Use the data from `ThreesAndEights` class initialized variable `data` 

b) Vary the `depth` up to 20 and comment on the trend that you observe in the accuracy scores of the `valid_set`. Base your comments on the concepts taught in the class. Keep the other hyperparameters unchanged.

# Random Forest [5-points]
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

Random forest trains random decision trees on bootstrapped training points. Thus, you can try implementation of methods (`bootstrap`, `predict`) from `BaggingClassifier` class directly. Only difference being, you have to use the `RandomDecisionTree` as base which you implemented previously instead of sklearn's implementation of `DecisionTreeClassifier`). Implement the `fit` method in the class below accordingly.

In [7]:
class RandomForest(BaggingClassifier):
    def __init__(self, ratio = 0.63, N = 20, max_depth = 10, candidate_splits = 500):
        self.ratio = ratio
        self.N = N  
        self.learners = []
        self.candidate_splits = candidate_splits
        self.max_depth = max_depth
        
    def fit(self, X_train, y_train):
        """
        Train Bagging Ensemble Classifier on data
        
        Args:
            X_train (ndarray): [n_samples x n_features] ndarray of training data   
            y_train (ndarray): [n_samples] ndarray of data 
        """
        

## RandomForest for Handwritten Digit Recognition [5-points]
***

After you've successfully completed `RandomForest` find the optimal values of `N` and `max_depth` ,  using k-fold cross validation. Fix the values of the other hyperparameters to the given defaults. Feel free to use sklearn library to split your training data. Use the data from `ThreesAndEights` class intialized variable `data`. 

Justify why those values are optimal. 

Report best accuracy on the testing data using the optimal `N`.

Hint: Vary `N` up to 25 and set `max_depth` up to 10. Plan ahead as it might take some time.