# ID3 is one of the algorithms used to create decision trees for predictive tasks when all descriptive features are categorical. Other variants exist. Search for these models and describe them briefly.

Decision tree algorithms differ in how they decide whether and where to split a node into two or more sub-nodes.
A good split increases the homogeneity of the resulting sub-nodes; in other words, the purity of each node with respect to the target variable increases. The algorithm evaluates candidate splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.


>The different types of algorithms are:


1. ID3 → (Iterative Dichotomiser 3)
2. C4.5 → (successor of ID3)
3. CART → (Classification And Regression Trees)
4. CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)
5. MARS → (Multivariate Adaptive Regression Splines)

> **C4.5 algorithm**

* The C4.5 algorithm is used in Data Mining as a Decision Tree Classifier which can be employed to generate a decision, based on a certain sample of data (univariate or multivariate predictors).

* Advantages of C4.5 over other decision tree systems:
1. The algorithm has a built-in single-pass pruning process to mitigate overfitting.
2. It can work with both discrete and continuous data.
3. It can handle incomplete data (records with missing attribute values).

* We should also keep in mind that C4.5 is not the best algorithm in every setting, but it does prove useful in many cases.
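
One difference from ID3 that the list above does not spell out is that C4.5 ranks candidate splits by gain ratio (information gain divided by the split information) rather than by raw information gain, which penalises attributes with many distinct values. The following is a minimal sketch of that idea; the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy_of_labels(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio obtained by splitting `labels` on the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the target after the split
    remainder = sum(len(g) / total * entropy_of_labels(g) for g in groups.values())
    gain = entropy_of_labels(labels) - remainder
    # Split information: entropy of the attribute's own value distribution
    split_info = entropy_of_labels(values)
    return gain / split_info if split_info > 0 else 0.0

# Toy data (hypothetical): both attributes separate the classes perfectly,
# but the ID-like attribute is penalised for having one value per row.
labels = ["Yes", "Yes", "No", "No"]
id_like = ["a", "b", "c", "d"]
binary = ["x", "x", "y", "y"]
print(gain_ratio(id_like, labels))  # gain 1.0 divided by split info 2.0
print(gain_ratio(binary, labels))   # gain 1.0 divided by split info 1.0
```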


>  **CART algorithm**

* The CART algorithm is structured as a sequence of questions, the answers to which determine what the next question, if any, should be. The result of these questions is a tree-like structure whose ends are terminal nodes, at which point there are no more questions.

* Creating a CART model involves selecting input variables and split points on those variables until a suitable tree is constructed. The input variable and the specific split (cut-point) are chosen by a greedy algorithm that minimizes a cost function. Tree construction ends when a predefined stopping criterion is met, such as a minimum number of training instances assigned to each leaf node of the tree.

* The representation for the CART model is a binary tree.
Each internal node (including the root) represents a single input variable (x) and a split point on that variable (assuming the variable is numeric). The leaf nodes of the tree contain an output variable (y) which is used to make a prediction.
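
The binary-tree representation described above can be sketched as a small data structure plus a traversal loop. This is only an illustration under stated assumptions: the `Node` class, the feature names `humidity_pct` and `wind_kmh`, and the thresholds are hypothetical and are not taken from any CART implementation.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Node:
    # Internal nodes carry a feature and a split point; leaves carry a prediction.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None      # branch taken when value <= threshold
    right: Optional["Node"] = None     # branch taken when value > threshold
    prediction: Optional[Union[str, float]] = None

def predict(node, row):
    """Answer one question per internal node until a leaf is reached."""
    while node.prediction is None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.prediction

# Hypothetical hand-built tree with two numeric questions
tree = Node(
    feature="humidity_pct", threshold=75.0,
    left=Node(prediction="Yes"),
    right=Node(
        feature="wind_kmh", threshold=20.0,
        left=Node(prediction="Yes"),
        right=Node(prediction="No"),
    ),
)

print(predict(tree, {"humidity_pct": 60, "wind_kmh": 30}))
print(predict(tree, {"humidity_pct": 90, "wind_kmh": 30}))
```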



> **CHAID algorithm**

* CHAID (Chi-square Automatic Interaction Detector) analysis is an algorithm used for discovering relationships between a categorical response variable and other categorical predictor variables. It is useful when looking for patterns in datasets with many categorical variables and is a convenient way of summarising the data, as the relationships can be easily visualised.

* As indicated in the name, CHAID uses Pearson's Chi-square tests of independence, which test for an association between two categorical variables. A statistically significant result indicates that the two variables are not independent, i.e., there is a relationship between them.
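
To make the test concrete, here is a minimal sketch of Pearson's chi-square statistic computed from a contingency table, without any continuity correction. The function name and the 2x2 table are hypothetical; a full CHAID implementation would also merge categories and adjust the resulting p-values, which this sketch does not attempt.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical table: rows are predictor categories, columns are response classes
observed = [[10, 20],
            [20, 10]]
stat, dof = chi_square_statistic(observed)
print(round(stat, 3), dof)
```

With one degree of freedom the critical value at the 5% level is 3.841, so a statistic above that would be read as evidence that the two variables are associated.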



> **MARS algorithm**
* Multivariate adaptive regression splines or MARS (also called Earth in many open-source implementations because MARS is a trademark) performs a similar function to least-squares regression, but is used when the relationship of one or more predictor variables to the dependent variable is thought to vary over its value range. A simple example would be the increase in air humidity caused by heating water, which would be much more rapid once the boiling point of 100 Celsius had been reached.

* The output of a MARS regression is a set of basis functions whose outputs are added together. The most important of these are hinge functions of the form max(0,x-c), which contribute to the prediction only for values of x greater than c; the value c is referred to as a knot.

* MARS builds its models using a type of stepwise regression where predictor variables with candidate knots are added to the model one by one. However, MARS does not share the weaknesses of general stepwise regression:

1. The set of predictor variables is still determined by the data scientist; MARS guarantees that all predictor variables will make it into the final model and only aims to find out the optimal knots.

2. MARS contains an inbuilt solution to overfitting: following the stepwise regression step, all generated basis functions that do not contribute to the accuracy of the model above a certain threshold are pruned out, meaning that only the more efficient predictors remain in the final version.

* The complexity of MARS means it requires considerably more training data than ordinary least-squares regression, at least an order of magnitude more. However, there are various optional constraints that the user can put on the procedure to simplify it. A MARS model is isomorphic with some types of decision tree used for value prediction, although the way the algorithms generate the models is different.
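
The hinge functions mentioned above are simple to write down. The sketch below evaluates a model made of an intercept plus one hinge term; the knot at 100 echoes the boiling-point example, while the intercept and coefficient are hypothetical values chosen for illustration, not fitted ones.

```python
def hinge(x, knot):
    """Hinge basis function max(0, x - knot): zero up to the knot, linear after it."""
    return max(0.0, x - knot)

def mars_like_prediction(x, intercept=5.0, coefficient=3.0, knot=100.0):
    """Sum of basis functions: a constant term plus one weighted hinge."""
    return intercept + coefficient * hinge(x, knot)

for temperature in (90.0, 100.0, 110.0):
    print(temperature, mars_like_prediction(temperature))
```

Below the knot the prediction stays at the intercept; above it, the response grows linearly with slope equal to the coefficient.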

# Random forest can be considered as an extension to decision tree models. Search and describe this model

* Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

* The fundamental concept behind random forest is a simple but powerful one: the wisdom of crowds. In data science speak, the reason that the random forest model works so well is: **A large number of relatively uncorrelated models (trees) operating as a committee will outperform any of the individual constituent models.**

* The low correlation between models is the key. The reason for this effect is that the trees protect each other from their individual errors (as long as they don't constantly all err in the same direction). While some trees may be wrong, many other trees will be right, so as a group the trees are able to move in the correct direction.

* Steps to Ensure that the Models Diversify Each Other

1. Bagging (Bootstrap Aggregation): Decision trees are very sensitive to the data they are trained on; small changes to the training set can result in significantly different tree structures. Random forest takes advantage of this by allowing each individual tree to randomly sample from the dataset with replacement, resulting in different trees. This process is known as bagging.

2. Feature Randomness: In a normal decision tree, when it is time to split a node, we consider every possible feature and pick the one that produces the most separation between the observations in the left node vs. those in the right node. In contrast, each split in a random forest tree can pick only from a random subset of features. This forces even more variation amongst the trees in the model and ultimately results in lower correlation across trees and more diversification.

* So in our random forest, we end up with trees that are not only trained on different sets of data (thanks to bagging) but also use different features to make decisions.
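
The two sources of randomness and the final vote can be sketched with the standard library alone. This is not a working random forest: no trees are trained here, and the helper names are assumptions. It only shows what a bootstrap sample, a random feature subset, and a majority vote look like; in common implementations the feature subset is redrawn at every split.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) items with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Feature randomness: pick k distinct candidate features."""
    return rng.sample(features, k)

def majority_vote(predictions):
    """Classification output of the ensemble: the most common class."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)              # seeded so the sketch is repeatable
row_ids = list(range(14))           # stand-in for 14 training records
features = ["outlook", "temp", "humidity", "wind"]

for tree_id in range(3):
    sample = bootstrap_sample(row_ids, rng)
    subset = random_feature_subset(features, 2, rng)
    print(tree_id, sorted(sample), subset)

print(majority_vote(["Yes", "No", "Yes"]))
```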

# Provide a brief description of dataset

Data processing


In [1]:
import pandas as pd 
import warnings
warnings.filterwarnings('ignore')

Reading the datafile

In [2]:
import pandas as pd
df = pd.read_csv('play_tennis.csv')
df= df.drop(['day'], axis=1)


Check dimensions of the dataframe in terms of rows and columns

In [3]:
df.shape

(14, 5)

View the head of the data

In [4]:
df.head(5)

    outlook  temp humidity    wind play
0     Sunny   Hot     High    Weak   No
1     Sunny   Hot     High  Strong   No
2  Overcast   Hot     High    Weak  Yes
3      Rain  Mild     High    Weak  Yes
4      Rain  Cool   Normal    Weak  Yes


Check the data types

In [5]:
df.dtypes

outlook     object
temp        object
humidity    object
wind        object
play        object
dtype: object

Check for Duplicates

In [6]:
sum(df.duplicated())

0

Since there are no duplicates, we don't need to drop any rows

Check for null values

In [7]:
df.isna().sum()

outlook     0
temp        0
humidity    0
wind        0
play        0
dtype: int64

There are no null or missing values in the dataset

Understand the variables

**Variable 'outlook'**

In [8]:
df.outlook.unique()

array(['Sunny', 'Overcast', 'Rain'], dtype=object)

This column has 3 unique values in the appropriate datatype

**Variable 'temp'**

In [9]:
df.temp.unique()

array(['Hot', 'Mild', 'Cool'], dtype=object)

This column has 3 unique values in the appropriate datatype

**Variable 'humidity'**

In [10]:
df.humidity.unique()

array(['High', 'Normal'], dtype=object)

This column has 2 unique values in the appropriate datatype

**Variable 'wind'**

In [11]:
df.wind.unique()

array(['Weak', 'Strong'], dtype=object)

This column has 2 unique values in the appropriate datatype

**Variable 'play'**

In [12]:
df.play.unique()

array(['No', 'Yes'], dtype=object)

This column has 2 unique values in the appropriate datatype

Study summary statistics

In [13]:
df.describe()

       outlook  temp humidity  wind play
count       14    14       14    14   14
unique       3     3        2     2    2
top      Sunny  Mild     High  Weak  Yes
freq         5     6        7     8    9


For each column, it shows the number of values, the number of unique values, the most frequent value, and how often that value occurs.

# We want to create a predictive model using decision trees. Which algorithm do you suggest for this purpose? Justify your answer

All columns in this dataset ('outlook', 'temp', 'humidity', 'wind' and 'play') are categorical, each with at most 3 unique values, so ID3 is the most suitable algorithm for building a decision tree model on this dataset.
ID3 generates a decision tree from a dataset by employing a top-down, greedy search that evaluates each remaining attribute at every node of the tree.

# Create the decision tree showing all the intermediate steps. Provide the obtained set of classification rules. Is the model consistent?

Separating the dataset into target attribute and predicting attributes

In [14]:
t = df.keys()[-1]
print('Target Attribute is: ', t)
# Get the attribute names from input dataset
attribute_names = list(df.keys())
#Remove the target attribute from the attribute names list
attribute_names.remove(t) 
print('Predicting Attributes: ', attribute_names)

Target Attribute is:  play
Predicting Attributes:  ['outlook', 'temp', 'humidity', 'wind']


Function to calculate the entropy of collection

In [15]:

import math
def entropy(probs):  
    return sum( [-prob*math.log(prob, 2) for prob in probs])
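
As a worked instance of this function: the summary statistics show that 'Yes' occurs 9 times out of 14, which leaves 5 'No' records, so the entropy of the target over the full dataset is

$$
H(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940
$$

This is the "old entropy" from which the weighted entropy of each candidate split is subtracted when computing information gain.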
    

Function to calculate the entropy of a given data set/list with respect to the target attribute

In [16]:
def entropy_of_list(ls,value):  
    from collections import Counter
    cnt = Counter(x for x in ls)  # Counter gives the number of records per class
    print('Target attribute class count(Yes/No)=',dict(cnt))
    total_instances = len(ls)  
    print("Total no of instances/records associated with {0} is: {1}".format(value,total_instances ))
    probs = [x / total_instances for x in cnt.values()]  # class proportions (Yes/No)
    # Note: min/max are taken over labels and over proportions independently,
    # so a printed label is not guaranteed to match the probability next to it.
    print("Probability of Class {0} is: {1:.4f}".format(min(cnt),min(probs)))
    print("Probability of Class {0} is: {1:.4f}".format(max(cnt),max(probs)))
    return entropy(probs)

Calculate Information Gain:

In [17]:
def information_gain(df, split_attribute, target_attribute,battr):
    print("\n\n-----Information Gain Calculation of ",split_attribute, " --------") 
    df_split = df.groupby(split_attribute) # group the data based on attribute values
    glist=[]
    for gname,group in df_split:
        print('Grouped Attribute Values \n',group)
        glist.append(gname) 
    
    glist.reverse()
    nobs = len(df.index) * 1.0   
    df_agg1=df_split.agg({target_attribute:lambda x:entropy_of_list(x, glist.pop())})
    df_agg2=df_split.agg({target_attribute :lambda x:len(x)/nobs})
    
    df_agg1.columns=['Entropy']
    df_agg2.columns=['Proportion']
    
    new_entropy = sum( df_agg1['Entropy'] * df_agg2['Proportion'])
    if battr !='S':
        old_entropy = entropy_of_list(df[target_attribute],'S-'+df.iloc[0][df.columns.get_loc(battr)])
    else:
        old_entropy = entropy_of_list(df[target_attribute],battr)
    return old_entropy - new_entropy
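
The same calculation can be checked by hand for 'outlook'. The sketch below uses only the Yes/No counts per outlook value, read off the 14 records of the dataset (Sunny 2/3, Overcast 4/0, Rain 3/2); the helper name `entropy_from_counts` is an assumption and is independent of the functions defined in the notebook.

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits from raw class counts; zero counts contribute nothing."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# (Yes, No) counts for each value of 'outlook'
outlook_counts = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}

total = sum(sum(c) for c in outlook_counts.values())          # 14 records
parent_entropy = entropy_from_counts([9, 5])                  # whole dataset
weighted_entropy = sum(
    sum(c) / total * entropy_from_counts(c) for c in outlook_counts.values()
)
gain_outlook = parent_entropy - weighted_entropy
print(round(gain_outlook, 4))  # 0.2467
```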

Implementing the ID3 algorithm for the decision tree

In [18]:
def id3(df, target_attribute, attribute_names, default_class=None,default_attr='S'):
    
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute])
    
    if len(cnt) == 1:
        return next(iter(cnt))
    elif df.empty or (not attribute_names):
        return default_class
    else:  
        default_class = max(cnt, key=cnt.get)  # majority class, not the alphabetically largest label
        gainz=[]
        for attr in attribute_names:
            ig= information_gain(df, attr, target_attribute,default_attr)
            gainz.append(ig)
            print('Information gain: ',attr,' is : ',ig)
        
        index_of_max = gainz.index(max(gainz))              
        best_attr = attribute_names[index_of_max]           
        print("\nMax gain attribute is: ", best_attr)
        tree = {best_attr:{}} 
        remaining_attribute_names =[i for i in attribute_names if i != best_attr]
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,target_attribute, remaining_attribute_names,default_class,best_attr)
            tree[best_attr][attr_val] = subtree
        return tree


Printing the resultant Decision Tree

In [19]:
from pprint import pprint
tree = id3(df,t,attribute_names)
print("Decision tree:")
pprint(tree)



-----Information Gain Calculation of  outlook  --------
Grouped Attribute Values 
      outlook  temp humidity    wind play
2   Overcast   Hot     High    Weak  Yes
6   Overcast  Cool   Normal  Strong  Yes
11  Overcast  Mild     High  Strong  Yes
12  Overcast   Hot   Normal    Weak  Yes
Grouped Attribute Values 
    outlook  temp humidity    wind play
3     Rain  Mild     High    Weak  Yes
4     Rain  Cool   Normal    Weak  Yes
5     Rain  Cool   Normal  Strong   No
9     Rain  Mild   Normal    Weak  Yes
13    Rain  Mild     High  Strong   No
Grouped Attribute Values 
    outlook  temp humidity    wind play
0    Sunny   Hot     High    Weak   No
1    Sunny   Hot     High  Strong   No
7    Sunny  Mild     High    Weak   No
8    Sunny  Cool   Normal    Weak  Yes
10   Sunny  Mild   Normal  Strong  Yes
Target attribute class count(Yes/No)= {'Yes': 4}
Total no of instances/records associated with Overcast is: 4
Probability of Class Yes is: 1.0000
Probability of Class Yes is: 1.0000
Target

In [20]:
def classify(instance, tree,default=None):    
    attribute = next(iter(tree))       
    if instance[attribute] in tree[attribute].keys(): 
        result = tree[attribute][instance[attribute]]
        if isinstance(result, dict):
            return classify(instance, result, default)  # pass the fallback label down the recursion
        else:
            return result 
    else:
        return default

In [21]:
df_new=df.drop('play',axis=1)
df_new['predicted'] = df_new.apply(classify, axis=1, args=(tree,'?'))
df_new['play'] = df['play']
print(df_new)

     outlook  temp humidity    wind predicted play
0      Sunny   Hot     High    Weak        No   No
1      Sunny   Hot     High  Strong        No   No
2   Overcast   Hot     High    Weak       Yes  Yes
3       Rain  Mild     High    Weak       Yes  Yes
4       Rain  Cool   Normal    Weak       Yes  Yes
5       Rain  Cool   Normal  Strong        No   No
6   Overcast  Cool   Normal  Strong       Yes  Yes
7      Sunny  Mild     High    Weak        No   No
8      Sunny  Cool   Normal    Weak       Yes  Yes
9       Rain  Mild   Normal    Weak       Yes  Yes
10     Sunny  Mild   Normal  Strong       Yes  Yes
11  Overcast  Mild     High  Strong       Yes  Yes
12  Overcast   Hot   Normal    Weak       Yes  Yes
13      Rain  Mild     High  Strong        No   No
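
The predicted column matches 'play' on all 14 training records, so the tree is consistent with the training data. The classification rules can be read off by walking every root-to-leaf path. The sketch below does this for a nested-dict tree of the shape returned by `id3`; because the printed tree is truncated in the output above, `example_tree` is an assumption: it is the tree implied by the 14 records (Overcast is pure, Rain is separated by 'wind', Sunny by 'humidity'), and `extract_rules` is a hypothetical helper.

```python
def extract_rules(tree, conditions=()):
    """Flatten a nested-dict decision tree into (conditions, label) pairs."""
    if not isinstance(tree, dict):
        return [(list(conditions), tree)]          # reached a leaf
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules

# Assumed tree, consistent with the prediction table above
example_tree = {
    "outlook": {
        "Overcast": "Yes",
        "Rain": {"wind": {"Strong": "No", "Weak": "Yes"}},
        "Sunny": {"humidity": {"High": "No", "Normal": "Yes"}},
    }
}

rules = extract_rules(example_tree)
for conds, label in rules:
    body = " AND ".join(f"{attr} = {val}" for attr, val in conds)
    print(f"IF {body} THEN play = {label}")
```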


# Using a convenient tool, evaluate the model. Does it generalize well?

Mapping integer values to the output column, 'play'

In [22]:
df['play']=df['play'].map({'No': 0, 'Yes': 1})

Splitting the data into test and train

In [23]:
from sklearn.model_selection import train_test_split


X = df.drop(['play'], axis=1)
Y = df['play']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2)

Using Label Encoder to convert categorical data

In [24]:
from sklearn import preprocessing

categorical = ['outlook', 'temp', 'humidity', 'wind']
for feature in categorical:
        le = preprocessing.LabelEncoder()
        X_train[feature] = le.fit_transform(X_train[feature])
        X_test[feature] = le.transform(X_test[feature])

Applying the decision tree classifier

In [25]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics 
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train, Y_train)
Y_pred = decision_tree.predict(X_test)


Evaluating the model

In [26]:
acc_decision_tree_ans=metrics.accuracy_score(Y_test,Y_pred)
print(acc_decision_tree_ans)

0.6666666666666666


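The model reaches an accuracy of only about 67% (2 of the 3 test instances), so it does not appear to generalise well. With only 14 records the tree has very little data to learn from, and an estimate based on 3 test instances is itself highly unreliable.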
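
A three-instance test set can only produce accuracies of 0, 1/3, 2/3 or 1, so it helps to compare the score against a trivial baseline. The sketch below estimates, by leave-one-out, how well a classifier that always predicts the majority class would do on the 9 'Yes' / 5 'No' labels of this dataset. The helper names are assumptions, and the baseline ignores the features entirely.

```python
from collections import Counter

def majority_class(train_labels):
    """Trivial model: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

def leave_one_out_accuracy(labels, fit_predict):
    """Hold out each record once, fit on the rest, and score the prediction."""
    correct = 0
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        if fit_predict(train) == held_out:
            correct += 1
    return correct / len(labels)

labels = ["Yes"] * 9 + ["No"] * 5      # class counts from the dataset
baseline = leave_one_out_accuracy(labels, majority_class)
print(round(baseline, 3))  # 0.643
```

An accuracy of 0.667 on three instances is therefore barely distinguishable from always answering 'Yes'.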

# Now consider Temperature as a continuous column. Calculate the information gain for this attribute. (Show all the steps)

Reading the datafile

In [27]:
data = pd.read_csv("play.csv")
data = data.drop(['day'], axis=1)

Check dimensions of the dataframe in terms of rows and columns

In [28]:
data.shape

(14, 5)

View the head of the data

In [29]:
data.head(5)

    outlook  temp humidity    wind play
0     Sunny  17.0     High    Weak   No
1     Sunny  20.5     High  Strong   No
2  Overcast  26.5     High    Weak  Yes
3      Rain  20.0     High    Weak  Yes
4      Rain  19.0   Normal    Weak  Yes


In [30]:
data.describe()

            temp
count       14.0
mean   21.857143
std     3.370851
min         17.0
25%       19.625
50%         21.0
75%        24.75
max         27.5


Since the 'temp' variable has continuous values, we can apply the process of binning and convert it into categorical values


Separating the dataset into target attribute and predicting attribute

In [31]:
tar = data.keys()[-1]
print('Target Attribute is: ', tar)
# Get the attribute names from input dataset
att = data.keys()[1]
print('Predicting Attributes: ', att)

Target Attribute is:  play
Predicting Attributes:  temp


Calculate Information Gain:

In [32]:
def informationgain(data, split_attribute, target_attribute,battr):
    print("\n\n-----Information Gain Calculation of ",split_attribute, " --------") 
    data_split = data.groupby(split_attribute) # group the data based on attribute values
    glist=[]
    for gname,group in data_split:
        print('Grouped Attribute Values \n',group)
        glist.append(gname) 
    
    glist.reverse()
    nobs = len(data.index) * 1.0   
    data_agg1=data_split.agg({target_attribute:lambda x:entropy_of_list(x, glist.pop())})
    data_agg2=data_split.agg({target_attribute :lambda x:len(x)/nobs})
    
    data_agg1.columns=['Entropy']
    data_agg2.columns=['Proportion']
    
    new_entropy = sum( data_agg1['Entropy'] * data_agg2['Proportion'])
    if battr !='S':
        old_entropy = entropy_of_list(data[target_attribute],'S-'+data.iloc[0][data.columns.get_loc(battr)])
    else:
        old_entropy = entropy_of_list(data[target_attribute],battr)
    return old_entropy - new_entropy

Print Information Gain

In [33]:
from pprint import pprint
ig= informationgain(data, att, tar,'S')
print("\nThe Resultant Information Gain is:")
pprint(ig)



-----Information Gain Calculation of  temp  --------
Grouped Attribute Values 
   outlook  temp humidity  wind play
0   Sunny  17.0     High  Weak   No
Grouped Attribute Values 
   outlook  temp humidity    wind play
5    Rain  17.5   Normal  Strong   No
Grouped Attribute Values 
   outlook  temp humidity  wind play
4    Rain  19.0   Normal  Weak  Yes
Grouped Attribute Values 
   outlook  temp humidity  wind play
8   Sunny  19.5   Normal  Weak  Yes
Grouped Attribute Values 
   outlook  temp humidity  wind play
3    Rain  20.0     High  Weak  Yes
Grouped Attribute Values 
   outlook  temp humidity    wind play
1   Sunny  20.5     High  Strong   No
Grouped Attribute Values 
   outlook  temp humidity  wind play
7   Sunny  21.0     High  Weak   No
9    Rain  21.0   Normal  Weak  Yes
Grouped Attribute Values 
      outlook  temp humidity    wind play
10     Sunny  22.5   Normal  Strong  Yes
11  Overcast  22.5     High  Strong  Yes
Grouped Attribute Values 
      outlook  temp humidity  wi

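The resultant information gain is about 0.79 bits (the entropy uses base-2 logarithms); it is not a percentage. Because the data is grouped by every distinct temperature value, most groups contain a single record and are trivially pure, which inflates the gain.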
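
An alternative way to treat a continuous attribute, used by C4.5-style algorithms, is to evaluate binary splits of the form "value <= threshold" at the midpoints between consecutive distinct values and keep the threshold with the highest gain. The sketch below illustrates that approach; it is not the method used in the cell above, and the function names and the four-row toy data are hypothetical.

```python
import math
from collections import Counter

def entropy_of_labels(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary split on a numeric attribute."""
    pairs = list(zip(values, labels))
    parent = entropy_of_labels(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2               # midpoint candidate
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        remainder = (len(left) * entropy_of_labels(left)
                     + len(right) * entropy_of_labels(right)) / len(pairs)
        gain = parent - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Toy data (hypothetical, not the notebook's temperatures)
toy_temps = [17.0, 18.0, 20.0, 23.0]
toy_play = ["No", "No", "Yes", "Yes"]
print(best_threshold(toy_temps, toy_play))
```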

# Using an appropriate tool provide the decision tree in this case and all the possible interpretations that can be made

In [34]:
data.head()
data['temp']=data['temp'].astype("int64")
data.dtypes

outlook     object
temp         int64
humidity    object
wind        object
play        object
dtype: object

Using the method of binning to convert continuous data into categorical data

In [35]:

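# Bin the integer temperatures: 1 = below 22, 2 = 22 to 25, 3 = above 25
data.loc[data['temp']<22,'temp']=1
data.loc[ (data['temp']>=22) & (data['temp']<=25) ,'temp']=2
data.loc[data['temp']>25,'temp']=3  # strictly above 25, so the bins do not overlap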
data.dtypes

data['temp']=data['temp'].astype(str)

In [36]:
weather_dict={"1":"cold","2":"mild" , "3":"hot"}

In [37]:
ans_val=[]
for i in data['temp']:
  if i in weather_dict:
    ans_val.append(weather_dict[i])
data['temp']=ans_val
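
The same three bins can be expressed as one function, which avoids overwriting the column in stages (where a value written by one step could in principle be matched by a later condition). This is a sketch with the same cut-offs as above; the function name `temperature_bin` is an assumption.

```python
def temperature_bin(value):
    """Map a temperature to 'cold' (< 22), 'mild' (22 to 25) or 'hot' (> 25)."""
    if value < 22:
        return "cold"
    if value <= 25:
        return "mild"
    return "hot"

for temp in (17, 22, 25, 26):
    print(temp, temperature_bin(temp))
```

Applied to the original numeric column, for example with `data['temp'].apply(temperature_bin)`, this would replace the staged assignment and the dictionary lookup in one step.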




Implementing the ID3 algorithm for the decision tree

In [38]:
def id3(df, target_attribute, attribute_names, default_class=None,default_attr='S'):
    
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute])
    
    
    if len(cnt) == 1:
        return next(iter(cnt))  
    
    elif df.empty or (not attribute_names):
        return default_class  
    else:
        
        default_class = max(cnt, key=cnt.get)  # majority class, not the alphabetically largest label
        gainz=[]
        for attr in attribute_names:
            ig= information_gain(df, attr, target_attribute,default_attr)
            gainz.append(ig)
            print('Information gain :',attr,' is : ',ig)
        
        index_of_max = gainz.index(max(gainz))             
        best_attr = attribute_names[index_of_max]          
        print("\nMax gain attribute is: ", best_attr)
       
        tree = {best_attr:{}} 
        remaining_attribute_names =[i for i in attribute_names if i != best_attr]
  
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,target_attribute, remaining_attribute_names,default_class,best_attr)
            tree[best_attr][attr_val] = subtree
        return tree

Printing the resultant Decision Tree

In [39]:
from pprint import pprint
tree = id3(data,t,attribute_names)
print("\nThe Resultant Decision Tree is:")
pprint(tree)



-----Information Gain Calculation of  outlook  --------
Grouped Attribute Values 
      outlook  temp humidity    wind play
2   Overcast   hot     High    Weak  Yes
6   Overcast   hot   Normal  Strong  Yes
11  Overcast  mild     High  Strong  Yes
12  Overcast  mild   Normal    Weak  Yes
Grouped Attribute Values 
    outlook  temp humidity    wind play
3     Rain  cold     High    Weak  Yes
4     Rain  cold   Normal    Weak  Yes
5     Rain  cold   Normal  Strong   No
9     Rain  cold   Normal    Weak  Yes
13    Rain   hot     High  Strong   No
Grouped Attribute Values 
    outlook  temp humidity    wind play
0    Sunny  cold     High    Weak   No
1    Sunny  cold     High  Strong   No
7    Sunny  cold     High    Weak   No
8    Sunny  cold   Normal    Weak  Yes
10   Sunny  mild   Normal  Strong  Yes
Target attribute class count(Yes/No)= {'Yes': 4}
Total no of instances/records associated with Overcast is: 4
Probability of Class Yes is: 1.0000
Probability of Class Yes is: 1.0000
Target

# Apply Random Forest to create predictive models for the two previous datasets

Reading the first dataset

In [40]:
df= pd.read_csv("play_tennis.csv")
df.head()

  day   outlook  temp humidity    wind play
0  D1     Sunny   Hot     High    Weak   No
1  D2     Sunny   Hot     High  Strong   No
2  D3  Overcast   Hot     High    Weak  Yes
3  D4      Rain  Mild     High    Weak  Yes
4  D5      Rain  Cool   Normal    Weak  Yes


Mapping integer values to the output column, 'play'


In [41]:
df['play']=df['play'].map({'No': 0, 'Yes': 1})
df= df.drop(['day'],axis=1)


Splitting the data into test and train


In [42]:
X = df.drop(['play'], axis=1)
Y = df['play']

In [43]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2)

Using Label Encoder to convert categorical data

In [44]:
from sklearn import preprocessing

categorical = ['outlook', 'temp', 'humidity', 'wind']
for feature in categorical:
        le = preprocessing.LabelEncoder()
        X_train[feature] = le.fit_transform(X_train[feature])
        X_test[feature] = le.transform(X_test[feature])

Applying the random forest classifier

In [45]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)

Y_prediction = random_forest.predict(X_test)

random_forest.score(X_train, Y_train)
acc_random_forest = metrics.accuracy_score(Y_test,Y_prediction)*100


Reading the second dataset

In [46]:
df1=pd.read_csv("play.csv")

Mapping integer values to the output column, 'play'


In [47]:
df1['play']=df1['play'].map({'No': 0, 'Yes': 1})
df1= df1.drop(['day'],axis=1)

Splitting the data into test and train

In [48]:
X = df1.drop(['play'], axis=1)
Y = df1['play']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2)

Using Label Encoder to convert categorical data

In [49]:
from sklearn import preprocessing

categorical = ['outlook', 'humidity', 'wind']
for feature in categorical:
        le = preprocessing.LabelEncoder()
        X_train[feature] = le.fit_transform(X_train[feature])
        X_test[feature] = le.transform(X_test[feature])

Applying the random forest classifier

In [50]:
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, Y_train)

Y_prediction = random_forest.predict(X_test)

random_forest.score(X_train, Y_train)
acc_random_forest = metrics.accuracy_score(Y_test,Y_prediction)*100  # score this model's predictions, not the earlier Y_pred
