## Random Forest
#### Each tree is trained on a random sample of the training examples (drawn with replacement by default) AND considers only a random subset of the features when choosing each split
#### All of the hyperparameters found in the Decision Tree model also exist in this algorithm, since a Random Forest is an ensemble of many Decision Trees
#### One additional hyperparameter for Random Forest is n_estimators, the number of Decision Trees that make up the Random Forest
#### If N is the number of features, RandomForestClassifier by default considers sqrt(N) randomly selected features at each split (this can be changed with the max_features parameter)
#### You can also speed up training with another parameter, n_jobs. Since each tree is fitted independently of the others, several trees can be fitted in parallel. Setting n_jobs higher increases the number of CPU cores used (n_jobs = -1 uses all of them). Changing this parameter does not affect the final result, but it can reduce the training time.
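
As a rough illustration of the points above, the sketch below fits two forests that differ only in n_jobs and checks that they grow trees of the same size. The synthetic dataset from make_classification and the names X_demo, y_demo, common, serial_forest and parallel_forest are assumptions made for this example; they are not part of the notebook's data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption, for illustration only): 300 examples, 16 features
X_demo, y_demo = make_classification(n_samples=300, n_features=16, random_state=0)

# Settings shared by both forests; only n_jobs differs below
common = dict(n_estimators=20, max_features="sqrt", random_state=0)

serial_forest = RandomForestClassifier(n_jobs=1, **common).fit(X_demo, y_demo)
parallel_forest = RandomForestClassifier(n_jobs=2, **common).fit(X_demo, y_demo)

# n_estimators controls how many trees are stored in estimators_
print(len(serial_forest.estimators_))

# With 16 features and max_features="sqrt", each split considers sqrt(16) = 4 features
print(serial_forest.estimators_[0].max_features_)

# With a fixed random_state, the trees have the same sizes regardless of n_jobs
serial_sizes = [tree.tree_.node_count for tree in serial_forest.estimators_]
parallel_sizes = [tree.tree_.node_count for tree in parallel_forest.estimators_]
print(serial_sizes == parallel_sizes)
```

The comparison relies on random_state being fixed; without it, two fits would generally differ whatever the value of n_jobs.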


In [None]:
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import accuracy_score

# The random forest uses many trees, and it makes a prediction by combining the predictions of its component trees
# (scikit-learn's classifier averages the class probabilities predicted by each tree)
# It generally has much better predictive accuracy than a single decision tree
# One of its best features is that it generally works reasonably well with default parameters, even without any tuning

forest_model = random_forest_model = RandomForestClassifier(n_estimators = 100,
                                             max_depth = 16, 
                                             min_samples_split = 10).fit(X_train,y_train)

# Same as we did for the Decision Tree, try several values to figure out the best hyperparameters to use
min_samples_split_list = [2, 10, 30, 50, 100, 200, 300, 700]  # If the value is an integer, it is the actual number of samples
                                                              # If it is a float, it is a fraction of the dataset
max_depth_list = [2, 4, 8, 16, 32, 64, None]
n_estimators_list = [10,50,100,500]
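
To make the integer versus float note on min_samples_split concrete, here is a minimal sketch on synthetic data. It assumes, as the scikit-learn documentation describes, that a float is converted to ceil(fraction * n_samples). The dataset and the names X_demo, y_demo, fraction, as_count, forest_float and forest_int are hypothetical and used only for this example.

```python
from math import ceil

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption, for illustration only): 400 examples
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)

fraction = 0.25
as_count = ceil(fraction * len(X_demo))  # 0.25 * 400 = 100 samples
print(as_count)

# Same forest twice: once with the fraction, once with the equivalent count
forest_float = RandomForestClassifier(
    n_estimators=10, min_samples_split=fraction, random_state=0
).fit(X_demo, y_demo)
forest_int = RandomForestClassifier(
    n_estimators=10, min_samples_split=as_count, random_state=0
).fit(X_demo, y_demo)

# Both settings should produce trees of the same size
sizes_float = [tree.tree_.node_count for tree in forest_float.estimators_]
sizes_int = [tree.tree_.node_count for tree in forest_int.estimators_]
print(sizes_float == sizes_int)
```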

In [None]:
# min_samples_split hyperparameter
accuracy_list_train = []
accuracy_list_val = []
for min_samples_split in min_samples_split_list:
    # You can fit the model at the same time you define it, because the fit function returns the fitted estimator.
    model = RandomForestClassifier(min_samples_split = min_samples_split,
                                   random_state = RANDOM_STATE).fit(X_train,y_train) 
    predictions_train = model.predict(X_train)  # The predicted values for the train dataset
    predictions_val = model.predict(X_val)  # The predicted values for the validation dataset
    accuracy_train = accuracy_score(y_train, predictions_train)  # accuracy_score expects (y_true, y_pred)
    accuracy_val = accuracy_score(y_val, predictions_val)
    accuracy_list_train.append(accuracy_train)
    accuracy_list_val.append(accuracy_val)

In [None]:
    
# max_depth hyperparameter
accuracy_list_train = []
accuracy_list_val = []
for max_depth in max_depth_list:
    # You can fit the model at the same time you define it, because the fit function returns the fitted estimator.
    model = RandomForestClassifier(max_depth = max_depth,
                                   random_state = RANDOM_STATE).fit(X_train,y_train) 
    predictions_train = model.predict(X_train)  # The predicted values for the train dataset
    predictions_val = model.predict(X_val)  # The predicted values for the validation dataset
    accuracy_train = accuracy_score(y_train, predictions_train)  # accuracy_score expects (y_true, y_pred)
    accuracy_val = accuracy_score(y_val, predictions_val)
    accuracy_list_train.append(accuracy_train)
    accuracy_list_val.append(accuracy_val)

In [None]:
# n_estimators hyperparameter
accuracy_list_train = []
accuracy_list_val = []
for n_estimators in n_estimators_list:
    # You can fit the model at the same time you define it, because the fit function returns the fitted estimator.
    model = RandomForestClassifier(n_estimators = n_estimators,
                                   random_state = RANDOM_STATE).fit(X_train,y_train) 
    predictions_train = model.predict(X_train)  # The predicted values for the train dataset
    predictions_val = model.predict(X_val)  # The predicted values for the validation dataset
    accuracy_train = accuracy_score(y_train, predictions_train)  # accuracy_score expects (y_true, y_pred)
    accuracy_val = accuracy_score(y_val, predictions_val)
    accuracy_list_train.append(accuracy_train)
    accuracy_list_val.append(accuracy_val)
    
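# Evaluate the model fitted earlier (n_estimators = 100, max_depth = 16, min_samples_split = 10)
# Printing both values ensures the train accuracy is shown too, not only the last expression of the cell
print(f"Train accuracy: {accuracy_score(y_train, random_forest_model.predict(X_train)):.4f}")
print(f"Validation accuracy: {accuracy_score(y_val, random_forest_model.predict(X_val)):.4f}")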
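
The three loops above share the same shape, so one possible way to condense them is a small helper that sweeps a single hyperparameter and then picks the value with the highest validation accuracy. This is only a sketch: sweep_hyperparameter is a hypothetical helper, not a scikit-learn function, and the synthetic data and the names X_tr, X_va, y_tr, y_va and depths are assumptions standing in for the notebook's own train and validation sets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def sweep_hyperparameter(name, values, X_tr, y_tr, X_va, y_va, random_state=0):
    """Fit one forest per candidate value; return (train, validation) accuracy lists."""
    train_scores, val_scores = [], []
    for value in values:
        # Small default forest; the swept hyperparameter overrides its entry if present
        params = {"n_estimators": 25, "random_state": random_state}
        params[name] = value
        model = RandomForestClassifier(**params).fit(X_tr, y_tr)
        train_scores.append(accuracy_score(y_tr, model.predict(X_tr)))
        val_scores.append(accuracy_score(y_va, model.predict(X_va)))
    return train_scores, val_scores


# Synthetic data (an assumption, for illustration only)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

depths = [2, 4, 8, None]
train_acc, val_acc = sweep_hyperparameter("max_depth", depths, X_tr, y_tr, X_va, y_va)

# Pick the candidate with the highest validation accuracy (first one wins on ties)
best_depth = depths[int(np.argmax(val_acc))]
print(best_depth, max(val_acc))
```

Choosing by validation accuracy rather than train accuracy matters here, because deeper trees can keep improving on the training set while the validation score levels off or drops.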