## **Random Forest Classifier**

Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks. They work by constructing many decision trees at training time and outputting the class that is the mode of the individual trees' predictions (classification) or the mean of their predictions (regression). Random decision forests correct for decision trees' tendency to overfit their training set.
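
The aggregation step described above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the five per-tree predictions are hypothetical, and no actual trees are trained.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_votes):
    """Return the most common class label among the trees' predictions."""
    return Counter(tree_votes).most_common(1)[0][0]

def aggregate_regression(tree_outputs):
    """Return the average of the trees' numeric predictions."""
    return mean(tree_outputs)

# Hypothetical predictions from five trees for a single sample.
class_votes = [1, 0, 1, 1, 0]
value_outputs = [2.0, 3.0, 4.0, 3.0, 3.0]

print(aggregate_classification(class_votes))  # 1
print(aggregate_regression(value_outputs))    # 3.0
```

Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than counting hard votes, so the sketch shows the textbook rule, not the library's exact behaviour.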


In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_predict, KFold
from sklearn import datasets

## **Dataset Loading**

We load the credit data from a CSV file into a pandas DataFrame.

In [2]:
credit_data = pd.read_csv("./credit_data.csv")

## **Features Extraction**

To apply a classifier to this data, we need to extract the features and the target, then split them into training and test sets.

In [3]:
features = credit_data[["income","age","loan"]]
targets = credit_data.default

feature_train, feature_test, target_train, target_test = train_test_split(features,targets,test_size=0.2)
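
To make the idea of a hold-out split concrete, here is a minimal pure-Python sketch of a shuffled 80/20 split. The helper `holdout_split` and the toy rows are hypothetical, and the rounding rule is a simplification, so it may not match `train_test_split` exactly.

```python
import random

def holdout_split(rows, test_size=0.2, seed=0):
    """Shuffle row indices and reserve a `test_size` fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # a seeded shuffle makes the split repeatable
    n_test = int(round(len(rows) * test_size))
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(10))  # toy stand-in for the rows of a dataset
train, test = holdout_split(rows)
print(len(train), len(test))  # 8 2
```

In the same spirit, passing `random_state` to `train_test_split` makes the split reproducible between runs.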

## **Finding an Optimal Value**

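The grid search below looks for the best combination of `max_depth`, `n_estimators`, and `min_samples_leaf`. It may take up to 45 minutes to run on Colab, or possibly hours on a local desktop, depending on processing power.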

In [4]:
param_grid = {
    'max_depth': [1,5,10,15],
    'n_estimators' : [10,100,500,1000],
    'min_samples_leaf' : [1,2,3,4,5,10,15,20,30,40,50]
    }

grid_search = GridSearchCV(estimator=RandomForestClassifier(n_jobs=-1,max_features = 'sqrt'), param_grid=param_grid,cv=10)
grid_search.fit(feature_train, target_train)

print(grid_search.best_params_)

optimal_estimators = grid_search.best_params_.get("n_estimators")
optimal_depth = grid_search.best_params_.get("max_depth")
optimal_leaf = grid_search.best_params_.get("min_samples_leaf")

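
The long runtime follows from the size of the grid. As a rough back-of-the-envelope check, the sketch below counts the candidate combinations and the model fits implied by `param_grid` with `cv=10`. It uses only the standard library, and the variable names other than `param_grid` are introduced here for illustration.

```python
from itertools import product

param_grid = {
    "max_depth": [1, 5, 10, 15],
    "n_estimators": [10, 100, 500, 1000],
    "min_samples_leaf": [1, 2, 3, 4, 5, 10, 15, 20, 30, 40, 50],
}
cv_folds = 10

# Every combination of the three hyperparameters is one candidate.
combinations = list(product(*param_grid.values()))
n_candidates = len(combinations)

# Each candidate is fitted once per cross-validation fold.
n_fits = n_candidates * cv_folds

# Each fit grows n_estimators trees (second element of each combination).
n_trees = sum(n_estimators for _, n_estimators, _ in combinations) * cv_folds

print(n_candidates)  # 176
print(n_fits)        # 1760
print(n_trees)       # 708400
```

With the default `refit=True`, `GridSearchCV` adds one more fit of the best candidate on the full training split. Trimming the grid, or lowering `cv`, reduces the cost proportionally.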

## **Training the Model**

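We build a Random Forest classifier with the optimal hyperparameters found by the grid search and obtain 10-fold cross-validated predictions. Note that `cross_val_predict` is applied to the test split here, so each prediction comes from a model fitted on the other nine folds of the test data.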

In [None]:
best_model = RandomForestClassifier(n_estimators=optimal_estimators, max_depth = optimal_depth,min_samples_leaf = optimal_leaf)
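# random_state only takes effect when shuffle=True; recent scikit-learn versions raise an error otherwise
k_fold = KFold(n_splits=10, shuffle=True, random_state=123)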

predictions = cross_val_predict(best_model,feature_test,target_test,cv=k_fold)
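
To illustrate why `cross_val_predict` returns exactly one prediction per sample, the sketch below generates 10-fold index splits in plain Python. The helper `k_fold_indices` is hypothetical and ignores shuffling. The sample count of 400 is the total of the confusion matrix printed in this notebook.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=400, n_splits=10))
print(len(folds))                          # 10
print(len(folds[0][0]), len(folds[0][1]))  # 360 40

# Every sample lands in exactly one held-out fold.
held_out = sorted(i for _, test_idx in folds for i in test_idx)
print(held_out == list(range(400)))        # True
```

Each model is therefore fitted on 360 samples and predicts the remaining 40, and the ten held-out folds together cover all 400 samples once.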



## **Printing an Error Matrix and Accuracy Score**

In [None]:
print(confusion_matrix(target_test,predictions))
print(accuracy_score(target_test,predictions))

[[336   6]
 [ 13  45]]
0.9525
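
The accuracy can be reproduced by hand from the matrix above, along with precision and recall. The sketch relies on scikit-learn's convention that rows are true labels and columns are predicted labels, and it assumes class `1` (the second row and column) means "default".

```python
# Confusion matrix copied from the output above.
matrix = [[336, 6],
          [13, 45]]

tn, fp = matrix[0]  # true class 0: predicted 0, predicted 1
fn, tp = matrix[1]  # true class 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)  # share of predicted defaults that were real defaults
recall = tp / (tp + fn)     # share of real defaults that were detected

print(total)                # 400
print(accuracy)             # 0.9525
print(round(precision, 4))  # 0.8824
print(round(recall, 4))     # 0.7759
```

Because only 58 of the 400 samples belong to class `1`, accuracy alone looks better than the recall suggests: roughly one in four members of that class is missed.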
