# **What is Random Forest?**

A forest is said to be robust when it contains many trees. Random Forest is a tree-based ensemble technique: it fits a number of decision trees on different subsamples of the data and then combines their predictions (a majority vote for classification, an average for regression) to improve the performance of the model.

Suppose we are planning a vacation. Before settling on a destination, everyone votes for the place they want to go, and the place with the most votes wins. We then choose a hotel the same way and come back with a final choice. Collecting many independent opinions and going with the majority is, in essence, how a Random Forest works: each tree casts a vote and the forest returns the most popular answer.

The algorithm is popular because it often gives high accuracy and, by combining many trees, reduces the overfitting that a single decision tree is prone to. It has several hyperparameters, such as the number of trees, the depth of the trees, and the number of parallel jobs. See the scikit-learn documentation for the full list.
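
The idea of fitting trees on different subsamples and letting them vote can be sketched by hand. The snippet below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` (not the dataset used later in this notebook), the names `X_demo`, `y_demo`, `trees`, and `majority_vote` are made up for the example, and it assumes NumPy and scikit-learn are already installed. Note that scikit-learn's own `RandomForestClassifier` averages the trees' predicted probabilities rather than counting hard votes, so this is a simplified picture, not the library's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (an assumption, used only for illustration)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd number, so a binary majority vote can never tie
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features="sqrt" makes every split consider a random subset of features
    tree = DecisionTreeClassifier(
        max_features="sqrt",
        random_state=int(rng.integers(0, 1_000_000)),
    )
    tree.fit(X_demo[idx], y_demo[idx])
    trees.append(tree)

# Every tree predicts a class; the majority class is the ensemble prediction
votes = np.stack([t.predict(X_demo) for t in trees])  # shape: (n_trees, n_samples)
majority_vote = (votes.mean(axis=0) >= 0.5).astype(int)
print(majority_vote[:10])
```

Each tree sees a slightly different version of the data, so individual trees make different mistakes, and the vote tends to cancel them out.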

In [None]:
!python -m pip install pip --upgrade --user -q
# The PyPI package is named scikit-learn; the old "sklearn" name is deprecated
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels scikit-image --user -q

In [None]:
import IPython
# Restart the kernel so that the newly installed packages are picked up
IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
from sklearn.ensemble import RandomForestClassifier

rfcl = RandomForestClassifier()
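
The classifier above uses default settings. As a rough sketch of how the hyperparameters mentioned earlier (number of trees, depth of trees, jobs) map onto constructor arguments, the example below sets them explicitly. The specific values and the name `rfcl_tuned` are arbitrary choices for illustration, not recommendations from this notebook.

```python
from sklearn.ensemble import RandomForestClassifier

# Illustrative values only; sensible settings depend on the dataset
rfcl_tuned = RandomForestClassifier(
    n_estimators=200,  # number of trees in the forest
    max_depth=5,       # maximum depth of each tree (None lets trees grow fully)
    n_jobs=-1,         # number of parallel jobs; -1 uses all available cores
    random_state=42,   # fixes the randomness so results are reproducible
)

# get_params() returns every hyperparameter and its current value
print(rfcl_tuned.get_params()["n_estimators"])
```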

# **Build a Random Forest Classification Model**

## **Import the required Libraries**

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report


## **Read the data**

In [None]:
data = pd.read_csv('pimadiabetes.csv')

Let us look at the first few rows of the data and check its shape.

In [None]:
data.head()

In [None]:
data.shape

## **Define Dependent and Independent Variables**


Now we will define the independent features X and the dependent variable y. We will then divide the dataset into training and testing sets, holding out 33% of the rows for testing.

In [None]:
X = data.drop('Outcome', axis=1)  # independent features
y = data['Outcome']               # dependent (target) variable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape)
print(X_test.shape)

There are 514 rows in the training set and 254 rows in the testing set. Now we will fit the Random Forest model on the training data using default parameters, and then compute predictions on the testing data.
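
The 514/254 split follows from `test_size=0.33` applied to 768 rows: the test set gets the 253.44 rows rounded up to 254, and the training set gets the rest. The sketch below reproduces those sizes on placeholder arrays of the same length (the arrays `X_demo` and `y_demo` are stand-ins, not the real data). It also passes `random_state` and `stratify`, which the split above does not use; they are optional additions that make the split repeatable and keep the class proportions similar in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 768 rows and 8 columns (stand-ins for the real features)
X_demo = np.zeros((768, 8))
y_demo = np.arange(768) % 2  # alternating 0/1 labels, purely illustrative

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.33,
    random_state=42,   # same split on every run
    stratify=y_demo,   # preserve the class balance in both subsets
)
print(X_tr.shape, X_te.shape)
```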

## **Fit the Random Forest Classifier**

In [None]:
rfcl.fit(X_train, y_train)

## **Predict the Y values on Test Data**

In [None]:
y_rfcl = rfcl.predict(X_test)

## **Evaluation of the Model**

We have stored the predictions on the testing data in y_rfcl. Now we will evaluate the model to check how well it generalizes. We will make use of evaluation metrics from sklearn, such as the accuracy score and the classification report.

In [None]:
# sklearn metrics expect the true labels first, then the predictions
print("Random Forest Accuracy: ", accuracy_score(y_test, y_rfcl))
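
Accuracy alone hides how the errors are distributed across classes. The sketch below shows how `classification_report` (imported earlier) and `confusion_matrix` could be used for a per-class view. To stay self-contained it trains on synthetic data, and the names `X_demo`, `y_demo`, `model`, and `y_pred` are assumptions made for the example; in this notebook the equivalent call would take `y_test` and `y_rfcl`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the example runs on its own
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# Both functions take the true labels first, then the predicted labels
print(confusion_matrix(y_te, y_pred))       # rows: true class, columns: predicted class
print(classification_report(y_te, y_pred))  # precision, recall, F1 and support per class
```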

We implemented a classification model for the Pima Indian Diabetes data set using the Random Forest algorithm. We did not even normalize the data, which tree-based models do not require, and fed it directly to the model, yet we were able to get about 80% accuracy. Because the train/test split is random, the exact figure will vary from run to run. With more work on the data and feature engineering, this accuracy may be improved further.
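
Since a single random split gives a score that moves from run to run, k-fold cross-validation is one common way to get a steadier estimate. The following is a minimal sketch on synthetic data, not part of the original workflow; `X_demo`, `y_demo`, and `scores` are names invented for the example, and with the notebook's data one would pass `X` and `y` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data so the example runs on its own
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)

# 5-fold cross-validation: train on 4 folds, score on the held-out fold, repeat 5 times
scores = cross_val_score(
    RandomForestClassifier(random_state=0),
    X_demo,
    y_demo,
    cv=5,
    scoring="accuracy",
)
print("Mean accuracy:", scores.mean())
print("Std deviation:", scores.std())
```

The spread of the five scores gives a sense of how much a single reported accuracy should be trusted.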

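Random Forest is easy to implement and, by combining many trees, helps prevent the model from overfitting. It is also often described as robust to missing values in the dataset, although whether missing values are handled natively depends on the implementation and its version, so the data may still need imputation first.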

# **Related Articles**

> * [Random Forest V/s XG Boost](https://analyticsindiamag.com/random-forest-vs-xgboost-comparing-tree-based-algorithms-with-codes/)
> * [Basics of Ensemble Learning](https://analyticsindiamag.com/basics-of-ensemble-learning-in-classification-techniques-explained/) 
> * [Bagging V/S Boosting](https://analyticsindiamag.com/guide-to-ensemble-methods-bagging-vs-boosting/)
> * [Guide to Ensemble Learning](https://analyticsindiamag.com/a-hands-on-guide-to-hybrid-ensemble-learning-models-with-python-code/)
> * [Ensemble Methods](https://analyticsindiamag.com/primer-ensemble-learning-bagging-boosting/)