# Random Forest


Random Forest is a robust machine learning algorithm that can be used for a variety of tasks, including regression and classification. It is an ensemble method, meaning that a random forest model is made up of a large number of decision trees, called estimators, each of which produces its own prediction. The model combines these predictions, by majority vote for classification or by averaging for regression, to produce a prediction that is usually more accurate than that of any single tree.


## How Random Forest works
> 1. Draw N random samples (bootstrap samples, drawn with replacement) from the dataset.
> 2. Build a decision tree on each sample and get a prediction from each tree.
> 3. Count the votes: each tree's prediction counts as one vote.
> 4. The prediction with the majority of votes wins.
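
The four steps above can be sketched by hand with plain decision trees. This is only an illustrative sketch, not how the notebook below trains its model: the number of trees, the seed, and the helper name `majority_vote` are assumptions chosen for the example. Note also that scikit-learn's `RandomForestClassifier` averages the trees' class probabilities instead of counting hard votes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

n_trees = 25  # illustrative value, not a recommendation
trees = []
for _ in range(n_trees):
    # Step 1: draw a bootstrap sample (same size as the data, with replacement).
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: fit one decision tree on that sample. max_features="sqrt"
    # makes each split consider only a random subset of the features.
    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    trees.append(tree.fit(X[idx], y[idx]))

# Step 3: collect every tree's prediction for each sample.
all_preds = np.stack([tree.predict(X) for tree in trees])  # (n_trees, n_samples)


def majority_vote(votes):
    """Return the class index that received the most votes (hypothetical helper)."""
    return np.bincount(votes, minlength=3).argmax()


# Step 4: the class with the most votes wins.
forest_pred = np.apply_along_axis(majority_vote, 0, all_preds)
print("Agreement with the training labels:", (forest_pred == y).mean())
```

The agreement is measured on the same data the trees were trained on, so it is an optimistic figure; the notebook below evaluates on a held-out test set instead.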

### Iris Dataset
Iris is a genus of flowering plants. The Iris dataset contains three species: setosa, versicolor, and virginica.

#### Problem:
>We are given some features of a flower and, based on these features, must identify which category (species) the flower belongs to.

#### Solution:
>This type of problem is a classification problem. We can solve it with a supervised machine learning classification algorithm.

In [39]:
# Importing required libraries
import numpy as np
import pandas as pd
import sklearn
import sklearn.metrics as metrics
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

In [40]:
# Loading the dataset
iris_data = load_iris()
iris = pd.DataFrame(iris_data.data)

# Shape of the dataset
print("Dataset Shape: ", iris.shape)

# First five samples
print("Dataset: ", iris.head())

Dataset Shape:  (150, 4)
Dataset:       0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2


In [41]:
 # printing categories (setosa, versicolor,virginica)

print(iris_data.target_names)

# printing features of flower 

print(iris_data.feature_names)

['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
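
The DataFrame above labels its columns 0 to 3. As an optional variant, the sketch below attaches the feature names and a species column so the table is easier to read. The names `iris_named` and `species` are assumptions introduced for this example and are not used elsewhere in the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()

# Use the feature names as column labels instead of the default 0..3.
iris_named = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# Map each integer target (0, 1, 2) to its species name.
iris_named["species"] = iris_data.target_names[iris_data.target]

print(iris_named.head())

# Number of samples per species.
print(iris_named["species"].value_counts())
```

The counts show that the dataset is balanced, with 50 samples for each of the three species.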


In [42]:
# Separating the samples (X) from the targets (Y) and printing them
X = iris.values[:, 0:4]
Y = iris_data.target

print(X[0:5])
print(Y)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In [43]:
# Splitting the dataset into train (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)
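
To make the effect of `test_size = 0.3` concrete, the sketch below repeats the split and prints the resulting shapes. It also shows an optional stratified variant, which is not part of the original notebook; the names `Xs_train`, `Xs_test`, `ys_train`, and `ys_test` are assumptions for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)

# Same split as in the cell above: 30% of the 150 samples are held out.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=100
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# Optional variant: stratify=Y keeps the class proportions equal in both parts.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, Y, test_size=0.3, random_state=100, stratify=Y
)
print(np.bincount(ys_test))  # [15 15 15]
```

Without stratification the test set is not guaranteed to contain the same number of samples from each class, which is worth keeping in mind when reading per-class results.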

In [44]:
# Defining the random forest classifier
model = RandomForestClassifier(random_state=100)

# Performing training
model.fit(X_train, y_train)

RandomForestClassifier(random_state=100)
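
Once the model is fitted, its individual trees can be inspected. The sketch below refits the same model so that it runs on its own, then checks that the forest's class probabilities are the mean of the per-tree probabilities and prints the impurity-based feature importances. The variable name `tree_mean` is an assumption for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris_data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_data.data, iris_data.target, test_size=0.3, random_state=100
)
model = RandomForestClassifier(random_state=100).fit(X_train, y_train)

# The fitted trees are stored in the estimators_ attribute.
print("Number of trees:", len(model.estimators_))

# Forest probabilities are the mean of the per-tree probabilities.
tree_mean = np.mean([t.predict_proba(X_test) for t in model.estimators_], axis=0)
print("Matches forest:", np.allclose(tree_mean, model.predict_proba(X_test)))

# Impurity-based importance of each feature (the values sum to 1).
for name, score in zip(iris_data.feature_names, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```

The number of trees is controlled by the `n_estimators` parameter, which is left at its default here.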

In [45]:
# Predicting the classes of the test samples
Y_pred = model.predict(X_test)
Y_pred

array([2, 0, 2, 0, 2, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2, 1, 1, 2, 2, 2, 2, 0,
       2, 0, 1, 2, 1, 0, 1, 2, 1, 1, 1, 0, 0, 1, 0, 1, 2, 2, 0, 1, 2, 2,
       0])

In [46]:
# Accuracy of the model
print("Accuracy:", metrics.accuracy_score(y_test, Y_pred))

# Confusion matrix (rows: true classes, columns: predicted classes)
cm = confusion_matrix(y_test, Y_pred)
cm

Accuracy: 0.9555555555555556


array([[16,  0,  0],
       [ 0, 10,  1],
       [ 0,  1, 17]], dtype=int64)
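
The accuracy can be recovered from the confusion matrix, which helps when reading the two outputs together. The sketch below copies the matrix printed above and works through the arithmetic; the variable names `correct`, `total`, and `recall` are assumptions for this example.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[16, 0, 0],
               [0, 10, 1],
               [0, 1, 17]])

correct = np.trace(cm)  # diagonal entries: 16 + 10 + 17 = 43 correct predictions
total = cm.sum()        # all entries: 45 test samples
print("Accuracy:", correct / total)  # 43 / 45

# Per-class recall: correct predictions divided by the true samples of each class.
recall = np.diag(cm) / cm.sum(axis=1)
print("Recall per class:", recall)
```

The two off-diagonal entries are the only errors: one versicolor sample was predicted as virginica, and one virginica sample was predicted as versicolor. Every setosa sample was classified correctly.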

In [47]:
# Making a prediction on new data
model.predict([[4, 5, 3, 2]])

array([2])
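
The prediction is a class index, and index 2 corresponds to `virginica` in `target_names`. The sketch below refits the same model so that it runs on its own, maps the index back to a species name, and prints the class probabilities. The variable names `new_sample`, `label`, and `proba` are assumptions for this example, and the exact probabilities may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris_data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_data.data, iris_data.target, test_size=0.3, random_state=100
)
model = RandomForestClassifier(random_state=100).fit(X_train, y_train)

# One new flower: sepal length, sepal width, petal length, petal width (cm).
new_sample = np.array([[4, 5, 3, 2]])

# Map the predicted class index back to its species name.
label = model.predict(new_sample)[0]
print("Predicted class index:", label)
print("Predicted species:", iris_data.target_names[label])

# Averaged class probabilities across all trees in the forest.
proba = model.predict_proba(new_sample)[0]
for name, p in zip(iris_data.target_names, proba):
    print(f"{name}: {p:.2f}")
```

Looking at the probabilities as well as the label shows how confident the forest is, which is useful for samples that lie far from the training data.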