# Random Forest Classification Intuition

# Ensemble Learning

Ensemble learning is when we take **multiple machine learning algorithms** and combine them into one bigger machine learning algorithm, so that the final model leverages many different models.

The random forest method combines a lot of decision trees. Instead of running the decision tree algorithm once, we run it many times.

Let's have a look at how random forest works.

Step 1: Pick k random data points from the training set.

Step 2: Build a decision tree associated with these k data points. Rather than building a decision tree on everything in our dataset, we build it on this random subset of the dataset.

Step 3: Choose the number Ntree of trees that we want to build and repeat Step 1 and Step 2. In other words, we keep building decision trees until we have a lot of them.

Step 4: For a new data point, make each one of the Ntree trees predict the category the data point belongs to, and assign the new data point to the category that wins the majority vote. (Averaging the predicted y values is the regression version of this step.)
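
The four steps above can be sketched by hand with plain decision trees. This is only a minimal illustration, not how scikit-learn implements its forest: the synthetic data from `make_classification`, the names `X_demo`, `y_demo`, `n_trees` and `forest_predict`, and the choice of sampling with replacement are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data stands in for a real training set (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25        # Ntree; an odd number avoids ties in a two-class vote
k = len(X_demo)     # number of data points drawn for each tree

trees = []
for _ in range(n_trees):
    # Step 1: pick k random data points (here with replacement)
    idx = rng.integers(0, len(X_demo), size=k)
    # Step 2: build a decision tree on just those points
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    tree.fit(X_demo[idx], y_demo[idx])
    # Step 3: keep the tree and repeat
    trees.append(tree)

def forest_predict(trees, X_new):
    """Step 4: let every tree vote and return the majority class (labels 0/1)."""
    votes = np.stack([t.predict(X_new) for t in trees])  # (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

demo_pred = forest_predict(trees, X_demo)
print("Agreement with training labels:", (demo_pred == y_demo).mean())
```

The vote in `forest_predict` only works for 0/1 labels; for more classes one would count votes per class instead.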

The best part is that when Microsoft developed Kinect (the little device attached to the Xbox that lets us play games on our television without any controllers, using an infrared grid to track where the hands and arms are moving), it chose the **random forest algorithm** over the other ML algorithms that were available.

# Importing the libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing dataset

In [None]:
df = pd.read_csv('/kaggle/input/networks-ads/Social_Network_Ads.csv')
X = df.iloc[:,1:-1].values
y = df.iloc[:,-1].values

# Splitting the dataset into the Training set and Test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
print(X_test)
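
To make clear what the scaler does, here is a small sketch on made-up numbers (the `toy_*` arrays are hypothetical, not taken from the dataset). It checks that `transform` subtracts the training mean and divides by the training standard deviation, which is why the test set is only transformed and never fitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (Age, Estimated Salary) rows, for illustration only
toy_train = np.array([[20.0, 20000.0], [30.0, 50000.0], [40.0, 80000.0]])
toy_test = np.array([[30.0, 87000.0]])

toy_sc = StandardScaler().fit(toy_train)   # learns mean and std from training rows only

# Manual standardisation using the training statistics
manual = (toy_test - toy_train.mean(axis=0)) / toy_train.std(axis=0)

print(toy_sc.transform(toy_test))
print(np.allclose(toy_sc.transform(toy_test), manual))
```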

# Training the Random Forest Classification model on the Training set

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10,criterion='entropy',random_state=0)
classifier.fit(X_train,y_train)
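
A fitted forest keeps its individual trees in `estimators_`. The sketch below, on synthetic data (the names `X_toy`, `y_toy` and `forest` are assumptions for the example), suggests how scikit-learn combines them: rather than a hard majority vote, it averages the class probabilities predicted by each tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption), same hyperparameters as the cell above
X_toy, y_toy = make_classification(n_samples=200, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
forest.fit(X_toy, y_toy)

print(len(forest.estimators_))   # one fitted decision tree per estimator

# Average the per-tree class probabilities and compare with the forest's output
tree_probas = np.stack([t.predict_proba(X_toy) for t in forest.estimators_])
mean_proba = tree_probas.mean(axis=0)
print(np.allclose(mean_proba, forest.predict_proba(X_toy)))
```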

# Predicting a new result

In [None]:
# The model was trained on scaled features, so scale the new observation too
print(classifier.predict(sc.transform([[30,87000]])))

# Predicting the Test set results

In [None]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

# Making the Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
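
As a rough illustration of how the accuracy relates to the confusion matrix, the sketch below uses made-up labels (`toy_true` and `toy_pred` are hypothetical, not results from this model). For a two-class problem, scikit-learn lays the matrix out as `[[TN, FP], [FN, TP]]`, and accuracy is the diagonal divided by the total.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up labels for illustration only
toy_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
toy_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

toy_cm = confusion_matrix(toy_true, toy_pred)
tn, fp, fn, tp = toy_cm.ravel()

# Accuracy = correct predictions / all predictions
toy_acc = (tn + tp) / toy_cm.sum()

print(toy_cm)      # [[3 1]
                   #  [1 3]]
print(toy_acc)     # 0.75
```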

# Visualising the Training set results

In [None]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_train), y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results

In [None]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()