# Random Forest Algorithm


### Intuition

Problems with **single decision trees**:
 1. They are very sensitive to the training data: even a slight change to the data can change the tree enormously (high variance).
 2. They overfit easily: a fully grown tree memorizes noise instead of learning patterns.
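Both problems can be observed in a short sketch on synthetic, deliberately noisy data. The dataset (`make_classification` with `flip_y=0.1`) and the "drop 10 rows" perturbation are illustrative assumptions, not part of the original notebook. A fully grown tree reaches perfect training accuracy, which here means it has also fit the noisy labels. The second part measures how many test predictions change when just 10 training rows are removed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 10% of samples get a randomly assigned class (label noise)
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 1. Overfitting: an unpruned tree fits every training row, noisy labels included
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")

# 2. Instability: refit after dropping only the first 10 training rows
tree_small_change = DecisionTreeClassifier(random_state=0).fit(X_train[10:], y_train[10:])
disagreement = (tree.predict(X_test) != tree_small_change.predict(X_test)).mean()
print(f"fraction of test predictions that changed: {disagreement:.1%}")
```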

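This is where the **Random Forest solution** comes into play: instead of a single tree, it trains many decision trees, each on a slightly different random version of the data, and combines their predictions (a vote for classification, an average for regression). Because the individual trees make partly independent errors, combining them cancels out much of their variance, so the forest is far less sensitive to small changes in the data than any single tree.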
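The idea can be sketched by hand. This is a minimal illustration, assuming scikit-learn's `DecisionTreeClassifier` as the base learner and synthetic 0/1-labelled data; `fit_forest` and `predict_forest` are hypothetical helper names. It uses a hard majority vote for clarity, whereas scikit-learn's own `RandomForestClassifier` averages the trees' predicted class probabilities instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: train one randomized tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: n_samples row indices drawn with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt": each split considers only a random subset of features
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    """Hypothetical helper: hard majority vote, assuming labels are 0/1."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

trees = fit_forest(X_train, y_train)
y_pred = predict_forest(trees, X_test)
print("forest accuracy:", (y_pred == y_test).mean())
```

An odd number of trees (25) avoids ties in the binary vote.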

#### Advantages
 1. Works for both classification and regression tasks.
 2. Combines many unstable individual trees into one stronger, more stable model.
 3. Robust to small changes in the training data and much less prone to overfitting than a single tree.
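The same forest structure serves both task types; only the way the trees are combined differs. The sketch below, assuming scikit-learn and synthetic data, checks that `RandomForestRegressor.predict` equals the mean of its trees' predictions and that `RandomForestClassifier.predict_proba` equals the mean of its trees' class probabilities (the fitted trees are exposed as `estimators_`).

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Regression: the forest's prediction is the average of its trees' predictions
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_reg, y_reg)
tree_mean = np.mean([tree.predict(X_reg) for tree in reg.estimators_], axis=0)
print("regression = mean of trees:", np.allclose(reg.predict(X_reg), tree_mean))

# Classification: class probabilities are averaged over the trees
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_clf, y_clf)
proba_mean = np.mean([tree.predict_proba(X_clf) for tree in clf.estimators_], axis=0)
print("classification = mean of tree probabilities:",
      np.allclose(clf.predict_proba(X_clf), proba_mean))
```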
    

### Randomness

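1. **Bootstrap sampling**: each tree is trained on a random sample of the training rows drawn *with replacement*, usually of the same size as the original data, so some rows appear several times and others not at all.
   Example (one sample per tree):
   - (x1, x2, x3, x4, ...)
   - (x2, x2, x3, x4, ...)
   - (x3, x2, x3, x4, ...)
   - ...
2. **Feature randomness**: at each split, a tree considers only a random subset of the features (e.g. 3 out of 10) instead of all of them, which makes the trees less similar to one another.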
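Both sources of randomness can be sketched with NumPy alone. The row and feature counts below are illustrative assumptions. For large datasets, roughly 1/e ≈ 36.8% of the rows never appear in a given bootstrap sample (they are "out-of-bag"), and the square-root rule picks 3 of 10 features, matching the example above. In scikit-learn these two mechanisms are controlled by the `bootstrap` and `max_features` parameters; for `RandomForestClassifier`, `max_features` defaults to `'sqrt'`.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows, n_features = 10_000, 10

# Bootstrap sampling: draw n_rows row indices WITH replacement
boot_idx = rng.choice(n_rows, size=n_rows, replace=True)
n_unique = len(np.unique(boot_idx))
oob_fraction = 1 - n_unique / n_rows  # share of rows never drawn ("out-of-bag")
print(f"distinct rows in sample: {n_unique}, out-of-bag fraction: {oob_fraction:.3f}")

# Feature randomness: candidate features for ONE split, drawn WITHOUT replacement
k = int(np.sqrt(n_features))  # square-root rule: 3 of 10 features
split_features = rng.choice(n_features, size=k, replace=False)
print("features considered at this split:", sorted(split_features.tolist()))
```

In a real forest the feature subset is drawn again at every split of every tree, not once per tree.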

### Implementation

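```python
import pandas as pd

# Load the Pokémon dataset and keep only Grass and Electric types
data = pd.read_csv('data/pokemon.csv')
data = data.rename(columns={'Type 1': 'Type'})
data = data[data['Type'].isin(['Grass', 'Electric'])]

# Features: four base stats; target: True for Electric, False for Grass
X = data[['HP', 'Attack', 'Defense', 'Speed']]
y = (data['Type'] == 'Electric')

data  # display the filtered rows
```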


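| Unnamed: 0 | # | Name | Type | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 30 | 25 | Pikachu | Electric | | 320 | 35 | 55 | 40 | 50 | 50 | 90 | 1 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 740 | 672 | Skiddo | Grass | | 350 | 66 | 65 | 48 | 62 | 57 | 52 | 6 | False |
| 741 | 673 | Gogoat | Grass | | 531 | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
| 764 | 694 | Helioptile | Electric | Normal | 289 | 44 | 38 | 33 | 61 | 43 | 70 | 6 | False |
| 765 | 695 | Heliolisk | Electric | Normal | 481 | 62 | 55 | 52 | 109 | 94 | 109 | 6 | False |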


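```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 trees; class_weight='balanced' weights each class inversely to its frequency
rf = RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=42)
rf.fit(X_train, y_train)

# Predict on the held-out rows and report the fraction classified correctly
y_pred = rf.predict(X_test)
print("Accuracy", accuracy_score(y_test, y_pred))
```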

```
Accuracy 0.6956521739130435
```
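The reported accuracy 0.6956521739130435 equals 16/23, which suggests the test split holds only a couple of dozen rows, so this single number is a noisy estimate. k-fold cross-validation uses every row for testing exactly once and gives a steadier picture. The sketch below is self-contained and runs on a synthetic stand-in of roughly the same size and shape (an assumption, since the CSV is not bundled here), comparing a single tree with the same forest configuration; replacing the synthetic `X, y` with the Pokémon `X, y` from above runs it on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 114 rows, 4 numeric features, imbalanced classes (assumption)
X, y = make_classification(n_samples=114, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.6], random_state=0)

models = {
    "single tree": DecisionTreeClassifier(class_weight="balanced", random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                            random_state=42),
}
results = {}
for name, model in models.items():
    # 5-fold (stratified, for classifiers) cross-validation accuracy
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

The spread (`std`) across folds gives a rough sense of how much a single 20% split can move the accuracy.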
