# Random Forests (RF)
**Definition**:
- Base estimator: Decision Tree
- Each estimator is trained on a different bootstrap sample of the same size as the training set.
- RF introduces further randomization when training individual trees: at each node, only a random subset of the features is considered for the split.
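
The two sources of randomness can be sketched with NumPy alone. This is only an illustration on synthetic data, not how scikit-learn implements it internally; the array sizes and the value of `max_features` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "training set" (sizes are arbitrary, for illustration only)
n_samples, n_features = 1000, 8
X = rng.normal(size=(n_samples, n_features))

# 1) Bootstrap sample: draw n_samples row indices WITH replacement,
#    so the sample has the same size as the training set
boot_idx = rng.integers(0, n_samples, size=n_samples)
X_boot = X[boot_idx]

# Some rows appear several times, others not at all ("out-of-bag" rows)
unique_fraction = np.unique(boot_idx).size / n_samples

# 2) Feature randomization: candidate features for a single split,
#    drawn WITHOUT replacement (max_features is a hypothetical setting)
max_features = 3
candidate_features = rng.choice(n_features, size=max_features, replace=False)

print(X_boot.shape)
print(f"Unique rows in bootstrap sample: {unique_fraction:.1%}")
print(f"Features considered at this split: {sorted(candidate_features.tolist())}")
```

On average a bootstrap sample contains roughly 63% of the distinct training rows, which is why each tree sees a somewhat different view of the data.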

![title](https://drive.google.com/uc?export=view&id=1r5FGL17FR5IHOm8RAArXXJWl2NWGIQHF)

**Feature Importance**:
- Tree-based methods make it possible to measure the importance of each feature in prediction.
    - A feature's importance reflects how much the tree nodes use that feature to reduce impurity.
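
As a minimal sketch of impurity-based importance, the example below fits a forest on synthetic data and reads scikit-learn's `feature_importances_` attribute. The data, the feature names `x0`, `x1`, `x2`, and the variable `rf_demo` are made up for illustration; the target is constructed to depend strongly on `x0`, weakly on `x1`, and not at all on `x2`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic data (assumption for illustration): three independent features
X_demo = rng.normal(size=(500, 3))
# Target depends strongly on x0, weakly on x1, and not on x2
y_demo = 5.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)

rf_demo = RandomForestRegressor(n_estimators=100, random_state=1)
rf_demo.fit(X_demo, y_demo)

# Impurity-based importances, normalized so that they sum to 1
importances = rf_demo.feature_importances_
for name, imp in zip(["x0", "x1", "x2"], importances):
    print(f"{name}: {imp:.3f}")
```

The feature that drives most of the target's variation (`x0` here) should receive the largest share of the importance.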

```python
import pandas as pd

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Import data: two CSV files that together form the cars (mpg) dataset
file1 = 'https://raw.githubusercontent.com/prince381/car_mpg_predict/master/cars1.csv'
file2 = 'https://raw.githubusercontent.com/prince381/car_mpg_predict/master/cars2.csv'
cars1 = pd.read_csv(file1).dropna(how='all', axis=1)  # drop columns that are entirely empty
cars2 = pd.read_csv(file2)
df = pd.concat([cars1, cars2], ignore_index=True, sort=False)

# Split data: predict mpg from displacement only
seed = 1
X = df[['displacement']].to_numpy()  # double brackets already give a 2D array of shape (n, 1)
y = df['mpg'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=seed)

# Instantiate and train model
# A float min_samples_leaf is a fraction: each leaf must hold at least 12% of the training samples
rf = RandomForestRegressor(n_estimators=400, min_samples_leaf=0.12, random_state=seed)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Evaluate the model: RMSE is the square root of the mean squared error
rmse_test = MSE(y_test, y_pred) ** (1/2)
print(f"Test set RMSE: {rmse_test}")
```

```
Test set RMSE: 3.7603781331419897
```
