### Random forests
A random forest builds many trees and makes a prediction by averaging the predictions of its component trees.  
It generally has much better predictive accuracy than a single decision tree, and it works well with default parameters.  
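
The averaging step can be checked directly. The sketch below is a minimal illustration, assuming synthetic data from `make_regression` rather than the housing data used later; the names `X_demo`, `y_demo`, `forest`, and `manual_avg` are made up for this example. It compares the forest's own prediction with the mean of the predictions of the fitted trees in `estimators_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption; this is not the Iowa housing data)
X_demo, y_demo = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=1)

forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X_demo, y_demo)

# Prediction returned by the forest itself
forest_pred = forest.predict(X_demo[:5])

# Average of the individual trees' predictions, computed by hand
tree_preds = np.stack([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_avg = tree_preds.mean(axis=0)

# The two should agree up to floating-point error
print(np.allclose(forest_pred, manual_avg))
```

Each entry of `forest.estimators_` is an ordinary fitted `DecisionTreeRegressor`, so any single tree can be inspected or queried on its own in the same way.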

Other models can achieve even better performance, but many of them are sensitive to getting the right parameters.

In [1]:
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Path of the file to read
iowa_file_path = './data/home_data_for_machine_learning/train.csv'
home_data = pd.read_csv(iowa_file_path)

# Create target object and call it y
y = home_data.SalePrice
# Create X
feature_columns = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[feature_columns]

# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Specify Model
iowa_model = RandomForestRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)

# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
# mean_absolute_error expects (y_true, y_pred); MAE is symmetric, so the value is unchanged
val_mae = mean_absolute_error(val_y, val_predictions)
print(f"Validation MAE: {val_mae}")

print("First 5 predictions result:", iowa_model.predict(X.head()))
print("Actual target values for those homes:", y.head().tolist())

Validation MAE: 21857.159075016305
First 5 predictions result: [210037.1  175173.   220261.78 138555.74 261146.56]
Actual target values for those homes: [208500, 181500, 223500, 140000, 250000]
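
To see the gap between a forest and a single decision tree, both can be fit on the same split and scored with the same metric. The sketch below is only an illustration under stated assumptions: it uses the synthetic `make_friedman1` dataset instead of the housing file, and the `demo_`-prefixed names are hypothetical. Both models use default parameters apart from `random_state`.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; this is not the Iowa housing data)
demo_X, demo_y = make_friedman1(n_samples=1000, noise=1.0, random_state=1)
demo_train_X, demo_val_X, demo_train_y, demo_val_y = train_test_split(
    demo_X, demo_y, random_state=1
)

# A single tree with default parameters
demo_tree = DecisionTreeRegressor(random_state=1)
demo_tree.fit(demo_train_X, demo_train_y)
tree_mae = mean_absolute_error(demo_val_y, demo_tree.predict(demo_val_X))

# A random forest with default parameters
demo_forest = RandomForestRegressor(random_state=1)
demo_forest.fit(demo_train_X, demo_train_y)
forest_mae = mean_absolute_error(demo_val_y, demo_forest.predict(demo_val_X))

print(f"Single tree validation MAE:   {tree_mae:.3f}")
print(f"Random forest validation MAE: {forest_mae:.3f}")
```

The same comparison can be run on the housing data by swapping in `train_X`, `val_X`, `train_y`, and `val_y` from the cell above.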
