# House Price Prediction Using scikit-learn
This is a simple project for learning the basics of supervised machine learning. It predicts house prices from a set of numeric features.
This is my solution to this [Kaggle competition](https://www.kaggle.com/code/alexisbcook/machine-learning-competitions).

## Import the necessary libraries

In [137]:
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor, HistGradientBoostingRegressor, StackingRegressor, VotingRegressor
from sklearn.model_selection import train_test_split
import pandas as pd

## Prepare the data

In [132]:
# load the training data
data = pd.read_csv('train.csv')

# select the numeric features (describe() only summarizes numeric columns)
features = data.describe().columns.drop('SalePrice')

# get the training features, dropping the last row to match the size of the test data
# (iloc[:-1].copy() avoids the SettingWithCopyWarning raised by an in-place drop on a slice)
X = data[features].iloc[:-1].copy()

# get the target data, dropping the same last row
y = data.SalePrice.iloc[:-1].copy()

# split the training data into training and validation data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
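The feature selection and row-dropping step can be tried on a small frame before running it on the real file. This is a minimal sketch on a made-up DataFrame (the values are illustrative stand-ins, not rows read from `train.csv`), and it uses `select_dtypes` as one alternative way to pick the numeric columns:

```python
import pandas as pd

# Hypothetical stand-in for train.csv; values are made up for illustration.
data = pd.DataFrame({
    'LotArea': [8450, 9600, 11250, 9550],
    'YearBuilt': [2003, 1976, 2001, 1915],
    'Street': ['Pave', 'Pave', 'Grvl', 'Pave'],  # non-numeric, excluded below
    'SalePrice': [208500, 181500, 223500, 140000],
})

# Numeric columns only, without the target.
features = data.select_dtypes(include='number').columns.drop('SalePrice')

# iloc[:-1] drops the last row; copy() returns an independent object,
# so later in-place changes do not trigger SettingWithCopyWarning.
X = data[features].iloc[:-1].copy()
y = data['SalePrice'].iloc[:-1].copy()

print(list(features))    # ['LotArea', 'YearBuilt']
print(X.shape, y.shape)  # (3, 2) (3,)
```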


## Create the models

In [138]:
random_forest_model = RandomForestRegressor(random_state=1)
bagging_model = BaggingRegressor(random_state=1)
decision_tree_model = DecisionTreeRegressor(random_state=1)
# voting_model = VotingRegressor()
histogram_model = HistGradientBoostingRegressor(random_state=1)

## Train the model

In [139]:
random_forest_model.fit(X, y)
bagging_model.fit(X, y)
decision_tree_model.fit(X, y)
histogram_model.fit(X, y)

## Make predictions with the model

In [140]:
test_data = pd.read_csv('test.csv')
features = test_data.describe().columns

# get the test features
test_X = test_data[features]

# make predictions on the new data
random_preds = random_forest_model.predict(test_X)
bagging_preds = bagging_model.predict(test_X)
decision_tree_preds = decision_tree_model.predict(test_X)
histogram_preds = histogram_model.predict(test_X)
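`predict` expects the same feature columns, in the same order, that the model saw in `fit`. The cell above recomputes the column list from the test file; here is a minimal sketch of reusing the training feature list instead, on made-up frames (the values are placeholders, not rows from the competition files):

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and test.csv.
train = pd.DataFrame({
    'LotArea': [8450, 9600, 11250],
    'YearBuilt': [2003, 1976, 2001],
    'SalePrice': [208500, 181500, 223500],
})
test = pd.DataFrame({
    'YearBuilt': [1961, 1958],   # columns deliberately in a different order
    'LotArea': [11622, 14267],
})

# Feature list derived once, from the training data.
features = train.describe().columns.drop('SalePrice')

# Indexing with that list selects and orders the test columns to match.
test_X = test[features]

print(list(test_X.columns))  # ['LotArea', 'YearBuilt']
```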

## Evaluate the model

In [143]:
# y holds SalePrice from train.csv, so the predictions must cover the same rows
# for these scores to be meaningful
random_mae = mean_absolute_error(y, random_preds)
bagging_mae = mean_absolute_error(y, bagging_preds)
decision_tree_mae = mean_absolute_error(y, decision_tree_preds)
histogram_mae = mean_absolute_error(y, histogram_preds)

print('The following are the MAE:')
print(f'RandomForestRegressor: {random_mae}')
print(f'BaggingRegressor: {bagging_mae}')
print(f'DecisionTreeRegressor: {decision_tree_mae}')
print(f'HistGradientBoostingRegressor: {histogram_mae}')

The following are the MAE:
RandomForestRegressor: 22671.43708704592
BaggingRegressor: 25614.48910212474
DecisionTreeRegressor: 467.3036326250857
HistGradientBoostingRegressor: 22333.58173361369
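The validation split created in the data preparation step (`train_X`, `val_X`, `train_y`, `val_y`) is not used by the training and evaluation cells. Here is a minimal sketch of how such a split could score a model on rows it was not trained on, using synthetic data from `make_regression` as a stand-in for the housing data (an assumption, not the competition dataset). An unconstrained decision tree usually reproduces its training targets almost exactly, so a very low MAE on training rows says little about unseen houses:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the housing features and prices.
X_demo, y_demo = make_regression(
    n_samples=300, n_features=8, noise=20.0, random_state=1
)

# Hold out 25% of the rows (the train_test_split default) for validation.
train_X, val_X, train_y, val_y = train_test_split(X_demo, y_demo, random_state=1)

# Fit on the training rows only.
tree = DecisionTreeRegressor(random_state=1)
tree.fit(train_X, train_y)

# Score on rows the model has seen versus rows it has not.
train_mae = mean_absolute_error(train_y, tree.predict(train_X))
val_mae = mean_absolute_error(val_y, tree.predict(val_X))

print(f'training MAE:   {train_mae:.2f}')
print(f'validation MAE: {val_mae:.2f}')
```

The validation MAE is the figure to compare across models, since it is measured on data the model did not see during fitting.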


In [151]:
bagging_preds[:5].tolist(), y.head(5).tolist()

([219100.0, 181300.0, 247800.0, 166450.0, 211350.0],
 [208500, 181500, 223500, 140000, 250000])
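For reference, the mean absolute error is the average of |prediction - actual| over all rows. Here is a short worked example on just the five rows shown above, so the result differs from the full-data scores:

```python
# The five bagging predictions and actual prices printed above.
preds = [219100.0, 181300.0, 247800.0, 166450.0, 211350.0]
actual = [208500, 181500, 223500, 140000, 250000]

# Absolute error per house, then the mean over the five houses.
abs_errors = [abs(p - a) for p, a in zip(preds, actual)]
mae = sum(abs_errors) / len(abs_errors)

print(abs_errors)  # [10600.0, 200.0, 24300.0, 26450.0, 38650.0]
print(mae)         # 20040.0
```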