**This notebook is an exercise in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning) course.  You can reference the tutorial at [this link](https://www.kaggle.com/alexisbcook/machine-learning-competitions).**

---


# Introduction

* You will create and submit predictions for a Kaggle competition.
* You can then improve your model, for example by adding features or tuning its settings.
* Begin by running the code cell below to set up code checking and the file paths for the dataset.

In [1]:
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex7 import *

# Set up filepaths
import os
if not os.path.exists("../input/train.csv"):
    os.symlink("../input/home-data-for-ml-course/train.csv", "../input/train.csv")  
    os.symlink("../input/home-data-for-ml-course/test.csv", "../input/test.csv") 

# Model for Training Data

* Train a Random Forest model on **`train_X`** and **`train_y`**.  

In [2]:
# Import packages
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

In [3]:
# Import training data
iowa_file_path = '../input/train.csv'
home_data = pd.read_csv(iowa_file_path)

In [4]:
# Define target
y = home_data.SalePrice

In [5]:
# Define features
features = ['MSSubClass', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'FullBath', 'HalfBath', 'KitchenAbvGr', 'Fireplaces', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal']
X = home_data[features]

In [6]:
# Output results
X.head()

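|  | MSSubClass | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | FullBath | HalfBath | KitchenAbvGr | Fireplaces | 3SsnPorch | ScreenPorch | PoolArea | MiscVal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | 8450 | 7 | 5 | 2003 | 2003 | 856 | 854 | 0 | 1710 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 20 | 9600 | 6 | 8 | 1976 | 1976 | 1262 | 0 | 0 | 1262 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 60 | 11250 | 7 | 5 | 2001 | 2002 | 920 | 866 | 0 | 1786 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 3 | 70 | 9550 | 7 | 5 | 1915 | 1970 | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 4 | 60 | 14260 | 8 | 5 | 2000 | 2000 | 1145 | 1053 | 0 | 2198 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |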


In [7]:
# Split into training and validation data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
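
To see what the split above produces, here is a small sketch on toy data. It relies on scikit-learn holding out 25% of the rows for validation when `test_size` is not given; the lists `toy_X` and `toy_y` are hypothetical stand-ins for `X` and `y`.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for X and y (20 rows)
toy_X = [[i] for i in range(20)]
toy_y = [i * 10 for i in range(20)]

# With no test_size given, 25% of the rows are held out for validation
toy_train_X, toy_val_X, toy_train_y, toy_val_y = train_test_split(
    toy_X, toy_y, random_state=1
)
print(len(toy_train_X), len(toy_val_X))

# Reusing the same random_state reproduces the same split
repeat = train_test_split(toy_X, toy_y, random_state=1)
print(repeat[1] == toy_val_X)
```

Fixing `random_state` is what makes the validation score comparable from one run to the next.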

In [8]:
# Define, fit and evaluate model
rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(train_X, train_y)
rf_val_predictions = rf_model.predict(val_X) 
# scikit-learn documents the argument order as (y_true, y_pred)
rf_val_mae = mean_absolute_error(val_y, rf_val_predictions)
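
The mean absolute error computed above is the average of the absolute differences between actual and predicted prices. As a rough illustration, the plain-Python sketch below reproduces that calculation; the function name `mean_absolute_error_sketch` and the three example prices are hypothetical and not part of the exercise.

```python
def mean_absolute_error_sketch(y_true, y_pred):
    """Average of |actual - predicted| over all rows."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical sale prices and predictions, for illustration only
actual = [200_000, 150_000, 310_000]
predicted = [190_000, 165_000, 305_000]

# Absolute errors are 10,000, 15,000 and 5,000, so the mean is 10,000
example_mae = mean_absolute_error_sketch(actual, predicted)
print("{:,.0f}".format(example_mae))
```

Because each error is taken as an absolute value, swapping the two arguments gives the same number, although scikit-learn documents the order as `(y_true, y_pred)`.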

In [9]:
# Output results
print("Validation MAE for Random Forest Model: {:,.0f}".format(rf_val_mae))

Validation MAE for Random Forest Model: 16,927
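
One way to look for a better model is to compare a few settings by their validation MAE and keep the best one. The sketch below does this for `max_leaf_nodes`. It uses synthetic data from `make_regression` so that it runs on its own (an assumption for illustration; in this notebook you would use `train_X`, `val_X`, `train_y` and `val_y` instead), and the candidate values are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (not the real dataset)
demo_X, demo_y = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=1
)
tr_X, va_X, tr_y, va_y = train_test_split(demo_X, demo_y, random_state=1)

def validation_mae(max_leaf_nodes):
    """Fit a forest with the given leaf cap and return its validation MAE."""
    model = RandomForestRegressor(
        n_estimators=20, max_leaf_nodes=max_leaf_nodes, random_state=1
    )
    model.fit(tr_X, tr_y)
    return mean_absolute_error(va_y, model.predict(va_X))

# Candidate values are arbitrary; None means no cap on the number of leaves
candidates = [5, 50, None]
scores = {c: validation_mae(c) for c in candidates}
best = min(scores, key=scores.get)

for c in candidates:
    print("max_leaf_nodes={}: MAE {:,.1f}".format(c, scores[c]))
print("Best setting:", best)
```

Which setting wins depends on the data, so the result on the synthetic set says nothing about the housing data.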


# Model for All Data

* Train a Random Forest model on all of **`X`** and **`y`**.

In [10]:
# Define and fit the model on all of the training data (no validation split)
rf_model_on_full_data = RandomForestRegressor(random_state=1)
rf_model_on_full_data.fit(X, y)

RandomForestRegressor(random_state=1)

In [11]:
# Import test data
test_data_path = '../input/test.csv'
test_data = pd.read_csv(test_data_path)

In [12]:
# Select the same feature columns from the test data
test_X = test_data[features]
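
Before predicting, it can help to confirm that the selected test columns contain no missing values, since some scikit-learn versions raise an error when `predict` receives them. The helper `columns_with_missing_values` and the two-column `toy_test` frame below are hypothetical; the real check would pass `test_data` and `features`.

```python
import pandas as pd

def columns_with_missing_values(frame, columns):
    """Return missing-value counts for the columns that have any."""
    counts = frame[columns].isnull().sum()
    return counts[counts > 0]

# Hypothetical miniature test set with one missing LotArea value
toy_test = pd.DataFrame({
    "LotArea": [8450, 9600, None],
    "YearBuilt": [2003, 1976, 2001],
})

problems = columns_with_missing_values(toy_test, ["LotArea", "YearBuilt"])
print(problems.to_dict())
```

An empty result means every selected column is fully populated.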

In [13]:
# Make predictions
test_preds = rf_model_on_full_data.predict(test_X)

In [14]:
# Check answer
step_1.check()

Correct

In [15]:
# Save predictions in correct format
output = pd.DataFrame({'Id': test_data.Id, 'SalePrice': test_preds})
output.to_csv('submission.csv', index=False)
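
A quick sanity check on the saved file can catch formatting mistakes before you submit. The sketch below assumes the submission needs exactly the `Id` and `SalePrice` columns written above; the helper `check_submission` and the two-row example are hypothetical, and the example is written to an in-memory buffer instead of `submission.csv` so that it runs on its own.

```python
import io
import pandas as pd

def check_submission(frame, expected_rows):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(frame.columns) != ["Id", "SalePrice"]:
        problems.append("columns should be exactly Id, SalePrice")
        return problems
    if len(frame) != expected_rows:
        problems.append(
            "expected {} rows, found {}".format(expected_rows, len(frame))
        )
    if frame["SalePrice"].isnull().any():
        problems.append("SalePrice contains missing values")
    if frame["Id"].duplicated().any():
        problems.append("Id contains duplicates")
    return problems

# Hypothetical two-row submission, written and read back in memory
toy_output = pd.DataFrame({"Id": [1461, 1462], "SalePrice": [169000.0, 187500.0]})
buffer = io.StringIO()
toy_output.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(check_submission(round_trip, expected_rows=2))
```

In the notebook, the equivalent call would read `submission.csv` back with `pd.read_csv` and pass `len(test_data)` as `expected_rows`.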

---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/intro-to-machine-learning/discussion) to chat with other learners.*