**This notebook is an exercise in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning) course.  You can reference the tutorial at [this link](https://www.kaggle.com/dansbecker/model-validation).**

---


## Recap
You've built a model. In this exercise you will test how good your model is.

Run the cell below to set up your coding environment where the previous exercise left off.

In [5]:
# Code you have previously used to load data
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read
iowa_file_path = '\\Users\\512391\\Downloads\\iowa\\train.csv'

home_data = pd.read_csv(iowa_file_path)
y = home_data.SalePrice
feature_columns = ['Lot Area', 'Year Built', '1st Flr SF', '2nd Flr SF', 'Full Bath', 'Bedroom AbvGr', 'TotRms AbvGrd']
x = home_data[feature_columns]

# Specify Model
iowa_model = DecisionTreeRegressor()
# Fit Model
iowa_model.fit(x, y)

print("First in-sample predictions:", iowa_model.predict(x.head()))
print("Actual target values for those homes:", y.head().tolist())

First in-sample predictions: [159000. 271900. 137500. 248500. 167000.]
Actual target values for those homes: [159000, 271900, 137500, 248500, 167000]


# Exercises

## Step 1: Split Your Data
Use the `train_test_split` function to split up your data.

Give it the argument `random_state=1` so the `check` functions know what to expect when verifying your code.

Recall that your features are loaded in the DataFrame **x** and your target is loaded in **y**.


In [6]:
# Import the train_test_split function
from sklearn.model_selection import train_test_split

# Split features and target into training and validation sets.
# Note: the exercise text asks for random_state=1; this run used random_state=50.
train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=50, test_size=0.25, shuffle=True)

In [None]:
# The lines below will show you a hint or the solution.
# step_1.hint() 
# step_1.solution()
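
To see what `train_test_split` returns, here is a minimal sketch on a small made-up dataset. The names `toy_x`, `toy_y`, and the column names are hypothetical and only stand in for the housing data; the sketch assumes the default behaviour of holding out 25% of the rows when no `test_size` is given.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the housing features and target
toy_x = pd.DataFrame({"area": [50, 60, 70, 80, 90, 100, 110, 120],
                      "rooms": [2, 2, 3, 3, 4, 4, 5, 5]})
toy_y = pd.Series([100, 120, 140, 160, 180, 200, 220, 240], name="price")

# With no test_size given, 25% of the rows are held out for validation
toy_train_x, toy_val_x, toy_train_y, toy_val_y = train_test_split(
    toy_x, toy_y, random_state=1
)

# Number of rows and columns in each feature split
print(toy_train_x.shape, toy_val_x.shape)

# Features and target stay aligned: each split shares the same row index
print(toy_val_x.index.equals(toy_val_y.index))
```

With 8 rows, this leaves 6 rows for training and 2 for validation. Fixing `random_state` means the same rows land in each split every time the cell is run.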


## Step 2: Specify and Fit the Model

Create a `DecisionTreeRegressor` model and fit it to the relevant data.
Set `random_state` to 1 again when creating the model.

In [8]:
# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model.
# Note: the exercise text asks for random_state=1; this run used random_state=40.
iowa_model = DecisionTreeRegressor(random_state=40)
# Fit Model
iowa_model.fit(train_x, train_y)

In [None]:
# step_2.hint()
# step_2.solution()
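
The reason for fixing `random_state` on the model can be illustrated with a short sketch. It uses hypothetical synthetic data (`demo_x`, `demo_y`, `demo_new`), not the Iowa data, and only shows that two trees built with the same seed on the same data make the same predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data; not the Iowa housing data
rng = np.random.default_rng(0)
demo_x = rng.uniform(0, 10, size=(200, 3))
demo_y = demo_x[:, 0] * 3 + rng.normal(0, 1, size=200)
demo_new = rng.uniform(0, 10, size=(50, 3))  # unseen rows to predict on

# Two models created with the same random_state and fit on the same data
model_a = DecisionTreeRegressor(random_state=1).fit(demo_x, demo_y)
model_b = DecisionTreeRegressor(random_state=1).fit(demo_x, demo_y)

# Compare their predictions on the unseen rows
same = np.array_equal(model_a.predict(demo_new), model_b.predict(demo_new))
print("Identical predictions:", same)
```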

## Step 3: Make Predictions with Validation Data


In [9]:
# Predict with all validation observations
val_predictions = iowa_model.predict(val_x)





In [None]:
# step_3.hint()
# step_3.solution()

Inspect your predictions and actual values from validation data.

In [10]:
# Print all validation predictions, then the actual validation targets
print(val_predictions)
print()
print(val_y)

[ 87000. 315000. 195500. 192100. 206900. 159000. 150000.  87000. 319500.
 130000. 145000. 175000. 155000. 125500. 312500. 269500. 158000. 255000.
 162000. 147500. 125000. 145000. 300000. 144000. 157000. 105000. 150000.
 201000. 143000. 257000. 240000.  80000. 137000. 130000. 132500. 216000.
 152500. 383000. 178740. 152500. 125000. 418000. 115000. 100000. 147500.
 127500.  93000. 173900. 362500.  99500. 175000. 257500. 163000. 146000.
 401179. 245350.  75200. 274725. 143500. 142250. 141500. 207000. 112500.
  60000. 194500. 154000. 205000. 130000. 156820. 246900. 187500. 177625.
  46500. 196000. 166000. 224243. 202000. 156820. 186500. 180000. 126000.
 143900. 236500.  79900. 318000. 118000. 260000. 152500. 135000. 220000.
 217000. 231000. 119000. 137000. 165000. 222000. 163000. 144500. 252000.
 386250. 159900. 166000.  82000. 250000. 124500. 235876. 131900. 100000.
 120000. 210900. 214500. 234000. 123000. 195500. 239900. 240000. 150500.
 161500. 212000. 132000. 280000. 225000. 123000. 13

What do you notice that is different from what you saw with in-sample predictions (which are printed after the top code cell on this page)?

Do you remember why validation predictions differ from in-sample (or training) predictions? This is an important idea from the last lesson.
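
The difference can be reproduced on made-up data. The sketch below is an illustration under stated assumptions: the data is synthetic (`sim_x`, `sim_y` are hypothetical names), and the tree is left unrestricted so it can memorise every training row. It compares the error on rows the model was fit on with the error on held-out rows.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: a linear signal plus random noise
rng = np.random.default_rng(0)
sim_x = rng.uniform(0, 10, size=(400, 2))
sim_y = 5 * sim_x[:, 0] + rng.normal(0, 2, size=400)

sim_train_x, sim_val_x, sim_train_y, sim_val_y = train_test_split(
    sim_x, sim_y, random_state=1
)

# An unrestricted tree keeps splitting until it reproduces the training targets
sim_model = DecisionTreeRegressor(random_state=1)
sim_model.fit(sim_train_x, sim_train_y)

# Error on rows the model has already seen versus rows it has not
in_sample_mae = mean_absolute_error(sim_train_y, sim_model.predict(sim_train_x))
validation_mae = mean_absolute_error(sim_val_y, sim_model.predict(sim_val_x))

print("In-sample MAE: ", in_sample_mae)
print("Validation MAE:", validation_mae)
```

The in-sample error is essentially zero because the tree has seen the answers, while the validation error reflects how the model does on homes it has never seen.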

## Step 4: Calculate the Mean Absolute Error in Validation Data


In [11]:
from sklearn.metrics import mean_absolute_error
val_mae = mean_absolute_error(val_y, val_predictions)

# Print the validation MAE
print(val_mae)



30716.50909090909


In [None]:
# step_4.hint()
# step_4.solution()

Is that MAE good?  There isn't a general rule for what values are good that applies across applications. But you'll see how to use (and improve) this number in the next step.
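
As a reminder of what this number measures, here is a small worked sketch with made-up prices (the `actual` and `predicted` values are hypothetical). It computes the mean absolute error by hand and checks it against `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical sale prices and predictions for four homes
actual = np.array([100_000, 200_000, 300_000, 400_000])
predicted = np.array([110_000, 190_000, 330_000, 390_000])

# Absolute errors are 10,000, 10,000, 30,000 and 10,000; MAE is their average
manual_mae = np.mean(np.abs(actual - predicted))
sklearn_mae = mean_absolute_error(actual, predicted)

print(manual_mae, sklearn_mae)
```

Both values come out to 15,000, meaning the predictions are off by $15,000 on average. The MAE is in the same units as the target, which is why it has to be judged against typical sale prices rather than against a universal threshold.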

# Keep Going

You are ready for **[Underfitting and Overfitting](https://www.kaggle.com/dansbecker/underfitting-and-overfitting).**


---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/intro-to-machine-learning/discussion) to chat with other learners.*