**This notebook is an exercise in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning) course.  You can reference the tutorial at [this link](https://www.kaggle.com/dansbecker/model-validation).**

---


## Recap
You've built a model. In this exercise you will test how good your model is.

Run the cell below to set up your coding environment where the previous exercise left off.

In [2]:
# Code you have previously used to load data
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

home_data = pd.read_csv(iowa_file_path)
y = home_data.SalePrice
feature_columns = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[feature_columns]

# Specify Model
iowa_model = DecisionTreeRegressor()
# Fit Model
iowa_model.fit(X, y)

print("First in-sample predictions:", iowa_model.predict(X.head()))
print("Actual target values for those homes:", y.head().tolist())

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex4 import *
print("Setup Complete")

First in-sample predictions: [208500. 181500. 223500. 140000. 250000.]
Actual target values for those homes: [208500, 181500, 223500, 140000, 250000]
Setup Complete


# Exercises

## Step 1: Split Your Data
Use the `train_test_split` function to split up your data.

Give it the argument `random_state=1` so the `check` functions know what to expect when verifying your code.

Recall, your features are loaded in the DataFrame **X** and your target is loaded in **y**.


In [4]:
# Import the train_test_split function and uncomment
from sklearn.model_selection import train_test_split

# fill in and uncomment
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = 0)

# Check your answer
#step_1.check()

In [None]:
# The lines below will show you a hint or the solution.
# step_1.hint() 
# step_1.solution()

## Step 2: Specify and Fit the Model

Create a `DecisionTreeRegressor` model and fit it to the relevant data.
Set `random_state` to 1 again when creating the model.

In [5]:
# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = DecisionTreeRegressor(random_state=1)

# Fit iowa_model with the training data.
iowa_model.fit(train_X, train_y)

# Check your answer
step_2.check()

[335000. 205000. 139000. 205000.  89500.  58500. 350000. 149000. 755000.
 180000.]
[335000. 205000. 139000. 205000.  89500.  58500. 350000. 149000. 755000.
 180000.]


<IPython.core.display.Javascript object>

<span style="color:#33cc33">Correct</span>

In [8]:
step_2.hint()
step_2.solution()

<IPython.core.display.Javascript object>

<span style="color:#3366cc">Hint:</span> Remember, you fit with training data. You will test with validation data soon

<IPython.core.display.Javascript object>

<span style="color:#33cc99">Solution:</span> 
```python
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)
```

## Step 3: Make Predictions with Validation data


In [6]:
# Predict with all validation observations
val_predictions = iowa_model.predict(val_X)

# Check your answer
step_3.check()

<IPython.core.display.Javascript object>

<span style="color:#33cc33">Correct</span>

In [10]:
step_3.hint()
step_3.solution()

<IPython.core.display.Javascript object>

<span style="color:#3366cc">Hint:</span> Run predict on the right validation data object.

<IPython.core.display.Javascript object>

<span style="color:#33cc99">Solution:</span> 
```python
val_predictions = iowa_model.predict(val_X)
```

Inspect your predictions and actual values from validation data.

In [7]:
# print the top few validation predictions
print(iowa_model.predict(val_X))


[335000. 205000. 139000. 205000.  89500.  58500. 350000. 149000. 755000.
 180000. 230000. 150900. 222000. 130000. 142125. 174000. 236000. 126000.
 153337. 118500. 137500. 149000. 132500. 189000. 139000. 138887. 195000.
  92000. 318000. 105500. 165500. 178000. 135000. 258000. 253293. 177000.
 192500. 123000. 219500. 315750. 237500. 153000. 185500. 326000. 335000.
 142600. 105500. 111250. 165150. 137000. 315750. 154500. 196000. 126000.
 184000.  62383. 153575. 190000. 148000. 134900. 132000. 150000. 140000.
 141000. 224900. 174000. 108000. 210000. 107400. 183500. 178000.  98000.
 124000. 250580. 135750. 255000. 132500.  91000. 325300. 179900. 143000.
 135000. 153900. 145000. 226000. 192000.  68400. 202500. 198900. 139000.
 227000. 130500. 175500. 254900. 206000. 140000. 190000. 311500.  88000.
 137450. 318000. 192140. 168500. 149500.  76000. 268000. 180000. 132000.
 184000. 128000.  37900. 136500. 311500. 126000. 176000. 151000. 466500.
 129500. 195000. 205000. 157000. 164500. 136500. 20

In [8]:
# print the top few actual prices from validation data
print(iowa_model.predict(X))

[208500. 175500. 235000. ... 266500. 142125. 147500.]


What do you notice that is different from what you saw with in-sample predictions (which are printed after the top code cell in this page).

Do you remember why validation predictions differ from in-sample (or training) predictions? This is an important idea from the last lesson.

## Step 4: Calculate the Mean Absolute Error in Validation Data


In [13]:
from sklearn.metrics import mean_absolute_error
val_mae = iowa_model.predict(X)
mean_absolute_error(y, val_mae)

# uncomment following line to see the validation_mae
#print(val_mae)
print(mean_absolute_error(val_y, val_predictions))
# Check your answer
#step_4.check()

32966.449315068494


In [None]:
 step_4.hint()
 step_4.solution()

Is that MAE good?  There isn't a general rule for what values are good that applies across applications. But you'll see how to use (and improve) this number in the next step.

# Keep Going

You are ready for **[Underfitting and Overfitting](https://www.kaggle.com/dansbecker/underfitting-and-overfitting).**


---




*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum/161285) to chat with other Learners.*