**This notebook is an exercise in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning) course.  You can reference the tutorial at [this link](https://www.kaggle.com/dansbecker/underfitting-and-overfitting).**

---


## Recap
You've built your first model, and now it's time to optimize the size of the tree to make better predictions. Run this cell to set up your coding environment where the previous step left off.

In [19]:
# Code you have previously used to load data
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


# Path of the file to read
iowa_file_path = r'C:\Users\kr937\Desktop\drive\Kaggle\train (2).csv'

home_data = pd.read_csv(iowa_file_path)
# Create target object and call it y
y = home_data.SalePrice
# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[features]

# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Specify Model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)

# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE: {:,.0f}".format(val_mae))

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex5 import *
print("\nSetup complete")



Validation MAE: 29,653

Setup complete


# Exercises
You could write the function `get_mae` yourself. For now, we'll supply it. This is the same function you read about in the previous lesson. Just run the cell below.

In [20]:
def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree capped at max_leaf_nodes and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    mae = mean_absolute_error(val_y, preds_val)
    return mae
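
As a reminder of what `get_mae` returns, mean absolute error is the average of the absolute differences between actual and predicted values. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the function name `mean_absolute_error_sketch` and the prices are hypothetical.

```python
def mean_absolute_error_sketch(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Hypothetical prices, chosen only to make the arithmetic easy to follow
actual_prices = [100, 200, 300]
predicted_prices = [110, 190, 340]

# Absolute errors are 10, 10 and 40, so the mean is 60 / 3
demo_mae = mean_absolute_error_sketch(actual_prices, predicted_prices)
print(demo_mae)
```

A lower value means the predictions are, on average, closer to the actual prices.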

## Step 1: Compare Different Tree Sizes
Write a loop that tries each of the following candidate values for *max_leaf_nodes*.

Call the *get_mae* function on each value of max_leaf_nodes. Store the output in some way that allows you to select the value of `max_leaf_nodes` that gives the most accurate model on your data.

In [21]:
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
# Write loop to find the ideal tree size from candidate_max_leaf_nodes
for max_leaf_nodes in candidate_max_leaf_nodes:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print('max_leaf_nodes: %d  \t\t MAE: %d' %(max_leaf_nodes, my_mae))

# Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
best_tree_size = 100

# Check your answer
step_1.check()

max_leaf_nodes: 5  		 MAE: 35044
max_leaf_nodes: 25  		 MAE: 29016
max_leaf_nodes: 50  		 MAE: 27405
max_leaf_nodes: 100  		 MAE: 27282
max_leaf_nodes: 250  		 MAE: 27893
max_leaf_nodes: 500  		 MAE: 29454


<span style="color:#33cc33">Correct</span>
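
Rather than reading the best size off the printed table by eye, one option is to store the scores in a dictionary and take the key with the smallest MAE. The sketch below hard-codes the MAE values from the loop output above so that it runs on its own; in the notebook the dictionary would presumably be filled by calling `get_mae`, as the comment shows.

```python
# MAE values copied from the loop output above (max_leaf_nodes -> validation MAE).
# In the notebook, the same dictionary could be built with:
#   scores = {size: get_mae(size, train_X, val_X, train_y, val_y)
#             for size in candidate_max_leaf_nodes}
scores = {5: 35044, 25: 29016, 50: 27405, 100: 27282, 250: 27893, 500: 29454}

# min() over the keys, ranked by their MAE, returns the best tree size
best_tree_size = min(scores, key=scores.get)
print(best_tree_size)
```

This keeps `best_tree_size` in sync with the scores if the candidate list or the data changes.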

In [None]:
# The lines below will show you a hint or the solution.
# step_1.hint() 
# step_1.solution()

## Step 2: Fit Model Using All Data
You know the best tree size. If you were going to deploy this model in practice, you would make it even more accurate by using all of the data and keeping that tree size. That is, you don't need to hold out the validation data now that you've made all your modeling decisions.

In [23]:
# Use the best tree size found in Step 1
final_model = DecisionTreeRegressor(max_leaf_nodes=best_tree_size, random_state=1)

# Fit the final model on all of the data, not just the training split
final_model.fit(X, y)

# Check your answer
step_2.check()

<span style="color:#33cc33">Correct</span>
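
The two steps can also be tried end to end without the course files. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_regression` as a stand-in for the Iowa housing data; all `demo_` names are hypothetical. It picks a tree size on a validation split, then refits on every row with that size.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (an assumption, not the Iowa dataset)
X_demo, y_demo = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
train_X_demo, val_X_demo, train_y_demo, val_y_demo = train_test_split(
    X_demo, y_demo, random_state=1
)


def demo_mae(max_leaf_nodes):
    """Validation MAE of a tree capped at max_leaf_nodes, fit on the training split."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X_demo, train_y_demo)
    return mean_absolute_error(val_y_demo, model.predict(val_X_demo))


# Step 1: choose the tree size using the validation split only
demo_scores = {size: demo_mae(size) for size in [5, 25, 50, 100]}
demo_best_size = min(demo_scores, key=demo_scores.get)

# Step 2: keep that size, but fit on all rows (training and validation together)
demo_final_model = DecisionTreeRegressor(max_leaf_nodes=demo_best_size, random_state=1)
demo_final_model.fit(X_demo, y_demo)
print(demo_best_size, demo_final_model.get_n_leaves())
```

Note that once the validation rows are used for fitting, they can no longer give an unbiased error estimate for the final model.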

In [25]:
y

0       208500
1       181500
2       223500
3       140000
4       250000
         ...  
1455    175000
1456    210000
1457    266500
1458    142125
1459    147500
Name: SalePrice, Length: 1460, dtype: int64

In [24]:
final_model.predict(X)

array([209133.65384615, 146415.0075188 , 209133.65384615, ...,
       238763.63636364, 130629.        , 146415.0075188 ])

In [16]:
step_2.hint()
step_2.solution()

<span style="color:#3366cc">Hint:</span> Fit with the ideal value of max_leaf_nodes. In the fit step, use all of the data in the dataset

<span style="color:#33cc99">Solution:</span> 
```python
# Fit the model with best_tree_size. Fill in argument to make optimal size
final_model = DecisionTreeRegressor(max_leaf_nodes=best_tree_size, random_state=1)

# fit the final model
final_model.fit(X, y)
```

You've tuned this model and improved your results. But we are still using Decision Tree models, which are not very sophisticated by modern machine learning standards. In the next step you will learn to use Random Forests to improve your models even more.

# Keep Going

You are ready for **[Random Forests](https://www.kaggle.com/dansbecker/random-forests).**


---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/intro-to-machine-learning/discussion) to chat with other learners.*