# Model Validation in Python

* **Model validation** consists of the steps and processes that ensure your model performs as expected on new data. The most common approach is to test the model's accuracy (or another evaluation metric of your choice) on data it has never seen before, called a **holdout set**. If the model's accuracy on the holdout data is similar to its accuracy on the training data, you can claim that the model is validated. The ultimate goal of model validation is to end up with the best-performing model possible, one that achieves high accuracy on new data.

#### Model validation consists of:
* Ensuring your model performs as expected on new data
* Testing model performance on holdout datasets
* Selecting the best model, parameters, and accuracy metrics
* Achieving the best accuracy for the data given
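
To make the idea of a holdout set concrete, here is a minimal sketch of splitting data so that part of it stays unseen during training. It assumes scikit-learn's `train_test_split` is used for the split; the synthetic dataset from `make_regression`, the 80/20 split, and the variable names are illustrative choices, not something these notes prescribe.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1111)

# Hold out 20% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111
)

# Report how many rows ended up in each set.
print(X_train.shape, X_test.shape)
```

Fixing `random_state` keeps the split reproducible, so later comparisons between models use the same holdout rows.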
    
### Scikit-learn modeling review
#### Basic modeling steps
* 1. Create a model by specifying model type and its parameters
* 2. Fit the model using the `.fit()` method
* 3. Generate predictions with the `.predict()` method
* 4. Assess model accuracy by comparing the predictions with the true values using an accuracy metric

* **The process of generating a model, fitting, predicting, and then reviewing model accuracy was introduced earlier in:**
    * Intermediate Python
    * Supervised Learning with scikit-learn

```
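from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae

# X_train, y_train, X_test, y_test are assumed to come from an earlier train/test split
model = RandomForestRegressor(n_estimators=500, random_state=1111)
model.fit(X=X_train, y=y_train)
predictions = model.predict(X_test)
print("{0:.2f}".format(mae(y_true=y_test, y_pred=predictions)))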
```
* **Model validation's main goal is to ensure that a predictive model will perform as expected on new data.**
* Training data = seen data

```
model = RandomForestRegressor(n_estimators=500, random_state=1111)
model.fit(X_train, y_train)
# Predict on the same data the model was fitted on
train_predictions = model.predict(X_train)
```

* Testing data = unseen data

```
model = RandomForestRegressor(n_estimators=500, random_state=1111)
model.fit(X_train, y_train)
# Predict on data the model did not see during fitting
test_predictions = model.predict(X_test)
```

* If your training and testing errors are vastly different, it may be a sign that your model is overfitting the training data
* Use model validation to make sure you get the lowest testing error possible
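
The comparison between training and testing error can be sketched as follows. This is a minimal illustration under stated assumptions: the data is synthetic (`make_regression`), the metric is mean absolute error to match the earlier snippet, and `n_estimators=100` is chosen only to keep the example quick. How large a gap counts as "vastly different" depends on the problem and is not defined here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split

# Synthetic data (assumption) so the example runs on its own.
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=1111)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1111
)

model = RandomForestRegressor(n_estimators=100, random_state=1111)
model.fit(X_train, y_train)

# Error on seen data versus error on unseen data.
train_error = mae(y_true=y_train, y_pred=model.predict(X_train))
test_error = mae(y_true=y_test, y_pred=model.predict(X_test))

print("Training MAE: {0:.2f}".format(train_error))
print("Testing MAE:  {0:.2f}".format(test_error))
print("Gap (test - train): {0:.2f}".format(test_error - train_error))
```

A training error far below the testing error suggests the model has memorized the training data rather than learned patterns that generalize.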