# Model Evaluation
- Previously we discussed in-sample evaluation, which tells us how well our model fits the data used to train it.
- Problem: it does not tell us how well the trained model will predict new data.
- In the real world, the model will be given data it has not seen before, so we need to know how it will perform on that new data.
- Solution: use the `train_test_split()` function from the `sklearn.model_selection` module to split the data into training and testing sets.
- It divides the data into a `training set` (`in-sample data`) and a `testing set` (`out-of-sample data`).
- When we split the dataset, the larger portion is usually used for `training` and the smaller portion for `testing`. The model is fitted on the training set only and then scored on the testing set.

In [None]:
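from sklearn.model_selection import train_test_split
# X (features) and Y (target) are assumed to be defined earlier in the notebook.
# Hold out 30% of the rows for testing; random_state=1 makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)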

In [None]:
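# Fit the pipeline on the training set only.
pipe.fit(x_train, y_train)

In [None]:
# Score on the held-out test set (for a regressor, score() returns R-squared).
pipe.score(x_test, y_test)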
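
The split, fit, and score steps above can be tried end to end with the minimal sketch below. It is not the notebook's actual data or model: the synthetic data from `make_regression` and the `pipe_demo` pipeline (a scaler followed by linear regression) are assumptions standing in for `X`, `Y`, and `pipe`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic regression data stands in for the notebook's X and Y.
X_demo, Y_demo = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)

# Assumption: a hypothetical pipeline standing in for `pipe`.
pipe_demo = make_pipeline(StandardScaler(), LinearRegression())

# Keep 70% of the rows for training and hold out 30% for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X_demo, Y_demo, test_size=0.3, random_state=1
)

# Fit on the training set only.
pipe_demo.fit(x_train, y_train)

# In-sample score: how well the model fits the data it was trained on.
r2_in_sample = pipe_demo.score(x_train, y_train)
# Out-of-sample score: how well it predicts data it has never seen.
r2_out_of_sample = pipe_demo.score(x_test, y_test)

print("in-sample R-squared:    ", r2_in_sample)
print("out-of-sample R-squared:", r2_out_of_sample)
```

Comparing the two printed values gives a first impression of how far in-sample performance can be trusted as a guide to performance on new data.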

# Generalization Error
- Generalization error is the error that results from using a model to predict new data.
- It is calculated by comparing the predicted values with the actual values.
- The smaller the difference between the predicted and actual values, the lower the generalization error.
- Generalization error is also known as `out-of-sample error`.
- The goal of any model is to have the lowest possible generalization error.
- How the data is divided between training and testing affects how well the test score estimates the generalization error.
- For example, suppose we take a random sample of the data, using 90% of it for training and 10% for testing.
    - The first time we run the experiment, we get a good estimate of the generalization error.
    - If we run it again, training the model on a different combination of samples, we also get a good result, but it differs from the result of the first run.
    - Repeating the experiment with yet another combination of training and testing samples, the results stay relatively close to the generalization error but remain distinct from each other.
    - Over many repetitions, the results approximate the generalization error well on average, but the precision is poor, i.e., the individual results vary widely from one another.
    - If we use fewer data points to train the model and more to test it, the estimate of the generalization performance will be less accurate, but it will have good precision.
    - If we use more data points to train the model and fewer to test it, the estimate of the generalization performance will be more accurate, but it will have poor precision.
    - To overcome this problem, we use `cross validation`.
        - It is one of the most common `out-of-sample evaluation` techniques.
        - In this method, the dataset is split into k equal groups; each group is referred to as a fold.
        - For example, k = 4 gives 4 folds.
        - Some of the folds are used as a training set, which we use to train the model, and the remaining fold is used as a test set, which we use to test the model.
        - For example, we can use three folds for training and the remaining fold for testing.
        - This is repeated until each fold has been used once for testing and, in the other rounds, for training.
        - At the end, we use the average results as the estimate of `out-of-sample error`.
        - The advantage of this method is that it matters less how the training and testing sets are partitioned.
        - The disadvantage of this method is that it is more computationally expensive than train/test split.
- The simplest way to apply cross validation is to call the `cross_val_score()` function, which performs multiple `out-of-sample` evaluations.
- This function is imported from the `sklearn.model_selection` module.
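
The repeated-split experiment described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's own experiment: the data is synthetic, a plain `LinearRegression` stands in for the model, and `repeated_split_scores` is a hypothetical helper written for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic regression data stands in for the notebook's X and Y.
X_demo, Y_demo = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)


def repeated_split_scores(test_size, n_repeats=20):
    """Hypothetical helper: test R-squared over several different random splits."""
    scores = []
    for seed in range(n_repeats):
        # A different seed gives a different combination of training/testing samples.
        x_tr, x_te, y_tr, y_te = train_test_split(
            X_demo, Y_demo, test_size=test_size, random_state=seed
        )
        model = LinearRegression().fit(x_tr, y_tr)
        scores.append(model.score(x_te, y_te))
    return np.array(scores)


# 90% training / 10% testing versus 50% training / 50% testing.
scores_small_test = repeated_split_scores(test_size=0.1)
scores_large_test = repeated_split_scores(test_size=0.5)

for name, s in [("10% test", scores_small_test), ("50% test", scores_large_test)]:
    print(f"{name}: mean R-squared = {s.mean():.3f}, std across splits = {s.std():.3f}")
```

The standard deviation across splits is a rough measure of the precision discussed above; one would typically expect it to be larger when the test set is small, although the exact numbers depend on the data.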

In [None]:
from sklearn.model_selection import cross_val_score
scores = cross_val_score(estimator=pipe, X=X, y=Y, cv=3, n_jobs=1)

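- The call to `cross_val_score()` takes the following arguments:
    - The first argument, `estimator`, is the model we want to evaluate with `cross validation` (here the pipeline `pipe`).
    - The second argument, `X`, is the `feature data`.
    - The third argument, `y`, is the `target data`.
    - The fourth argument, `cv`, is the `number of folds`. Here, cv = 3, which means the data set is split into 3 roughly equal partitions.
    - The fifth argument, `n_jobs`, is the number of jobs to run in parallel. It does not choose the metric; that is the role of the optional `scoring` parameter. When `scoring` is omitted, the estimator's own `score()` method is used, which is `R-squared` for regressors.
- The function returns a NumPy array with one score per fold (R-squared scores here); their mean is the usual summary.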
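
To make the fold rotation concrete, the sketch below reproduces what `cross_val_score()` does for a regressor with `cv=3` by looping over `KFold` splits by hand. The synthetic data and the `pipe_demo` pipeline are assumptions standing in for the notebook's `X`, `Y`, and `pipe`; `scoring="r2"` is passed explicitly to show where the metric is selected.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and a hypothetical pipeline stand in for X, Y and pipe.
X_demo, Y_demo = make_regression(n_samples=150, n_features=4, noise=15.0, random_state=0)
pipe_demo = make_pipeline(StandardScaler(), LinearRegression())

# Manual 3-fold cross validation: each fold is the test set exactly once.
manual_scores = []
for train_idx, test_idx in KFold(n_splits=3).split(X_demo):
    fold_model = clone(pipe_demo)  # fresh, unfitted copy for every fold
    fold_model.fit(X_demo[train_idx], Y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[test_idx], Y_demo[test_idx]))
manual_scores = np.array(manual_scores)

# The same evaluation in one call, with the metric named explicitly.
library_scores = cross_val_score(pipe_demo, X_demo, Y_demo, cv=3, scoring="r2")

print("manual fold scores: ", manual_scores)
print("library fold scores:", library_scores)
print("estimate of out-of-sample R-squared:", library_scores.mean())
```

For a regressor and an integer `cv`, `cross_val_score()` uses unshuffled `KFold` splits, which is why the manual loop and the library call are expected to agree.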


- If we want the actual predicted values produced by our model, rather than the R-squared scores computed from them, we use the `cross_val_predict()` function.
- Its input parameters are largely the same as those of `cross_val_score()`, but the output is an array of predictions: each value is predicted by a model trained on the folds that did not contain that sample.

In [None]:
from sklearn.model_selection import cross_val_predict
y_pred = cross_val_predict(estimator=pipe, X=X, y=Y, cv=3, n_jobs=1)
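
A minimal, self-contained sketch of how the cross-validated predictions might be used is given below. As before, the synthetic data and `pipe_demo` are assumptions standing in for the notebook's `X`, `Y`, and `pipe`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data and a hypothetical pipeline stand in for X, Y and pipe.
X_demo, Y_demo = make_regression(n_samples=150, n_features=4, noise=15.0, random_state=0)
pipe_demo = make_pipeline(StandardScaler(), LinearRegression())

# One out-of-sample prediction per row: each row is predicted by the model
# that was trained on the two folds not containing it.
y_pred_demo = cross_val_predict(pipe_demo, X_demo, Y_demo, cv=3)

# The predictions can be inspected directly or scored as a whole.
overall_r2 = r2_score(Y_demo, y_pred_demo)

print("first five predictions:", y_pred_demo[:5])
print("first five targets:    ", Y_demo[:5])
print("R-squared over all cross-validated predictions:", overall_r2)
```

Note that an R-squared computed over all pooled predictions is generally not identical to the mean of the per-fold scores returned by `cross_val_score()`, because each fold score is computed against that fold's own mean of the target.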