# Error Measurement

So far, we have used a variety of measures to compare models or judge how well a model performed its task. Now, we will analyze best practices for judging the accuracy of forecasts, emphasizing the specific issues regarding **time series** data.

For those new to time series forecasting, the most important thing to understand is that standard cross-validation is usually not advised. You cannot build training, validation, and test sets by randomly sampling data points for each set in a time-agnostic way.
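As a minimal sketch of the alternative, the split below keeps the observations in time order: the oldest points go to training and the newest to testing. The function name `chronological_split` and its parameters `n_val` and `n_test` are illustrative assumptions, not something defined elsewhere in this chapter.

```python
def chronological_split(observations, n_val, n_test):
    """Split time-ordered observations into train, validation, and test sets.

    The oldest points go to training and the newest to testing.
    Nothing is shuffled, so no set contains points older than the set before it.
    """
    n_train = len(observations) - n_val - n_test
    if n_val < 0 or n_test < 0 or n_train <= 0:
        raise ValueError("n_val and n_test must be non-negative and leave training data")
    train = observations[:n_train]
    val = observations[n_train:n_train + n_val]
    test = observations[n_train + n_val:]
    return train, val, test


# Toy series: the values double as time indices so the ordering is easy to see.
series = list(range(10))
train, val, test = chronological_split(series, n_val=2, n_test=2)
print(train, val, test)  # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```

The sizes are given as counts rather than fractions only to keep the boundaries unambiguous in this sketch.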

However, things are even more complicated than that. You need to think about how different data samples relate to each other in time, even when they appear independent. For example, suppose you are working on a **time series** classification task, so that you have many separate **time series** samples, each of which is its own data point. It may be tempting to think that in this case you can randomly assign **time series** to the training, validation, and test sets, but this does not work. The problem with this approach is that it does not mirror how you would use your model: in practice, the model is trained on earlier data and tested on later data.
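One way to respect time when each sample is a whole series is to assign series to sets by the period they cover. The sketch below assumes each sample carries a start and end date; the function `split_series_by_period`, the sample identifiers, and the dates are all hypothetical and only illustrate the idea.

```python
from datetime import date


def split_series_by_period(samples, val_start, test_start):
    """Assign whole series to train/validation/test by the period they cover.

    `samples` is a list of (series_id, start_date, end_date) tuples.
    A series that straddles a boundary is dropped rather than assigned,
    so no set contains observations from a later set's period.
    """
    train, val, test, dropped = [], [], [], []
    for series_id, start, end in samples:
        if end < val_start:
            train.append(series_id)
        elif start >= val_start and end < test_start:
            val.append(series_id)
        elif start >= test_start:
            test.append(series_id)
        else:
            dropped.append(series_id)
    return train, val, test, dropped


# Hypothetical samples, each a separate series covering a few months.
samples = [
    ("a", date(2017, 1, 1), date(2017, 3, 31)),
    ("b", date(2017, 4, 1), date(2017, 6, 30)),
    ("c", date(2017, 7, 1), date(2017, 9, 30)),
    ("d", date(2017, 10, 1), date(2017, 12, 31)),
    ("e", date(2017, 6, 1), date(2017, 8, 31)),  # straddles the validation boundary
]
train_ids, val_ids, test_ids, dropped_ids = split_series_by_period(
    samples, val_start=date(2017, 7, 1), test_start=date(2017, 10, 1)
)
print(train_ids, val_ids, test_ids, dropped_ids)  # ['a', 'b'] ['c'] ['d'] ['e']
```

Dropping series that straddle a boundary is a conservative design choice for this sketch; truncating them at the boundary would be another option.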

We do not want future information to leak into the model, because that is not how modeling works in production. In turn, leakage means that the forecast error we measure will be lower in testing than in production, because during testing cross-validation will have fed the model information from the future.
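The effect can be seen on a toy example. The sketch below compares a shuffled split with a chronological one on a deterministic trending series, using a deliberately naive forecaster that copies the value of the training point closest in time. The series, the forecaster `nearest_in_time_forecast`, and the split sizes are assumptions chosen for illustration only.

```python
import random


def nearest_in_time_forecast(train_times, train_values, t):
    """Predict the value at time t from the training point closest in time."""
    i = min(range(len(train_times)), key=lambda k: abs(train_times[k] - t))
    return train_values[i]


def mean_absolute_error(times, values, train_idx, test_idx):
    """Mean absolute error of the naive forecaster on the test indices."""
    train_times = [times[i] for i in train_idx]
    train_values = [values[i] for i in train_idx]
    errors = [
        abs(values[i] - nearest_in_time_forecast(train_times, train_values, times[i]))
        for i in test_idx
    ]
    return sum(errors) / len(errors)


# Toy series with a steady upward trend.
times = list(range(100))
values = [0.5 * t for t in times]

# Chronological split: train on the first 80 points, test on the last 20.
chrono_train = list(range(80))
chrono_test = list(range(80, 100))

# Shuffled split: test points are scattered among the training points,
# so every test point has a training neighbor from its own near future or past.
indices = list(range(100))
random.Random(0).shuffle(indices)
random_test = sorted(indices[:20])
random_train = sorted(indices[20:])

mae_chrono = mean_absolute_error(times, values, chrono_train, chrono_test)
mae_random = mean_absolute_error(times, values, random_train, random_test)
print(mae_chrono)               # 5.25
print(mae_random < mae_chrono)  # True
```

The shuffled split reports the smaller error, yet only the chronological split resembles what the forecaster would face when asked about dates it has never seen.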

Let's look at a realistic scenario of how this could happen. Imagine you are training an air quality detector for major cities in the Western US. In your training set, you include all data from 2017 and 2018 for San Francisco, Salt Lake City, Denver, and San Diego. In your test set, you include the same date range for Las Vegas, Los Angeles, Oakland, and Phoenix. You find that your air quality model does particularly well for Las Vegas and Los Angeles, and that it does extremely well for 2018 overall. Great!

Then you try to replicate the model training process on data from previous decades and find that it does not test as well as it did in the original run. You then remember the record-breaking wildfires in Southern California in 2018 and realize that they were "baked into" the original train/test run: because the training and test sets covered the same dates, your training set gave you a window into the future. This is precisely why we should avoid standard cross-validation.
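A simple sanity check would have flagged the problem in this scenario: the latest training date should come before the earliest test date. The sketch below applies that check to the split by city described above and to a split by time, which is one possible alternative and an assumption of this example. The helper `split_is_time_ordered` is hypothetical.

```python
from datetime import date


def split_is_time_ordered(train_dates, test_dates):
    """Return True only if every test date is later than every training date."""
    return max(train_dates) < min(test_dates)


# Split by city over the same date range, as in the scenario above.
# Only the first and last dates of each set are needed for the check.
by_city_train = [date(2017, 1, 1), date(2018, 12, 31)]
by_city_test = [date(2017, 1, 1), date(2018, 12, 31)]

# Alternative: split by time, training on 2017 and testing on 2018.
by_time_train = [date(2017, 1, 1), date(2017, 12, 31)]
by_time_test = [date(2018, 1, 1), date(2018, 12, 31)]

print(split_is_time_ordered(by_city_train, by_city_test))  # False
print(split_is_time_ordered(by_time_train, by_time_test))  # True
```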

There are cases where leaking information from the future back into a model is not a problem. For example, if you only want to probe the dynamics of a **time series** by testing how well it can be forecast, you are not trying to make a forecast per se, but rather testing the best possible fit of a given model to the data. In this case, including future data helps you understand the dynamics, although you should be wary of overfitting. Arguably, even in this case, maintaining a valid test set, which requires that no information leak backward from the future, would still raise concerns about **time series** and cross-validation.

With this general commentary complete, let's turn to a concrete example of splitting data for training, validating, and testing a model. Next, we'll look more generally at how to determine when a forecast is good enough, or as good as it can get. We will also examine how to estimate the uncertainty of our forecasts when using techniques that do not directly produce an uncertainty or error measure as part of their output. We will end the chapter with a list of pitfalls to review when building your **time series** model or preparing to put it into production.