In studies on machine learning and statistical analysis, the focus is predominantly on model accuracy. While accuracy should typically be the primary concern when evaluating a model, computational performance sometimes becomes critical, for example when working with large data sets or with models that are widely deployed to serve large populations of client applications.

**Time series** data sets can become so large that you cannot perform an analysis at all, or cannot perform it properly, because it demands more computational resources than you have available. In these cases, many organizations do one of the following:

- maximize computational resources (expensive and often wasteful, both economically and environmentally);
- conduct the project poorly (insufficient hyperparameter tuning, insufficient data, etc.);
- do not undertake the project at all.

None of these options is satisfactory, especially when you are just starting out with a new data set or a new analytical technique. It can be frustrating not to know whether your failures are the result of bad data, a thorny problem, or a lack of resources. Fortunately, we will cover some workarounds that expand your options for very demanding analyses or huge data sets.

The purpose of this notebook is to offer some considerations on how to reduce the computational resources required for training or inference with a specific model. Most of the time, these questions are specific to a particular data set, to the resources you have available, and to your accuracy and speed goals. In this chapter, we will address these concerns, in the hope that they cover at least part of the problems you encounter and can inspire future brainstorming. These considerations come to the fore once you have completed your first few rounds of analysis and modeling; they should not be a priority when you are approaching a problem for the first time. However, when the time comes to put something into production or to scale up a small research project, you should revisit these concerns frequently.

## Working with Tools Built for General Use Cases

One challenge with time series data is that most tools, particularly those for machine learning, are built for a more general use case, and most illustrative examples show the use of cross-sectional data. As a result, these machine learning methods are often not as efficient as they could be with **time series** data. The solutions to your individual problems will vary, but the general ideas are the same.

### Models Built for Cross-Sectional Data Do Not "Share" Data Across Samples

In many cases, when you feed discrete samples of **time series** data to an algorithm, most often a machine learning model, you will notice that large portions of the data overlap from one sample to the next. For example, suppose you have the following data on monthly widget sales:


| Month    | Sold Widgets |
|----------|------------:|
| Jan 2014 | 11,221 |
| Feb 2014 |  9,880 |
| Mar 2014 | 14,423 |
| Apr 2014 | 16,720 |
| May 2014 | 17,347 |
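
To make the overlap concrete, here is a minimal sketch that cuts the series above into fixed-length samples. The window length of 3 is an assumption chosen only for illustration, and the variable names are hypothetical; the point is simply that neighboring samples repeat most of each other's values.

```python
# Monthly widget sales from the table above (Jan 2014 to May 2014).
sales = [11221, 9880, 14423, 16720, 17347]

# Assumed window length, chosen only for illustration.
WINDOW = 3

# Build one sample per starting month, as a model built for
# cross-sectional data would expect: each sample is an independent row.
samples = [sales[i:i + WINDOW] for i in range(len(sales) - WINDOW + 1)]

for sample in samples:
    print(sample)

# Compare how many values are stored across all samples with how many
# distinct observations the original series actually contains.
stored_values = sum(len(sample) for sample in samples)
print(f"{stored_values} values stored for {len(sales)} distinct observations")
```

With these assumptions, three samples hold nine values in total even though the series has only five observations, and the duplication grows as the window gets longer relative to the step between samples.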