# Welcome to Time Series! #

In this course, you'll learn to:
- a
- b
- c
- d

After completing the course, you'll be able to:
- forecast large, real-world datasets
- compete in Kaggle forecasting competitions

As part of the exercises, you'll get a chance to participate in our [Store Forecasting](https://www.kaggle.com/c/29781) Getting Started competition. In this competition, you're tasked with forecasting sales for *Corporación Favorita* (a large Ecuador-based grocery retailer) in almost 1800 product categories.

# What is a Time Series? #

A **time series** is simply a set of observations recorded over time. Typically, the observations are taken over some common period, like daily or monthly, which we refer to as the "frequency" of the series. In this course, we'll represent a time series as a Pandas `Series` object with either a `PeriodIndex` (spans of time) or a `DatetimeIndex` (instants in time).
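
To make the difference between the two index types concrete, here is a minimal sketch. The values and dates are made up for illustration and aren't taken from the course data.

```python
import pandas as pd

# Three monthly observations (made-up values, for illustration only).
values = [10.0, 12.0, 11.0]

# DatetimeIndex: each label is an instant (here, the start of each month).
instants = pd.Series(
    values,
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)

# PeriodIndex: each label is a whole span of time (here, an entire month).
spans = instants.to_period("M")

print(type(instants.index).__name__)
print(type(spans.index).__name__)
# A period knows where its span begins and ends.
print(spans.index[0].start_time, spans.index[0].end_time)
```

A `PeriodIndex` is often the more natural choice when an observation summarizes a whole interval, like total sales for a month.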

Here are a few entries from a time series of monthly automobile sales in the United States:

In [None]:
#$HIDE_INPUT$
import pandas as pd

# Read monthly automobile sales, index by monthly periods, keep 2009 onward
autos = (
    pd.read_csv(
        "../input/ts-course-data/us-retail-sales.csv",
        usecols=["Month", "Automobiles"],
        parse_dates=["Month"],
        index_col="Month",
    )
    .to_period("M")
    .loc["2009":]
)

autos.head()

# Time Dependence and Serial Dependence #

A time series' index is no less important than its observations. Indexing by time is what gives a time series its characteristically rich behavior. It does this in two ways: first, by allowing a relationship between the times in the index and the values of the observations -- a **time dependence**; and, second, by allowing a relationship among observations through the ordering it imposes -- a **serial dependence**.

Rephrasing, a time series exhibits:
- **time dependence** if the value of an observation can be predicted from the time it occurred, and,
- **serial dependence** if the value of an observation can be predicted from previous observations.

Practically, this means we can model time dependence by using the time index as a feature and serial dependence by using "lagged" copies of the observations as features. (We'll learn how in future lessons.)
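
As a rough preview, the two kinds of features might be built like this. This is only a sketch on a tiny made-up series; the column names `time` and `lag_1` are our own choices for illustration.

```python
import numpy as np
import pandas as pd

# A tiny made-up daily series, for illustration only.
y = pd.Series(
    [3.0, 5.0, 4.0, 6.0, 8.0],
    index=pd.period_range("2021-01-01", periods=5, freq="D"),
    name="y",
)

features = pd.DataFrame(
    {
        # Time dependence: a "time step" counting observations from the start.
        "time": np.arange(len(y)),
        # Serial dependence: the previous observation, shifted forward one step.
        "lag_1": y.shift(1),
    },
    index=y.index,
)

print(features)
```

Notice that the first value of `lag_1` is missing: the first observation has no previous observation to use.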

Data visualization helps us to know what kinds of relationships we should try to model. By plotting a time series' observations against its index, we can see its time dependence. By plotting a time series' observations against past observations, we can see its serial dependence.

<figure style="padding: 1em;">
<img src="https://i.imgur.com/qL8h0Qi.png" width="800" alt="">
<figcaption style="text-align: center; font-style: italic"><center><strong>Left: </strong>This series has an obvious dependence on time, both in its upward movement (trend) and in its periodic movement (seasonality). <strong>Right: </strong>This series has serial dependence, shown by the strong linear correlation between its observations and past observations.
</center></figcaption>
</figure>

Sometimes, some pattern in a time series can appear as both time dependence and serial dependence. The trend in the *Auto Sales* series, for instance, appears both as the upward movement in the figure on the left and as the strong linear correlation in the figure on the right. This is just to say that these two kinds of dependence aren't entirely distinct (which sometimes makes modeling choices difficult), though it is possible to have one without the other.
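
We can check this overlap numerically. The sketch below uses a made-up trended series (not the *Auto Sales* data) and measures both kinds of dependence with a simple correlation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
time = np.arange(100)

# A made-up series: a linear trend plus a little noise.
trended = pd.Series(0.5 * time + rng.normal(scale=1.0, size=100))

# Time dependence: correlation between the values and the time index.
corr_with_time = np.corrcoef(time, trended)[0, 1]

# Serial dependence: correlation between each value and the one before it.
corr_with_lag = trended.autocorr(lag=1)

print(f"correlation with time:  {corr_with_time:.3f}")
print(f"correlation with lag 1: {corr_with_lag:.3f}")
```

Both correlations come out high, even though the only structure we put into the series was a trend.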

Time and serial dependence are fundamental to time series. They are what make forecasting different from other machine learning tasks, and uniquely challenging among them. Time and serial dependence will inform almost everything we do in this course.

# Time Series Decomposition and Additive Models #

A profitable strategy when working with a time series is to separate out its various patterns of dependence. *Trend* and *seasonality* are two kinds of time dependence especially common in business time series. (We'll learn more about them in Lessons 2 and 3.) They give the "Season-Trend" decomposition:

<figure style="padding: 1em;">
<img src="https://i.imgur.com/7JZjtWu.png" width="800" alt="">
<figcaption style="text-align: center; font-style: italic"><center>Season-Trend decomposition of the <em>Auto Sales</em> time series. The "residuals" are another name for the error component (the "left-over" part).
</center></figcaption>
</figure>

This kind of decomposition is an *additive* decomposition: add up all the components and you get back the original series.
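
Here is a minimal sketch of the idea on made-up data. It estimates the trend with a moving average and the seasonality with day-of-week averages. This is a simple stand-in chosen for illustration, not the method used to produce the figure above.

```python
import numpy as np
import pandas as pd

# A made-up daily series with a linear trend and a weekly cycle.
steps = np.arange(56)
rng = np.random.default_rng(0)
y = pd.Series(
    0.2 * steps                              # trend
    + 3.0 * np.sin(2 * np.pi * steps / 7)    # weekly seasonality
    + rng.normal(scale=0.5, size=56),        # noise
    index=pd.date_range("2022-01-03", periods=56, freq="D"),
)

# Trend: a centered 7-day moving average smooths out the weekly cycle.
trend = y.rolling(window=7, center=True, min_periods=1).mean()

# Seasonality: the average detrended value for each day of the week.
detrended = y - trend
seasonal = detrended.groupby(detrended.index.dayofweek).transform("mean")

# Residuals: whatever is left over.
residual = y - trend - seasonal

# Adding the components back together recovers the original series.
reconstructed = trend + seasonal + residual
print(np.allclose(reconstructed, y))
```

Because the residuals are defined as the left-over part, the components always sum back to the original series, however well or poorly the trend and seasonality were estimated.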

The idea that a time series can be represented as a sum of components motivates most of the models we'll create in this course. The *Prophet* algorithm we use in this lesson models a time series as:

```
target = trend + seasonality + holidays + error
```

where `error` is just anything that wasn't captured by the other components.

In future lessons, we'll create models similar to the Prophet model through a combination of feature engineering and linear regression. We'll also see how we can use this additive decomposition strategy to create *hybrid models* that can accommodate combinations of algorithms, like XGBoost stacked on top of a trend model.
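
To give a feel for the hybrid idea, here is a rough sketch on made-up data. The second stage is just a per-weekday average of the residuals, a deliberately simple stand-in for a learner like XGBoost.

```python
import numpy as np

# Made-up daily data: a linear trend plus a repeating weekly pattern.
time = np.arange(70)
day_of_week = time % 7
weekly_pattern = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -2.0, -2.0])
y = 10.0 + 0.3 * time + weekly_pattern[day_of_week]

# Stage 1: fit a linear trend with ordinary least squares.
slope, intercept = np.polyfit(time, y, deg=1)
trend_fit = intercept + slope * time

# Stage 2: fit a second model to what the trend left unexplained.
# Here that model is only the mean residual for each day of the week;
# a real hybrid might use XGBoost or another learner at this stage.
residuals = y - trend_fit
weekly_fit = np.array([residuals[day_of_week == d].mean() for d in range(7)])

# The hybrid prediction is the sum of the two stages.
y_hybrid = trend_fit + weekly_fit[day_of_week]

print("max error, trend only:", np.abs(y - trend_fit).max())
print("max error, hybrid:    ", np.abs(y - y_hybrid).max())
```

The second stage only ever sees the residuals of the first, which is what lets us combine two quite different algorithms in one additive model.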

# Example - Tunnel Traffic #

Let's get a start on how to approach these problems by applying Facebook's Prophet forecaster to the *Tunnel Traffic* dataset. *Tunnel Traffic* is a time series describing the number of vehicles traveling through the Baregg Tunnel in Switzerland each day from November 2003 to November 2005. The hidden cell sets up the example.

In [None]:
#$HIDE_INPUT$
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt

# Set Matplotlib defaults
# On Matplotlib 3.6 and later this style is named "seaborn-v0_8-whitegrid".
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=14,
    titlepad=10,
)
%config InlineBackend.figure_format = 'retina'

data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(
    data_dir / "tunnel.csv",
    parse_dates=["Day"],
    index_col="Day",
)

plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)

tunnel.plot(
    title="Tunnel Traffic",
    ylabel="Number of Vehicles",
    **plot_params,
);

Our data is in a dataframe called `tunnel`. Note the two columns: a time index `Day`, and a column of observations `NumVehicles`.

In [None]:
tunnel

As mentioned, Prophet is an additive model (similar to ordinary linear regression). We'll create a model using the trend and seasonal components we discussed above (leaving out holidays for now):

```
target = trend + seasonality + error
```

The trend and seasonality are components expressing time dependence, so for this example the only thing we'll need for training data is the target and the time index.

The code cell below illustrates the Prophet workflow:

In [None]:
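# Note: since version 1.0 the package is published as `prophet`, so on
# newer installs the import is `from prophet import Prophet`.
from fbprophet import Prophet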

# Prophet requires the training data to be in a dataframe like this:
df = pd.DataFrame({
    "ds": tunnel.index,  # a column of timestamps named "ds", the only feature
    "y": tunnel.NumVehicles,  # a column of observations named "y", the target
})

# You can customize the Prophet model in a number of ways, but the
# defaults will work well here.
prophet = Prophet()
prophet.fit(df)

# Now we create predictions from the training set, the fitted values.
y_fit = prophet.predict()

# Prophet returns a complete decomposition. `yhat` is the overall
# predicted value. `weekly` and `yearly` are seasonal components.
columns = ["ds", "trend", "weekly", "yearly", "yhat"]
y_fit[columns].head()

Let's take a look at the components Prophet found. Do you recognize these characteristics in the original series?

In [None]:
prophet.plot_components(y_fit);

Prophet has a special method to create inputs for forecasting, which we will use to make a 90-day forecast:

In [None]:
# Call `make_future_dataframe` after fitting the model to extend the
# index of the training data. This will extend it 90 days into the
# future:
df_future = prophet.make_future_dataframe(periods=90)

# We create the forecasts the same as before, but this time passing in
# the new dataframe.
y_pred = prophet.predict(df_future)

# Prophet also computes uncertainty intervals, but we'll ignore those.
prophet.plot(y_pred, uncertainty=False, figsize=(11, 5));

The fit Prophet found is pretty good. The weekly and yearly seasonality is clearly represented, though it does seem to have underestimated the amount of variation in some places, especially around the New Year's holiday. If we wanted to improve this model, we could try incorporating some holiday features.
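
As a sketch of what that could look like: Prophet accepts holidays as a dataframe with a `holiday` column and a `ds` column, plus optional `lower_window` and `upper_window` columns. The table below is hypothetical. The dates and the one-day windows are our own choices for illustration, not something tuned for this dataset.

```python
import pandas as pd

# Hypothetical holiday table covering the span of the Tunnel Traffic data,
# plus the New Year's Day that falls inside the 90-day forecast period.
new_years = pd.DataFrame({
    "holiday": "new_years",
    "ds": pd.to_datetime(["2004-01-01", "2005-01-01", "2006-01-01"]),
    # Optionally extend the effect to the day before and the day after.
    "lower_window": -1,
    "upper_window": 1,
})

print(new_years)

# The table would then be passed in when creating the model, for example:
# prophet = Prophet(holidays=new_years)
```

With a table like this, the fitted decomposition would gain a holiday component alongside the trend and the seasonal components.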

Prophet can model other components besides trend and seasonality, including holidays, changepoints, and categorical features. The library also has a number of useful plotting and diagnostic utilities. You can learn more from the official docs.

# Your Turn #