# Welcome to Time Series! #

Have you ever wished you could predict the future? In this course you'll learn how!

Time 

You'll learn how to:
- make forecasts with the powerful and easy-to-use *Prophet* model
- model time-dependent data through feature engineering
- create custom forecasting models
- avoid common mistakes with time-dependent data
- forecast multiple series by combining *local* and *global* models

In the exercises you'll apply your new knowledge to the *1C Store Sales* dataset. Afterwards, complete our guided project for the *Predict Future Sales* Learn competition.

This course focuses on *forecasting* (predicting the future), but you can also apply many of the techniques you'll learn to other problems, like time series classification.

# The Nature of Forecasting #

A time series is simply data that's been observed over time. Examples of time series are:
- the wholesale price of avocados each week
- the number of visitors to Wikipedia each minute
- the average daily rainfall in Seattle

Time series are *sequential*: the order of the observations matters.

Traditional machine learning assumes that the data used for training and the data used for inference are drawn from the same population, so that they are, for all practical purposes, the same. In this sense, traditional machine learning is a problem of *interpolation*: filling in the blanks and generalizing from the training data to the world at large. A forecasting problem, by contrast, is about *extrapolation*: making predictions about a future population that might look quite different from the one you trained your model on.

Time series are characterized by *change* over time. In particular, we don't assume that each data point is drawn independently from some larger population, or that any data point is interchangeable with any other. That independence is what justifies random-sampling techniques like shuffled cross-validation or the bootstrap samples behind random forests. Time series data are *dependent* data: with time series, order matters.
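To make the cross-validation point concrete, here is a small sketch using scikit-learn (not otherwise used in this lesson) on twelve time-ordered positions. A shuffled `KFold` lets the model train on observations that come *after* the ones it is tested on, while `TimeSeriesSplit` always tests on observations later than everything it trained on.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Twelve observations in time order (positions 0..11 stand in for timestamps)
X = np.arange(12).reshape(-1, 1)

# Shuffled K-fold: some training rows come from *after* the test rows,
# so the model would effectively be "trained on the future"
shuffled = KFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in shuffled.split(X):
    print("KFold           train:", train_idx, "test:", test_idx)

# TimeSeriesSplit: every test fold lies strictly after its training rows
time_ordered = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in time_ordered.split(X):
    print("TimeSeriesSplit train:", train_idx, "test:", test_idx)
```

Each `TimeSeriesSplit` fold grows the training window forward in time, which mimics how a forecaster is actually used.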

In this course, you'll learn techniques to capture this time dependence. You'll learn how to use feature engineering to transform a forecasting problem into an ordinary supervised machine learning problem. By the end of the course, you'll know how to make forecasts with any of the machine learning libraries you may have used before, like XGBoost, scikit-learn, or Keras, and how to scale these techniques to forecast hundreds or thousands of time series simultaneously.
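As a minimal sketch of that idea, assuming pandas and scikit-learn and a made-up toy series, *lag features* turn past values into columns, so that predicting the next value becomes an ordinary regression problem. The column names `lag_1` and `lag_2` are illustrative, not names used by the course.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# A toy daily series (illustrative values only)
series = pd.Series(
    [5.0, 7.0, 6.0, 8.0, 9.0, 8.0, 10.0, 11.0],
    index=pd.date_range("2021-01-01", periods=8, freq="D"),
)

# Feature engineering: each row's features are the values from previous days
frame = pd.DataFrame({
    "lag_1": series.shift(1),  # yesterday's value
    "lag_2": series.shift(2),  # the value two days ago
    "target": series,          # today's value, the thing to predict
}).dropna()  # the first two days have no history, so drop them

# Now it's an ordinary supervised problem: features X -> target y
X, y = frame[["lag_1", "lag_2"]], frame["target"]
model = LinearRegression().fit(X, y)

# Forecast the next (unseen) day from the two most recent observations
next_X = pd.DataFrame({"lag_1": [series.iloc[-1]], "lag_2": [series.iloc[-2]]})
print(model.predict(next_X))
```

Any regressor with a `fit`/`predict` interface could stand in for `LinearRegression`; the time dependence lives entirely in the features.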

# Time Series Components #

The time dependence of a series usually shows up in a few typical ways.

Two patterns are especially common: trend and seasonality. **Seasonality** in a time series refers to a periodic rise and fall in the series, typically following the calendar or clock... Temperatures tend to rise in the morning and fall at night, and rise in the first half of the year and fall in the second. The **trend** of a time series is the long-term trajectory of the series, any movement in the series over a period longer than its seasonality. Global temperatures rise and fall over the seasons, but have been trending upwards long-term.

What remains after accounting for the trend and seasonality is the irregular part of the series, sometimes called the *error* or *remainder*. Sometimes the irregular part of a series contains predictable patterns of its own that can be forecast along with the trend and seasonality.

<img>TS decomposition</img>
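One simple way to carry out such a decomposition is sketched below, assuming an additive model and a known weekly period on synthetic data. It uses a centered moving average for the trend and day-of-week averages for the seasonality. This is a classical decomposition, not the method Prophet uses; libraries like statsmodels offer fuller versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + weekly seasonality + noise (illustrative only)
rng = np.random.default_rng(0)
days = pd.date_range("2021-01-01", periods=8 * 7, freq="D")
t = np.arange(len(days))
weekly = np.tile([0.0, 1.0, 2.0, 3.0, 2.0, -3.0, -5.0], 8)  # repeats every 7 days, sums to 0
y = pd.Series(100 + 0.5 * t + weekly + rng.normal(0, 0.3, len(t)), index=days)

# 1. Trend: a centered 7-day moving average averages out one full weekly cycle
trend = y.rolling(window=7, center=True).mean()  # NaN for the first and last 3 days

# 2. Seasonality: average the detrended values for each day of the week,
#    then center them so the seasonal effects sum to zero over a week
detrended = y - trend
seasonal_means = detrended.groupby(detrended.index.dayofweek).mean()
seasonal_means -= seasonal_means.mean()
seasonal = pd.Series(seasonal_means.loc[y.index.dayofweek].to_numpy(), index=y.index)

# 3. Remainder: whatever the trend and seasonality don't explain
remainder = y - trend - seasonal
```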

A series can also have additional characteristics, like **changepoints** and **holidays**.

# Forecasting with Prophet #

Facebook developed their Prophet forecaster for ... It's a great way to get started with time series forecasting.

In *additive* mode, the model is:
`target = trend + seasonality + remainder`

In *multiplicative* mode, the model is:
`target = trend * seasonality * remainder`
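A small worked example with made-up component values shows the difference in units: additive components are measured in the same units as the target, while multiplicative ones are factors around 1. It also checks that taking logarithms turns a multiplicative model into an additive one.

```python
import numpy as np

# Hypothetical component values at a single time step
trend = 200.0

# Additive: components are in the same units as the target
additive = trend + 20.0 + (-4.0)      # seasonality adds 20, remainder subtracts 4

# Multiplicative: components are factors around 1
multiplicative = trend * 1.10 * 0.98  # seasonality is +10%, remainder is -2%

# A multiplicative model is an additive model on the log scale
log_sum = np.log(trend) + np.log(1.10) + np.log(0.98)
print(additive, multiplicative, np.isclose(np.log(multiplicative), log_sum))
```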

<img>Additive and multiplicative change</img>

**Additive change** means the size of the fluctuations is *constant*: the amount of rise and fall doesn't depend on the current level of the series, just as a river's seasonal rise and fall doesn't depend on the river's current level. This kind of change is typical of weather patterns or short-term energy demand.

**Multiplicative change** means the size of the fluctuations is *proportional to the level* of the series. This kind of change is typical of economic time series, where larger populations exhibit larger changes. Any change measured as a percentage is multiplicative change.
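The sketch below, using NumPy and invented parameters (5% yearly growth, a yearly cycle in monthly data), builds one series of each kind around the same growing trend and measures the seasonal swing in each year. The additive swing stays the same size, while the multiplicative swing grows with the level.

```python
import numpy as np

t = np.arange(48)                             # 4 years of monthly steps
level = 100 * 1.05 ** (t / 12)                # trend growing 5% per year
season = np.sin(2 * np.pi * t / 12)           # yearly cycle between -1 and 1

additive = level + 10 * season                # swings of +/-10 units, whatever the level
multiplicative = level * (1 + 0.10 * season)  # swings of +/-10% of the current level

# Peak-to-trough seasonal swing within each year
for name, series in [("additive", additive), ("multiplicative", multiplicative)]:
    swing = (series - level).reshape(4, 12)   # remove the trend, one row per year
    print(name, swing.max(axis=1) - swing.min(axis=1))
```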

Prophet also lets you specify additional properties the trend or seasonality might have. A **trend changepoint** is an abrupt change in the trajectory of a series. You might see a changepoint in the direction of your business's revenue at the start of a new marketing campaign, for instance.
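One common way to model a changepoint, sketched here in NumPy with hypothetical names and values, is a piecewise linear trend: a term that is zero before the changepoint and grows linearly after it adds extra slope without making the trend jump. Prophet's default trend works along these lines, but it considers many candidate changepoints and fits their rate changes from the data.

```python
import numpy as np

def piecewise_linear_trend(t, base_rate, offset, changepoint, rate_change):
    """Linear trend whose slope changes by `rate_change` at time `changepoint`.

    A minimal sketch with a single, known changepoint (hypothetical helper).
    """
    t = np.asarray(t, dtype=float)
    # max(0, t - changepoint) is 0 before the changepoint and grows linearly after it,
    # so the extra slope kicks in only once the changepoint has passed
    return offset + base_rate * t + rate_change * np.maximum(0.0, t - changepoint)

t = np.arange(100)
# Revenue grows 1 unit/day, then 3 units/day after a campaign starts on day 60 (hypothetical)
trend = piecewise_linear_trend(t, base_rate=1.0, offset=50.0, changepoint=60, rate_change=2.0)
print(np.diff(trend)[[0, 59, 60, 98]])  # day-to-day slope before and after the changepoint
```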

You can also tell Prophet to model **holidays**. Holidays include traditional holidays like New Year's Day, but could be any sort of exceptional event -- a 50%-off sale, a natural disaster, or an election, say.
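Prophet accepts holidays as a dataframe with a `holiday` column naming the event and a `ds` column of dates, plus optional `lower_window` and `upper_window` columns that extend each event to the surrounding days. A sketch of such a table follows; the event names and dates are illustrative, and the call to Prophet is left commented out so the snippet runs with pandas alone.

```python
import pandas as pd

# One row per occurrence of an event (names and dates are illustrative)
holidays = pd.DataFrame({
    "holiday": ["new_year", "new_year", "half_price_sale"],
    "ds": pd.to_datetime(["2004-01-01", "2005-01-01", "2004-06-15"]),
    "lower_window": [-1, -1, 0],  # New Year's effect also covers the day before
    "upper_window": [1, 1, 2],    # ...and the day after; the sale spans 3 days in total
})

# Passing the table to Prophet (requires the prophet package):
# from prophet import Prophet
# model = Prophet(holidays=holidays)
```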

<img>Changepoints and holidays.</img>

# Example - Tunnel Traffic #

Let's see how Prophet works by applying it to the *Tunnel Traffic* dataset. This dataset contains the daily ... for a tunnel in Switzerland. The next hidden cell loads up the data.

In [None]:
#$HIDE_INPUT$
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt

# Set Matplotlib defaults
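# "seaborn-whitegrid" was renamed "seaborn-v0_8-whitegrid" in Matplotlib 3.6
try:
    plt.style.use("seaborn-v0_8-whitegrid")
except OSError:  # older Matplotlib only knows the original name
    plt.style.use("seaborn-whitegrid")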
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=14,
    titlepad=10,
)

data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])

plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)

tunnel.set_index("Day").plot(
    title="Number of Vehicles per Day",
    **plot_params,
);

Our data is in a dataframe called `tunnel`. Note the two columns: a column of timestamps, `Day`, and a column of observations, `NumVehicles`.

In [None]:
tunnel.head()

The code cell below illustrates how to instantiate the Prophet model and fit it to the time series.

In [None]:
try:
    from prophet import Prophet  # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # earlier releases were published as "fbprophet"

# Prophet requires the training data to be in a dataframe like this:
df = pd.DataFrame({
    "ds": tunnel.Day,  # column of timestamps named "ds"
    "y": tunnel.NumVehicles,  # column of observations named "y"
})

# Describe trend and seasonality when you instantiate the model
prophet = Prophet(seasonality_mode="additive")
prophet.fit(df)

# With no argument, predict() returns the fitted values over the training history
y_pred = prophet.predict()
prophet.plot_components(y_pred);  # show the trend and seasonality decomposition

Take a look at the components Prophet found. Do you recognize these characteristics in the original series?

Prophet has a special method to create dataframes for forecasting. After fitting the model, call `make_future_dataframe` with the number of timesteps into the future you want predictions for. By default, the dataframe it returns contains the training dates followed by the new future dates. We'll make a 90-day forecast.

In [None]:
df_future = prophet.make_future_dataframe(periods=90)
y_forecast = prophet.predict(df_future)

prophet.plot(y_forecast, uncertainty=False, figsize=(11, 5));

The fit Prophet found is pretty good: both the weekly and the yearly seasonality are clearly represented. It does seem to have underestimated the amount of variation, though, especially around the New Year's holiday. You'll learn how to include such holidays in the model in the exercise.

Examining the residuals of a regression (the differences between the observed values and the model's fitted values) is a great way to see how you might be able to improve your model.

In [None]:
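# The forecast covers the training dates plus 90 future days;
# drop the last 90 rows so the fitted values line up with the observations
residuals = pd.Series(
    (df.y - y_forecast.yhat.iloc[:-90]).to_numpy(),
    index=pd.DatetimeIndex(df.ds),
)
residuals.plot(title="Remainder (Residuals)", **plot_params);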

Here we can see more clearly the regular parts of the series that Prophet wasn't able to model: the New Year's holiday, and perhaps something occurring around April.

---------------------------------------------------------------------------

Prophet is a powerful and flexible model for time series forecasting, and can do much more than we've described here. 

In the next two lessons, we'll see how to replicate Prophet. In lessons 4 and 6, we'll go beyond.

# Your Turn #