# Get started

<a href="https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/get_started.ipynb">
    <img src="https://mybinder.org/badge_logo.svg"  align='left'>
</a>

This notebook contains simple examples of a time series forecasting pipeline
built with the ETNA library.

**Table of Contents**

* [Creating TSDataset](#chapter1)
* [Plotting](#chapter2)
* [Forecasting single time series](#chapter3)
    * [Simple forecast](#section_3_1)
    * [Prophet](#section_3_2)
    * [Catboost](#section_3_3)
* [Forecasting multiple time series](#chapter4)
* [Pipeline](#chapter5)

In [None]:
import warnings

warnings.filterwarnings(action="ignore", message="Torchmetrics v0.9")

## 1. Creating TSDataset <a class="anchor" id="chapter1"></a>

Let's load and look at the dataset

In [None]:
import pandas as pd

In [None]:
original_df = pd.read_csv("data/monthly-australian-wine-sales.csv")
original_df.head()

ETNA is strict about the data format:
* the column we want to predict should be called `target`
* the column with datetime data should be called `timestamp`
* because etna is always ready to work with multiple time series, the `segment` column is also compulsory

Our library works with the special data structure TSDataset. So, before starting anything, we need to convert the classical DataFrame to TSDataset.

Let's rename the columns first

In [None]:
original_df["timestamp"] = pd.to_datetime(original_df["month"])
original_df["target"] = original_df["sales"]
original_df.drop(columns=["month", "sales"], inplace=True)
original_df["segment"] = "main"
original_df.head()

Time to convert to TSDataset!

To do this, we first need to convert the classical DataFrame to the special format that `TSDataset` expects.

In [None]:
from etna.datasets.tsdataset import TSDataset

In [None]:
df = TSDataset.to_dataset(original_df)
df.head()
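
The reshaping step can be pictured in plain Python. This is only a rough sketch of the idea, not ETNA's implementation: the rows, the segment names `a` and `b`, and the helper `to_wide` are made up for illustration, and it assumes the converted format keeps one row per timestamp with one column per (segment, feature) pair.

```python
from collections import defaultdict

# Made-up long-format rows: one row per (timestamp, segment) pair.
long_rows = [
    {"timestamp": "2020-01-01", "segment": "a", "target": 1.0},
    {"timestamp": "2020-01-01", "segment": "b", "target": 10.0},
    {"timestamp": "2020-01-02", "segment": "a", "target": 2.0},
    {"timestamp": "2020-01-02", "segment": "b", "target": 20.0},
]


def to_wide(rows):
    """Pivot long rows into {timestamp: {(segment, feature): value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["timestamp"]][(row["segment"], "target")] = row["target"]
    # Sort by timestamp so the rows come out in chronological order.
    return dict(sorted(wide.items()))


wide = to_wide(long_rows)
for timestamp, columns in wide.items():
    print(timestamp, columns)
```

Each timestamp now appears once, and every segment contributes its own `target` column.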

Now we can construct the TSDataset.

In addition to passing the dataframe, we should specify the frequency of our data.
In this case it is monthly data.

In [None]:
ts = TSDataset(df, freq="1M")

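Oops. The `1M` frequency means month end, but our timestamps fall on the first day of each month. Let's fix that by using the month-start frequency `MS`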

In [None]:
ts = TSDataset(df, freq="MS")
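
The difference between the two aliases can be sketched with the standard library alone. This assumes the pandas convention in which `M` denotes month end and `MS` denotes month start; the helpers `month_starts` and `month_ends` are hypothetical names used only here.

```python
import calendar
from datetime import date


def month_starts(year, months):
    """First calendar day of each month, analogous to the `MS` alias."""
    return [date(year, month, 1) for month in months]


def month_ends(year, months):
    """Last calendar day of each month, analogous to the `M` alias."""
    return [
        date(year, month, calendar.monthrange(year, month)[1])
        for month in months
    ]


starts = month_starts(1980, range(1, 4))
ends = month_ends(1980, range(1, 4))
print(starts)
print(ends)
```

Timestamps such as `1980-01-01` only ever match the month-start grid, which is why the month-end frequency does not fit this dataset.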

We can look at the basic information about the dataset

In [None]:
ts.info()

Or in DataFrame format

In [None]:
ts.describe()

## 2. Plotting <a class="anchor" id="chapter2"></a>

Let's take a look at the time series in the dataset

In [None]:
ts.plot()

## 3. Forecasting single time series <a class="anchor" id="chapter3"></a>

Our library contains a wide range of different models for time series forecasting. Let's look at some of them.

### 3.1 Simple forecast<a class="anchor" id="section_3_1"></a>
Let's predict the monthly values in 1994 in our dataset using the `NaiveModel`

In [None]:
train_ts, test_ts = ts.train_test_split(
    train_start="1980-01-01",
    train_end="1993-12-01",
    test_start="1994-01-01",
    test_end="1994-08-01",
)

In [None]:
from etna.models import NaiveModel

HORIZON = 8

# Fit the model
model = NaiveModel(lag=12)
model.fit(train_ts)

# Make the forecast
future_ts = train_ts.make_future(future_steps=HORIZON, tail_steps=model.context_size)
forecast_ts = model.forecast(future_ts, prediction_size=HORIZON)

Here we pass the `prediction_size` parameter to `forecast` because the first few points of `future_ts` serve as context for `NaiveModel` rather than as points to predict.
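
To see why context is needed, here is a minimal sketch of a seasonal naive forecast in plain Python. It is an illustration of the idea, not ETNA's `NaiveModel` code: the function `naive_forecast` and the toy series are assumptions, and it supposes that each prediction simply repeats the value observed `lag` steps earlier.

```python
def naive_forecast(history, lag, horizon):
    """Predict each future point as the value observed `lag` steps earlier."""
    if len(history) < lag:
        raise ValueError("need at least `lag` points of context")
    values = list(history)
    start = len(values)
    for _ in range(horizon):
        # The point `lag` steps back may itself be a forecast if horizon > lag.
        values.append(values[-lag])
    return values[start:]


history = list(range(1, 25))  # 24 made-up monthly observations
full = naive_forecast(history, lag=12, horizon=8)
# Only the last `lag` points are actually needed as context.
from_context = naive_forecast(history[-12:], lag=12, horizon=8)
print(full)
print(from_context)
```

Both calls give the same result, which shows that the last `lag` observations are all the context this kind of model requires.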

Now let's look at a metric and plot the prediction.
All the methods are already built into etna.

In [None]:
from etna.metrics import SMAPE

In [None]:
smape = SMAPE()
smape(y_true=test_ts, y_pred=forecast_ts)
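
For intuition, symmetric mean absolute percentage error can be written in a few lines. This sketch uses one common definition that ranges from 0 to 200; ETNA's `SMAPE` may differ in details such as how zero denominators are handled, and the function name `smape_sketch` is hypothetical.

```python
def smape_sketch(y_true, y_pred):
    """Symmetric MAPE in percent, using 2 * |error| / (|true| + |pred|)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    terms = []
    for true, pred in zip(y_true, y_pred):
        denominator = abs(true) + abs(pred)
        # Assumption: a point where both values are zero counts as no error.
        terms.append(0.0 if denominator == 0 else 2 * abs(pred - true) / denominator)
    return 100 * sum(terms) / len(terms)


perfect = smape_sketch([100, 200], [100, 200])
off = smape_sketch([100], [300])
print(perfect, off)
```

Lower is better: a perfect forecast scores 0, and the score grows as predictions drift away from the true values.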

In [None]:
from etna.analysis import plot_forecast

In [None]:
plot_forecast(forecast_ts, test_ts, train_ts, n_train_samples=10)

### 3.2 Prophet<a class="anchor" id="section_3_2"></a>

Now let's try to improve the forecast and predict the values with Prophet.

In [None]:
from etna.models import ProphetModel

model = ProphetModel()
model.fit(train_ts)

# Make the forecast
future_ts = train_ts.make_future(HORIZON)
forecast_ts = model.forecast(future_ts)

In [None]:
smape(y_true=test_ts, y_pred=forecast_ts)

In [None]:
plot_forecast(forecast_ts, test_ts, train_ts, n_train_samples=10)

### 3.3 Catboost<a class="anchor" id="section_3_3"></a>
And finally let's try the Catboost model.

Etna also has a wide range of transforms you may apply to your data.

Transforms are not stored in the dataset, so you should pass them explicitly to the `transform`, `inverse_transform` and `make_future` methods.

Here is how it is done:

In [None]:
from etna.transforms import LagTransform, LogTransform

lags = LagTransform(in_column="target", lags=list(range(8, 24, 1)))
log = LogTransform(in_column="target")
transforms = [log, lags]
train_ts.fit_transform(transforms)
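
The lag range above starts at 8, the same value as `HORIZON`. One likely reason is that a lag shorter than the horizon would point at values that have not been observed yet for the later forecast steps. The sketch below illustrates this in plain Python; `future_lag_features` and the toy series are hypothetical and do not reproduce `LagTransform` itself.

```python
def future_lag_features(history, lags, horizon):
    """Lag features for the next `horizon` points; None marks an unknown value."""
    n = len(history)
    rows = []
    for step in range(horizon):
        index = n + step  # position of the future point in the extended series
        row = {}
        for lag in lags:
            source = index - lag
            row[f"lag_{lag}"] = history[source] if 0 <= source < n else None
        rows.append(row)
    return rows


history = list(range(100, 124))  # 24 made-up observations
safe = future_lag_features(history, lags=[8, 12], horizon=8)
too_short = future_lag_features(history, lags=[3], horizon=8)
print(safe[-1])
print([row["lag_3"] for row in too_short])
```

With lags of at least 8, every feature for all 8 future steps comes from observed history, while a lag of 3 leaves gaps from the fourth step onward.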

In [None]:
from etna.models import CatBoostMultiSegmentModel

model = CatBoostMultiSegmentModel()
model.fit(train_ts)
future_ts = train_ts.make_future(future_steps=HORIZON, transforms=transforms)
forecast_ts = model.forecast(future_ts)
forecast_ts.inverse_transform(transforms)

In [None]:
from etna.metrics import SMAPE

smape = SMAPE()
smape(y_true=test_ts, y_pred=forecast_ts)

In [None]:
from etna.analysis import plot_forecast

train_ts.inverse_transform(transforms)
plot_forecast(forecast_ts, test_ts, train_ts, n_train_samples=10)

## 4. Forecasting multiple time series <a class="anchor" id="chapter4"></a>

In this section you can see an example of how easily etna works
with multiple time series and get acquainted with other transforms that etna contains.

In [None]:
original_df = pd.read_csv("data/example_dataset.csv")
original_df.head()

In [None]:
df = TSDataset.to_dataset(original_df)
ts = TSDataset(df, freq="D")
ts.plot()

In [None]:
ts.info()

In [None]:
import warnings

from etna.transforms import (
    MeanTransform,
    LagTransform,
    LogTransform,
    SegmentEncoderTransform,
    DateFlagsTransform,
    LinearTrendTransform,
)

warnings.filterwarnings("ignore")

log = LogTransform(in_column="target")
trend = LinearTrendTransform(in_column="target")
seg = SegmentEncoderTransform()

lags = LagTransform(in_column="target", lags=list(range(30, 96, 1)))
d_flags = DateFlagsTransform(
    day_number_in_week=True,
    day_number_in_month=True,
    week_number_in_month=True,
    week_number_in_year=True,
    month_number_in_year=True,
    year_number=True,
    special_days_in_week=[5, 6],
)
mean30 = MeanTransform(in_column="target", window=30)
transforms = [log, trend, lags, d_flags, seg, mean30]
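
As a rough picture of what date-based features look like, the sketch below derives a few flags from a single date with the standard library. It is an assumption-laden illustration: the function `date_flags` is hypothetical, the keys mirror the parameter names above rather than ETNA's actual output columns, and it assumes that Monday is day 0, so `[5, 6]` marks the weekend.

```python
from datetime import date


def date_flags(day, special_days_in_week=(5, 6)):
    """Compute a handful of calendar features for one date."""
    return {
        "day_number_in_week": day.weekday(),  # Monday is 0, Sunday is 6
        "day_number_in_month": day.day,
        "month_number_in_year": day.month,
        "year_number": day.year,
        "special_days_in_week": day.weekday() in special_days_in_week,
    }


saturday = date_flags(date(2019, 11, 2))
monday = date_flags(date(2019, 11, 4))
print(saturday)
print(monday)
```

Features like these are known for any future date, so a model can use them at forecast time without extra data.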

In [None]:
HORIZON = 30
train_ts, test_ts = ts.train_test_split(
    train_start="2019-01-01",
    train_end="2019-10-31",
    test_start="2019-11-01",
    test_end="2019-11-30",
)
train_ts.fit_transform(transforms)

In [None]:
test_ts.info()

In [None]:
from etna.models.catboost import CatBoostMultiSegmentModel

model = CatBoostMultiSegmentModel()
model.fit(train_ts)
future_ts = train_ts.make_future(future_steps=HORIZON, transforms=transforms)
forecast_ts = model.forecast(future_ts)
forecast_ts.inverse_transform(transforms)

In [None]:
smape = SMAPE()
smape(y_true=test_ts, y_pred=forecast_ts)

In [None]:
train_ts.inverse_transform(transforms)
plot_forecast(forecast_ts, test_ts, train_ts, n_train_samples=20)

## 5. Pipeline <a class="anchor" id="chapter5"></a>

Let's wrap everything into a pipeline to create an end-to-end version of the model from the previous section.

In [None]:
from etna.pipeline import Pipeline

In [None]:
train_ts, test_ts = ts.train_test_split(
    train_start="2019-01-01",
    train_end="2019-10-31",
    test_start="2019-11-01",
    test_end="2019-11-30",
)

We put the **model**, **transforms** and **horizon** into a single object, which has an interface similar to that of a model (`fit`/`forecast`)

In [None]:
model = Pipeline(
    model=CatBoostMultiSegmentModel(),
    transforms=transforms,
    horizon=HORIZON,
)
model.fit(train_ts)
forecast_ts = model.forecast()
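
Conceptually, such an object bundles the manual steps from the previous section. The toy classes below sketch that idea in plain Python; they are not ETNA's `Pipeline`, and every name (`LogTransformSketch`, `LastValueModel`, `PipelineSketch`) is hypothetical.

```python
import math


class LogTransformSketch:
    """Toy invertible transform: log(1 + x) and its inverse."""

    def transform(self, values):
        return [math.log1p(value) for value in values]

    def inverse_transform(self, values):
        return [math.expm1(value) for value in values]


class LastValueModel:
    """Toy model that repeats the last training value."""

    def fit(self, values):
        self.last_ = values[-1]
        return self

    def forecast(self, horizon):
        return [self.last_] * horizon


class PipelineSketch:
    """Apply transforms, fit the model, forecast, then undo the transforms."""

    def __init__(self, model, transforms, horizon):
        self.model = model
        self.transforms = transforms
        self.horizon = horizon

    def fit(self, values):
        for transform in self.transforms:
            values = transform.transform(values)
        self.model.fit(values)
        return self

    def forecast(self):
        predictions = self.model.forecast(self.horizon)
        # Undo the transforms in reverse order to return to the original scale.
        for transform in reversed(self.transforms):
            predictions = transform.inverse_transform(predictions)
        return predictions


sketch = PipelineSketch(LastValueModel(), [LogTransformSketch()], horizon=3)
predictions = sketch.fit([1.0, 3.0, 7.0]).forecast()
print(predictions)
```

The caller only sees `fit` and `forecast`; the transform, future-building, and inverse-transform steps are handled inside the object.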

As in the previous section, let's calculate the metrics and plot the forecast

In [None]:
smape = SMAPE()
smape(y_true=test_ts, y_pred=forecast_ts)

In [None]:
plot_forecast(forecast_ts, test_ts, train_ts, n_train_samples=20)