# Forecasting web traffic with TimeGPT

Our task is to forecast the next 7 days of daily visits to the website [cienciadedatos.net](https://cienciadedatos.net).

In this tutorial we will show:
- How to load time series data to be used for forecasting with TimeGPT
- How to create cross-validated forecasts with TimeGPT

This tutorial is adapted from [Joaquín Amat Rodrigo, Javier Escobar Ortiz, "Forecasting web traffic with machine learning and Python"](https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html).

## 1. Import packages
First, we import the required packages and initialize the Nixtla client.

In [None]:
#| hide
from nixtla.utils import colab_badge

In [None]:
#| echo: false
colab_badge('docs/how-to-guides/forecasting_web_traffic_with_timegpt')

In [None]:
#| hide
from dotenv import load_dotenv

In [None]:
#| hide
load_dotenv()

In [None]:
import pandas as pd
from nixtla import NixtlaClient

In [None]:
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)

In [None]:
#| hide
nixtla_client = NixtlaClient()

## 2. Load data

We load the website visit data and put it in the format that TimeGPT expects. In this case, we only need to add an identifier column for the time series, `unique_id`, whose value we set to `daily_visits`.

In [None]:
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/' +
       'master/data/visitas_por_dia_web_cienciadedatos.csv')
df = pd.read_csv(url, sep=',', parse_dates=[0], date_format='%d/%m/%y')
df['unique_id'] = 'daily_visits'

df.head(10)

That's it! No more preprocessing is necessary.

## 3. Cross-validation with TimeGPT

We can perform cross-validation on our data as follows:

In [None]:
timegpt_cv_df = nixtla_client.cross_validation(
    df, 
    h=7, 
    n_windows=8, 
    time_col='date', 
    target_col='users', 
    freq='D',
    level=[80, 90, 99.5]
)
timegpt_cv_df.head()
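
To make the window layout concrete, the sketch below computes the cutoff dates of 8 rolling windows with a 7-day horizon in plain Python. It is an illustration only, not the client's implementation: it assumes that consecutive windows are spaced `h` days apart (so the test windows do not overlap) and that the last window ends on the final observation. The function name `rolling_cutoffs` and the end date are hypothetical.

```python
from datetime import date, timedelta


def rolling_cutoffs(last_date, h, n_windows, step_size=None):
    """Return the cutoff date of each window, oldest first.

    Assumption: windows are `step_size` days apart (defaulting to `h`)
    and the most recent window ends on `last_date`.
    """
    step = h if step_size is None else step_size
    cutoffs = []
    for i in range(n_windows):
        # The most recent window (i == n_windows - 1) is cut off h days before the end.
        days_back = h + (n_windows - 1 - i) * step
        cutoffs.append(last_date - timedelta(days=days_back))
    return cutoffs


# Hypothetical end date, used only for illustration.
last_observation = date(2021, 8, 31)
cutoffs = rolling_cutoffs(last_observation, h=7, n_windows=8)

for cutoff in cutoffs:
    first_day = cutoff + timedelta(days=1)
    last_day = cutoff + timedelta(days=7)
    print(f"train up to {cutoff}, forecast {first_day} to {last_day}")
```

Under these assumptions, each window trains on everything up to its cutoff and is scored on the following 7 days, so the 8 windows together cover the last 56 days of the series.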

Here, we performed rolling cross-validation with 8 windows (folds), each forecasting 7 days ahead. Let's plot the cross-validated forecasts, including the prediction intervals:

In [None]:
nixtla_client.plot(
    df, 
    timegpt_cv_df.drop(columns=['cutoff', 'users']), 
    time_col='date',
    target_col='users',
    max_insample_length=90, 
    level=[80, 90, 99.5]
)

The forecasts look reasonable and are comparable to the results obtained in the [original tutorial](https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html).

Let's check the Mean Absolute Error of our cross-validation:

In [None]:
from utilsforecast.losses import mae

In [None]:
mae_timegpt = mae(
    df=timegpt_cv_df.drop(columns=['cutoff']),
    models=['TimeGPT'],
    target_col='users',
)

mae_timegpt

The MAE of our backtest is `167.69`, which is lower than the MAE of the fully customized pipeline in the [original tutorial](https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html).
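
For readers unfamiliar with the metric, here is a minimal sketch of the Mean Absolute Error in plain Python: the average of the absolute differences between actual and forecasted values. The helper `mean_absolute_error` and the numbers are made up for illustration; they are not taken from the dataset or from `utilsforecast`.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between actual and forecasted values."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if not actuals:
        raise ValueError("at least one observation is required")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


# Made-up daily visit counts and forecasts, for illustration only.
actual_users = [2400, 2600, 2500, 1900]
forecast_users = [2300, 2650, 2500, 2050]

example_mae = mean_absolute_error(actual_users, forecast_users)
print(example_mae)  # absolute errors are 100, 50, 0 and 150, so the mean is 75.0
```

Because the MAE is expressed in the units of the target, a value of `167.69` can be read as an average miss of roughly 168 users per day across the forecasted days.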

### Exogenous variables

Now let's add some exogenous variables to see if we can improve the forecasting performance further.

We will add weekday indicators, which we will extract from the `date` column.

In [None]:
# One 0/1 indicator column per weekday: week_day_1 is Monday, ..., week_day_7 is Sunday
for i in range(7):
    df[f'week_day_{i + 1}'] = 1 * (df['date'].dt.weekday == i)

df.head(10)
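
The mapping between column names and weekdays is easy to get wrong, so here is a small standalone sketch of the same encoding in plain Python. It relies on `datetime.date.weekday()`, which, like pandas' `dt.weekday`, numbers Monday as 0 and Sunday as 6. The helper name `weekday_indicators` is hypothetical.

```python
from datetime import date


def weekday_indicators(day):
    """Return a dict of 0/1 flags; week_day_1 is Monday, week_day_7 is Sunday."""
    return {f"week_day_{i + 1}": int(day.weekday() == i) for i in range(7)}


monday = date(2021, 8, 23)  # a Monday
sunday = date(2021, 8, 29)  # the following Sunday

print(weekday_indicators(monday))
print(weekday_indicators(sunday))
```

Exactly one flag is set for any date, which is why the seven columns together encode the weekday without ambiguity.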

Let's rerun the cross-validation procedure with the added exogenous variables. The call itself is unchanged: the extra columns in `df` are used as exogenous features.

In [None]:
timegpt_cv_df_with_ex = nixtla_client.cross_validation(
    df, 
    h=7, 
    n_windows=8, 
    time_col='date', 
    target_col='users', 
    freq='D',
    level=[80, 90, 99.5]
)
timegpt_cv_df_with_ex.head()

Let's plot our forecasts again and calculate our error.

In [None]:
nixtla_client.plot(
    df, 
    timegpt_cv_df_with_ex.drop(columns=['cutoff', 'users']), 
    time_col='date',
    target_col='users',
    max_insample_length=90, 
    level=[80, 90, 99.5]
)

In [None]:
mae_timegpt_with_exogenous = mae(
    df=timegpt_cv_df_with_ex.drop(columns=['cutoff']),
    models=['TimeGPT'],
    target_col='users',
)

mae_timegpt_with_exogenous
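
Cross-validation only needs the weekday indicators for dates that are already in `df`. To forecast beyond the end of the data, the same columns would also be needed for the future dates. The sketch below builds them for the next 7 days in plain Python. The helper `future_weekday_rows` and the end date are hypothetical, and how the rows are handed to the client (for example as a DataFrame passed through an `X_df` argument of `forecast`) is an assumption that this tutorial does not cover.

```python
from datetime import date, timedelta


def future_weekday_rows(last_date, h, series_id="daily_visits"):
    """Build one row per future day with the same weekday flags as in df."""
    rows = []
    for step in range(1, h + 1):
        day = last_date + timedelta(days=step)
        row = {"unique_id": series_id, "date": day}
        for i in range(7):
            # Same convention as above: week_day_1 is Monday, week_day_7 is Sunday.
            row[f"week_day_{i + 1}"] = int(day.weekday() == i)
        rows.append(row)
    return rows


# Hypothetical last observed date, used only for illustration.
last_observation = date(2021, 8, 31)
future_rows = future_weekday_rows(last_observation, h=7)

for row in future_rows:
    print(row)
```

The list of dicts can be turned into a table with `pd.DataFrame(future_rows)`. Weekday indicators suit this purpose because they are known in advance for any future date.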

To conclude, the following table summarizes the results obtained in this notebook:

In [None]:
mae_timegpt['Exogenous features'] = False
mae_timegpt_with_exogenous['Exogenous features'] = True

df_results = pd.concat([mae_timegpt, mae_timegpt_with_exogenous])
df_results = df_results.rename(columns={'TimeGPT':'MAE backtest'})
df_results = df_results.drop(columns=['unique_id'])
df_results['model'] = 'TimeGPT'

df_results[['model', 'Exogenous features', 'MAE backtest']]

We've shown how to forecast daily visits to a website.

Did you notice how little effort that took? Here is what you did not have to do:

- Elaborate data preprocessing - a table with the time series is sufficient.
- Creating validation and test sets - TimeGPT handles the cross-validation in a single function call.
- Choosing and testing different models - a single call to TimeGPT is all it takes.
- Hyperparameter tuning - not necessary.

Happy forecasting!