# Install Prerequisites

This notebook installs the packages needed by subsequent notebooks. On your SageMaker notebook instance, you must:

1. Open this notebook with the `conda_mxnet_p36` kernel.
2. Execute the cells.

**NOTE**: you must always rerun this notebook after you restart your notebook instance.

In [None]:
%pip uninstall --yes enum34
%pip install gluonts "pandas>=1.0,<1.1" "git+https://github.com/aws-samples/smallmatter-package@master#egg=smallmatter"
%mkdir -p ../data

Download data

In [None]:
from gluonts.dataset.repository import datasets
from pathlib import Path
dataset = datasets.get_dataset('m4_weekly', path=Path('../data/'))

Next steps:

1. Convert the dataset to .csv.
2. Shift the start time from 1750-01-01 00:00:00 to a more realistic date.
3. Remember that this data is weekly, so align the timestamps to W-SUN.
4. Simulate holes.
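
The steps above could be sketched with plain pandas as follows. This is a minimal illustration on toy data, not the code used later in this notebook: the helper `to_weekly_frame`, its parameters, the new start date, and the `sku`/`timestamp`/`quantity` column names are assumptions made for the example.

```python
import pandas as pd

def to_weekly_frame(series, new_start="2019-01-06", hole_frac=0.3, seed=0):
    """Flatten {item_id: values} into long rows on a W-SUN index, then drop rows.

    Hypothetical helper for illustration only.
    """
    frames = []
    for item_id, values in series.items():
        # Re-base each series so that it starts at new_start instead of 1750-01-01.
        # With freq="W-SUN", every generated timestamp falls on a Sunday.
        idx = pd.date_range(start=new_start, periods=len(values), freq="W-SUN")
        frames.append(pd.DataFrame({"sku": item_id, "timestamp": idx, "quantity": values}))
    df = pd.concat(frames, ignore_index=True)
    # Simulate holes by keeping only a random subset of the rows.
    kept = df.sample(frac=1 - hole_frac, random_state=seed)
    return kept.sort_values(["sku", "timestamp"]).reset_index(drop=True)

# Toy stand-in for the downloaded series.
toy = {"item_0": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0], "item_1": [1.0, 2.0, 3.0, 4.0]}
weekly = to_weekly_frame(toy)
csv_text = weekly.to_csv(index=False)  # pass a path instead to write a file
```

Dropping whole rows, rather than writing NaN, produces missing timestamps, which is one way to read "holes" here.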

# Use `gluonts` API to generate synthetic demand .csv

See:

- Generator: [gluonts.dataset.artificial.generate_synthetic.generate_sf2s_and_csv](https://gluon-ts.mxnet.io/api/gluonts/gluonts.dataset.artificial.generate_synthetic.html)
- Customize properties of synthetic timeseries:
  * [gluonts.dataset.artificial.ConstantDataset](https://gluon-ts.mxnet.io/api/gluonts/gluonts.dataset.artificial.html)
  * [gluonts.dataset.artificial.ComplexSeasonalTimeSeries](https://gluon-ts.mxnet.io/api/gluonts/gluonts.dataset.artificial.html)

In [None]:
%load_ext autoreload
%autoreload 2

from gluonts.dataset.artificial import ConstantDataset, ComplexSeasonalTimeSeries
import haha

In [None]:
%rm -f ../data/haha/input_to_forecast.csv

# Synthetic daily demand for 2 SKUs, each with 20 to 40 timestamps.
# This example uses ComplexSeasonalTimeSeries.

# ComplexSeasonalTimeSeries uses start_y, start_m, start_d = 2013, 11, 28 => Thu
# Next SUN is 1-Dec-2013.
# Next MON is 2-Dec-2013.
# Weeks: 2-8/Dec/2013, 9-15/Dec/2013, ...
haha.generate_daily_data(
    '../data/',  # Need slash at the end
    'haha',
    ComplexSeasonalTimeSeries(
        num_series=2,
        prediction_length=2,
        freq_str="D",
        length_low=20,
        length_high=40,
        min_val=0,
        is_integer=True,
        proportion_missing_values=0.30,
        is_noise=True,
        is_scale=True,
        percentage_unique_timestamps=0.0,
    ),
    #is_missing=True,
    #num_missing=4,     # Delete every 4th timestamp
    colnames=["sku", "timestamp", "quantity"],
    ts_prefix="item_",
)

#!head ../data/haha/input_to_forecast.csv
#!echo ...
#!tail ../data/haha/input_to_forecast.csv
!cat ../data/haha/input_to_forecast.csv
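
The weekday arithmetic in the comments of the cell above can be checked with pandas. This is only a sanity check on the dates and does not touch the generated file; the variable names are made up for the example.

```python
import pandas as pd

start = pd.Timestamp(2013, 11, 28)
weekday = start.day_name()

# First Sunday on or after the start date.
next_sunday = pd.date_range(start=start, periods=1, freq="W-SUN")[0]

# A W-SUN period ends on Sunday, so it runs from Monday to Sunday.
first_full_week = (next_sunday + pd.Timedelta(days=1)).to_period("W-SUN")
week_start = first_full_week.start_time.date()
week_end = first_full_week.end_time.date()

print(weekday, next_sunday.date(), week_start, week_end)
```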

In [None]:
#%rm -fr ../data/haha/
%rm -f ../data/haha/input_to_forecast.csv

# Synthetic weekly demand for 100 SKUs over 10 weeks, starting in January 2019.
# This example uses haha.MyConstantDataset.
haha.generate_weekly_data(
    '../data/',  # Need slash at the end
    'haha',
    haha.MyConstantDataset(
        freq="W",
        start="2019-01-05",   # A Saturday; freq "W" means W-SUN, so its week ends on Sun 2019-01-06
        #num_timeseries=200,
        #num_steps=365,
        num_timeseries=100,
        num_steps=10,
        is_random_constant=False,  # If True, takes precedence over is_different_scales
        is_noise=True,
        is_trend=True,
    ),
    #is_missing=False,
    #num_missing=4,   # If is_missing, then delete every 4th timestamp.
    ts_prefix="item_",
)

!head ../data/haha/input_to_forecast.csv
!tail ../data/haha/input_to_forecast.csv

Postprocess the generated .csv: remove NaN rows to simulate holes.
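
A minimal sketch of this postprocessing with pandas, assuming the file has a header row and the `sku`, `timestamp`, `quantity` columns passed as `colnames` earlier. The inline sample below is hypothetical and stands in for `../data/haha/input_to_forecast.csv`.

```python
import io
import pandas as pd

# Hypothetical sample; replace the StringIO with the real .csv path.
raw = io.StringIO(
    "sku,timestamp,quantity\n"
    "item_0,2013-11-28,3\n"
    "item_0,2013-11-29,\n"
    "item_0,2013-11-30,5\n"
    "item_1,2013-11-28,\n"
    "item_1,2013-11-29,7\n"
)
df = pd.read_csv(raw, parse_dates=["timestamp"])

# Rows with a NaN quantity are dropped entirely, leaving gaps in the timestamps.
holes = df.dropna(subset=["quantity"]).reset_index(drop=True)
holes["quantity"] = holes["quantity"].astype(int)
```

After this step, the missing observations show up as absent timestamps rather than as empty values.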