# Practical Assignment

- **DEADLINE**: 23/06/2023
- Task: Time Series Forecasting on a Synthetic Data Set
- Data: please see `train.csv` available on Brightspace
- [IMPORTANT] Specifications/requirements:
  * You are required to implement a **recurrent neural network** in **PyTorch** that takes as input a recent history of observations up to time step $t$, e.g., ..., $t-3$, $t-2$, $t-1$, $t$, and predicts **five** time steps into the future, i.e., $t+1$, $t+2$, $t+3$, $t+4$, $t+5$.
  * You can use any recurrent NN model taught in class.
  * You may choose the length of the history fed into the model yourself.
  * The resulting code structure should contain
    1. `model.py` -> the implementation of your own RNN model;
    2. `train.py` -> the training code, which can be executed from the command line by `python train.py`;
    3. `test.py` -> the testing code, which tests the trained model on the testing data set and saves the performance score. You have to adapt the given code snippet (see below) to your implementation;
    4. `requirements.txt` -> the list of Python packages you are using, including version information;
    5. `README.txt` -> a description of the basics of your implementation (see below).
  * You need to submit your source code and **a dump file of the best model you trained**. When handing in the assignment, please put `model.py`, `train.py`, `requirements.txt`, and the model dump file in the same folder, named after your student ID. Please see [https://pytorch.org/tutorials/beginner/saving_loading_models.html](https://pytorch.org/tutorials/beginner/saving_loading_models.html) for a tutorial on how to save/load the model.
  * You should include a `README.txt` file in the submission, describing the algorithm you implemented. A short description is sufficient, e.g.,
    * any data cleaning and preprocessing you performed;
    * which RNN model you implemented and a description of its architecture, i.e., how many layers and how many neurons/units per layer;
    * any other specific details of your implementation.
- Please submit your code to: w.b.saib@liacs.leidenuniv.nl
- The practical assignment accounts for 30% of the final grade.
- When training your RNN model locally on `train.csv`, we suggest using the Mean Absolute Percentage Error (MAPE) metric to track performance, since we will use this metric to evaluate your model (see below).
- Evaluation criteria:
  * Your `train.py` must be executable. We will contact you if a bug is encountered; in that case, you will have one chance to fix it, with a penalty of 1 out of 10.
  * We will execute your `train.py` on the training data set `train.csv` to check for bugs.
  * We will load your best saved model and evaluate it on a testing data set that is hidden from you.
  * Any bug that occurs in the evaluation phase will incur a penalty of 1 out of 10.
  <!-- The evaluation performance - MAPE - on the testing data will be ranked and the top-10 students will get a bonus of 2 of 10.  -->
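
The requirements above can be met in many ways. As one minimal sketch, assuming a single input feature (`number_sold`), a history window of 10 steps, and an LSTM as the recurrent model, the following maps a history window to five future values and then saves and restores the weights via `state_dict`. The class name `Forecaster`, the hidden size of 32, and the file name `model.pt` are illustrative choices, not part of the assignment.

```python
import torch
from torch import nn


class Forecaster(nn.Module):
    """Hypothetical LSTM that maps a history window to the next five values."""

    def __init__(self, n_features=1, hidden_size=32, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x):
        # x has shape (batch, window, n_features).
        _, (h_n, _) = self.lstm(x)
        # h_n[-1] is the final hidden state of the last layer: (batch, hidden_size).
        return self.head(h_n[-1])  # (batch, horizon)


model = Forecaster()
batch = torch.randn(4, 10, 1)  # 4 random windows of 10 time steps, 1 feature
out = model(batch)
print(out.shape)

# Save only the learned parameters, then restore them into a fresh instance.
torch.save(model.state_dict(), "model.pt")
restored = Forecaster()
restored.load_state_dict(torch.load("model.pt"))
restored.eval()
```

Predicting all five steps at once from the final hidden state is only one option; feeding each prediction back into the network step by step is an equally valid design.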

## Data set

As shown below, the training set is simple: each row contains the date on which the target column `number_sold` was recorded. The forecasting task is to use historical records to predict future values of `number_sold`.

Please keep in mind that there are two extra columns indicating the location (`store`) and type (`product`) of the selling event.

```python
import pandas as pd

df = pd.read_csv("train.csv")
df.head()
```

| Unnamed: 0 | Date | store | product | number_sold |
|---|---|---|---|---|
| 0 | 2010-01-01 | 0 | 0 | 801 |
| 1 | 2010-01-02 | 0 | 0 | 810 |
| 2 | 2010-01-03 | 0 | 0 | 818 |
| 3 | 2010-01-04 | 0 | 0 | 796 |
| 4 | 2010-01-05 | 0 | 0 | 808 |
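
Since `store` and `product` identify separate series, training pairs are probably best built per combination, so that no history window spans two series. The sketch below shows one way to do this. It assumes a window of 10 steps and a stride of one step, and it uses a made-up toy frame in place of `train.csv`. The helper name `make_windows` is hypothetical.

```python
import numpy as np
import pandas as pd


def make_windows(series, window_size=10, horizon=5):
    """Slice a 1-D array into (history, future) pairs using a stride of one step."""
    inputs, targets = [], []
    for start in range(len(series) - window_size - horizon + 1):
        split = start + window_size
        inputs.append(series[start:split])
        targets.append(series[split:split + horizon])
    return np.array(inputs), np.array(targets)


# Toy frame standing in for train.csv (values are made up for illustration).
toy = pd.DataFrame({
    "store": [0] * 20 + [1] * 20,
    "product": [0] * 40,
    "number_sold": np.arange(40),
})

# Build windows separately for each (store, product) series.
for (store, product), group in toy.groupby(["store", "product"]):
    X, y = make_windows(group["number_sold"].to_numpy())
    print(store, product, X.shape, y.shape)
```

Each toy series has 20 rows, so it yields 6 pairs of shape `(10,)` and `(5,)`. With a PyTorch model, the arrays would still need to be converted to float tensors and given a trailing feature dimension.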


## Code Snippet for the testing file

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

df = pd.read_csv("test_example.csv")
model = None  # load your model here
window_size = 10  # please fill in your own choice: this is the length of history you have to decide

# split the data set by the combination of `store` and `product`
gb = df.groupby(["store", "product"])
groups = {x: gb.get_group(x) for x in gb.groups}
scores = {}

for key, data in groups.items():
    # By default, we only take the column `number_sold`.
    # Please modify this line if your model takes other columns as input
    X = data[["number_sold"]].values  # 2-D NumPy array of shape (N, 1)
    N = X.shape[0]  # total number of testing time steps

    mape_score = []
    start = window_size
    # prediction by window rolling
    while start + 5 <= N:
        inputs = X[(start - window_size) : start, :]
        targets = X[start : (start + 5), :]

        # you might need to modify `inputs` before feeding it to your model, e.g., convert it to PyTorch Tensors
        # you might have a different name of the prediction function. Please modify accordingly
        predictions = model.predict(inputs)
        start += 5
        # calculate the performance metric
        mape_score.append(mean_absolute_percentage_error(targets, predictions))
    scores[key] = mape_score

# save the performance metrics to file
np.savez("score.npz", scores=scores)
```
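
For tracking the same metric during training without scikit-learn, MAPE can be written as $\frac{1}{n}\sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{\max(|y_i|, \epsilon)}$. The following is a minimal NumPy sketch of that formula, assuming a single output column. The function name `mape` and the sample numbers are made up for illustration. Like scikit-learn's `mean_absolute_percentage_error`, it returns a fraction rather than a value between 0 and 100.

```python
import numpy as np


def mape(y_true, y_pred, eps=np.finfo(np.float64).eps):
    """Mean absolute percentage error, returned as a fraction (0.01 means 1%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # eps guards against division by zero when a true value is exactly 0.
    return float(np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps)))


# Made-up targets and predictions for one five-step forecast.
targets = [800.0, 810.0, 820.0, 790.0, 805.0]
predictions = [808.0, 810.0, 811.8, 790.0, 805.0]
print(mape(targets, predictions))
```

Two of the five predictions are off by 1% and the rest are exact, so the printed value is approximately 0.004.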