# Sequence Processing using Univariate Time-Series

_**Analyze a simple time-series dataset and perform univariate time-series forecasting with a simple Recurrent Neural Network (RNN).**_

The following experiment uses the Chicago Transit Authority (CTA) daily ridership dataset, which reports system-wide boardings for both the bus and rail services provided by the CTA and is available at https://data.cityofchicago.org/. This experiment uses the version of the dataset updated through August 1, 2024. Note that the values **W**, **A** and **U** of the attribute **day_type** represent **Weekday**, **Saturday** and **Sunday/Holiday**, respectively.

In [2]:
# Imports required packages

import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

## Retrieving & Analysing Dataset

In [4]:
# Load the ridership dataset
ridership = pd.read_csv("./data/CTA-Ridership-Daily_Boarding_Totals_20240829.csv", parse_dates=["service_date"])

In [None]:
# Print the dataset

<code here>

In [9]:
# Set column "service_date" as index to make date/time related operations easier
ridership = ridership.sort_values("service_date").set_index("service_date")

In [None]:
# Print the dataset to check for new index

<code here>

In [13]:
# Drop the calculated column "total_rides" because it is just the element-wise sum of columns "bus" and "rail_boardings"

ridership = ridership.drop("total_rides", axis=1)

In [15]:
# Remove duplicate observations, if any (drop_duplicates compares column values only, not the index)
ridership = ridership.drop_duplicates()

In [None]:
# Print shape of the dataset

<code here>

In [None]:
# Look at the ridership for March, April and May of 2019

ridership["2019-03":"2019-05"].plot(grid=True, marker=".", figsize=(8, 3.5))

plt.show()

The figure above shows weekly seasonality: the ridership pattern repeats every seven days.
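
A weekly pattern can also be checked numerically rather than only by eye. The sketch below is a minimal illustration in plain Python on a synthetic series (the `weekly_pattern` values and the `autocorrelation` helper are assumptions made for illustration, not CTA data or a pandas API). It compares the series with a copy of itself shifted by 7 days and by 3 days. For a series with weekly seasonality, the 7-day correlation should be the larger of the two.

```python
# Synthetic daily series (an assumption, not CTA data): high on weekdays, low on weekends.
weekly_pattern = [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.4]
series = [weekly_pattern[day % 7] for day in range(8 * 7)]  # 8 weeks of values


def autocorrelation(values, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    n = len(values) - lag
    x, y = values[:n], values[lag:]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


lag7 = autocorrelation(series, 7)  # shift by exactly one week
lag3 = autocorrelation(series, 3)  # shift by a non-weekly lag, for comparison
print("lag-7 autocorrelation:", lag7)
print("lag-3 autocorrelation:", lag3)
```

On real ridership data the 7-day correlation would be below 1 because of noise, trend and holidays, but it would still be expected to stand out against neighbouring lags.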

## Univariate Forecasting using Simple RNN
The goal is to forecast tomorrow's rail ridership based only on the rail ridership (a single input variable) of the past 8 weeks (56 days).

The inputs to our model will be sequences, each containing 56 values from time steps _t_ – 55 to _t_. For each input sequence, the model will output a single value as a forecast for time step _t_ + 1.
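
The pairing of input windows with next-step targets can be sketched in plain Python. The helper `make_windows` and the toy series are hypothetical and use a window length of 3 instead of 56 to keep the output readable. The sketch is meant only to illustrate the idea, not to reproduce how any library builds its datasets internally.

```python
def make_windows(series, seq_length):
    """Pair each window of `seq_length` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for start in range(len(series) - seq_length):
        inputs.append(series[start:start + seq_length])  # time steps t-(seq_length-1) .. t
        targets.append(series[start + seq_length])        # time step t+1
    return inputs, targets


toy_series = list(range(10))  # stand-in values 0..9, not real ridership
windows, next_values = make_windows(toy_series, seq_length=3)

print(len(windows))                  # 7 windows, i.e. len(series) - seq_length
print(windows[0], next_values[0])    # [0, 1, 2] 3
print(windows[-1], next_values[-1])  # [6, 7, 8] 9
```

A series of length N therefore yields N - 56 training pairs when the window length is 56, which is why each split needs to be noticeably longer than 56 days.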

**Prepare Datasets for Modeling**

In [36]:
# Splits the time-series into three periods, for training, validation and testing
# The values are scaled down by a factor of one million, to ensure the values are near the 0–1 range

rail_train = ridership["rail_boardings"]["2016-01":"2018-12"] / 1e6  # 3 years
rail_val = ridership["rail_boardings"]["2019-01":"2019-05"] / 1e6    # 5 months
rail_test = ridership["rail_boardings"]["2019-06":] / 1e6            # remaining period from 2019-06

In [None]:
# Prepares TensorFlow specific datasets

seq_length = 56    # represents sequence of past 8 weeks (56 days) of ridership data

tf.random.set_seed(42)

rail_train_ds = tf.keras.utils.timeseries_dataset_from_array(
    rail_train.to_numpy(),
    targets=rail_train[seq_length:],
    sequence_length=seq_length,
    batch_size=32,
    shuffle=True,  # shuffling the training windows is recommended for gradient-descent-based optimizers
    seed=42
)

rail_val_ds = tf.keras.utils.timeseries_dataset_from_array(
    rail_val.to_numpy(),
    targets=rail_val[seq_length:],
    sequence_length=seq_length,
    batch_size=32,
    shuffle=False  # shuffling is not needed for evaluation data, including validation data
)

In [45]:
# Resets all the keras states
tf.keras.backend.clear_session()

tf.random.set_seed(42)

# Creates an RNN with 32 recurrent neurons followed by a dense output layer with one output neuron
univar_simple_rnn = tf.keras.Sequential([
    # Instantiate a "SimpleRNN" layer with 32 units as output and input shape as [None, 1]
    <code here>

    # Instantiate a "Dense" layer with 1 unit as output
    <code here>
])
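
To make the recurrent layer less of a black box, the sketch below walks through the recurrence that a simple RNN applies at each time step, h_t = tanh(w_x · x_t + W_h · h_(t-1) + b), using plain Python lists. The function `simple_rnn_forward`, the random weights and the three-value input are all illustrative assumptions. This is not the Keras implementation and it does not replace the exercise above. It also counts the trainable parameters that a 32-unit recurrent layer with one input feature, followed by a one-unit dense layer, would be expected to have.

```python
import math
import random


def simple_rnn_forward(sequence, w_x, w_h, bias):
    """Apply h_t = tanh(w_x * x_t + W_h h_{t-1} + b) over a univariate sequence."""
    units = len(bias)
    state = [0.0] * units  # initial hidden state h_0
    for x_t in sequence:
        # The comprehension reads the previous state; the new state is assigned afterwards.
        state = [
            math.tanh(
                w_x[j] * x_t
                + sum(w_h[i][j] * state[i] for i in range(units))
                + bias[j]
            )
            for j in range(units)
        ]
    return state  # hidden state after the last time step


units = 32
rng = random.Random(42)  # arbitrary seed, for repeatability only
w_x = [rng.uniform(-0.1, 0.1) for _ in range(units)]                          # input weights
w_h = [[rng.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(units)]  # recurrent weights
bias = [0.0] * units

final_state = simple_rnn_forward([0.3, 0.5, 0.4], w_x, w_h, bias)  # made-up scaled values

rnn_params = units * 1 + units * units + units  # input + recurrent + bias = 1088
dense_params = units * 1 + 1                    # weights + bias = 33
print(len(final_state), rnn_params, dense_params)
```

The final 32-value state is what the dense layer would turn into a single forecast. The same weights are reused at every time step, so the parameter count does not depend on the sequence length.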

In [None]:
# Initialize callback to stop training when the model does not improve after a certain number of epochs (patience)
early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)

# Instantiate SGD optimizer with 0.05 as learning rate and 0.9 as momentum
optimizer = <code here>

# Compile the model with "huber" as loss function, already created optimizer and "mae" as metric
<code here>

# Fit the model with the already created training dataset, 500 epochs, the validation dataset and the early stopping callback
history = <code here>
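
The Huber loss named in the compile step is worth a brief look on its own. The sketch below is a plain-Python version for a single prediction error, assuming a threshold `delta` of 1.0. The function `huber` is written here for illustration and is not the Keras object. The loss is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than a squared error.

```python
def huber(error, delta=1.0):
    """Huber loss for one error: quadratic when |error| <= delta, linear beyond it."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error ** 2
    return delta * (abs_error - 0.5 * delta)


small = huber(0.2)  # quadratic branch: 0.5 * 0.2**2, about 0.02
large = huber(3.0)  # linear branch: 1.0 * (3.0 - 0.5) = 2.5
print(small, large)
```

Because the ridership values were scaled to be near the 0–1 range, most errors would be expected to fall in the quadratic branch under this assumed threshold.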

In [None]:
# After training, evaluate the model's performance on validation data using the <model>.evaluate method
# Multiply the metric by 1e6 to undo the scaling applied earlier and report it in riders

val_loss, val_mae = <code here>
print("Validation MAE of the Simple RNN:", val_mae * 1e6)
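
Multiplying the reported MAE by 1e6 works because the mean absolute error scales linearly with the data. The short sketch below checks this on made-up numbers. The ridership values and the `mean_absolute_error` helper are illustrative assumptions, not results from this experiment.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


scale = 1e6
actual_riders = [700_000, 650_000, 300_000]     # made-up daily boardings
predicted_riders = [680_000, 675_000, 310_000]  # made-up forecasts

# MAE computed on scaled values, as the model would see them
scaled_mae = mean_absolute_error(
    [a / scale for a in actual_riders],
    [p / scale for p in predicted_riders],
)
# MAE computed directly in riders
raw_mae = mean_absolute_error(actual_riders, predicted_riders)

print(scaled_mae * scale, raw_mae)  # the two values agree up to floating-point rounding
```

The same shortcut would not apply to a squared-error metric, which scales with the square of the factor.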

**Note down the univariate model's performance.**

**Observations:**

Note down all your observations in your green/blue book.