# Neural Networks and Reinforcement Learning: Advanced Architectures and Applications

This notebook demonstrates advanced neural network architectures for sequence modeling (RNN, LSTM, NARX) using PyTorch and real-world Kaggle datasets, as well as reinforcement learning algorithms (DQN, Policy Gradient, Actor-Critic) using stable-baselines3 and OpenAI Gym.

**Author:** Carlos Andrés Sierra, M.Sc.

**Course:** Systems Sciences Foundations

**Universidad Distrital Francisco José de Caldas**

## Table of Contents

1. [Setup and Requirements](#setup)
2. [Neural Networks](#nn)
    - [Recurrent Neural Network (RNN)](#rnn)
    - [Long Short-Term Memory (LSTM)](#lstm)
    - [Nonlinear Autoregressive Networks (NARX)](#narx)
3. [Reinforcement Learning](#rl)
    - [Deep Q-Networks (DQN)](#dqn)
    - [Policy Gradient Methods](#pg)
    - [Actor-Critic Methods](#ac)

<a id="setup"></a>
## Setup and Requirements

Install and import the libraries used throughout: PyTorch, pandas, NumPy, scikit-learn, matplotlib, and stable-baselines3. The data cells read the dataset files from the working directory, so download them from Kaggle beforehand, either manually or with the Kaggle API (which requires the `kaggle` package and an API token).

In [None]:
# Install required packages
!pip install scikit-learn torch torchtext pandas matplotlib "stable-baselines3[extra]" gym

In [None]:
# Import libraries
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import os

# For RL (stable-baselines3 2.x is built on the Gymnasium API)
import gymnasium as gym
from stable_baselines3 import DQN, PPO, A2C

plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = [12, 7]

<a id="nn"></a>
# Neural Networks

This section covers advanced neural network architectures for sequence modeling using PyTorch and real-world datasets from Kaggle.

<a id="rnn"></a>
## Recurrent Neural Network (RNN)

We will use the [Household Electric Power Consumption Dataset](https://www.kaggle.com/datasets/uciml/electric-power-consumption-data-set) from Kaggle. The goal is to predict future power consumption using a simple RNN.

**Steps:**
1. Download and preprocess the dataset.
2. Build a PyTorch RNN for sequence prediction.
3. Train and evaluate the model.
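
Before building the PyTorch model, it can help to see the recurrence a single-layer tanh RNN applies at each time step: `h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)`. The following is a minimal NumPy sketch of that forward pass for one sequence. The weights are random placeholders rather than trained values, and the function name `rnn_forward` is made up for this illustration.

```python
import numpy as np


def rnn_forward(x, W_ih, W_hh, b_ih, b_hh):
    """Run a single-layer tanh RNN over one sequence.

    x has shape (seq_length, input_size); the result has shape
    (seq_length, hidden_size), one hidden state per time step.
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state h_0 = 0
    states = []
    for x_t in x:
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h = np.tanh(W_ih @ x_t + b_ih + W_hh @ h + b_hh)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
seq_length, input_size, hidden_size = 24, 7, 32  # sizes used in this section
x = rng.normal(size=(seq_length, input_size))
W_ih = rng.normal(scale=0.1, size=(hidden_size, input_size))  # placeholder weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_ih = np.zeros(hidden_size)
b_hh = np.zeros(hidden_size)

states = rnn_forward(x, W_ih, W_hh, b_ih, b_hh)
last_hidden = states[-1]  # a many-to-one model feeds only this state to its output layer
print(states.shape, last_hidden.shape)
```

Only the last hidden state is needed for one-step-ahead prediction, which is why a many-to-one model discards the earlier states before its linear output layer.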

In [None]:
# Numerical columns to convert to float and scale
num_cols = ['Global_active_power', 'Global_reactive_power', 'Voltage',
            'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']


def load_data() -> tuple:
    """Load and preprocess household power consumption data.

    Reads the semicolon-separated text file, drops rows with missing values,
    and min-max scales the numerical columns listed in `num_cols`.

    Returns:
        tuple: (DataFrame of scaled values indexed by datetime, fitted MinMaxScaler).
    """
    df = pd.read_csv('household_power_consumption.txt', sep=';', na_values=['?'])

    # Combine date and time, drop missing values
    df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')
    df = df.drop(['Date', 'Time'], axis=1)
    df = df.dropna()

    df[num_cols] = df[num_cols].astype(float)

    # Scale all numerical columns
    scaler = MinMaxScaler()
    df_scaled = df.copy()
    df_scaled[num_cols] = scaler.fit_transform(df[num_cols])

    # Use datetime as the index
    return df_scaled.set_index('datetime'), scaler


df_scaled, scaler = load_data()
print(df_scaled.head())

In [None]:
# Define datasets

def create_sequences(data: np.ndarray, seq_length: int) -> tuple:
    """
    Create sliding-window sequences for RNN input.
    Each window of `seq_length` rows is paired with the first-column value
    of the row that immediately follows it.

    Args:
        data (np.ndarray): Input data.
        seq_length (int): Length of each sequence.
    Returns:
        tuple: Tuple of (sequences, targets).
    """
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        x = data[i:(i+seq_length)]
        y = data[i+seq_length, 0]  # Predict only the first column (target)
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# Function to split data into train and test sets by percentage
def train_test_split_sequences(X: np.ndarray, y: np.ndarray, train_pct: float = 0.8) -> tuple:
    """
    Split the data chronologically into training and testing sets based on the given percentage.

    Args:
        X (np.ndarray): Input features.
        y (np.ndarray): Target values.
        train_pct (float): Percentage of data to use for training.
    Returns:
        tuple: Tuple of (X_train, y_train, X_test, y_test).
    """
    n_train = int(len(X) * train_pct)
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    return X_train, y_train, X_test, y_test
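
To make the shapes concrete, here is a small sketch that applies the same sliding-window logic to a toy array. The helper `make_windows` is a hypothetical stand-in that repeats the windowing loop so the example runs on its own, and the toy values are invented.

```python
import numpy as np


def make_windows(data, seq_length):
    # Same sliding-window idea as create_sequences: each window of seq_length
    # rows is paired with column 0 of the row that follows it.
    n = len(data) - seq_length
    xs = [data[i:i + seq_length] for i in range(n)]
    ys = [data[i + seq_length, 0] for i in range(n)]
    return np.array(xs), np.array(ys)


toy = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, 2 features
X_toy, y_toy = make_windows(toy, seq_length=3)
print(X_toy.shape, y_toy.shape)  # 3 windows of 3 steps x 2 features, and 3 targets

# Chronological split: the first 80% of the windows train, the rest test
n_train = int(len(X_toy) * 0.8)
X_tr, X_te = X_toy[:n_train], X_toy[n_train:]
print(len(X_tr), len(X_te))
```

Because the split is by position rather than by random shuffling, every test window lies later in time than every training window, which avoids leaking future values into training.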

In [None]:
# Define a simple RNN model and train on the scaled dataframe

class SimpleRNN(nn.Module):

    def __init__(self, input_size: int, hidden_size: int = 32, num_layers: int = 1):
        """
        Initializes the SimpleRNN model.
        Args:
            input_size (int): Number of input features.
            hidden_size (int): Number of features in the hidden state.
            num_layers (int): Number of recurrent layers.
        """
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass through the RNN and fully connected layer.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, seq_length, input_size).
        Returns:
            Output tensor after passing through the RNN and fully connected layer.
        """
        out, _ = self.rnn(x)
        out = out[:, -1, :]
        return self.fc(out).squeeze()


def train_rnn(model: nn.Module, X_train_t: torch.Tensor, y_train_t: torch.Tensor, epochs: int = 20) -> list:
    """
    Train the RNN model with full-batch gradient steps, using the module-level `criterion` and `optimizer`.

    Args:
        model: The RNN model to train.
        X_train_t (torch.Tensor): Training input data.
        y_train_t (torch.Tensor): Training target data.
        epochs (int): Number of training epochs.

    Returns:
        List of training losses for each epoch.
    """
    losses = []
    
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        output = model(X_train_t)
        loss = criterion(output, y_train_t)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        if (epoch+1) % 5 == 0:
            print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
    return losses


# Prepare data
seq_length = 24
data = df_scaled[num_cols].values[:50000]  # First 50,000 rows, all features
X, y = create_sequences(data, seq_length)
X_train, y_train, X_test, y_test = train_test_split_sequences(X, y, train_pct=0.8)

# Convert to PyTorch tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32)

# Initialize model, loss, and optimizer
model = SimpleRNN(input_size=len(num_cols))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)


losses = train_rnn(model, X_train_t, y_train_t, epochs=50)

# Plot training loss
epochs = np.arange(1, len(losses) + 1)
plt.plot(epochs, losses)
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('RNN Training Loss')
plt.xticks(epochs)
plt.show()

In [None]:
# Evaluate on the test set and plot the results; the x axis starts at 1
model.eval()
with torch.no_grad():
    pred = model(X_test_t).numpy()
    true = y_test_t.numpy()
    # Inverse transform only the target column using the same scaler and num_cols
    pred_inv = scaler.inverse_transform(
        np.concatenate([pred.reshape(-1, 1), np.zeros((pred.shape[0], len(num_cols)-1))], axis=1)
    )[:, 0]
    true_inv = scaler.inverse_transform(
        np.concatenate([true.reshape(-1, 1), np.zeros((true.shape[0], len(num_cols)-1))], axis=1)
    )[:, 0]

# Set x axis to start at 1 and increment by 1 for better visualization
x = np.arange(1, len(true_inv) + 1)
plt.plot(x, true_inv, label='True')
plt.plot(x, pred_inv, label='Predicted')
plt.title('RNN: Power Consumption Prediction')
plt.xlabel('Time Step')
plt.ylabel('Global Active Power (kW)')
plt.xticks(x[::500])  # Show a tick every 500 time steps, starting from 1
plt.legend()
plt.show()
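
A plot shows the overall fit, but a numeric error makes runs comparable. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) with NumPy. The function name `regression_errors` and the three sample values are hypothetical; in the notebook the inputs would be arrays such as `true_inv` and `pred_inv`.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return MAE and RMSE between two equally sized 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }


# Made-up values standing in for the true and predicted series
metrics = regression_errors([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(metrics)
```

Both metrics are expressed in the units of the target, so computing them on the inverse-transformed values gives errors in kW rather than in scaled units.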

<a id="lstm"></a>
## Long Short-Term Memory (LSTM)

We will use the [Air Passengers Dataset](https://www.kaggle.com/datasets/rakannimer/air-passengers) from Kaggle. The goal is to forecast monthly airline passenger counts using an LSTM.

**Steps:**
1. Download and preprocess the dataset.
2. Build a PyTorch LSTM for sequence forecasting.
3. Train and evaluate the model.
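
An LSTM extends the plain recurrence with a cell state and three gates. The sketch below implements one LSTM time step in NumPy, with the gates stacked in the order input, forget, candidate, output (the order PyTorch documents for `nn.LSTM`). The weights are random placeholders, the two bias vectors are merged into one for brevity, and `lstm_step` is a name chosen for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    with the four gate blocks stacked as i, f, g, o.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate cell values
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much cell state to expose
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
D, H, T = 4, 32, 12  # features, hidden size, and sequence length used in this section
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
x = rng.normal(size=(T, D))

h, c = np.zeros(H), np.zeros(H)
for x_t in x:
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The additive update `c = f * c_prev + i * g` is what lets information persist across many steps, since the forget gate can stay close to 1.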

In [None]:
# Load the Air Passengers dataset (download AirPassengers.csv from Kaggle first)
file = "AirPassengers.csv"

if not os.path.exists(file):
    print("Dataset not found.")
    
def load_air_passengers():
    """
    Load and preprocess the Air Passengers dataset.
    Returns the scaled dataframe, the fitted scaler, and the list of feature columns.
    """
    passenger_df = pd.read_csv(file, parse_dates=['Month'])
    passenger_df = passenger_df.set_index('Month')
    passenger_df = passenger_df.rename(columns={'#Passengers': 'Passengers'})
    
    # Add lagged features and rolling mean as extra columns
    passenger_df['lag1'] = passenger_df['Passengers'].shift(1)
    passenger_df['lag2'] = passenger_df['Passengers'].shift(2)
    passenger_df['rolling_mean_3'] = passenger_df['Passengers'].rolling(window=3).mean()
    passenger_df = passenger_df.dropna()
    
    scaler = MinMaxScaler()
    feature_cols = ['Passengers', 'lag1', 'lag2', 'rolling_mean_3']
    passenger_df[feature_cols] = scaler.fit_transform(passenger_df[feature_cols])
    return passenger_df, scaler, feature_cols

def create_sequences(data: np.ndarray, seq_length: int) -> tuple:
    """
    Create sequences for multivariate time series data.
    Args:
        data (np.ndarray): Input data.
        seq_length (int): Length of each sequence.
    Returns:
        Tuple of (sequences, targets).
    """
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        x = data[i:(i+seq_length), :]
        y = data[i+seq_length, 0]  # Predict only the original Passengers column
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

def train_test_split_sequences(X: np.ndarray, y: np.ndarray, train_pct: float = 0.8) -> tuple:
    """
    Split sequences into train and test sets.
    Args:
        X (np.ndarray): Input sequences.
        y (np.ndarray): Target values.
        train_pct (float): Percentage of data for training.
    Returns:
        Tuple of (X_train, y_train, X_test, y_test).
    """
    n_train = int(len(X) * train_pct)
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]
    return X_train, y_train, X_test, y_test

# Load and preprocess data
passengers_df, scaler, feature_cols = load_air_passengers()
data = passengers_df[feature_cols].values
seq_length = 12  # Use past 12 months to predict next month
X, y = create_sequences(data, seq_length)
X_train, y_train, X_test, y_test = train_test_split_sequences(X, y, train_pct=0.7)

# Convert to PyTorch tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32)

In [None]:
class SimpleLSTM(nn.Module):
    """This class defines a simple LSTM model for time series prediction."""

    def __init__(self, input_size: int = 1, hidden_size: int = 32, num_layers: int = 1):
        """
        Initializes the SimpleLSTM model.

        Args:
            input_size (int): Number of input features.
            hidden_size (int): Number of features in the hidden state.
            num_layers (int): Number of recurrent layers.
        """
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass through the LSTM and fully connected layer.
        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, seq_length, input_size).
        Returns:
            Output tensor after passing through the LSTM and fully connected layer.
        """
        out, _ = self.lstm(x)
        out = out[:, -1, :]
        return self.fc(out).squeeze()


# Initialize model, loss, and optimizer
model = SimpleLSTM(input_size=len(feature_cols))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

def train_lstm(model: nn.Module, X_train_t: torch.Tensor, y_train_t: torch.Tensor, epochs: int = 20) -> list:
    """
    Train the LSTM model with the module-level `criterion` and `optimizer`, printing the loss every 5 epochs.

    Args:
        model: The LSTM model to train.
        X_train_t (torch.Tensor): Training input data.
        y_train_t (torch.Tensor): Training target data.
        epochs (int): Number of training epochs.

    Returns:
        List of training losses for each epoch.
    """
    losses = []
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        output = model(X_train_t)
        loss = criterion(output, y_train_t)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        if (epoch+1) % 5 == 0:
            print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
    return losses

losses = train_lstm(model, X_train_t, y_train_t, epochs=50)


# Plot training loss
epochs_arr = np.arange(1, len(losses) + 1)
plt.plot(epochs_arr, losses, marker='o')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('LSTM Training Loss')
plt.xticks(epochs_arr)
plt.grid(True)
plt.show()

In [None]:
# Evaluate on test set and plot predictions
model.eval()
with torch.no_grad():
    pred = model(X_test_t).numpy()
    true = y_test_t.numpy()
    # Inverse transform only the target column
    pred_inv = scaler.inverse_transform(
        np.concatenate([pred.reshape(-1, 1), np.zeros((pred.shape[0], len(feature_cols)-1))], axis=1)
    )[:, 0]
    true_inv = scaler.inverse_transform(
        np.concatenate([true.reshape(-1, 1), np.zeros((true.shape[0], len(feature_cols)-1))], axis=1)
    )[:, 0]

x = np.arange(1, len(true_inv) + 1)
plt.plot(x, true_inv, label='True')
plt.plot(x, pred_inv, label='Predicted')
plt.title('LSTM: Air Passengers Prediction (Multivariate)')
plt.xlabel('Time Step')
plt.ylabel('Number of Passengers')
plt.xticks(x)
plt.legend()
plt.grid(True)
plt.show()
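
Padding the predictions with zero columns works because min-max scaling treats each column independently. The same result can be obtained directly from the first column's minimum and range, as in the NumPy sketch below. The data values are invented; with a fitted scikit-learn `MinMaxScaler` (default feature range) the corresponding quantities are available as `data_min_` and `data_range_`.

```python
import numpy as np

# Made-up two-column data; column 0 plays the role of the target
data = np.array([[112.0, 10.0], [118.0, 20.0], [132.0, 40.0]])
col_min = data.min(axis=0)
col_range = data.max(axis=0) - col_min
scaled = (data - col_min) / col_range  # min-max scaling to [0, 1], column by column


def inverse_first_column(values_scaled):
    # Undo the scaling for column 0 only: x = x_scaled * range + min
    return values_scaled * col_range[0] + col_min[0]


restored = inverse_first_column(scaled[:, 0])
print(restored)
```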

<a id="narx"></a>
## Nonlinear Autoregressive Networks (NARX)

We will use the [Daily Minimum Temperatures in Melbourne](https://www.kaggle.com/datasets/paulbrabban/daily-minimum-temperatures-in-melbourne) dataset. We'll implement a NARX-like architecture in PyTorch for multi-step forecasting.

**Steps:**
1. Download and preprocess the dataset.
2. Implement a NARX-like model (feedforward with lagged inputs).
3. Demonstrate multi-step forecasting.
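
In its general form, a NARX model predicts the next value from lagged values of the target and lagged values of an exogenous input. The model in this section feeds only lagged temperatures, which is the purely autoregressive special case. The sketch below shows how the general regressor matrix could be assembled. The exogenous series `u` is hypothetical, and `narx_regressors` is a name invented for this illustration.

```python
import numpy as np


def narx_regressors(y, u, y_lags, u_lags):
    """Stack lagged targets and lagged exogenous inputs into one row per time step."""
    start = max(y_lags, u_lags)  # first index with enough history for both series
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = y[t - y_lags:t]  # y[t-y_lags], ..., y[t-1]
        past_u = u[t - u_lags:t]  # u[t-u_lags], ..., u[t-1]
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)


y = np.arange(10, dtype=float)        # toy target series
u = 100.0 + np.arange(10, dtype=float)  # toy exogenous series (hypothetical)
X_narx, y_narx = narx_regressors(y, u, y_lags=3, u_lags=2)
print(X_narx.shape, y_narx.shape)
print(X_narx[0], y_narx[0])
```

Any regressor, such as a small MLP, can then be fit on these rows. Predicting several future steps at once, as this section does, simply replaces the single target with a vector of the next `output_lags` values.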

In [None]:
# Load the temperature dataset (download the CSV from Kaggle first)
file = "daily-minimum-temperatures-in-me.csv"

if not os.path.exists(file):
    print("Dataset not found.")

def load_temperature_data() -> tuple:
    """
    Load and preprocess the daily minimum temperatures dataset.
    
    Returns:
        A DataFrame containing the scaled temperature data and the scaler used for scaling.
    """
    df_temp = pd.read_csv(file, parse_dates=['Date'])
    df_temp = df_temp.sort_values('Date')
    df_temp = df_temp.set_index('Date')
    df_temp = df_temp.rename(columns={'Daily minimum temperatures in Melbourne, Australia, 1981-1990': 'Temperature'})
    
    df_temp = df_temp[~df_temp['Temperature'].astype(str).str.contains(r'\?', regex=True)].dropna()
    # Convert to float and fill missing values with the mean
    df_temp['Temperature'] = df_temp['Temperature'].astype(float) 
    df_temp['Temperature'] = df_temp['Temperature'].fillna(df_temp['Temperature'].mean())
    
    # Scale the temperature column
    scaler_temp = MinMaxScaler()
    df_temp['temp'] = scaler_temp.fit_transform(df_temp[['Temperature']])
    
    df_temp = df_temp[['temp']]
    print(df_temp.head())
    return df_temp, scaler_temp


def create_narx_sequences(data: np.ndarray, input_lags=10, output_lags=5) -> tuple:
    """
    Create NARX sequences for time series data.

    Args:
        data (np.ndarray): Input data.
        input_lags (int): Number of input lags.
        output_lags (int): Number of output lags.

    Returns:
        Tuple of (input sequences, output sequences).
    """
    xs, ys = [], []
    for i in range(len(data) - input_lags - output_lags):
        x = data[i:(i+input_lags)]
        y = data[(i+input_lags):(i+input_lags+output_lags)]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)


input_lags = 10
output_lags = 5
df_temp, scaler_temp = load_temperature_data()
data = df_temp['temp'].values
X, y = create_narx_sequences(data, input_lags, output_lags)

X_train, y_train = X[:2750], y[:2750]
X_test, y_test = X[2750:], y[2750:]

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.float32)

In [None]:
class NARXNet(nn.Module):
    """This class implements a NARX-like model using a multi-layer perceptron (MLP). """

    def __init__(self, input_lags: int, output_lags: int, hidden_size: int = 32):
        """
        Initializes the NARXNet model.
        
        Args:
            input_lags (int): Number of input lags.
            output_lags (int): Number of output lags.
            hidden_size (int): Number of features in the hidden layer.
        """
        super().__init__()
        self.fc1 = nn.Linear(input_lags, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_lags)
    
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass through the MLP.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, input_lags).

        Returns:
            Output tensor after passing through the MLP.
        """
        x = self.relu(self.fc1(x))
        return self.fc2(x)

narx_model = NARXNet(input_lags, output_lags)
criterion = nn.MSELoss()
optimizer = optim.Adam(narx_model.parameters(), lr=0.01)

# Training loop
losses = []
for epoch in range(50):
    narx_model.train()
    optimizer.zero_grad()
    output = narx_model(X_train_t)
    loss = criterion(output, y_train_t)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    if (epoch+1) % 5 == 0:
        print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('NARX Training Loss')
plt.show()

In [None]:
# Multi-step forecasting on test set
narx_model.eval()
with torch.no_grad():
    pred = narx_model(X_test_t).numpy()
    true = y_test_t.numpy()
    pred_inv = scaler_temp.inverse_transform(pred)
    true_inv = scaler_temp.inverse_transform(true)

plt.plot(true_inv.flatten(), label='True')
plt.plot(pred_inv.flatten(), label='Predicted')
plt.title('NARX: Multi-step Temperature Forecast')
plt.xlabel('Time Step')
plt.ylabel('Temperature')
plt.legend()
plt.show()

<a id="rl"></a>
# Reinforcement Learning

This section demonstrates reinforcement learning algorithms using stable-baselines3 and OpenAI Gym environments.

<a id="dqn"></a>
## Deep Q-Networks (DQN)

We will use stable-baselines3 to train a DQN agent on the CartPole-v1 environment.

**Reference:** [DQN Documentation](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html)
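
At the core of DQN is a regression of `Q(s, a)` toward a bootstrapped target, `r + gamma * max_a' Q_target(s', a')`, where the bootstrap term is dropped when the episode has ended. The NumPy sketch below computes those targets for a small batch. The numbers and the discount factor are invented for illustration, and in a real agent `next_q_values` would come from the target network.

```python
import numpy as np


def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute one-step TD targets for a batch of transitions.

    next_q_values has shape (batch, n_actions); dones is 1.0 where the
    episode terminated, which removes the bootstrap term.
    """
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q


rewards = np.array([1.0, 1.0])
next_q = np.array([[0.5, 2.0], [3.0, 1.0]])  # made-up Q-values for two actions
dones = np.array([0.0, 1.0])                 # second transition is terminal
targets = td_targets(rewards, next_q, dones, gamma=0.9)
print(targets)
```

The first target bootstraps from the best next action (1 + 0.9 * 2), while the terminal transition keeps only its immediate reward.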

In [None]:
!pip install 'shimmy>=2.0'
!pip install --upgrade gymnasium stable-baselines3 shimmy
!pip install numpy==1.23.5 --force-reinstall
# If these commands change installed versions, restart the kernel before continuing.

# Train DQN agent on CartPole-v1
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines3 import DQN

# Create environment and train agent
env = gym.make('CartPole-v1')
dqn_model = DQN('MlpPolicy', env, verbose=1)
dqn_model.learn(total_timesteps=10000)

# Evaluate agent and collect rewards for plotting
episode_rewards = []
for _ in range(20):
    obs, _ = env.reset()  # Gymnasium's reset returns (observation, info)
    total_reward = 0
    done = False
    while not done:
        action, _ = dqn_model.predict(obs, deterministic=True)
        # Gymnasium's step returns separate terminated and truncated flags
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        total_reward += reward
    episode_rewards.append(total_reward)

# Plot episode rewards
plt.plot(range(1, len(episode_rewards) + 1), episode_rewards, marker='o')
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.title('DQN: CartPole-v1 Performance')
plt.grid(True)
plt.show()


In [None]:
!pip freeze


<a id="pg"></a>
## Policy Gradient Methods (PPO)

We will use stable-baselines3 to train a PPO agent on the LunarLander-v2 environment.

**Reference:** [PPO Documentation](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html)
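
PPO limits how far each update can move the policy by clipping the probability ratio between the new and old policies. The NumPy sketch below evaluates the clipped surrogate objective for a few invented samples. The clip range of 0.2 matches the stable-baselines3 default, and the ratios and advantages are made up.

```python
import numpy as np


def ppo_clip_objective(ratio, advantage, clip_range=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    # Taking the element-wise minimum gives a pessimistic bound on the improvement
    return np.minimum(unclipped, clipped).mean()


ratio = np.array([1.5, 0.5, 1.0])       # made-up probability ratios
advantage = np.array([1.0, -1.0, 2.0])  # made-up advantage estimates
value = ppo_clip_objective(ratio, advantage)
print(value)
```

In the first sample the ratio 1.5 is capped at 1.2, so raising that action's probability further earns no extra objective. In the second, the minimum selects the clipped term (-0.8 rather than -0.5), so the gradient vanishes once the ratio has already dropped below 0.8 for a negatively rated action.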

In [None]:
# Train PPO agent on LunarLander-v2
import gymnasium as gym
from stable_baselines3 import PPO

# LunarLander needs the Box2D extra; recent Gymnasium releases register it as 'LunarLander-v3'
env = gym.make('LunarLander-v2')
ppo_model = PPO('MlpPolicy', env, verbose=1)
ppo_model.learn(total_timesteps=20000)

# Evaluate agent
rewards = []
for _ in range(10):
    obs, _ = env.reset()  # Gymnasium's reset returns (observation, info)
    total_reward = 0
    done = False
    while not done:
        action, _ = ppo_model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        total_reward += reward
    rewards.append(total_reward)
print(f"Average reward over 10 episodes: {np.mean(rewards):.2f}")

<a id="ac"></a>
## Actor-Critic Methods (A2C)

We will use stable-baselines3 to train an A2C agent on the MountainCarContinuous-v0 environment.

**Reference:** [A2C Documentation](https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html)
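
In an actor-critic method the critic's value estimates turn raw returns into advantages, which tell the actor whether an action did better or worse than expected. The NumPy sketch below computes discounted returns for a short rollout and subtracts the critic's values. All numbers are invented, and `discounted_returns` is a name chosen for this illustration.

```python
import numpy as np


def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted return at each step of a rollout.

    bootstrap_value is the critic's estimate for the state after the last
    step (0.0 if the episode ended there).
    """
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = np.array([1.0, 0.0, 2.0])  # made-up rollout rewards
values = np.array([0.5, 0.4, 1.0])   # made-up critic estimates V(s_t)
returns = discounted_returns(rewards, bootstrap_value=0.0, gamma=0.5)
advantages = returns - values
print(returns, advantages)
```

The actor is then updated to increase the log-probability of actions with positive advantage, while the critic is regressed toward the returns.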

In [None]:
# Train A2C agent on MountainCarContinuous-v0
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make('MountainCarContinuous-v0')
a2c_model = A2C('MlpPolicy', env, verbose=1)
a2c_model.learn(total_timesteps=20000)

# Evaluate agent and plot rewards
episode_rewards = []
for _ in range(10):
    obs, _ = env.reset()  # Gymnasium's reset returns (observation, info)
    total_reward = 0
    done = False
    while not done:
        action, _ = a2c_model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated  # truncated covers the episode step limit
        total_reward += reward
    episode_rewards.append(total_reward)

plt.bar(range(1, 11), episode_rewards)
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.title('A2C: MountainCarContinuous-v0 Performance')
plt.show()
print(f"Average reward over 10 episodes: {np.mean(episode_rewards):.2f}")