# PatchTST Model Development

In this section we will develop a PatchTST model to predict the S&P 500 Close and the Dow Jones Close.

## Model Configuration

We will configure the PatchTST model based on the `Economic_Data_1994-2025.csv` dataset we processed.

In [1]:
from transformers import PatchTSTConfig, PatchTSTForPrediction, PatchTSTForPretraining
from torch.utils.data import TensorDataset, DataLoader
import pandas as pd
import numpy as np
from tqdm import tqdm
import torch



In [2]:
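# Use Apple's Metal (MPS) backend for faster development, falling back to CPU if it is unavailable
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')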
device

device(type='mps')

In [3]:
dataset = pd.read_csv('../data/Economic_Data_1994-2025.csv')
dataset = dataset.drop(['DATE', 'Unnamed: 0'], axis=1)

### Understanding PatchTST

- **Context Length**

    Context length is how far back the model looks in total: the number of past time steps (rows) fed into the model for each prediction. If we were predicting tomorrow's S&P 500 close, the context length would be how many previous days the model sees to make that prediction.

- **Patch Length**

    Patch length is the number of consecutive time steps grouped into one patch. The context window is cut into patches, and each patch becomes a single input token for the transformer, much like reading the history one week at a time instead of one day at a time.

- **Patch Stride**

    Patch stride is how many time steps the patch window moves before the next patch starts. If the stride is smaller than the patch length, consecutive patches overlap and share some time steps.
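To make the three settings concrete, here is a small sketch that computes where each patch starts inside one context window for this notebook's values (`CONTEXT_LEN = 190`, `PATCH_LEN = 10`, `PATCH_STRD = 8`). It assumes, as we understand the Hugging Face PatchTST implementation to do, that patches are aligned to the end of the window, so any leftover steps at the start are dropped. `patch_layout` is a hypothetical helper, not a library function.

```python
def patch_layout(context_len, patch_len, stride):
    """Return the start index of every patch inside one context window.

    Assumes patches are aligned to the END of the window (as we understand the
    Hugging Face PatchTST patchifier to do); leftover steps at the start are dropped.
    """
    num_patches = (max(context_len, patch_len) - patch_len) // stride + 1
    covered = patch_len + stride * (num_patches - 1)  # steps spanned by all patches
    offset = context_len - covered                     # uncovered steps at the start
    return [offset + k * stride for k in range(num_patches)]


starts = patch_layout(190, 10, 8)       # CONTEXT_LEN, PATCH_LEN, PATCH_STRD
print(len(starts))                      # 23 patches
print(starts[0], starts[-1] + 10)       # 4 190: first patch starts at step 4, last ends at 190
```

With these settings each window yields 23 patches, neighboring patches overlap by 2 steps (`PATCH_LEN - PATCH_STRD`), and the first 4 of the 190 steps are not covered by any patch.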
    
For each batch we pass `BATCH_SIZE` samples to the model. Each sample is a window of `CONTEXT_LEN` consecutive rows covering every column. During pre-training we use masked forecasting: the last few patches of each window are hidden, and the model has to reconstruct them from the patches that come before. It then compares its reconstruction with the true values and updates its weights accordingly.
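The masking step can be illustrated with a toy NumPy sketch. It cuts one `(CONTEXT_LEN, n_features)` window into per-feature patches (PatchTST treats each feature as its own channel) and hides the last `num_mask_patches` patches, which is the idea behind `mask_type='forecast'`. The helper names `patchify` and `forecast_mask` are hypothetical, the zero mask value is an assumption, the data is random, and the transformer that would reconstruct the hidden patches is left out.

```python
import numpy as np


def patchify(window, patch_len, stride):
    """Split one (context_len, n_features) window into end-aligned patches.

    Returns an array of shape (n_features, n_patches, patch_len): each feature
    is patched separately, as PatchTST treats features as independent channels.
    Assumes context_len >= patch_len.
    """
    context_len = window.shape[0]
    n_patches = (context_len - patch_len) // stride + 1
    offset = context_len - (patch_len + stride * (n_patches - 1))
    series = window[offset:].T                       # (n_features, covered_steps)
    idx = np.arange(n_patches)[:, None] * stride + np.arange(patch_len)[None, :]
    return series[:, idx]                            # fancy indexing gathers every patch


def forecast_mask(patches, num_mask_patches):
    """Hide the last `num_mask_patches` patches of every feature (mask value 0 is an assumption)."""
    mask = np.zeros(patches.shape[:2], dtype=bool)   # (n_features, n_patches)
    mask[:, -num_mask_patches:] = True
    masked = patches.copy()
    masked[mask] = 0.0
    return masked, mask


window = np.random.default_rng(0).normal(size=(190, 37))  # one fake context window, 37 columns
patches = patchify(window, patch_len=10, stride=8)
masked, mask = forecast_mask(patches, num_mask_patches=3)
print(patches.shape)   # (37, 23, 10)
print(mask.sum())      # 111 hidden patches (37 features x 3)
```

A reconstruction loss would then be averaged over the hidden patches only, e.g. `((reconstruction - patches) ** 2)[mask].mean()` with `reconstruction` standing in for the model's output, which is the general idea behind a masked pre-training loss.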

## Pre-Training Model

The pre-training model is trained on ***every*** column of the training data (the first 80% of dates). This helps the model learn relationships between all of the variables.

In [4]:
# How many features we are including
NUM_INPUT = len(dataset.columns)

# Batch size for training
BATCH_SIZE = 16

# Number of series we ultimately want to predict (S&P 500 Close and Dow Jones Close)
NUM_TARGET = 2

# How many time steps the model sees in each context window
CONTEXT_LEN = 190

# How many time steps make up each patch
PATCH_LEN = 10

# How far the patch window moves between consecutive patches
PATCH_STRD = 8

# How our model is trained
MASK_TYPE = 'forecast'

# How many patches at the end of each window are masked for prediction during training
# (a patch count, not a share of the batch)
NUM_MASK_PATCHES = 3

# Number of attention heads (not passed to the config below, so the library default is used)
NUM_ATT_HEADS = 8

# How many days to predict into the future (not used by the pre-training model)
PRED_LEN = int(365 / 4)

# Configuring Model
pretrain_config = PatchTSTConfig(
    num_input_channels = NUM_INPUT,
    context_length = CONTEXT_LEN,
    patch_length = PATCH_LEN,
    stride = PATCH_STRD,
    mask_type = MASK_TYPE,
    num_forecast_mask_patches = [NUM_MASK_PATCHES],
)

pretrain_model = PatchTSTForPretraining(pretrain_config)

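Here we split our data chronologically into three portions: a `train` set the model learns from, a `test` set used for the final evaluation, and a `val` (validation) set used to check the model after each epoch.

We split it 80/10/10: 80% of the data is for training, 10% for testing, and 10% for validation. Because this is a time series, the splits are taken in order rather than shuffled, so the model is always evaluated on dates after the ones it was trained on.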
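The same chronological split can be written as a reusable function. `chronological_split` is a hypothetical helper that returns row ranges in the notebook's order (train, then test, then validation). Because each size is truncated with `int()`, up to two rows at the very end can be left out of every split.

```python
def chronological_split(n_rows, train_frac=0.8, test_frac=0.1, val_frac=0.1):
    """Return (train, test, val) row ranges in time order, never shuffled."""
    n_train = int(n_rows * train_frac)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)

    train = range(0, n_train)
    test = range(n_train, n_train + n_test)
    val = range(n_train + n_test, n_train + n_test + n_val)
    return train, test, val


train_rows, test_rows, val_rows = chronological_split(1009)
print(len(train_rows), len(test_rows), len(val_rows))  # 807 100 100
print(val_rows.stop)                                   # 1007, so rows 1007 and 1008 are unused
```

With a DataFrame the ranges can be used as slices, e.g. `dataset.iloc[train_rows.start:train_rows.stop]`.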

In [5]:
# Sizes of each split (80% train, 10% test, 10% validation)
num_train = int(len(dataset) * .8)
num_test = int(len(dataset) * .1)
num_val = int(len(dataset) * .1)

# Split the data in time order: train, then test, then validation
train = dataset[0: num_train]
test = dataset[num_train:num_test + num_train]
val = dataset[num_test+num_train:(num_test+num_train) + num_val]

This cell builds a context window of `CONTEXT_LEN` consecutive rows starting at every possible position in our train, test, and validation sets.
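The loop in the next cell calls `.iloc` once per window, which is fine for a few thousand rows. A vectorized alternative, sketched below under the assumption that NumPy 1.20 or newer is available, uses `numpy.lib.stride_tricks.sliding_window_view` and returns the same `(n_windows, window_size, n_features)` array. `create_sequence_windows_fast` is a hypothetical name and takes a NumPy array (e.g. `train.values`) rather than a DataFrame.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def create_sequence_windows_fast(values, window_size):
    """Vectorized equivalent of create_sequence_windows for a 2-D array."""
    # Window axis is appended last: shape (n_rows - window_size + 1, n_features, window_size)
    windows = sliding_window_view(values, window_shape=window_size, axis=0)
    # Reorder to (n_windows, window_size, n_features) and copy out of the strided view
    return np.ascontiguousarray(windows.transpose(0, 2, 1))


data = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 rows, 2 features
windows = create_sequence_windows_fast(data, 4)
print(windows.shape)  # (7, 4, 2)
```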

In [6]:
# Creates a context window for each data point to feed into the model during training
def create_sequence_windows(data, window_size):
    windows = []
    
    # Starting at each row i, grab the next 'window_size' rows (i through i + window_size - 1)
    # This yields one context window for every start position that has a full window
    for i in range(len(data) - window_size + 1):
        windows.append(data.iloc[i:i+window_size].values)
    return np.array(windows)

train_windows = create_sequence_windows(train, CONTEXT_LEN)
test_windows = create_sequence_windows(test, CONTEXT_LEN)
val_windows = create_sequence_windows(val, CONTEXT_LEN)

In [7]:
# Puts our data in PyTorch tensors for proper data types during training
past_values_train = torch.tensor(train_windows, dtype=torch.float32)
past_values_test = torch.tensor(test_windows, dtype=torch.float32)
past_values_val = torch.tensor(val_windows, dtype=torch.float32)

In [8]:
# Puts the tensors in a dataset for the dataloader to properly use
data_train = TensorDataset(past_values_train)

# Divides our data into batches based on the BATCH_SIZE
dataloader_train = DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True)

data_test = TensorDataset(past_values_test)
dataloader_test = DataLoader(data_test, batch_size=BATCH_SIZE)

data_val = TensorDataset(past_values_val)
dataloader_val = DataLoader(data_val, batch_size=BATCH_SIZE)

In [9]:
pretrain_model = pretrain_model.to(device)

In [10]:
optimizer = torch.optim.Adam(pretrain_model.parameters(), lr=.001)
epochs = 10

for epoch in range(epochs):

    # Puts model in train mode (re-enabled every epoch because validation switches to eval mode)
    pretrain_model.train()

    # Allows for progress bar during training per epoch
    loop = tqdm(dataloader_train, leave=True)
    losses = []
    
    for batch in loop:
        
        # Clears any previous gradient calculations
        optimizer.zero_grad()
        
        # Transfers batch onto the GPU (MPS) for faster processing
        past_values = batch[0].to(device)
        
        # Forward pass: masks the last patches of each window and reconstructs them
        outputs = pretrain_model(past_values=past_values)
        
        # Gets the loss for our reconstructions (how far off they were)
        loss = outputs.loss
        
        # Computes gradients: how much each weight contributed to the error
        loss.backward()
        
        # Updates the model weights using the gradients from loss.backward()
        optimizer.step()
        
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())
        losses.append(loss.item())
    
    print("Mean Training Loss", np.mean(losses))
        
    # Validation: eval mode and no gradient tracking
    pretrain_model.eval()
    losses = []

    loop = tqdm(dataloader_val, leave=True)
    
    with torch.no_grad():
        for batch in loop:
            past_values = batch[0].to(device)
            outputs = pretrain_model(past_values=past_values)
            loss = outputs.loss
            
            loop.set_description(f'Epoch {epoch}')
            loop.set_postfix(loss=loss.item())
            losses.append(loss.item())
        
    print(f"Mean Training Loss for validation set on EPOCH {epoch} : {np.mean(losses)}")

# Final evaluation on the held-out test set
pretrain_model.eval()
losses = []

loop = tqdm(dataloader_test, leave=True)

with torch.no_grad():
    for batch in loop:
        past_values = batch[0].to(device)
        outputs = pretrain_model(past_values=past_values)
        loss = outputs.loss
        
        loop.set_description('Test')
        loop.set_postfix(loss=loss.item())
        losses.append(loss.item())
    
print(f"Mean Training Loss for test set : {np.mean(losses)}")

Epoch 0: 100%|████████████████████| 380/380 [03:04<00:00,  2.06it/s, loss=0.217]


Mean Training Loss 0.18786994352152472


Epoch 0: 100%|█████████████████████| 38/38 [00:06<00:00,  5.64it/s, loss=0.0928]


Mean Training Loss for validation set on EPOCH 0 : 0.11471409291813248


Epoch 1: 100%|███████████████████| 380/380 [03:01<00:00,  2.10it/s, loss=0.0823]


Mean Training Loss 0.39369783721079954


Epoch 1: 100%|█████████████████████| 38/38 [00:05<00:00,  6.61it/s, loss=0.0785]


Mean Training Loss for validation set on EPOCH 1 : 0.1124115552949278


Epoch 2: 100%|███████████████████| 380/380 [02:59<00:00,  2.12it/s, loss=0.0556]


Mean Training Loss 0.10369532685726882


Epoch 2: 100%|█████████████████████| 38/38 [00:05<00:00,  6.46it/s, loss=0.0726]


Mean Training Loss for validation set on EPOCH 2 : 0.09929597515024637


Epoch 3: 100%|███████████████████| 380/380 [03:00<00:00,  2.10it/s, loss=0.0584]


Mean Training Loss 0.08981309481161205


Epoch 3: 100%|█████████████████████| 38/38 [00:05<00:00,  6.52it/s, loss=0.0698]


Mean Training Loss for validation set on EPOCH 3 : 0.08511232109250207


Epoch 4: 100%|███████████████████| 380/380 [03:01<00:00,  2.09it/s, loss=0.0604]


Mean Training Loss 0.07742584626141348


Epoch 4: 100%|█████████████████████| 38/38 [00:05<00:00,  6.50it/s, loss=0.0595]


Mean Training Loss for validation set on EPOCH 4 : 0.07640274035695352


Epoch 5: 100%|███████████████████| 380/380 [02:59<00:00,  2.11it/s, loss=0.0434]


Mean Training Loss 0.06810858739834083


Epoch 5: 100%|█████████████████████| 38/38 [00:05<00:00,  6.69it/s, loss=0.0568]


Mean Training Loss for validation set on EPOCH 5 : 0.06970560236981041


Epoch 6: 100%|███████████████████| 380/380 [03:00<00:00,  2.11it/s, loss=0.0419]


Mean Training Loss 0.058398038854724485


Epoch 6: 100%|█████████████████████| 38/38 [00:05<00:00,  6.55it/s, loss=0.0349]


Mean Training Loss for validation set on EPOCH 6 : 0.05085165386921481


Epoch 7: 100%|███████████████████| 380/380 [02:58<00:00,  2.13it/s, loss=0.0224]


Mean Training Loss 0.04695020574880274


Epoch 7: 100%|█████████████████████| 38/38 [00:05<00:00,  6.65it/s, loss=0.0292]


Mean Training Loss for validation set on EPOCH 7 : 0.035477304169417995


Epoch 8: 100%|████████████████████| 380/380 [02:59<00:00,  2.12it/s, loss=0.015]


Mean Training Loss 0.034364779609696645


Epoch 8: 100%|██████████████████████| 38/38 [00:05<00:00,  6.48it/s, loss=0.012]


Mean Training Loss for validation set on EPOCH 8 : 0.026010966737215455


Epoch 9: 100%|██████████████████| 380/380 [03:01<00:00,  2.09it/s, loss=0.00612]


Mean Training Loss 0.021633091327538222


Epoch 9: 100%|████████████████████| 38/38 [00:05<00:00,  6.64it/s, loss=0.00979]


Mean Training Loss for validation set on EPOCH 9 : 0.020318992595237336


Test: 100%|████████████████████████| 38/38 [00:05<00:00,  6.66it/s, loss=0.0113]

Mean Training Loss for test set : 0.017642615479417145





In [12]:
torch.save(pretrain_model.state_dict(), 'pretrain_model_v1.bin')

## Fine-Tuning

In [64]:
# How many features we are including
NUM_INPUT = len(dataset.columns)

# Batch size for training
BATCH_SIZE = 16

# Number of series we ultimately want to predict (S&P 500 Close and Dow Jones Close)
NUM_TARGET = 2

# How many time steps the model sees in each context window
CONTEXT_LEN = 190

# How many time steps make up each patch
PATCH_LEN = 10

# How far the patch window moves between consecutive patches
PATCH_STRD = 8

# How the pre-training model was trained (not used by the prediction model)
MASK_TYPE = 'forecast'

# Number of attention heads (not passed to the config below, so the library default is used)
NUM_ATT_HEADS = 8

# How many days to predict into the future
PRED_LEN = 30 # One Month Prediction

# context_length, patch_length and stride should match the pre-training config
# so that the encoder weight shapes line up when we copy them over
ft_config = PatchTSTConfig(
    num_input_channels = NUM_INPUT,
    num_targets = NUM_INPUT,  # only used by the regression/classification heads
    context_length = CONTEXT_LEN,
    patch_length = PATCH_LEN,
    stride = PATCH_STRD,
    prediction_length=PRED_LEN
)

ft_model = PatchTSTForPrediction(ft_config)

# First, load the saved pretrained weights (mapped to CPU since they were saved from the MPS device)
pretrained_weights = torch.load('pretrain_model_v1.bin', map_location='cpu')

# Copy weights from the encoder part of the pretrained model
# Only encoder weights whose names also exist in the prediction model are transferred;
# the prediction head keeps its fresh random initialization
prediction_model_dict = ft_model.state_dict()
for name, param in pretrained_weights.items():
    if 'encoder' in name:
        # The encoder weights have the same names in both models
        if name in prediction_model_dict:
            prediction_model_dict[name] = param

ft_model.load_state_dict(prediction_model_dict)

<All keys matched successfully>
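The weight-copying loop above matches parameters by name only. A slightly stricter sketch also checks that shapes agree, so a config mismatch between pre-training and fine-tuning (for example a different `patch_length`) shows up as skipped keys instead of a size-mismatch error inside `load_state_dict`. `transfer_matching_weights` is a hypothetical helper; it works on any dict of objects with a `.shape`, so it is demonstrated here with NumPy arrays and made-up key names.

```python
import numpy as np


def transfer_matching_weights(src_state, dst_state, key_filter="encoder"):
    """Copy src entries whose name contains key_filter into a copy of dst_state
    when the same name exists in dst_state with the same shape."""
    merged = dict(dst_state)
    copied, skipped = [], []
    for name, weight in src_state.items():
        if key_filter not in name:
            continue  # e.g. the pre-training head is never transferred
        if name in merged and merged[name].shape == weight.shape:
            merged[name] = weight
            copied.append(name)
        else:
            skipped.append(name)
    return merged, copied, skipped


# Made-up state dicts standing in for the pre-training and prediction models
pretrained = {
    "model.encoder.layer0.weight": np.ones((4, 4)),
    "model.encoder.layer1.weight": np.ones((4, 8)),
    "head.weight": np.ones((4, 10)),
}
prediction = {
    "model.encoder.layer0.weight": np.zeros((4, 4)),
    "model.encoder.layer1.weight": np.zeros((4, 6)),
    "head.weight": np.zeros((4, 30)),
}
merged, copied, skipped = transfer_matching_weights(pretrained, prediction)
print(copied)   # ['model.encoder.layer0.weight']
print(skipped)  # ['model.encoder.layer1.weight']
```

With the real models, `merged` would be passed to `ft_model.load_state_dict(merged)`, and a non-empty `skipped` list is a hint that the two configs have drifted apart.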

In [65]:
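# Sizes of each split (70% train, 20% test, 10% validation)
# Note: this differs from the 80/10/10 split used for pre-training, so the part of this test
# range between 70% and 80% of the data was already seen by the encoder during pre-training
num_train = int(len(dataset) * .7)
num_test = int(len(dataset) * .2)
num_val = int(len(dataset) * .1)

# Features and targets come from the same rows because the model forecasts every column
train_targ = dataset[0: num_train]
train_feat = dataset[0: num_train]

test_targ = dataset[num_train:num_test + num_train]
test_feat = dataset[num_train:num_test + num_train]

val_targ = dataset[num_test+num_train:(num_test+num_train) + num_val]
val_feat = dataset[num_test+num_train:(num_test+num_train) + num_val]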

**Getting Target/Input Features**

This part is a little odd, because the inputs and the targets come from the same table.

- **Input Features**

    To get the input features, we build a window that looks at the past `CONTEXT_LEN` days for each data point.
    These windows include the columns we want to predict, and the targets are simply later rows of the same data, so no separate labels are needed. This is what makes the setup **self-supervised**.

- **Output Features**

    For the targets, we build a future window of `PRED_LEN` rows that starts right after each input window ends.
    Here `PRED_LEN` is 30, so we grab the next 30 days, which is what the model is asked to predict. These values are used
    to score the model's predictions. Note that the target windows currently contain every column (37, as the shapes printed below show), not just the 2 close prices.
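The pairing of input and target windows can be sketched on a toy array. `make_past_future_pairs` is a hypothetical helper, not part of the notebook, and assumes `len(values) >= context_len + pred_len`: for each start row `i`, the input is rows `i` to `i + context_len - 1` and the target is the next `pred_len` rows. The number of pairs is `len(values) - context_len - pred_len + 1`, which matches how the cells below trim the feature windows by `PRED_LEN` and start the target indices at `CONTEXT_LEN`.

```python
import numpy as np


def make_past_future_pairs(values, context_len, pred_len):
    """Return (past, future) arrays where future[i] starts right after past[i] ends."""
    n_pairs = len(values) - context_len - pred_len + 1
    past = np.stack([values[i:i + context_len] for i in range(n_pairs)])
    future = np.stack([values[i + context_len:i + context_len + pred_len] for i in range(n_pairs)])
    return past, future


series = np.arange(12, dtype=np.float32).reshape(12, 1)  # 12 rows, 1 feature
past, future = make_past_future_pairs(series, context_len=4, pred_len=3)
print(past.shape, future.shape)            # (6, 4, 1) (6, 3, 1)
print(past[0].ravel(), future[0].ravel())  # [0. 1. 2. 3.] [4. 5. 6.]
```

For example, a split with 1,564 rows, `CONTEXT_LEN = 190` and `PRED_LEN = 30` gives 1564 - 190 - 30 + 1 = 1,345 pairs.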

In [66]:
# Creates a context window for each data point to feed into the model during training
def create_sequence_windows(data, window_size):
    windows = []
    
    # Starting at each row i, grab the next 'window_size' rows (i through i + window_size - 1)
    # This yields one context window for every start position that has a full window
    for i in range(len(data) - window_size + 1):
        windows.append(data.iloc[i:i+window_size].values)
    return np.array(windows)

train_feat_windows = create_sequence_windows(train_feat, CONTEXT_LEN)
test_feat_windows = create_sequence_windows(test_feat, CONTEXT_LEN)
val_feat_windows = create_sequence_windows(val_feat, CONTEXT_LEN)

# Drop the last PRED_LEN windows, which don't have a full PRED_LEN-day future window after them
train_feat_windows = train_feat_windows[0:len(train_feat_windows) - PRED_LEN]
test_feat_windows = test_feat_windows[0:len(test_feat_windows) - PRED_LEN]
val_feat_windows = val_feat_windows[0:len(val_feat_windows) - PRED_LEN]

In [67]:
# Gets indices for the target variables, starting from where we first start predicting with a full context length
# to the last index that will allow for a full prediction
train_targ_indices = range(CONTEXT_LEN, len(train_feat) - PRED_LEN + 1)
test_targ_indices = range(CONTEXT_LEN, len(test_feat) - PRED_LEN + 1)
val_targ_indices = range(CONTEXT_LEN, len(val_feat) - PRED_LEN + 1)

In [68]:
train_targ_windows = [train_targ.iloc[i:i+PRED_LEN].values for i in train_targ_indices]
test_targ_windows = [test_targ.iloc[i:i+PRED_LEN].values for i in test_targ_indices]
val_targ_windows = [val_targ.iloc[i:i+PRED_LEN].values for i in val_targ_indices]

train_targ_windows = np.array(train_targ_windows)
test_targ_windows = np.array(test_targ_windows)
val_targ_windows = np.array(val_targ_windows)

In [69]:

past_train = torch.tensor(train_feat_windows, dtype=torch.float32)
past_test = torch.tensor(test_feat_windows, dtype=torch.float32)
past_val = torch.tensor(val_feat_windows, dtype=torch.float32)

future_train = torch.tensor(train_targ_windows, dtype=torch.float32)
future_test = torch.tensor(test_targ_windows, dtype=torch.float32)
future_val = torch.tensor(val_targ_windows, dtype=torch.float32)
past_test.shape, future_test.shape

(torch.Size([1345, 190, 37]), torch.Size([1345, 30, 37]))

In [70]:
train_data = TensorDataset(past_train, future_train)
test_data = TensorDataset(past_test, future_test)
val_data = TensorDataset(past_val, future_val)

# Note: these loaders use a batch size of 32 rather than BATCH_SIZE
dataloader_train = DataLoader(train_data, batch_size=32, shuffle=True)

# No need to shuffle the evaluation sets
dataloader_test = DataLoader(test_data, batch_size=32)
dataloader_val = DataLoader(val_data, batch_size=32)

In [71]:
device = torch.device('mps')
ft_model = ft_model.to(device)

In [72]:
optimizer = torch.optim.Adam(ft_model.parameters(), lr=.00001)
epochs = 10

for epoch in range(epochs):

    # Puts model in train mode (re-enabled every epoch because validation switches to eval mode)
    ft_model.train()

    # Allows for progress bar during training per epoch
    loop = tqdm(dataloader_train, leave=True)
    losses = []
    
    for past_values, future_values in loop:
        
        # Clears any previous gradient calculations
        optimizer.zero_grad()
        
        # Transfers batch onto the GPU (MPS) for faster processing
        past_values = past_values.to(device)
        future_values = future_values.to(device)
        
        # Forward pass: predicts the next PRED_LEN steps and compares them with future_values
        outputs = ft_model(past_values=past_values, future_values=future_values)
        
        # Gets the loss for our predictions (how far off our predictions were)
        loss = outputs.loss
        
        # Computes gradients: how much each weight contributed to the error
        loss.backward()
        
        # Updates the model weights using the gradients from loss.backward()
        optimizer.step()
        
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())
        losses.append(loss.item())
    
    print("Mean Training Loss", np.mean(losses))
        
    # Evaluate the fine-tuned model (not the pre-training model) without tracking gradients
    ft_model.eval()
    losses = []

    loop = tqdm(dataloader_val, leave=True)
    
    with torch.no_grad():
        for past_values, future_values in loop:
            past_values = past_values.to(device)
            future_values = future_values.to(device)
            
            outputs = ft_model(past_values=past_values, future_values=future_values)
            loss = outputs.loss
            
            loop.set_description(f'Epoch {epoch}')
            loop.set_postfix(loss=loss.item())
            losses.append(loss.item())
        
    print(f"Mean Validation Loss on EPOCH {epoch} : {np.mean(losses)}")

# Final evaluation on the held-out test set
ft_model.eval()
losses = []

loop = tqdm(dataloader_test, leave=True)

with torch.no_grad():
    for past_values, future_values in loop:
        past_values = past_values.to(device)
        future_values = future_values.to(device)
        
        outputs = ft_model(past_values=past_values, future_values=future_values)
        loss = outputs.loss
        
        loop.set_description('Test')
        loop.set_postfix(loss=loss.item())
        losses.append(loss.item())
    
print(f"Mean Test Loss : {np.mean(losses)}")


  0%|                                                   | 0/165 [00:23<?, ?it/s]


KeyboardInterrupt: 

In [None]:
torch.save(ft_model.state_dict(), 'Economic_Model_V1.bin')

## Evaluate

In [None]:
model_eval = PatchTSTForPrediction(config=ft_config)

# map_location='cpu' because the weights were saved from the MPS device
model_eval.load_state_dict(torch.load('Economic_Model_V1.bin', map_location='cpu'))
model_eval.eval()


In [None]:
features = dataset.iloc[-CONTEXT_LEN:].values  # Last CONTEXT_LEN days
# Add a batch dimension: shape (1, CONTEXT_LEN, NUM_INPUT)
x = torch.tensor(features, dtype=torch.float32).unsqueeze(0)

In [None]:
with torch.no_grad():
    predictions = model_eval(past_values=x)

# Shape (1, PRED_LEN, NUM_INPUT): one predicted value per future step and column
predictions = predictions.prediction_outputs

In [None]:
predictions = predictions.squeeze(0).numpy()

In [None]:
pred_np = predictions

In [None]:
from datetime import timedelta

df = pd.read_csv('../data/Economic_Data_1994-2025.csv')

# Get the last date in the dataset (the final row, rather than a hard-coded index)
last_date = pd.to_datetime(df['DATE']).iloc[-1]

# Create date range for predictions (one date per predicted step)
# Note: if each row is a trading day, business days (freq='B') would line up better than calendar days
future_dates = pd.date_range(
    start=last_date + timedelta(days=1),
    periods=PRED_LEN,
    freq='D'
)
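The prediction dates above use calendar days (`freq='D'`). If each row of the dataset is a trading day, a business-day calendar lines up better with the model's `PRED_LEN` steps. The sketch below assumes that; `future_trading_dates` is a hypothetical helper, and it still ignores market holidays, which would need a proper exchange calendar.

```python
import pandas as pd


def future_trading_dates(last_date, periods):
    """Return the next `periods` business days (Mon-Fri) after last_date."""
    start = pd.Timestamp(last_date) + pd.Timedelta(days=1)
    # freq='B' rolls a weekend start forward to the following Monday
    return pd.date_range(start=start, periods=periods, freq="B")


dates = future_trading_dates("2025-01-03", 5)  # 2025-01-03 is a Friday
print(list(dates.strftime("%Y-%m-%d")))
# ['2025-01-06', '2025-01-07', '2025-01-08', '2025-01-09', '2025-01-10']
```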



In [None]:
df_copy = df.drop(['DATE', 'Unnamed: 0'], axis=1)

pred_df = pd.DataFrame(pred_np, columns=df_copy.columns)

pred_df['DATE'] = future_dates

result_df = pd.concat([df, pred_df], ignore_index=True)

In [None]:
result_df['DATE'] = pd.to_datetime(result_df['DATE'])

result_df.to_csv('data_check.csv')

In [None]:
import matplotlib.pyplot as plt

def plot_time_frame_x2(year_start, year_end, df, x1, x2):
    df_tf = df[(df['DATE'].dt.year >= year_start) & (df['DATE'].dt.year <= year_end)]

    plt.plot(df_tf['DATE'], df_tf[x1], label=x1)
    plt.plot(df_tf['DATE'], df_tf[x2], label=x2)
    plt.legend()
    plt.show()

In [None]:
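# x1 and x2 still need to be filled in with the names of the two columns to plot
# (e.g. the S&P 500 and Dow Jones close columns)
plot_time_frame_x2(2025, 2025, result_df, )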