# TimeLLM

The paper introduces **TIME-LLM**, a framework that reprograms large language models (LLMs) for time series forecasting while keeping their pre-trained backbones frozen. It aligns time series patches with the LLM's input space using learned "text prototypes" and uses a **Prompt-as-Prefix (PaP)** technique to give the LLM context about the dataset, the task, and the input statistics. The paper reports that TIME-LLM outperforms specialized forecasting models in both few-shot and zero-shot scenarios. This approach bridges the gap between time series and language data, showcasing LLMs' potential to generalize across diverse tasks. The framework points towards using multimodal foundation models for time series forecasting.


#### Problem Formulation

Given a sequence of historical observations $X \in R^{N \times T}$, where $N$ is the number of one-dimensional variables and $T$ is the number of time steps, the framework aims to reprogram an LLM $f(\cdot)$ to accurately forecast the next $H$ time steps, denoted by $\hat{Y} \in R^{N \times H}$, by minimizing the mean squared error between the forecasts $\hat{Y}$ and the ground truth $Y$:

$\min \frac{1}{H} \sum_{h=1}^{H} \left\| \hat{Y}_h - Y_h \right\|_F^2$
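
As a quick numeric check of the objective, the sketch below evaluates the loss for a toy forecast with NumPy. The array values and the helper name `forecast_loss` are illustrative assumptions, not part of the paper's code.

```python
import numpy as np

def forecast_loss(y_hat: np.ndarray, y: np.ndarray) -> float:
    """(1/H) * sum_h ||y_hat[:, h] - y[:, h]||^2 for arrays of shape (N, H)."""
    n_vars, horizon = y.shape
    # Each column h holds the N variables at future step h; its squared
    # norm is the sum of squared errors across variables.
    per_step = np.sum((y_hat - y) ** 2, axis=0)
    return float(per_step.sum() / horizon)

# Toy example: N = 2 variables, H = 3 future steps
y = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])
y_hat = np.array([[1.0, 2.5, 2.0],
                  [0.0, 1.0, 1.0]])
loss = forecast_loss(y_hat, y)
print(loss)
```

The per-step squared errors here are 0, 0.25, and 2, so the loss is their sum divided by $H = 3$.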

#### Methodology
Initially, a multivariate time series is partitioned into $N$ univariate time series, which are subsequently processed independently (Nie et al., 2023). The $i$-th series is denoted as $X^{(i)} \in R^{1 \times T}$, which undergoes normalization, patching, and embedding prior to being reprogrammed with learned text prototypes to align the source and target modalities. Then, we augment the LLM’s time series reasoning ability by prompting it together with reprogrammed patches to generate output representations, which are projected to the final forecasts $\hat{Y}^{(i)} \in R^{1 \times H}$.

We note that only the parameters of the lightweight input transformation and output projection are updated, while the backbone language model is frozen. In contrast to vision-language and other multimodal language models, which usually fine-tune with paired cross-modality data, TIME-LLM is directly optimized and becomes readily available with only a small set of time series and a few training epochs, maintaining high efficiency and imposing fewer resource constraints compared to building large domain-specific models from scratch or fine-tuning them. To further reduce memory footprints, various off-the-shelf techniques (e.g., quantization) can be seamlessly integrated for slimming TIME-LLM.
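
To make the patching step concrete, here is a minimal NumPy sketch that cuts one univariate series into overlapping patches. The patch length of 16 and stride of 8 match the values used in the code later in this notebook; the 384-step toy series and the function name `make_patches` are assumptions for illustration, and the sketch omits any end padding a real patch-embedding layer may apply.

```python
import numpy as np

def make_patches(series: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split a 1-D series of length T into overlapping patches.

    Returns an array of shape (num_patches, patch_len), where
    num_patches = (T - patch_len) // stride + 1.
    """
    windows = np.lib.stride_tricks.sliding_window_view(series, patch_len)
    # sliding_window_view yields every window with step 1; keep every stride-th one
    return windows[::stride].copy()

series = np.arange(384, dtype=np.float32)  # stand-in for one input window
patches = make_patches(series)
print(patches.shape)
```

Each patch then plays the role of one "token" for the embedding and reprogramming steps, which is why a longer stride shortens the sequence the LLM has to process.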

In [1]:
import pandas as pd
df_raw = pd.read_csv("data/ETTh1.csv")
df_raw = df_raw.T
print(df_raw.head())
print(f"shape: {df_raw.shape}")

               0               1               2               3      \
date  2016/7/1 00:00  2016/7/1 01:00  2016/7/1 02:00  2016/7/1 03:00   
HUFL           5.827           5.693           5.157            5.09   
HULL           2.009           2.076           1.741           1.942   
MUFL           1.599           1.492           1.279           1.279   
MULL           0.462           0.426           0.355           0.391   

               4               5               6               7      \
date  2016/7/1 04:00  2016/7/1 05:00  2016/7/1 06:00  2016/7/1 07:00   
HUFL           5.358           5.626           7.167           7.435   
HULL           1.942           2.143           2.947           3.282   
MUFL           1.492           1.528           2.132            2.31   
MULL           0.462           0.533           0.782           1.031   

               8               9      ...            17410            17411  \
date  2016/7/1 08:00  2016/7/1 09:00  ...  2018/6/26 10

Let's look at how the data loader turns this data into batches of input and target windows as part of processing.

In [85]:
from dataloader import Dataset_ETT_hour
from torch.utils.data import DataLoader
from tqdm import tqdm
import warnings

warnings.filterwarnings('ignore')

dataset = Dataset_ETT_hour()
data_loader = DataLoader(
        dataset,
        batch_size=24,
        shuffle=False,
        num_workers=1,
        drop_last=True
    )
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(data_loader)):
        print(f"batch {i + 1}:")
        print("\nbatch x: This is the input sequence for the model, containing the historical time series data points that the model will use to predict future values.")
        print("batch x:", batch_x.shape)
        print("\nbatch y: This is the target sequence for the model, containing the true future values that the model will try to predict.")
        print("batch y:", batch_y.shape)
        print("\nbatch x': This is the time encoding for the input sequence, which provides additional information about the time of day, day of the week, etc.")
        print("batch x':", batch_x_mark.shape)
        print("\nbatch y': This is the time encoding for the target sequence, which provides additional information about the time of day, day of the week, etc.")
        print("batch y':", batch_y_mark.shape)
        print("\n")

        if i == 2:
            break
    

2it [00:00,  5.89it/s]

batch 1:

batch x: This is the input sequence for the model, containing the historical time series data points that the model will use to predict future values.
batch x: torch.Size([24, 384, 1])

batch y: This is the target sequence for the model, containing the true future values that the model will try to predict.
batch y: torch.Size([24, 192, 1])

batch x': This is the time encoding for the input sequence, which provides additional information about the time of day, day of the week, etc.
batch x': torch.Size([24, 384, 4])

batch y': This is the time encoding for the target sequence, which provides additional information about the time of day, day of the week, etc.
batch y': torch.Size([24, 192, 4])


batch 2:

batch x: This is the input sequence for the model, containing the historical time series data points that the model will use to predict future values.
batch x: torch.Size([24, 384, 1])

batch y: This is the target sequence for the model, containing the true future values that 
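
The shapes above suggest that each sample pairs a 384-step input window with a 192-step target window. The sketch below shows one common way such samples are cut from a long series, where the target overlaps the end of the input by a `label_len` segment. `Dataset_ETT_hour` is not shown in this notebook, so the function `make_window` and the split of 192 into `label_len=96` and `pred_len=96` are assumptions for illustration only.

```python
import numpy as np

def make_window(series, index, seq_len=384, label_len=96, pred_len=96):
    """Return (x, y) for the sample starting at `index`.

    x covers [index, index + seq_len); y starts label_len steps before the
    end of x and extends pred_len steps into the future.
    (label_len and pred_len are hypothetical values.)
    """
    s_begin, s_end = index, index + seq_len
    r_begin = s_end - label_len
    r_end = r_begin + label_len + pred_len
    return series[s_begin:s_end], series[r_begin:r_end]

series = np.arange(1000, dtype=np.float32)  # stand-in for one column of the CSV
x, y = make_window(series, index=0)
print(x.shape, y.shape)
```

Sliding `index` forward by one step per sample yields heavily overlapping windows, which is how a single long series can produce many training examples.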




Let's build the model step by step.

In [86]:
import torch
import torch.nn as nn
from layers.StandardNorm import Normalize

class Model(nn.Module):

    def __init__(self, configs, patch_len=16, stride=8):
        super().__init__()
        self.task_name = "long_term_forecast"
        self.pred_len = 96
        self.seq_len = 512
        self.d_ff = 128
        self.top_k = 5
        self.d_llm = 4096
        # Use the constructor arguments instead of hard-coded values
        self.patch_len = patch_len
        self.stride = stride

        # Taken as-is from the paper: configs.enc_in = 7
        self.normalize_layers = Normalize(7, affine=False)

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
        dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
        return dec_out
        # Once forecasts are produced: return dec_out[:, -self.pred_len:, :]

    def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
        # For now, only normalize the input and return it unchanged otherwise
        x_enc = self.normalize_layers(x_enc, 'norm')
        return x_enc
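
The `Normalize` layer is imported from a local module that is not shown here. Assuming it performs per-instance standardization over the time axis (in the spirit of reversible instance normalization), the `'norm'` step can be sketched in NumPy as follows. The function names and the `eps` value are assumptions.

```python
import numpy as np

def instance_norm(x: np.ndarray, eps: float = 1e-5):
    """Standardize each series in a (batch, time, vars) array over the time axis."""
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std, mean, std

def instance_denorm(x_norm: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Undo instance_norm so forecasts return to the original scale."""
    return x_norm * std + mean

rng = np.random.default_rng(0)
x = rng.normal(loc=20.0, scale=5.0, size=(24, 384, 1))  # same shape as batch_x
x_norm, mean, std = instance_norm(x)
x_back = instance_denorm(x_norm, mean, std)
print(x_norm.mean(axis=1).ravel()[:3], x_norm.std(axis=1).ravel()[:3])
```

Keeping the per-window mean and standard deviation around is what allows the final forecasts to be mapped back to the original units.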

In [87]:
dataset = Dataset_ETT_hour()
data_loader = DataLoader(
        dataset,
        batch_size=24,
        shuffle=False,
        num_workers=1,
        drop_last=True
    )
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(data_loader)):
        print(f"batch {i}")
        model = Model(configs=None)
        print(torch.reshape(batch_x[0], (1, 384)))
        print(batch_x.shape)
        output = model(batch_x, batch_x_mark, batch_y, batch_y_mark)
        print(torch.reshape(output[0], (1, 384)))
        print(output.shape)
        if i == 0:
            break

0it [00:00, ?it/s]

batch 0
tensor([[30.5310, 27.7870, 27.7870, 25.0440, 21.9480, 21.1740, 22.7920, 23.1440,
         21.6670, 17.4460, 19.9790, 20.1190, 19.2050, 18.5720, 19.5560, 17.3050,
         19.4860, 19.1340, 20.6820, 18.7120, 17.8680, 18.0090, 18.0090, 19.7680,
         21.1040, 19.6970, 20.0490, 20.7520, 21.3850, 22.2300, 20.2600, 21.1040,
         20.6120, 18.3610, 20.9630, 19.4160, 20.8230, 20.1900, 21.3150, 22.0190,
         20.6820, 25.4660, 25.8880, 27.8570, 27.2950, 22.2300, 21.9480, 27.2950,
         29.3350, 26.0280, 24.3400, 26.4500, 25.9580, 24.0590, 25.3250, 23.6370,
         26.3800, 27.3650, 28.0680, 29.4750, 26.8020, 29.9680, 30.3900, 31.1640,
         29.7570, 32.2890, 31.9380, 28.5610, 21.5260, 22.2300, 19.4160, 18.5720,
         21.6670, 25.5360, 27.8570, 27.9280, 24.6210, 23.8480, 23.0740, 22.5110,
         21.6670, 25.3950, 25.1840, 29.5460, 29.4750, 29.2640, 30.9530, 31.7260,
         33.1330, 28.9830, 28.9830, 31.7260, 25.1840, 30.5310, 27.6460, 25.4660,
         25.9580, 25




#### Prompt Generation

In [88]:
def generate_prompt(x_enc, description, pred_len, seq_len):
    def calculate_lags(x):
        # Autocorrelation via FFT: the inverse FFT of the power spectrum
        x_fft = torch.fft.rfft(x.permute(0, 2, 1).contiguous(), dim=-1)
        res = x_fft * torch.conj(x_fft)
        corr = torch.fft.irfft(res, dim=-1)
        mean_value = torch.mean(corr, dim=1)
        # Indices of the 5 largest autocorrelation values
        _, lags = torch.topk(mean_value, 5, dim=-1)
        return lags

    B, T, N = x_enc.size()
    # Treat every variable as its own univariate series: (B, T, N) -> (B * N, T, 1)
    x_enc = x_enc.permute(0, 2, 1).contiguous().reshape(B * N, T, 1)
    min_values = torch.min(x_enc, dim=1)[0]
    max_values = torch.max(x_enc, dim=1)[0]
    medians = torch.median(x_enc, dim=1).values
    lags = calculate_lags(x_enc)
    # Sum of first differences: positive means the window ends higher than it starts
    trends = x_enc.diff(dim=1).sum(dim=1)
    
    prompt = []
    for b in range(x_enc.shape[0]):
        min_values_str = str(min_values[b].tolist()[0])
        max_values_str = str(max_values[b].tolist()[0])
        median_values_str = str(medians[b].tolist()[0])
        lags_values_str = str(lags[b].tolist())
        prompt_ = (
            f"<|start_prompt|>Dataset description: {description}"
            f"Task description: forecast the next {str(pred_len)} steps given the previous {str(seq_len)} steps information; "
            "Input statistics: "
            f"min value {min_values_str}, "
            f"max value {max_values_str}, "
            f"median value {median_values_str}, "
            f"the trend of input is {'upward' if trends[b] > 0 else 'downward'}, "
            f"top 5 lags are : {lags_values_str}<|<end_prompt>|>"
        )
        prompt.append(prompt_)
    return prompt

def tokenize_prompt_and_get_prompt_embeddings(prompt, tokenizer, llm_model, device):
    prompt = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048).input_ids
    prompt_embeddings = llm_model.get_input_embeddings()(prompt.to(device))  # (batch, prompt_token, dim)
    return prompt_embeddings
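
The lag computation in `generate_prompt` relies on the fact that the inverse FFT of a signal's power spectrum is its circular autocorrelation. The NumPy sketch below mirrors that calculation for a single series; the sinusoid with a 24-step period is a made-up input chosen so the expected lags are easy to reason about, and `top_k_lags` is a hypothetical helper name.

```python
import numpy as np

def top_k_lags(series: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the k lags with the largest circular autocorrelation."""
    spectrum = np.fft.rfft(series)
    # Power spectrum -> circular autocorrelation (Wiener-Khinchin theorem)
    corr = np.fft.irfft(spectrum * np.conj(spectrum), n=len(series))
    # Indices of the k largest values, largest first
    return np.argsort(corr)[::-1][:k]

t = np.arange(384)
series = np.sin(2 * np.pi * t / 24)  # zero-mean signal with period 24
lags = top_k_lags(series)
print(lags)
```

For this input every returned lag is a multiple of 24, because the autocorrelation of a periodic signal peaks whenever the shift lines up with a whole number of periods. Real data is noisier, so the top lags are better read as a hint about dominant periodicities than as exact values.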

We can now update our model with prompt generation and tokenization.

In [89]:
import torch
import torch.nn as nn
from llama import Llama
from layers.StandardNorm import Normalize
from Embeddings.patch_embedding import PatchEmbedding

def generate_prompt(x_enc, description, pred_len, seq_len):
    def calculate_lags(x):
        # Autocorrelation via FFT: the inverse FFT of the power spectrum
        x_fft = torch.fft.rfft(x.permute(0, 2, 1).contiguous(), dim=-1)
        res = x_fft * torch.conj(x_fft)
        corr = torch.fft.irfft(res, dim=-1)
        mean_value = torch.mean(corr, dim=1)
        # Indices of the 5 largest autocorrelation values
        _, lags = torch.topk(mean_value, 5, dim=-1)
        return lags

    B, T, N = x_enc.size()
    # Treat every variable as its own univariate series: (B, T, N) -> (B * N, T, 1)
    x_enc = x_enc.permute(0, 2, 1).contiguous().reshape(B * N, T, 1)
    min_values = torch.min(x_enc, dim=1)[0]
    max_values = torch.max(x_enc, dim=1)[0]
    medians = torch.median(x_enc, dim=1).values
    lags = calculate_lags(x_enc)
    # Sum of first differences: positive means the window ends higher than it starts
    trends = x_enc.diff(dim=1).sum(dim=1)
    
    prompt = []
    for b in range(x_enc.shape[0]):
        min_values_str = str(min_values[b].tolist()[0])
        max_values_str = str(max_values[b].tolist()[0])
        median_values_str = str(medians[b].tolist()[0])
        lags_values_str = str(lags[b].tolist())
        prompt_ = (
            f"<|start_prompt|>Dataset description: {description}"
            f"Task description: forecast the next {str(pred_len)} steps given the previous {str(seq_len)} steps information; "
            "Input statistics: "
            f"min value {min_values_str}, "
            f"max value {max_values_str}, "
            f"median value {median_values_str}, "
            f"the trend of input is {'upward' if trends[b] > 0 else 'downward'}, "
            f"top 5 lags are : {lags_values_str}<|<end_prompt>|>"
        )
        prompt.append(prompt_)
    return prompt

def tokenize_prompt_and_get_prompt_embeddings(prompt, tokenizer, llm_model, device):
    prompt = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048).input_ids
    prompt_embeddings = llm_model.get_input_embeddings()(prompt.to(device))  # (batch, prompt_token, dim)
    return prompt_embeddings

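class MappingLayer(nn.Module):
    """Maps the LLM vocabulary to a much smaller set of text prototypes."""

    def __init__(self, word_embeddings, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.vocab_size = word_embeddings.shape[0]
        self.num_tokens = 1000  # number of text prototypes
        self.mapping_layer = nn.Linear(self.vocab_size, self.num_tokens)

    def forward(self, word_embeddings):
        # Expects the vocabulary on the last axis: (d_llm, vocab_size) -> (d_llm, num_tokens)
        source_embeddings = self.mapping_layer(word_embeddings)
        return source_embeddings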
        

class TlModel(nn.Module):
    def __init__(self, configs, patch_len=16, stride=8):
        super().__init__()
        self.task_name = "long_term_forecast"
        self.pred_len = 96
        # NOTE: seq_len is only used in the prompt text here; the loader in
        # this notebook yields 384-step inputs
        self.seq_len = 512
        self.d_ff = 128
        self.top_k = 5
        self.d_llm = 4096
        # Use the constructor arguments instead of hard-coded values
        self.patch_len = patch_len
        self.stride = stride

        self.llama = Llama()

        # Taken as-is from the paper: configs.enc_in = 7
        self.normalize_layers = Normalize(7, affine=False)

        # Word embedding matrix of the LLM, shape (vocab_size, d_llm)
        self.word_embeddings = self.llama.get_model().get_input_embeddings().weight
        self.mapping_layer = MappingLayer(self.word_embeddings)

        self.patch_embedding = PatchEmbedding(32, self.patch_len, self.stride, 0.1)

        self.description = "The Electricity Transformer Temperature (ETT) is a crucial indicator in the electric power long-term deployment."

    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, mask=None):
        dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
        return dec_out
        # Once forecasts are produced: return dec_out[:, -self.pred_len:, :]

    def forecast(self, x_enc, x_mark_enc, x_dec, x_mark_dec):
        x_enc = self.normalize_layers(x_enc, 'norm')

        # Generate prompt and get embeddings
        prompt = generate_prompt(x_enc, self.description, self.pred_len, self.seq_len)
        tokenizer = self.llama.get_tokenizer()
        tokenizer.pad_token = tokenizer.eos_token

        prompt_embeddings = tokenize_prompt_and_get_prompt_embeddings(prompt,
                                                                      tokenizer,
                                                                      self.llama.get_model(),
                                                                      x_enc.device)
        print(f"prompt embedding shape {prompt_embeddings.shape}")
        print(f"prompt embedding {prompt_embeddings.shape}")

        # Get source embeddings (text prototypes): (vocab_size, d_llm) -> (1000, d_llm)
        source_embeddings = self.mapping_layer(self.word_embeddings.permute(1, 0)).permute(1, 0)
        print(f"source embeddings shape {source_embeddings.shape}")

        # Get patch embeddings
        x_enc = x_enc.permute(0, 2, 1).contiguous()
        # NOTE: casting the input to bfloat16 requires the patch-embedding
        # weights to be bfloat16 as well; if they are still float32, this
        # call raises a dtype-mismatch RuntimeError
        x_enc = x_enc.to(torch.bfloat16)
        enc_out, n_vars = self.patch_embedding(x_enc)

        print(f"patch embeddings shape {enc_out.shape}")
        # Next steps, not implemented yet:
        #enc_out = reprograming_layer(source_embeddings, x_enc, prompt_embeddings)
        #llama_enc_out = torch.cat([prompt_embeddings, enc_out], dim=1)

        # Placeholder return until the reprogramming layer is in place
        return x_enc
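
The commented-out lines at the end of `forecast` point to the two remaining steps: reprogramming the patch embeddings with the text prototypes, then prepending the prompt embeddings. The paper describes the reprogramming step as cross-attention in which patches act as queries and text prototypes act as keys and values. Below is a single-head NumPy sketch under that assumption; all dimensions are shrunk, the weights are random rather than learned, and names such as `reprogram` are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reprogram(patches, prototypes, w_q, w_k, w_v):
    """Single-head cross-attention: patches attend over text prototypes.

    patches:    (batch, num_patches, d_model)
    prototypes: (num_prototypes, d_llm)
    returns:    (batch, num_patches, d_llm)
    """
    q = patches @ w_q                          # (B, P, d_k)
    k = prototypes @ w_k                       # (V', d_k)
    v = prototypes @ w_v                       # (V', d_llm)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (B, P, V')
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
B, P, d_model, d_k, d_llm = 2, 47, 32, 16, 64  # toy sizes, not the real 4096
vocab_size, num_prototypes, prompt_len = 500, 10, 5

# Text prototypes as linear combinations of word embeddings,
# analogous to MappingLayer mapping vocab_size -> num_tokens
word_embeddings = rng.normal(size=(vocab_size, d_llm))
mapping = rng.normal(size=(num_prototypes, vocab_size)) / np.sqrt(vocab_size)
prototypes = mapping @ word_embeddings         # (V', d_llm)

patches = rng.normal(size=(B, P, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_llm, d_k))
w_v = rng.normal(size=(d_llm, d_llm))

enc_out = reprogram(patches, prototypes, w_q, w_k, w_v)
prompt_embeddings = rng.normal(size=(B, prompt_len, d_llm))
# Prompt-as-Prefix: prompt tokens come first, reprogrammed patches follow
llm_input = np.concatenate([prompt_embeddings, enc_out], axis=1)
print(enc_out.shape, llm_input.shape)
```

The point of the sketch is the shape bookkeeping: after reprogramming, each patch lives in the same `d_llm`-dimensional space as the prompt tokens, so the two can be concatenated along the sequence axis and passed to the frozen LLM.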

In [90]:
dataset = Dataset_ETT_hour()
data_loader = DataLoader(
        dataset,
        batch_size=24,
        shuffle=False,
        num_workers=1,
        drop_last=True
    )
for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(data_loader)):
        print(f"batch {i}")
        model = TlModel(configs=None)
        #print(torch.reshape(batch_x[0], (1, 384)))
        print(batch_x.shape)
        output = model(batch_x, batch_x_mark, batch_y, batch_y_mark)
        #print(torch.reshape(output[0], (1, 384)))
        print(output.shape)
        if i == 0:
            break

0it [00:00, ?it/s]The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.


batch 0


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

torch.Size([24, 384, 1])
prompt embedding shape torch.Size([24, 175, 4096])
prompt embedding torch.Size([24, 175, 4096])


0it [00:08, ?it/s]

source embeddings shape torch.Size([1000, 4096])





RuntimeError: expected scalar type BFloat16 but found Float