# Chapter 94: Large Language Models for Time-Series

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand how large language models (LLMs) can be applied to time‑series data, beyond traditional numerical forecasting.
- Design effective prompts for time‑series tasks such as forecasting, anomaly explanation, and trend summarisation.
- Use few‑shot learning to adapt LLMs to specific time‑series datasets with minimal examples.
- Apply chain‑of‑thought prompting to improve reasoning about temporal patterns.
- Integrate LLMs with external tools (calculators, databases) to enhance numerical accuracy.
- Build simple agents that can answer questions about time‑series data, such as “Why did the NEPSE index drop last week?”
- Recognise the limitations of LLMs for time‑series, including token limits, hallucinations, and cost.
- Evaluate when to use an LLM versus a traditional time‑series model.

---

## **94.1 Introduction to Large Language Models for Time‑Series**

Large language models (LLMs) like GPT‑4, Claude, and Llama have revolutionised natural language processing. Their ability to understand context, reason, and generate human‑like text has led researchers and practitioners to explore their application to time‑series data. However, LLMs are not inherently numerical models; they are trained on text. So how can they help with time‑series?

The answer lies in their ability to:

- **Understand and generate textual descriptions** of time‑series (e.g., “sales peaked in December”).
- **Reason about temporal patterns** when provided with numerical data in a textual format.
- **Incorporate external knowledge** (e.g., news, events) that may affect the time series.
- **Explain predictions** from black‑box models in natural language.
- **Answer questions** about historical data or future trends.

In the context of the NEPSE stock prediction system, an LLM could:

- Summarise daily market movements: “The NEPSE index rose 2% today, driven by banking stocks.”
- Explain a model's prediction: “The model predicted a drop because of high volatility and negative RSI divergence.”
- Answer a trader's query: “What happened to NABIL stock on June 1st, and how does that compare to historical patterns?”
- Generate synthetic news or sentiment to augment features.

This chapter explores these capabilities, with practical examples using OpenAI's API and the LangChain framework.

---

## **94.2 LLM Capabilities Relevant to Time‑Series**

Before diving into implementation, let's outline what LLMs can and cannot do well in the time‑series domain.

### **94.2.1 Strengths**
- **Language understanding**: They can parse questions about time‑series and generate coherent answers.
- **Pattern recognition in text**: If time‑series data is presented as a sequence of numbers (e.g., “100, 105, 103, …”), LLMs can sometimes detect simple patterns (trends, seasonality) because they have seen similar sequences in their training data (e.g., stock prices in financial news).
- **External knowledge integration**: They can incorporate news, earnings reports, or macroeconomic indicators that are not in the numerical data.
- **Explainability**: They can generate human‑readable explanations for model outputs or data anomalies.
- **Few‑shot learning**: With a few examples in the prompt, they can adapt to new tasks.

### **94.2.2 Weaknesses**
- **Numerical precision**: LLMs are not calculators. They may make arithmetic errors, especially with large numbers or complex operations.
- **Token limits**: Time‑series can be long. Even with 128k context, you cannot feed years of hourly data.
- **Hallucination**: They may invent facts or numbers that are not present in the data.
- **Lack of temporal understanding**: While they can recognise patterns, they do not inherently understand concepts like autocorrelation or stationarity.
- **Cost and latency**: API calls are slower and more expensive than traditional models.

Given these trade‑offs, LLMs are best used as **assistants** to traditional forecasting systems, not as replacements.

---

## **94.3 Prompt Engineering for Time‑Series**

The way you present data and instructions to an LLM dramatically affects the quality of the response. This is **prompt engineering**.

### **94.3.1 Structuring Numerical Data**
LLMs work with text, so we need to convert time‑series into a textual representation. Options include:

- **List of values**: “Close prices for the last 10 days: 105.2, 106.1, 104.5, 107.3, 108.0, 109.2, 108.5, 110.1, 111.0, 112.3”
- **Table format**: Use markdown tables for clarity.
- **With dates**: “2024‑05‑01: 105.2, 2024‑05‑02: 106.1, …”

For long series, summarise statistics instead of listing every point.
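
As a minimal sketch of these options, the helper below lists a short dated series point by point and switches to summary statistics when the series is long. The function name `series_to_text` and the `max_points` threshold are illustrative assumptions, not part of any library.

```python
from statistics import mean, pstdev

def series_to_text(dates, prices, max_points=30):
    """Render a price series as prompt text.

    Short series are listed point by point; longer ones are replaced by
    summary statistics plus the most recent values.
    """
    if len(dates) != len(prices):
        raise ValueError("dates and prices must have the same length")
    if not prices:
        raise ValueError("series is empty")
    if len(prices) <= max_points:
        return "\n".join(f"{d}: {p:.1f}" for d, p in zip(dates, prices))
    recent = ", ".join(f"{p:.1f}" for p in prices[-5:])
    return (
        f"Period: {dates[0]} to {dates[-1]} ({len(prices)} points)\n"
        f"Min: {min(prices):.1f}, Max: {max(prices):.1f}, "
        f"Mean: {mean(prices):.1f}, Std: {pstdev(prices):.1f}\n"
        f"Last 5 values: {recent}"
    )

dates = ["2024-05-01", "2024-05-02", "2024-05-03"]
prices = [105.2, 106.1, 104.5]
print(series_to_text(dates, prices))
```

Keeping the last few raw values alongside the statistics gives the model both the long‑run context and the most recent movement.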

### **94.3.2 Zero‑Shot Prompting**
Simply ask the model to perform a task, with the data in the prompt.

**Example**: Ask for a forecast.

```
You are a financial analyst. Given the last 10 days of closing prices for the NEPSE index, predict the next day's price. Prices: 105.2, 106.1, 104.5, 107.3, 108.0, 109.2, 108.5, 110.1, 111.0, 112.3.
```

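The model might respond with a prediction and a brief rationale. However, zero‑shot forecasts can be unreliable: the model has no examples of the expected scale or output format, and its answer may shift with small changes to the prompt wording.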
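
One way to sanity‑check a zero‑shot answer is to compare it with trivial baselines computed directly from the data. The sketch below uses two standard naive forecasts (last value and linear drift); the function names are hypothetical, and nothing here calls an LLM.

```python
def naive_forecast(prices):
    """Predict that the next value equals the last observed value."""
    return prices[-1]

def drift_forecast(prices):
    """Extend the average step between the first and last observation."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    step = (prices[-1] - prices[0]) / (len(prices) - 1)
    return prices[-1] + step

prices = [105.2, 106.1, 104.5, 107.3, 108.0, 109.2, 108.5, 110.1, 111.0, 112.3]
print(f"Naive: {naive_forecast(prices):.2f}")
print(f"Drift: {drift_forecast(prices):.2f}")
```

An LLM forecast that lands far from both baselines on a smooth series like this one deserves a closer look before it is used.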

### **94.3.3 Few‑Shot Prompting**
Provide a few examples of the task in the prompt, then ask for the new case.

```
You are a time‑series forecasting assistant. Given the last few days of prices, predict the next value.

Example 1:
Prices: 50, 52, 51, 53, 55
Next: 56

Example 2:
Prices: 100, 102, 101, 104, 107
Next: 108

Now, predict for:
Prices: 105.2, 106.1, 104.5, 107.3, 108.0, 109.2, 108.5, 110.1, 111.0, 112.3
Next:
```

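The model picks up the pattern and the expected output format from the examples in the prompt (in‑context learning; no model weights are updated). This often improves accuracy over zero‑shot prompting.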

### **94.3.4 Role Prompting**
Assign the model a persona to guide its response.

```
You are an expert stock market analyst with 20 years of experience. Analyse the following NEPSE price sequence and explain any notable patterns.
```

### **94.3.5 Instruction Formatting**
Use clear delimiters (e.g., triple backticks) for data, and specify the desired output format (JSON, bullet points).

````
Given the following daily closing prices for NEPSE, provide a JSON object with fields: trend (up/down/sideways), volatility (low/medium/high), and next_day_prediction (a number).

Data:
```
2024-05-20: 110.2
2024-05-21: 111.5
2024-05-22: 109.8
2024-05-23: 112.0
2024-05-24: 113.1
```
````
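
A structured output format only helps if the reply is validated before use. The following is a minimal sketch of parsing and checking the JSON object requested above; `parse_analysis` is a hypothetical helper, and the sample reply is written by hand rather than produced by a model.

```python
import json

ALLOWED_TRENDS = {"up", "down", "sideways"}
ALLOWED_VOLATILITY = {"low", "medium", "high"}

def parse_analysis(reply: str) -> dict:
    """Parse and validate the JSON object requested in the prompt."""
    # Models sometimes add text around the JSON; keep only the {...} part
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    if data.get("trend") not in ALLOWED_TRENDS:
        raise ValueError(f"unexpected trend: {data.get('trend')!r}")
    if data.get("volatility") not in ALLOWED_VOLATILITY:
        raise ValueError(f"unexpected volatility: {data.get('volatility')!r}")
    try:
        data["next_day_prediction"] = float(data["next_day_prediction"])
    except (KeyError, TypeError, ValueError):
        raise ValueError("next_day_prediction is missing or not a number")
    return data

# Hypothetical reply, written by hand for illustration
reply = '{"trend": "up", "volatility": "low", "next_day_prediction": 113.8}'
print(parse_analysis(reply))
```

Rejecting unexpected field values early keeps malformed replies out of downstream code.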

---

## **94.4 Few‑Shot Learning for NEPSE**

Let's implement a practical example using OpenAI's API to perform few‑shot forecasting on NEPSE data.

```python
import os

from openai import OpenAI  # requires openai>=1.0

# The client also reads OPENAI_API_KEY from the environment by default
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def few_shot_forecast(prices_series, n_examples=3):
    """
    Use few‑shot prompting to forecast the next value.
    prices_series: list of floats (recent prices)
    n_examples: number of few‑shot examples to include in the prompt
    """
    # Build few‑shot examples (these could be from historical data)
    examples = [
        ([50, 52, 51, 53, 55], 56),
        ([100, 102, 101, 104, 107], 108),
        ([200, 205, 203, 208, 210], 212),
    ][:n_examples]

    prompt = "You are a time‑series forecasting assistant. Given the last few days of prices, predict the next value.\n\n"

    for prices, target in examples:
        prompt += f"Prices: {', '.join(map(str, prices))}\nNext: {target}\n\n"

    prompt += f"Now, predict for:\nPrices: {', '.join(map(str, prices_series))}\nNext:"

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that predicts the next number in a sequence. Provide only the number as your answer."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.0,  # minimise sampling randomness
        max_tokens=10
    )

    prediction = response.choices[0].message.content.strip()
    try:
        return float(prediction)
    except ValueError:
        # The model returned extra text despite the system instruction
        print(f"Could not parse: {prediction}")
        return None

# Example usage with NEPSE data
recent_prices = [105.2, 106.1, 104.5, 107.3, 108.0, 109.2, 108.5, 110.1, 111.0, 112.3]
pred = few_shot_forecast(recent_prices)
print(f"LLM forecast: {pred}")
```

**Explanation**:

- We provide a few examples of price sequences and their next values. The model infers the pattern (which appears to be a simple upward trend).
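- We set `temperature=0.0` to minimise sampling randomness. Outputs are usually repeatable, but they are not guaranteed to be identical across calls.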
- The model returns a string; we attempt to convert to float.
- In practice, you might need more examples or better examples tailored to NEPSE.

This approach can work for simple patterns such as a steady trend, but it will struggle with complex seasonality or noise.
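
To tailor the examples to a specific series, they can be cut from its own history with a sliding window. The sketch below is one way to do that; `make_few_shot_examples` and `format_examples` are hypothetical helpers, and the `history` values are placeholders rather than real NEPSE data. The history must end before the window being forecast, otherwise the target leaks into the prompt.

```python
def make_few_shot_examples(history, window=5, n_examples=3):
    """Cut (window, next value) pairs from the end of a historical series."""
    if len(history) < window + n_examples:
        raise ValueError("history too short for the requested examples")
    examples = []
    # Walk backwards so the most recent patterns are used
    for end in range(len(history) - 1, len(history) - 1 - n_examples, -1):
        examples.append((history[end - window:end], history[end]))
    return list(reversed(examples))  # oldest example first

def format_examples(examples):
    """Render examples in the same 'Prices / Next' layout used in the prompt."""
    return "\n\n".join(
        f"Prices: {', '.join(map(str, w))}\nNext: {t}" for w, t in examples
    )

# Placeholder values for illustration only
history = [100.0, 101.5, 100.8, 102.2, 103.0, 102.6, 104.1, 105.0]
examples = make_few_shot_examples(history, window=5, n_examples=3)
print(format_examples(examples))
```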

---

## **94.5 Chain‑of‑Thought Prompting**

Chain‑of‑thought (CoT) prompting encourages the model to reason step by step before giving an answer. This can improve accuracy for tasks that require intermediate reasoning.

For time‑series, CoT might involve:

- Describing the trend (e.g., “prices have been increasing over the last 5 days”).
- Noting volatility.
- Considering seasonality (e.g., “this is the end of the month”).
- Then making a prediction.

**Example prompt**:

```
You are a financial analyst. Given the following daily closing prices for NEPSE, think step by step and then predict the next day's price.

Prices: 105.2, 106.1, 104.5, 107.3, 108.0, 109.2, 108.5, 110.1, 111.0, 112.3

First, describe the overall trend.
Second, note any recent volatility.
Third, based on these observations, predict the next price.
```

The model might respond:

```
Step 1: The overall trend over the last 10 days is upward. The price started at 105.2 and ended at 112.3, an increase of about 6.7%.
Step 2: There is some volatility; for example, a dip from 106.1 to 104.5 on day 3, but generally the movement is smooth.
Step 3: Given the strong upward momentum and no signs of reversal, I predict the next price will be around 113.5.
```

CoT can be implemented by simply including these instructions in the prompt.
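
Because a CoT reply mixes reasoning with the answer, the final number has to be extracted from free text. A minimal sketch follows: `build_cot_prompt` reuses the prompt above and adds one extra instruction (a closing `Prediction:` line, which is an assumption of this sketch), and `extract_prediction` reads that line back. The sample reply is written by hand.

```python
import re

def build_cot_prompt(prices):
    """Build the step-by-step prompt with a fixed final-answer line."""
    series = ", ".join(str(p) for p in prices)
    return (
        "You are a financial analyst. Given the following daily closing prices "
        "for NEPSE, think step by step and then predict the next day's price.\n\n"
        f"Prices: {series}\n\n"
        "First, describe the overall trend.\n"
        "Second, note any recent volatility.\n"
        "Third, based on these observations, predict the next price.\n"
        "Finish with a line of the form 'Prediction: <number>'."
    )

def extract_prediction(reply):
    """Return the number on the last 'Prediction:' line, or None if absent."""
    matches = re.findall(r"Prediction:\s*([-+]?\d+(?:\.\d+)?)", reply)
    return float(matches[-1]) if matches else None

# Hypothetical reply, written by hand for illustration
reply = "Step 1: The trend is upward.\nStep 2: Volatility is low.\nPrediction: 113.5"
print(extract_prediction(reply))
```

Asking for a fixed final line keeps the reasoning free‑form while making the answer machine‑readable.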

---

## **94.6 Tool Use: Augmenting LLMs with Calculators**

LLMs are bad at arithmetic. To get accurate numerical results, we can give them access to tools. This is a key capability of modern LLM frameworks like LangChain.

For example, we can let the LLM decide when to use a calculator to compute moving averages or percentage changes.

```python
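# Legacy LangChain import paths; newer releases move the OpenAI wrapper
# into separate packages.
import ast
import operator

from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from langchain.tools import BaseTool

# Whitelist of arithmetic operators the calculator accepts
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression")
    return _eval(ast.parse(expr.strip(), mode="eval").body)

class CalculatorTool(BaseTool):
    name: str = "Calculator"
    description: str = "Useful for arithmetic operations. Input should be a mathematical expression."

    def _run(self, query: str) -> str:
        try:
            result = safe_eval(query)
            return f"Result: {result}"
        except Exception as e:
            return f"Error: {e}"

    async def _arun(self, query: str):
        return self._run(query)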

# Initialize LLM and agent
llm = OpenAI(temperature=0)
tools = [CalculatorTool()]

agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)

# Ask a question that requires calculation
query = """
Given the following NEPSE prices: 105.2, 106.1, 104.5, 107.3, 108.0.
Compute the 3-day moving average of the last three prices, then add 2% to that value.
"""

response = agent.run(query)
print(response)
```

**Explanation**:

- The agent can decide to call the `CalculatorTool` with an expression like `(104.5 + 107.3 + 108.0)/3 * 1.02`, using the last three prices in the query.
- The tool returns the result, and the agent incorporates it into the final answer.
- This overcomes the LLM's numerical weakness.

In a time‑series context, you could provide tools for:

- Fetching historical data from a database.
- Computing technical indicators (RSI, MACD).
- Running a pre‑trained forecasting model.

This leads to **agentic systems** that can combine reasoning with computation.
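
The computations behind such tools are ordinary deterministic functions. As a minimal sketch (function names are hypothetical), the helpers below could be wrapped as tools; the usage lines compute the moving‑average query from this section directly, which gives a reference value to check an agent's answer against.

```python
def moving_average(prices, window):
    """Mean of the last `window` prices."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return sum(prices[-window:]) / window

def pct_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100

prices = [105.2, 106.1, 104.5, 107.3, 108.0]
ma3 = moving_average(prices, 3)  # (104.5 + 107.3 + 108.0) / 3
print(f"3-day MA: {ma3:.2f}")
print(f"MA plus 2%: {ma3 * 1.02:.3f}")
print(f"Change over the window: {pct_change(prices[0], prices[-1]):.2f}%")
```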

---

## **94.7 Agents for Time‑Series Analysis**

An agent is an LLM‑powered system that can use multiple tools to accomplish a goal. For the NEPSE system, an agent could answer complex user queries like:

> “Compare the performance of banking stocks and hydropower stocks over the last month. Which sector had higher volatility?”

The agent would need to:

1. Query a database for stock prices.
2. Compute returns and volatility per sector.
3. Generate a comparative summary.

We can build such an agent using LangChain.

```python
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import StringPromptTemplate
from langchain.tools import BaseTool  # needed by the tool classes below
from typing import List, Union
import ast
import pandas as pd
import numpy as np

# Define tools: one to fetch data, one to compute metrics
class FetchDataTool(BaseTool):
    name: str = "FetchStockData"
    description: str = "Fetches historical price data for a given symbol. Input: symbol (string). Returns: list of prices."

    def _run(self, symbol: str) -> str:
        # In reality, query database. Here, we simulate.
        # For demo, return a small sample.
        data = {
            'NABIL': [105.2, 106.1, 104.5, 107.3, 108.0],
            'HRL': [210.0, 212.5, 209.0, 215.0, 218.0]
        }
        # Strip whitespace the LLM may add around the symbol
        return str(data.get(symbol.strip(), []))

class ComputeVolatilityTool(BaseTool):
    name: str = "ComputeVolatility"
    description: str = "Computes the volatility (standard deviation of returns) for a list of prices. Input: list of prices. Returns: volatility value."

    def _run(self, prices_str: str) -> str:
        # Parse the string form of the list, e.g. "[105.2, 106.1]"
        prices = np.asarray(ast.literal_eval(prices_str), dtype=float)
        returns = np.diff(prices) / prices[:-1]
        vol = np.std(returns)
        return str(vol)

# Create agent
tools = [FetchDataTool(), ComputeVolatilityTool()]

# Custom prompt template (simplified)
prompt_template = """
You are a financial analyst assistant. Use the tools provided to answer the question.

Question: {input}

You have access to the following tools: {tool_names}.
Use them in a Thought/Action/Observation cycle.

{agent_scratchpad}
"""

# Build agent (details omitted for brevity, but standard LangChain pattern)
# ...

# Example query
result = agent.run("Compare the volatility of NABIL and HRL over the last 5 days.")
print(result)
```

**Explanation**:

- The agent decides which tools to call based on the query.
- It fetches data for each symbol, then computes volatility.
- Finally, it synthesises an answer: “NABIL had a volatility of X, HRL had Y, so HRL was more volatile.”
- This demonstrates how LLMs can orchestrate multiple steps to answer complex questions.
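
For reference, the computation the agent orchestrates can be run directly on the sample data from `FetchDataTool`. This sketch uses only the standard library; `statistics.pstdev` is the population standard deviation, which matches the default behaviour of `np.std`. The helper names are hypothetical.

```python
from statistics import pstdev

def simple_returns(prices):
    """Day-over-day simple returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def volatility(prices):
    """Population standard deviation of simple returns."""
    return pstdev(simple_returns(prices))

# Same simulated sample as in FetchDataTool
sample = {
    "NABIL": [105.2, 106.1, 104.5, 107.3, 108.0],
    "HRL": [210.0, 212.5, 209.0, 215.0, 218.0],
}
vols = {symbol: volatility(prices) for symbol, prices in sample.items()}
most_volatile = max(vols, key=vols.get)
for symbol, vol in vols.items():
    print(f"{symbol}: {vol:.4f}")
print(f"More volatile: {most_volatile}")
```

Running the numbers outside the agent gives a ground truth against which the agent's final answer can be checked.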

---

## **94.8 Applications in the NEPSE System**

Let's explore concrete applications of LLMs within our NEPSE prediction system.

### **94.8.1 Generating Explanations for Predictions**
After a traditional model (e.g., XGBoost) makes a prediction, we can use an LLM to explain it in plain language. Provide the model's input features and output, and ask the LLM to generate a rationale.

```python
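from openai import OpenAI  # requires openai>=1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_prediction(features, prediction):
    prompt = f"""
    A machine learning model predicted the NEPSE closing price to be {prediction:.2f}.
    The input features were:
    - Lag 1 close: {features['lag_1']}
    - 20-day SMA: {features['sma_20']}
    - RSI: {features['rsi']}
    - Volume Z-score: {features['volume_z']}
    
    Explain why the model might have made this prediction in simple terms.
    """
    # Model name and temperature follow the earlier few-shot example
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content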
```

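This can build trust with traders. Treat the result as a plausible narrative rather than a faithful account of the model's internals: the LLM sees only the inputs and the output, not how the model combined them.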

### **94.8.2 Incorporating News Sentiment**
LLMs can analyse news headlines or articles to generate sentiment scores that become features in the model.

```python
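def sentiment_from_news(headline, client):
    # `client` is assumed to be an openai.OpenAI instance (openai>=1.0)
    prompt = f"Classify the sentiment of this headline as positive, negative, or neutral, and provide a score from -1 (very negative) to +1 (very positive). Reply as 'label, score'.\nHeadline: {headline}"
    response = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}], temperature=0.0
    )
    # Parse the reply; raises ValueError if the model ignores the requested format
    label, score = response.choices[0].message.content.split(",")
    return float(score)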
```
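
Model replies do not always follow the requested format, so the score is usually extracted defensively. A minimal sketch, assuming a free‑text reply that contains a label and a number; `parse_sentiment` is a hypothetical helper and the sample reply is written by hand.

```python
import re

LABELS = ("positive", "negative", "neutral")

def parse_sentiment(reply):
    """Extract (label, score) from a free-text reply; either may be None."""
    text = reply.lower()
    # Use the label that appears first in the reply
    found = [(text.find(name), name) for name in LABELS if name in text]
    label = min(found)[1] if found else None
    match = re.search(r"[-+]?(?:\d+\.?\d*|\.\d+)", text)
    score = None
    if match:
        # Clamp to the range requested in the prompt
        score = max(-1.0, min(1.0, float(match.group())))
    return label, score

# Hypothetical reply, written by hand for illustration
reply = "Sentiment: negative. Score: -0.6"
print(parse_sentiment(reply))
```

Returning `None` instead of raising lets a batch job skip unparseable headlines and log them for review.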

### **94.8.3 Answering Ad‑Hoc Questions**
Create a chatbot that allows traders to ask natural language questions about the market, using the LLM to query the database and generate answers.

### **94.8.4 Anomaly Explanation**
When the monitoring system detects an anomaly (e.g., a sudden price spike), an LLM can suggest possible causes based on recent news or historical patterns.

---

## **94.9 Limitations and Challenges**

While LLMs are powerful, they come with significant caveats.

### **94.9.1 Token Limits**
Even with extended context windows (for example, 128k tokens for GPT‑4 Turbo), long or high‑frequency series quickly use up the budget. For NEPSE, 10 years of daily data is about 3,650 calendar days, and fewer data points once non‑trading days are excluded. If each point is represented as "2024-06-01: 110.2", that's roughly 20 characters per point including a separator, totalling about 73k characters. A single daily series therefore fits, although digit‑heavy text tends to use more tokens per character than ordinary prose. With multiple stocks or higher‑frequency data, the window fills up quickly. Solutions include summarising (e.g., providing weekly aggregates) or using a retrieval system.
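
The size estimate and the weekly‑aggregate idea can be sketched in a few lines. The helper names are hypothetical, and the default `block=5` assumes five trading sessions per week.

```python
def estimate_chars(n_points, chars_per_point=20):
    """Rough prompt size in characters for a dated series."""
    return n_points * chars_per_point

def aggregate(prices, block=5):
    """Average consecutive blocks of `block` points (e.g. one trading week)."""
    return [
        sum(prices[i:i + block]) / len(prices[i:i + block])
        for i in range(0, len(prices), block)
    ]

print(estimate_chars(3650))  # the daily-series estimate from the text
daily = [100.0 + i for i in range(10)]  # placeholder values
print(aggregate(daily))
```

Aggregating into blocks of five cuts the prompt to roughly a fifth of its length, at the cost of hiding within‑week movement.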

### **94.9.2 Hallucination**
LLMs may generate plausible‑sounding but incorrect numbers or facts. Always verify outputs, especially if used for trading decisions.
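
One crude but useful check is to flag any number in a reply that does not appear in the data the model was given. The sketch below is an assumption‑laden illustration: `unsupported_numbers` is a hypothetical helper, the tolerance is arbitrary, and derived figures such as percentages will also be flagged and need separate handling.

```python
import re

def unsupported_numbers(reply, known_values, tol=0.05):
    """Return numbers quoted in `reply` that match no value in `known_values`."""
    quoted = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", reply)]
    return [q for q in quoted if not any(abs(q - v) <= tol for v in known_values)]

prices = [105.2, 106.1, 104.5, 107.3, 108.0]
# Hypothetical reply, written by hand for illustration
reply = "The price rose from 105.2 to 108.0 after peaking at 109.4."
print(unsupported_numbers(reply, prices))
```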

### **94.9.3 Cost**
API calls cost money. A single complex query might cost cents, but at scale it adds up. Fine‑tuning a smaller open‑source model (e.g., Llama) can reduce cost.
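
A back‑of‑the‑envelope estimate makes the scaling concrete. In this sketch the per‑token prices and token counts are placeholders chosen for illustration, not real rates; substitute your provider's current pricing.

```python
def estimate_cost(n_calls, prompt_tokens, completion_tokens,
                  price_in_per_1k, price_out_per_1k):
    """Total cost for `n_calls` requests with the given token counts."""
    per_call = (
        (prompt_tokens / 1000) * price_in_per_1k
        + (completion_tokens / 1000) * price_out_per_1k
    )
    return n_calls * per_call

# Placeholder prices for illustration only; check your provider's current rates
cost = estimate_cost(
    n_calls=1000, prompt_tokens=2000, completion_tokens=200,
    price_in_per_1k=0.01, price_out_per_1k=0.03,
)
print(f"Estimated cost: ${cost:.2f}")
```

Prompt tokens usually dominate in time‑series use, because the data is sent with every call.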

### **94.9.4 Latency**
LLM inference typically takes seconds, compared with milliseconds for traditional models, so it is not suitable for latency‑sensitive, real‑time trading decisions.

### **94.9.5 Numerical Reasoning**
Even with tools, the LLM must decide when to use them. This can fail in several ways: the model may skip the tool and do the arithmetic itself, pick the wrong tool, or pass it malformed input.

### **94.9.6 Data Privacy**
Sending proprietary financial data to an external API (OpenAI) may violate policies. Consider using a local model (e.g., through Ollama) for sensitive data.

---

## **94.10 Future Directions**

The intersection of LLMs and time‑series is an active research area. Expect to see:

- **Fine‑tuned time‑series LLMs**: Models specifically trained on numerical sequences and financial text.
- **Multimodal models**: Combining text, images (charts), and numbers.
- **Improved tool use**: LLMs that can reliably call forecasting models, databases, and visualisation libraries.
- **Lower latency**: Smaller, faster models for real‑time assistance.
- **Integration with forecasting pipelines**: LLMs as a natural language interface to complex forecasting systems.

For the NEPSE system, you might soon have an AI assistant that can:

- “Show me the forecast for NABIL for the next week.”
- “Why did the model change its prediction from yesterday?”
- “What would happen if interest rates rise by 1%?”

---

## **Chapter Summary**

In this chapter, we explored how large language models can complement traditional time‑series prediction systems. We covered:

- The strengths and weaknesses of LLMs for time‑series tasks.
- Prompt engineering techniques including zero‑shot, few‑shot, and chain‑of‑thought.
- Practical implementation of few‑shot forecasting using OpenAI's API.
- Tool use to overcome numerical limitations, with examples using LangChain.
- Building agents that can answer complex questions by combining data fetching, computation, and reasoning.
- Specific applications in the NEPSE system: explanation generation, sentiment analysis, and ad‑hoc query answering.
- The limitations of LLMs: token limits, hallucinations, cost, and latency.
- Future directions in this rapidly evolving field.

LLMs are not a replacement for traditional forecasting models, but they are a powerful interface layer that can make time‑series insights more accessible and explainable. As the technology matures, we can expect even tighter integration between language and numbers.

In the next chapter, we will explore **Automated Scientific Discovery**, where AI is used to uncover new knowledge from data, including causal relationships and physical laws.

---

**End of Chapter 94**