# Backtest the strategies

Use an LLM to predict a buy/sell/hold recommendation for each company on each given date. Steps needed:

1. Load the LLM: use the DeepSeek R1 Qwen model at 7B parameters first, then try the quantised models
2. Step through each date and each financial statement to get a result
3. Log the results to a file and save it to S3 (a log file saved to S3 is needed so the run can resume after a kernel crash)
4. Need a backtesting framework to apply the results
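
As a rough illustration of step 4, the sketch below turns logged decisions into signed positions and multiplies each by a forward return. The `POSITION` mapping, the `forward_return` field, and the function name `score_decisions` are assumptions made for illustration; this notebook does not specify how the backtesting framework should work.

```python
# Hypothetical mapping from the LLM decision to a position (an assumption).
POSITION = {"BUY": 1, "HOLD": 0, "SELL": -1}


def score_decisions(records):
    """Return the per-record strategy return: position * forward return.

    Each record is a dict with 'decision' and a hypothetical
    'forward_return' field (for example the next-quarter price change).
    Records with an unknown or missing decision are treated as HOLD.
    """
    results = []
    for rec in records:
        position = POSITION.get(str(rec.get("decision", "")).upper(), 0)
        results.append(position * rec.get("forward_return", 0.0))
    return results


# Made-up inputs, purely to show the call shape.
toy = [
    {"decision": "BUY", "forward_return": 0.05},
    {"decision": "SELL", "forward_return": 0.10},
    {"decision": "HOLD", "forward_return": 0.02},
]
print(score_decisions(toy))
```

Treating unparseable decisions as HOLD keeps failed responses from contributing to the result, which is one possible choice rather than a requirement.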


## Load libraries needed

In [1]:
import json
import boto3
from s3fs import S3FileSystem
import os
import datetime

import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from huggingface_hub import login
import torch
from accelerate import Accelerator

import pandas as pd
from IPython.display import Markdown, display
from ipywidgets import IntProgress, Label, HBox

from helper import get_s3_folder
import s3Helpers
import company_data
from s3Helpers import S3ModelHelper, Logger
from prompts import SYSTEM_PROMPTS

In [2]:
import importlib
importlib.reload(company_data)
importlib.reload(s3Helpers)

<module 's3Helpers' from '/project/s3Helpers.py'>

## Load the LLM

Models to test:
- Qwen (Qwen/Qwen2.5-7B-Instruct)
- Llama (meta-llama/Llama-3.2-3B-Instruct)
- DeepSeek (deepseek-ai/DeepSeek-R1-Distill-Qwen-14B)

In [3]:
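# Log into Hugging Face

with open('pass.txt') as p:
    hf_login = p.read()

# Take the text after the first '=' on the first line.
# splitlines() avoids dropping the last character when the file has no trailing newline.
hf_login = hf_login.splitlines()[0].split('=', 1)[-1].strip()
login(hf_login, add_to_git_credential=False)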

In [4]:
# Set up quantization
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

In [53]:
accelerator = Accelerator()

In [4]:
# Flags: download from Hugging Face again, or use the model stored in S3
USE_HF = False
USE_QUANTIZATION = False

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
model_id_s3 = 'deepseek'

if USE_HF:
    if USE_QUANTIZATION:
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', quantization_config=quant_config)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.bfloat16)
else:
    model_helper = s3Helpers.S3ModelHelper(s3_sub_folder='tmp/fs')
    if USE_QUANTIZATION:
        model = model_helper.load_model(model_id_s3, quant_config)
    else:
        model = model_helper.load_model(model_id_s3)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the pipeline from the already-loaded model so the weights are loaded only once
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

if not USE_HF:
    model_helper.clear_folder(model_id_s3)

Device set to use cuda:0


## Load Financial PIT dataset

In [5]:
## Load from S3 using the helper file
filename = 'data_quarterly_pit_indu.json'
sec_helper = company_data.SecurityData('tmp/fs',filename)
all_data = sec_helper.get_all_data()

In [6]:
# Use while developing: reload the module and reuse the data already loaded
importlib.reload(company_data)
sec_helper = company_data.SecurityData('tmp/fs',filename, all_data)

In [7]:
sec_helper.total_securities_in_backtest()

896

In [8]:
sec_helper.get_security_statement('2020-04-24','AXP UN Equity','px') #AXP UN Equity2020-04-24

| Date | Price |
|---|---|
| 2019-04-24 | 114.02 |
| 2019-05-24 | 119.51 |
| 2019-06-24 | 124.14 |
| 2019-07-24 | 127.95 |
| 2019-08-24 | 117.76 |
| 2019-09-24 | 118.17 |
| 2019-10-24 | 116.41 |
| 2019-11-24 | 119.06 |
| 2019-12-24 | 124.74 |
| 2020-01-24 | 135.11 |


In [9]:
system_prompt = SYSTEM_PROMPTS['BASE']['prompt']
system_prompt

"You are a financial analyst and must make a buy, sell or hold decision on a company based only on the provided datasets.         Compute common financial ratios and then determine the buy or sell decision. Explain your reasons in less than 250 words. Provide a         confidence score for how confident you are of the decision. If you are not confident then lower the confidence score.         Your answer must be in a JSON object. Provide your answer in JSON format like the two examples below:         {'decision': BUY, 'confidence score': 80, 'reason': 'Gross profit and EPS have both increased over time'}         {'decision': SELL, 'confidence score': 90, 'reason': 'Price has declined and EPS is falling'}"

In [10]:
prompt = sec_helper.get_prompt('2020-04-24','AXP UN Equity', system_prompt)

In [11]:
tokens = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True)
len(tokens)

4218

## Run an example in LLM

The example runs into an out-of-memory problem. Potential fixes:
1. Reduce the size of the model (quantize)
2. Explore multi-GPU inference
3. Reduce the number of prompt tokens

https://saturncloud.io/blog/how-to-solve-cuda-out-of-memory-error-in-pytorch/

https://huggingface.co/docs/accelerate/usage_guides/distributed_inference

https://medium.com/@geronimo7/llms-multi-gpu-inference-with-accelerate-5a8333e4c5db

Splitting a single prompt across multiple GPUs to calculate the result is a problem; tensor parallelism is one approach: https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many#tensor-parallelism

`nvidia-smi` shows the GPUs available on the system.
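
To see why quantization is the first fix listed, it can help to estimate how much memory the weights alone need. The sketch below is a back-of-the-envelope calculation, not a measurement: parameter counts are the nominal sizes from the model names, and activations, the KV cache, and quantization overhead are all ignored. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the model weights only, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter counts taken from the model names (an approximation).
for name, n_params in [("14B", 14e9), ("7B", 7e9), ("3B", 3e9)]:
    bf16 = weight_memory_gb(n_params, 16)  # bfloat16: 16 bits per weight
    nf4 = weight_memory_gb(n_params, 4)    # 4-bit quantization
    print(f"{name}: bf16 ~{bf16:.1f} GB, 4-bit ~{nf4:.1f} GB")
```

By this estimate a 14B model in bfloat16 needs roughly 28 GB for weights alone, which is more than the 24 GB on a single A10G, while the 4-bit version needs roughly 7 GB.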

### Run 1
Model used: Llama 3.2 3B Instruct (meta-llama/Llama-3.2-3B-Instruct), with no quantisation.
The run took 5.5 hours on one A10G GPU over 896 security/date combinations. The results are stored in log files in the project for further analysis.

In [56]:
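def run_model(prompt, tokenizer, model):
    """Generate a response for a chat-style prompt and return only the newly generated text."""
    # Render the chat messages into a single prompt string
    text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
    # Move the inputs to the same device as the model
    model_inputs = tokenizer([text], return_tensors='pt').to(model.device)
    generated_ids = model.generate(**model_inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=1000)
    # Drop the prompt tokens so only the generated tokens are decoded
    parsed_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(parsed_ids, skip_special_tokens=True)[0]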



In [13]:
# Time the execution
start_time = datetime.datetime.now()

# Run the model
response = run_model(prompt, tokenizer, model)

#Print the length of time to run
end_time = datetime.datetime.now()
print("Time to execute: ", end_time - start_time)

Time to execute:  0:01:02.676151


In [14]:
display(Markdown(response))

Okay, I need to make a buy, sell, or hold decision based on the provided datasets. Let's start by looking at the income statement. The revenue has fluctuated over the years but hasn't shown a clear upward trend. It went from 1.1026e+10 to 1.1308e+10, then dropped and fluctuated. That's a bit concerning.

Looking at the operating income or losses, I see it increased from t-5 to t-4 but then decreased in t-3 and t. It's 4.52e8 in t, which is lower than the previous year. That might indicate some issues with profitability.

The EPS has been fluctuating as well. In t, it's 0.41, which is lower than the previous years. That's a red flag because lower EPS can mean the company is earning less per share, which might not be attractive to investors.

Moving to the balance sheet, the cash and cash equivalents have been fluctuating. In t, it's 3.6095e10, which is lower than t-1. While they have significant cash, the decrease might indicate they're spending more or not generating as much cash.

Looking at liabilities, the total noncurrent liabilities have been increasing. From t-5 to t, they've gone up from 8.0997e10 to 7.5618e10, but wait, actually, in t, it's 7.5618e10, which is lower than t-1's 8.2783e10. Hmm, maybe that's not too bad. But overall, the company has a high amount of debt, which could be risky.

The price history shows some volatility. From 2019 to 2020, the price peaked at 135.11 but then dropped to 84.05 and 83.17. That's a significant drop, which might indicate investor loss of confidence or market issues.

Considering all these factors: fluctuating revenue, decreasing operating income, lower EPS, high debt, and volatile stock price. These are all negative signs. The company isn't performing as strongly as it could be, and the stock has shown recent weakness. I'm leaning towards a sell decision because the fundamentals aren't strong, and the stock price might continue to drop.

I'm about 75% confident in this decision because while there are some positive aspects like cash on hand, the negatives like decreasing profits and EPS are more concerning. Plus, the stock price has shown significant drops recently, which could be a sign of further decline.
</think>

```json
{
  "decision": "SELL",
  "confidence score": 75,
  "reason": "Revenue and operating income have fluctuated, EPS has decreased, and the stock price has shown recent weakness and volatility."
}
```

In [18]:
def format_json(llm_output):
    form = llm_output.replace('\n','')
    # Find the start and end of the JSON input
    soj = form.find('```json')
    eoj = form.find('}```')
    # Pull out the additional context
    additional = form[:soj]
    additional += form[eoj + 4:]
    json_obj = json.loads(form[soj + 7:eoj + 1])
    json_obj['AdditionalContext'] = additional
    return json_obj
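
`format_json` relies on the response containing a fenced json block in exactly the expected shape. A more tolerant variant, sketched below, tries the fenced block first and falls back to the last `{...}` span in the text. The name `extract_json` and the fallback rule are assumptions for illustration, not part of the project code, and the fallback only handles flat (non-nested) objects.

```python
import json
import re

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3
FENCED_JSON = re.compile(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def extract_json(llm_output):
    """Return (parsed_dict, remaining_text) from an LLM response.

    Tries a fenced json block first, then falls back to the last
    {...} span in the text. Raises ValueError if nothing parses.
    """
    fenced = FENCED_JSON.search(llm_output)
    if fenced:
        candidate, span = fenced.group(1), fenced.span()
    else:
        start, end = llm_output.rfind("{"), llm_output.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in response")
        candidate, span = llm_output[start:end + 1], (start, end + 1)
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ValueError(f"could not parse JSON: {exc}") from exc
    # Everything outside the JSON span is kept as additional context.
    remaining = (llm_output[:span[0]] + llm_output[span[1]:]).strip()
    return parsed, remaining


sample = "Some reasoning.\n</think>\n\n" + FENCE + 'json\n{"decision": "SELL"}\n' + FENCE
print(extract_json(sample))
```

Unlike `format_json`, this version leaves newlines in the remaining text untouched, so sentences in the additional context are not run together.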

In [19]:
response

'Okay, I need to make a buy, sell, or hold decision based on the provided datasets. Let\'s start by looking at the income statement. The revenue has fluctuated over the years but hasn\'t shown a clear upward trend. It went from 1.1026e+10 to 1.1308e+10, then dropped and fluctuated. That\'s a bit concerning.\n\nLooking at the operating income or losses, I see it increased from t-5 to t-4 but then decreased in t-3 and t. It\'s 4.52e8 in t, which is lower than the previous year. That might indicate some issues with profitability.\n\nThe EPS has been fluctuating as well. In t, it\'s 0.41, which is lower than the previous years. That\'s a red flag because lower EPS can mean the company is earning less per share, which might not be attractive to investors.\n\nMoving to the balance sheet, the cash and cash equivalents have been fluctuating. In t, it\'s 3.6095e10, which is lower than t-1. While they have significant cash, the decrease might indicate they\'re spending more or not generating as 

In [20]:
format_json(response)

{'decision': 'SELL',
 'confidence score': 75,
 'reason': 'Revenue and operating income have fluctuated, EPS has decreased, and the stock price has shown recent weakness and volatility.',
 'AdditionalContext': "Okay, I need to make a buy, sell, or hold decision based on the provided datasets. Let's start by looking at the income statement. The revenue has fluctuated over the years but hasn't shown a clear upward trend. It went from 1.1026e+10 to 1.1308e+10, then dropped and fluctuated. That's a bit concerning.Looking at the operating income or losses, I see it increased from t-5 to t-4 but then decreased in t-3 and t. It's 4.52e8 in t, which is lower than the previous year. That might indicate some issues with profitability.The EPS has been fluctuating as well. In t, it's 0.41, which is lower than the previous years. That's a red flag because lower EPS can mean the company is earning less per share, which might not be attractive to investors.Moving to the balance sheet, the cash and cas

## Run the backtest and generate all responses

In [53]:
importlib.reload(s3Helpers)
importlib.reload(company_data)

<module 'company_data' from '/project/company_data.py'>

In [57]:
logger = s3Helpers.Logger('tmp/fs')
def run_backtest(company_info, tokenizer, model, logger, log_at=50, start_count=0):
    # start the timer
    start_time = datetime.datetime.now()
    # get the dates
    dates = company_info.get_dates()
    # set the current date year
    current_year = dates[0][:4]

    # set the array
    year_log = []
    
    # set up the display
    max_count = company_info.total_securities_in_backtest()
    f = IntProgress(min=0, max=max_count) # instantiate the bar
    l = Label(value=str(f.value))
    display(HBox([f,l]))
    
    count = 0

    # run the backtest 
    for date in dates:
        
        securities = company_info.get_securities_reporting_on_date(date)

        for security in securities:
            
            # allow model to start running from a pre-set point
            if count >= start_count:
                
                # Save to S3 every `log_at` iterations so there is a checkpoint to resume from
                if count % log_at == 0:
                    # save the current batch to S3, then start a new batch
                    logger.log(year_log, current_year + str(count) + '.json')
                    # reset the stats
                    year_log = []
                    current_year = date[:4]


                prompt = company_info.get_prompt(date, security, system_prompt)
                response = run_model(prompt, tokenizer, model)
                try:
                    formatted_response = format_json(response)
                    formatted_response['security'] = security
                    formatted_response['date'] = date
                    year_log.append(formatted_response)
                except Exception:
                    print("error with " + security + date)
                    error_json = {'security': security, 'date': date, 'response': response}
                    year_log.append(error_json)
                    
            # Advance the progress bar and the counter
            f.value += 1
            count += 1
            l.value = str(count) + "/" + str(max_count)
    
    # Log the last values
    logger.log(year_log, current_year + str(count) + '.json')
    # end the timer
    end_time = datetime.datetime.now()
    print("Completed! Time to execute: ", end_time - start_time)

In [140]:
run_backtest(sec_helper, tokenizer, model, logger)


error with MMM UN Equity2025-01-21
error with PG UN Equity2025-01-22
error with TRV UN Equity2025-01-22
error with AXP UN Equity2025-01-24
error with RTX UN Equity2025-01-28
error with BA UN Equity2025-01-28
error with V UN Equity2025-01-30
error with AAPL UQ Equity2025-01-30
error with CAT UN Equity2025-01-30
error with PFE UN Equity2025-02-04
Saved 2020896.json
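
Because each log file name is the year followed by the iteration count, the count can be read back from the names to choose a `start_count` after a crash. The sketch below assumes the names always follow that year-plus-count pattern; the helper name `last_logged_count` is hypothetical and the paths are only examples of the shape returned by `logger.get_list_of_logs()`.

```python
import re

# A 4-digit year followed by the iteration count, e.g. '202050.json'.
LOG_NAME = re.compile(r"(\d{4})(\d+)\.json")


def last_logged_count(log_paths):
    """Return the highest iteration count found in the log file names.

    Names that do not follow the year+count pattern (for example
    'test.json') are ignored. Returns 0 when no log matches.
    """
    counts = []
    for path in log_paths:
        match = LOG_NAME.fullmatch(path.rsplit("/", 1)[-1])
        if match:
            counts.append(int(match.group(2)))
    return max(counts, default=0)


paths = [
    "bclarke16/tmp/fs/logs/20200.json",
    "bclarke16/tmp/fs/logs/202010.json",
    "bclarke16/tmp/fs/logs/test.json",
]
print(last_logged_count(paths))
```

The result could then be passed as `start_count` to `run_backtest`. It is worth checking first how `Logger.log` treats an existing file name, since the loop writes a batch whenever the count is a multiple of `log_at`, including at the resume point.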


### Concatenate all of the results

In [141]:
log_list = logger.get_list_of_logs()

In [129]:
log_list[0][log_list[0].find('/logs/') + 6:]

'20200.json'

In [142]:
def concat_all_logs():
    log_list = logger.get_list_of_logs()
    logs = []
    for logfile in log_list:
        logs += logger.get_log(logfile[logfile.find('/logs/') + 6:])
    return logs

In [143]:
logs = concat_all_logs()

In [144]:
len(logs)

909
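
The total here (909) is higher than the 896 combinations reported by `total_securities_in_backtest()`, and the notebook does not explain the difference. One way to investigate is to count repeated (security, date) pairs, as in the sketch below. The helper name `find_duplicates` and the toy records are made up for illustration; only the `security` and `date` keys come from the log format above.

```python
from collections import Counter


def find_duplicates(records):
    """Return {(security, date): n} for pairs that appear more than once.

    Records without both 'security' and 'date' (for example a test
    entry) are skipped.
    """
    keys = [
        (rec["security"], rec["date"])
        for rec in records
        if "security" in rec and "date" in rec
    ]
    return {key: n for key, n in Counter(keys).items() if n > 1}


# Made-up records, purely to show the call shape.
toy_logs = [
    {"security": "AAA", "date": "2020-01-01", "decision": "BUY"},
    {"security": "AAA", "date": "2020-01-01", "decision": "SELL"},
    {"security": "BBB", "date": "2020-01-01", "decision": "HOLD"},
    {"test": "test"},
]
print(find_duplicates(toy_logs))
```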

In [57]:
test_json = [{'test':'test'}]

In [58]:
logger.log(test_json, 'test.json')

In [95]:
logger.get_list_of_logs()

['bclarke16/tmp/fs/logs/20200.json',
 'bclarke16/tmp/fs/logs/202010.json',
 'bclarke16/tmp/fs/logs/test.json']

In [97]:
d = logger.get_log('202010.json')

In [104]:
len(d)

10

In [28]:
df = sec_helper.get_security_statement('2020-01-31','AON UN Equity','is')

In [145]:
#with open('Data/base2.json', 'w') as file:
#    json.dump(logs, file)

In [None]:
# start_time = datetime.datetime.now()
# #formatted_chat = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
# outputs = pipeline(
#     prompt,
#     max_new_tokens=1000,
# )
# end_time = datetime.datetime.now()
# print("Time to execute: ", end_time - start_time)

# test_output = outputs[0]['generated_text'][-1]
# display(Markdown(test_output['content']))