# Run Inference 

This notebook runs inference on open-source models using the multi-GPU process implemented in the `modelinference` module. It must be run in the `llm-base` environment. The results of each backtest are stored in the Data folder, under a folder for each model.

In [1]:
import modelinference
import prompts
import importlib
import datetime
import torch
from huggingface_hub import login
import utils.model_helper as mh


from accelerate import Accelerator, notebook_launcher

In [2]:
importlib.reload(modelinference)

<module 'modelinference' from '/project/modelinference.py'>

### Set up the environment
Log into Hugging Face and check the number of GPUs available.

In [3]:
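# Log into Hugging Face with the token stored in pass.txt
# (the first line is expected to look like NAME=<token>)
with open('pass.txt') as p:
    first_line = p.readline().strip()

# Everything after the first '=' is the token; this also works when the file has no trailing newline
hf_login = first_line.split('=', 1)[1]
login(hf_login, add_to_git_credential=False)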

In [4]:
torch.__version__

'2.1.2.post300'

In [5]:
# Check the number of GPUs available
torch.cuda.device_count()

4

### Create the inference object
Specify the run configuration: the model to use, the system prompt, and the dataset to run on.

In [6]:

# Each assignment below overwrites the previous one; only the last run_config is used
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    'model_s3_loc': 'deepseek14Q',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_annual_pit_spx',
    'data_location': 'data_annual_pit_spx.json'
}

run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-7B-Instruct',
    'model_s3_loc': 'qwen',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 1
run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 2
run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 3
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    'model_s3_loc': 'deepseek14Q',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 4
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    'model_s3_loc': 'deepseek14Q',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 5 - failed
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-32B',
    'model_s3_loc': 'deepseek32',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 6 - failed
run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-7B-Instruct',
    'model_s3_loc': 'qwen',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoTDetailed'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}



run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
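Because each `run_config` above replaces the previous one, switching runs means editing or reordering the cell. A minimal sketch of an alternative, assuming nothing beyond the keys already used in this notebook: keep every run in a dictionary keyed by run number and build the configs with a helper. `make_run_config`, `run_name_for`, `prompts_path_for`, `RUNS` and the `SYSTEM_PROMPTS` stand-in are hypothetical names, not part of `modelinference` or `prompts`; in the notebook you would pass `prompts.SYSTEM_PROMPTS[...]` instead of the placeholders. The prompts path follows the `Data/qwen_data_quarterly_pit_indu/prompts.json` location used later in this notebook.

```python
# Stand-in for prompts.SYSTEM_PROMPTS so this sketch runs on its own (hypothetical placeholders)
SYSTEM_PROMPTS = {'BASE': '<base prompt>', 'CoT': '<CoT prompt>', 'CoTDetailed': '<detailed CoT prompt>'}


def make_run_config(model_hf_id, model_s3_loc, system_prompt, dataset,
                    multi_gpu=True, model_quant=None, model_reload=False):
    """Hypothetical helper returning a dict with the same keys as run_config above."""
    return {
        'model_hf_id': model_hf_id,
        'model_s3_loc': model_s3_loc,
        'model_reload': model_reload,
        'model_quant': model_quant,
        'system_prompt': system_prompt,
        'multi-gpu': multi_gpu,
        'dataset': dataset,
        # Every config above stores its data as '<dataset>.json'
        'data_location': f'{dataset}.json',
    }


def run_name_for(config):
    """Same naming rule as run_name above: '<model_s3_loc>_<dataset>'."""
    return f"{config['model_s3_loc']}_{config['dataset']}"


def prompts_path_for(config):
    """Where the saved prompts for a run are expected under the Data folder."""
    return f"Data/{run_name_for(config)}/prompts.json"


INDU = 'data_quarterly_pit_indu'
RUNS = {
    1: make_run_config('meta-llama/Llama-3.2-3B-Instruct', 'llama', SYSTEM_PROMPTS['BASE'], INDU),
    2: make_run_config('meta-llama/Llama-3.2-3B-Instruct', 'llama', SYSTEM_PROMPTS['CoT'], INDU),
    3: make_run_config('deepseek-ai/DeepSeek-R1-Distill-Qwen-14B', 'deepseek14Q', SYSTEM_PROMPTS['BASE'], INDU),
    4: make_run_config('deepseek-ai/DeepSeek-R1-Distill-Qwen-14B', 'deepseek14Q', SYSTEM_PROMPTS['CoT'], INDU),
    5: make_run_config('deepseek-ai/DeepSeek-R1-Distill-Qwen-32B', 'deepseek32', SYSTEM_PROMPTS['CoT'], INDU),
    6: make_run_config('Qwen/Qwen2.5-7B-Instruct', 'qwen', SYSTEM_PROMPTS['CoTDetailed'], INDU),
}

# Pick a run explicitly instead of relying on assignment order
selected = RUNS[6]
selected_name = run_name_for(selected)
```

One thing this makes visible: runs 1 and 2 (same model and dataset, different system prompts) get the same run name, so if `InferenceRun` writes to `Data/<run_name>`, the second run would overwrite the first run's prompts file. Adding the prompt type to the name would keep them apart.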

In [7]:
ir = modelinference.InferenceRun(run_name, run_config)

In [8]:
# Create the prompts and save them to the Data folder
# (stored under a new name so the `prompts` module imported above is not shadowed)
all_prompts = ir.create_all_prompts(True)

In [9]:
# Run the multi-gpu model with notebook_launcher
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())
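How `ir.run_multi_gpu` divides the 896 prompts between the four processes is not shown in this notebook. As a hedged sketch, assuming each process runs a contiguous slice of the prompt list and the per-process results are gathered back in rank order (the "Gathered results..." lines in the output suggest a gather step), the bookkeeping could look like the following. Accelerate provides `PartialState.split_between_processes` and `accelerate.utils.gather_object` for this purpose; `shard_bounds`, `shard` and `gather` below are hypothetical pure-Python helpers that only illustrate the index arithmetic.

```python
def shard_bounds(num_items, num_processes, rank):
    """Return the [start, end) range of items assigned to process `rank`.

    Items are split into contiguous chunks; when the split is uneven, the
    first `num_items % num_processes` ranks each receive one extra item.
    """
    if not 0 <= rank < num_processes:
        raise ValueError(f'rank must be in [0, {num_processes})')
    base, remainder = divmod(num_items, num_processes)
    start = rank * base + min(rank, remainder)
    end = start + base + (1 if rank < remainder else 0)
    return start, end


def shard(items, num_processes, rank):
    """Slice out the items that process `rank` should run."""
    start, end = shard_bounds(len(items), num_processes, rank)
    return items[start:end]


def gather(per_process_results):
    """Concatenate per-process result lists back into one list, in rank order."""
    return [result for results in per_process_results for result in results]
```

With 896 prompts on 4 GPUs each process gets 224 prompts: `[shard_bounds(896, 4, r) for r in range(4)]` gives `[(0, 224), (224, 448), (448, 672), (672, 896)]`.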

Launching training on 4 GPUs.
qwen
qwen
qwen
qwen


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Waiting...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

  0%|          | 0/896 [00:00<?, ?it/s]

Memory footprint: 15.2 GB
Waiting...
Waiting...
Waiting...
starting backtest...
starting backtest...
starting backtest...
starting backtest...


  9%|▉         | 84/896 [10:57<1:58:12,  8.74s/it]

Finished run...


 11%|█         | 96/896 [13:16<2:22:25, 10.68s/it]

Finished run...
Finished run...
Finished run...
Gathered results...Gathered results...Gathered results...Gathered results...



Finished run in 0:16:58.403019
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-03-14 16:13:20.736708.json
Run Completed!


 11%|█         | 96/896 [16:58<2:21:29, 10.61s/it]


In [12]:
import json
with open('Data/qwen_data_quarterly_pit_indu/prompts.json', encoding='utf-8') as f:
    data = json.load(f)

In [13]:
len(data)

896

In [17]:
data[870]

{'security': 'UNH UN Equity',
 'date': '2025-01-16',
 'prompt': [{'role': 'system',
   'content': "You are a financial analyst and must make a buy, sell, hold decision on a company based only on the provided financial statement. Your goal is to buy stocks you think will increase over the next financial period and sell stocks you think will decline. Think step-by-step through the financial statement analysis workflow. State the ratios you are using and then use your analysis along with the current stock price trends to determine if the stock should be bought or sold. Provide your answer in JSON format like the two examples: {'decision': BUY, 'confidence score': 80, 'reason': 'Gross profit and EPS have both increased over time'}, {'decision': SELL, 'confidence score': 90, 'reason': 'Price has declined and EPS is falling'} Company financial statements:  "},
  {'role': 'user',
   'content': 'Income Statement:                                                        t           t-1           
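Going by record 870 above, each entry in `prompts.json` has a `security`, a `date`, and a chat-style `prompt`: a system message followed by a user message carrying the financial statements. Assuming every record follows that shape, a quick sanity check over `data` could look like this. `summarize_prompt_records` is a hypothetical helper, not part of `modelinference`.

```python
from collections import Counter


def summarize_prompt_records(records):
    """Summarize prompt records shaped like the entries in prompts.json.

    Each record is assumed to be a dict with 'security', 'date' (an ISO
    YYYY-MM-DD string, so string order is date order) and 'prompt' (a list
    of chat messages with 'role' and 'content').
    """
    per_security = Counter(r['security'] for r in records)
    dates = sorted(r['date'] for r in records)
    # Indices of records whose chat prompt is not exactly [system, user]
    malformed = [
        i for i, r in enumerate(records)
        if [m['role'] for m in r['prompt']] != ['system', 'user']
    ]
    return {
        'n_records': len(records),
        'n_securities': len(per_security),
        'first_date': dates[0] if dates else None,
        'last_date': dates[-1] if dates else None,
        'malformed': malformed,
    }
```

In the notebook this would be called as `summarize_prompt_records(data)`; a non-empty `malformed` list points at records worth inspecting before a long multi-GPU run.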

In [11]:
ir = modelinference.InferenceRun(run_name, run_config)

In [12]:
p1 = ir.create_all_prompts(True)

Requesting all datasets...
Saving data...


In [11]:
len(p1)

896

In [12]:
model = ir.load_model_from_storage(ir.model_s3_loc)

deepseek32


Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Qwen2Config {
  "_attn_implementation_autoset": true,
  "_name_or_path": "deepseek32",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 131072,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,

In [23]:
from transformers import AutoTokenizer

In [24]:
tokenizer = AutoTokenizer.from_pretrained(ir.model_hf_id)

In [28]:
output_result = ir.run_model(p1[0]['prompt'],tokenizer,model)

running model...


In [8]:
output_result = {"date": "2020-02-06", "security": "MMM UN Equity", "response": "Here is the JSON response:\n\n```json\n{\n  \"decision\": \"BUY\",\n  \"confidence score\": 80,\n  \"reason\": \"Gross profit and EPS have increased over time, indicating a strong financial performance\"\n}\n```\n\nI have computed the following financial ratios:\n\n1. Gross Margin: \n   - 2020: 3.786000e+09 / 8.111000e+09 = 0.466\n   - 2019: 3.803000e+09 / 7.991000e+09 = 0.477\n   - 2018: 3.858000e+09 / 8.171000e+09 = 0.471\n   - 2017: 3.553000e+09 / 7.863000e+09 = 0.453\n   - 2016: 3.885000e+09 / 7.945000e+09 = 0.492\n\nThe gross margin has been increasing over time, indicating a strong financial performance.\n\n2. EPS:\n   - 2020: 9.690000e+08 / 1.012600e+10 = 0.953\n   - 2019: 1.583000e+09 / 1.076400e+10 = 1.465\n   - 2018: 1.127000e+09 / 1.014200e+10 = 1.117\n   - 2017: 1.347000e+09 / 9.848000e+09 = 0.137\n   - 2016: 1.543000e+09 / 9.848000e+09 = 0.156\n\nThe EPS has been increasing over time, indicating a strong financial performance.\n\n3. Current Ratio:\n   - 2020: 2.441000e+09 / 9.222000e+09 = 0.264\n   - 2019: 1.588000e+09 / 7.821000e+09 = 0.201\n   - 2018: 1.131000e+09 / 7.265000e+09 = 0.157\n   - 2017: 8.930000e+08 / 7.244000e+09 = 0.123\n   - 2016: 8.910000e+08 / 5.020000e+09 = 0.177\n\nThe current ratio has been increasing over time, indicating a strong financial performance.\n\nBased on these financial ratios, the company has been performing well financially and has a strong track record of increasing gross profit and EPS over time. Therefore, I recommend a BUY decision with a confidence score of 80."}

In [10]:
output_result['response']

'Here is the JSON response:\n\n```json\n{\n  "decision": "BUY",\n  "confidence score": 80,\n  "reason": "Gross profit and EPS have increased over time, indicating a strong financial performance"\n}\n```\n\nI have computed the following financial ratios:\n\n1. Gross Margin: \n   - 2020: 3.786000e+09 / 8.111000e+09 = 0.466\n   - 2019: 3.803000e+09 / 7.991000e+09 = 0.477\n   - 2018: 3.858000e+09 / 8.171000e+09 = 0.471\n   - 2017: 3.553000e+09 / 7.863000e+09 = 0.453\n   - 2016: 3.885000e+09 / 7.945000e+09 = 0.492\n\nThe gross margin has been increasing over time, indicating a strong financial performance.\n\n2. EPS:\n   - 2020: 9.690000e+08 / 1.012600e+10 = 0.953\n   - 2019: 1.583000e+09 / 1.076400e+10 = 1.465\n   - 2018: 1.127000e+09 / 1.014200e+10 = 1.117\n   - 2017: 1.347000e+09 / 9.848000e+09 = 0.137\n   - 2016: 1.543000e+09 / 9.848000e+09 = 0.156\n\nThe EPS has been increasing over time, indicating a strong financial performance.\n\n3. Current Ratio:\n   - 2020: 2.441000e+09 / 9.22200
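The ratios in this response can be checked directly, and some of them do not match their own inputs: for the 2020 EPS line, 9.69e8 / 1.0126e10 ≈ 0.0957, not the 0.953 the model states. A minimal sketch of an automatic check, assuming the model writes its ratios as `- YEAR: numerator / denominator = value` lines as it does here; `check_ratio_arithmetic` is a hypothetical helper and the 1% relative tolerance is an arbitrary choice.

```python
import re

# One number, optionally with a decimal part and an exponent, e.g. 9.690000e+08
NUM = r'(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
# Matches lines like "- 2020: 9.690000e+08 / 1.012600e+10 = 0.953"
RATIO_LINE = re.compile(r'-\s*(\d{4}):\s*' + NUM + r'\s*/\s*' + NUM + r'\s*=\s*' + NUM)


def check_ratio_arithmetic(response, rel_tol=0.01):
    """Recompute every 'numerator / denominator = value' line in a response.

    Returns (year, claimed, recomputed) for each line whose claimed value
    differs from the recomputed quotient by more than rel_tol (relative).
    """
    mismatches = []
    for year, num, den, claimed in RATIO_LINE.findall(response):
        recomputed = float(num) / float(den)
        if abs(float(claimed) - recomputed) > rel_tol * abs(recomputed):
            mismatches.append((int(year), float(claimed), round(recomputed, 4)))
    return mismatches
```

Running `check_ratio_arithmetic(output_result['response'])` lists the lines where the model's arithmetic disagrees with its own inputs, which is a cheap signal to log alongside the BUY/SELL decision.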

In [11]:
import json

In [20]:
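def format_json(llm_output):
    """Split an LLM response into its JSON decision and the surrounding text."""
    # Remove line breaks so the fenced JSON block can be located in one string
    form = llm_output.replace('\n', '')
    # Find the start and end of the fenced JSON block
    soj = form.find('```json')
    if soj == -1:
        # No JSON block at all: return the raw output unchanged
        return llm_output
    eoj = form.find('}```', soj)
    if eoj == -1:
        # Closing fence missing (e.g. truncated output): close the object on `form` itself
        eoj = len(form)
        form = form + '}```'
    # Everything outside the JSON block is kept as additional context
    additional = form[:soj] + form[eoj + 4:]
    try:
        json_obj = json.loads(form[soj + 7:eoj + 1])
    except json.JSONDecodeError:
        # Malformed JSON: fall back to the raw model output
        return llm_output
    json_obj['AdditionalContext'] = additional
    return json_obj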

In [21]:
format_json(output_result['response'])

{  "decision": "BUY",  "confidence score": 80,  "reason": "Gross profit and EPS have increased over time, indicating a strong financial performance"}


{'decision': 'BUY',
 'confidence score': 80,
 'reason': 'Gross profit and EPS have increased over time, indicating a strong financial performance',
 'AdditionalContext': 'Here is the JSON response:I have computed the following financial ratios:1. Gross Margin:    - 2020: 3.786000e+09 / 8.111000e+09 = 0.466   - 2019: 3.803000e+09 / 7.991000e+09 = 0.477   - 2018: 3.858000e+09 / 8.171000e+09 = 0.471   - 2017: 3.553000e+09 / 7.863000e+09 = 0.453   - 2016: 3.885000e+09 / 7.945000e+09 = 0.492The gross margin has been increasing over time, indicating a strong financial performance.2. EPS:   - 2020: 9.690000e+08 / 1.012600e+10 = 0.953   - 2019: 1.583000e+09 / 1.076400e+10 = 1.465   - 2018: 1.127000e+09 / 1.014200e+10 = 1.117   - 2017: 1.347000e+09 / 9.848000e+09 = 0.137   - 2016: 1.543000e+09 / 9.848000e+09 = 0.156The EPS has been increasing over time, indicating a strong financial performance.3. Current Ratio:   - 2020: 2.441000e+09 / 9.222000e+09 = 0.264   - 2019: 1.588000e+09 / 7.821000e+09
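Because `format_json` deletes every line break, the surrounding text is run together ("response:I have computed" above), and the `}```` search fails if anything but line breaks sits between the closing brace and the fence. A hedged, regex-based alternative is sketched below; `extract_decision` is a hypothetical helper that returns a `(decision, context)` tuple instead of adding an `AdditionalContext` key, and keeps the original line breaks. Note that the example answers in the system prompt (`{'decision': BUY, ...}`) use single quotes and a bare `BUY`, which is not valid JSON, so a model that copies them literally will fail `json.loads` with either parser.

```python
import json
import re

# A fenced ```json block, tolerant of whitespace and newlines around the object
FENCED_JSON = re.compile(r'```json\s*(\{.*?\})\s*```', re.DOTALL)


def extract_decision(llm_output):
    """Return (decision_dict, additional_context), or (None, llm_output) on failure."""
    match = FENCED_JSON.search(llm_output)
    if match is None:
        return None, llm_output
    try:
        decision = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None, llm_output
    # Keep the text before and after the fenced block, with its line breaks intact
    context = (llm_output[:match.start()] + llm_output[match.end():]).strip()
    return decision, context
```

Usage in this notebook would be `decision, context = extract_decision(output_result['response'])`.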

In [14]:
ir.save_run({'test':'1234'})

Saved s3://awmgd-prod-finml-sandbox-user/bclarke16/tmp/fs/logs/results - llama - data_quarterly_pit_indu
Run Completed!


In [10]:
helper = mh.ModelHelper('tmp/fs')

In [11]:
helper.clear_folder('Data/Del')