# Run Open Source Inference 

This notebook runs inference on the open source models using the multi-GPU process implemented in the `model_inference` module. It requires the `llm-base` environment to run correctly. The results of each backtest are stored in the Data folder under the corresponding model folder.

We test with a handful of open source models:

- Llama 3.2 3B Instruct
- Qwen 2.5 3B Instruct
- DeepSeek R1 Distill Qwen 7B

We also test a fine-tuned model that estimates next-period EPS.

In [1]:
import model_inference
import prompts
import importlib
import datetime
import torch
from huggingface_hub import login
import utils.model_helper as mh

from transformers import BitsAndBytesConfig
from accelerate import Accelerator, notebook_launcher

In [2]:
importlib.reload(model_inference)
importlib.reload(prompts)
importlib.reload(mh)

<module 'utils.model_helper' from '/project/utils/model_helper.py'>

### Set up the environment
Log in to Hugging Face and check the number of GPUs available.

In [3]:
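# Log in to Hugging Face using a token stored as `KEY=value` on the first line of pass.txt
with open('pass.txt') as p:
    first_line = p.readline()

# Keep everything after the first '='; strip() also handles a missing trailing newline,
# which the previous find('\n') slice would have turned into a dropped last character
hf_login = first_line.split('=', 1)[-1].strip()
login(hf_login, add_to_git_credential=False)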

In [4]:
print(f'Torch version: {torch.__version__}')
#print(f'Device Count: {torch.cuda.device_count()}')
import accelerate
accelerate.__version__

Torch version: 2.1.2.post300


'1.4.0'

### Run 1: Llama 3.2 Earnings analysis - Base


In [5]:
# Create the run config
run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_blended',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json'
    #'fine_tuned_dir':None
}

In [6]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
#prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)
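# notebook_launcher needs CUDA to be uninitialized in the notebook process, so this should be False
torch.cuda.is_initialized()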

False

In [7]:
# Run the multi-gpu model with notebook_launcher
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 4 GPUs.
llama
llama
llama
llama


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

  0%|          | 0/887 [00:00<?, ?it/s]

Memory footprint: 6.4 GB
starting backtest...
starting backtest...
starting backtest...
starting backtest...


888it [22:53,  1.45s/it]                         

Finished run in 0:24:51.327519
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-04-11 10:41:09.337985.json
Run Completed!


888it [24:51,  1.68s/it]
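The progress bar above ends at 888 even though there are 887 prompts. One plausible explanation, stated here as an assumption because `run_multi_gpu` is not shown in this notebook, is that the prompts are padded so that every process receives the same number of items. The sketch below reproduces that arithmetic in plain Python; `padded_shards` is a hypothetical helper, not a function from this project or from Accelerate.

```python
import math

def padded_shards(items, num_processes):
    """Split items into equal-length shards, repeating the last item as padding."""
    if not items:
        return [[] for _ in range(num_processes)]
    # Every process gets the same number of items, rounded up
    per_process = math.ceil(len(items) / num_processes)
    padding = [items[-1]] * (per_process * num_processes - len(items))
    padded = list(items) + padding
    return [padded[i * per_process:(i + 1) * per_process] for i in range(num_processes)]

shards = padded_shards(list(range(887)), 4)
print([len(s) for s in shards], sum(len(s) for s in shards))
```

Under this assumption, 887 prompts on 4 or 8 processes both round up to 888 items, and 891 prompts on 4 processes round up to 892, which matches the totals reported by the progress bars in the other runs. Padded duplicates would need to be dropped after the results are gathered.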


### Run 2: Llama 3.2 Earnings analysis - Chain of Thought


In [5]:
run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['COT_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_blended_cot',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json'
}

In [6]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)

Requesting all datasets...
Saving data...


In [7]:
# Run the multi-gpu model with notebook_launcher
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 4 GPUs.
llama
llama
llama
llama


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Waiting...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Waiting...
Waiting...


  0%|          | 0/887 [00:00<?, ?it/s]

Memory footprint: 6.4 GB
Waiting...
starting backtest...
starting backtest...
starting backtest...
starting backtest...


 92%|█████████▏| 812/887 [53:59<04:41,  3.75s/it]  

Finished run...


 95%|█████████▍| 840/887 [56:09<03:06,  3.98s/it]

Finished run...


 98%|█████████▊| 868/887 [58:12<01:21,  4.30s/it]

Finished run...


888it [59:41,  4.45s/it]                         

Finished run...
Gathered results...Gathered results...Gathered results...Gathered results...



Finished run in 0:59:41.735791
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-03-30 10:36:38.332729.json
Run Completed!


888it [59:42,  4.03s/it]


### Run 3: DeepSeek R1 Distill Qwen 7B - Base

In [15]:
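# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

# Fetch the model from the Hugging Face Hub and save it to project storage
model_helper = mh.ModelHelper('tmp/fs')
model_helper.get_model_and_save('deepseek-ai/DeepSeek-R1-Distill-Qwen-7B',
                                'deepseek7B',
                                'Data',
                                True,
                                quant_config)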

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

None
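The `Memory footprint` lines in the logs can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below is illustrative only; `estimate_footprint_gb` is a hypothetical helper, and the 3.2 billion parameter count is a round number chosen for the example rather than a measured value.

```python
def estimate_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), ignoring activations and buffers."""
    return num_params * bits_per_param / 8 / 1e9

# Illustrative parameter count (assumption): about 3.2 billion weights
print(round(estimate_footprint_gb(3.2e9, 16), 1))  # 16-bit weights
print(round(estimate_footprint_gb(3.2e9, 4), 1))   # 4-bit weights
```

4-bit loading typically leaves some layers, such as the embeddings, in higher precision, so a measured footprint tends to sit above the pure 4-bit estimate.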


In [5]:
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-7B',
    'model_s3_loc': 'deepseek7B',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_blended_base',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json'
}

In [6]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)

Requesting all datasets...
Saving data...


In [7]:
# Run the multi-gpu model with notebook_launcher
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 8 GPUs.
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7Bdeepseek7B



Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

tokenizer_config.json:   0%|          | 0.00/3.07k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

  0%|          | 0/887 [00:00<?, ?it/s]

Memory footprint: 5.4 GB
starting backtest...starting backtest...starting backtest...


starting backtest...starting backtest...
starting backtest...

starting backtest...
starting backtest...


888it [2:17:03,  9.49s/it]                           

Finished run in 2:17:37.334527
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-03-30 17:44:18.481428.json
Run Completed!


888it [2:17:37,  9.30s/it]


### Run 4: DeepSeek R1 Distill Qwen 7B - Chain of Thought

In [5]:
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-7B',
    'model_s3_loc': 'deepseek7B',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['COT_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_blended_base',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json'
}

In [6]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)

Requesting all datasets...
Saving data...


In [7]:
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 8 GPUs.
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7B
deepseek7B


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

  0%|          | 0/887 [00:00<?, ?it/s]

Memory footprint: 5.4 GB
starting backtest...starting backtest...starting backtest...
starting backtest...

starting backtest...

starting backtest...
starting backtest...
starting backtest...


888it [2:26:10,  9.47s/it]                           

Finished run in 2:30:12.540523
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-03-30 20:11:39.828629.json
Run Completed!


888it [2:30:13, 10.15s/it]


### Qwen 2.5 - 3B Instruct

In [7]:
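# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

# Fetch the model from the Hugging Face Hub and save it to project storage
model_helper = mh.ModelHelper('tmp/fs')
model_helper.get_model_and_save('Qwen/Qwen2.5-3B-Instruct',
                                'qwen3b',
                                'Data',
                                True,
                                quant_config)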

config.json:   0%|          | 0.00/661 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/35.6k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/3.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.20G [00:00<?, ?B/s]

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

None


In [6]:
run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-3B-Instruct',
    'model_s3_loc': 'qwen3b',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_blended_base',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json'
}

In [7]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)

Requesting all datasets...
Saving data...


In [10]:
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 4 GPUs.
qwen3b
qwen3b
qwen3b
qwen3b


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


tokenizer_config.json:   0%|          | 0.00/7.30k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

  0%|          | 0/887 [00:00<?, ?it/s]

Memory footprint: 2.0 GB
starting backtest...starting backtest...starting backtest...


starting backtest...


888it [15:51,  1.07s/it]                         

Finished run in 0:15:56.753014
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-04-03 19:46:42.048460.json
Run Completed!


888it [15:57,  1.08s/it]


In [11]:
run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-3B-Instruct',
    'model_s3_loc': 'qwen3b',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['COT_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_blended_base',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json'
}

In [12]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)

Requesting all datasets...
Saving data...


In [13]:
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 4 GPUs.
qwen3b
qwen3b
qwen3b
qwen3b


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
  0%|          | 0/887 [00:00<?, ?it/s]

Memory footprint: 2.0 GB
starting backtest...
starting backtest...starting backtest...

starting backtest...


888it [13:42,  1.09it/s]                         

Finished run in 0:13:42.324489
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-04-03 20:06:52.680533.json
Run Completed!


888it [13:42,  1.08it/s]


### Run 5: Fine-tuned model

In [19]:
run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-3B-Instruct',
    'model_s3_loc': 'qwen3b',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE_FINE_TUNED'],
    'multi-gpu':False,
    'dataset': 'data_quarterly_pit_indu_blended_base',
    'data_location': 'data_quarterly_pit_indu_refresh_blended.json',
    'fine_tuned_dir': 'fine_tuned2'
}

In [20]:
run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}_finetuned"
ir = model_inference.InferenceRun(run_name, run_config)

# Create the prompts and save to the Data folder
prompt_set = ir.create_all_prompts(force_refresh=True, is_save_prompts=True)

In [12]:
prompt_set[0]['prompt']

[{'role': 'system',
  'content': 'You are a financial analyst.Use the following income statement, balance sheet to estimate the Basic EPS for the next fiscal period. Use only the data in the prompt. Provide a confidence score for how confident you are of the decision. If you are not confident then lower the confidence score.'},
 {'role': 'user',
  'content': 'Income Statement:                                                        t           t-1           t-2           t-3           t-4           t-5\nitems                                                                                                                          \nRevenue                                      1.387040e+11  1.374120e+11  1.368660e+11  1.363540e+11  1.360970e+11  1.345900e+11\nCost of Revenue                              1.092480e+11  1.077140e+11  1.067900e+11  1.059300e+11  1.053460e+11  1.034980e+11\nGross Profit                                 2.945600e+10  2.969800e+10  3.007600e+10  3.042400e+10  3.07

In [13]:
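# Each prompt is a list of chat messages, so `+=` with a string would extend the list
# character by character. Append the lead-in to the last (user) message instead.
for prompt in prompt_set:
    prompt['prompt'][-1]['content'] += 'The next period EPS is '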

In [28]:
from peft import PeftModel
from transformers import AutoTokenizer
import json
from tqdm import tqdm

In [41]:
outputs = []  # module-level list so partial results can be inspected during or after a run

def run_finetuned_backtest(prompt_list, fine_tuned_model, tokenizer):
    """Generate one response per prompt and append it to the global `outputs` list."""
    progress = tqdm(total=len(prompt_list), position=0, leave=True)
    for prompt in prompt_list:
        # Render the chat messages into a single string ending with the assistant header
        text = tokenizer.apply_chat_template(prompt['prompt'], tokenize=False, add_generation_prompt=True)
        model_inputs = tokenizer([text], return_tensors='pt').to("cuda")
        generated_ids = fine_tuned_model.generate(**model_inputs,
                                                  pad_token_id=tokenizer.eos_token_id,
                                                  max_new_tokens=50,
                                                  temperature=0.001)
        # Drop the prompt tokens so that only the newly generated tokens are decoded
        parsed_ids = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        resp = {
            'date': prompt['date'],
            'security': prompt['security'],
            'response': tokenizer.batch_decode(parsed_ids, skip_special_tokens=True)[0]
        }
        outputs.append(resp)
        progress.update()
    progress.close()

In [21]:
base_model = ir.load_model_from_storage(run_config['model_s3_loc'])

qwen3b


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


In [25]:
fine_tuned_model = PeftModel.from_pretrained(base_model, 'fine_tuned2')

In [26]:
tokenizer = AutoTokenizer.from_pretrained(run_config['model_hf_id'])

tokenizer_config.json:   0%|          | 0.00/7.30k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

In [42]:
run_finetuned_backtest(prompt_set, fine_tuned_model, tokenizer)

100%|██████████| 887/887 [1:14:34<00:00,  5.05s/it]


In [38]:
outputs[5]

{'date': '2020-04-21',
 'security': 'KO UN Equity',
 'response': '2.233371336592213323174957787778442331355221514729'}
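The fine-tuned model returns the EPS estimate as a long string of digits, so the response needs to be converted to a number before it can be compared with the realized EPS. A minimal sketch of that step follows; `parse_eps` is a hypothetical helper that is not part of this project, and it assumes the first number in the response is the estimate.

```python
import re

def parse_eps(response: str):
    """Return the first decimal number in the response as a float, or None if absent."""
    match = re.search(r'-?\d+(?:\.\d+)?', response)
    return float(match.group()) if match else None

print(parse_eps('2.233371336592213323174957787778442331355221514729'))
```

Returning `None` instead of raising keeps a single malformed response from stopping a loop over all results.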

In [20]:
# `prompts` is the imported module; the generated prompts live in `prompt_set`
prompt = prompt_set[0]['prompt']

tokens = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True )
model_inputs = tokenizer([tokens], return_tensors='pt').to("cuda")
generated_ids = fine_tuned_model.generate(**model_inputs, 
                               pad_token_id=tokenizer.eos_token_id, 
                               max_new_tokens=2000,
                              temperature=0.4)
parsed_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
tokenizer.batch_decode(parsed_ids, skip_special_tokens=True)[0]

'{"decision": HOLD, "confidence score": 75, "reason": "The company\'s earnings have been stable but slightly declining, while revenue growth has slowed. Gross profit margin has remained relatively constant, but operating income has decreased. The stock price has shown some volatility, but the overall trend suggests a neutral position."}'
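The response above is not valid JSON because `HOLD` is unquoted, so `json.loads` would reject it. A minimal sketch of a tolerant parser follows. `parse_decision_json` is a hypothetical helper, and the label set `BUY`/`SELL`/`HOLD` is an assumption (only `BUY` and `HOLD` appear in this notebook).

```python
import json
import re

def parse_decision_json(raw: str) -> dict:
    """Parse a model response, quoting a bare decision label if strict parsing fails."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Quote a bare label that directly follows the "decision" key (assumed label set)
        repaired = re.sub(r'("decision"\s*:\s*)(BUY|SELL|HOLD)\b', r'\1"\2"', raw)
        return json.loads(repaired)

raw = '{"decision": HOLD, "confidence score": 75, "reason": "neutral"}'
print(parse_decision_json(raw)['decision'])
```

Strict parsing is tried first so that well-formed responses are never rewritten; a response that is still invalid after the repair raises `json.JSONDecodeError` as usual.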

In [17]:

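# Scratch history of earlier run configurations. Each assignment overwrites the previous one,
# so only the last uncommented run_config in this cell takes effect.
run_config = {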
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    'model_s3_loc': 'deepseek14Q',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_annual_pit_spx',
    'data_location': 'data_annual_pit_spx.json'
}

run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-7B-Instruct',
    'model_s3_loc': 'qwen',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 1
run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 2
run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 3
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    'model_s3_loc': 'deepseek14Q',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 4
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
    'model_s3_loc': 'deepseek14Q',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# RUN 5 - failed
run_config = {
    'model_hf_id': 'deepseek-ai/DeepSeek-R1-Distill-Qwen-32B',
    'model_s3_loc': 'deepseek32',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoT'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

# Run 6 - failed
run_config = {
    'model_hf_id': 'Qwen/Qwen2.5-7B-Instruct',
    'model_s3_loc': 'qwen',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['CoTDetailed'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu',
    'data_location': 'data_quarterly_pit_indu.json'
}

run_config = {
    'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
    'model_s3_loc': 'llama',
    'model_reload': False,
    'model_quant': None,
    'system_prompt': prompts.SYSTEM_PROMPTS['BASE_EARN'],
    'multi-gpu':True,
    'dataset': 'data_quarterly_pit_indu_refresh_v2',
    'data_location': 'data_quarterly_pit_indu_refresh_v2.json'
}

# run_config = {
#     'model_hf_id': 'meta-llama/Llama-3.2-3B-Instruct',
#     'model_s3_loc': 'llama',
#     'model_reload': False,
#     'model_quant': None,
#     'system_prompt': prompts.SYSTEM_PROMPTS['COT_EARN'],
#     'multi-gpu':True,
#     'dataset': 'data_quarterly_pit_indu',
#     'data_location': 'data_quarterly_pit_indu.json'
# }



run_name = f"{run_config['model_s3_loc']}_{run_config['dataset']}"

In [18]:
ir = model_inference.InferenceRun(run_name, run_config)

In [19]:
# Create the prompts and save to the Data folder
# (stored as prompt_set so that the imported `prompts` module is not overwritten)
prompt_set = ir.create_all_prompts(True)

Requesting all datasets...
Saving data...


In [20]:
# Run the multi-gpu model with notebook_launcher
notebook_launcher(ir.run_multi_gpu, num_processes=torch.cuda.device_count())

Launching training on 4 GPUs.
llama
llama
llama
llama


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

  0%|          | 0/891 [00:00<?, ?it/s]

Memory footprint: 6.4 GB
Waiting...
Waiting...
Waiting...
Waiting...
starting backtest...
starting backtest...
starting backtest...
starting backtest...


 99%|█████████▉| 884/891 [27:05<00:12,  1.79s/it]

Finished run...


100%|█████████▉| 888/891 [27:10<00:04,  1.60s/it]

Finished run...


892it [27:19,  1.82s/it]                         

Finished run...
Finished run...
Gathered results...Gathered results...Gathered results...Gathered results...



Finished run in 0:27:51.318931
Called Save run
called log
Saved bclarke16/tmp/fs/logs/Results_2025-03-30 09:15:48.238865.json
Run Completed!


892it [27:51,  1.87s/it]


In [11]:
ir = model_inference.InferenceRun(run_name, run_config)

In [12]:
p1 = ir.create_all_prompts(True)

Requesting all datasets...
Saving data...


In [11]:
len(p1)

896

In [12]:
model = ir.load_model_from_storage(ir.model_s3_loc)

deepseek32


Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [23]:
from transformers import AutoTokenizer

In [24]:
tokenizer = AutoTokenizer.from_pretrained(ir.model_hf_id)

In [28]:
output_result = ir.run_model(p1[0]['prompt'],tokenizer,model)

running model...


In [8]:
output_result = {"date": "2020-02-06", "security": "MMM UN Equity", "response": "Here is the JSON response:\n\n```json\n{\n  \"decision\": \"BUY\",\n  \"confidence score\": 80,\n  \"reason\": \"Gross profit and EPS have increased over time, indicating a strong financial performance\"\n}\n```\n\nI have computed the following financial ratios:\n\n1. Gross Margin: \n   - 2020: 3.786000e+09 / 8.111000e+09 = 0.466\n   - 2019: 3.803000e+09 / 7.991000e+09 = 0.477\n   - 2018: 3.858000e+09 / 8.171000e+09 = 0.471\n   - 2017: 3.553000e+09 / 7.863000e+09 = 0.453\n   - 2016: 3.885000e+09 / 7.945000e+09 = 0.492\n\nThe gross margin has been increasing over time, indicating a strong financial performance.\n\n2. EPS:\n   - 2020: 9.690000e+08 / 1.012600e+10 = 0.953\n   - 2019: 1.583000e+09 / 1.076400e+10 = 1.465\n   - 2018: 1.127000e+09 / 1.014200e+10 = 1.117\n   - 2017: 1.347000e+09 / 9.848000e+09 = 0.137\n   - 2016: 1.543000e+09 / 9.848000e+09 = 0.156\n\nThe EPS has been increasing over time, indicating a strong financial performance.\n\n3. Current Ratio:\n   - 2020: 2.441000e+09 / 9.222000e+09 = 0.264\n   - 2019: 1.588000e+09 / 7.821000e+09 = 0.201\n   - 2018: 1.131000e+09 / 7.265000e+09 = 0.157\n   - 2017: 8.930000e+08 / 7.244000e+09 = 0.123\n   - 2016: 8.910000e+08 / 5.020000e+09 = 0.177\n\nThe current ratio has been increasing over time, indicating a strong financial performance.\n\nBased on these financial ratios, the company has been performing well financially and has a strong track record of increasing gross profit and EPS over time. Therefore, I recommend a BUY decision with a confidence score of 80."}

In [10]:
output_result['response']

'Here is the JSON response:\n\n```json\n{\n  "decision": "BUY",\n  "confidence score": 80,\n  "reason": "Gross profit and EPS have increased over time, indicating a strong financial performance"\n}\n```\n\nI have computed the following financial ratios:\n\n1. Gross Margin: \n   - 2020: 3.786000e+09 / 8.111000e+09 = 0.466\n   - 2019: 3.803000e+09 / 7.991000e+09 = 0.477\n   - 2018: 3.858000e+09 / 8.171000e+09 = 0.471\n   - 2017: 3.553000e+09 / 7.863000e+09 = 0.453\n   - 2016: 3.885000e+09 / 7.945000e+09 = 0.492\n\nThe gross margin has been increasing over time, indicating a strong financial performance.\n\n2. EPS:\n   - 2020: 9.690000e+08 / 1.012600e+10 = 0.953\n   - 2019: 1.583000e+09 / 1.076400e+10 = 1.465\n   - 2018: 1.127000e+09 / 1.014200e+10 = 1.117\n   - 2017: 1.347000e+09 / 9.848000e+09 = 0.137\n   - 2016: 1.543000e+09 / 9.848000e+09 = 0.156\n\nThe EPS has been increasing over time, indicating a strong financial performance.\n\n3. Current Ratio:\n   - 2020: 2.441000e+09 / 9.22200

In [11]:
import json

In [20]:
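def format_json(llm_output):
    """Split an LLM response into its fenced JSON object and the surrounding text."""
    # Remove line breaks so the closing brace sits directly against the closing fence
    form = llm_output.replace('\n', '')
    # Find the start and end of the fenced JSON block
    soj = form.find('```json')
    if soj == -1:
        raise ValueError('No ```json block found in the model output')
    eoj = form.find('}```', soj)
    if eoj == -1:
        # Closing fence is missing (e.g. truncated output): fall back to the last closing brace
        eoj = form.rfind('}')
        additional = form[:soj] + form[eoj + 1:]
    else:
        # Skip the closing brace and the three fence characters
        additional = form[:soj] + form[eoj + 4:]
    # Parse the JSON object and keep the remaining text as additional context
    json_obj = json.loads(form[soj + 7:eoj + 1])
    json_obj['AdditionalContext'] = additional
    return json_obj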

In [21]:
format_json(output_result['response'])

{  "decision": "BUY",  "confidence score": 80,  "reason": "Gross profit and EPS have increased over time, indicating a strong financial performance"}


{'decision': 'BUY',
 'confidence score': 80,
 'reason': 'Gross profit and EPS have increased over time, indicating a strong financial performance',
 'AdditionalContext': 'Here is the JSON response:I have computed the following financial ratios:1. Gross Margin:    - 2020: 3.786000e+09 / 8.111000e+09 = 0.466   - 2019: 3.803000e+09 / 7.991000e+09 = 0.477   - 2018: 3.858000e+09 / 8.171000e+09 = 0.471   - 2017: 3.553000e+09 / 7.863000e+09 = 0.453   - 2016: 3.885000e+09 / 7.945000e+09 = 0.492The gross margin has been increasing over time, indicating a strong financial performance.2. EPS:   - 2020: 9.690000e+08 / 1.012600e+10 = 0.953   - 2019: 1.583000e+09 / 1.076400e+10 = 1.465   - 2018: 1.127000e+09 / 1.014200e+10 = 1.117   - 2017: 1.347000e+09 / 9.848000e+09 = 0.137   - 2016: 1.543000e+09 / 9.848000e+09 = 0.156The EPS has been increasing over time, indicating a strong financial performance.3. Current Ratio:   - 2020: 2.441000e+09 / 9.222000e+09 = 0.264   - 2019: 1.588000e+09 / 7.821000e+09
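The commented-out `try`/`except` in `format_json` suggests falling back to the raw text when parsing fails. One way to sketch that without changing the parser itself is a small wrapper; `safe_parse` and the `raw`/`error` keys are hypothetical names chosen for illustration, and `json.loads` stands in for the parser here.

```python
import json

def safe_parse(parser, llm_output):
    """Apply parser; on a parsing error, return the raw text and the error message."""
    try:
        return parser(llm_output)
    except (ValueError, KeyError) as err:
        # json.JSONDecodeError is a subclass of ValueError, so it is caught here
        return {'raw': llm_output, 'error': str(err)}

print(safe_parse(json.loads, '{"decision": "BUY"}'))
print(safe_parse(json.loads, 'not json')['raw'])
```

Keeping the failure as a dictionary means every result has the same type, which makes it easier to count unparseable responses after a backtest.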

In [14]:
ir.save_run({'test':'1234'})

Saved s3://awmgd-prod-finml-sandbox-user/bclarke16/tmp/fs/logs/results - llama - data_quarterly_pit_indu
Run Completed!


In [22]:
import utils.model_helper as mh

In [23]:
helper = mh.ModelHelper('tmp/fs')

In [24]:
helper.clear_folder('qwen3b')