# Open Source Models

In this notebook we use `ModelHelper` (from `models.model_helper`) to request from Hugging Face all of the models used in the project. The helper saves each model in Bloomberg Lab S3 storage, which makes the models easier to request during the multi-GPU inference tasks: each GPU can read the stored copy, so the model does not have to be downloaded separately by every GPU.
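The "download once, reuse everywhere" idea can be sketched with the standard library alone. This is not the `ModelHelper` implementation, which is not shown in this notebook: `get_or_fetch`, `fake_download`, and the directory layout are hypothetical names for illustration, and a local temporary directory stands in for the shared S3 storage.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_or_fetch(name: str, shared_dir: Path, fetch: Callable[[Path], None]) -> Path:
    """Return the shared copy of `name`, calling `fetch` only if it is missing."""
    target = shared_dir / name
    if not target.exists():
        target.mkdir(parents=True)
        fetch(target)  # in practice: download the model and write it here
    return target


# Stand-in for a real download: writes a marker file and records the call.
calls = []


def fake_download(dest: Path) -> None:
    calls.append(dest.name)
    (dest / "weights.bin").write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    # Four workers ask for the same model one after another;
    # only the first request triggers the download.
    paths = [get_or_fetch("llama", shared, fake_download) for _ in range(4)]
    download_count = len(calls)
    all_same = len(set(paths)) == 1

print(download_count, all_same)
```

The sketch is sequential. Workers that start at the same moment would need a lock or a separate "prepare" step, which is one reason to populate the shared storage up front, as this notebook does.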

The models we will access are:

- Llama 3.2 3B - meta-llama/Llama-3.2-3B-Instruct
- DeepSeek R1 Qwen 7B - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- DeepSeek R1 Qwen 14B - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- Qwen 2.5 3B - Qwen/Qwen2.5-3B-Instruct
- Qwen 2.5 7B - Qwen/Qwen2.5-7B-Instruct

We also tried to run:

- DeepSeek R1 Qwen 32B - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

In [7]:
from models.model_helper import ModelHelper
from huggingface_hub import login
from transformers import BitsAndBytesConfig

import torch

In [2]:
model_requester = ModelHelper('tmp/fs')

In [3]:
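# Log in to Hugging Face with a token stored as KEY=value in pass.txt; a .env file can be used as well
with open('pass.txt') as p:
    first_line = p.readline().strip()

# Keep everything after the first '='. Reading one stripped line avoids
# cutting off the last character when the file has no trailing newline.
hf_login = first_line.split('=', 1)[-1]
login(hf_login, add_to_git_credential=False)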
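If the credentials file holds more than one entry, a small parser that looks a key up by name may be more robust than slicing the first line. The following is a minimal sketch, assuming a file of `KEY=value` lines; `read_token`, the key name `HF_TOKEN`, and the token value are hypothetical and not taken from the project.

```python
import tempfile
from pathlib import Path


def read_token(path: str, key: str) -> str:
    """Return the value stored under `key` in a file of KEY=value lines."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Drop surrounding whitespace and optional quotes around the value.
            return value.strip().strip('"').strip("'")
    raise KeyError(f"{key} not found in {path}")


# Demo on a throwaway file; the key and value are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "pass.txt"
    env_file.write_text("# credentials\nHF_TOKEN=hf_example_value")  # no trailing newline
    token = read_token(str(env_file), "HF_TOKEN")

print(token)
```

The returned string could then be passed to `login(...)` in the same way as `hf_login` above.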

### Llama 3.2

In [4]:
# request the model and save in temp folder under name llama to retrieve later
model_requester.get_model_and_save(model_id='meta-llama/Llama-3.2-3B-Instruct', 
                                   model_name='llama', model_tmp_location='/tmp/',
                                   use_quantization=False)

config.json:   0%|          | 0.00/878 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/20.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

Uploaded: /tmp//llama


### DeepSeek R1 Qwen 7B

In [8]:
# Use 4-bit NF4 quantization to reduce the memory footprint
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

# request the model and save in temp folder
model_requester.get_model_and_save(model_id='deepseek-ai/DeepSeek-R1-Distill-Qwen-7B',
                                   model_name='deepseek7B', model_tmp_location='/tmp/',
                                   use_quantization=True,
                                   quant_config=quant_config)

config.json:   0%|          | 0.00/680 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-000002.safetensors:   0%|          | 0.00/8.61G [00:00<?, ?B/s]

model-00002-of-000002.safetensors:   0%|          | 0.00/6.62G [00:00<?, ?B/s]

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

Uploaded: /tmp//deepseek7B
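To see why 4-bit quantization is worth using for the larger models, a back-of-the-envelope estimate of the weight footprint helps. The sketch below uses nominal parameter counts (7e9 and 14e9) as assumptions, and counts only the weights: it ignores quantization constants, any layers that stay in higher precision, activations, and the KV cache, so real memory use will be higher.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Nominal parameter counts; the true counts differ slightly per model.
for label, n_params in [("7B", 7e9), ("14B", 14e9)]:
    bf16 = weight_gib(n_params, 16)  # 16 bits per weight in bfloat16
    nf4 = weight_gib(n_params, 4)    # 4 bits per weight after quantization
    print(f"{label}: bf16 ~{bf16:.1f} GiB, 4-bit ~{nf4:.1f} GiB")
```

Going from 16 to 4 bits per weight cuts the weight storage to roughly a quarter, which is the main effect of `load_in_4bit=True` in the configuration above.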


### DeepSeek R1 Qwen 14B

In [None]:
# Use 4-bit NF4 quantization to reduce the memory footprint
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

# request the model and save in temp folder
model_requester.get_model_and_save(model_id='deepseek-ai/DeepSeek-R1-Distill-Qwen-14B',
                                   model_name='deepseek14Q', model_tmp_location='/tmp/',
                                   use_quantization=True,
                                   quant_config=quant_config)

### Qwen 2.5 3B

In [None]:
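# Use 4-bit NF4 quantization to reduce the memory footprint
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

# request the model and save in temp folder
# (quant_config must be passed by keyword: a positional argument after
# keyword arguments is a SyntaxError)
model_requester.get_model_and_save(model_id='Qwen/Qwen2.5-3B-Instruct',
                                   model_name='qwen3b', model_tmp_location='/tmp/',
                                   use_quantization=True,
                                   quant_config=quant_config)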

### Qwen 2.5 7B

In [None]:
# request the model and save in temp folder 
model_requester.get_model_and_save(model_id='Qwen/Qwen2.5-7B-Instruct', 
                                   model_name='qwen', model_tmp_location='/tmp/',
                                   use_quantization=False)
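
The five requests above differ only in the model ID, the saved name, and whether quantization is used, so they can be collected in one table. This is a minimal sketch: `MODELS` and `build_request` are hypothetical names, the values are copied from the cells above, and a placeholder string stands in for the real `BitsAndBytesConfig` object so the block runs without any GPU libraries.

```python
# Settings copied from the cells above: model_name -> (model_id, use_quantization)
MODELS = {
    "llama": ("meta-llama/Llama-3.2-3B-Instruct", False),
    "deepseek7B": ("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", True),
    "deepseek14Q": ("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", True),
    "qwen3b": ("Qwen/Qwen2.5-3B-Instruct", True),
    "qwen": ("Qwen/Qwen2.5-7B-Instruct", False),
}


def build_request(model_name, quant_config=None, tmp_location="/tmp/"):
    """Build the keyword arguments for one get_model_and_save call."""
    model_id, use_quantization = MODELS[model_name]
    kwargs = {
        "model_id": model_id,
        "model_name": model_name,
        "model_tmp_location": tmp_location,
        "use_quantization": use_quantization,
    }
    if use_quantization:
        # Only quantized requests carry a quantization config.
        kwargs["quant_config"] = quant_config
    return kwargs


# Placeholder standing in for the BitsAndBytesConfig defined earlier.
placeholder_config = "<BitsAndBytesConfig>"
requests = [build_request(name, placeholder_config) for name in MODELS]

for request in requests:
    print(request["model_name"], request["use_quantization"])
```

In the notebook itself, each entry could be passed on with `model_requester.get_model_and_save(**build_request(name, quant_config))`, using the same keyword arguments as the individual cells.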