In [1]:
!pip install unsloth # install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
!pip install torch transformers accelerate datasets

Collecting unsloth
  Downloading unsloth-2025.1.8-py3-none-any.whl.metadata (53 kB)
Collecting unsloth_zoo>=2025.1.4 (from unsloth)
  Downloading unsloth_zoo-2025.1.5-py3-none-any.whl.metadata (16 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.29.post2-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.45.1-py3-none-manylinux_2_24_x86_64.whl.metadata (5.8 kB)
Collecting triton>=3.0.0 (from unsloth)
  Downloading triton-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.13-py3-none-any.whl.metadata (9.4 kB)
Collecting transformers!=4.47.0,>=4.46.1 (from unsloth)
  Downloading transformers-4.48.2-

## Import all relevant packages

In [2]:
# Modules for fine-tuning
from unsloth import FastLanguageModel
import torch                               # PyTorch
from trl import SFTTrainer                 # Trainer for supervised fine-tuning (SFT)
from unsloth import is_bfloat16_supported  # Checks if hardware supports bfloat16 precision
# Hugging Face modules
from huggingface_hub import login          # Hugging Face login
from transformers import TrainingArguments # Defines training hyperparameters
from datasets import load_dataset          # Lets you load fine-tuning datasets
# Import Weights & Biases
import wandb
# Import Kaggle secrets
from kaggle_secrets import UserSecretsClient

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Retrieve API keys and log in to Hugging Face and Weights & Biases

In [3]:
user_secrets = UserSecretsClient()
hugging_face_token = user_secrets.get_secret("hf_token_read")
wnb_token = user_secrets.get_secret("wandb")

login(hugging_face_token) # Log in to Hugging Face
wandb.login(key=wnb_token) # Log in to Weights & Biases
run = wandb.init(
    project="Fine-tuning-v3",
    job_type="training",
    anonymous="allow"
)

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: abhi9ab (abhi9ab-rv-institute-of-technology-and-management). Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


## Loading DeepSeek R1 and the tokenizer

In [4]:
# Set parameters
max_seq_length = 2048 # Maximum sequence length (in tokens) the model will process at once
dtype = None          # None lets Unsloth auto-detect the dtype (float16 on this T4, which lacks bfloat16 support)
load_in_4bit = True   # Load the weights with 4-bit quantization to reduce GPU memory use

# Load the DeepSeek R1 distilled model and tokenizer using Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hugging_face_token
)

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
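
To see why 4-bit loading matters on this GPU, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. The parameter count of about 8.03 billion is an assumption based on the Llama 8B architecture. The estimate ignores quantization constants, layers kept in higher precision, activations, and optimizer state, so real usage is higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

N_PARAMS = 8.03e9       # assumption: approximate parameter count of the 8B model
T4_MEMORY_GB = 14.741   # "Max memory" reported in the Unsloth banner

fp16_gb = weight_memory_gb(N_PARAMS, 16)
int4_gb = weight_memory_gb(N_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
print(f"Fits on the T4 in 16-bit? {fp16_gb < T4_MEMORY_GB}")
```

Under these assumptions the 16-bit weights alone exceed the T4's memory, while the 4-bit weights leave room for activations and the LoRA adapters.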


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

## Testing DeepSeek R1 on a finance use case before fine-tuning

### Define a system prompt

In [5]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a GenAI-based financial assistant designed to help users with basic and advanced questions about investing. Your goal is to provide clear, accurate, and actionable information to improve financial literacy and guide users towards informed investment decisions. You should also help users discover suitable investment products based on their financial needs and goals.
Please answer the following finance-related question.

### Question:
{}

### Response:
<think>{}"""
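
The template has two positional `{}` placeholders: the first receives the question, the second the start of the response. A minimal sketch with a shortened, hypothetical template (`demo_style` is not part of the notebook) shows how `str.format` fills them:

```python
# Shortened stand-in for the full prompt, for illustration only
demo_style = """### Question:
{}

### Response:
<think>{}"""

# At inference time the second placeholder is left empty,
# so the prompt ends right after the opening <think> tag.
prompt = demo_style.format("What is a bond?", "")
print(prompt)
```

Ending the prompt at `<think>` invites the model to continue with its reasoning before giving the answer.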

In [6]:
# Creating a question
question = """I’m 35 years old, earn ₹1.5 lakh per month, and have ₹10 lakh in savings.
              I want to invest for my child’s education (expected cost ₹50 lakh in 10 years) and my retirement (planned at age 60).
              I’m willing to take moderate risk. Can you suggest a diversified investment plan to achieve these goals?"""

# Enable optimized inference mode for Unsloth models (improves speed and efficiency)
FastLanguageModel.for_inference(model)

# Format the question using the structured prompt and tokenize it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors = "pt").to("cuda")

# Generate a response using the model
outputs = model.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

# Decode the generated output tokens into human-readable text
response = tokenizer.batch_decode(outputs)

In [7]:
# Extract and print only the response part (after "### Response:")
print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to help this user create a diversified investment plan. They're 35, earn ₹1.5 lakh a month, and have ₹10 lakh in savings. Their goals are to fund their child's education, which is expected to cost ₹50 lakh in 10 years, and also plan for retirement at age 60. They're willing to take moderate risk.

First, I need to figure out how much they need to save for each goal. For the child's education, the target is ₹50 lakh in 10 years. Assuming they don't contribute anything else, they need to save about ₹5,000 a month, which seems doable given their salary. But since they have ₹10 lakh, maybe they can contribute more or use some of it for the education fund.

For retirement, at age 60, they'll need a corpus of around ₹3.5 lakh per year, considering inflation and a comfortable lifestyle. To build this, over 25 years, they need to save roughly ₹14,000 a month, which is a significant amount. Given their current savings, they might need to adjust thei
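
Splitting on the marker and indexing `[1]` raises an `IndexError` if the marker is missing from the decoded text. A slightly more defensive sketch uses `str.partition`. The helper name `extract_response` is hypothetical.

```python
def extract_response(decoded, marker="### Response:"):
    """Return the text after the first marker, or the whole text if the marker is absent."""
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()

sample = "### Question:\nWhat is a bond?\n\n### Response:\n<think>\nA bond is a loan."
print(extract_response(sample))
```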

## Fine-tuning step by step

### Step 1 - Update the system prompt

In [8]:
# Update training prompt style to add </think> tag
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a GenAI-based financial assistant designed to help users with basic and advanced questions about investing. Your goal is to provide clear, accurate, and actionable information to improve financial literacy and guide users towards informed investment decisions. You should also help users discover suitable investment products based on their financial needs and goals.
Please answer the following finance-related question.

### Question:
{}

### Response:
<think>
Let me break this down step by step:
1. Understand the key concepts being asked about
2. Identify the main differences or important points
3. Structure a clear and concise response
</think>
{}
"""

### Step 2 - Download the dataset and format it for fine-tuning

We will use the Finance-Instruct-500k dataset, available [here](https://huggingface.co/datasets/Josephgflowers/Finance-Instruct-500k).

Example entry:
```json
{
  "system": "You are a financial assistant. Answer the user's question accurately but keep it brief.",
  "user": "What is the difference between stocks and bonds?",
  "assistant": "Stocks represent ownership in a company, while bonds are a form of debt where the investor lends money to an entity."
}
```

In [9]:
# Download the first 5,000 training examples from the Hugging Face Hub
dataset = load_dataset("Josephgflowers/Finance-Instruct-500k", split = "train[0:5000]", trust_remote_code = True)
dataset

README.md:   0%|          | 0.00/10.4k [00:00<?, ?B/s]

train.json:   0%|          | 0.00/580M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/518185 [00:00<?, ? examples/s]

Dataset({
    features: ['system', 'user', 'assistant'],
    num_rows: 5000
})

In [10]:
# Show an entry from the dataset
dataset[1]

{'system': '\n',
 'user': 'Explain the classical economic theory and its policy implications.',
 'assistant': 'The classical economic theory refers to the economic thought of the 18th and 19th centuries associated with thinkers like Adam Smith, David Ricardo, and Thomas Malthus. Some of the key ideas of this school of thought are:\n\n• Laissez-faire approach: Classical economists believed in minimal government intervention in the economy. They argued that the free market, left to its own devices, would lead to the most efficient allocation of resources through the mechanism of supply and demand.\n\n• Self-interest and "invisible hand": Adam Smith proposed that when individuals pursue their own self-interest, they are led "as if by an invisible hand" to promote the general good of society. This provides a justification for the laissez-faire approach.\n\n• Wage-fund doctrine: Classical economists believed that wages were determined by a "wage fund" consisting of all capital availab

In [11]:
# We need to format the dataset to fit our training prompt style
EOS_TOKEN = tokenizer.eos_token # Appended to every example so the model learns where a response ends
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [12]:
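# Define the prompt formatting function
def formatting_prompt_func(examples):
    """Fill the training prompt for each (user, assistant) pair in a batch."""
    user_texts = examples["user"]
    assistant_texts = examples["assistant"]

    texts = []

    for user_text, assistant_text in zip(user_texts, assistant_texts):
        # Append EOS so the model learns to stop after the answer
        text = train_prompt_style.format(user_text, assistant_text) + EOS_TOKEN
        texts.append(text)

    return {
        "text": texts
    }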
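
When `Dataset.map` is called with `batched=True`, the function receives a dictionary of column lists rather than a single row. The toy sketch below imitates that contract without loading the dataset. `TEMPLATE`, `EOS`, and `format_batch` are shortened, hypothetical stand-ins for the notebook's prompt, EOS token, and formatting function.

```python
TEMPLATE = "### Question:\n{}\n\n### Response:\n{}"  # shortened stand-in prompt
EOS = "</s>"                                         # placeholder EOS token

def format_batch(examples):
    """Build one training string per (user, assistant) pair in the batch."""
    texts = [
        TEMPLATE.format(user, assistant) + EOS
        for user, assistant in zip(examples["user"], examples["assistant"])
    ]
    return {"text": texts}

# A batch is a dict mapping column names to lists of values
batch = {
    "user": ["What is a stock?", "What is a bond?"],
    "assistant": ["A share of ownership.", "A loan to an issuer."],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

The returned dictionary adds a new `text` column with one entry per input row.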

In [13]:
# Update dataset formatting
dataset_finetune = dataset.map(formatting_prompt_func, batched=True)
dataset_finetune["text"][0]

"Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a GenAI-based financial assistant designed to help users with basic and advanced questions about investing. Your goal is to provide clear, accurate, and actionable information to improve financial literacy and guide users towards informed investment decisions. You should also help users discover suitable investment products based on their financial needs and goals.\nPlease answer the following finance-related question.\n\n### Question:\nExplain tradeoffs between fiscal and monetary policy as tools in a nation's economic toolkit. Provide examples of past instances when each were utilized, the economic conditions that led to them being deployed, their intended eff

### Step 3 - Setting up the model using LoRA

An intuitive explanation of LoRA

Large Language Models (LLMs) have **millions or even billions of weights** that determine how they process and generate text. Full fine-tuning updates all of these weights, which requires **massive computational resources and memory.**

LoRA (**Low-Rank Adaptation**) makes fine-tuning more efficient:

- Instead of modifying all weights, LoRA adds small, trainable adapters to specific layers.
- These adapters capture task-specific knowledge while leaving the original model weights unchanged.
- This reduces the number of trainable parameters by more than **90%**, making fine-tuning faster and more **memory-efficient**. In this notebook, about 42 million parameters are trained on top of an 8-billion-parameter model, roughly 0.5%.
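
The idea can be sketched in a few lines of plain Python. A frozen weight matrix `W` is left untouched, and two small matrices `A` and `B` of rank `r` add a scaled correction. The dimensions, initialization, and helper names below are illustrative assumptions, not the exact implementation used by Unsloth or PEFT.

```python
import random

random.seed(0)
d_in, d_out, r, alpha = 16, 16, 2, 2

def rand_matrix(rows, cols, scale=1.0):
    return [[random.gauss(0, scale) for _ in range(cols)] for _ in range(rows)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

W = rand_matrix(d_in, d_out)            # frozen pretrained weight
A = rand_matrix(d_in, r, scale=0.01)    # trainable down-projection
B = [[0.0] * d_out for _ in range(r)]   # trainable up-projection, starts at zero

def lora_forward(x):
    """Base output plus the low-rank update scaled by alpha / r."""
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

x = rand_matrix(3, d_in)
y = lora_forward(x)

full_params = d_in * d_out
lora_params = d_in * r + r * d_out
print(f"full: {full_params}, LoRA: {lora_params}")  # full: 256, LoRA: 64
```

Because `B` starts at zero, the adapter initially leaves the model's output unchanged. Training then only updates `A` and `B`.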

In [14]:
# Wrap the model with LoRA (Low-Rank Adaptation) adapters
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # Rank of the adapter matrices. A higher rank means more trainable parameters.
    target_modules=[                       # Attention and MLP projection layers that receive adapters.
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,                         # Scaling factor: the adapter update is multiplied by lora_alpha / r.
    lora_dropout=0,                        # Dropout probability applied inside the adapters during training. 0 disables dropout.
    bias="none",                           # Do not train bias terms, which keeps the adapters small.
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context. Saves memory by recomputing activations in the backward pass instead of storing them.
    random_state=3407,                     # Random seed for reproducibility.
    use_rslora=False,                      # Rank-stabilized LoRA (scales by lora_alpha / sqrt(r)) is disabled.
    loftq_config=None,                     # No LoftQ initialization of the adapter weights.
)

Unsloth 2025.1.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
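
The number of trainable parameters can be reproduced by hand: each adapted linear layer with input size `d_in` and output size `d_out` adds `r * (d_in + d_out)` parameters. The sketch below assumes the layer shapes of the Llama 8B architecture (hidden size 4096, key/value projection size 1024, MLP size 14336, 32 layers), which the notebook does not state explicitly.

```python
# Assumed Llama 8B layer shapes (not stated in the notebook)
HIDDEN = 4096
KV_DIM = 1024
INTERMEDIATE = 14336
NUM_LAYERS = 32
RANK = 16

# (d_in, d_out) for each targeted projection
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(shapes, rank, num_layers):
    """Sum rank * (d_in + d_out) over all adapted modules and layers."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

total = lora_param_count(MODULE_SHAPES, RANK, NUM_LAYERS)
print(f"{total:,}")  # 41,943,040
```

Under these assumptions the result matches the "Number of trainable parameters" line printed when training starts.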


In [15]:
# Initialize the fine-tuning trainer
trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length, # 2048
    dataset_num_proc=2, # Number of processes used to preprocess the dataset
    args=TrainingArguments(
        per_device_train_batch_size=2,     # Number of examples processed per device (GPU) at a time.
        gradient_accumulation_steps=4,     # Accumulate gradients over 4 batches before each weight update (effective batch size 2 x 4 = 8).
        num_train_epochs = 1,              # One full pass over the 5,000 examples.
        warmup_steps=5,                    # Gradually increase the learning rate over the first five steps.
        # max_steps=60,                      # Uncomment to cap training at a fixed number of steps (overrides num_train_epochs).
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),  # Use fp16 if bf16 is not supported.
        bf16=is_bfloat16_supported(),      # Use bf16 when the hardware supports it.
        logging_steps=10,                  # Log training metrics every 10 steps.
        optim="adamw_8bit",                # AdamW optimizer with 8-bit optimizer states.
        weight_decay=0.01,                 # Regularization that helps prevent overfitting to the training set.
        lr_scheduler_type="linear",        # Learning rate decays linearly after warmup.
        seed=3407,                         # Random seed for reproducibility.
        output_dir="outputs",
    ),
)
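
The batch settings and the linear schedule can be worked out by hand. The sketch below is a simplified model of what the trainer does, written as an assumption for illustration. The helper names `total_steps` and `linear_lr` are hypothetical.

```python
import math

def total_steps(num_examples, per_device_batch, grad_accum, epochs=1, num_gpus=1):
    """Number of optimizer updates for the given batch configuration."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    return math.ceil(num_examples / effective_batch) * epochs

def linear_lr(step, base_lr, warmup_steps, num_steps):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (num_steps - step) / (num_steps - warmup_steps))

steps = total_steps(num_examples=5000, per_device_batch=2, grad_accum=4)
print(steps)  # 625

for step in (0, 5, 315, 625):
    print(step, linear_lr(step, base_lr=2e-4, warmup_steps=5, num_steps=steps))
```

With 5,000 examples and an effective batch size of 8, one epoch takes 625 updates. The learning rate peaks at step 5 and reaches zero at the last step.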

### Step 4 - Model training  

In [16]:
# Start the fine-tuning process
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 5,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 625
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|------|---------------|
| 10   | 1.9946        |
| 20   | 1.1458        |
| 30   | 1.0591        |
| 40   | 0.995         |
| 50   | 0.9805        |
| 60   | 0.9458        |
| 70   | 0.9228        |
| 80   | 0.9545        |
| 90   | 0.9415        |
| 100  | 0.9951        |


In [17]:
# Finish the Weights & Biases run (the model itself is saved in a later cell)
wandb.finish()

Run history (sparkline plots omitted): train/epoch, train/global_step, train/grad_norm, train/learning_rate, train/loss

| Metric | Value |
|--------|-------|
| total_flos | 1.3280189906588467e+17 |
| train/epoch | 1.0 |
| train/global_step | 625.0 |
| train/grad_norm | 0.31964 |
| train/learning_rate | 0.0 |
| train/loss | 0.9133 |
| train_loss | 0.93193 |
| train_runtime | 11116.4359 |
| train_samples_per_second | 0.45 |
| train_steps_per_second | 0.056 |
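
As a quick sanity check, the throughput figures in the summary follow from the runtime, the number of examples, and the number of steps. This assumes `train_runtime` is reported in seconds.

```python
runtime_s = 11116.4359   # train_runtime from the summary (assumed to be seconds)
num_examples = 5000
num_steps = 625

samples_per_second = num_examples / runtime_s
steps_per_second = num_steps / runtime_s
runtime_hours = runtime_s / 3600

print(f"{samples_per_second:.2f} samples/s, {steps_per_second:.3f} steps/s, {runtime_hours:.2f} h")
```

The run therefore took a little over three hours on the T4.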


### Step 5 - Run model inference after fine-tuning

In [18]:
question = "I’m 35 years old, earn ₹1.5 lakh per month, and have ₹10 lakh in savings. I want to invest for my child’s education (expected cost ₹50 lakh in 10 years) and my retirement (planned at age 60). I’m willing to take moderate risk. Can you suggest a diversified investment plan to achieve these goals?"

# Switch the LoRA model to Unsloth's optimized inference mode
FastLanguageModel.for_inference(model_lora)  # Unsloth has 2x faster inference!

# Format the question with the prompt template, tokenize it, and move it to the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response with the fine-tuned (LoRA) model
outputs = model_lora.generate(
    input_ids=inputs.input_ids,           # Token IDs of the prompt
    attention_mask=inputs.attention_mask, # Attention mask for padding handling
    max_new_tokens=1200,                  # Maximum number of new tokens to generate
    use_cache=True,                       # Enable the key/value cache for efficient generation
)

# Decode the generated tokens into readable text
response = tokenizer.batch_decode(outputs)

# Extract and print only the model's response part after '### Response:'
print(response[0].split("### Response:")[1])


<think>
Let me break this down step by step:
1. Understand the key concepts being asked about
2. Identify the main differences or important points
3. Structure a clear and concise response
</think>
1. Fixed deposits (FDs): Start with 2-3 years FDs for liquidity. This will give you 6-7% interest for the first 1-2 years. Then invest 10-15% of your total savings in FDs for 3-5 years to get 6-7% interest. 

2. Public Provident Fund (PPF): Invest 10-15% of your total savings in PPF for 15 years. This will give you 8-9% tax-free interest. PPF is a safe and tax-efficient option.

3. Gold: Invest 10-15% of your total savings in gold. Gold is a hedge against inflation and currency fluctuations. However, gold prices are volatile so you need to manage this wisely.

4. Mutual funds: Invest 30-40% of your total savings in actively managed equity funds and index funds for moderate risk. This will give you 8-10% average returns over 10-15 years. 

5. Real estate: Consider investing 10-15% of your sa
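
The fine-tuned output places its reasoning between `<think>` and `</think>` tags, followed by the answer. A small sketch for separating the two parts is shown below. The helper name `split_reasoning` is hypothetical, and the sample text is made up for illustration.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer). Reasoning is empty if no closing tag is found."""
    head, found, answer = text.partition(close_tag)
    if not found:
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, answer.strip()

sample = "\n<think>\nLet me break this down step by step.\n</think>\n1. Fixed deposits (FDs): ..."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```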

## Saving the model locally

In [19]:
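new_model_local = "DeepSeek-R1-Distill-Llama-8B-finance-v1"
# Save the LoRA adapter weights and the tokenizer
model_lora.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)

# Merge the LoRA weights into the base model and save the result in 16-bit precision
model_lora.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit")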

Unsloth: You have 2 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 6.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 18.79 out of 31.35 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 34%|███▍      | 11/32 [00:00<00:01, 13.60it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:24<00:00,  1.31it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00001-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00002-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00003-of-00004.bin...
Unsloth: Saving DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00004-of-00004.bin...
Done.


## Pushing the model to Hugging Face Hub

In [20]:
hugging_face_write_token = user_secrets.get_secret("hf_token_write")
login(hugging_face_write_token)

In [21]:
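new_model_online = "abhi9ab/DeepSeek-R1-Distill-Llama-8B-finance-v1"
# Push the LoRA adapter weights and the tokenizer
model_lora.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)

# Merge the LoRA weights into the base model and push the result in 16-bit precision
model_lora.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")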

README.md:   0%|          | 0.00/626 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.56k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/abhi9ab/DeepSeek-R1-Distill-Llama-8B-finance-v1


tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Unsloth: You are pushing to hub in Kaggle environment.
To save memory, we shall move abhi9ab/DeepSeek-R1-Distill-Llama-8B-finance-v1 to /tmp/DeepSeek-R1-Distill-Llama-8B-finance-v1
Unsloth: Will remove a cached repo with size 1.6K


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 18.84 out of 31.35 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [00:23<00:00,  1.34it/s]


Unsloth: Saving tokenizer...

No files have been modified since last commit. Skipping to prevent empty commit.


 Done.
Unsloth: Saving /tmp/DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00001-of-00004.bin...
Unsloth: Saving /tmp/DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00002-of-00004.bin...
Unsloth: Saving /tmp/DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00003-of-00004.bin...
Unsloth: Saving /tmp/DeepSeek-R1-Distill-Llama-8B-finance-v1/pytorch_model-00004-of-00004.bin...


pytorch_model-00002-of-00004.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

pytorch_model-00001-of-00004.bin:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

pytorch_model-00003-of-00004.bin:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

pytorch_model-00004-of-00004.bin:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/abhi9ab/DeepSeek-R1-Distill-Llama-8B-finance-v1
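
The uploaded file sizes are consistent with the parameter counts. The sketch below assumes the adapter is stored in 32-bit floats and that the base model has 8,030,261,248 parameters. Both are assumptions, not values stated in the notebook.

```python
LORA_PARAMS = 41_943_040       # trainable parameters reported during training
BASE_PARAMS = 8_030_261_248    # assumption: parameter count of the 8B base model

adapter_mb = LORA_PARAMS * 4 / 1e6   # assuming 4 bytes (float32) per adapter weight
merged_gb = BASE_PARAMS * 2 / 1e9    # 2 bytes per weight for the 16-bit merged model

shard_sizes_gb = [4.98, 5.00, 4.92, 1.17]  # shard sizes shown in the upload log

print(f"adapter: ~{adapter_mb:.0f} MB")
print(f"merged model: ~{merged_gb:.2f} GB (shards total {sum(shard_sizes_gb):.2f} GB)")
```

Under these assumptions the adapter is roughly 100 times smaller than the merged model, which is why pushing only the adapter is much faster.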
