### Installation

In [None]:
%%capture

!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

### Unsloth

In [None]:
from unsloth import FastLanguageModel
import torch
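max_seq_length = 2048 # Maximum sequence length used for training and inference
dtype = None # None auto-detects: float16 on Tesla T4/V100, bfloat16 on Ampere and newer
load_in_4bit = True # Use 4-bit quantization to reduce memory usage

# Examples of pre-quantized 4-bit models (smaller downloads, lower memory use)
fourbit_models = [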
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-instruct-v0.3-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.2.12: Fast Mistral patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/4.14G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/157 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/141k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/446 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.2.12 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
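As a rough check on how small the trainable footprint is, the sketch below counts the LoRA weights implied by the configuration above. Each adapted linear layer with `in_features` inputs and `out_features` outputs gains `r * (in_features + out_features)` weights. The layer shapes are assumptions taken from the published Mistral 7B architecture (hidden size 4096, key/value projection width 1024, MLP width 14336), and `lora_param_count` is a hypothetical helper name.

```python
# Sketch: count LoRA parameters for r = 16 on the seven target modules.
# Shapes are assumed from the Mistral 7B architecture, not read from the model.
R = 16
NUM_LAYERS = 32  # matches "patched 32 layers" in the log above

# (in_features, out_features) for each adapted projection
SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_param_count(shapes, r, num_layers):
    """LoRA adds A (r x in) and B (out x r) to every adapted layer."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

total = lora_param_count(SHAPES, R, NUM_LAYERS)
print(f"{total:,}")
```

Under these assumptions the count comes to 41,943,040, which agrees with the "Number of trainable parameters" line that the trainer prints later in this notebook.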


<a name="Data"></a>
### Data Prep
We describe the `ChatML` format for conversation-style finetunes, as used with [Open Assistant conversations](https://huggingface.co/datasets/philschmid/guanaco-sharegpt-style) in ShareGPT style. Note that the cells below load local CSV files and apply the `mistral` template instead. ChatML renders multi-turn conversations like below:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
Paris.
```
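To make the layout above concrete, here is a minimal hand-rolled sketch that renders a list of role/content messages in ChatML. It is only an illustration: real tokenizers render through their Jinja chat template via `apply_chat_template`, and whitespace details can differ between implementations. The name `render_chatml` is hypothetical.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content messages in the ChatML layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```

With `add_generation_prompt=True` the rendered text ends with an open assistant turn, which is the state a model should be in when it starts generating.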

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old` and our own optimized `unsloth` template.

Normally one has to train `<|im_start|>` and `<|im_end|>`. We instead map `<|im_end|>` to be the EOS token, and leave `<|im_start|>` as is. This requires no training of additional tokens.

Note that ShareGPT uses `{"from": "human", "value" : "Hi"}` and not `{"role": "user", "content" : "Hi"}`, so we use `mapping` to translate between the two.

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).
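The ShareGPT-to-role/content mapping described above can be sketched in plain Python. This is not how `get_chat_template` is implemented; it only illustrates the key and role renaming that the `mapping` argument expresses. `ROLE_MAP` and `sharegpt_to_messages` are hypothetical names.

```python
# Sketch of the renaming expressed by the `mapping` argument (illustrative only).
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_messages(conversation):
    """Convert ShareGPT turns ({"from", "value"}) to {"role", "content"} messages."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]

example = [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello!"},
]
print(sharegpt_to_messages(example))
```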

In [None]:
import pandas as pd
df1 = pd.read_csv("/content/FINANCIAL_DATA_1.csv")
df2 = pd.read_csv("/content/FINANCIAL_DATA_2.csv")
df3 = pd.read_csv("/content/FINANCIAL_DATA_3.csv")
df4 = pd.read_csv("/content/FINANCIAL_DATA_4.csv")
df5 = pd.read_csv("/content/FINANCIAL_DATA_5.csv")
df6 = pd.read_csv("/content/FINANCIAL_DATA_6.csv")
df7 = pd.read_csv("/content/FINANCIAL_DATA_7.csv")
df8 = pd.read_csv("/content/FINANCIAL_DATA_8.csv")
df9 = pd.read_csv("/content/FINANCIAL_DATA_9.csv")
df10 = pd.read_csv("/content/FINANCIAL_DATA_10.csv")
df11 = pd.read_csv("/content/fintech_completed_dataset.csv")
df13 = pd.read_csv("/content/FINANCIAL_CLASSIFICATION_DATA_14.csv")
df14 = pd.read_csv("/content/FINANCIAL_CLASSIFICATION_DATA_13.csv")
df15 = pd.read_csv("/content/FINANCIAL_CLASSIFICATION_DATA_12.csv")

FileNotFoundError: [Errno 2] No such file or directory: '/content/FINANCIAL_DATA_1.csv'
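The `FileNotFoundError` above appears when the CSV files have not been uploaded to the runtime. One way to fail earlier and more clearly is to check which paths exist before reading anything. The sketch below uses only the standard library; `split_existing` is a hypothetical helper, and the file pattern is copied from the cell above.

```python
from pathlib import Path

def split_existing(paths):
    """Partition paths into (found, missing) without opening any file."""
    found, missing = [], []
    for path in map(Path, paths):
        (found if path.is_file() else missing).append(path)
    return found, missing

# File names follow the pattern used in the cell above.
candidates = [f"/content/FINANCIAL_DATA_{i}.csv" for i in range(1, 11)]
found, missing = split_existing(candidates)
if missing:
    print(f"{len(missing)} of {len(candidates)} files are missing, e.g. {missing[0]}")

# With pandas available, the files that were found could then be combined as:
# df = pd.concat((pd.read_csv(p) for p in found), ignore_index=True)
```

Note that `pd.concat` raises an error when given no frames at all, so the combined read is only worth attempting when `found` is non-empty.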

In [None]:
df = pd.concat(
    [df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df13, df14, df15],
    ignore_index=True,
)

In [None]:
df

In [None]:
!pip install datasets

In [None]:
from datasets import Dataset
from unsloth.chat_templates import get_chat_template

dataset = Dataset.from_pandas(df)

dataset = dataset.rename_columns({
    "PROMPT": "input",
    "RESULT": "output"
})

def create_conversation(sample):
    return {
        "conversations": [
            {
                "from": "human",
                "value": sample["input"],
            },
            {
                "from": "gpt",
                "value": sample["output"],
            }
        ]
    }

dataset = dataset.map(create_conversation, remove_columns=dataset.column_names)

tokenizer = get_chat_template(
    tokenizer,
    chat_template="mistral",  # Use "mistral" for Mistral models
    mapping={
        "role": "from",
        "content": "value",
        "user": "human",
        "assistant": "gpt"
    },
    map_eos_token=True,
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(
        convo,
        tokenize=False,
        add_generation_prompt=False
    ) for convo in convos]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)



Map:   0%|          | 0/7032 [00:00<?, ? examples/s]

Map:   0%|          | 0/7032 [00:00<?, ? examples/s]

In [None]:
print(dataset[0]["text"])


<s>[INST] Sarah lent $500 to John for his new laptop. [/INST]Lender: SarahAmount Lent: $500Borrower: John</s>


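Let's check how the examples were formatted by inspecting the first record's `conversations` field, the matching source row, and the rendered text of the record at index 5. The rendered text uses the `mistral` template applied above.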

In [None]:
dataset[0]["conversations"]

[{'from': 'human', 'value': 'Sarah lent $500 to John for his new laptop.'},
 {'from': 'gpt', 'value': 'Lender: SarahAmount Lent: $500Borrower: John'}]

In [None]:
print(df.iloc[0])


PROMPT     Sarah lent $500 to John for his new laptop.
RESULT    Lender: SarahAmount Lent: $500Borrower: John
Name: 0, dtype: object
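In the row printed above, the `RESULT` fields run together with no separator (`SarahAmount Lent: $500Borrower: John`). If line breaks between fields are intended, which is an assumption about the data, a small regex pass could restore them before training. `LABELS` and `split_fields` are hypothetical names, and the label list covers only the labels visible in the printed examples.

```python
import re

# Field labels as they appear in the printed examples above.
LABELS = ("Amount Lent:", "Borrower:")

def split_fields(result):
    """Insert a newline before each known label glued to the previous value."""
    pattern = "|".join(re.escape(label) for label in LABELS)
    # Only match when the preceding character is not whitespace, so text that
    # is already separated is left unchanged.
    return re.sub(rf"(?<=\S)({pattern})", r"\n\1", result)

print(split_fields("Lender: SarahAmount Lent: $500Borrower: John"))
```

The function is idempotent, so applying it to rows that already contain line breaks leaves them as they are.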


In [None]:
print(dataset[5]["text"])

<s>[INST] After much consideration, my father decided to loan me $7,000 for a down payment on a car. [/INST]Lender: my fatherAmount Lent: $7,000Borrower: me</s>


In [None]:
dataset

Dataset({
    features: ['conversations', 'text'],
    num_rows: 7032
})

If you're looking to make your own chat template, that is also possible! You must use the Jinja templating syntax. We provide our own stripped-down `unsloth` template, which we find to be more efficient and which draws on the ChatML, Zephyr and Alpaca styles.

More info on chat templates on [our wiki page!](https://github.com/unslothai/unsloth/wiki#chat-templates)

In [None]:
unsloth_template = \
    "{{ bos_token }}"\
    "{{ 'You are a helpful assistant to the user\n' }}"\
    "{% for message in messages %}"\
        "{% if message['role'] == 'user' %}"\
            "{{ '>>> User: ' + message['content'] + '\n' }}"\
        "{% elif message['role'] == 'assistant' %}"\
            "{{ '>>> Assistant: ' + message['content'] + eos_token + '\n' }}"\
        "{% endif %}"\
    "{% endfor %}"\
    "{% if add_generation_prompt %}"\
        "{{ '>>> Assistant: ' }}"\
    "{% endif %}"
unsloth_eos_token = "eos_token"

if False:
    tokenizer = get_chat_template(
        tokenizer,
        chat_template = (unsloth_template, unsloth_eos_token,), # You must provide a template and EOS token
        mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
        map_eos_token = True, # Maps <|im_end|> to </s> instead
    )

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and remove the `max_steps` setting (or set it to `-1`, its default). We also support TRL's `DPOTrainer`!

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Applying chat template to train dataset (num_proc=2):   0%|          | 0/7032 [00:00<?, ? examples/s]

Tokenizing train dataset (num_proc=2):   0%|          | 0/7032 [00:00<?, ? examples/s]

Tokenizing train dataset (num_proc=2):   0%|          | 0/7032 [00:00<?, ? examples/s]

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
4.051 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 7,032 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|------|---------------|
| 1 | 3.689 |
| 2 | 2.9888 |
| 3 | 2.9312 |
| 4 | 2.638 |
| 5 | 2.0345 |
| 6 | 1.8985 |
| 7 | 1.6932 |
| 8 | 1.7614 |
| 9 | 1.3901 |
| 10 | 1.37 |
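To put the 60-step run in perspective, the arithmetic below works out how much of the dataset it covers, using only the numbers from the trainer log and the `TrainingArguments` above (single GPU assumed, as the log reports).

```python
import math

num_examples = 7032     # "Num examples" in the trainer log
per_device_batch = 2    # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
max_steps = 60

effective_batch = per_device_batch * grad_accum   # examples per optimizer step
examples_seen = effective_batch * max_steps       # examples consumed in this run
steps_per_epoch = math.ceil(num_examples / effective_batch)
fraction = examples_seen / num_examples

print(effective_batch, examples_seen, steps_per_epoch, f"{fraction:.1%}")
```

This gives an effective batch of 8, 480 examples seen, and 879 steps for one full epoch, so the 60-step run touches roughly 6.8% of the data.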


In [None]:
print(model.config._name_or_path)


unsloth/mistral-7b-instruct-v0.3-bnb-4bit


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

275.6203 seconds used for training.
4.59 minutes used for training.
Peak reserved memory = 4.721 GB.
Peak reserved memory for training = 0.67 GB.
Peak reserved memory % of max memory = 32.026 %.
Peak reserved memory for training % of max memory = 4.545 %.


<a name="Inference"></a>
### Inference
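Let's run the model! Use `apply_chat_template` with `add_generation_prompt` set to `True` for inference. The cell below switches the tokenizer to the `chatml` template, whereas the training data above was formatted with the `mistral` template. A mismatch like this may explain why the output below simply echoes the input, so keeping the inference template identical to the training template is the safer choice.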

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"from": "human", "value": "shubam spent 1000 at starbucks,sai owes half of the amount"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['<|im_start|>user\nshubam spent 1000 at starbucks,sai owes half of the amount<|im_end|>\n<|im_start|>assistant\nshubam spent 1000 at starbucks,sai owes half of the amount<|im_end|>']

In [None]:
# Save the fine-tuned model and tokenizer locally
model.save_pretrained("./unsloth_finetuned_classifier")
tokenizer.save_pretrained("./unsloth_finetuned_classifier")

('./unsloth_finetuned_classifier/tokenizer_config.json',
 './unsloth_finetuned_classifier/special_tokens_map.json',
 './unsloth_finetuned_classifier/tokenizer.json')

In [None]:
from unsloth import FastLanguageModel

# Load the base model
base_model_name = "unsloth/mistral-7b-instruct-v0.3-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Apply the LoRA adapter configuration
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load the fine-tuned LoRA weights
model.load_adapter("./unsloth_finetuned_classifier", adapter_name="default")



==((====))==  Unsloth 2025.2.9: Fast Mistral patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


<All keys matched successfully>

In [None]:
print(inputs.device)  # `inputs` is a tensor here (not a dict); should match model.device

In [None]:
model.to("cuda")

In [None]:
prompt = """
You are a financial assistant tasked with analyzing transactions and summarizing them in a structured format. Follow these rules strictly:

1. The output must always be in the following format:
Lender: [Name of the lender]
Amount Lent: [Total amount lent as an integer]
Borrower 1: [Name of the first borrower]
Amount: [Amount owed by the first borrower as an integer]
Borrower 2: [Name of the second borrower]
Amount: [Amount owed by the second borrower as an integer]

2. If the input involves percentages or fractions:
- Convert percentages into decimal form (e.g., 33.33% = 0.3333).
- Multiply the percentage or fraction by the total amount to calculate the borrower's share.
- Round the result to the nearest integer.

3. If the input involves "what's left" or similar phrases:
- Subtract the amounts already accounted for from the total amount to determine the remaining share.

4. Always use integers for all amounts in the output. Do not include decimals or currency symbols.

5. Use the names provided in the input exactly as they appear.

Example 1:
Input: "John covered the rent of 18,000. Kelly owes 6,000, Liam is paying 1/3, and John covers what's left."
Output:
Lender: John
Amount Lent: 18000
Borrower 1: Kelly
Amount: 6000
Borrower 2: Liam
Amount: 6000

Example 2:
Input: "Anita paid a utility bill of 3,600. Bina owes Anita 1,200, while Chitra is responsible for 33.33% of the total amount, and Anita covers the rest."
Output:
Lender: Anita
Amount Lent: 3600
Borrower 1: Bina
Amount: 1200
Borrower 2: Chitra
Amount: 1200
Now analyze the following transaction and provide the output in the specified format:
Input: "Sarah paid a grocery bill of 9,000. Mike owes 2,000, Tom is responsible for 25% of the total amount, and Sarah covers the rest."
"""

In [None]:
messages = [
    {"from": "human", "value": prompt},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",  # Return PyTorch tensors
).to("cuda")  # Move inputs to the GPU

# Generate outputs
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=128,  # Maximum number of tokens to generate
    use_cache=True,     # Use KV cache for faster generation
)

# Decode and print the outputs
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(decoded_outputs[0])

<|im_start|>user

You are a financial assistant tasked with analyzing transactions and summarizing them in a structured format. Follow these rules strictly:

1. The output must always be in the following format:
Lender: [Name of the lender]
Amount Lent: [Total amount lent as an integer]
Borrower 1: [Name of the first borrower]
Amount: [Amount owed by the first borrower as an integer]
Borrower 2: [Name of the second borrower]
Amount: [Amount owed by the second borrower as an integer]

2. If the input involves percentages or fractions:
- Convert percentages into decimal form (e.g., 33.33% = 0.3333).
- Multiply the percentage or fraction by the total amount to calculate the borrower's share.
- Round the result to the nearest integer.

3. If the input involves "what's left" or similar phrases:
- Subtract the amounts already accounted for from the total amount to determine the remaining share.

4. Always use integers for all amounts in the output. Do not include decimals or currency symbols
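The decoded text above begins with the prompt itself, because `generate` on a decoder-only model returns the prompt tokens followed by the continuation. A common way to keep only the reply is to drop the first `prompt_length` token ids before decoding. The sketch below shows the idea on plain lists with made-up token ids; with the tensors from the cell above the equivalent slice would be `outputs[:, inputs.shape[1]:]`. `new_tokens_only` is a hypothetical helper.

```python
def new_tokens_only(output_ids, prompt_length):
    """Drop the first prompt_length ids from every generated sequence."""
    return [sequence[prompt_length:] for sequence in output_ids]

# Toy ids standing in for real token ids (hypothetical values).
prompt_ids = [1, 733, 16289]
generated = [prompt_ids + [28705, 2]]

print(new_tokens_only(generated, len(prompt_ids)))
```

Slicing by token count avoids depending on marker strings in the decoded text, which may disappear when `skip_special_tokens=True` is used.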

In [None]:
from huggingface_hub import login
login()  # Enter your Hugging Face token when prompted

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
# Push the fine-tuned model and tokenizer to Hugging Face Hub
model.push_to_hub("SaiGaneshanM/unsloth_finetuned_classifier", adapter_name="default")
tokenizer.push_to_hub("SaiGaneshanM/unsloth_finetuned_classifier")

README.md:   0%|          | 0.00/611 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/SaiGaneshanM/unsloth_finetuned_classifier


In [None]:
%%capture
# Normally using pip install unsloth is enough

# Temporarily as of Jan 31st 2025, Colab has some issues with PyTorch
# Using pip install unsloth will take 3 minutes, whilst the below takes <1 minute:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

In [None]:

from unsloth import FastLanguageModel

# Load the base model
base_model_name = "unsloth/mistral-7b-instruct-v0.3-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
    device_map="auto",  # Automatically distribute the model across available devices
)

# Apply the LoRA adapter configuration
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Load the fine-tuned LoRA weights from Hugging Face Hub
model.load_adapter("SaiGaneshanM/unsloth_finetuned_classifier", adapter_name="default")

==((====))==  Unsloth 2025.2.10: Fast Mistral patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


<All keys matched successfully>

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, etc.
    mapping={"role": "from", "content": "value", "user": "user", "assistant": "assistant"},  # Align with chatml
    map_eos_token=True,  # Maps <|im_end|> to </s> instead
)

# Define the prompt template with detailed instructions
prompt_template = """
You are a financial assistant tasked with analyzing transactions and summarizing them in a structured format. Follow these rules strictly:
1. The output must always be in the following format:
Lender: [Name of the lender]
Amount Lent: [Total amount lent as an integer]
Borrower 1: [Name of the first borrower]
Amount: [Amount owed by the first borrower as an integer]
Borrower 2: [Name of the second borrower]
Amount: [Amount owed by the second borrower as an integer]
2. If the input involves percentages or fractions:
   - Convert percentages into decimal form (e.g., 33.33% = 0.3333).
   - Multiply the percentage or fraction by the total amount to calculate the borrower's share.
   - Round the result to the nearest integer.
3. If the input involves "what's left" or similar phrases:
   - Subtract the amounts already accounted for from the total amount to determine the remaining share.
4. Always use integers for all amounts in the output. Do not include decimals or currency symbols.
5. Use the names provided in the input exactly as they appear.

Example 1:
Input: "John covered the rent of 18,000. Kelly owes 6,000, Liam is paying 1/3, and John covers what's left."
Output:
Lender: John
Amount Lent: 18000
Borrower 1: Kelly
Amount: 6000
Borrower 2: Liam
Amount: 6000

Example 2:
Input: "Anita paid a utility bill of 3,600. Bina owes Anita 1,200, while Chitra is responsible for 33.33% of the total amount, and Anita covers the rest."
Output:
Lender: Anita
Amount Lent: 3600
Borrower 1: Bina
Amount: 1200
Borrower 2: Chitra
Amount: 1200

Now analyze the following transaction and provide the output in the specified format:
Input: "{input}"
"""

# Define the input variable
input_text = "Yara paid the $400 rent; Zoe owes $100, and Yara pays the rest."

# Substitute the input into the prompt template
prompt = prompt_template.format(input=input_text)

# Define the input messages
messages = [
    {"from": "user", "value": prompt},
    {"from": "assistant", "value": ""},  # Placeholder for the assistant's response
]

# Tokenize the input
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Add a generation prompt for the assistant's response
    return_tensors="pt",
).to("cuda")

# Generate outputs
outputs = model.generate(input_ids=inputs, max_new_tokens=128, use_cache=True)

# Decode the outputs
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Extract only the assistant's response
# Split at the last occurrence of "Output:" to isolate the structured response
if "Output:" in decoded_outputs[0]:
    assistant_response = decoded_outputs[0].split("Output:")[-1].strip()
else:
    assistant_response = decoded_outputs[0].strip()


In [None]:
def extract_last_assistant_response(text):
    marker = "<|im_start|>assistant"
    parts = text.split(marker)
    if len(parts) > 1:
        return parts[-1].strip()  # Return last part after the marker
    return text.strip()  # Return original if marker is not found

# Example input
input_text = assistant_response

# Extract and print the relevant part
output = extract_last_assistant_response(input_text)
print(output)


Lender: Yara
Amount Lent: 400
Borrower 1: Zoe
Amount: 100
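Once the reply has been isolated, the structured text can be turned into a dictionary for downstream use. The parser below is a minimal sketch that assumes the exact labels from the prompt (`Lender`, `Amount Lent`, `Borrower N`, `Amount`) and integer amounts without currency symbols, as the prompt requests; `parse_summary` is a hypothetical helper.

```python
def parse_summary(text):
    """Parse 'Lender / Amount Lent / Borrower N / Amount' lines into a dict."""
    summary = {"lender": None, "amount_lent": None, "borrowers": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Lender":
            summary["lender"] = value
        elif key == "Amount Lent":
            summary["amount_lent"] = int(value.replace(",", ""))
        elif key.startswith("Borrower"):
            summary["borrowers"].append({"name": value, "amount": None})
        elif key == "Amount" and summary["borrowers"]:
            # An "Amount" line belongs to the most recent borrower.
            summary["borrowers"][-1]["amount"] = int(value.replace(",", ""))
    return summary

sample = "Lender: Yara\nAmount Lent: 400\nBorrower 1: Zoe\nAmount: 100"
print(parse_summary(sample))
```

If the model emits a currency symbol or a decimal despite the instructions, `int` raises a `ValueError`, which can serve as a simple signal that the output did not follow the required format.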


In [None]:
print(decoded_outputs[0])

<|im_start|>user
You are a financial assistant tasked with analyzing transactions and summarizing them in a structured format. Follow these rules strictly:

1. The output must always be in the following format:
Lender: [Name of the lender]
Amount Lent: [Total amount lent as an integer]
Borrower 1: [Name of the first borrower]
Amount: [Amount owed by the first borrower as an integer]
Borrower 2: [Name of the second borrower]
Amount: [Amount owed by the second borrower as an integer]

2. If the input involves percentages or fractions:
- Convert percentages into decimal form (e.g., 33.33% = 0.3333).
- Multiply the percentage or fraction by the total amount to calculate the borrower's share.
- Round the result to the nearest integer.

3. If the input involves "what's left" or similar phrases:
- Subtract the amounts already accounted for from the total amount to determine the remaining share.

4. Always use integers for all amounts in the output. Do not include decimals or currency symbols.

In [None]:
print(inputs.device)  # `inputs` is a tensor here (not a dict); should match model.device

cuda:0


In [None]:
model.to("cuda")

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32768, 4096, padding_idx=770)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj):

In [None]:
print(inputs.device)  # `inputs` is a tensor here (not a dict); should match model.device

cpu
