<a href="https://colab.research.google.com/github/dastanrab/Data-Structures/blob/master/calori_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install unsloth
!pip install bitsandbytes
!pip install trl
!pip install accelerate
!pip install datasets
!pip install transformers
!pip install protobuf==3.20.3
!git clone https://github.com/ggml-org/llama.cpp
%cd llama.cpp
!cmake -B build
!cmake --build build --config Release

from unsloth import FastLanguageModel
from datasets import load_dataset

# Load base model with Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = 'unsloth/Phi-3-mini-4k-instruct-bnb-4bit',
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True
)

# Load dataset directly from Hugging Face
dataset = load_dataset("Codatta/MM-Food-100K", split="train")

SyntaxError: invalid syntax (ipython-input-1443036827.py, line 9)

In [1]:
%%bash
# Shell commands need the %%bash cell magic (or a leading "!") to run in a notebook cell
cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

In [2]:
import json
# Map dataset to text format for SFTTrainer
def to_text(ex):
    # The input (prompt) is built from the dataset columns
    prompt = (
        f"Dish: {ex['dish_name']}\n"
        f"Ingredients: {', '.join(ex['ingredients'])}\n"
        f"Portion: {', '.join(ex['portion_size'])}\n"
        f"Cooking method: {ex['cooking_method']}"
    )

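    # The output (response) is the nutritional profile.
    # Serialize only if it is not already a JSON string, to avoid double-encoding.
    profile = ex["nutritional_profile"]
    response = profile if isinstance(profile, str) else json.dumps(profile, ensure_ascii=False)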

    msgs = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    return {
        "text": tokenizer.apply_chat_template(
            msgs, tokenize=False, add_generation_prompt=False
        )
    }

dataset = dataset.map(to_text, remove_columns=dataset.column_names)
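
The `', '.join(...)` calls in `to_text` assume that `ingredients` and `portion_size` arrive as lists of strings. If the dataset stores them as JSON-encoded strings instead (an assumption that is not verified here), joining would split the string into single characters. A minimal sketch of a defensive helper follows; `as_list` is a hypothetical name, not part of the notebook or of any library.

```python
import json

def as_list(value):
    """Return value as a list of strings, whether it is a list or a JSON-encoded string."""
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return [value]  # plain text such as "300g": keep it as a single item
        value = parsed if isinstance(parsed, list) else [parsed]
    return [str(item) for item in value]

# Both input forms produce the same prompt fragment.
print(", ".join(as_list('["chicken", "breading", "oil"]')))
print(", ".join(as_list(["chicken", "breading", "oil"])))
```

Inside `to_text`, the join would then read `', '.join(as_list(ex['ingredients']))`.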

In [3]:
# Prepare model for LoRA fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj'],
    lora_alpha = 128,
    lora_dropout = 0,
    bias = 'none',
    use_gradient_checkpointing = 'unsloth'
)

Unsloth 2025.9.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
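
As a sanity check on the LoRA configuration above, the number of trainable adapter weights can be estimated by hand. Each adapted linear layer gains two matrices, `A` (rank x in_features) and `B` (out_features x rank). The sketch below assumes Phi-3-mini dimensions of hidden size 3072, MLP size 8192, and 32 layers, with square attention projections; these shapes are assumptions taken from the published model configuration, not read from the loaded model.

```python
# Assumed Phi-3-mini shapes (not read from the loaded model).
HIDDEN, MLP, LAYERS, RANK = 3072, 8192, 32, 64

# (in_features, out_features) for every module listed in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    # A holds rank * in_features weights, B holds out_features * rank weights
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * LAYERS
print(f"{total:,}")  # 119,537,664
```

Under these assumptions the result equals the "Trainable parameters" figure that the trainer banner reports.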


In [4]:
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    dataset_text_field = 'text',
    max_seq_length = 2048,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,  # small for demo, increase for real training
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        num_train_epochs = 1,  # has no effect here: max_steps > 0 takes precedence
    ),
)

trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 100,000 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 119,537,664 of 3,940,617,216 (3.03% trained)
[34m[1mwandb[0m: Currently logged in as: [33mdastanrab[0m ([33mdastanrab-bazist[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,1.692
2,1.6995
3,1.5052
4,1.6032
5,1.372
6,1.3727
7,1.2926
8,1.1768
9,1.0898
10,0.995


TrainOutput(global_step=60, training_loss=0.5503777265548706, metrics={'train_runtime': 294.216, 'train_samples_per_second': 1.631, 'train_steps_per_second': 0.204, 'total_flos': 3287130548207616.0, 'train_loss': 0.5503777265548706, 'epoch': 0.0048})
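
The figures in the banner and in `TrainOutput` can be reproduced from the training arguments, which also shows how little of the dataset a 60-step run covers. This is plain arithmetic on the numbers logged above; the variable names are illustrative.

```python
# Values copied from the SFTConfig and the training log above
per_device_batch, grad_accum, num_gpus = 2, 4, 1
max_steps, num_examples = 60, 100_000
train_runtime = 294.216  # seconds

effective_batch = per_device_batch * grad_accum * num_gpus  # examples per optimizer step
samples_seen = effective_batch * max_steps                  # examples used in the whole run
epoch_fraction = samples_seen / num_examples                # share of one epoch
samples_per_second = samples_seen / train_runtime

print(effective_batch)               # 8
print(samples_seen)                  # 480
print(epoch_fraction)                # 0.0048
print(round(samples_per_second, 3))  # 1.631
```

The run therefore sees 480 of the 100,000 examples, which is the `'epoch': 0.0048` reported in the metrics.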

In [5]:
# Test inference
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "Dish: Fried Chicken\nIngredients: chicken, breading, oil\nPortion: 300g\nCooking method: Frying"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return attention_mask together with input_ids
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=128,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)


Dish: Fried Chicken
Ingredients: chicken, breading, oil
Portion: 300g
Cooking method: Frying "{\"fat_g\":20.0,\"protein_g\":30.0,\"calories_kcal\":500,\"carbohydrate_g\":30.0}"
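
The response printed above is a JSON string that itself contains a JSON document, so a single `json.loads` returns a `str` rather than a `dict`. A minimal sketch of a tolerant parser follows; `parse_profile` is a hypothetical helper, and it assumes the response part has already been separated from the echoed prompt.

```python
import json

def parse_profile(text):
    """Decode a nutritional profile that is either JSON or a JSON string wrapping JSON."""
    value = json.loads(text)
    if isinstance(value, str):  # double-encoded: decode the inner document as well
        value = json.loads(value)
    return value

# The response portion of the model output shown above
raw = r'"{\"fat_g\":20.0,\"protein_g\":30.0,\"calories_kcal\":500,\"carbohydrate_g\":30.0}"'
profile = parse_profile(raw)
print(profile["calories_kcal"])  # 500
```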


In [6]:
# Export to GGUF for Ollama
model.save_pretrained_gguf(
    "gguf_food_model",
    tokenizer,
    quantization_method="q4_k_m",
    maximum_memory_usage = 0.3)

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 2.3G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 1.62 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 16%|█▌        | 5/32 [00:00<00:01, 15.80it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [01:44<00:00,  3.25s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving gguf_food_model/pytorch_model.bin...
Done.


Unsloth: Converting mistral model. Can use fast conversion = True.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: CMAKE detected. Finalizing some steps for installation.
Unsloth: [1] Converting model at gguf_food_model into f16 GGUF format.
The output location will be /content/llama.cpp/gguf_food_model/unsloth.F16.gguf
This might take 3 minutes...


Unsloth: Extending gguf_food_model/tokenizer.model with added_tokens.json.
Originally tokenizer.model is of size (32000).
But we need to extend to sentencepiece vocab size (32011).


INFO:hf-to-gguf:Loading model: gguf_food_model
INFO:hf-to-gguf:Model architecture: MistralForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model.bin'
INFO:hf-to-gguf:token_embd.weight,           torch.float16 --> F16, shape = {3072, 32064}
Traceback (most recent call last):
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 9021, in <module>
    main()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 9015, in main
    model_instance.write()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 445, in write
    self.prepare_tensors()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 2265, in prepare_tensors
    super().prepare_tensors()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 313, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
          

RuntimeError: Unsloth: Quantization failed for /content/llama.cpp/gguf_food_model/unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
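
The log shows the converter running from `/content/llama.cpp/llama.cpp/`, a clone nested inside another clone, which follows from the earlier `%cd llama.cpp`. The traceback is cut off, so it is not possible to say whether this is related to the failure. When doing the two steps by hand, building the commands from absolute paths at least removes the dependence on the current directory. The sketch below only constructs the commands. The output file names are hypothetical, and the `build/bin/llama-quantize` location is an assumption for a CMake build; a Makefile build may place the binary elsewhere.

```python
from pathlib import Path

def gguf_commands(llama_cpp_dir, model_dir, out_dir, quant="Q4_K_M"):
    """Build (but do not run) the convert and quantize commands with absolute paths."""
    llama_cpp = Path(llama_cpp_dir).resolve()
    model = Path(model_dir).resolve()
    out = Path(out_dir).resolve()
    f16 = out / "model.F16.gguf"              # hypothetical intermediate file name
    quantized = out / f"model.{quant}.gguf"   # hypothetical final file name

    convert = ["python3", str(llama_cpp / "convert_hf_to_gguf.py"), str(model),
               "--outfile", str(f16), "--outtype", "f16"]
    # Assumed binary location for a CMake build of llama.cpp
    quantize = [str(llama_cpp / "build" / "bin" / "llama-quantize"),
                str(f16), str(quantized), quant]
    return convert, quantize

convert_cmd, quantize_cmd = gguf_commands(
    "/content/llama.cpp", "/content/gguf_food_model", "/content"
)
print(" ".join(convert_cmd))
print(" ".join(quantize_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths have been confirmed to exist.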

In [8]:

!python3 convert_hf_to_gguf.py ../gguf_food_model --outfile ../gguf_food_model_final.gguf

INFO:hf-to-gguf:Loading model: gguf_food_model
INFO:hf-to-gguf:Model architecture: MistralForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model.bin'
INFO:hf-to-gguf:token_embd.weight,           torch.float16 --> F16, shape = {3072, 32064}
Traceback (most recent call last):
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 9021, in <module>
    main()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 9015, in main
    model_instance.write()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 445, in write
    self.prepare_tensors()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 2265, in prepare_tensors
    super().prepare_tensors()
  File "/content/llama.cpp/llama.cpp/convert_hf_to_gguf.py", line 313, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
          