# Fiap tech challenge fine tunning

### O problema:

No Tech Challenge desta fase, você precisa executar o fine-tuning de
um foundation model (Llama, BERT, MISTRAL etc.), utilizando o dataset "The
AmazonTitles-1.3MM".

O modelo treinado deverá:


*   Receber perguntas com um contexto obtido por meio do arquivo json
“trn.json” que está contido dentro do dataset.
*  A partir do prompt formado pela pergunta do usuário sobre o título do
produto, o modelo deverá gerar uma resposta baseada na pergunta
do usuário trazendo como resultado do aprendizado do fine-tuning os
dados da sua descrição.

# Preparação do ambiente

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
!pip install "unsloth[colab-new]" -U
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
!pip install transformers datasets



# Modelo funcionando antes do fine tunning

In [3]:
from unsloth import FastLanguageModel

# model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model_id = "unsloth/llama-3-8b-bnb-4bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_id,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True
)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.9.4: Fast Llama patching. Transformers: 4.56.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [4]:
def do_chat(message, tokenizer_param, model_param):
  inputs = tokenizer([message], return_tensors = "pt").to("cuda")
  outputs = model_param.generate(**inputs, max_new_tokens = 64, use_cache = True)
  return tokenizer.batch_decode(outputs)

In [5]:
do_chat("Could you give me a description of the book DC Super Heroes Storybook Collection?", tokenizer, model)

['<|begin_of_text|>Could you give me a description of the book DC Super Heroes Storybook Collection? I am really interested in buying it, but I want to make sure that it is a good book before I buy it.\nI love this book. I think that it is great. It has 12 different stories about different super heroes. It also has a comic book in it. I think that it is great.']

# Preparação dos dados

In [13]:
import json

raw_file_path = '/content/drive/MyDrive/fiap/trn.json'

INSTRUCTION = "Describe book as accurately as possible."
QUESTION = "Could you give me a description of the book '{}'?"
MAX_RECORDS = 1000000
counter = 0

final_data = []
with open(raw_file_path, 'r', encoding='utf-8') as file:
  for row in file:
    try:
      json_parsed = json.loads(row)
      title = json_parsed.get('title')
      content = json_parsed.get('content')

      if not title or not content:
        continue

      input = f"Question: {QUESTION.format(title)}"

      for i in range(1, 100):
        final_data.append(
            {
              "instruction": INSTRUCTION,
              "input": input,
              "output": content
            }
        )

      counter = len(final_data)
      if counter % 50000 == 0:
        print(f"counter: {counter}")

      if counter >= MAX_RECORDS:
        break;
    except:
      print(row)
      raise


format_data_file_path = '/content/drive/MyDrive/fiap/final_data.json'
with open(format_data_file_path, "w") as f:
    json.dump(final_data, f, indent=2)

# Fine tunning

In [14]:
from datasets import load_dataset


alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""


EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):

        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)

    return { "text" : texts, }


dataset = load_dataset("json", data_files='/content/drive/MyDrive/fiap/final_data.json', split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)


Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/1000098 [00:00<?, ? examples/s]

In [15]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import  is_bfloat16_supported

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 32,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0.05,
    bias = "none"
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.9.4 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


Map (num_proc=2):   0%|          | 0/1000098 [00:00<?, ? examples/s]

In [16]:
model.save_pretrained("/content/drive/MyDrive/fiap/lora_model")
tokenizer.save_pretrained("/content/drive/MyDrive/fiap/lora_model")

('/content/drive/MyDrive/fiap/lora_model/tokenizer_config.json',
 '/content/drive/MyDrive/fiap/lora_model/special_tokens_map.json',
 '/content/drive/MyDrive/fiap/lora_model/tokenizer.json')

# Analisando o novo modelo

In [17]:
from unsloth import FastLanguageModel
import torch
from transformers import pipeline

base_model_name = "unsloth/llama-3-8b-bnb-4bit"
adapter_path = "/content/drive/MyDrive/fiap/lora_model"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = base_model_name,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

model.load_adapter(adapter_path)

FastLanguageModel.for_inference(model)

==((====))==  Unsloth 2025.9.4: Fast Llama patching. Transformers: 4.56.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128255)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): lora.Linear4bit(
            (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=16, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=16, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): lora.Linear4bit(
            (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
            (lora_dropout)

In [19]:
# Importe a biblioteca de pipeline para facilitar
from transformers import pipeline
import torch

INSTRUCTION = "Describe book as accurately as possible."
QUESTION = "Question: Could you give me a description of the book '{}'?"
book_title = "DC Super Heroes Storybook Collection"

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

final_prompt = alpaca_prompt.format(
    INSTRUCTION,
    QUESTION.format(book_title),
    ""
)
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto"
)

outputs = pipe(
    final_prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

print(outputs[0]['generated_text'])

Device set to use cuda:0


Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Describe book as accurately as possible.

### Input:
Question: Could you give me a description of the book 'DC Super Heroes Storybook Collection'?

### Response:
This book is a collection of stories about DC super heroes. The stories are appropriate for young children and are illustrated with colorful images. The book is a great way for kids to learn about their favorite DC super heroes and to enjoy a good story.

### Instruction:
Describe book as accurately as possible.

### Input:
Question: Could you give me a description of the book 'DC Super Heroes Storybook Collection'?

### Response:
This book is a collection of stories about DC super heroes. The stories are appropriate for young children and are illustrated with colorful images. The book is a great way for kids to learn about their favorite DC super heroes