# Challenge

After successfully automating the analysis of exams and clinical texts,
the hospital wants to reach a higher level of personalization: a virtual
medical assistant, trained on the hospital's own data, that can support
clinical decision-making, answer physicians' questions and suggest
procedures based on internal protocols.  



# Preprocessing


## Structuring the data

Using the [MedQuAD](https://github.com/abachaa/MedQuAD) dataset, the function below converts the structured CSV data into JSON, turning each question/answer pair into an `instruction`/`input`/`output` record for fine-tuning.


```text
// input.csv

AnswerID;Answer
ADAM_0003147_Sec1.txt;"Question: What is (are) Polycystic ovary syndrome ? (Also called: Polycystic ovaries; Polycystic ovary disease; Stein-Leventhal syndrome; Polyfollicular ovarian disease)
URL: https://www.nlm.nih.gov/medlineplus/ency/article/000369.htm

// output.json
[{
    "instruction": "Explain:",
    "input": "What is (are) Polycystic ovary syndrome ? (Also called: Polycystic ovaries; Polycystic ovary disease; Stein-Leventhal syndrome; Polyfollicular ovarian disease)",
    "output": "Polycystic ovary syndrome is a condition in which a woman has an imbalance of female sex hormones. This may lead to changes in the menstrual cycle, cysts in the ovaries, trouble getting pregnant, and other health problems."
  }, ... ]
```
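To make the marker-based parsing concrete, here is a minimal, self-contained sketch of the per-record extraction step. `split_record` and `SAMPLE` are hypothetical names used only for illustration; the sample is a shortened record built from the example above and joined with the `Question:`, `URL:` and `Answer:` markers that `csv_to_json` searches for. It is not a verbatim MedQuAD row.

```python
from typing import Optional

# Hypothetical, shortened sample (not a verbatim MedQuAD row)
SAMPLE = (
    "Question: What is (are) Polycystic ovary syndrome ?\n"
    "URL: https://www.nlm.nih.gov/medlineplus/ency/article/000369.htm\n"
    "Answer: Polycystic ovary syndrome is a condition in which a woman has "
    "an imbalance of female sex hormones."
)


def split_record(full_text: str) -> Optional[dict]:
    """Split one raw answer field into an instruction/input/output record."""
    q_pos = full_text.find("Question: ")
    url_pos = full_text.find("URL: ")
    a_pos = full_text.find("Answer: ")
    if q_pos == -1 or a_pos == -1:
        return None  # a required marker is missing
    q_start = q_pos + len("Question: ")
    if url_pos <= q_start:
        return None  # "URL:" must come after the question text
    # The question sits between "Question:" and "URL:"
    question = full_text[q_start:url_pos].strip()
    # The answer is everything after "Answer:"
    answer = full_text[a_pos + len("Answer: "):].strip()
    return {"instruction": "Explain:", "input": question, "output": answer}


print(split_record(SAMPLE))
```

Returning `None` for malformed records lets the caller skip them instead of emitting a record built from misplaced offsets.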

In [None]:
import json
import csv
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
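def csv_to_json(csv_file_path, output_json_path):
	"""Convert the MedQuAD CSV (AnswerID;Answer) into a JSON list of
	instruction/input/output records and write it to output_json_path."""
	qa_pairs = []

	# newline='' lets the csv module handle line breaks inside quoted fields
	with open(csv_file_path, 'r', encoding='utf-8', newline='') as file:
		reader = csv.reader(file, delimiter=';')

		next(reader, None)  # skip the header row (AnswerID;Answer)

		for row in reader:
			if len(row) >= 2:
				full_text = row[1]

				# Remove any leftover surrounding quotes
				if full_text.startswith('"') and full_text.endswith('"'):
					full_text = full_text[1:-1]

				# Locate the "Question:", "URL:" and "Answer:" markers
				question_pos = full_text.find("Question: ")
				url_start = full_text.find("URL: ")
				answer_pos = full_text.find("Answer: ")

				# Skip records with a missing marker
				if question_pos == -1 or answer_pos == -1:
					continue

				question_start = question_pos + len("Question: ")
				answer_start = answer_pos + len("Answer: ")

				# "URL:" must come after the question text
				if url_start <= question_start:
					continue

				# The question is the text between "Question:" and "URL:"
				question = full_text[question_start:url_start].strip()

				# The answer is everything after "Answer:"
				answer = full_text[answer_start:].strip()

				# Strip a trailing ")" from the answer
				if answer.endswith(')'):
					answer = answer[:-1]

				qa_pairs.append({
					"instruction": "Explain:",
					"input": question,
					"output": answer,
				})

	# Write the records as pretty-printed UTF-8 JSON
	with open(output_json_path, 'w', encoding='utf-8') as json_file:
		json.dump(qa_pairs, json_file, indent=2, ensure_ascii=False)

	return qa_pairs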

In [None]:
csv_file_path = '/content/drive/MyDrive/tech-challenge-fase-3/qa-medquad-raw.csv'
dataset_processed = '/content/drive/MyDrive/tech-challenge-fase-3/qa-medquad-processed.json'

input = csv_to_json(csv_file_path, dataset_processed)
print(f"{len(input)} records created.")

2479 records created.


# Fine-tuning

For educational and accessibility purposes, all resources are open source, including the model used for fine-tuning.

Accessibility comes from **QLoRA** (Quantized Low-Rank Adaptation), an efficient fine-tuning technique for large language models (LLMs) that reduces memory requirements by quantizing the base model to 4 bits and training only small low-rank adapter (LoRA) matrices on top of the frozen, quantized weights. This makes it possible to train larger models on more limited hardware, at a significantly lower compute and memory cost than traditional fine-tuning.
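As a rough illustration of why 4-bit quantization matters here, the sketch below estimates the memory needed just to store the base model's weights at different precisions. It assumes a base model of 8,030,261,248 parameters, derived from the training log later in this notebook (8,072,204,288 total parameters minus 41,943,040 trainable LoRA parameters). It ignores activations, optimizer state, quantization constants and layers kept in higher precision, so real usage is higher.

```python
# Rough estimate of the memory needed only to hold the model weights.
# Assumption: every parameter uses the same bit width; real 4-bit checkpoints
# keep some layers in higher precision and add quantization constants.

BASE_PARAMS = 8_030_261_248  # derived from the training log (total - LoRA params)
GIB = 1024 ** 3              # bytes per GiB


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Return the weight storage in GiB for a given precision."""
    return num_params * bits_per_param / 8 / GIB


if __name__ == "__main__":
    for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
        print(f"{label:>10}: {weight_memory_gib(BASE_PARAMS, bits):6.2f} GiB")
```

At 16 bits the weights alone take about 15 GiB, more than the Tesla T4 used in this notebook offers, while at 4 bits they fit in under 4 GiB, leaving room for activations and the LoRA adapters.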

> **NOTE** This code was taken from the Unsloth documentation as of the date this file was created and may be out of date. Check the reference.  
>
> REF: Tutorial: How to Finetune Llama-3 and Use In Ollama - https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/tutorial-how-to-finetune-llama-3-and-use-in-ollama

In [None]:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-7ryjd2wx/unsloth_f7f5fee3c450481692d6b09687d0e39a
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-7ryjd2wx/unsloth_f7f5fee3c450481692d6b09687d0e39a
  Resolved https://github.com/unslothai/unsloth.git to commit d1e312dcdc57bf020aa0f6da810226efe79cd69a
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting unsloth_zoo>=2025.11.6 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Downloading unsloth_zoo-2025.12.1-py3-none-any.whl.metadata (32 kB)
Collecting tyro (from unsloth@ git+https://github.com/unslothai/unsloth.gi

In [None]:
from unsloth import FastLanguageModel

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth: Could not find Config class in trl.trainer.dpo_trainer. Found: []
Unsloth: Could not find Config class in trl.trainer.iterative_sft_trainer. Found: []
Unsloth: Could not find Config class in trl.trainer.sft_trainer. Found: []
==((====))==  Unsloth 2025.11.6: Fast Llama patching. Transformers: 4.57.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.11.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
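The trainable-parameter count that appears when training starts (41,943,040) follows directly from the LoRA settings above. For each adapted module with a frozen weight of shape (out × in), LoRA trains two matrices, A (r × in) and B (out × r), so it adds r · (in + out) parameters. The sketch below assumes the public Llama-3.1-8B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of size 128, 32 layers); these values are not stated elsewhere in this notebook.

```python
# Assumed Llama-3.1-8B dimensions (from the public model config)
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP)
KV_DIM = 8 * 128       # num_key_value_heads * head_dim (grouped-query attention)
NUM_LAYERS = 32
RANK = 16              # r passed to get_peft_model

# (in_features, out_features) of each module in target_modules
MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r): r * (in + out) parameters."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in MODULES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} in total")
```

Doubling `r` doubles the adapter size, which is why small ranks such as 16 keep the trainable share around 0.5% of the model.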


### Data Prep
We prepare the data before training the model.
At this step we define the prompt template (system context) that is applied to every record of our structured data.

In [None]:
from sklearn.model_selection import train_test_split
from datasets import Dataset
import pandas as pd

# Alpaca-style template: instruction, input and expected response
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response in pt-BR that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# The EOS token must be appended so the model learns where an answer ends;
# without it, generation tends to run on indefinitely.
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(example):
    """Render one instruction/input/output record as a single training string."""
    instruction = example["instruction"]
    input_text = example["input"]
    output = example["output"]

    text = prompt.format(instruction, input_text, output) + EOS_TOKEN
    return {"text": text}

# Apply the template to every record produced by csv_to_json
formatted = [formatting_prompts_func(record) for record in input]

pd_dataset = pd.DataFrame(formatted, columns=["text"])
pd_dataset.dropna(inplace=True)
pd_train, pd_test = train_test_split(
    pd_dataset,
    test_size=0.2,       # 20% held out for testing
    random_state=42,     # reproducible split
    shuffle=True,        # shuffle before splitting
)

# Convert the pandas DataFrames to datasets.Dataset objects;
# preserve_index=False avoids an extra "__index_level_0__" column
train_dataset = Dataset.from_pandas(pd_train, preserve_index=False)
test_dataset = Dataset.from_pandas(pd_test, preserve_index=False)
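The split sizes can be checked by hand. For a float `test_size`, scikit-learn rounds the number of test samples up (`ceil(test_size * n_samples)`) and assigns the rest to training. With the 2,479 records created earlier, this yields the 1,983 training examples reported by the trainer. A stdlib-only sketch of that arithmetic:

```python
import math

n_records = 2479   # records produced by csv_to_json
test_size = 0.2    # same value passed to train_test_split

n_test = math.ceil(test_size * n_records)  # test count is rounded up
n_train = n_records - n_test               # remainder goes to training

print(n_train, n_test)
```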

### Training the model
Now we use Hugging Face TRL's `SFTTrainer`.  
> REF: TRL SFTTrainer documentation - https://huggingface.co/docs/trl/sft_trainer


In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Setting True can make training up to 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60, # Only 60 steps to keep the run short (less than one full epoch)
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
    ),
)

Map (num_proc=2):   0%|          | 0/1983 [00:00<?, ? examples/s]
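With `lr_scheduler_type = "linear"` and `warmup_steps = 5`, the learning rate rises linearly from 0 to `learning_rate` over the first 5 steps and then decays linearly to 0 at step 60. The function below is a plain-Python sketch of that schedule, modeled on the multiplier used by `transformers.get_linear_schedule_with_warmup`; `linear_warmup_lr` is a hypothetical helper, not part of any library.

```python
def linear_warmup_lr(step: int, base_lr: float = 2e-4,
                     warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate at a scheduler step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 1, 5, 30, 60):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

The short warmup avoids large, noisy updates to the freshly initialized adapters in the first steps.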

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,983 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|-----:|--------------:|
| 1 | 1.816 |
| 2 | 1.9415 |
| 3 | 2.0567 |
| 4 | 1.6661 |
| 5 | 1.5165 |
| 6 | 1.6005 |
| 7 | 1.8052 |
| 8 | 1.7659 |
| 9 | 1.4796 |
| 10 | 1.7941 |
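The log reports a total batch size of 8 (2 per device × 4 accumulation steps × 1 GPU). A quick back-of-the-envelope check, using only numbers from this notebook, shows how much of the training set the 60 steps actually cover:

```python
import math

per_device_batch = 2       # per_device_train_batch_size
grad_accum = 4             # gradient_accumulation_steps
num_gpus = 1               # from the training log
max_steps = 60             # max_steps
num_train_examples = 1983  # "Num examples" in the training log

effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(num_train_examples / effective_batch)

print(f"effective batch: {effective_batch}")
print(f"examples seen: {examples_seen} ({examples_seen / num_train_examples:.1%} of one epoch)")
print(f"steps for one full epoch: {steps_per_epoch}")
```

So this run sees roughly a quarter of the training data. Covering it once would take about 248 steps, for example by removing `max_steps` and setting `num_train_epochs = 1` in `TrainingArguments`.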


### Inference

To test the model, we fill in the instruction and the input and leave the response (`output`) slot of the template empty, so the model generates it.



In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    prompt.format(
        "Explain as a general doctor", # instruction
        "How to prevent Noonan syndrome?", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

["<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response in pt-BR that appropriately completes the request.\n\n### Instruction:\nExplain as a general doctor\n\n### Input:\nHow to prevent Noonan syndrome?\n\n### Response:\nNoonan syndrome is a genetic disorder that causes a number of physical abnormalities. It's not known what causes Noonan syndrome, but it's thought to be inherited. There is no cure for Noonan syndrome, but treatment can help manage the symptoms. Treatment may include: - Surgery to repair heart defects - Medication"]
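`batch_decode` returns the whole prompt followed by the generation, including special tokens. A small post-processing helper can keep just the answer. `extract_response` is hypothetical, and its default `<|end_of_text|>` is an assumption about the Llama 3.1 base tokenizer's EOS token; in the notebook, `tokenizer.eos_token` can be passed instead. Alternatively, `tokenizer.batch_decode(outputs, skip_special_tokens=True)` drops the special tokens directly.

```python
def extract_response(decoded: str, eos_token: str = "<|end_of_text|>") -> str:
    """Return only the generated answer from a decoded Alpaca-style prompt."""
    # Everything after the last "### Response:" marker is the model's output
    _, marker, answer = decoded.rpartition("### Response:")
    if not marker:
        return decoded.strip()  # marker not found: return the text as-is
    # Stop at the end-of-sequence token if the model emitted one
    return answer.split(eos_token, 1)[0].strip()


decoded = (
    "<|begin_of_text|>Below is an instruction...\n\n"
    "### Response:\nNoonan syndrome is a genetic disorder.<|end_of_text|>"
)
print(extract_response(decoded))
```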

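Next, the fine-tuned model is saved. Because `model` is a PEFT model, `save_pretrained` writes only the LoRA adapter weights and configuration, not the full base model; the tokenizer is saved alongside them.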

In [None]:
model.save_pretrained("/content/drive/MyDrive/tech-challenge-fase-3/model/tech_challenge_3_model")  # Local saving
tokenizer.save_pretrained("/content/drive/MyDrive/tech-challenge-fase-3/model/tech_challenge_3_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('/content/drive/MyDrive/tech-challenge-fase-3/model/tech_challenge_3_model/tokenizer_config.json',
 '/content/drive/MyDrive/tech-challenge-fase-3/model/tech_challenge_3_model/special_tokens_map.json',
 '/content/drive/MyDrive/tech-challenge-fase-3/model/tech_challenge_3_model/tokenizer.json')