# **FIAP Hackathon: Fine-tuning with architecture diagrams.**

# 1. Installing packages and setting the model parameters.

1.1 - Installing the libraries required for fine-tuning.

In [None]:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install -U datasets huggingface_hub fsspec
!pip install --no-deps unsloth

Collecting fsspec
  Using cached fsspec-2025.5.1-py3-none-any.whl.metadata (11 kB)
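
Because several of the packages above are installed with `--no-deps` and one is pinned, it can help to confirm which versions actually ended up in the environment before importing Unsloth. A minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["unsloth", "xformers", "peft", "trl", "fsspec"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry, or an unexpected version, is a hint to revisit the install cell before going further.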


1.2 - Importing the required modules.

In [None]:
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
import os
import torch

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


    PyTorch 2.5.1+cu121 with CUDA 1201 (you have 2.6.0+cu124)
    Python  3.11.11 (you have 3.11.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


🦥 Unsloth Zoo will now patch everything to make training faster!


1.3 - Loading the pretrained "Llama-3.2-11b-Vision-Instruct" model and its tokenizer.

In [None]:
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11b-Vision-Instruct",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

==((====))==  Unsloth 2025.6.8: Fast Mllama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
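
The banner reports a Tesla T4 with 14.741 GB of memory, which is why the model is loaded with `load_in_4bit = True`. A rough back-of-the-envelope sketch of the memory needed for the weights alone (it ignores activations, quantization constants, and any layers kept in higher precision); the parameter count of 11e9 is a rounded assumption taken from the model name:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed to hold the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 11e9  # assumption: "11b" in the model name, rounded

for label, bits in [("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions the half-precision weights alone would not fit on the T4, while the 4-bit weights leave room for activations and the trainable adapters.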

1.4 - Applying PEFT (Parameter-Efficient Fine-Tuning) to the loaded model by attaching LoRA adapters.

In [None]:
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True,
    finetune_language_layers = True,
    finetune_attention_modules = True,
    finetune_mlp_modules = True,

    r = 16,
    lora_alpha = 32,
    lora_dropout = 0.05,
    bias = "none",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth: Making `model.base_model.model.model.vision_model.transformer` require gradients
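
The `r` and `lora_alpha` arguments above are LoRA hyperparameters. Each adapted weight matrix of shape `d_out x d_in` receives two low-rank factors with `r * (d_in + d_out)` trainable parameters in total, and in the standard formulation the update is scaled by `lora_alpha / r`. A small sketch of that arithmetic; the 4096 x 4096 layer shape is an illustrative assumption, not a value read from the model:

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Scale applied to the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return lora_alpha / (r ** 0.5 if use_rslora else r)


d_in, d_out = 4096, 4096  # hypothetical layer shape, for illustration only
added = lora_param_count(d_in, d_out, r=16)
print(added, f"({added / (d_in * d_out):.2%} of the frozen matrix)")
print("scaling:", lora_scaling(lora_alpha=32, r=16))
```

With `r = 16` and `lora_alpha = 32` the scaling factor is 2, and each adapter is a small fraction of the matrix it modifies.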


1.5 - Mounting Google Drive in the Google Colab environment.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


1.6 - Loading the image dataset and defining the main prompt instruction.

In [None]:
from datasets import load_dataset
from transformers import TextStreamer

dataset_path = "/content/drive/MyDrive/fiap-hackaton/data"

dataset = load_dataset("imagefolder", data_dir=dataset_path, split="train")

instruction = "You are an expert architecture diagram designer, analyze this architecture diagram and identify potential security threats using STRIDE methodology."

Resolving data files:   0%|          | 0/20 [00:00<?, ?it/s]
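
The progress bar above shows 20 data files being resolved. Before calling `load_dataset`, a quick count of the image files under the data directory can catch an empty or mis-mounted folder. A minimal sketch; the helper `count_images` and the extension list are assumptions made for illustration:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}  # assumption: adjust to the dataset


def count_images(data_dir):
    """Count files under data_dir (recursively) whose extension looks like an image."""
    root = Path(data_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `count_images(dataset_path)` returning 0 would suggest that Drive is not mounted or that the path is wrong.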

# 2. Fine-tuning instruction.


2.1 - Formatting the dataset images as chat messages and building the "image/instruction" pairs needed for training.

In [None]:
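def convert_text(sample):
    # Wrap one dataset sample in the chat format used for training.
    # Each conversation holds a single user turn: the shared instruction plus the image.
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]},
            ],
        }
    ]
    return {"messages": conversation}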

converted_dataset = [convert_text(sample) for sample in dataset]
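
Each converted sample above contains only a user turn. Supervised fine-tuning on chat data usually also needs the expected answer as an assistant turn. A sketch of how such a pair could be built, assuming a hypothetical `answer` string per image (for example, from a metadata column); no such column appears in this notebook, and the function name `build_conversation` is also an assumption:

```python
def build_conversation(image, instruction, answer=None):
    """Return a chat sample; the assistant turn is added only when an answer is given."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": image},
            ],
        }
    ]
    if answer is not None:
        # Hypothetical target text that the model should learn to produce.
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": answer}]}
        )
    return {"messages": messages}
```

Without an assistant turn there is no target analysis for the model to imitate, so the reported loss reflects only the prompt tokens that the data collator leaves unmasked.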

2.2 - Preparing a single sample for inference.

In [None]:
FastVisionModel.for_inference(model)

image = dataset[0]["image"]
instruction = "You are an expert architecture diagram designer, analyze this architecture diagram and identify potential security threats using STRIDE methodology."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction }
          ]
    }
]

2.3 - Converting the image into tensors the model can process and generating the response.

In [None]:
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
    use_cache=True, temperature=1.5, min_p=0.1)

Based on the provided architecture diagram, I'll perform an analysis using the STRIDE methodology, which includes Spoofing, Tampering, Repudiation, Denial of Service (D), Elevation of Privilege, and Information Disclosure.

1. **Spoofing**: 
This threat can occur if the hub virtual network, the spoke virtual network, and the private endpoint are not properly secured or if the network infrastructure is compromised.
2. **Tampering**:
The Azure Bastion, jump box, and public IP address are critical components of the virtual network infrastructure. Any issues with these components, such as misconfiguration, vulnerabilities in
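
The `generate` call above samples with `temperature=1.5` and `min_p=0.1`. Temperature divides the logits before the softmax (values above 1 flatten the distribution), and min-p then discards tokens whose probability falls below `min_p` times the probability of the most likely token. A simplified pure-Python sketch of that filtering on a made-up logit vector; it illustrates the idea and is not the `transformers` implementation:

```python
import math


def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Return {token_index: renormalized probability} after temperature and min-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


toy_logits = [2.0, 1.0, 0.0, -3.0]  # made-up values, not model output
print(sorted(min_p_filter(toy_logits, temperature=1.5, min_p=0.1)))
```

A high temperature makes unlikely tokens more probable, while min-p removes the least plausible ones, so the combination trades some consistency for variety in the generated analysis.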


# 3. Training the model and saving it together with the tokenizer.

3.1 - Configuring and starting the model training process.

In [None]:
from trl import SFTConfig, SFTTrainer
FastVisionModel.for_training(model)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=converted_dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=6,
        learning_rate=2e-4,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        report_to="none",
    ),
)

trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 20 | Num Epochs = 2 | Total steps = 6
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 67,174,400/11,000,000,000 (0.61% trained)


| Step | Training Loss |
|------|---------------|
| 1 | 5.1225 |
| 2 | 5.1273 |
| 3 | 5.0716 |
| 4 | 5.1585 |
| 5 | 5.1264 |
| 6 | 4.2683 |
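
The numbers in the training banner follow from the configuration: 2 samples per device times 4 accumulation steps gives an effective batch of 8, so 20 examples take about 3 optimizer steps per pass, which is consistent with the 2 epochs reported for `max_steps=6`. With `warmup_steps=5`, a linear schedule also spends most of those 6 steps ramping the learning rate up. A sketch of both calculations; the learning-rate formula follows the usual linear warmup and decay rule and is an assumption about the exact scheduler behavior:

```python
import math


def steps_per_epoch(num_examples, per_device_batch, grad_accum, num_gpus=1):
    """Optimizer steps needed to see every example once."""
    return math.ceil(num_examples / (per_device_batch * grad_accum * num_gpus))


def linear_lr(step, base_lr, warmup_steps, max_steps):
    """Learning rate at a 0-based scheduler step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))


per_epoch = steps_per_epoch(num_examples=20, per_device_batch=2, grad_accum=4)
print("steps per epoch:", per_epoch, "| epochs in 6 steps:", 6 / per_epoch)
print([linear_lr(s, 2e-4, warmup_steps=5, max_steps=6) for s in range(6)])
```

Under this rule the full learning rate of 2e-4 is only reached at the last step, which may help explain why the loss stays near 5.1 for five steps and drops only at step 6.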


3.2 - Comparing the output of the pretrained model with that of the fine-tuned model.

In [None]:
FastVisionModel.for_inference(model)

image = dataset[0]["image"]
instruction = "You are an expert architecture diagram designer, analyze this architecture diagram and identify potential security threats using STRIDE methodology."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction }
          ]
    }
]

input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
    use_cache=True, temperature=1.5, min_p=0.1)

To identify potential security threats in the given architecture diagram using the STRIDE methodology, we need to break down the diagram into its components and analyze them step by step. STRIDE stands for Spoofing, Tampering and man-in-the-middle, Repudiation, Information disclosure, Denial of service, and Elevation of privilege.

1. **Spoofing**: Spoofing occurs when an attacker mimics legitimate traffic or makes it appear that they are someone else. In this architecture, potential spoofing attacks could involve unauthorized access attempts through the public IP address or if virtual network peering is not properly configured.

2. **Tam
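
The inference cell is repeated nearly verbatim in 2.2/2.3, 3.2, and 4.2. One way to make before/after comparisons less error-prone is to wrap it in a function. A sketch that reuses only the calls already shown in this notebook; the names `build_messages` and `run_inference` are hypothetical:

```python
def build_messages(instruction):
    """Chat-format prompt with one image placeholder followed by the text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": instruction},
            ],
        }
    ]


def run_inference(model, tokenizer, image, instruction, device="cuda", **generate_kwargs):
    """Tokenize one image/instruction pair and forward it to model.generate."""
    input_text = tokenizer.apply_chat_template(
        build_messages(instruction), add_generation_prompt=True
    )
    inputs = tokenizer(
        image,
        input_text,
        add_special_tokens=False,
        return_tensors="pt",
    ).to(device)
    return model.generate(**inputs, **generate_kwargs)
```

Called as `run_inference(model, tokenizer, dataset[0]["image"], instruction, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)`, it reproduces the cells above. Because sampling at `temperature=1.5` varies between runs, differences between the two outputs may not be due to training alone; passing `do_sample=False` instead of the sampling arguments would make the comparison deterministic.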


3.3 - Saving the model and the tokenizer to Google Drive.

In [None]:
model.save_pretrained("/content/drive/MyDrive/fiap-hackaton/model")
tokenizer.save_pretrained("/content/drive/MyDrive/fiap-hackaton/model")

[]
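
For a PEFT-wrapped model, `save_pretrained` typically writes only the LoRA adapter rather than the full model weights. A sketch that checks the output folder for the adapter files; the file names `adapter_config.json` and `adapter_model.safetensors` are what recent PEFT versions commonly produce and are treated as an assumption here, as is the helper name `missing_adapter_files`:

```python
from pathlib import Path

# Assumption: default file names written by recent PEFT versions.
EXPECTED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_adapter_files("/content/drive/MyDrive/fiap-hackaton/model")` suggests the adapter was written and can be reloaded in the next section.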

# 4. Analyzing a diagram with the fine-tuned model.

4.1 - Loading the model and the tokenizer saved in the previous section.

In [None]:
model, tokenizer = FastVisionModel.from_pretrained(
    "/content/drive/MyDrive/fiap-hackaton/model",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

==((====))==  Unsloth 2025.6.8: Fast Mllama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

4.2 - Running a final inference with the fine-tuned model.

In [None]:
FastVisionModel.for_inference(model)  # switch the reloaded model to inference mode, as in 2.2 and 3.2

image = dataset[2]["image"]  # Change the index to select a different diagram.
instruction = "You are an expert architecture diagram designer, analyze this architecture diagram and identify potential security threats using STRIDE methodology."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": instruction }
          ]
    }
]

input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
    use_cache=True, temperature=1.5, min_p=0.1)

**Potential Security Threats Identified:**

Based on the provided STRIDE security model, this Azure architecture diagram is not designed or deployed from the ground up, and security concerns arise at multiple points:

1.  **Privilege escalation and session hijacking**

The architecture consists of various regions connected via VPN and spokes. If attackers gain access to any region, they may elevate privileges and hijack sessions on other spokes or regions.
2.  **Data Tampering**

Azure Firewall sits inside a subnets region of all regions. Data may be tampered with while the packets traverse from outside to Azure Firewall in any region.

**Suggested Impro
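
The answers above are cut off at 128 new tokens, so they may not reach all six STRIDE categories (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege). A simple keyword check can flag which categories a generated answer mentions. This is a rough heuristic sketch; the keyword list and the function name `stride_coverage` are assumptions:

```python
STRIDE_KEYWORDS = {
    "Spoofing": ("spoofing",),
    "Tampering": ("tampering",),
    "Repudiation": ("repudiation",),
    "Information disclosure": ("information disclosure",),
    "Denial of service": ("denial of service",),
    "Elevation of privilege": ("elevation of privilege", "privilege escalation"),
}


def stride_coverage(text):
    """Map each STRIDE category to True if any of its keywords appears in the text."""
    lowered = text.lower()
    return {
        category: any(keyword in lowered for keyword in keywords)
        for category, keywords in STRIDE_KEYWORDS.items()
    }


sample = "1. **Privilege escalation and session hijacking** ... 2. **Data Tampering**"
missing = [category for category, hit in stride_coverage(sample).items() if not hit]
print("not mentioned:", missing)
```

Running the check on answers generated with a larger `max_new_tokens` would show whether missing categories come from truncation or from the model itself.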
