# Task 3 - Generative tasks with external models

This notebook contains a prototype of a RAG system based on the LLaVA model. The task is to enhance users' multimodal prompts with related information from our data pipeline. Related items are selected by their similarity to the user request and added to the prompt, so that the model has better context when generating its response.

In [3]:
# Get Prompt from user

# Extract different types of data
user_prompt_text = "What is this cancer? Which is the diagnosis and how can it be treated?"
user_prompt_images = []
user_prompt_audios = []

# Retrieve similar results
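The retrieval step is left as a placeholder above. A minimal sketch of similarity-based retrieval could look like the following. It assumes that the user request and the pipeline items have already been embedded as vectors; the function names, the toy corpus, and the prompt layout are all hypothetical and are not part of the actual data pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, corpus, k=2):
    """Return the texts of the k corpus items most similar to the query.

    `corpus` is a list of (text, embedding) pairs (hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_augmented_prompt(user_text, retrieved):
    """Prepend the retrieved snippets to the user question."""
    if not retrieved:
        return user_text
    context = "\n".join(f"- {snippet}" for snippet in retrieved)
    return f"Context:\n{context}\n\nQuestion: {user_text}"


# Toy data: placeholder snippets with made-up 3-dimensional embeddings.
toy_corpus = [
    ("Snippet A", [1.0, 0.0, 0.0]),
    ("Snippet B", [0.0, 1.0, 0.0]),
    ("Snippet C", [0.9, 0.1, 0.0]),
]
toy_query = [1.0, 0.0, 0.0]

top = retrieve_top_k(toy_query, toy_corpus, k=2)
augmented_prompt = build_augmented_prompt("What is this cancer?", top)
print(augmented_prompt)
```

In a real setup the embeddings would come from an embedding model and the search would typically run against a vector index rather than a Python list.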

## Using LLaVA local model
Here we try a LLaVA model deployed locally. Specifically, we use llava-1.5 with 7B parameters.
First, we need to load the model.

In [None]:
import base64
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration, BitsAndBytesConfig

# Load the model in half-precision
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf", dtype=torch.float16, device_map="auto")
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

Some parameters are on the meta device because they were offloaded to the disk and cpu.


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



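The loading log above reports that some parameters were offloaded to disk and CPU, and `BitsAndBytesConfig` is imported but never used. One possible way to reduce memory use is 4-bit quantization; the sketch below shows how that could be wired in. It assumes the `bitsandbytes` package is installed and a CUDA GPU is available, and the helper name `load_llava_4bit` is hypothetical. It has not been run as part of this notebook.

```python
def load_llava_4bit(model_id="llava-hf/llava-1.5-7b-hf"):
    """Load LLaVA with 4-bit quantized weights; returns (model, processor).

    Assumption: `bitsandbytes` is installed and a CUDA GPU is available.
    """
    # Imports live inside the function so that defining it stays cheap.
    import torch
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        LlavaForConditionalGeneration,
    )

    # Store weights in 4-bit NF4 and run the computation in float16.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor
```

Calling `model, processor = load_llava_4bit()` would replace the half-precision loading cell. Quantization trades some output quality for memory, so results may differ slightly from the float16 run.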

Once the model is loaded, we can send a query as follows. Here we ask for a diagnosis of the lesion shown in an image:

In [9]:
image = Image.open("test.jpg").convert("RGB")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_prompt_text},
        ],
    },
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device, torch.float16)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=256)
processor.batch_decode(generate_ids, skip_special_tokens=True)

['USER:  \nWhat is this cancer? Which is the diagnosis and how can it be treated? ASSISTANT: The image shows a skin lesion, which appears to be a cancerous growth or a skin tag. The diagnosis of this skin lesion would be a form of skin cancer, such as squamous cell carcinoma or basal cell carcinoma. These skin cancers are typically treated with surgical excision, which involves removing the cancerous growth along with a small margin of healthy skin around it. In some cases, additional treatments like radiation therapy or topical medications may be necessary, depending on the size, location, and stage of the cancer. It is essential to consult a dermatologist or a skin cancer specialist for proper diagnosis and treatment.']
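The decoded output above contains the prompt (`USER: ...`) followed by the answer after the `ASSISTANT:` marker. A small helper could strip the prompt so that only the reply remains. This is a sketch that assumes the marker text shown in the output above; the function name is hypothetical.

```python
def extract_assistant_reply(decoded, marker="ASSISTANT:"):
    """Return the text after the last `marker`, or the whole text if absent."""
    _, found, reply = decoded.rpartition(marker)
    return reply.strip() if found else decoded.strip()


# Example with a shortened version of the decoded string.
sample = "USER:  \nWhat is this cancer? ASSISTANT: The image shows a skin lesion."
reply = extract_assistant_reply(sample)
print(reply)
```

Applied to the cell above, this would be `[extract_assistant_reply(t) for t in processor.batch_decode(generate_ids, skip_special_tokens=True)]`.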

## Using Gemini API
Running the model locally requires more hardware (especially RAM and GPU memory) than consumer-grade PCs provide, so local inference is very slow. As an alternative, we send the same text and image to the Gemini API.

In [4]:
from dotenv import load_dotenv
from google import genai
from google.genai import types

load_dotenv()

client = genai.Client()

# Upload the first image
image_path = "test.jpg"
uploaded_file = client.files.upload(file=image_path)

## Prepare the second image as inline data
#image2_path = "path/to/image2.png"
#with open(image2_path, 'rb') as f:
#    img2_bytes = f.read()

# Build the prompt from the text and the uploaded image (the second, inline image is commented out)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        user_prompt_text,
        uploaded_file  # Use the uploaded file reference
        #types.Part.from_bytes(
        #    data=img2_bytes,
        #    mime_type='image/png'
        #)
    ]
)

print(response.text)


Based on the image provided, which shows a red, scaly, and somewhat crusted patch with areas of erosion on the skin, this lesion is highly suspicious for a **non-melanoma skin cancer**.

**Important Disclaimer:** I am an AI and cannot provide medical advice or a definitive diagnosis. A proper diagnosis can only be made by a qualified healthcare professional, such as a dermatologist, after a thorough examination and often a biopsy. The information below is for educational purposes only and should not be used to self-diagnose or self-treat.

---

### **Potential Diagnosis (Highly Suspicious):**

Given the appearance, the most likely diagnoses include:

1.  **Squamous Cell Carcinoma (SCC):**
    *   **Description:** This is a very common type of skin cancer that can appear as a firm, red nodule, a scaly patch, or a sore that doesn't heal. It can sometimes be crusted, ulcerated, or bleed easily. It often arises in sun-exposed areas and can develop from a precursor lesion called an actinic 