# 1. Load model from Hugging Face Hub and configure the inference

This notebook is for running MedGemma 27B locally with offloading to system memory. GGUF quantized models do not have the visual encoder part so this is the best way of running it if you have a GPU with less than 32 Gb of VRAM. 

In [None]:
import os, gc, torch
from PIL import Image
from transformers import (
    AutoConfig,
    AutoModelForImageTextToText,
    AutoProcessor,
    BitsAndBytesConfig,
)
from accelerate import init_empty_weights, infer_auto_device_map

# Config
model_id   = "google/medgemma-27b-it"      
image_path = r"F:\Downloads\Chest2.jpeg"
prompt     = "Describe the key findings."  
HF_TOKEN   = os.getenv("HF_TOKEN")        

ATTN_IMPL = "sdpa"                    

bnb_cfg = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
    llm_int8_enable_fp32_cpu_offload=True,
    llm_int8_skip_modules=["vision_tower", "SiglipVisionModel"],
)

config = AutoConfig.from_pretrained(
    model_id,
    attn_implementation=ATTN_IMPL,
    token=HF_TOKEN,
)

with init_empty_weights():
    empty = AutoModelForImageTextToText.from_config(
        config, torch_dtype=torch.bfloat16
    )

device_map = infer_auto_device_map(
    empty,
    no_split_module_classes=["SiglipVisionModel"],  
    max_memory={0: "44GB", "cpu": "120GB"},       
)
del empty
gc.collect()

# Load the model
print("Loading MedGemma-27B")
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    token=HF_TOKEN,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_cfg,
    device_map=device_map,
    offload_state_dict=False,
    low_cpu_mem_usage=True,
    attn_implementation=ATTN_IMPL,
)
processor = AutoProcessor.from_pretrained(model_id, token=HF_TOKEN)
print("Model ready, VRAM usage:",
      f"{torch.cuda.memory_allocated()/1024/1024/1024: .2f} GB")

# open the image, setup the prompt and apply the chat template
img = Image.open(image_path).convert("RGB")

system_msg = {"role": "system",
              "content": [{"type": "text",
                           "text": "You are an expert radiologist."}]}

usr_msg_img =     {"role": "user",
     "content": [
         {"type": "text", "text": prompt},
         {"type": "image", "image": img},
     ]}

batch = processor.apply_chat_template(
    [system_msg, usr_msg_img],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16) 


  from .autonotebook import tqdm as notebook_tqdm
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.


Loading MedGemma-27B


Tied parameters are on different devices: {'lm_head.weight': 'cpu', 'model.language_model.embed_tokens.weight': 0}. Please modify your custom device map or set `device_map='auto'`. 
Loading checkpoint shards: 100%|██████████| 12/12 [00:10<00:00,  1.16it/s]
Some parameters are on the meta device because they were offloaded to the cpu.
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


Model ready, VRAM usage:  20.79 GB


# 2. Running inference

### First we run it "deterministically". It's greedy decoding, the model always chooses the most likely token at each step. The output isn't really deterministic but it will be pretty consistent. 

In [None]:
# ===== cell 6 =====
with torch.inference_mode():
    output_ids = model.generate(
    **batch,
    do_sample=False,       # "deterministic" (greedy) decoding
    num_beams=1,           # no beam search
    temperature=0.0,       # ignored because do_sample=False
    top_k=None,            # ignored
    top_p=None,            # ignored
    max_new_tokens=1024
)

answer = processor.decode(output_ids[0], skip_special_tokens=True)
print("\n—--- OUTPUT ---—\n")
print(answer)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.



—--- OUTPUT ---—

user
You are an expert radiologist.

Describe the key findings.




model
Based on the provided chest X-ray, here are the key findings:

1.  **Normal Heart Size and Mediastinal Contours:** The cardiac silhouette appears within normal limits in size, and the mediastinal structures (trachea, esophagus, great vessels) have typical contours.
2.  **Normal Lung Fields:** The lung parenchyma appears clear bilaterally, without evidence of consolidation, infiltrates, masses, or significant pleural effusions.
3.  **Normal Diaphragms and Costophrenic Angles:** The diaphragmatic domes are well-defined and at appropriate levels. The costophrenic angles are sharp, indicating no fluid accumulation in the pleural spaces.
4.  **Normal Bony Structures:** The visualized ribs, clavicles, and thoracic spine appear intact without acute fractures or destructive lesions.
5.  **Normal Hila:** The pulmonary hila appear unremarkable in size and density.

**Overall Impression:** The chest X-ray

### Now we run the model more with token sampling.

In [3]:
# ===== cell 6 =====
with torch.inference_mode():
    output_ids = model.generate(
    **batch,
    do_sample=True,           # enable sampling instead of greedy
    temperature=0.7,          # controls randomness; lower = more deterministic
    top_p=0.9,                # nucleus sampling
    top_k=50,                 # optional but helps constrain randomness
    max_new_tokens=1024,
)

answer = processor.decode(output_ids[0], skip_special_tokens=True)
print("\n—--- OUTPUT ---—\n")
print(answer)


—--- OUTPUT ---—

user
You are an expert radiologist.

Describe the key findings.




model
Here's a breakdown of the key findings in the chest X-ray:

**Overall Impression:** The image shows a standard posteroanterior (PA) chest X-ray. The patient is rotated slightly to the right.

**Key Findings:**

1.  **Normal Heart Size and Mediastinum:** The cardiac silhouette appears within normal limits in size and shape. The mediastinal contours are unremarkable.
2.  **Normal Lung Fields:** Both lung fields appear clear without evidence of consolidation, infiltrates, masses, or effusions.
3.  **Normal Hila:** The hilar regions are not enlarged or obscured.
4.  **Normal Pleura:** No pleural effusion or pneumothorax is identified. The pleural surfaces appear smooth.
5.  **Normal Bony Structures:** The visualized ribs, clavicles, and thoracic spine appear intact.
6.  **Scoliosis:** There is a moderate S-shaped curvature of the thoracic spine (scoliosis). This is a structural abnormality, not an 

### Something a bit different, we run the model so it output 3 different outputs, that we feed into the language part of MedGemma to get a consensuous result. 

In [4]:
# ===== cell 6 =====
with torch.inference_mode():
    output_ids = model.generate(
    **batch,
    do_sample=True,           # enable sampling instead of greedy
    temperature=0.7,          # controls randomness; lower = more deterministic
    top_p=0.9,                # nucleus sampling
    top_k=50,                 # optional but helps constrain randomness
    max_new_tokens=1024,
    num_return_sequences=3,
)

answers = processor.batch_decode(output_ids, skip_special_tokens=True)
for i, ans in enumerate(answers, 1):
    print(f"\n--- Output {i} ---\n{ans}")


--- Output 1 ---
user
You are an expert radiologist.

Describe the key findings.




model
Here's an analysis of the chest radiograph, focusing on key findings:

**Overall Impression:** The image shows a standard posteroanterior (PA) view of the chest. The patient is likely an adult female, given the breast tissue visible. The lungs appear relatively clear, but there are some notable findings.

**Key Findings:**

1.  **Scoliosis:** There is a significant curvature of the spine, specifically a thoracic scoliosis (convex to the right). This is evident by the asymmetry of the rib cage and the vertebral column. The scoliosis affects the appearance of the lung fields and mediastinum.

2.  **Lung Fields:**
    *   The lung fields appear largely clear without obvious consolidation, large effusions, or pneumothorax.
    *   There might be subtle increased interstitial markings, particularly in the lower zones, but this could be related to the scoliosis distorting the lung architecture or subt

### The same as before but we also give it the original x-ray, have no idea if this is in anyway useful.

In [5]:
fuse_prompt = (
    "Here are three separate radiology reports for the same chest X-ray. "
    "Unify them into a single, concise final report, preserving every key "
    "finding and discarding contradictions if they are clearly wrong.\n\n"
    + "\n\n---\n\n".join(f"[Report {i+1}]\n{r}" for i, r in enumerate(answers, 1))
)

user_msg_text = {"role": "user", "content": [{"type": "text", "text": fuse_prompt}]}

batch = processor.apply_chat_template(
    [system_msg,user_msg_text],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

with torch.inference_mode():
    ids = model.generate(
    **batch,
    max_new_tokens=1024,
    do_sample=False,          # deterministic merge
    return_dict_in_generate=True
)

gen_ids = ids.sequences[:, batch["input_ids"].shape[-1]:]

final_report = processor.batch_decode(gen_ids, skip_special_tokens=True)[0]
print(final_report)

Okay, I will unify the three reports into a single, concise final report, incorporating all key findings and resolving any contradictions.

**Final Unified Chest Radiograph Report**

**Patient:** (Assuming adult female based on Report 2, but not specified)
**Date:** (Not specified)
**Study:** Chest Radiograph, PA view

**Clinical Indication:** (Not specified)

**Findings:**

1.  **Cardiomediastinal Silhouette:** Normal in size and contour. The heart size is within normal limits. The mediastinum is midline.
2.  **Lung Fields:** Clear bilaterally. No evidence of consolidation, masses, pleural effusion, or pneumothorax. Pulmonary vasculature is unremarkable. Lung volumes appear normal.
3.  **Bones:**
    *   **Scoliosis:** Significant thoracic scoliosis is present, convex to the right (as noted in Reports 2 and 3). This causes asymmetry of the rib cage.
    *   **Pectus Excavatum:** A concave depression of the sternum is present (as noted in Report 4).
    *   Other visualized bony struct

In [6]:
fuse_prompt = (
    "Here are three separate radiology reports for the same chest X-ray. "
    "Unify them into a single, concise final report, preserving every key "
    "finding and discarding contradictions if they are clearly wrong. For reference the x ray itself is also supplied.\n\n"
    + "\n\n---\n\n".join(f"[Report {i+1}]\n{r}" for i, r in enumerate(answers, 1))
)
user_msg_text = {"role": "user",
     "content": [
         {"type": "text", "text": fuse_prompt},
         {"type": "image", "image": img},
     ]}

batch = processor.apply_chat_template(
    [system_msg,user_msg_text],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

with torch.inference_mode():
    ids = model.generate(
    **batch,
    max_new_tokens=1024,
    do_sample=False,          # deterministic merge
    return_dict_in_generate=True
)

gen_ids = ids.sequences[:, batch["input_ids"].shape[-1]:]

final_report = processor.batch_decode(gen_ids, skip_special_tokens=True)[0]
print(final_report)



Okay, I will unify the three reports into a single, concise final report, incorporating all key findings and resolving any contradictions.

**Final Unified Radiology Report**

**Patient:** (Not specified in reports, assume standard adult)
**Exam:** Chest Radiograph (PA view)
**Date:** (Not specified)

**Clinical Indication:** (Not specified)

**Findings:**

1.  **Cardiomediastinal Silhouette:** The heart size is within normal limits. Mediastinal contours are unremarkable.
2.  **Lung Fields:** The lungs are clear bilaterally without evidence of consolidation, pleural effusion, or pneumothorax. Pulmonary vasculature is normal.
3.  **Bones:**
    *   **Scoliosis:** Significant thoracic scoliosis is present, convex to the right, causing asymmetry of the rib cage.
    *   **Pectus Excavatum:** A concave depression of the sternum (pectus excavatum) is noted.
    *   Other visualized bony structures (ribs, clavicles) appear intact.
4.  **Other:** No acute abnormalities identified.

**Impressi