# PoliMeme Decode: Automated Political Meme Analysis
# Model: Qwen2.5-VL-7B-Instruct | Method: Few-Shot Prompting | Output: Structured JSON

# Overview
This notebook implements an automated pipeline to analyze Bangladeshi political and social memes. Using the Qwen2.5-VL Vision-Language Model (VLM), we extract each meme's meaning, classify its humor type, rate its political intensity, and identify its visual metaphors.

# Key Features

* 4-Bit Quantization: Optimized to run on Tesla T4 (16GB) GPUs.
* Structured JSON Output: The prompt specifies a fixed JSON schema that maps directly onto the submission CSV columns.
* Memory Management: Downscales large images and runs explicit garbage collection between images to reduce the risk of CUDA OOM errors.
* Robust Parsing: Regex-based fallbacks to handle model output variations.


# 1. Environment Setup
We first install the necessary dependencies. Qwen2.5-VL needs a recent transformers release (installed here from the GitHub source) and qwen-vl-utils for preparing visual inputs.

In [None]:
# Cell 1: Install required libraries
!pip install -q git+https://github.com/huggingface/transformers --upgrade
!pip install -q accelerate bitsandbytes pandas tqdm qwen-vl-utils
print("Libraries installed! Please restart the kernel if this is the first run.")
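After restarting the kernel, it can help to confirm which versions were actually installed before loading the model. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the package list simply mirrors the pip install lines above.

```python
# Report the installed versions of the packages from the install cell (stdlib only).
import importlib.metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


for pkg in ["transformers", "accelerate", "bitsandbytes", "pandas", "tqdm", "qwen-vl-utils"]:
    print(f"{pkg:>15}: {installed_version(pkg) or 'NOT INSTALLED'}")
```

If transformers reports an older release, the Qwen2.5-VL model classes may be missing; re-running the install cell and restarting the kernel is the usual fix.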

# 2. Configuration & Imports

In [None]:
# Cell 2: Import libraries and setup
import os
import re
import json
import torch
import pandas as pd
from PIL import Image
from tqdm import tqdm
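# Qwen2VLForConditionalGeneration is the Qwen2-VL (not 2.5) class and was unused; the model is loaded via AutoModelForVision2Seq
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig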
from qwen_vl_utils import process_vision_info

# Setup device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")

In [None]:
# Cell 3: Configuration
class Config:
    # UPDATE THIS PATH TO YOUR ACTUAL IMAGE DIRECTORY
    IMAGE_DIR = "/kaggle/input/poli-meme-decode-cuet-cse-fest/PoliMemeDecode/Test/Image"
    
    # Qwen2.5-VL-7B-Instruct is highly recommended for this task (OCR + Reasoning)
    MODEL_NAME = "Qwen/Qwen2.5-VL-7B-Instruct" 
    
    OUTPUT_CSV = "meme_analysis_optimized.csv"
    CHECKPOINT_CSV = "meme_analysis_checkpoint.csv"
    
    # Generation settings
    MAX_NEW_TOKENS = 512

config = Config()

# 3. Load Quantized Model
We load Qwen2.5-VL-7B-Instruct with a BitsAndBytesConfig that quantizes the weights to 4-bit NF4 (with double quantization). This cuts the weight memory from roughly 15GB in 16-bit precision to roughly 6GB, leaving headroom on a 16GB T4 for the image tokens and generation.

In [None]:
# Cell 4: Load Model
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

print(f"Loading {config.MODEL_NAME}...")

# 1. Define Quantization Config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16  # T4 (Turing) has no native bf16 support; torch.float16 may run faster there
)

# 2. Load Processor
processor = AutoProcessor.from_pretrained(config.MODEL_NAME, trust_remote_code=True)

# 3. Load Model using AutoModelForVision2Seq
# device_map="auto" lets accelerate place the quantized weights on the available GPU(s)
model = AutoModelForVision2Seq.from_pretrained(
    config.MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto", 
    trust_remote_code=True,
    low_cpu_mem_usage=True
)

print("Model loaded successfully!")
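The VRAM figures quoted above can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes stored per parameter. The sketch assumes about 8.3 billion total parameters for Qwen2.5-VL-7B-Instruct (vision encoder included); that count is an assumption, not a value taken from this notebook, and the helper estimate_weight_gib is hypothetical.

```python
# Rough weight-memory estimate; ignores activations, KV cache, and CUDA context overhead.
GIB = 2**30

# ASSUMPTION: ~8.3e9 parameters for Qwen2.5-VL-7B-Instruct, vision tower included.
N_PARAMS = 8.3e9


def estimate_weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to store n_params weights, in GiB."""
    return n_params * bytes_per_param / GIB


bf16_gib = estimate_weight_gib(N_PARAMS, 2.0)  # 16-bit weights: 2 bytes each
nf4_gib = estimate_weight_gib(N_PARAMS, 0.5)   # 4-bit weights: 0.5 bytes each (lower bound)

print(f"16-bit weights:            ~{bf16_gib:.1f} GiB")
print(f"NF4 weights (lower bound): ~{nf4_gib:.1f} GiB")
```

The NF4 number is only a lower bound: bitsandbytes quantizes linear layers, so embeddings (and, by default, the output head) typically stay in 16-bit, and the quantization constants add a little more. That plausibly accounts for the gap up to the ~6GB quoted above. Inside the notebook, model.get_memory_footprint() (a transformers PreTrainedModel method) reports the actual size of the loaded model in bytes.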

# 4. System Prompt Engineering
This is the core logic. We define a strict system prompt that instructs the model to act as an expert analyst. It includes classification guidelines and few-shot examples so that the output format matches the competition's CSV columns as closely as possible.

In [None]:
# Cell 5: Define System Prompt

# <BLOCK> stands in for a triple-backtick fence inside the prompt text; it is replaced below
prompt_template = """You are an expert analyst of Bangladeshi political and social memes. Your task is to analyze the meme image and output a JSON object.

### OUTPUT FORMAT (JSON ONLY):
You must output a single JSON object with these exact keys:
1.  **Humor**: Choose exactly ONE: 'Mockery', 'Sarcastic', 'Ironic', 'Satirical', 'Other'.
2.  **Metaphor**: Choose exactly ONE: 'Both', 'Text', 'Image'.
3.  **Meme_Explanation**: Describe the meme's meaning, context, and target.
4.  **Metaphor_Object**: Identify the specific SUBJECT or ENTITY (e.g., "Sheikh Hasina", "Vote Chori").
5.  **Political_Intensity**: Choose exactly ONE: 'High', 'Moderate', 'Low'.

### CRITICAL RULES FOR 'Political_Intensity'
* **HIGH**: You MUST select 'High' if the meme contains ANY of the following:
    * **Names/Faces**: Sheikh Hasina, Khaleda Zia, Tarique Rahman, Yunus, Modi, Trump, Sajeeb Wazed Joy, Obaidul Quader, Palak.
    * **Parties/Groups**: AL (Awami League), BNP, Jamaat, Shibir, BCL (Chhatra League), Hefazat.
    * **Keywords**: "Vote Chori", "August 15", "Genocide", "Dictator", "Fascist", "Regime", "Hartal", "1971", "Razakar", "Gonobhaban".
    * **State Forces**: Police, RAB, Army (only if depicted suppressing people or supporting a regime).
    * *Rule of Thumb*: If it attacks a specific politician or party -> **HIGH**.

* **MODERATE**: Use ONLY for:
    * General social satire without naming names (e.g., "Politicians are liars" but no specific face).
    * Institutional criticism (e.g., "Bank corruption", "Education board", "Dhaka Traffic", "Price Hikes").
    * Cultural figures/Celebs (e.g., Shakib Al Hasan, Hero Alom) unless linked to a political party.

* **LOW**: Use for:
    * Relatable daily life (Exams, Relationships, Weather, Cricket sports).
    * Pop culture/Movies.

### GUIDELINES FOR 'Metaphor_Object'
* **Identify the TARGET**: Do not describe the image. Describe who/what the image represents.
* *Wrong*: "Crying cat"
* *Right*: "The Public" or "Failed Student"
* *Wrong*: "Man laughing"
* *Right*: "Sheikh Hasina" or "Corrupt Politician"

### EXAMPLES

Example 1 (High - Named Leader):
Output:
<BLOCK>json
{
    "Humor": "Satirical",
    "Metaphor": "Both",
    "Meme_Explanation": "Satirizes the idea of politicians like Hasina fleeing to India.",
    "Metaphor_Object": "Sheikh Hasina",
    "Political_Intensity": "High"
}
<BLOCK>

Example 2 (Moderate - General Issue):
Output:
<BLOCK>json
{
    "Humor": "Ironic",
    "Metaphor": "Image",
    "Meme_Explanation": "Criticizes the general state of the country's fitness/infrastructure without naming a specific leader.",
    "Metaphor_Object": "Country's fitness",
    "Political_Intensity": "Moderate"
}
<BLOCK>

Example 3 (Low - Relatable):
Output:
<BLOCK>json
{
    "Humor": "Other",
    "Metaphor": "Text",
    "Meme_Explanation": "Relatable humor about having a low balance in a mobile wallet.",
    "Metaphor_Object": "Low account balance",
    "Political_Intensity": "Low"
}
<BLOCK>

### YOUR TASK:
Analyze the provided image and output the JSON.
"""

# Replace placeholder
SYSTEM_PROMPT = prompt_template.replace("<BLOCK>", "```")

print("System Prompt Refined.")
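The prompt's rule of thumb ("if it attacks a specific politician or party -> HIGH") is only a request; the model can still return 'Moderate' or 'Low' for a meme whose own explanation names a politician. One optional safeguard is a keyword cross-check on the parsed output. The sketch below is hypothetical: apply_high_intensity_rule and the term list (a subset of the HIGH triggers in the prompt) are illustrative, and very short acronyms such as "AL" are left out because they would match unrelated text.

```python
import re

# HYPOTHETICAL subset of the HIGH-intensity triggers listed in SYSTEM_PROMPT (lowercase).
HIGH_INTENSITY_TERMS = [
    "hasina", "khaleda zia", "tarique rahman", "yunus", "modi", "trump",
    "awami league", "bnp", "jamaat", "shibir", "chhatra league", "hefazat",
    "vote chori", "razakar", "fascist", "dictator", "hartal",
]


def apply_high_intensity_rule(entry: dict, terms=HIGH_INTENSITY_TERMS) -> dict:
    """Return a copy of `entry` with Political_Intensity set to 'High' when the model's own
    Meme_Explanation or Metaphor_Object mentions one of `terms` (whole words, case-insensitive)."""
    text = f"{entry.get('Meme_Explanation', '')} {entry.get('Metaphor_Object', '')}".lower()
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    updated = dict(entry)  # never mutate the caller's dict
    if re.search(pattern, text):
        updated["Political_Intensity"] = "High"
    return updated


example = {"Meme_Explanation": "Mocks Sheikh Hasina", "Metaphor_Object": "Leader",
           "Political_Intensity": "Low"}
print(apply_high_intensity_rule(example)["Political_Intensity"])
```

The check only upgrades a label and never downgrades one, and it can only see names that the model wrote into its text fields (it cannot recognize faces). In the main loop it would be applied to each entry before results.append(entry).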

# 5. Helper Functions
We define two utility functions:


* extract_json: Uses Regex to find valid JSON blocks within the model's textual response, handling cases where the model might add conversational filler.
* analyze_image: Handles image resizing (to prevent OOM), tokenization, and inference generation.
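analyze_image caps the longer side of each image at 1024 pixels because the number of visual tokens, and with it GPU memory, grows with pixel area. The sketch approximates the resizing rule used by qwen_vl_utils under stated assumptions: 14-pixel vision patches merged 2x2, so each visual token covers a 28x28 pixel block, with each side rounded to the nearest multiple of 28. It ignores the min_pixels/max_pixels clamping, and approx_visual_tokens is a hypothetical helper.

```python
# ASSUMPTIONS: 14-pixel patches and 2x2 patch merging, as in Qwen2.5-VL's vision encoder.
PATCH_SIZE = 14
MERGE_SIZE = 2
FACTOR = PATCH_SIZE * MERGE_SIZE  # one visual token per 28x28 pixel block


def approx_visual_tokens(width: int, height: int) -> int:
    """Approximate the number of visual tokens for a width x height image.
    Each side is rounded to the nearest multiple of FACTOR (pixel-budget clamping ignored)."""
    w_bar = max(FACTOR, round(width / FACTOR) * FACTOR)
    h_bar = max(FACTOR, round(height / FACTOR) * FACTOR)
    return (w_bar // FACTOR) * (h_bar // FACTOR)


for size in [(2048, 2048), (1024, 1024), (1024, 683)]:
    print(size, approx_visual_tokens(*size))
```

Under these assumptions, downscaling a 2048x2048 meme to 1024x1024 cuts the visual tokens from 5329 to 1369, roughly a 4x reduction, since halving each side quarters the area.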


In [None]:
# Cell 6: Processing Functions

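def extract_json(text):
    """Robust JSON extraction: try a fenced code block, then the outermost braces, then the raw text."""
    candidates = []
    # 1. Fenced code block, e.g. ```json { ... } ```
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        candidates.append(match.group(1))
    # 2. Outermost braces, in case the model wrapped the JSON in conversational text
    match = re.search(r"(\{.*\})", text, re.DOTALL)
    if match:
        candidates.append(match.group(1))
    # 3. The whole response as-is
    candidates.append(text)
    # Return the first candidate that parses to a JSON object
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            return data
    return None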

def analyze_image(image_path):
    """Run Qwen2.5-VL on one image and return the model's raw text response."""
    image = Image.open(image_path).convert("RGB")
    
    # Resize large images to avoid OOM
    max_dim = 1024
    if max(image.size) > max_dim:
        image.thumbnail((max_dim, max_dim))

    messages = [
        {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
        {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": "Analyze this."}]}
    ]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_new_tokens=config.MAX_NEW_TOKENS)
    
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]

    return output_text

print("Helper Functions Defined.")
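The greedy {.*} fallback in extract_json spans from the first "{" to the last "}", so a reply containing two JSON objects, or stray braces after the JSON, fails to parse at that step. An alternative sketch uses the standard library's json.JSONDecoder.raw_decode, which decodes one JSON value starting at a given index and ignores whatever follows it. extract_first_json_object is a hypothetical name, not part of the original pipeline.

```python
import json


def extract_first_json_object(text: str):
    """Return the first JSON object that decodes cleanly from `text`, or None.
    raw_decode is attempted at every '{', so surrounding prose, code fences,
    or a second object after the first do not break parsing."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj  # decoding from a '{' always yields a dict
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


reply = 'Sure! ```json\n{"Humor": "Ironic", "Political_Intensity": "Low"}\n``` Hope this helps {x}'
print(extract_first_json_object(reply))
```

It could serve as an extra candidate inside extract_json or replace the greedy regex step; nested objects are handled because raw_decode parses the full JSON grammar rather than matching braces with a regex.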

# 6. Execution Pipeline
The main loop processes images sequentially. It includes:


* Checkpointing: Saves progress every 10 images.
* Garbage Collection: Runs gc.collect() and clears the CUDA cache before each image to keep GPU memory usage stable.
* Data Mapping: Maps the JSON output to the CSV columns required for submission.
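The data mapping copies the model's Humor, Metaphor, and Political_Intensity values verbatim, so a response such as "satirical" or "HIGH" would reach the CSV with non-standard spelling. A small normalization step can snap these fields onto the exact label sets defined in the system prompt. The sketch below is hypothetical (normalize_labels is not part of the original code); its fallback values mirror the defaults the main loop already uses.

```python
# Allowed labels, copied from the system prompt; defaults match the main loop's .get() fallbacks.
ALLOWED = {
    "Humor": ["Mockery", "Sarcastic", "Ironic", "Satirical", "Other"],
    "Metaphor": ["Both", "Text", "Image"],
    "Political_Intensity": ["High", "Moderate", "Low"],
}
DEFAULTS = {"Humor": "Other", "Metaphor": "Both", "Political_Intensity": "Low"}


def normalize_labels(entry: dict) -> dict:
    """Return a copy of `entry` whose categorical fields use the exact allowed spelling.
    Matching ignores case, surrounding whitespace, and quote characters; values that
    match no allowed label fall back to DEFAULTS."""
    fixed = dict(entry)
    for field, options in ALLOWED.items():
        raw = str(entry.get(field, "")).strip().strip("'\"").lower()
        lookup = {opt.lower(): opt for opt in options}
        fixed[field] = lookup.get(raw, DEFAULTS[field])
    return fixed


print(normalize_labels({"Image_name": "1.jpg", "Humor": " satirical ",
                        "Metaphor": "IMAGE", "Political_Intensity": "'high'"}))
```

Because unknown values fall back to defaults rather than raising, every row still lands in the CSV; logging the raw value before normalizing would make such fallbacks easy to audit.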


In [None]:
# Cell 7: Process Images & Generate CSV

import gc 

def main():
    image_files = sorted([f for f in os.listdir(config.IMAGE_DIR) if f.lower().endswith(('.jpg', '.png', '.jpeg'))])
    results = []
    
    print(f"Found {len(image_files)} images.")

    # Resume from checkpoint
    if os.path.exists(config.CHECKPOINT_CSV):
        print("Resuming from checkpoint...")
        df_checkpoint = pd.read_csv(config.CHECKPOINT_CSV)
        results = df_checkpoint.to_dict('records')
        processed_files = set(df_checkpoint['Image_name'])
        image_files = [f for f in image_files if f not in processed_files]

    for idx, img_name in enumerate(tqdm(image_files)):
        img_path = os.path.join(config.IMAGE_DIR, img_name)
        
        try:
            # Memory management: free unreferenced Python objects first,
            # then release cached CUDA blocks back to the driver
            gc.collect()
            torch.cuda.empty_cache()

            raw_response = analyze_image(img_path)
            data = extract_json(raw_response)
            
            if not data:
                entry = {
                    "Image_name": img_name,
                    "Humor": "Other",
                    "Metaphor": "Both",
                    "Meme_Explanation": "Error parsing response.",
                    "Metaphor_Object": "Unknown",
                    "Political_Intensity": "Low"
                }
            else:
                entry = {
                    "Image_name": img_name,
                    "Humor": data.get("Humor", "Other"),
                    "Metaphor": data.get("Metaphor", "Both"),
                    "Meme_Explanation": data.get("Meme_Explanation", ""),
                    "Metaphor_Object": data.get("Metaphor_Object", ""),
                    "Political_Intensity": data.get("Political_Intensity", "Low")
                }
            
            # Print for verification
            print(f"\n[Result for {img_name}]")
            print(f"Object: {entry['Metaphor_Object']} | Intensity: {entry['Political_Intensity']}")
            print("-" * 40)

            results.append(entry)

            # Save a checkpoint after every 10th processed image (idx starts at 0)
            if (idx + 1) % 10 == 0:
                pd.DataFrame(results).to_csv(config.CHECKPOINT_CSV, index=False, encoding='utf-8-sig')

        except Exception as e:
            print(f"Error processing {img_name}: {e}")
            torch.cuda.empty_cache()
            continue

    # Final Save with Correct Columns
    final_df = pd.DataFrame(results)
    
    # Target Columns
    cols = ["Image_name", "Humor", "Metaphor", "Meme_Explanation", "Metaphor_Object", "Political_Intensity"]
    
    # Ensure all columns exist
    for col in cols:
        if col not in final_df.columns:
            final_df[col] = ""
            
    final_df = final_df[cols]
    final_df.to_csv(config.OUTPUT_CSV, index=False, encoding='utf-8-sig')
    print(f"Analysis Complete. Saved to {config.OUTPUT_CSV}")

if __name__ == "__main__":
    main()
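Images that raise an exception inside the loop are printed and skipped, so the final CSV can silently end up with fewer rows than there are images. A quick post-run check compares the written rows against the expected file list. check_submission is a hypothetical helper; in the notebook, rows could come from pd.read_csv(config.OUTPUT_CSV).to_dict("records") and the expected names from the same sorted directory listing main() uses.

```python
from collections import Counter


def check_submission(rows, expected_names):
    """Compare result rows (dicts with an 'Image_name' key) against the expected image names.
    Returns (missing, duplicated): names never written, and names written more than once."""
    counts = Counter(row["Image_name"] for row in rows)
    missing = sorted(set(expected_names) - set(counts))
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return missing, duplicated


rows = [{"Image_name": "a.jpg"}, {"Image_name": "b.jpg"}, {"Image_name": "b.jpg"}]
missing, duplicated = check_submission(rows, ["a.jpg", "b.jpg", "c.jpg"])
print("missing:", missing, "duplicated:", duplicated)
```

Because skipped images never reach the checkpoint either, re-running the cell retries only the missing files.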

## Notebook Summary: Automated Political Meme Analysis with Qwen2.5-VL

1. Automated Multimodal Analysis: The notebook implements an end-to-end pipeline that uses the Qwen2.5-VL-7B-Instruct model to interpret Bangladeshi political memes. For each meme it produces a free-text explanation and classifies the humor type (e.g., Satirical, Ironic), the metaphor modality (text, image, or both), the metaphor's target, and the political intensity.
2. Hardware Optimization: To fit within the 16GB VRAM of Kaggle's Tesla T4 GPUs, the model is loaded with 4-bit (NF4) quantization via bitsandbytes, which cuts weight memory from roughly 15GB to roughly 6GB while keeping the model usable for visual reasoning.
3. Structured Output Generation: A system prompt with classification rules and few-shot examples steers the model toward a fixed JSON schema, so responses can be parsed directly into CSV columns such as Metaphor_Object and Political_Intensity; regex-based fallbacks and default values cover responses that do not parse.
4. Robust Error Management: Large images are downscaled (max 1024px), and the loop runs gc.collect() and torch.cuda.empty_cache() before each image to reduce the risk of CUDA Out-of-Memory (OOM) errors. Progress is checkpointed every 10 images, so an interrupted run can resume from the last checkpoint.