# Running KiVA Benchmark with LLaVA

This notebook sets up and runs the KiVA benchmark using the LLaVA (Large Language and Vision Assistant) model.

Make sure you're running this notebook with a GPU runtime in Colab:
- Runtime > Change runtime type > GPU

## 1. Setup and Dependencies

First, let's install the required packages and clone the repository.

In [None]:
# Install required packages
!pip install torch torchvision pillow
!pip install bitsandbytes accelerate
# Install transformers from the main branch (development version)
!pip install git+https://github.com/huggingface/transformers.git

In [None]:
# Clone the KiVA repository
!git clone https://github.com/VHKoisa/kiva-challenge.git
%cd kiva-challenge
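Before moving on, it can help to confirm that the clone succeeded and that the working directory is the repository root. The following is a minimal sketch: the required file is taken from the import used later in this notebook, and `missing_files` is a hypothetical helper, not part of the KiVA repository.

```python
from pathlib import Path

# Path derived from the later import `chat_systems.chat_system_single_image_kiva`
REQUIRED_FILES = ["chat_systems/chat_system_single_image_kiva.py"]


def missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


# Check the current working directory (expected to be kiva-challenge after %cd)
missing = missing_files(".")
if missing:
    print(f"Missing files, check the clone and working directory: {missing}")
else:
    print("Repository layout looks as expected.")
```

If the check reports missing files, re-running the clone cell or the `%cd` command is usually the first thing to try.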

## 2. Import Required Modules and Setup Model

Now we'll import the necessary modules and set up the LLaVA model.

In [None]:
import os
import gc

# Configure the CUDA caching allocator before torch is imported so the setting is picked up
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

import torch
from transformers import LlavaForConditionalGeneration, AutoProcessor
from PIL import Image

# Release any cached GPU memory left over from earlier cells
torch.cuda.empty_cache()

# The attention backend is chosen with the `attn_implementation` argument of
# from_pretrained(), not through environment variables, so none are set here.

# Check if CUDA is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Print available GPU memory
if torch.cuda.is_available():
    print(f"Total GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"Allocated GPU memory: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
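Note that `torch.cuda.memory_allocated()` reports memory held by tensors in this process, not the memory still free on the device. As a rough sketch, free memory can be read with `torch.cuda.mem_get_info()`, which returns a `(free_bytes, total_bytes)` tuple. The helper `format_gpu_memory` is a hypothetical name introduced here for illustration, and the fallback numbers are sample values only.

```python
def format_gpu_memory(free_bytes, total_bytes):
    """Return a one-line summary of free, used, and total memory in GB."""
    used_bytes = total_bytes - free_bytes
    return (
        f"free {free_bytes / 1e9:.2f} GB | "
        f"used {used_bytes / 1e9:.2f} GB | "
        f"total {total_bytes / 1e9:.2f} GB"
    )


try:
    import torch

    if torch.cuda.is_available():
        # mem_get_info() returns (free_bytes, total_bytes) for the current device
        print(format_gpu_memory(*torch.cuda.mem_get_info()))
    else:
        print("No CUDA device available.")
except ImportError:
    # torch is not installed here; print sample values to show the format
    print(format_gpu_memory(12_000_000_000, 16_000_000_000))
```

Running this before and after loading the model gives a quick view of how much headroom the model leaves.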

In [None]:
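from transformers import BitsAndBytesConfig

def load_llava_model():
    # Use the 7B model instead of the 13B one to reduce memory requirements
    model_id = "llava-hf/llava-1.5-7b-hf"
    
    # Load model with memory optimizations.
    # No max_memory cap is set: the 8-bit 7B weights need roughly 7 GB, so a
    # 4 GB cap would push layers onto the CPU, which 8-bit loading does not
    # support by default.
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # Non-quantized modules stay in half precision
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization via bitsandbytes
        device_map="auto",  # Let accelerate place the weights automatically
        low_cpu_mem_usage=True
    )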
    
    processor = AutoProcessor.from_pretrained(model_id)
    processor.image_processor.do_center_crop = False
    
    # Clear cache after loading
    torch.cuda.empty_cache()
    gc.collect()
    
    return model, processor

# Load the model
model, processor = load_llava_model()
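To see why 8-bit quantization matters on a Colab GPU, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch: it uses the nominal 7 billion parameters, counts weights only, and ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
def estimate_weight_gb(num_params, bits_per_param):
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter count; the actual checkpoint differs slightly
NOMINAL_PARAMS = 7e9

for label, bits in [("float16", 16), ("8-bit", 8)]:
    print(f"{label}: ~{estimate_weight_gb(NOMINAL_PARAMS, bits):.1f} GB")
```

Halving the bits per parameter halves the weight footprint, from about 14 GB in float16 to about 7 GB in 8-bit.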

## 3. Run KiVA Benchmark

Now we can run the KiVA benchmark using the single image format.

In [None]:
# Set up the parameters
concept = "2DRotation"  # Other options: Colour, Resize, Reflect, Counting
difficulty = "kiva"     # or "kiva-adults" for the harder version; note this value is not passed to main() below

# Import and run the chat system
from chat_systems.chat_system_single_image_kiva import main

# Run the benchmark
results = main(concept=concept, model="llava")
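Because a misspelled concept name may only fail deep inside the benchmark code, a small check could be run before calling `main`. The sketch below assumes the valid values are exactly those listed in the comments above. `validate_run_config` is a hypothetical helper and does not come from the KiVA repository.

```python
# Values taken from the comments in the parameter cell; the repository may accept others
VALID_CONCEPTS = ("2DRotation", "Colour", "Resize", "Reflect", "Counting")
VALID_DIFFICULTIES = ("kiva", "kiva-adults")


def validate_run_config(concept, difficulty):
    """Raise ValueError if the concept or difficulty is not a known option."""
    if concept not in VALID_CONCEPTS:
        raise ValueError(
            f"Unknown concept {concept!r}; expected one of {VALID_CONCEPTS}"
        )
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(
            f"Unknown difficulty {difficulty!r}; expected one of {VALID_DIFFICULTIES}"
        )


# Example: passes silently for a valid combination
validate_run_config("2DRotation", "kiva")
print("Configuration is valid.")
```

Note the British spelling `Colour`; passing `Color` would raise a `ValueError` under this check.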

## 4. View Results

The cell below prints the overall accuracy for the chosen concept, followed by the outcome of each trial.

In [None]:
# Print results summary
print(f"Results for {concept} concept:")
print(f"Accuracy: {results['accuracy']:.2f}%")
print("\nDetailed results:")
for trial in results['trials']:
    print(f"Trial {trial['id']}: {trial['result']}")
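The cell above assumes `results` is a dictionary with `accuracy` and `trials` keys. If `main` returns something else (for example `None`), a more defensive summary avoids a `TypeError` or `KeyError`. This is a sketch under that assumed structure. `summarize_results` is a hypothetical helper, and the sample data is made up purely to show the output format.

```python
def summarize_results(results, concept):
    """Build printable summary lines, tolerating missing keys or a non-dict result."""
    if not isinstance(results, dict):
        return [f"No results dictionary was returned for {concept}."]

    lines = [f"Results for {concept} concept:"]
    accuracy = results.get("accuracy")
    if accuracy is not None:
        lines.append(f"Accuracy: {accuracy:.2f}%")

    # Fall back to '?' when a trial lacks an id or result field
    for trial in results.get("trials", []):
        lines.append(f"Trial {trial.get('id', '?')}: {trial.get('result', '?')}")
    return lines


# Illustrative data only; not real benchmark output
sample = {"accuracy": 50.0, "trials": [{"id": 1, "result": "correct"}, {"id": 2}]}
print("\n".join(summarize_results(sample, "2DRotation")))
```

In the notebook, the call would be `summarize_results(results, concept)` with the real return value of `main`.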