# Gemma3 on CoreAI

The CoreAI container is started with the `start_CoreAI.sh` script, which uses `podman`. The script mounts the directory where it is located as a volume called `/iti`, and files written there have the same user ID and group ID as the calling user.


This notebook requires some libraries that are not pre-installed in the CoreAI environment, so we install them first.

## Create and Activate the Virtual Environment:
Open a terminal from within Jupyter via `File -> New -> Terminal`.
Type `bash` to get a shell compatible with the following commands.
Navigate to the project directory where you want to set up the environment, then run:

```bash
python3 -m venv --system-site-packages myvenv
source myvenv/bin/activate
pip3 install ipykernel
python -m ipykernel install --user --name=myvenv --display-name="Python (myvenv)"
```

These commands can also be run with the `./create_myvenv.sh` script located in the same folder as this notebook.

## Install Required Libraries:

Before running the following commands, load the `Python (myvenv)` kernel.

Ensure you are in the directory where the Jupyter Notebook and the `myvenv` directory are located. 
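
To confirm that a cell is really running inside `myvenv`, a quick check such as the following may help. This is a minimal sketch: `in_virtualenv` is a hypothetical helper name, and it relies only on the standard library's `sys.prefix` and `sys.base_prefix`.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If the second line reports `False`, the notebook is probably still using the default kernel rather than `Python (myvenv)`.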

In [None]:
!. ./myvenv/bin/activate; pip install -r requirements.txt

Add the virtual environment's `bin` directory to `PATH` so that the `accelerate` command can be found

In [None]:
import os
pwd = os.getcwd()
os.environ['PATH'] =  os.path.join(pwd, 'myvenv/bin') + os.pathsep + os.environ['PATH']

! echo $PATH
! which accelerate
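
Re-running the cell above prepends the same directory to `PATH` again each time. A possible alternative, sketched here under the assumption that the environment lives in `myvenv` under the current directory, adds the entry only once and checks the result from Python with `shutil.which`. The helper name `prepend_to_path` is hypothetical.

```python
import os
import shutil

def prepend_to_path(directory: str) -> None:
    """Put `directory` at the front of PATH unless it is already listed."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + [e for e in entries if e])

venv_bin = os.path.join(os.getcwd(), "myvenv", "bin")
prepend_to_path(venv_bin)

# shutil.which returns the full path of the command, or None if it is not found.
print(shutil.which("accelerate"))
```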

**Make sure `HF_HOME` is set BEFORE `transformers` is loaded**

In [None]:
import os
os.makedirs('HF_HOME', exist_ok=True)
os.environ['HF_HOME'] = 'HF_HOME'
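
The ordering matters because the Hugging Face libraries may read the cache location when they are first imported. The sketch below wraps the same steps in a hypothetical helper, `set_hf_home`, which also stores an absolute path and prints a warning if `transformers` is already loaded in the current kernel.

```python
import os
import sys

def set_hf_home(path: str = "HF_HOME") -> str:
    """Create the cache directory and export HF_HOME as an absolute path."""
    if "transformers" in sys.modules:
        # The cache location may already have been read at import time.
        print("Warning: transformers is already imported; restart the kernel first.")
    cache_dir = os.path.abspath(path)
    os.makedirs(cache_dir, exist_ok=True)
    os.environ["HF_HOME"] = cache_dir
    return cache_dir

print(set_hf_home())
```

Using an absolute path keeps the cache in one place even if a later cell changes the working directory.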

In [None]:
if 'HF_TOKEN' not in os.environ:
    raise RuntimeError("No HF_TOKEN set, will not be able to download the model")

hf_token=os.environ['HF_TOKEN']

# LLM: Large Language Model queries

## LLM: Obtain model

In [None]:
from transformers import pipeline
import torch

pipe = pipeline(
    "text-generation",
    model="google/gemma-3-4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
    token=hf_token
)


## LLM: Question Answering

In [None]:
def takeInput():
    # Keep prompting until the user enters at least three words
    while True:
        sen = input('Enter the string\n')
        if len(sen.split()) < 3:
            print("Please enter at least 3 words!")
        else:
            return sen

In [None]:
question = takeInput()

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "you are a helpful assistant"}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": f"{question}"}]
    },
]
# Only the user text is sent to the pipeline; the system message above is not used in this call
user_message = messages[1]["content"][0]["text"]

output = pipe(text_inputs=user_message, max_new_tokens=1000)
print(output[0]["generated_text"])
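
When the `text-generation` pipeline is called with a plain string, `generated_text` normally contains the prompt followed by the completion. The sketch below shows one way to keep only the completion. `completion_only` is a hypothetical helper, and the strings are stand-ins rather than real model output.

```python
def completion_only(prompt: str, generated_text: str) -> str:
    """Drop the leading prompt from generated_text when it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

# Stand-in strings, not real model output.
example_prompt = "What is the capital of France?"
example_output = "What is the capital of France?\n\nParis is the capital."
print(completion_only(example_prompt, example_output))
```

In the notebook this would be called as `completion_only(user_message, output[0]["generated_text"])`. Passing `return_full_text=False` to the pipeline call is another option that should have a similar effect.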

### When done using the LLM: free up GPU memory

Note: the same variable (`pipe`) is used for both the text and the image pipeline, so this code can be run whenever switching from one to the other.

In [None]:
# Free up the model from memory before testing the VLM (Garbage Collection)
import gc

# check memory
print(torch.cuda.memory_allocated())

del pipe

gc.collect()
torch.cuda.empty_cache()

# check memory again
print(torch.cuda.memory_allocated())
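
The raw byte counts printed above are hard to read at a glance. A small formatting helper, sketched below, may make the before and after comparison easier. `format_mib` and `report_gpu_memory` are hypothetical names. `torch.cuda.memory_allocated()` counts memory held by live tensors, while `torch.cuda.memory_reserved()` also includes memory cached by PyTorch's allocator, which is the part that `torch.cuda.empty_cache()` releases.

```python
def format_mib(num_bytes: int) -> str:
    """Format a byte count as mebibytes with one decimal place."""
    return f"{num_bytes / (1024 ** 2):.1f} MiB"

def report_gpu_memory(label: str) -> None:
    """Print allocated and reserved CUDA memory for the current device."""
    import torch  # imported lazily so format_mib works without a GPU stack
    print(
        label,
        "allocated:", format_mib(torch.cuda.memory_allocated()),
        "reserved:", format_mib(torch.cuda.memory_reserved()),
    )

print(format_mib(3 * 1024 ** 3))
```

Calling `report_gpu_memory("before")` ahead of `del pipe` and `report_gpu_memory("after")` once `torch.cuda.empty_cache()` has run would show both figures side by side.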

# VLM: Image Understanding

## VLM: Obtain model

In [None]:
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
    token=hf_token
)

## VLM: Extract information from images

Modify the prompt text to ask different questions about the image.

In [None]:
!mkdir -p data
!wget https://raw.githubusercontent.com/Infotrend-Inc/OpenAI_WebUI/refs/heads/main/assets/Infotrend_Logo.png -O data/test1.png

In [None]:
import PIL.Image  # importing PIL alone does not guarantee that the Image submodule is loaded
import base64
import io
from IPython.display import display

img_file = "data/test1.png" # the Infotrend logo downloaded in the previous cell


def base64_image(img_file):
    img_type = "png"
    img_b64 = None
    img_str = None
    img_bytes = io.BytesIO()
    with PIL.Image.open(img_file) as image:
        display(image)
        image.save(img_bytes, format=img_type)
        img_b64 = base64.b64encode(img_bytes.getvalue()).decode('utf-8')
    if img_b64 is not None:
        img_str = f"data:image/{img_type};base64,{img_b64}"
    
    if img_str is None:
        raise ValueError("No valid image data")

    return img_str

img_str = base64_image(img_file)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": { "url": img_str } },
            {"type": "text", "text": "Describe the image. What field of expertise is needed to understand it? Explain the idea behind the image."}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=1000)
print(output[0]["generated_text"][-1]["content"])
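
The string returned by `base64_image` is a data URL: a short header naming the MIME type, a comma, then the base64-encoded image bytes. The sketch below splits such a string back into its parts, which can be handy when checking what is actually sent to the model. `split_data_url` is a hypothetical helper, and the payload is a four-byte stand-in rather than a real image.

```python
import base64

def split_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into its MIME type and decoded bytes."""
    header, encoded = data_url.split(",", 1)
    # header looks like "data:image/png;base64"
    mime_type = header[len("data:"):].split(";", 1)[0]
    return mime_type, base64.b64decode(encoded)

# Stand-in payload; real image bytes would come from base64_image().
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("utf-8")
mime_type, payload = split_data_url(sample)
print(mime_type, len(payload))
```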

In [None]:
!mkdir -p data
!wget https://raw.githubusercontent.com/Infotrend-Inc/OpenAI_WebUI/refs/heads/main/assets/Screenshot-OAI_WebUI_GPT.jpg -O data/test2.png

In [None]:
img_file = 'data/test2.png'
img_str = base64_image(img_file)

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": { "url": img_str } },
            {"type": "text", "text": "Describe the image."}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=1000)
print(output[0]["generated_text"][-1]["content"])
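
The `messages` structure is the same in each image cell apart from the image and the prompt. A small helper could build it, as in the sketch below. `build_image_messages` is a hypothetical name, and the data URL shown is a placeholder, not a real image.

```python
def build_image_messages(img_str: str, prompt: str,
                         system_prompt: str = "You are a helpful assistant.") -> list:
    """Build a system message plus a user message holding one image and one prompt."""
    return [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": img_str}},
            {"type": "text", "text": prompt},
        ]},
    ]

# Placeholder data URL; in the notebook this would be base64_image(img_file).
messages = build_image_messages("data:image/png;base64,AAAA", "Describe the image.")
print(messages[1]["content"][1]["text"])
```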

### Webcam

This section requires the container to be started with the `-v /dev/video0:/dev/video0` flag.

In [None]:
import matplotlib.pyplot as plt
import cv2
import numpy as np
from IPython.display import display, Image
import ipywidgets as widgets
import threading

Capture an image and store it to disk

In [None]:
stopButton = widgets.ToggleButton(value=False, description='Stop', disabled=False, button_style='danger', tooltip='Description', icon='square')

def view(button):
    cap = cv2.VideoCapture(0)
    display_handle = display(None, display_id=True)

    data = None
    # Show frames until the Stop button is pressed
    while not button.value:
        ok, frame = cap.read()
        if not ok:
            break  # camera unavailable or frame grab failed
        data = cv2.flip(frame, 1)  # mirror the image; remove if your camera does not reverse it
        _, jpeg = cv2.imencode('.jpeg', data)
        display_handle.update(Image(data=jpeg.tobytes()))

    cap.release()
    if data is not None:
        cv2.imwrite("data/webcam.png", data)  # store the last frame shown
            
display(stopButton)
thread = threading.Thread(target=view, args=(stopButton,))
thread.start()
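
The cell above runs the capture loop in a background thread so the notebook stays responsive while the Stop button is polled. The same pattern can be tried without a camera, as in the sketch below. It is an illustration only: `capture_loop` is a hypothetical function, a `threading.Event` plays the role of the button, and a counter stands in for the frames.

```python
import threading
import time

def capture_loop(read_frame, stop_event, result):
    """Call read_frame() until stop_event is set; keep the last good frame."""
    while not stop_event.is_set():
        frame = read_frame()
        if frame is not None:
            result["last"] = frame
        time.sleep(0.01)

# Stand-in frame source; a real one would wrap cv2.VideoCapture(0).read().
counter = iter(range(1_000_000))
stop_event = threading.Event()
result = {}
worker = threading.Thread(
    target=capture_loop, args=(lambda: next(counter), stop_event, result)
)
worker.start()
time.sleep(0.1)
stop_event.set()  # plays the role of pressing the Stop button
worker.join()
print("last frame:", result.get("last"))
```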

In [None]:
img_file = "data/webcam.png"
img_str = base64_image(img_file)
    
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": { "url": img_str } },
            {"type": "text", "text": "Describe the image"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=1000)
print(output[0]["generated_text"][-1]["content"])