# OCI Multimodal Vision LLM Step-by-step

### What this file does:
Demonstrates multimodal (image+text) prompts with Oracle Cloud's Generative AI, using only the OCI Python SDK (no LangChain needed).

**Documentation to reference:**
- OCI Gen AI: https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
- OCI Python SDK: https://github.com/oracle/oci-python-sdk/tree/master/src/oci/generative_ai_inference

**Relevant Slack channels:**
- #generative-ai-users: for questions on OCI Gen AI
- #igiu-innovation-lab: general discussions on your project
- #igiu-ai-learning: help with the sandbox environment or with running this code

**Env setup:**
- `sandbox.yaml`: contains the OCI config file path, profile, and compartment.
- `.env`: optional; holds environment variables that `load_dotenv()` loads.

**How to run in a notebook:**
- Make sure your runtime environment has all dependencies and access to required config files.
- Run the notebook cells in order.

---

## 1. Environment Setup

- **Dependencies:**  
  - `oci`
  - `python-dotenv`
  - `envyaml`
- **Config files:** `sandbox.yaml` (and `.env` for secrets if used)

> Install missing Python packages with:
```bash
pip install oci python-dotenv envyaml
```


In [None]:
import os
import base64
from dotenv import load_dotenv
from envyaml import EnvYAML
import oci
import time
load_dotenv()

## 2. Load OCI Configuration

Reads details from `sandbox.yaml`. Double-check your credentials and region/profile settings if you hit a permissions error.

In [None]:
# Make sure sandbox.yaml is set up for your environment. Depending on your current
# working directory (cwd), you may need to give the full path.
# To make the Jupyter cwd match your workspace root in VS Code:
# Settings > Extensions > Jupyter > Notebook File Root,
# then change ${fileDirname} to ${workspaceFolder}.

SANDBOX_CONFIG_FILE = "sandbox.yaml"

def load_config(config_path):
    try:
        return EnvYAML(config_path)
    except FileNotFoundError:
        print(f"Error: Configuration file '{config_path}' not found.")
        return None

scfg = load_config(SANDBOX_CONFIG_FILE)
if scfg is not None and 'oci' in scfg and 'configFile' in scfg['oci'] and 'profile' in scfg['oci'] and 'compartment' in scfg['oci']:
    config = oci.config.from_file(os.path.expanduser(scfg["oci"]["configFile"]), scfg["oci"]["profile"])
    compartment_id = scfg["oci"]["compartment"]
else:
    print("Error: Invalid configuration for OCI.")
    raise Exception("Check your 'sandbox.yaml'.")
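
The check above reports only that the configuration is invalid, not which key is missing. A minimal sketch of a more specific check, assuming the loaded configuration can be read like a nested dictionary; `missing_oci_keys` and `REQUIRED_OCI_KEYS` are hypothetical names, not part of the notebook or the OCI SDK:

```python
# Keys the notebook reads from the 'oci' section of sandbox.yaml.
REQUIRED_OCI_KEYS = ("configFile", "profile", "compartment")

def missing_oci_keys(cfg):
    """Return the required keys absent from cfg['oci'] (hypothetical helper)."""
    if cfg is None or "oci" not in cfg:
        return list(REQUIRED_OCI_KEYS)
    section = cfg["oci"]
    return [key for key in REQUIRED_OCI_KEYS if key not in section]

# Illustrative config with the compartment left out.
example = {"oci": {"configFile": "~/.oci/config", "profile": "DEFAULT"}}
print(missing_oci_keys(example))
```

An empty list means all three keys are present; anything else can go straight into the error message.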

## 3. Model List & Endpoint
Set which models to test and define the (region-specific) service endpoint.

In [None]:
MODEL_LIST = [
    "meta.llama-4-scout-17b-16e-instruct",
    "openai.gpt-4.1",
    "xai.grok-4",
]
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
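
The endpoint embeds the region name. A small sketch that builds the URL from a region string, following the pattern of the URL above; `inference_endpoint` is a hypothetical helper, and whether a given region actually hosts these models is an assumption to verify against the OCI Gen AI documentation:

```python
def inference_endpoint(region):
    """Build an inference URL for a region, mirroring the URL pattern used above."""
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com"

print(inference_endpoint("us-chicago-1"))
```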

## 4. Select and Show Your Image

You can change `IMAGE_PATH` below to point to a different image if you want.

In [None]:
from IPython.display import Image, display
IMAGE_PATH = 'vision/dussera-b.jpg'   # Change if you use a different image!
display(Image(filename=IMAGE_PATH))

In [None]:
def encode_image(path):
    with open(path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

IMAGE_B64 = encode_image(IMAGE_PATH)
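
Base64 turns every 3 bytes into 4 characters, so the encoded string is about a third larger than the file on disk. The sketch below estimates the encoded length before sending a large image; `base64_length` is a hypothetical helper, and any request size limit is something to check in the service documentation rather than assume:

```python
import base64
import math

def base64_length(num_bytes):
    """Length in characters of the padded Base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)

# Cross-check the formula against the real encoder on 1000 zero bytes.
sample = bytes(1000)
assert base64_length(len(sample)) == len(base64.b64encode(sample))
print(base64_length(1000))  # 1336
```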

## 5. Build the Multimodal User Message
We'll use the structure that OCI's GenerativeAI SDK expects for multimodal (text+image) inputs (as in `multi_modal.py`).

In [None]:
USER_TEXT = "Tell me about this image"  # You can change this!

def build_user_message(img_b64, text):
    content1 = oci.generative_ai_inference.models.TextContent()
    content1.text = text
    content2 = oci.generative_ai_inference.models.ImageContent()
    image_url = oci.generative_ai_inference.models.ImageUrl()
    image_url.url = f"data:image/jpeg;base64,{img_b64}"
    content2.image_url = image_url
    message = oci.generative_ai_inference.models.UserMessage()
    message.content = [content1,content2]
    return message
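
`build_user_message` hardcodes `image/jpeg` in the data URL, which is wrong for a PNG or other image type. One possible approach, sketched here with the standard-library `mimetypes` module, guesses the type from the file extension; `to_data_url` is a hypothetical helper, and which image formats each model accepts is not stated here:

```python
import base64
import mimetypes

def to_data_url(path, raw_bytes):
    """Build a data URL whose MIME type is guessed from the file extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Only the extension matters for the guess; the bytes here are placeholders.
print(to_data_url("photo.png", b"abc"))
```

The result could be assigned to `image_url.url` in place of the hardcoded f-string.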

## 6. Chat Request Utilities
Wrap up the chat message with API-friendly parameters for the request.

In [None]:
def get_chat_request(message):
    chat_request = oci.generative_ai_inference.models.GenericChatRequest()
    chat_request.messages = [message]
    chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
    chat_request.num_generations = 1
    chat_request.is_stream = False
    chat_request.max_tokens = 500
    chat_request.temperature = 0.75
    return chat_request

def get_chat_detail(llm_request, compartment_id, model_id):
    chat_detail = oci.generative_ai_inference.models.ChatDetails()
    chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id=model_id)
    chat_detail.compartment_id = compartment_id
    chat_detail.chat_request = llm_request
    return chat_detail

## 7. Initialize the LLM Client


In [None]:
llm_client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),  # no automatic retries
    timeout=(10, 240)  # (connect, read) timeouts in seconds
)
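
With `NoneRetryStrategy()`, a transient failure surfaces on the first attempt. The SDK ships its own retry strategies (for example `oci.retry.DEFAULT_RETRY_STRATEGY`); as an alternative, the generic sketch below wraps any call in exponential backoff. `call_with_retries` is a hypothetical helper, not an SDK function, and catching every `Exception` is a simplification; a narrower type such as `oci.exceptions.ServiceError` is likely a better fit in practice:

```python
import time

def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Inside the inference loop this could be used as `call_with_retries(lambda: llm_client.chat(chat_detail))`.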

## 8. Run Inference – Compare Results!
Loop through the models, send your multimodal prompt, and print their answers.

In [None]:
for model_id in MODEL_LIST:
    print("\n" + "="*80)
    print(f"RESULTS FOR MODEL: {model_id}\n" + "="*80)
    start_time = time.time()
    user_msg = build_user_message(IMAGE_B64, USER_TEXT)
    llm_payload = get_chat_request(user_msg)
    chat_detail = get_chat_detail(llm_payload, compartment_id, model_id)
    llm_response = llm_client.chat(chat_detail)
    if (
        llm_response is not None
        and hasattr(llm_response, 'data')
        and hasattr(llm_response.data, 'chat_response')
        and llm_response.data.chat_response is not None
        and hasattr(llm_response.data.chat_response, 'choices')
        and llm_response.data.chat_response.choices
    ):
        llm_text = llm_response.data.chat_response.choices[0].message.content[0].text
        print(llm_text)
    else:
        print("Error: Invalid response from LLM.")
    end_time = time.time()
    print(f"\nTime taken: {end_time - start_time:.2f} seconds\n")
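
The loop above validates the response down to `choices` but then indexes `message.content[0]` directly, which would fail on an empty content list. A minimal sketch of a helper that guards every level; `extract_text` is hypothetical, and the `SimpleNamespace` object is only a stand-in shaped like the attributes the loop reads, not a real SDK response:

```python
from types import SimpleNamespace

def extract_text(response):
    """Return the first choice's text, or None if any level is missing or empty."""
    data = getattr(response, "data", None)
    chat_response = getattr(data, "chat_response", None)
    choices = getattr(chat_response, "choices", None) or []
    if not choices:
        return None
    message = getattr(choices[0], "message", None)
    content = getattr(message, "content", None) or []
    return getattr(content[0], "text", None) if content else None

# Stand-in response for illustration only.
fake = SimpleNamespace(data=SimpleNamespace(chat_response=SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(
        content=[SimpleNamespace(text="A festival scene.")]))])))
print(extract_text(fake))
```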

## 9. Play and Explore

- Change `USER_TEXT` to ask any question about your picture.
- Swap in a different image.
- Compare models easily!


## 🧑‍💻 Project Ideas for Practice

Below are some fun project prompts. Try one (or all) after you run a basic image through the models!

1. **Business Card → vCard**
   - Upload a photo of a business card.
   - Write a prompt such as: "Extract all the contact details from this business card and output as a vCard file."
   - Post-process the model's output to save/download the .vcf file.

2. **Agenda/Schedule → Calendar File**
   - Try an image of a handwritten or printed agenda.
   - Prompt: "Read and convert this agenda into an iCalendar (.ics) file."
   - Save and import to your calendar app!

3. **Driver's License → CRM/Create Record**
   - Upload an image of a driver's license (redact sensitive fields if needed).
   - Prompt: "Extract all key customer information for a CRM record."
   - Map the LLM's result into your database or spreadsheet.
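
For the vCard and calendar ideas, the model's reply often wraps the file content in extra prose. Both formats delimit their content with `BEGIN:` and `END:` lines, so one way to post-process the reply is to pull out that block. The sketch below assumes the reply contains a well-formed block; `extract_block` is a hypothetical helper and the sample reply is made up for illustration:

```python
import re

def extract_block(text, kind):
    """Return the first BEGIN:<kind> ... END:<kind> block in text, or None."""
    name = re.escape(kind)
    match = re.search(rf"BEGIN:{name}.*?END:{name}", text, flags=re.DOTALL)
    return match.group(0) if match else None

# Made-up model reply with prose around the vCard.
reply = (
    "Sure! Here is the vCard:\n"
    "BEGIN:VCARD\nVERSION:3.0\nFN:Jane Doe\nEND:VCARD\n"
    "Let me know if you need anything else."
)
print(extract_block(reply, "VCARD"))
```

Use `"VCARD"` for business cards and `"VCALENDAR"` for agendas, then write the returned string to a `.vcf` or `.ics` file.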

If you see errors, double-check the credentials and profile in your OCI config file and the values in `sandbox.yaml`. The code comments and the documentation linked at the top cover the details.

---
**Happy building!**