# OCI + LangChain Multimodal LLM Walkthrough

**What this notebook does:**
Demonstrates multimodal large language model (LLM) capabilities using OCI Generative AI. You'll learn how to analyze images by combining text prompts with visual data, and how to compare the responses of different models.

**Documentation to reference:**
- OCI Gen AI: https://docs.oracle.com/en-us/iaas/Content/generative-ai/pretrained-models.htm
- OCI OpenAI compatible SDK: https://github.com/oracle-samples/oci-openai
- LangChain: https://docs.langchain.com/oss/python/langchain/overview

**Relevant Slack channels:**
- #generative-ai-users: *for questions on OCI Gen AI*
- #igiu-innovation-lab: *general discussions on your project*
- #igiu-ai-learning: *help with sandbox environment or help with running this code*

**How to run:**
Execute the cells in order. Requires a configured `sandbox.yaml`.

**Steps covered:**
1. Prepare the environment & configuration
2. Load and inspect the image you want the model to see
3. Build the prompt payload (text + image)
4. Send it to different OCI-hosted models via LangChain
5. Compare the responses and timings


## 1  Environment setup
**Requirements** (already in the repo):

- Create or edit `sandbox.yaml` and `.env` with your OCI environment details, such as `profile` and `compartment`.
- Configure Jupyter's working directory to match your workspace, so that `sandbox.yaml` and the `langChain` package resolve from the workspace root:
    -  VS Code menu -> Settings > Extensions > Jupyter > Notebook File Root
    -  change from `${fileDirname}` to `${workspaceFolder}`
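If the helper import or the config file fails to load, a quick check that the notebook's working directory really is the workspace root can save time. This is a minimal sketch, assuming (from the `sandbox.yaml` path and the `langChain.oci_openai_helper` import used in this notebook) that both the `langChain/` package and `sandbox.yaml` sit at the workspace root; `check_workspace_root` is a hypothetical helper, not part of the repo.

```python
from pathlib import Path

# Hypothetical helper: assumes langChain/ and sandbox.yaml live at the workspace root
EXPECTED_ENTRIES = ('langChain', 'sandbox.yaml')

def check_workspace_root(cwd=None, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are missing from `cwd` (default: current directory)."""
    root = Path(cwd) if cwd is not None else Path.cwd()
    return [name for name in expected if not (root / name).exists()]

missing = check_workspace_root()
if missing:
    print(f'{Path.cwd()} is missing {missing}; check the Notebook File Root setting.')
else:
    print('Working directory looks like the workspace root:', Path.cwd())
```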


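```python
import base64  # encode the image for the prompt
import time    # measure per-model latency
from dotenv import load_dotenv
from envyaml import EnvYAML

# Repo helper; importable when the notebook runs from the workspace root
from langChain.oci_openai_helper import OCIOpenAIHelper

load_dotenv()  # load variables from .env into the process environment
```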


## 2  Load sandbox configuration

```python
SANDBOX_CONFIG_FILE = 'sandbox.yaml'

def load_config(path):
    """Parse the sandbox config file with EnvYAML."""
    try:
        return EnvYAML(path)
    except FileNotFoundError:
        # Re-raise with a hint; `from None` suppresses the chained original exception
        raise FileNotFoundError(f"❌ '{path}' not found. Create it from sandbox.yaml.template.") from None

cfg = load_config(SANDBOX_CONFIG_FILE)
compartment_id = cfg['oci']['compartment']
profile        = cfg['oci']['profile']
print('Profile     :', profile)
print('Compartment :', compartment_id[:6] + '…')  # show only a prefix of the compartment OCID
```
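Before calling any model, it can help to fail fast when `sandbox.yaml` lacks a value. A minimal sketch, assuming only the two keys this notebook reads (`oci.compartment` and `oci.profile`); `find_missing_keys` is a hypothetical helper. It works on a plain dict and should also work on the `EnvYAML` object, which supports the same `cfg['oci']['profile']` style of lookup used above.

```python
# Hypothetical helper: the required key paths mirror what this notebook reads from cfg
REQUIRED_KEYS = [('oci', 'compartment'), ('oci', 'profile')]

def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return dotted names of required key paths that are absent or empty in `config`."""
    missing = []
    for path in required:
        node = config
        try:
            for key in path:
                node = node[key]
        except (KeyError, TypeError):
            node = None  # path does not exist
        if node is None or node == '':
            missing.append('.'.join(path))
    return missing

# Usage with the config loaded above:
# problems = find_missing_keys(cfg)
# if problems:
#     raise KeyError(f'sandbox.yaml is missing: {problems}')
```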


## 3  Select models & service endpoint

```python
MODEL_LIST = [
    'meta.llama-4-scout-17b-16e-instruct',
    'openai.gpt-4.1',
    'xai.grok-4'
]
```

## 4  Load & visualize the image

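```python
from IPython.display import Image, display

IMAGE_PATH = 'vision/dussera-b.jpg'  # relative to the notebook's working directory; adjust if needed
display(Image(filename=IMAGE_PATH))  # preview the image inline

def encode_image(path):
    """Read a file and return its contents as a base64-encoded string."""
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')

IMAGE_B64 = encode_image(IMAGE_PATH)
```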
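`make_prompt()` in the next step labels the encoded bytes as `image/jpeg`. If you switch `IMAGE_PATH` to a PNG such as `receipt.png`, a sketch like the one below derives the MIME type from the file extension with the standard-library `mimetypes` module instead. `to_data_url` is a hypothetical helper; which formats each model accepts is not covered here.

```python
import base64
import mimetypes

def to_data_url(path):
    """Encode a local image file as a base64 data URL, guessing the MIME type from its extension."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith('image/'):
        # e.g. an .mp4 guesses as video/mp4, which is not an image
        raise ValueError(f'{path!r} does not look like an image (guessed MIME type: {mime})')
    with open(path, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('utf-8')
    return f'data:{mime};base64,{encoded}'
```

The returned string can go directly into the `'url'` field of the `image_url` part, in place of the hard-coded `data:image/jpeg;base64,...` value.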


## 5  Craft the prompt (text + image)

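```python
USER_TEXT = 'Tell me about this image'

def make_prompt(img_b64, text):
    """Build a single user message that carries both the question and the image."""
    return [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': text},  # the question
            {
                'type': 'image_url',         # the image, inlined as a data URL
                'image_url': {
                    # MIME type is hard-coded to JPEG; change it for PNG and other formats
                    'url': f'data:image/jpeg;base64,{img_b64}'
                }
            }
        ]
    }]
```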
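The base64 string makes the payload far too long to print usefully. To inspect the message structure before sending it, a small helper can shorten any data URL in a copy of the messages. `preview_payload` is a hypothetical helper written for this walkthrough; it works on a deep copy, so the payload you actually send is untouched.

```python
import copy
import json

def preview_payload(messages, keep=40):
    """Return a JSON string of `messages` with long image data URLs shortened for display."""
    preview = copy.deepcopy(messages)  # never mutate the payload that will be sent
    for message in preview:
        content = message.get('content')
        if not isinstance(content, list):
            continue  # plain-text messages have nothing to shorten
        for part in content:
            if part.get('type') == 'image_url':
                url = part['image_url']['url']
                if len(url) > keep:
                    part['image_url']['url'] = url[:keep] + f'... ({len(url)} chars)'
    return json.dumps(preview, indent=2)

# Usage with the notebook's own names:
# print(preview_payload(make_prompt(IMAGE_B64, USER_TEXT)))
```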


## 6  Query each model & measure latency

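```python
messages = make_prompt(IMAGE_B64, USER_TEXT)  # build the payload once; it is the same for every model

for model in MODEL_LIST:
    print('\n' + '=' * 80)
    print('Model:', model)
    # Repo helper that returns a LangChain chat client for `model`
    llm_client = OCIOpenAIHelper.get_client(
        model_name=model,
        config=cfg,
    )

    start = time.perf_counter()  # monotonic clock, suited to measuring elapsed time
    response = llm_client.invoke(messages)
    print('Answer:', response.content)
    print('⏱ {:.2f}s'.format(time.perf_counter() - start))
print('\nDone!')
```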
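The loop above stops at the first model that raises an error (for example, an authorization or availability error), and the timings scroll past as they print. A sketch that keeps going and returns a summary sorted by latency is below. It assumes only what the loop above already relies on: each client has `.invoke(messages)` returning an object with `.content`. `compare_models` and its `get_client` callable are hypothetical names.

```python
import time

def compare_models(models, get_client, messages):
    """Invoke each model once; return (model, seconds, answer_or_error) rows, fastest first."""
    results = []
    for model in models:
        start = time.perf_counter()
        try:
            client = get_client(model)
            start = time.perf_counter()  # time only the model call, like the loop above
            outcome = client.invoke(messages).content
        except Exception as exc:  # record the failure and move on to the next model
            outcome = f'ERROR: {type(exc).__name__}: {exc}'
        results.append((model, time.perf_counter() - start, outcome))
    return sorted(results, key=lambda row: row[1])

# Usage with the notebook's objects:
# rows = compare_models(
#     MODEL_LIST,
#     lambda m: OCIOpenAIHelper.get_client(model_name=m, config=cfg),
#     make_prompt(IMAGE_B64, USER_TEXT),
# )
# for model, secs, text in rows:
#     print(f'{secs:6.2f}s  {model:<40} {str(text)[:80]!r}')
```

For a failed model the recorded time is the time until the error, so treat those rows as failures rather than as fast responses.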


## 7  Experiment and explore

**Safe ways to experiment:**
- Change `USER_TEXT` to ask different questions: "What colors do you see?", "Describe the emotions in this scene", "Count the number of people"
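- Try different `IMAGE_PATH` values, e.g., './langChain/vision/receipt.png'. The sample './langChain/vision/people_walking.mp4' is a video, not an image: `display(Image(...))` cannot embed it, so for video analysis extract frames and send them as images (see Video Frame Analysis below)
- Add more models to `MODEL_LIST` or remove some to compare fewer
- Modify the prompt structure in the `make_prompt()` function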
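One way to modify the prompt structure is to send several images in one message, for example to ask a model to compare two scenes. The sketch below generalizes `make_prompt()` using the same OpenAI-style content parts (one `text` part followed by `image_url` parts). `make_multi_image_prompt` is a hypothetical name, and this notebook does not state how many images each model accepts per request, so check the model documentation.

```python
def make_multi_image_prompt(text, data_urls):
    """Build one user message: a text part followed by one image part per data URL."""
    content = [{'type': 'text', 'text': text}]
    content += [{'type': 'image_url', 'image_url': {'url': url}} for url in data_urls]
    return [{'role': 'user', 'content': content}]

# Usage with the notebook's encoded image (add more data URLs to compare images):
# messages = make_multi_image_prompt(
#     'What differs between these images?',
#     [f'data:image/jpeg;base64,{IMAGE_B64}'],
# )
```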

## üßë‚Äçüíª Practice Exercises and Discussion Prompts

**Beginner Exercises:**
1. **Image Description Variations** - Change the prompt to focus on different aspects (colors, objects, emotions)
2. **Model Comparison** - Run the same image through all models and note differences in responses
3. **Timing Analysis** - Which model is fastest? Why might that be?
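A single timing is noisy (network and service load vary between calls), so for the timing exercise it can help to repeat each call a few times and compare medians. A minimal sketch using `time.perf_counter` and `statistics.median` from the standard library; `time_repeated` is a hypothetical helper, and each repeat sends another request to the service.

```python
import statistics
import time

def time_repeated(call, repeats=3):
    """Run `call()` `repeats` times; return (median_seconds, list_of_seconds)."""
    if repeats < 1:
        raise ValueError('repeats must be at least 1')
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), durations

# Usage inside the model loop (each repeat is a new request):
# median_s, runs = time_repeated(lambda: llm_client.invoke(make_prompt(IMAGE_B64, USER_TEXT)))
```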

**Intermediate Projects:**
1. **Business Card → vCard**
   - Upload a photo of a business card
   - Prompt: "Extract all contact details and format as vCard"
   - Save the output as a .vcf file

2. **Document Analysis**
   - Use receipt.png or other documents
   - Extract structured data (prices, dates, items)

3. **Video Frame Analysis**
   - Extract frames from a video such as people_walking.mp4 and send each frame as an image
   - Ask about activities, objects, or scene descriptions
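For the extraction projects, models often wrap their answer in prose or Markdown code fences, so the raw `response.content` is rarely a clean file. The sketch below pulls out the first vCard block (for the `.vcf` exercise) or a JSON object (for receipt data). It assumes your prompt asked for vCard or JSON output; the helper names and the default `contact.vcf` filename are hypothetical, and the JSON extraction is a simple heuristic rather than a general parser.

```python
import json
import re
from pathlib import Path

VCARD_RE = re.compile(r'BEGIN:VCARD.*?END:VCARD', re.DOTALL | re.IGNORECASE)

def extract_vcard(text):
    """Return the first BEGIN:VCARD ... END:VCARD block in `text`, or None."""
    match = VCARD_RE.search(text)
    return match.group(0) if match else None

def extract_json(text):
    """Heuristic: parse the span from the first '{' to the last '}' as JSON (None if absent).

    Raises json.JSONDecodeError if that span is not valid JSON.
    """
    start, end = text.find('{'), text.rfind('}')
    if start == -1 or end < start:
        return None
    return json.loads(text[start:end + 1])

def save_vcard(text, path='contact.vcf'):
    """Write the extracted vCard with CRLF line breaks (as RFC 6350 specifies); return the Path or None."""
    card = extract_vcard(text)
    if card is None:
        return None
    # write_bytes avoids platform newline translation, so the CRLFs are kept exactly
    Path(path).write_bytes(('\r\n'.join(card.splitlines()) + '\r\n').encode('utf-8'))
    return Path(path)

# Usage after querying a model with a business-card or receipt image:
# save_vcard(response.content)
# receipt = extract_json(response.content)
```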

**Discussion Questions:**
- How do different models interpret the same image?
- What are the trade-offs between speed and quality?
- How might you use this for real-world applications?

**Next Steps:**
- Check out the corresponding Python script for a non-interactive version
- Explore other vision capabilities in the OCI documentation

If you encounter errors, verify your sandbox.yaml configuration and API access.

---
**Happy experimenting!**
