# LangChain Integration with Nova 2 Omni - Multimodal Reasoning

This notebook demonstrates how to use Amazon Nova 2 Omni with LangChain tool definitions and direct boto3 calls. Tool schemas are defined with LangChain and Pydantic, converted to Bedrock's tool format, and sent through boto3's `converse` call, which lets each request carry a reasoning configuration.

**Key Features:**
- Multimodal input processing (image, video, audio)
- Reasoning effort configuration (low, medium, high)
- LangChain tool definitions with Pydantic schemas
- Direct boto3 calls for full API control

---

## Setup and Installation

In [None]:
# Install required packages
!pip install langchain langchain-aws -q

In [None]:
import json
import base64
import boto3
from typing import Literal
from botocore.config import Config
from botocore.exceptions import ClientError
from langchain_core.tools import tool
from pydantic import BaseModel, Field

import nova_utils

MODEL_ID = "us.amazon.nova-2-omni-v1:0"
REGION_ID = "us-west-2"

def get_bedrock_runtime():
    config = Config(read_timeout=2 * 60)
    return boto3.client(
        service_name="bedrock-runtime",
        region_name=REGION_ID,
        config=config
    )
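
The `nova_utils` module imported above is not shown in this notebook. For readers who do not have it, the following is a minimal stand-in for `load_image_as_bytes`, assuming the helper simply returns the raw file bytes together with a format string derived from the file extension. The extension-to-format mapping is an assumption for illustration and may differ from what the real helper does.

```python
from pathlib import Path

# Hypothetical stand-in for nova_utils.load_image_as_bytes.
# The real helper is not shown in this notebook; this mapping is an assumption.
_EXT_TO_FORMAT = {
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".gif": "gif",
    ".webp": "webp",
}

def load_image_as_bytes(image_path):
    """Return (raw_bytes, format_string) for an image file on disk."""
    path = Path(image_path)
    suffix = path.suffix.lower()
    if suffix not in _EXT_TO_FORMAT:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    return path.read_bytes(), _EXT_TO_FORMAT[suffix]
```

If this stand-in is used, call `load_image_as_bytes(image_path)` directly instead of `nova_utils.load_image_as_bytes(image_path)` in the examples.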

## Define Tools with LangChain

Use LangChain to define tool schemas, then convert them to Bedrock format.

In [None]:
class SafetyAssessmentInput(BaseModel):
    """Input schema for safety risk assessment."""
    identified_hazards: list[str] = Field(description="List of potential hazards or risks")
    risk_level: Literal["low", "medium", "high", "critical"] = Field(
        description="Overall risk level assessment"
    )
    recommended_actions: list[str] = Field(
        description="List of recommended safety actions or precautions"
    )

@tool(args_schema=SafetyAssessmentInput)
def assess_safety_risks(identified_hazards: list[str], risk_level: str, recommended_actions: list[str]) -> dict:
    """Assess safety risks and hazards in a scene or situation.
    
    Use this tool to identify potential dangers, evaluate risk levels,
    and recommend appropriate safety measures or precautions.
    """
    return {
        "status": "assessment_complete",
        "hazards": identified_hazards,
        "risk_level": risk_level,
        "actions": recommended_actions
    }

class RecipeExtractionInput(BaseModel):
    """Input schema for extracting recipe information."""
    dish_name: str = Field(description="Name of the dish being prepared")
    ingredients: list[str] = Field(description="List of ingredients used")
    steps: list[str] = Field(description="Ordered list of preparation steps")
    cooking_time: str = Field(description="Estimated total cooking time")
    difficulty: Literal["easy", "medium", "hard"] = Field(
        description="Difficulty level of the recipe"
    )

@tool(args_schema=RecipeExtractionInput)
def extract_recipe(dish_name: str, ingredients: list[str], steps: list[str], cooking_time: str, difficulty: str) -> dict:
    """Extract structured recipe information from cooking videos or images.
    
    Use this tool to parse cooking demonstrations and create structured
    recipe data including ingredients, steps, timing, and difficulty.
    """
    return {
        "status": "recipe_extracted",
        "dish": dish_name,
        "ingredients": ingredients,
        "steps": steps,
        "time": cooking_time,
        "difficulty": difficulty
    }

# Convert LangChain tool to Bedrock format
def langchain_tool_to_bedrock(lc_tool):
    schema = lc_tool.args_schema.model_json_schema()
    return {
        "toolSpec": {
            "name": lc_tool.name,
            "description": lc_tool.description,
            "inputSchema": {
                "json": schema
            }
        }
    }

safety_tools = [langchain_tool_to_bedrock(assess_safety_risks)]
recipe_tools = [langchain_tool_to_bedrock(extract_recipe)]
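
To inspect the envelope that `langchain_tool_to_bedrock` produces without installing LangChain, the same wrapping can be applied to a hand-written JSON Schema. This is only a sketch: `make_tool_spec`, `example_schema`, and the `rate_risk` tool name are hypothetical and exist for illustration, and the schema Pydantic generates will contain additional keys such as titles.

```python
import json

# Hand-written JSON Schema (illustrative only; not generated by Pydantic).
example_schema = {
    "type": "object",
    "properties": {
        "risk_level": {
            "type": "string",
            "enum": ["low", "medium", "high", "critical"],
        },
    },
    "required": ["risk_level"],
}

def make_tool_spec(name, description, schema):
    """Wrap a name, description, and JSON Schema in the toolSpec envelope used above."""
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": schema},
        }
    }

# Hypothetical tool name and description, used only to show the resulting structure.
example_tool = make_tool_spec("rate_risk", "Rate the risk level of a scene.", example_schema)
print(json.dumps(example_tool, indent=2))
```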

---

## Example 1: Image Understanding with Medium Reasoning

Analyze an image using medium reasoning effort.

In [None]:
# Load image
image_path = "media/man_crossing_street.png"
image_bytes, image_format = nova_utils.load_image_as_bytes(image_path)

# Create request with reasoning config
request = {
    "modelId": MODEL_ID,
    "messages": [
        {
            "role": "user",
            "content": [
                {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
                {"text": "Analyze this image for safety risks. Identify any hazards, assess the overall risk level, and recommend appropriate safety actions. Use the assess_safety_risks tool to provide your assessment."}
            ]
        }
    ],
    "toolConfig": {"tools": safety_tools},
    "additionalModelRequestFields": {
        "reasoningConfig": {
            "type": "enabled",
            "maxReasoningEffort": "medium"
        }
    }
}

bedrock = get_bedrock_runtime()
response = bedrock.converse(**request)

print("=== Image Analysis Response ===")
for content in response["output"]["message"]["content"]:
    if "text" in content:
        print(f"Text: {content['text']}")
    elif "toolUse" in content:
        tool_use = content["toolUse"]
        print(f"\nTool: {tool_use['name']}")
        print(f"Arguments: {json.dumps(tool_use['input'], indent=2)}")
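
The loop above prints each content block as it goes. When the tool arguments are needed later, it can help to collect them first. The sketch below assumes the response has the same `output` / `message` / `content` layout that the loop reads; `extract_tool_calls` is a hypothetical helper name, and `mock_response` is hand-built rather than real model output.

```python
def extract_tool_calls(response):
    """Collect (name, arguments) pairs from every toolUse block in a response."""
    message = response.get("output", {}).get("message", {})
    calls = []
    for block in message.get("content", []):
        if "toolUse" in block:
            tool_use = block["toolUse"]
            calls.append((tool_use["name"], tool_use.get("input", {})))
    return calls

# Hand-built response with the same shape as the one read above (not real model output).
mock_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Placeholder text block."},
                {
                    "toolUse": {
                        "toolUseId": "placeholder-id",
                        "name": "assess_safety_risks",
                        "input": {
                            "identified_hazards": ["placeholder hazard"],
                            "risk_level": "medium",
                            "recommended_actions": ["placeholder action"],
                        },
                    }
                },
            ],
        }
    }
}

for name, arguments in extract_tool_calls(mock_response):
    print(name, sorted(arguments))
```

Because each pair holds the arguments as a plain dict, they can be validated against the Pydantic schema or passed to the tool function afterwards.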

---

## Example 2: Video Analysis with High Reasoning

Analyze video content with high reasoning effort.

In [None]:
# Load video
video_path = "media/Cheesecake.mp4"
with open(video_path, "rb") as f:
    video_bytes = f.read()

request = {
    "modelId": MODEL_ID,
    "messages": [
        {
            "role": "user",
            "content": [
                {"video": {"format": "mp4", "source": {"bytes": video_bytes}}},
                {"text": "Watch this cooking video and extract the complete recipe. Identify the dish name, all ingredients used, step-by-step instructions, total cooking time, and difficulty level. Use the extract_recipe tool to provide the structured recipe data."}
            ]
        }
    ],
    "toolConfig": {"tools": recipe_tools},
    "additionalModelRequestFields": {
        "reasoningConfig": {
            "type": "enabled",
            "maxReasoningEffort": "high"
        }
    }
}

bedrock = get_bedrock_runtime()
response = bedrock.converse(**request)

print("=== Video Analysis Response ===")
for content in response["output"]["message"]["content"]:
    if "text" in content:
        print(f"Text: {content['text']}")
    elif "toolUse" in content:
        tool_use = content["toolUse"]
        print(f"\nTool: {tool_use['name']}")
        print(f"Arguments: {json.dumps(tool_use['input'], indent=2)}")
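
The image and video requests share the same structure and differ only in their content blocks, tool list, and reasoning effort. One way to avoid repeating the dictionary is a small builder. This is a sketch, not part of the original notebook: `build_request` and `VALID_EFFORTS` are hypothetical names, and the accepted effort values are taken from the list at the top of this notebook.

```python
# Effort levels listed in this notebook's Key Features section.
VALID_EFFORTS = ("low", "medium", "high")

def build_request(model_id, content_blocks, tools, effort="medium"):
    """Assemble a request dict with the same layout as the examples above."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": content_blocks}],
        "toolConfig": {"tools": tools},
        "additionalModelRequestFields": {
            "reasoningConfig": {"type": "enabled", "maxReasoningEffort": effort}
        },
    }

# Placeholder values; in the notebook these would be MODEL_ID, real content blocks, and recipe_tools.
demo_request = build_request(
    "placeholder-model-id",
    [{"text": "Placeholder prompt."}],
    [],
    effort="high",
)
print(demo_request["additionalModelRequestFields"])
```

The returned dict can be passed to `bedrock.converse(**request)` in the same way as the hand-written ones.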

---

## Example 3: MMMU-Style Multiple Choice

Pose an MMMU-style multiple-choice question about an image and collect the model's answer through a tool call.

In [None]:
class MultipleChoiceAnswerInput(BaseModel):
    """Input schema for multiple choice answer submission."""
    selected_option: Literal["A", "B", "C", "D"] = Field(
        description="The selected answer option (A, B, C, or D)"
    )
    reasoning_steps: str = Field(
        description="Step-by-step reasoning that led to this answer"
    )

@tool(args_schema=MultipleChoiceAnswerInput)
def submit_multiple_choice_answer(selected_option: str, reasoning_steps: str) -> dict:
    """Submit the final answer for a multiple choice question.
    
    Use this tool after carefully analyzing the question and all options.
    Provide your reasoning steps before selecting the final answer.
    """
    return {
        "status": "submitted",
        "answer": selected_option,
        "reasoning": reasoning_steps
    }

mmmu_bedrock_tools = [langchain_tool_to_bedrock(submit_multiple_choice_answer)]

In [None]:
# Load image
image_path = "media/man_crossing_street.png"
image_bytes, image_format = nova_utils.load_image_as_bytes(image_path)

question = """Based on the image, what is the most appropriate action for the person to take?

A) Continue walking without looking
B) Check for traffic before crossing
C) Run across the street quickly
D) Wait for a green light signal

Use the submit_multiple_choice_answer tool to provide your answer."""

request = {
    "modelId": MODEL_ID,
    "messages": [
        {
            "role": "user",
            "content": [
                {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
                {"text": question}
            ]
        }
    ],
    "toolConfig": {"tools": mmmu_bedrock_tools},
    "additionalModelRequestFields": {
        "reasoningConfig": {
            "type": "enabled",
            "maxReasoningEffort": "medium"
        }
    }
}

bedrock = get_bedrock_runtime()
response = bedrock.converse(**request)

print("=== MMMU-Style Question Response ===")
for content in response["output"]["message"]["content"]:
    if "toolUse" in content:
        tool_use = content["toolUse"]
        args = tool_use["input"]
        print(f"Selected Option: {args.get('selected_option')}")
        print(f"Reasoning: {args.get('reasoning_steps')}")
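
The cell above handles a single question. For an evaluation over several questions, the selected options would be compared with an answer key. The sketch below shows one simple way to compute accuracy; `score_answers` is a hypothetical helper, and the question IDs, predictions, and answer key are invented placeholders, not results from the model.

```python
def score_answers(predictions, answer_key):
    """Return accuracy over answer_key; missing or None predictions count as wrong."""
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(
        1 for question_id, gold in answer_key.items()
        if predictions.get(question_id) == gold
    )
    return correct / len(answer_key)

# Invented placeholder data for illustration only.
predictions = {"q1": "B", "q2": "D", "q3": None}  # None: no tool call was returned
answer_key = {"q1": "B", "q2": "A", "q3": "C"}

accuracy = score_answers(predictions, answer_key)
print(f"Accuracy: {accuracy:.2f}")
```

Counting a missing tool call as wrong keeps the denominator fixed, so runs with different failure rates stay comparable.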

---

## Key Takeaways

- **Hybrid Approach**: Define tools with LangChain and Pydantic, then call Bedrock through boto3 so the request can carry a reasoning configuration
- **Reasoning Effort**: Set `maxReasoningEffort` to `low`, `medium`, or `high` based on task complexity
- **Tool Conversion**: Convert LangChain tools to Bedrock format with `langchain_tool_to_bedrock`
- **Full API Control**: Direct boto3 calls expose request fields such as `additionalModelRequestFields`, which carries the reasoning configuration
- **Temperature Settings**: Use 0.0-0.1 for factual tasks (the examples in this notebook do not set a temperature)
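
The temperature recommendation is not exercised by the examples in this notebook. As a sketch, a temperature could be added to a request through the Converse `inferenceConfig` field. `with_temperature` is a hypothetical helper, and whether sampling parameters may be combined with a given reasoning effort is an assumption to verify against the model documentation.

```python
import copy

def with_temperature(request, temperature):
    """Return a copy of a request dict with inferenceConfig.temperature set."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    updated = copy.deepcopy(request)  # leave the caller's request untouched
    updated.setdefault("inferenceConfig", {})["temperature"] = temperature
    return updated

# Placeholder request; in the notebook this would be one of the request dicts built earlier.
base_request = {"modelId": "placeholder-model-id", "messages": []}
factual_request = with_temperature(base_request, 0.1)
print(factual_request["inferenceConfig"])
```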

## Next Steps

- Explore **05_langgraph_multimodal_reasoning.ipynb** for stateful workflows
- Check **06_strands_multimodal_reasoning.ipynb** for multi-agent patterns