# LangChain Tutorial

## `01` Install Packages

In [4]:
# !pip install langchain-google-genai==2.1.7
# !pip install langchain-community==0.3.27
# !pip install getpass4==0.0.14.1

## `02` Importing Packages

In [21]:
# import getpass  # Prompts the user for the API key if you prefer not to store it in an environment variable.
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from IPython.display import Markdown, display, Image
import base64

from langchain_core.messages import HumanMessage

In [2]:
def display_markdown(response: str) -> None:
    """Display text as Markdown in Jupyter Notebook."""
    display(Markdown(response))

## `03` Setup API Key

In [3]:
load_dotenv(".env")

True

In [4]:
GEMINI_API_KEY= os.getenv("GEMINI_API_KEY")
GEMINI_MODEL= os.getenv("GEMINI_MODEL", "gemini-1.5-flash")

In [5]:
if not GEMINI_API_KEY:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not GEMINI_API_KEY.startswith("AIzaSy"):
    print("An API key was found, but it doesn't start with AIzaSy; please check you're using the right key - see troubleshooting notebook")
elif GEMINI_API_KEY.strip() != GEMINI_API_KEY:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

API key found and looks good so far!
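The same checks can be wrapped in a small function so they are reusable and easy to test. This is a minimal sketch: `check_api_key` and its status strings are hypothetical names, and the `AIzaSy` prefix is taken from the check above rather than from a documented guarantee. Whitespace is tested before the prefix here, so a key with a leading space is reported as a whitespace problem rather than a prefix problem.

```python
from typing import Optional


def check_api_key(key: Optional[str], expected_prefix: str = "AIzaSy") -> str:
    """Classify an API key string (hypothetical helper, not part of any library)."""
    if not key:
        return "missing"
    # Test whitespace first: a leading space would otherwise look like a bad prefix.
    if key.strip() != key:
        return "whitespace"
    if not key.startswith(expected_prefix):
        return "bad_prefix"
    return "ok"
```

Calling `check_api_key(GEMINI_API_KEY)` returns one of the four status strings, which can then be mapped to the messages printed above.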


## `04` Setup LLM

In [6]:
llm = ChatGoogleGenerativeAI(
    model=GEMINI_MODEL,
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    api_key=GEMINI_API_KEY,
)

## `05` Tasks

### `05.1` Task 1 translator

In [None]:
messages = [
    (
        "system",
        # Adjacent string literals are concatenated, so the trailing space matters.
        "You are a helpful assistant that translates English to French. "
        "Translate the user sentence.",
    ),
    ("human", "I love programming."),
]

In [None]:
ai_msg = llm.invoke(messages)
display_markdown(f"**Input:** {messages[1][1]}\n\n**Output:** {ai_msg.content}")

**Input:** I love programming.

**Output:** J'aime la programmation.
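To reuse the translator for other language pairs, the message list can be built by a small function. This is a sketch under assumptions: `build_translation_messages` is a hypothetical helper, and it only reproduces the `(role, text)` tuple format used above.

```python
from typing import List, Tuple


def build_translation_messages(
    text: str, source: str = "English", target: str = "French"
) -> List[Tuple[str, str]]:
    """Build (role, text) message tuples for a translation request (hypothetical helper)."""
    system_prompt = (
        f"You are a helpful assistant that translates {source} to {target}. "
        "Translate the user sentence."
    )
    return [("system", system_prompt), ("human", text)]


# Example: same sentence as above, different target language.
german_messages = build_translation_messages("I love programming.", target="German")
```

The result can be passed to `llm.invoke(german_messages)` in the same way as the hand-written list.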

### `05.2` Task 2 Multimodal inputs

#### Example using a public URL

In [None]:
message_url = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the image at the URL. Response in Markdown format.",
        },
        {"type": "image_url", "image_url": "https://picsum.photos/seed/picsum/200/300"},
    ]
)

In [16]:
result_url = llm.invoke([message_url])
display_markdown(f"**Input:** {message_url.content[0]['text']}\n\n**Image URL:** {message_url.content[1]['image_url']}\n\n**Output:** {result_url.content}")

**Input:** Describe the image at the URL. Response in Markdown format.

**Image URL:** https://picsum.photos/seed/picsum/200/300

**Output:** Here's a description of the image:

The image is a landscape photograph showcasing a majestic snow-capped mountain peak at either sunrise or sunset.

* **Sky:** The sky dominates the upper two-thirds of the image, filled with a soft, pastel palette of pinks, oranges, and purples, indicative of the golden hour.  Thin, wispy clouds are scattered across the sky.

* **Mountain:** A prominent, sharply pointed snow-covered mountain peak rises from the lower third of the image.  The snow appears pristine and undisturbed, except for some subtle textural variations suggesting wind or shadows.  A slight haze or atmospheric perspective softens the details of the mountain's upper reaches.

* **Foreground:** The foreground consists of a gently sloping expanse of snow, appearing smooth and relatively flat, leading the eye towards the mountain.

* **Overall:** The image evokes a sense of serenity, vastness, and the beauty of a pristine, mountainous landscape. The soft light and color palette contribute to a peaceful and almost ethereal atmosphere.

#### Example using a local image file encoded in base64 


In [None]:
image_file_path = "/home/israa/Pictures/cat.jpeg"

with open(image_file_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

message_local = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the local image. Response in Markdown format.",
        },
        {
            "type": "image_url",
            # The file above is a JPEG, so the data URL declares image/jpeg.
            "image_url": f"data:image/jpeg;base64,{encoded_image}",
        },
    ]
)

In [20]:
result_local = llm.invoke([message_local])
display_markdown(f"**Input:** {message_local.content[0]['text']}\n\n**Output:** {result_local.content}")

**Input:** Describe the local image. Response in Markdown format.

**Output:** Here's a description of the image in markdown format:

The image shows a close-up of an adorable, young, white kitten.  

* **Kitten:** The kitten is predominantly white with a slightly creamy or light beige tint to its fur. Its eyes are a striking light blue.  It has small, pink nose and ears that are pointed upright.  The kitten appears to be small, likely a few weeks or months old. Its paws are neatly tucked under its chest.

* **Setting:** The kitten is resting on a dark brown and tan patterned fabric, possibly a blanket or piece of fur, that resembles a leopard or cheetah print. The background is dark and out of focus, drawing attention to the kitten.

* **Overall Impression:** The image conveys a sense of cuteness and innocence. The kitten's posture and expression suggest curiosity or alertness. The dark background and soft lighting enhance the kitten's features and create a visually appealing contrast.
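A hard-coded MIME type in the data URL can easily drift out of sync with the file's extension. The sketch below derives it from the file name using the standard library; `image_to_data_url` and its `default_mime` parameter are hypothetical names, and guessing from the extension is an assumption that does not inspect the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str, default_mime: str = "image/jpeg") -> str:
    """Encode an image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the extension; fall back if it is unknown or not an image.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `image_url` value in the message above, for example `image_to_data_url(image_file_path)`.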

#### Audio Input

In [None]:
# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"


with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,
        },
    ]
)

In [None]:
response = llm.invoke([message])
print(f"Response for audio: {response.content}")

#### Video Input

In [None]:
# Ensure you have a video file named 'example_video.mp4' or provide the correct path.
video_file_path = "example_video.mp4"
video_mime_type = "video/mp4"


with open(video_file_path, "rb") as video_file:
    encoded_video = base64.b64encode(video_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the first few frames of the video."},
        {
            "type": "media",
            "data": encoded_video,  # Use base64 string directly
            "mime_type": video_mime_type,
        },
    ]
)

In [None]:
response = llm.invoke([message])
print(f"Response for video: {response.content}")
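The audio and video cells build the same `media` block and differ only in the path and MIME type. A minimal sketch of a shared helper follows; `file_to_media_block` is a hypothetical name, the dictionary keys mirror the ones used above, and the MIME type is guessed from the file extension unless passed explicitly.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Dict, Optional


def file_to_media_block(path: str, mime_type: Optional[str] = None) -> Dict[str, str]:
    """Build a base64 'media' content block from a local file (hypothetical helper)."""
    if mime_type is None:
        # Guess from the extension, e.g. ".mp4" -> "video/mp4".
        mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Could not determine a MIME type for {path!r}; pass mime_type explicitly.")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"type": "media", "data": encoded, "mime_type": mime_type}
```

With this helper, the content list becomes `[{"type": "text", "text": "Transcribe the audio."}, file_to_media_block(audio_file_path)]`. Base64 encoding inflates the payload by roughly a third, so large files may be better served by an upload mechanism than by inline data.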

#### Image Generation (Multimodal Output)

In [None]:
import base64

from IPython.display import Image, display
from langchain_core.messages import AIMessage


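# Pass the key explicitly: this notebook stores it under GEMINI_API_KEY.
llm = ChatGoogleGenerativeAI(
    model="models/gemini-2.0-flash-preview-image-generation",
    api_key=GEMINI_API_KEY,
)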

message = {
    "role": "user",
    "content": "Generate a photorealistic image of a cuddly cat wearing a hat.",
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)


def _get_image_base64(response: AIMessage) -> str:
    image_block = next(
        block
        for block in response.content
        if isinstance(block, dict) and block.get("image_url")
    )
    return image_block["image_url"].get("url").split(",")[-1]


image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))
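To keep the generated image after the notebook session ends, the data URL can be decoded and written to disk. This is a sketch under assumptions: `save_data_url_image` is a hypothetical helper, and it expects a `data:<mime>;base64,<payload>` string like the one `_get_image_base64` splits above.

```python
import base64
from pathlib import Path


def save_data_url_image(data_url: str, output_path: str) -> int:
    """Decode a base64 data URL and write the bytes to a file (hypothetical helper).

    Returns the number of bytes written.
    """
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("Expected a data URL of the form 'data:<mime>;base64,<payload>'.")
    image_bytes = base64.b64decode(payload)
    return Path(output_path).write_bytes(image_bytes)
```

For example, `save_data_url_image(image_block["image_url"]["url"], "cat_with_hat.png")` would store the image, where `image_block` is the content block located in `_get_image_base64` and the output file name is arbitrary.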