<a href="https://colab.research.google.com/github/VijayaJothi24/Google_GenAI/blob/main/Google_Capstone_Project_GeminiAPI_GenAI_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multimodal AI Capability: *Image, Text, Video, Audio* Understanding with Gemini

## Image understanding with Gemini

Gemini has been a multimodal model from the beginning, capable of analyzing many kinds of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later models take image analysis to a new level, as illustrated by [this image](https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg):


In [1]:
from IPython.display import Image, display

# Display the image from the URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))



This notebook shows how to use Gemini to perform the same kind of image, video, and text analysis. Each task comes with different prompts that you can select from the dropdown; feel free to experiment with your own as well.


## Setup

This section installs the SDK, sets it up using your [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample image, and uploads it to Gemini.


### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both.

In [2]:
%pip install -U -q 'google-genai'

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`.

In [3]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
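
Outside Colab, the secret store is not available, so a fallback can be useful. The following is a minimal sketch, assuming the key may also be exported as an environment variable with the same name; `resolve_api_key` and its parameters are hypothetical names introduced here for illustration, not part of Colab or the SDK.

```python
import os


def resolve_api_key(secret_getter=None, env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key from a secret store if possible, else from the environment.

    Hypothetical helper: secret_getter is any callable that takes the secret
    name and returns its value (for example, Colab's userdata.get).
    """
    if secret_getter is not None:
        try:
            key = secret_getter(name)
            if key:
                return key
        except Exception:
            # Assumption: any failure here means the secret is missing or
            # inaccessible, so fall through to the environment variable.
            pass
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} was not found in the secret store or the environment")
    return key
```

In this notebook it could be used as `GOOGLE_API_KEY = resolve_api_key(secret_getter=userdata.get)`.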

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [4]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select earlier models to compare their behavior, but using at least a 2.0 model is recommended.



In [5]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get sample images

I will start with an uploaded image, as it's the more common use case, and later move on to analyzing YouTube videos.

In [6]:
import requests
from PIL import Image
from io import BytesIO

# Function to download and process image from URL
def process_image_from_url(url):
    try:
        # Fetch the image data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise error for bad response
        image_data = BytesIO(response.content)

        # Open the image using Pillow
        image = Image.open(image_data)

        # Example: Convert image to grayscale
        grayscale_image = image.convert("L")
        grayscale_image.show()  # Display the processed image

        # Save the processed image locally
        grayscale_image.save("processed_image.jpg")
        print("Image processed and saved as 'processed_image.jpg'")
    except Exception as e:
        print(f"Error processing image: {e}")

# Replace with your URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
process_image_from_url(image_url)


Image processed and saved as 'processed_image.jpg'


In [7]:
import time

def upload_video(video_file_name):
  video_file = client.files.upload(file=video_file_name)

  while video_file.state == "PROCESSING":
      print('Waiting for video to be processed.')
      time.sleep(10)
      video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'image processing complete: {video_file.uri}')

  return video_file

Image_analyse = upload_video('processed_image.jpg')


image processing complete: https://generativelanguage.googleapis.com/v1beta/files/gdu13erdemka


### Upload the image

Upload the image using the File API.

This can take a couple of minutes, as the file needs to be processed and tokenized.
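
Because processing time can vary, it may help to bound the wait. The following is a minimal sketch of a polling helper with a timeout; `wait_until_processed`, its parameters, and the 600-second default are hypothetical choices made for illustration, not part of the SDK. The lookup function is passed in as an argument so the helper can be tried without network access.

```python
import time


def wait_until_processed(get_file, name, poll_seconds=10.0, timeout_seconds=600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_file(name) until the file leaves the PROCESSING state.

    Hypothetical helper: get_file is any callable returning an object with a
    `state` attribute (a string, or an enum whose `name` is the state).
    """
    deadline = clock() + timeout_seconds
    while True:
        file = get_file(name)
        # Accept both plain strings and enum members as the state value.
        state = getattr(file.state, "name", file.state)
        if state == "FAILED":
            raise ValueError(f"Processing failed for {name}")
        if state != "PROCESSING":
            return file
        if clock() >= deadline:
            raise TimeoutError(f"{name} is still processing after {timeout_seconds} s")
        sleep(poll_seconds)
```

With the client from this notebook, the lookup could be `lambda n: client.files.get(name=n)`, called with the `name` of the file returned by `client.files.upload`.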

### Imports

In [8]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

In [9]:
from IPython.display import Image, display

# Display the image
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))


In [10]:
prompt = "Describe the image in detail, focusing on the key objects, characters, and their interactions. Identify any notable patterns, colors, or themes present in the scene. Highlight the context or purpose of the elements within the image, and interpret the overall mood or message conveyed. Include any symbolic or cultural significance if applicable."

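# Pass the uploaded file object, not the file name string,
# so the model receives the image itself rather than just its name.
video = Image_analyse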

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)


Okay, here is a detailed description of the image `processed_image.jpg`:

**Overall Impression:**
The image is a close-up to medium shot portrait of a woman proudly displaying a framed certificate or award. The focus is clearly on the woman and the object she holds, suggesting a moment of recognition or achievement.

**Key Character:**
The central figure is a woman, likely middle-aged or slightly older.
*   **Appearance:** She has dark, possibly short or pulled-back hair. She is wearing thin-rimmed eyeglasses.
*   **Attire:** She is dressed in a dark-colored top, possibly black or navy blue, which appears somewhat formal or professional (like a blouse or jacket).
*   **Expression:** She has a pleasant, genuine smile and is looking directly towards the camera (or slightly off-camera), engaging the viewer. Her expression conveys happiness, pride, and satisfaction.

**Key Object:**
The woman is holding a framed document with both hands, presenting it forward.
*   **Frame:** The frame appears to be made of dark wood or a similar dark material (black or dark brown).
*   **Document:** Inside the frame is a light-colored (likely white or off-white) paper, consistent with a certificate, diploma, or award plaque. Text is visible on the document, but it is illegible at this resolution. There might be a logo or seal, but details aren't clear.
*   **Handling:** She holds it carefully and centrally in front of her torso, emphasizing its importance.

**Setting and Background:**
The background is simple and non-distracting, ensuring the focus remains on the woman and her achievement.
*   It appears to be an indoor setting.
*   The backdrop is a plain, light-colored wall (perhaps beige, cream, or light grey), possibly slightly textured but largely out of focus due to a shallow depth of field.
*   The lighting seems adequate, likely indoor or possibly flash photography, illuminating the subject clearly without harsh shadows.

**Interactions:**
*   The primary interaction is between the woman and the implied viewer/camera. Her direct gaze and smile create a connection.
*   She is interacting with the framed certificate by holding and displaying it prominently, signifying its value to her.

**Patterns, Colors, and Themes:**
*   **Patterns:** No strong visual patterns dominate, other than the simple rectangular shape of the frame and certificate.
*   **Colors:** The color palette is relatively simple and muted: dark tones from her clothing and the frame contrast with the light certificate paper and the neutral background. Her skin tone and hair color add warmth.
*   **Themes:** The prominent themes are achievement, recognition, pride, success, and celebration. It captures a milestone or moment of accomplishment.

**Context and Purpose:**
*   **Context:** The scene strongly suggests an award ceremony, a graduation, a recognition event, or a similar formal occasion where accomplishments are acknowledged and celebrated.
*   **Purpose:** The purpose of the photograph is likely to document and commemorate this specific achievement. The woman is the recipient, and the certificate is the tangible evidence of her success.

**Overall Mood and Message:**
The mood is positive, celebratory, and proud. The image conveys a clear message of personal or professional success and the satisfaction that comes with recognition for one's efforts or contributions.

**Symbolic/Cultural Significance:**
*   Framed certificates and awards are widely recognized symbols of validation, competence, and accomplishment within many cultures and professional/academic fields.
*   The act of receiving and displaying such an item often carries cultural weight, signifying honor, merit, and the successful completion of a task, course, or period of service. The formal presentation (holding it carefully, smiling) reinforces the cultural value placed on such recognition.

In summary, the photograph captures a happy and proud woman displaying a framed certificate, symbolizing a significant moment of personal or professional achievement within a formal or celebratory context.

# Extract and organize text

Gemini can also read the contents of a .csv file and extract them in an organized way, and its reasoning capabilities can help generate new ideas from the data.



In [11]:
import requests
import pandas as pd

def analyze_csv_from_url(url):
    try:
        # Fetch the CSV data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for unsuccessful requests

        # Save the CSV content locally (optional)
        with open("downloaded_file.csv", "wb") as file:
            file.write(response.content)

        # Load the CSV into a Pandas DataFrame
        df = pd.read_csv("downloaded_file.csv")

        # Example Analysis: Display basic information about the data
        print("First 5 rows:")
        print(df.head())

        print("\nSummary Statistics:")
        print(df.describe())

        print("\nColumns in the CSV file:")
        print(df.columns)

        # You can add further analysis depending on your requirements
    except Exception as e:
        print(f"Error occurred: {e}")


# Replace with your CSV URL
csv_url = "https://raw.githubusercontent.com/VijayaJothi24/VijayaJothi24/main/City.csv"

analyze_csv_from_url(csv_url)


First 5 rows:
               City Population    Users
0       NEW YORK NY  8,405,837  302,149
1  SAN FRANCISCO CA    629,591  213,609
2        CHICAGO IL  1,955,130  164,468
3    LOS ANGELES CA  1,595,037  144,132
4     WASHINGTON DC    418,859  127,001

Summary Statistics:
               City Population    Users
count            20         20       20
unique           20         20       20
top     NEW YORK NY  8,405,837  302,149
freq              1          1        1

Columns in the CSV file:
Index(['City', 'Population', 'Users'], dtype='object')


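As the output shows, all 20 rows load correctly, but `Population` and `Users` are read as text because of their thousands separators, so `describe()` reports only counts and unique values rather than numeric statistics.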
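
One way to get numeric columns is to strip the thousands separators (as in `8,405,837`) before analysis. The sketch below uses only the standard library and an inline sample whose values are copied from the first rows above; the quoting of the sample and the helper name `parse_city_rows` are assumptions made for illustration. With pandas, passing `thousands=","` to `read_csv` is another option.

```python
import csv
import io

# Inline sample: values copied from the output above; the quoting is assumed.
SAMPLE_CSV = """City,Population,Users
NEW YORK NY,"8,405,837","302,149"
SAN FRANCISCO CA,"629,591","213,609"
CHICAGO IL,"1,955,130","164,468"
"""


def parse_city_rows(csv_text):
    """Parse City/Population/Users rows, converting '1,234' style values to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        population = int(record["Population"].replace(",", ""))
        users = int(record["Users"].replace(",", ""))
        rows.append({
            "City": record["City"].strip(),
            "Population": population,
            "Users": users,
            # Share of the population that are users.
            "UsersPerCapita": users / population,
        })
    return rows


rows = parse_city_rows(SAMPLE_CSV)
for row in rows:
    print(f'{row["City"]}: {row["UsersPerCapita"]:.1%} of residents are users')
```

Once the values are integers, sums, ratios, and sorting behave numerically instead of lexically.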

# Analyze YouTube videos

The cell below demonstrates another generative AI capability: video analysis.

In [12]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Vijaya says \"software testing\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=13TBF_4KqXA')
            )
        ]
    )
)

Markdown(response.text)


Based on the audio, here are the instances where Vijaya says "software testing", along with timestamps and context:

1.  **Timestamp:** 0:05 - 0:07
    *   **Context:** Vijaya is introducing the mind map shown on the screen. She says, "...this is the screen for the **software testing** foundation..." setting the overall topic of the presentation.

2.  **Timestamp:** 1:09 - 1:11
    *   **Context:** Vijaya is defining the term 'Defect' as presented in the mind map. She reads or explains the definition: "An Error Identified in **Software Testing**."

3.  **Timestamp:** 1:32 - 1:34
    *   **Context:** After explaining a specific root cause example for a defect (incorrect GPS configuration), Vijaya mentions how such a defect would be discovered, stating, "...So it is found during a **software testing**."

4.  **Timestamp:** 3:15 - 3:17
    *   **Context:** Vijaya is discussing the "Value of Static testing" and explaining why it's important. She concludes the point by saying, "...This states the necessity of static testing in the **software testing** environment."
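
To jump straight to each reported moment, the timestamps can be converted to seconds and appended to the video URL. This is a minimal sketch, assuming the model answers with `M:SS - M:SS` ranges as in the response above (other formats, such as `H:MM:SS`, are not handled); `timestamp_ranges_to_seconds` and `youtube_link_at` are hypothetical helper names, and the `t=<seconds>s` query parameter is the usual YouTube start-time convention.

```python
import re

# Matches ranges such as "1:09 - 1:11" (minutes:seconds on both sides).
TIMESTAMP_RANGE = re.compile(r"(\d+):(\d{2})\s*-\s*(\d+):(\d{2})")


def timestamp_ranges_to_seconds(text):
    """Return (start_seconds, end_seconds) for every M:SS - M:SS range in text."""
    ranges = []
    for match in TIMESTAMP_RANGE.finditer(text):
        start_min, start_sec, end_min, end_sec = map(int, match.groups())
        ranges.append((start_min * 60 + start_sec, end_min * 60 + end_sec))
    return ranges


def youtube_link_at(video_url, seconds):
    """Append a start-time parameter to a YouTube URL."""
    separator = "&" if "?" in video_url else "?"
    return f"{video_url}{separator}t={seconds}s"


sample = "1.  **Timestamp:** 0:05 - 0:07\n2.  **Timestamp:** 1:09 - 1:11"
for start, _end in timestamp_ranges_to_seconds(sample):
    print(youtube_link_at("https://www.youtube.com/watch?v=13TBF_4KqXA", start))
```

In the notebook, `response.text` would take the place of `sample`.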