<a href="https://colab.research.google.com/github/VijayaJothi24/Python_Project/blob/main/Gemini_3_GenAI_Capablities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image understanding with Gemini

Gemini has been a multimodal model from the beginning, capable of analyzing all sorts of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later bring image analysis to a whole new level, as illustrated in [this image](https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg):


In [53]:
#@title Building with Gemini 2.0: Image understanding
%%html
<iframe width="560" height="315" src="https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg" title="Image player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

This notebook will show you how to easily use Gemini to perform the same kind of image, video and text analysis. Each of them has different prompts that you can select using the dropdown. Feel free to experiment with your own as well.


## Setup

This section installs the SDK, sets it up using your [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample image and uploads it to Gemini.


### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both.

In [33]:
%pip install -U -q 'google-genai'


### Setup your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`.

In [34]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
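
Outside Colab, `google.colab` is not available. A minimal sketch of a fallback, assuming the key is also exported as an environment variable with the same name. The helper `load_api_key` is a hypothetical name, not part of the SDK:

```python
import os

def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from a Colab Secret if possible, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not accessible
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found in Colab Secrets or environment variables")
    return key
```

With this helper, `GOOGLE_API_KEY = load_api_key()` could stand in for the `userdata.get` call when the notebook runs locally.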

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [35]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select earlier models to compare their behavior, but it is recommended to use at least the 2.0 ones.



In [54]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get sample images

I will start with an uploaded image, as it's a more common use case, and later look at analyzing YouTube videos.

In [37]:
import requests
from PIL import Image
from io import BytesIO

# Function to download and process image from URL
def process_image_from_url(url):
    try:
        # Fetch the image data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise error for bad response
        image_data = BytesIO(response.content)

        # Open the image using Pillow
        image = Image.open(image_data)

        # Example: Convert image to grayscale
        grayscale_image = image.convert("L")
        grayscale_image.show()  # Display the processed image

        # Save the processed image locally
        grayscale_image.save("processed_image.jpg")
        print("Image processed and saved as 'processed_image.jpg'")
    except Exception as e:
        print(f"Error processing image: {e}")

# Replace with your URL
image_url = "https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg"
process_image_from_url(image_url)


Image processed and saved as 'processed_image.jpg'


In [58]:
import time

# Works for any file type supported by the File API, not only videos
def upload_video(video_file_name):
  # Upload the file through the File API
  video_file = client.files.upload(file=video_file_name)

  # Poll until the service has finished processing the file
  while video_file.state == "PROCESSING":
    print('Waiting for file to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print('image processing complete: ' + video_file.uri)

  return video_file

Image_analyse = upload_video('processed_image.jpg')


image processing complete: https://generativelanguage.googleapis.com/v1beta/files/omi02nnublrb


### Upload the image

Upload the image using the File API.

This can take a moment, as the file needs to be processed and tokenized.
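
The wait loop in the upload helper above polls with no upper bound. A minimal sketch of the same polling pattern with a timeout, assuming the state values `"PROCESSING"` and `"FAILED"` used in this notebook. The function `wait_until_processed` and its parameters are hypothetical names, not part of the SDK:

```python
import time

def wait_until_processed(fetch_state, poll_seconds=10, timeout_seconds=600, sleep=time.sleep):
    """Poll fetch_state() until it stops returning "PROCESSING" or the timeout expires."""
    waited = 0
    state = fetch_state()
    while state == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"File still processing after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        state = fetch_state()
    if state == "FAILED":
        raise ValueError("File processing failed")
    return state
```

In this notebook it could be called as `wait_until_processed(lambda: client.files.get(name=Image_analyse.name).state)`. Passing the state lookup as a callable keeps the loop independent of the SDK and easy to exercise with fake states.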

### Imports

In [41]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

In [46]:
# Alias IPython's Image so it does not shadow PIL's Image imported earlier
from IPython.display import Image as IPyImage, display

# Display the image
image_url = "https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg"
display(IPyImage(url=image_url))


In [43]:
prompt = "Describe the image in detail, focusing on the key objects, characters, and their interactions. Identify any notable patterns, colors, or themes present in the scene. Highlight the context or purpose of the elements within the image, and interpret the overall mood or message conveyed. Include any symbolic or cultural significance if applicable."

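# Pass the uploaded File object rather than the file name:
# a bare string in `contents` is sent to the model as plain text, not as an image.
image = Image_analyse

response = client.models.generate_content(
    model=model_name,
    contents=[
        image,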
        prompt,
    ]
)

Markdown(response.text)


Okay, here is a detailed description of the image `processed_image.jpg`:

**Overall Scene:**
The image captures a moment of celebration or a formal gathering, likely within an office or corporate setting. A group of people, predominantly men dressed in business attire, are gathered around a long table covered with a white tablecloth. The central focus is a large rectangular cake being prepared for cutting.

**Key Objects:**

1.  **Cake:** Positioned centrally on the table, this is a large, rectangular sheet cake. It's covered in white frosting. There appears to be writing or a logo/emblem decorated on top, though the specific details are not perfectly clear from this viewpoint. Its size suggests it's meant for a significant number of people.
2.  **Table:** A long table covered with a crisp white tablecloth, serving as the platform for the cake and the gathering point for the group.
3.  **Knife:** A large knife, possibly ceremonial or simply a large serving knife suitable for the cake's size, is held by one individual, poised above the cake.
4.  **Plates & Cutlery:** Simple white plates (likely paper or basic ceramic) are stacked near the cake, ready for serving. Some basic cutlery might also be present.

**Characters and Interactions:**

1.  **Central Figure:** A man standing directly behind the cake is the main actor in the scene. He is wearing a dark suit, white shirt, and tie. He holds the large knife with both hands, looking down at the cake, seemingly about to make the first cut. He has a slight smile, indicating the positive nature of the event.
2.  **Surrounding Group:** Several other individuals, mostly men also in suits and ties (various shades of grey, blue, black), flank the central figure and extend along the table. At least one woman is visible in the group. Their attention is directed towards the central figure and the cake. Many have pleasant expressions or slight smiles, observing the cake-cutting ritual.
3.  **Interaction:** The primary interaction is the collective focus on the act of cutting the cake. The people are standing relatively close together, suggesting camaraderie or a shared purpose within the context of the event. It’s a shared moment of observance before the cake is distributed.

**Patterns, Colors, and Themes:**

1.  **Patterns:** The repetition of formal business attire (suits, shirts, ties) creates a visual pattern reinforcing the professional context. The rectangular shapes of the cake and table provide geometric structure. The arrangement of people clustered around the table forms a social pattern of gathering.
2.  **Colors:** The color palette is dominated by the dark tones of the suits, contrasted sharply by the white of the tablecloth, cake frosting, shirts, and plates. Various colors appear in the ties, adding small points of accent. The overall color scheme is somewhat muted and formal.
3.  **Themes:** Key themes include celebration, achievement, milestone, formality, community, and corporate culture. The act itself points towards marking a specific occasion.

**Context and Purpose:**

*   **Context:** This scene strongly suggests a corporate or organizational event. It could be an anniversary celebration, the conclusion of a successful project, a retirement party, a welcome event for a new executive, or a similar workplace milestone. The formal attire and the setting point away from a casual, personal gathering.
*   **Purpose:** The elements serve to mark and celebrate the occasion formally. The cake is a traditional centerpiece for celebration, meant to be shared. The ceremonial act of the first cut, often performed by a key individual, signifies the official start of the celebratory part of the event. The gathering itself fosters team spirit or acknowledges collective effort/significance.

**Overall Mood and Message:**

*   **Mood:** The mood is generally positive, celebratory, and communal, albeit within a formal framework. There's a sense of anticipation and shared recognition of the event's importance.
*   **Message:** The image conveys a message of unity, success, or acknowledgment within a professional group. It highlights a moment where formality briefly pauses for a shared ritual of celebration.

**Symbolic or Cultural Significance:**

*   **Cake Cutting:** In many Western and westernized cultures, cutting a cake is a deeply ingrained ritual for celebrations. It symbolizes shared joy, marking a special moment in time. The "first cut" often holds significance, performed by the person(s) being honored or a figure of authority. Sharing the cake symbolizes community and distribution of goodwill or success.
*   **Formal Attire:** Represents professionalism, seriousness of purpose, and the established culture of the organization.

In summary, the image depicts a formal workplace celebration centered around the ritualistic cutting of a large cake, observed by a group of professionals, symbolizing a shared milestone or achievement within the organization.

# Extract and organize text

Gemini can also read what's in a .csv file and extract it in an organized way, and its reasoning capabilities can generate new ideas from the data. The cell below first downloads a sample CSV file and inspects it with pandas.



In [66]:
import requests
import pandas as pd

def analyze_csv_from_url(url):
    try:
        # Fetch the CSV data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for unsuccessful requests

        # Save the CSV content locally (optional)
        with open("downloaded_file.csv", "wb") as file:
            file.write(response.content)

        # Load the CSV into a Pandas DataFrame
        df = pd.read_csv("downloaded_file.csv")

        # Example Analysis: Display basic information about the data
        print("First 5 rows:")
        print(df.head())

        print("\nSummary Statistics:")
        print(df.describe())

        print("\nColumns in the CSV file:")
        print(df.columns)

        # You can add further analysis depending on your requirements
    except Exception as e:
        print(f"Error occurred: {e}")


# Replace with your CSV URL
csv_url = "https://raw.githubusercontent.com/VijayaJothi24/VijayaJothi24/main/City.csv"

analyze_csv_from_url(csv_url)


First 5 rows:
               City Population    Users
0       NEW YORK NY  8,405,837  302,149
1  SAN FRANCISCO CA    629,591  213,609
2        CHICAGO IL  1,955,130  164,468
3    LOS ANGELES CA  1,595,037  144,132
4     WASHINGTON DC    418,859  127,001

Summary Statistics:
               City Population    Users
count            20         20       20
unique           20         20       20
top     NEW YORK NY  8,405,837  302,149
freq              1          1        1

Columns in the CSV file:
Index(['City', 'Population', 'Users'], dtype='object')
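
In the summary above, `describe()` reports `count`, `unique`, `top` and `freq` because the thousands separators make pandas read `Population` and `Users` as strings. A small sketch of parsing them as numbers with the `thousands` option of `read_csv`. The inline data is copied from the first rows shown above, and the quoted, comma-separated layout is an assumption about how the file is stored:

```python
from io import StringIO
import pandas as pd

# Sample rows taken from the output above; the quoting is assumed
sample_csv = StringIO(
    'City,Population,Users\n'
    'NEW YORK NY,"8,405,837","302,149"\n'
    'SAN FRANCISCO CA,"629,591","213,609"\n'
    'CHICAGO IL,"1,955,130","164,468"\n'
)

# thousands="," strips the separators so the columns are parsed as integers
df = pd.read_csv(sample_csv, thousands=",")
print(df.dtypes)
print(df.describe())
```

With numeric columns, `describe()` returns the usual mean, min, max and quartiles instead of string counts.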


As you can see, Gemini is able to grasp which item each note corresponds to, including the last one.
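
The pandas cell above only inspects the file locally. To have Gemini itself organize the data, one option is to include the CSV text in the prompt. A sketch, assuming the `client` and `model_name` defined earlier in this notebook; `build_csv_prompt` and `ask_gemini_about_csv` are hypothetical helper names, and the character limit is an arbitrary choice:

```python
def build_csv_prompt(csv_text, question, max_chars=20000):
    """Combine a question and raw CSV text into a single text prompt."""
    # Truncate very large files so the prompt stays a manageable size (arbitrary limit)
    snippet = csv_text[:max_chars]
    return f"{question}\n\nCSV data:\n{snippet}"

def ask_gemini_about_csv(client, model_name, csv_text, question):
    """Send the CSV text to Gemini with the generate_content call used in this notebook."""
    response = client.models.generate_content(
        model=model_name,
        contents=build_csv_prompt(csv_text, question),
    )
    return response.text
```

For example, `ask_gemini_about_csv(client, model_name, open("downloaded_file.csv").read(), "Rank the cities by users per capita.")` would send the downloaded file along with a question.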

# Analyze YouTube videos

Below, another generative AI capability is demonstrated: video analysis.

In [52]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Vijaya says \"software testing\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=13TBF_4KqXA')
            )
        ]
    )
)

Markdown(response.text)


Based on the audio and the visual mind map, here are the instances where Vijaya says "software testing":

1.  **Timestamp:** 0:06 - 0:07
    *   **Quote:** "...the screen for the **software testing** foundation..."
    *   **Broader Context:** Vijaya is introducing the mind map shown on the screen, stating that its central topic is the "Software Testing foundation".

2.  **Timestamp:** 1:11
    *   **Quote:** "...error identified in **software testing**."
    *   **Broader Context:** Vijaya is defining the term "Defect" according to the mind map, explaining it as an error that is identified during the process of software testing.

3.  **Timestamp:** 1:33 - 1:34
    *   **Quote:** "...found during a **software testing**."
    *   **Broader Context:** While explaining the root cause example of a defect (incorrect GPS configuration), Vijaya mentions that such an issue is discovered during the software testing activity.

4.  **Timestamp:** 3:16 - 3:17
    *   **Quote:** "...static testing in the **software testing** environment."
    *   **Broader Context:** Vijaya is discussing the "Value of Static testing", emphasizing its importance by stating that certain coding faults found via static analysis might be missed if only dynamic testing were performed within the overall software testing environment.
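
The timestamps in a response like the one above come back as markdown text. A minimal sketch of turning them into seconds, for example to build links that jump to each moment, assuming the model keeps the `**Timestamp:**` label format shown here (this is not guaranteed). Both function names are hypothetical:

```python
import re

def timestamp_to_seconds(stamp):
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def extract_timestamps(text):
    """Return (start, end) second pairs for each '**Timestamp:**' entry in the text."""
    pattern = re.compile(
        r"\*\*Timestamp:\*\*\s*(\d+(?::\d+)+)(?:\s*-\s*(\d+(?::\d+)+))?"
    )
    spans = []
    for start, end in pattern.findall(text):
        start_s = timestamp_to_seconds(start)
        # A single timestamp is treated as a zero-length span
        end_s = timestamp_to_seconds(end) if end else start_s
        spans.append((start_s, end_s))
    return spans

sample = "**Timestamp:** 0:06 - 0:07\n**Timestamp:** 1:11\n**Timestamp:** 3:16 - 3:17"
print(extract_timestamps(sample))  # [(6, 7), (71, 71), (196, 197)]
```

In the notebook, `extract_timestamps(response.text)` would apply the same parsing to the live response.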