<a href="https://colab.research.google.com/github/VijayaJothi24/Python_Project/blob/main/Google_Capstone_Project_GeminiAPI_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MultiModal AI Capablity *Image,Text,Video,Audio * understanding with Gemini

## Image understanding with Gemini

Gemini has from the begining been a multimodal model, capable of analyzing all sorts of medias using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later bring Image analysis to a whole new level as illustrated in [this image](https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg):


In [1]:
from IPython.display import Image, display

# Display the image from the URL
image_url = "https://i.pinimg.com/736x/fd/95/02/fd95021c676932304e41167ce9d86211.jpg"
display(Image(url=image_url))



This notebook will show you how to easily use Gemini to perform the same kind of image, video and text analysis. Each of them has different prompts that you can select using the dropdown, also feel free to experiment with your own.

.

## Setup

This section install the SDK, set it up using  [API key](../quickstarts/Authentication.ipynb), imports the relevant libs, downloads the sample videos and upload them to Gemini.


### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both.

In [2]:
%pip install -U -q 'google-genai'

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/154.7 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m153.6/154.7 kB[0m [31m19.0 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m153.6/154.7 kB[0m [31m19.0 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m154.7/154.7 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[?25h

### Setup your API key

To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`.

In [3]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

### Initialize SDK client

With the new SDK only need to initialize a client with you API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [4]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best Gemini 2.5 pro model. Also select former models to compare their behavior but it is recommended to use at least the 2.0 ones.



In [5]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get sample iMAGEs

I will start with uploaded image, as it's a more common use-case, but I will also see later to analyse Youtube videos.

In [6]:
import requests
from PIL import Image
from io import BytesIO

# Function to download and process image from URL
def process_image_from_url(url):
    try:
        # Fetch the image data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise error for bad response
        image_data = BytesIO(response.content)

        # Open the image using Pillow
        image = Image.open(image_data)

        # Example: Convert image to grayscale
        grayscale_image = image.convert("L")
        grayscale_image.show()  # Display the processed image

        # Save the processed image locally
        grayscale_image.save("processed_image.jpg")
        print("Image processed and saved as 'processed_image.jpg'")
    except Exception as e:
        print(f"Error processing image: {e}")

# Replace with your URL
image_url = "https://i.pinimg.com/736x/fd/95/02/fd95021c676932304e41167ce9d86211.jpg"
process_image_from_url(image_url)


Image processed and saved as 'processed_image.jpg'


In [7]:
import time

def upload_video(video_file_name):
  video_file = client.files.upload(file=video_file_name)

  while video_file.state == "PROCESSING":
      print('Waiting for video to be processed.')
      time.sleep(10)
      video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'image processing complete: ' + video_file.uri)

  return video_file

Image_analyse = upload_video('processed_image.jpg')


image processing complete: https://generativelanguage.googleapis.com/v1beta/files/2m5fd6q7zh99


### Upload the image

Upload  the image using the File API.

This can take a couple of minutes as the videos will need to be processed and tokenized.

### Imports

In [8]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

In [10]:
from IPython.display import Image, display

# Display the image
image_url = "https://i.pinimg.com/736x/fd/95/02/fd95021c676932304e41167ce9d86211.jpg"
display(Image(url=image_url))


In [11]:
prompt = "Describe the image in detail, focusing on the key objects, characters, and their interactions. Identify any notable patterns, colors, or themes present in the scene. Highlight the context or purpose of the elements within the image, and interpret the overall mood or message conveyed. Include any symbolic or cultural significance if applicable."

video = "processed_image.jpg"

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)


Okay, let's break down the provided image, "processed_image.jpg".

**Overall Impression:**
The image is a clean, modern digital graphic, likely serving as promotional or informational material for a technology product or service called "Gemini". It features a friendly robot mascot as the central figure, surrounded by icons representing various capabilities.

**Key Objects and Characters:**

1.  **Central Character (Robot/Mascot):**
    *   **Appearance:** A stylized, anthropomorphic robot dominates the center. It's primarily white and various shades of blue. The head is large and round with a screen-like face displaying simple, friendly blue line-art eyes. Two blue antennae with white tips protrude from the top. The torso is white with blue accents. The arms and legs are blue, segmented or cylindrical, ending in simple white mitten-like hands and rounded blue feet.
    *   **Pose:** The robot stands facing slightly towards the viewer, with its left arm slightly raised and palm open in a welcoming or presenting gesture. Its head is slightly tilted, enhancing its friendly demeanor.
    *   **Interaction:** The robot acts as the embodiment or representative of "Gemini". Its friendly design aims to make the associated technology seem approachable and helpful.

2.  **Text "Gemini":**
    *   **Appearance:** The word "Gemini" is written in a large, bold, modern sans-serif font.
    *   **Placement:** Positioned prominently, usually below or near the robot, clearly identifying the subject of the graphic.
    *   **Color:** Typically rendered in blue, matching the robot's color scheme.
    *   **Purpose:** This is the brand name or product name being presented.

3.  **Capability Icons:**
    *   **Appearance:** Several circular icons are arranged around the central robot, often appearing to orbit or emanate from it. Each icon contains a simple, stylized graphic symbol within the circle and often has a text label below it.
    *   **Content:** These icons represent the functions or capabilities associated with Gemini. Common examples seen in such graphics might include:
        *   A speech bubble (for chat/conversation)
        *   A pen or document (for writing/drafting)
        *   A lightbulb (for ideas/brainstorming)
        *   Code brackets `</>` (for coding/programming assistance)
        *   An artist's palette (for creative tasks/design)
    *   **Purpose:** To quickly convey the versatility and range of tasks that Gemini can assist with.

4.  **Background:**
    *   **Appearance:** The background is typically minimalistic, often featuring a smooth gradient transitioning between shades of blue, purple, or white. It might include subtle abstract shapes, light flares, or soft geometric patterns to add depth without distracting from the main elements.
    *   **Purpose:** To provide a clean, visually appealing, and modern backdrop that reinforces the technological theme.

**Patterns, Colors, and Themes:**

*   **Patterns:** Repetition of circular shapes (icons), clean lines, smooth curves in the robot design, and often a gradient pattern in the background.
*   **Colors:** Dominated by blues and white. Blue evokes technology, trust, intelligence, and stability. White suggests simplicity, cleanliness, and modernity. Occasional accents of other colors (like purple in gradients) might add visual interest.
*   **Themes:**
    *   **Technology & AI:** Clearly central, represented by the robot and functional icons.
    *   **Helpfulness & Assistance:** The friendly mascot and task-oriented icons suggest a tool designed to aid users.
    *   **Versatility & Creativity:** The range of icons (coding, writing, ideas) highlights broad applicability.
    *   **Modernity & Simplicity:** The overall design aesthetic is clean, uncluttered, and contemporary.

**Context and Purpose:**

*   **Context:** This graphic is almost certainly related to Google's AI model, Gemini. It's designed for digital platforms (websites, apps, presentations) to introduce or explain what Gemini is and what it can do.
*   **Purpose:** To create brand recognition for Gemini, communicate its core functionalities in an easily digestible visual format, and establish a friendly, accessible identity for a potentially complex technology (AI).

**Mood and Message:**

*   **Mood:** Optimistic, friendly, helpful, approachable, innovative, and efficient.
*   **Message:** The core message is that "Gemini is a powerful yet friendly AI assistant capable of helping you with a wide range of creative, communicative, and technical tasks." It emphasizes ease of use and broad utility.

**Symbolic or Cultural Significance:**

*   **Friendly Robot:** Anthropomorphizing AI into a non-threatening, cute robot is a common strategy to make advanced technology less intimidating and more relatable to a broad audience.
*   **Color Palette:** The use of blue aligns with common cultural associations of the color with technology and trustworthiness (e.g., IBM, Facebook, Twitter).
*   **Icons:** Using universally recognized symbols for tasks (chat, write, code) transcends language barriers and makes the capabilities instantly understandable.

In summary, the image is a well-designed piece of branding and informational graphic for the AI "Gemini," using a friendly robot mascot, clear icons, and a clean, modern aesthetic to convey its helpfulness, versatility, and approachable nature.

# Extract and organize text

Gemini can also read what's in the .csv file and extract it in an organized way. Gemini reasoning capabilities can generate new ideas for you.



In [12]:
import requests
import pandas as pd

def analyze_csv_from_url(url):
    try:
        # Fetch the CSV data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for unsuccessful requests

        # Save the CSV content locally (optional)
        with open("downloaded_file.csv", "wb") as file:
            file.write(response.content)

        # Load the CSV into a Pandas DataFrame
        df = pd.read_csv("downloaded_file.csv")

        # Example Analysis: Display basic information about the data
        print("First 5 rows:")
        print(df.head())

        print("\nSummary Statistics:")
        print(df.describe())

        print("\nColumns in the CSV file:")
        print(df.columns)

        # You can add further analysis depending on your requirements
    except Exception as e:
        print(f"Error occurred: {e}")


# Replace with your CSV URL
csv_url = "https://raw.githubusercontent.com/VijayaJothi24/VijayaJothi24/main/City.csv"

analyze_csv_from_url(csv_url)


First 5 rows:
               City Population    Users
0       NEW YORK NY  8,405,837  302,149
1  SAN FRANCISCO CA    629,591  213,609
2        CHICAGO IL  1,955,130  164,468
3    LOS ANGELES CA  1,595,037  144,132
4     WASHINGTON DC    418,859  127,001

Summary Statistics:
               City Population    Users
count            20         20       20
unique           20         20       20
top     NEW YORK NY  8,405,837  302,149
freq              1          1        1

Columns in the CSV file:
Index(['City', 'Population', 'Users'], dtype='object')


As you can see, Gemini is able to grasp to with item corresponds each note, including the last one.

# Analyze youtube videos

Downbelow Another Generative AI capablity task of Video Analysing is done

In [14]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Vijaya says \"software testing\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=13TBF_4KqXA')
            )
        ]
    )
)

Markdown(response.text)


Okay, here are the instances where Vijaya says "software testing", along with timestamps and context:

1.  **Timestamp:** 0:06 - 0:07
    *   **Context:** Vijaya is introducing the mind map displayed on the screen. She states, "So, this is the screen for the **software testing** foundation tidbits." She is identifying the central topic of the mind map.

2.  **Timestamp:** 1:10 - 1:11
    *   **Context:** Vijaya is defining the term 'Defect' as shown on the mind map branch. She says, "So defect is an error identified in **software testing**." She is providing the definition associated with the 'Defect' node.

3.  **Timestamp:** 1:32 - 1:33
    *   **Context:** While explaining the 'Root Cause' example related to a defect in a fitness tracker application, Vijaya mentions how such an issue would be discovered. She says, "...it is found during a **software testing**." This is part of illustrating a real-world defect scenario.

4.  **Timestamp:** 3:15 - 3:17
    *   **Context:** Vijaya is discussing the 'Value of Static Testing'. She explains why it's important, stating, "This states the necessity of static testing in the **software testing** environment." She emphasizes its role in finding defects that dynamic testing might miss within the overall testing process.