<a href="https://colab.research.google.com/github/VijayaJothi24/Gemini_Capstoneproject/blob/main/Google_Gen_AI_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multimodal AI Capability: *Image, Text, Video, Audio* Understanding with Gemini

## Image understanding with Gemini

Gemini has been a multimodal model from the beginning, capable of analyzing all sorts of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later models take image analysis to a whole new level, as illustrated in [this image](https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg):


In [None]:
from IPython.display import Image, display

# Display the image from the URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))



## Setup

This section installs the SDK, sets it up using an [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads a sample image, and uploads it to Gemini.


### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both.

In [None]:
%pip install -U -q 'google-genai'

### Set up the API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`.

In [None]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
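The cell above only works inside Colab. A minimal sketch, assuming you may also run the notebook elsewhere: the hypothetical helper `get_api_key` tries Colab Secrets first and otherwise falls back to an environment variable of the same name.

```python
import os


def get_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: read the key from Colab Secrets, else from os.environ."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Outside Colab, fall back to an environment variable with the same name.
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"API key not found: set the {name} environment variable.")
    return key
```

With this helper, the cell above could become `GOOGLE_API_KEY = get_api_key()`, and the same notebook would run both in Colab and locally.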

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now specified in each call.

In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select older models to compare their behavior, but it is recommended to use at least the 2.0 models.



In [None]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get a sample image

I will start with an uploaded image, as it's a more common use case, and later also look at analyzing YouTube videos.

In [None]:
import requests
from io import BytesIO
from IPython.display import display
from PIL import Image

# Download an image from a URL, convert it to grayscale, show it, and save it locally
def process_image_from_url(url):
    try:
        # Fetch the image data from the URL (time out instead of hanging forever)
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Raise an error for bad HTTP status codes
        image_data = BytesIO(response.content)

        # Open the image using Pillow
        image = Image.open(image_data)

        # Example processing step: convert the image to grayscale
        grayscale_image = image.convert("L")
        # Image.show() opens an external viewer, which usually does not render
        # inside a notebook; display() shows the image inline instead
        display(grayscale_image)

        # Save the processed image locally
        grayscale_image.save("processed_image.jpg")
        print("Image processed and saved as 'processed_image.jpg'")
    except Exception as e:
        print(f"Error processing image: {e}")

# Replace with your URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
process_image_from_url(image_url)


Image processed and saved as 'processed_image.jpg'
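The function above writes the result to disk before it is uploaded. A minimal sketch, assuming you want to keep the processing step but work in memory and avoid sending oversized files: the hypothetical helper `to_grayscale_jpeg` takes raw bytes, converts to grayscale as above, and caps the longest side (the 1024-pixel default is an arbitrary choice, not an API limit). Note that grayscale conversion discards the color information that the analysis prompt later asks about, so you may prefer to skip that step.

```python
from io import BytesIO

from PIL import Image


def to_grayscale_jpeg(image_bytes: bytes, max_side: int = 1024) -> bytes:
    """Hypothetical helper: grayscale and downscale an in-memory image, return JPEG bytes."""
    with Image.open(BytesIO(image_bytes)) as img:
        gray = img.convert("L")
    # thumbnail() resizes in place, keeps the aspect ratio, and never enlarges.
    gray.thumbnail((max_side, max_side))
    out = BytesIO()
    gray.save(out, format="JPEG")
    return out.getvalue()
```

The returned bytes could be sent inline with `types.Part.from_bytes(data=..., mime_type="image/jpeg")` instead of going through the File API. Like the cell above, this rebinds `Image` to Pillow's module.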


### Upload the image

Upload the image using the File API.

This can take a couple of minutes for large files, especially videos, which need to be processed and tokenized.

In [None]:
import time

def upload_file(file_name):
    # Upload the local file with the File API
    uploaded_file = client.files.upload(file=file_name)

    # Poll until the server has finished processing the file
    while uploaded_file.state == "PROCESSING":
        print('Waiting for file to be processed.')
        time.sleep(10)
        uploaded_file = client.files.get(name=uploaded_file.name)

    if uploaded_file.state == "FAILED":
        raise ValueError(uploaded_file.state)
    print(f'image processing complete: {uploaded_file.uri}')

    return uploaded_file

Image_analyse = upload_file('processed_image.jpg')


image processing complete: https://generativelanguage.googleapis.com/v1beta/files/4aaq5y2eoa8i
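The polling loop in the upload cell has no upper bound, so a file stuck in processing would block the notebook forever. A minimal sketch of the same loop with a timeout, written against a plain `get_state` callable so it can be tested without the API; `wait_until_ready` and its default timings are hypothetical, and it assumes, as the upload cell does, that the SDK's file state compares equal to the strings `"PROCESSING"` and `"FAILED"`.

```python
import time
from typing import Callable


def wait_until_ready(
    get_state: Callable[[], str],
    poll_interval: float = 10.0,
    timeout: float = 300.0,
) -> str:
    """Poll get_state() until it stops returning "PROCESSING" or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = get_state()
    while state == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"File still processing after {timeout} seconds")
        time.sleep(poll_interval)
        state = get_state()
    if state == "FAILED":
        raise ValueError("File processing failed")
    return state
```

With the SDK, `get_state` could be `lambda: client.files.get(name=uploaded.name).state`, where `uploaded` is the object returned by `client.files.upload(...)`.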

### Imports

In [None]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

In [None]:
from IPython.display import Image, display

# Display the image
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))


In [None]:
prompt = "Describe the image in detail, focusing on the key shapes and colors. Identify any notable patterns, colors, themes, or characters. Analyze the elements and composition of this image: describe the shapes, colors, arrangement, and any notable patterns or features. Highlight the context or purpose of the elements within the image, and interpret the overall mood or message conveyed. Include any symbolic or cultural significance if applicable."

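# Pass the uploaded File object from the File API, not the local filename string:
# a bare string is sent as text, so the model would only see the words "processed_image.jpg"
response = client.models.generate_content(
    model=model_name,
    contents=[
        Image_analyse,
        prompt,
    ]
)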

Markdown(response.text)


Okay, let's analyze the provided image, "processed_image.jpg".

**Overall Impression:**

The image presents a vibrant and dynamic abstract composition characterized by geometric complexity and a striking color palette. It appears digitally generated or heavily processed, emphasizing clean lines, overlapping forms, and a sense of layered depth.

**Shapes:**

*   **Dominant Shapes:** The composition is built primarily from geometric shapes. Rectangles and squares form foundational blocks, often layered or intersecting. Sharp, angular triangles and diagonal lines cut across these steadier forms, introducing energy and movement.
*   **Secondary Shapes:** Curved elements or partial circles might be present, offering a visual counterpoint to the prevailing angularity, softening edges, or creating focal points.
*   **Interaction:** Shapes overlap extensively, creating new, composite forms in the intersections. Some shapes have crisp, well-defined edges, while others might have softer, blurred, or gradient transitions, suggesting varying depths or focal planes.

**Colors:**

*   **Palette:** The color palette is a key feature, likely high-contrast and saturated. Expect to see bold primary or secondary colors (like electric blues, vibrant reds or magentas, bright yellows or greens) juxtaposed against each other or against darker neutrals (deep grays, blacks) or stark whites.
*   **Application:** Colors are likely applied in solid blocks corresponding to the geometric shapes. However, there might also be areas of gradient color transitions within shapes or subtle textural variations adding visual interest.
*   **Contrast:** Strong value contrast (light vs. dark) and hue contrast (complementary or disparate colors placed side-by-side) are probably used to make the shapes pop and create visual excitement.

**Arrangement and Composition:**

*   **Layering:** The elements are arranged in multiple layers, creating a sense of depth and complexity. Some shapes appear closer, overlapping those behind them.
*   **Balance:** The composition is likely asymmetrical, achieving balance through the careful distribution of visual weight (size, color intensity, placement of shapes) rather than mirroring.
*   **Movement:** Diagonal lines, sharp angles, and the juxtaposition of contrasting elements likely create a strong sense of dynamism and visual movement, guiding the eye through the composition.
*   **Focal Point:** While abstract, there might be a primary focal area where colors are brightest, shapes are most complex, or lines converge, drawing the viewer's initial attention.

**Notable Patterns and Features:**

*   **Repetition (Potential):** There might be repetition of certain shapes, colors, or motifs, creating rhythm or pattern within the overall complexity.
*   **Intersection Effects:** The areas where shapes overlap might feature unique colors (as if through transparent layers) or distinct boundary lines, highlighting the layering process.
*   **Digital Aesthetic:** The clean lines, potentially perfect geometric forms, and vibrant, possibly unnatural colors strongly suggest a digital origin or heavy digital manipulation ("processed"). There might be subtle digital artifacts or textures if examined closely.

**Context and Purpose:**

*   **Aesthetic Exploration:** The primary purpose appears to be aesthetic – an exploration of form, color, and composition. It functions as a piece of abstract visual art.
*   **Graphic Design:** It could serve as a background element, a visual identity component (like for a tech company or creative agency), or an illustration demonstrating modern design principles.
*   **Mood Board/Inspiration:** It might represent a visual theme or mood, perhaps related to technology, energy, urban environments, or complexity.

**Overall Mood and Message:**

*   **Mood:** The mood is likely energetic, dynamic, modern, and possibly complex or even slightly chaotic due to the layering and sharp angles. The vibrant colors contribute to a sense of excitement and intensity.
*   **Message:** The image could be interpreted as representing the complexity of modern life, the interconnectedness of systems, the dynamism of technology, or simply the beauty found in structured abstraction. It celebrates geometry, color interaction, and digital precision.

**Symbolic or Cultural Significance:**

*   **Modernity/Technology:** Abstract geometric art, especially with a digital aesthetic, is often associated with modernity, technology, data, and progress.
*   **Order and Chaos:** The interplay between structured geometric forms (order) and their complex, layered, sometimes jarring arrangement (chaos) can be seen as a visual metaphor for various systems or experiences.
*   **Color Symbolism:** Specific colors might carry cultural associations (e.g., red for energy/passion, blue for stability/technology, yellow for optimism), though in abstract art, their primary role is often compositional and perceptual.

In summary, "processed_image.jpg" is likely a visually striking abstract piece dominated by layered geometric shapes, a vibrant and high-contrast color palette, and a dynamic composition. Its purpose is primarily aesthetic, conveying a mood of modern energy and complexity, potentially serving as digital art or a graphic design element.

# Extract and organize text

Gemini can also read the contents of a .csv file and extract them in an organized way, and its reasoning capabilities can help generate new ideas from the data. The cell below first downloads the CSV file and inspects it with pandas.
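One simple way to hand a CSV to Gemini is to embed its text in the prompt. A minimal sketch, assuming the file is small enough to fit comfortably in the context window; `build_csv_prompt` and its 20,000-character cap are hypothetical choices, not part of the SDK.

```python
def build_csv_prompt(csv_text: str, question: str, max_chars: int = 20_000) -> str:
    """Hypothetical helper: embed raw CSV text in a prompt for Gemini."""
    if len(csv_text) > max_chars:
        # Crude truncation (may cut a row in half); chunking or summarizing is an alternative.
        csv_text = csv_text[:max_chars]
    return (
        "Here is the content of a CSV file between <csv> tags:\n"
        f"<csv>\n{csv_text.strip()}\n</csv>\n\n"
        f"{question}"
    )
```

The returned string could be passed as `contents` to `client.models.generate_content(model=model_name, contents=...)`, for example using the text of `downloaded_file.csv` once the cell below has saved it.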



In [None]:
import requests
import pandas as pd

def analyze_csv_from_url(url):
    try:
        # Fetch the CSV data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for unsuccessful requests

        # Save the CSV content locally (optional)
        with open("downloaded_file.csv", "wb") as file:
            file.write(response.content)

        # Load the CSV into a Pandas DataFrame
        df = pd.read_csv("downloaded_file.csv")

        # Example Analysis: Display basic information about the data
        print("First 5 rows:")
        print(df.head())

        print("\nSummary Statistics:")
        print(df.describe())

        print("\nColumns in the CSV file:")
        print(df.columns)

        # You can add further analysis depending on your requirements
    except Exception as e:
        print(f"Error occurred: {e}")


# Replace with your CSV URL
csv_url = "https://raw.githubusercontent.com/VijayaJothi24/VijayaJothi24/main/City.csv"

analyze_csv_from_url(csv_url)


First 5 rows:
               City Population    Users
0       NEW YORK NY  8,405,837  302,149
1  SAN FRANCISCO CA    629,591  213,609
2        CHICAGO IL  1,955,130  164,468
3    LOS ANGELES CA  1,595,037  144,132
4     WASHINGTON DC    418,859  127,001

Summary Statistics:
               City Population    Users
count            20         20       20
unique           20         20       20
top     NEW YORK NY  8,405,837  302,149
freq              1          1        1

Columns in the CSV file:
Index(['City', 'Population', 'Users'], dtype='object')


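As you can see, the file has 20 rows and three columns (`City`, `Population`, `Users`). Because `Population` and `Users` contain comma-grouped numbers such as `8,405,837`, pandas reads them as strings, so `describe()` reports counts and unique values rather than numeric statistics. Note that this cell uses pandas only; Gemini is not called here.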
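To get numeric columns, the `thousands` argument of `pandas.read_csv` can strip the grouping commas. A minimal sketch on a few rows copied from the output above, assuming the raw file quotes those comma-grouped values (which is why pandas kept each as a single field); the `UsersPer1000` column is a hypothetical example metric, not part of the original analysis.

```python
from io import StringIO

import pandas as pd

# A few rows copied from the output above; quoted because the values contain commas.
sample = StringIO(
    "City,Population,Users\n"
    '"NEW YORK NY","8,405,837","302,149"\n'
    '"SAN FRANCISCO CA","629,591","213,609"\n'
    '"CHICAGO IL","1,955,130","164,468"\n'
)

# thousands="," makes pandas parse "8,405,837" as the integer 8405837.
df = pd.read_csv(sample, thousands=",")

# Hypothetical derived metric: users per 1,000 residents.
df["UsersPer1000"] = (df["Users"] / df["Population"] * 1000).round(1)
print(df.dtypes)
print(df.sort_values("UsersPer1000", ascending=False))
```

In the notebook, the same effect comes from `pd.read_csv("downloaded_file.csv", thousands=",")`, after which `describe()` reports numeric statistics.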

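# Analyze YouTube videos

Below, another generative AI capability is demonstrated: video analysis. The Gemini API can take a public YouTube URL directly as input, without downloading or uploading the video.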

In [None]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Vijaya says \"software testing\". Provide timestamps and broader context for each instance."),
            types.Part(
                # The YouTube URL must be passed as a string literal
                file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=2HngmLk_W5A")
            )
        ]
    )
)

Markdown(response.text)


Based on the audio, here are the instances where Vijaya says "software testing", along with timestamps and context:

1.  **Timestamp:** 0:05 - 0:07
    *   **Context:** Vijaya is introducing the mind map shown on the screen. She says, "...this is the screen for the **software testing** foundation..." setting the overall topic of the presentation.

2.  **Timestamp:** 1:09 - 1:11
    *   **Context:** Vijaya is defining the term 'Defect' as presented in the mind map. She reads or explains the definition: "An Error Identified in **Software Testing**."

3.  **Timestamp:** 1:32 - 1:34
    *   **Context:** After explaining a specific root cause example for a defect (incorrect GPS configuration), Vijaya mentions how such a defect would be discovered, stating, "...So it is found during a **software testing**."

4.  **Timestamp:** 3:15 - 3:17
    *   **Context:** Vijaya is discussing the "Value of Static testing" and explaining why it's important. She concludes the point by saying, "...This states the necessity of static testing in the **software testing** environment."
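The timestamps in the model's answer are plain text. A small sketch that turns each `start - end` range into a link opening the video at the start time, assuming the model keeps the `M:SS - M:SS` format used above and using YouTube's `t` query parameter (in seconds); the helper names are hypothetical. Model-generated timestamps are worth spot-checking against the video itself.

```python
import re

VIDEO_URL = "https://www.youtube.com/watch?v=2HngmLk_W5A"

# Matches ranges such as "0:05 - 0:07" or "1:02:03 - 1:02:10".
RANGE_PATTERN = re.compile(r"(\d+(?::\d{2}){1,2})\s*-\s*(\d+(?::\d{2}){1,2})")


def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" to a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def timestamp_links(model_text: str, video_url: str = VIDEO_URL) -> list[str]:
    """Turn every 'start - end' range in the model's answer into a link to its start time."""
    links = []
    for start, end in RANGE_PATTERN.findall(model_text):
        links.append(f"{video_url}&t={to_seconds(start)}s ({start}-{end})")
    return links
```

In the notebook, `timestamp_links(response.text)` would produce one link per instance found by the model.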