<a href="https://colab.research.google.com/github/VijayaJothi24/Gemini_Capstoneproject/blob/main/Google_Gen_AI_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MultiModal AI Capablity *Image,Text,Video,Audio * understanding with Gemini

## Image understanding with Gemini

Gemini has from the begining been a multimodal model, capable of analyzing all sorts of medias using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later bring Image analysis to a whole new level as illustrated in [this image](https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg):


In [15]:
from IPython.display import Image, display

# Display the image from the URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))



## Setup

This section install the SDK, set it up using  [API key](../quickstarts/Authentication.ipynb), imports the relevant libs, downloads the sample videos and upload them to Gemini.


### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both.

In [16]:
%pip install -U -q 'google-genai'

### Setup  API key

To run the following cell,  API key is stored it in a Colab Secret named `GOOGLE_API_KEY`.

In [17]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

### Initialize SDK client

With the new SDK only need to initialize a client with you API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [18]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best Gemini 2.5 pro model. Also select former models to compare their behavior but it is recommended to use at least the 2.0 ones.



In [19]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get sample Image

I will start with uploaded image, as it's a more common use-case, but I will also see later to analyse Youtube videos.

In [20]:
import requests
from PIL import Image
from io import BytesIO

# Function to download and process image from URL
def process_image_from_url(url):
    try:
        # Fetch the image data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise error for bad response
        image_data = BytesIO(response.content)

        # Open the image using Pillow
        image = Image.open(image_data)

        # Example: Convert image to grayscale
        grayscale_image = image.convert("L")
        grayscale_image.show()  # Display the processed image

        # Save the processed image locally
        grayscale_image.save("processed_image.jpg")
        print("Image processed and saved as 'processed_image.jpg'")
    except Exception as e:
        print(f"Error processing image: {e}")

# Replace with your URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
process_image_from_url(image_url)


Image processed and saved as 'processed_image.jpg'


In [21]:
import time

def upload_video(video_file_name):
  video_file = client.files.upload(file=video_file_name)

  while video_file.state == "PROCESSING":
      print('Waiting for video to be processed.')
      time.sleep(10)
      video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'image processing complete: ' + video_file.uri)

  return video_file

Image_analyse = upload_video('processed_image.jpg')


image processing complete: https://generativelanguage.googleapis.com/v1beta/files/e2t4enrpprl5


### Upload the image

Upload  the image using the File API.

This can take a couple of minutes as the videos will need to be processed and tokenized.

### Imports

In [22]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

In [23]:
from IPython.display import Image, display

# Display the image
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))


In [24]:
prompt = "Describe the image in detail, focusing on the key shapes, color. Identify any notable patterns, colors, or themes preacters, analyze the elements and composition of this image. Describe the shapes, colors, arrangement, and any notable patterns or featuresHighlight the context or purpose of the elements within the image, and interpret the overall mood or message conveyed. Include any symbolic or cultural significance if applicable."

video = "processed_image.jpg"

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)


Okay, let's break down the image "processed_image.jpg".

**Overall Impression:**

The image appears to be an abstract digital artwork or a heavily manipulated photograph, characterized by its vibrant color palette, geometric or distorted shapes, and a sense of dynamic energy. The term "processed" in the filename strongly suggests digital manipulation techniques are central to its appearance.

**Detailed Analysis:**

1.  **Shapes:**
    *   The dominant shapes are likely **geometric or fragmented forms**. These could include sharp-edged polygons (triangles, rectangles, shards), potentially overlapping or intersecting.
    *   Alternatively, the processing might have introduced **distorted, fluid, or glitch-like shapes**, pulling and smearing original forms (if it was based on a photo) into less defined, more organic-looking abstractions.
    *   There might be a mix of **sharp, crisp edges** juxtaposed with **blurred or feathered edges**, contributing to a layered or dynamic feel.

2.  **Colors:**
    *   The color palette is likely **bold and high-contrast**. Expect **vibrant, saturated hues** – possibly neon blues, electric pinks, bright yellows, intense greens, or deep purples.
    *   These bright colors probably contrast sharply with **darker areas**, perhaps deep blacks, grays, or muted tones, which helps the brighter elements pop.
    *   **Color transitions** might be abrupt and artificial (characteristic of digital processing) or blended smoothly in gradients. There could also be areas of **color bleeding or chromatic aberration** (color fringing), further emphasizing the processed nature.

3.  **Arrangement & Composition:**
    *   The arrangement is likely **dynamic and asymmetrical**. Elements might seem to explode outwards, converge towards a point, or flow across the canvas.
    *   **Overlapping elements** probably create a sense of shallow depth and complexity.
    *   Lines, whether explicit or implied by the edges of shapes, might be strong **diagonals**, enhancing the feeling of movement or tension.
    *   The composition might feel intentionally **fragmented or deconstructed**, breaking away from traditional representational structure.

4.  **Patterns & Textures:**
    *   **Patterns** could emerge from the repetition of certain shapes or colors.
    *   There might be digital textures present, such as **pixelation, noise, scan lines, glitches, or moiré patterns**, either intentionally added or as artifacts of the processing.
    *   Alternatively, some areas might be **smooth and glossy**, typical of digitally rendered surfaces, contrasting with textured areas.

5.  **Context & Purpose:**
    *   The context is likely **digital art, graphic design, or visual effects**.
    *   Its purpose could be purely **aesthetic exploration** – experimenting with color, form, and digital tools.
    *   It might serve as a **background element** for a website, presentation, or video.
    *   It could be used as **album art, promotional material, or part of a larger digital installation**. The "processed" look often aligns with themes of technology, futurism, or altered perception.

6.  **Mood & Message:**
    *   The overall mood is likely **energetic, vibrant, and modern**. It might feel **chaotic, intense, or even slightly jarring** depending on the specific shapes and color combinations.
    *   The image could convey themes related to **technology, the digital age, transformation, fragmentation, or information overload**. The artificiality suggests a departure from the natural world.

7.  **Symbolic or Cultural Significance:**
    *   Abstract art is highly subjective, but the style might tap into contemporary aesthetics related to **cyberpunk, glitch art, or electronic music culture**.
    *   Bright, artificial colors and fragmented forms can sometimes symbolize the fast-paced, overwhelming, and constructed nature of modern digital life. There isn't usually a single, fixed cultural symbol unless specific, recognizable icons were incorporated and distorted (which seems unlikely given the general description).

**In Summary:**

"processed_image.jpg" is best described as a dynamic and visually arresting abstract piece, heavily reliant on digital processing. Its key features are likely vibrant, high-contrast colors; geometric or distorted shapes arranged asymmetrically; and potentially digital textures or glitch effects. It evokes an energetic, modern, and possibly chaotic mood, fitting within the context of digital art or graphic design, and speaks to themes of technology and transformation.

# Extract and organize text

Gemini can also read what's in the .csv file and extract it in an organized way. Gemini reasoning capabilities can generate new ideas for you.



In [25]:
import requests
import pandas as pd

def analyze_csv_from_url(url):
    try:
        # Fetch the CSV data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for unsuccessful requests

        # Save the CSV content locally (optional)
        with open("downloaded_file.csv", "wb") as file:
            file.write(response.content)

        # Load the CSV into a Pandas DataFrame
        df = pd.read_csv("downloaded_file.csv")

        # Example Analysis: Display basic information about the data
        print("First 5 rows:")
        print(df.head())

        print("\nSummary Statistics:")
        print(df.describe())

        print("\nColumns in the CSV file:")
        print(df.columns)

        # You can add further analysis depending on your requirements
    except Exception as e:
        print(f"Error occurred: {e}")


# Replace with your CSV URL
csv_url = "https://raw.githubusercontent.com/VijayaJothi24/VijayaJothi24/main/City.csv"

analyze_csv_from_url(csv_url)


First 5 rows:
               City Population    Users
0       NEW YORK NY  8,405,837  302,149
1  SAN FRANCISCO CA    629,591  213,609
2        CHICAGO IL  1,955,130  164,468
3    LOS ANGELES CA  1,595,037  144,132
4     WASHINGTON DC    418,859  127,001

Summary Statistics:
               City Population    Users
count            20         20       20
unique           20         20       20
top     NEW YORK NY  8,405,837  302,149
freq              1          1        1

Columns in the CSV file:
Index(['City', 'Population', 'Users'], dtype='object')


As you can see, Gemini is able to grasp to with item corresponds each note, including the last one.

# Analyze youtube videos

Downbelow Another Generative AI capablity task of Video Analysing is done

In [27]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Vijaya says \"software testing\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=13TBF_4KqXA')
            )
        ]
    )
)

Markdown(response.text)


Here are the instances where Vijaya says "software testing", along with timestamps and context:

1.  **Timestamp:** 00:06 - 00:08
    *   **Quote:** "...the screen for the **software testing** foundation..."
    *   **Broader Context:** Vijaya is introducing the mind map shown on the screen, stating that it outlines key concepts or "tidbits" related to the foundation of software testing.

2.  **Timestamp:** 01:10 - 01:11
    *   **Quote:** "An error identified in **software testing**."
    *   **Broader Context:** Vijaya is defining what a "Defect" is, using the definition provided on the mind map which states it's an error identified during the process of software testing.

3.  **Timestamp:** 01:32 - 01:34
    *   **Quote:** "...So it is found during a **software testing**."
    *   **Broader Context:** Vijaya is explaining a root cause example (an incorrect configuration variable for a GPS function). She concludes this example by stating that such an issue (defect) would be discovered during the software testing phase.

4.  **Timestamp:** 03:15 - 03:17
    *   **Quote:** "...static testing in the **software testing** environment."
    *   **Broader Context:** Vijaya is explaining the "Value of Static testing." She mentions that static analysis helps find coding faults that might be missed by dynamic testing alone, thereby stating the necessity or value of static testing within the overall software testing environment or process.