<a href="https://colab.research.google.com/github/VijayaJothi24/Gemini_Capstoneproject/blob/main/Google_Gen_AI_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multimodal AI Capability: *Image, Text, Video, Audio* understanding with Gemini

## Image understanding with Gemini

Gemini has been a multimodal model from the beginning, capable of analyzing many kinds of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later models bring image analysis to a new level, as illustrated in [this image](https://i.pinimg.com/474x/c2/f7/52/c2f75236a0882c1e3dae641ae0fe6769.jpg):


In [55]:
from IPython.display import Image, display

# Display the image from the URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))



## Setup

This section installs the SDK, sets it up using an [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample image, and uploads it to Gemini.


### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both.

In [56]:
%pip install -U -q 'google-genai'

### Set up the API key

To run the following cell, store your API key in a Colab Secret named `GOOGLE_API_KEY`.

In [57]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
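
Outside Colab, `google.colab` cannot be imported, so the cell above fails. A minimal sketch of a fallback, assuming the key is also exported as an environment variable with the same name (the helper `load_api_key` is hypothetical and not part of the SDK):

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from a Colab Secret, or from the environment otherwise."""
    try:
        # Only importable inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Assumption: outside Colab the key lives in an environment variable
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set as a Colab Secret or environment variable")
    return key
```

With this helper, the assignment above could read `GOOGLE_API_KEY = load_api_key()`.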

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [58]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select earlier models to compare their behavior, but it is recommended to use at least the 2.0 ones.



In [59]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get sample image

I will start with an uploaded image, as it's a more common use case, and later look at analyzing YouTube videos.

In [60]:
import requests
from PIL import Image
from io import BytesIO

# Function to download and process image from URL
def process_image_from_url(url):
    try:
        # Fetch the image data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise error for bad response
        image_data = BytesIO(response.content)

        # Open the image using Pillow
        image = Image.open(image_data)

        # Example: Convert image to grayscale
        grayscale_image = image.convert("L")
        grayscale_image.show()  # Display the processed image

        # Save the processed image locally
        grayscale_image.save("processed_image.jpg")
        print("Image processed and saved as 'processed_image.jpg'")
    except Exception as e:
        print(f"Error processing image: {e}")

# Replace with your URL
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
process_image_from_url(image_url)


Image processed and saved as 'processed_image.jpg'


In [61]:
import time

def upload_file(file_name):
  # Upload the local file through the File API
  uploaded_file = client.files.upload(file=file_name)

  # Poll until the service has finished processing the file
  while uploaded_file.state == "PROCESSING":
      print('Waiting for file to be processed.')
      time.sleep(10)
      uploaded_file = client.files.get(name=uploaded_file.name)

  if uploaded_file.state == "FAILED":
    raise ValueError(uploaded_file.state)
  print(f'image processing complete: {uploaded_file.uri}')

  return uploaded_file

Image_analyse = upload_file('processed_image.jpg')


image processing complete: https://generativelanguage.googleapis.com/v1beta/files/paoy1x4n0hyf


### Upload the image

The cell above uploads the image using the File API.

For larger files such as videos, this can take a couple of minutes because they need to be processed and tokenized.
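
Because processing time is unpredictable, a polling loop without an upper bound can wait forever. A minimal sketch of a bounded wait; the helper name `wait_until_processed` and its default timings are assumptions, and the `get_file` callable stands in for `client.files.get`:

```python
import time


def wait_until_processed(get_file, name, poll_seconds=10.0, timeout_seconds=600.0):
    """Poll get_file(name) until the file leaves PROCESSING, or give up after a timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        current = get_file(name)
        # The state may be an enum or a plain string; compare by its name
        state = getattr(current.state, "name", current.state)
        if state == "FAILED":
            raise ValueError(f"Processing failed for {name}")
        if state != "PROCESSING":
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} still processing after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In this notebook it could be called as `wait_until_processed(lambda n: client.files.get(name=n), Image_analyse.name)`.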

### Imports

In [62]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

In [63]:
from IPython.display import Image, display

# Display the image
image_url = "https://i.pinimg.com/736x/20/2a/fe/202afe2d2615248f757fa0e4d925d701.jpg"
display(Image(url=image_url))


In [64]:
prompt = "Describe the image in detail, focusing on the key shapes and colors. Identify any notable patterns, colors, or themes present, and analyze the elements and composition of this image. Describe the shapes, colors, arrangement, and any notable patterns or features. Highlight the context or purpose of the elements within the image, and interpret the overall mood or message conveyed. Include any symbolic or cultural significance if applicable."

# Note: this is only the file name as a string, not the uploaded file object
video = "processed_image.jpg"

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)


Okay, let's break down the image `processed_image.jpg`.

Since I cannot directly *see* the image you're referring to (`processed_image.jpg`), I will provide a **hypothetical description** based on what such a filename often implies. Images labeled "processed" are frequently altered digitally, leading to possibilities like:

1.  **Abstract Digital Art:** Generated or heavily manipulated for aesthetic effect.
2.  **Enhanced Photograph:** A photo with significantly adjusted colors, contrast, textures, or added effects.
3.  **Scientific/Data Visualization:** Data represented visually, often with distinct colors and shapes.
4.  **Graphic Design Element:** An asset created for use in larger designs.

**I will structure the description assuming it's likely an abstract or heavily stylized visual.**

---

**Hypothetical Description of "processed_image.jpg":**

**Overall Impression:**
The image presents a vibrant and dynamic abstract composition, clearly having undergone significant digital processing. It feels energetic and contemporary, possibly exploring themes of technology, data flow, or pure aesthetic form.

**Shapes:**
The composition is likely dominated by a mix of **geometric and organic shapes**. We might see sharp, angular forms like triangles, rectangles, or shards intersecting with softer, flowing curves or amorphous blobs. There could be distinct **linear elements** – perhaps thin, glowing lines tracing paths across the canvas, or thicker vector-like strokes defining boundaries. The edges between shapes might range from crisp and defined to soft and blurred, indicating layering or effects like gradients and glows.

**Colors:**
The color palette is probably a key feature, likely **bold and saturated**. We might observe:
*   **Dominant Hues:** Perhaps a foundation of cool colors like deep blues, cyans, or purples, suggesting a digital or futuristic theme.
*   **Accent Colors:** Contrasting, high-energy accents like electric pink, bright orange, or neon green could be used to draw attention to focal points or create visual tension.
*   **Gradients & Blending:** Colors might not be flat but could transition smoothly through gradients, or blend where shapes overlap, creating complex intermediate hues.
*   **Luminosity:** Some elements might appear to glow or emit light, achieved through bright highlights or simulated bloom effects.

**Arrangement & Composition:**
The elements are likely arranged **asymmetrically**, creating a sense of movement and dynamism rather than static balance. There might be a clear **focal point** where colors are brightest or shapes are most complex, drawing the viewer's eye. **Layering** is probably evident, with shapes overlapping to create depth. The composition might guide the eye along diagonals or curves across the image space. The overall structure could feel complex, perhaps even bordering on chaotic, yet retaining an underlying sense of design or order.

**Patterns & Textures:**
While potentially smooth overall due to digital creation, there might be **subtle textures or patterns** applied. This could include:
*   **Digital Noise/Grain:** A fine texture added for effect.
*   **Scan Lines or Glitch Effects:** Suggesting a digital origin or malfunction.
*   **Repeating Motifs:** Small geometric patterns embedded within larger shapes.
*   **Simulated Surfaces:** Areas might mimic glossy plastic, brushed metal, or soft light diffusion.

**Context & Purpose:**
Given its "processed" nature, the image likely serves an aesthetic or illustrative purpose. It could be:
*   **Digital Art:** Created purely for visual appeal.
*   **Background Element:** For a website, presentation, or graphic design project.
*   **Conceptual Illustration:** Representing abstract ideas like data networks, energy, artificial intelligence, or creative processes.
*   **Promotional Material:** For a tech product, event, or brand emphasizing innovation.

**Mood & Message:**
The overall mood is likely **energetic, modern, and possibly intense**. The use of vibrant colors and dynamic shapes often conveys excitement, innovation, complexity, or transformation. Depending on the specific color palette and forms, it could also feel futuristic, synthetic, or even slightly overwhelming. The message might be about the beauty of digital creation, the complexity of modern systems, or simply an exploration of form and color.

**Symbolic/Cultural Significance:**
Direct symbolism would depend heavily on the specific shapes and colors used. However:
*   **Abstract forms** generally avoid literal representation, inviting subjective interpretation.
*   **Bright, artificial colors and sharp lines** often culturally signify technology, the future, and urban environments.
*   **Flowing lines and organic shapes** might represent nature, data flow, or more human elements within a technological context.
*   The very act of **"processing"** can symbolize human intervention, transformation, or the creation of something new from raw data or source material.

---

**To give you a more accurate description, I would need to actually see the image.** Please provide the image if possible!
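
The response above is hypothetical because `contents` held the string `"processed_image.jpg"`, which the model receives as plain text, while the uploaded file object `Image_analyse` was never passed. A minimal sketch of the corrected call, wrapped in a hypothetical helper (`describe_uploaded_image` is not part of the SDK) that rejects bare file names:

```python
def describe_uploaded_image(client, model_name, uploaded_file, prompt):
    """Ask the model about a file previously returned by client.files.upload."""
    if isinstance(uploaded_file, str):
        # A plain string is sent as text, so the model never sees the image
        raise TypeError("Pass the uploaded file object, not its file name")
    response = client.models.generate_content(
        model=model_name,
        contents=[uploaded_file, prompt],
    )
    return response.text
```

In this notebook it would be called as `describe_uploaded_image(client, model_name, Image_analyse, prompt)`.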

## Extract and organize text

Gemini can also read the contents of a .csv file and extract them in an organized way, and its reasoning capabilities can help generate new ideas from the data. The cell below first downloads the file and inspects it with pandas.



In [65]:
import requests
import pandas as pd

def analyze_csv_from_url(url):
    try:
        # Fetch the CSV data from the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an error for unsuccessful requests

        # Save the CSV content locally (optional)
        with open("downloaded_file.csv", "wb") as file:
            file.write(response.content)

        # Load the CSV into a Pandas DataFrame
        df = pd.read_csv("downloaded_file.csv")

        # Example Analysis: Display basic information about the data
        print("First 5 rows:")
        print(df.head())

        print("\nSummary Statistics:")
        print(df.describe())

        print("\nColumns in the CSV file:")
        print(df.columns)

        # You can add further analysis depending on your requirements
    except Exception as e:
        print(f"Error occurred: {e}")


# Replace with your CSV URL
csv_url = "https://raw.githubusercontent.com/VijayaJothi24/VijayaJothi24/main/City.csv"

analyze_csv_from_url(csv_url)


First 5 rows:
               City Population    Users
0       NEW YORK NY  8,405,837  302,149
1  SAN FRANCISCO CA    629,591  213,609
2        CHICAGO IL  1,955,130  164,468
3    LOS ANGELES CA  1,595,037  144,132
4     WASHINGTON DC    418,859  127,001

Summary Statistics:
               City Population    Users
count            20         20       20
unique           20         20       20
top     NEW YORK NY  8,405,837  302,149
freq              1          1        1

Columns in the CSV file:
Index(['City', 'Population', 'Users'], dtype='object')
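
The summary above reports `count`, `unique`, `top`, and `freq` rather than means and quantiles because values such as `8,405,837` contain thousands separators and are therefore loaded as strings. A minimal standard-library sketch of the conversion follows. The inline sample rows are copied from the output above, their quoting is an assumption about the file, and the helper `parse_count` is hypothetical. With pandas, passing `thousands=","` to `pd.read_csv` should give numeric columns directly.

```python
import csv
import io

# Assumption: sample rows taken from the printed output, quoted for valid CSV
SAMPLE_CSV = '''City,Population,Users
NEW YORK NY,"8,405,837","302,149"
SAN FRANCISCO CA,"629,591","213,609"
CHICAGO IL,"1,955,130","164,468"
'''


def parse_count(text):
    """Convert a string such as '8,405,837' to the integer 8405837."""
    return int(text.replace(",", "").strip())


rows = [
    {
        "City": row["City"],
        "Population": parse_count(row["Population"]),
        "Users": parse_count(row["Users"]),
    }
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV))
]

# Numeric values can now be aggregated
total_users = sum(row["Users"] for row in rows)
print(total_users)
```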


As the output shows, the file contains 20 rows with the columns `City`, `Population`, and `Users`.
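
The cell above inspects the file with pandas only. To have Gemini itself read and organize the data, the CSV text can be placed in the prompt. A minimal sketch, assuming the file is small enough to send as plain text; `summarize_csv_with_gemini` and its `max_chars` limit are hypothetical, and `client` and `model_name` are the objects defined earlier:

```python
def summarize_csv_with_gemini(client, model_name, csv_text, question, max_chars=20000):
    """Send raw CSV text plus a question to the model and return its answer."""
    if len(csv_text) > max_chars:
        # Assumption: an arbitrary guard to keep the inline prompt small
        raise ValueError("CSV text is too long to inline in the prompt")
    prompt = f"{question}\n\nCSV data:\n{csv_text}"
    response = client.models.generate_content(model=model_name, contents=prompt)
    return response.text
```

For example, after the download cell has run, the text of `downloaded_file.csv` could be read with `open("downloaded_file.csv").read()` and passed together with a question such as "Which cities have the highest share of users?".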

## Analyze YouTube videos

Below, another generative AI capability is demonstrated: video analysis.

In [66]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Vijaya says \"software testing\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=13TBF_4KqXA')
            )
        ]
    )
)

Markdown(response.text)


Okay, here are the instances where Vijaya says "software testing", along with timestamps and context:

1.  **Timestamp:** 0:06 - 0:07
    *   **Quote:** "...the screen for the **software testing** foundation..."
    *   **Context:** Vijaya is introducing the mind map shown on the screen. She states that the map represents the "Software Testing foundation" and is about to list the main points or topics covered within it.

2.  **Timestamp:** 1:08 - 1:11
    *   **Quote:** "So defect is an error identified in **software testing**."
    *   **Context:** Vijaya is defining the term "Defect" as shown on the mind map. She explains that a defect specifically refers to an error that is discovered during the process of software testing. She then provides a root cause example.

3.  **Timestamp:** 1:31 - 1:34
    *   **Quote:** "So it is found during a **software testing**."
    *   **Context:** Vijaya is explaining the root cause example for a defect (an incorrect configuration variable for a GPS function in a fitness tracker). She explicitly states that this particular defect was discovered during software testing activities.

4.  **Timestamp:** 3:11 - 3:17
    *   **Quote:** "...This states the necessity of static testing in the **software testing** environment."
    *   **Context:** Vijaya is discussing the "Value of Static testing". She explains that static analysis helped find coding faults they wouldn't have found otherwise with only dynamic testing, thereby demonstrating the importance or "necessity" of including static testing within the overall software testing process or environment.
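
The timestamps in the response can be turned into links that open the video at each start time. A minimal sketch, assuming the model keeps the `**Timestamp:** M:SS - M:SS` layout shown above (the layout is not guaranteed) and relying on YouTube's `t` URL parameter; the helper names are hypothetical:

```python
import re

VIDEO_URL = "https://www.youtube.com/watch?v=13TBF_4KqXA"


def timestamp_to_seconds(stamp):
    """Convert 'M:SS' or 'H:MM:SS' to a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def extract_start_links(response_text, video_url=VIDEO_URL):
    """Find 'Timestamp: M:SS - M:SS' entries and link to each start time."""
    # Matches the first timestamp after the label, with or without bold markers
    pattern = re.compile(r"Timestamp:\*{0,2}\s*(\d+(?::\d{2}){1,2})")
    links = []
    for match in pattern.finditer(response_text):
        start = timestamp_to_seconds(match.group(1))
        links.append(f"{video_url}&t={start}s")
    return links
```

Calling `extract_start_links(response.text)` on the response above would yield one link per listed instance.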