<a href="https://colab.research.google.com/github/Saim-Hassan786/Gemini-2.0-Video-Audio-Analysis-And-Text-To-Speech-Implementation/blob/main/Gemini_2_0_Video_Audio_Analysis_And_Text_To_Speech_Implementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using GEMINI-2.0_Flash_Exp
**We start by installing the `google-genai` library in Google Colab so we can call the new Gemini 2.0 Flash Experimental API. The `!pip install --upgrade -q google-genai` command installs or upgrades the package quietly (`-q` suppresses most of pip's output). Next, the code imports `userdata` from `google.colab`, which gives access to the secrets stored for the current Colab user. It then sets the `GOOGLE_API_KEY` environment variable from `userdata.get('GOOGLE_API_KEY')`; the key itself was created beforehand in Google AI Studio and saved as a Colab secret. Keeping the key in Colab secrets avoids hard-coding it in the notebook, and exporting it as an environment variable makes it available to any later API call that needs it.**

In [None]:
!pip install --upgrade -q google-genai
from google.colab import userdata
import os
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
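
**Outside Colab, `google.colab.userdata` is not available, so the key has to reach the environment some other way. The sketch below is one possible way to fail early with a readable error when the key is missing. `require_api_key` is a hypothetical helper written for this illustration, not part of `google-genai`; it only reads the `GOOGLE_API_KEY` name used above.**

```python
import os

def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error (hypothetical helper)."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to Colab secrets or export it first.")
    return key

# Usage with an explicit mapping so the example runs anywhere, even without a real key.
print(require_api_key({"GOOGLE_API_KEY": "demo-key"}))
```

**Calling `require_api_key()` with no arguments checks the real environment instead of the demo mapping.**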

**The following code connects to Google's GenAI service. It imports `genai` from the `google` package and creates a `Client` object, which handles the requests sent to the AI models. The line `MODEL : str = "gemini-2.0-flash-exp"` stores the name of the model we intend to work with, so later calls can pass `MODEL` instead of repeating the string for tasks like generating text.**

In [None]:
from google import genai
client = genai.Client()
MODEL : str = "gemini-2.0-flash-exp"

**With `client` and `MODEL` defined, the environment is ready for further requests to the selected model, such as generating text.**

**This code uses the GenAI client to generate content using the specified model. The line `response = client.models.generate_content()` sends a request to the GenAI API to create content based on the provided input. The `model = MODEL` argument tells the API to use the "gemini-2.0-flash-exp" model defined earlier, and the `contents = "Tell me 10 amazing facts about space"` argument specifies the input prompt, asking the AI to generate facts about space. After receiving the response, the code uses `from IPython.display import display, Markdown` to import the necessary tools to display the content. Finally, `display(Markdown(response.text))` displays the AI-generated response in a Markdown format, which makes it visually formatted and easy to read in the notebook.**

In [None]:
response = client.models.generate_content(
    model = MODEL,
    contents = "Tell me 10 amazing facts about space"
)
from IPython.display import display , Markdown
display(Markdown(response.text))

Alright, buckle up for 10 amazing facts about space that'll make your head spin:

1.  **Neutron Stars Are Incredibly Dense:** If you were to take a teaspoonful of material from a neutron star (the collapsed core of a massive star), it would weigh about 6 billion tons on Earth. That's like squeezing the mass of Mount Everest into a teaspoon!

2. **There's a Planet Made of Diamond:** The planet 55 Cancri e, twice the size of Earth, is believed to be composed primarily of crystallized carbon, meaning it's essentially a giant diamond! It orbits a star similar to our sun, though it's much hotter.

3. **The Universe is Expanding, and Accelerating:** Not only is the universe getting bigger, but the rate at which it expands is also speeding up. Scientists are still trying to figure out what's causing this, which is attributed to something called "dark energy."

4.  **A Day on Venus is Longer Than a Year:** Venus rotates so slowly that it takes about 243 Earth days to complete one rotation on its axis. But its orbital period around the sun is only about 225 Earth days. So, a day on Venus is actually longer than its year!

5. **There Are Trillions of Stars:** While it's impossible to know the exact number, it's estimated that there are hundreds of billions of galaxies in the observable universe, and each galaxy contains hundreds of billions of stars. This means we're talking about trillions, perhaps even quadrillions, of stars.

6.  **Space is Not Completely Silent:** While there's no air to carry sound waves in the traditional sense, space isn't silent. Radio waves, electromagnetic vibrations, and charged particles create "sounds" that can be detected and converted into audio. This "space sound" is often eerie and otherworldly.

7.  **The Largest Volcano in the Solar System is on Mars:** Olympus Mons, a shield volcano on Mars, is a colossal feature. It's about 600 km (370 miles) in diameter and 25 km (16 miles) high. That's roughly the size of the state of Arizona and three times the height of Mount Everest!

8. **Black Holes Don't "Suck":** They don't wander around hoovering everything in their path. Black holes have very strong gravity, and anything that passes their event horizon (the point of no return) can't escape. But if the sun were replaced with a black hole of the same mass, Earth would still orbit it – just like it orbits the sun now.

9. **There's an Enormous Cloud of Alcohol in Space:** The Sagittarius B2 cloud, located near the center of our galaxy, contains billions of liters of ethyl alcohol, the same type of alcohol found in alcoholic beverages. While that may sound like a party, it's also filled with other complex organic molecules that are essential to forming life.

10. **We See the Past When We Look at Stars:** Because light travels at a finite speed, when we look at distant stars, we're actually seeing them as they were in the past. The light from some stars has traveled for millions or billions of years to reach us, so we're essentially looking back in time.

I hope you found those facts as amazing as I do! Space is a truly mind-blowing place.


In [None]:
import gdown
url = 'https://drive.google.com/uc?export=download&id=1zQzSfh5f9yiW3VyX8QBG_odNnNmgwmyA'
output_path = '/content/Introduction.mp4'
gdown.download(url, output_path, quiet=True)
print(f"Video downloaded successfully: {output_path}")

Video downloaded successfully: /content/Introduction.mp4


In [None]:
import time
Intro_video = "/content/Introduction.mp4"

def upload_video(video_file_name):
  video_file = client.files.upload(path = video_file_name)
  while video_file.state=="PROCESSING":
    print("Video Is Being Processed, Kindly Wait!")
    time.sleep(10)
    video_file = client.files.get(name=video_file.name or "")
  if video_file.state =="SUCCESS":
    pass
  elif video_file.state=="FAILED":
    raise ValueError(video_file.state)
  print(f'Video processing complete: ' + (video_file.uri or ""))
  return video_file

video_intro = upload_video(Intro_video)

Video Is Being Processed, Kindly Wait!
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/e0p2zidqktc8
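
**The polling loop above waits indefinitely if a file never leaves the processing state. A minimal sketch of the same loop with a timeout follows. `wait_until_processed` and its parameters are hypothetical names for this illustration, the state strings are assumed to match the ones used above, and a simulated state sequence stands in for the real `client.files.get(...)` calls so the example runs without the API.**

```python
import time

def wait_until_processed(get_state, poll_seconds=10.0, timeout_seconds=300.0, sleep=time.sleep):
    """Poll get_state() until it stops returning "PROCESSING" or the timeout elapses."""
    waited = 0.0
    state = get_state()
    while state == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError(f"still PROCESSING after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        state = get_state()
    if state == "FAILED":
        raise ValueError("file processing failed")
    return state

# Simulated states standing in for repeated client.files.get(...).state lookups.
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_processed(lambda: next(states), poll_seconds=0.0, sleep=lambda s: None)
print(final_state)
```

**With a real client, `get_state` could be a small lambda that fetches the file again and returns its `state`.**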


In [None]:
from google.genai.types import Content, Part
prompt = """Explain what is happening in this video and what the person is saying. Pay close attention to his
words, especially what he says about his designation, and explain the overall motive of this video."""

video = video_intro

response = client.models.generate_content(
    model=MODEL,
    contents=[
        Content(
            role="user",
            parts=[
                Part.from_uri(
                    file_uri=video.uri or "",
                    mime_type=video.mime_type or ""),
                ]),
        prompt,
    ]
)

Markdown(response.text)

Okay, here's an analysis of the video you sent:

**What's Happening in the Video:**

* **Setting:** The video is a medium shot of a man standing in front of a red and gold patterned curtain. The background is rather eye-catching, and possibly a little distracting.
* **Appearance:** The man is wearing glasses, a dark-colored jacket, and a plaid shirt. He appears to be in his late 20s or early 30s. 
* **Action:** He is looking directly at the camera and speaking. His hand placement seems a little awkward. It’s not clear whether he is addressing someone in particular.
* **Visuals:** The lighting is somewhat harsh and direct, possibly from an overhead light. It doesn’t seem like a professional studio setup.

**What the Person is Saying:**

The man says:

> "Hi there, my name is Saim Hasan Akhtar, and I am a student of PAIC agentiq.ai platform."

**Analysis of His Words and Motivation:**

* **Introduction:** His first statement is a basic introduction: "Hi there, my name is Saim Hasan Akhtar." This sets the stage for the video and makes it clear it's a personal message.
* **Designation:**  He then says, "and I am a student of PAIC agentiq.ai platform." This is a key piece of information.  He identifies himself as a student of a particular platform. 
* **Motive:** The overall motive of the video seems to be an introduction.  Given the context, it is likely this is some form of self introduction video. It could be:
    *   **A class requirement:** It's possible that this video is an assignment or a task set by the PAIC agentiq.ai platform. It might require students to introduce themselves.
    * **Part of a larger process:** This video might be part of an onboarding process for students on the platform.
    * **Public profile:** He may be creating a profile on the platform or a social media profile, and this video is his “about me.”

**Key Points**
* **No explicit objective:** The video is simply a personal introduction. It doesn’t make any further claims or call for specific action.
* **Emphasis on his learning:** His mention of being a "student" at the platform is significant, indicating he is engaged in some type of educational program. 
* **Possible professional context:** Given the mention of an AI platform, it's very likely this video is for a professional or academic purpose.

**In Summary**
The video is primarily an introduction where a man identifies himself and states that he is a student at PAIC agentiq.ai platform. The purpose is likely to fulfil a requirement or introduce himself to a community associated with the platform. There isn't any explicit objective beyond that, making it fairly simple in scope.


In [None]:
prompt1 = """Explain what is happening in this video, clearly focusing on what he
is saying and his designation. Also describe his attire and explain the whole video.
"""
response = client.models.generate_content(
    model= MODEL,
    contents=[
        Content(
            role ="user",
            parts=[
               Part.from_uri(
                   file_uri = video.uri,
                   mime_type = video.mime_type)
               ]),
        prompt1
        ]
)

In [None]:
Markdown(response.text)

Certainly! Here's a breakdown of what's happening in the video:

**Overall Scene:**

The video shows a young man standing in front of a red, patterned tapestry or curtain. He is looking directly at the camera, as if speaking directly to the viewer.

**Spoken Content:**

The man says, "Hi there, My name is Saim Hassan Akhtar, and I am a student of PIAIC Gentiki AI platform." 

* **Name:** He introduces himself as Saim Hassan Akhtar.
* **Designation:** He states that he is a student. 
* **Affiliation:** He mentions that he is affiliated with "PIAIC Gentiki AI platform". (PIAIC stands for Presidential Initiative for Artificial Intelligence and Computing). This indicates he is likely taking a course or program related to artificial intelligence.

**Attire:**

The man is wearing a casual outfit. 

* He has on a dark-colored jacket or coat.
* Underneath, he is wearing a collared shirt with a light-colored plaid pattern. 
* He has glasses with clear frames.
* He has short, dark hair and a neatly trimmed beard.

**Video Details:**

* **Framing:** The camera is positioned directly in front of the man at about chest height. It's a medium shot, showing the upper part of his body and a bit of the background.
* **Lighting:** The lighting is fairly even and natural.
* **Background:** The red patterned background acts as a backdrop, contrasting with the man's attire. The background itself is a decorative hanging made of fabric.

**Interpretation:**

This video is likely a short introduction, perhaps for a course, project, or online networking. He is providing his name, designation, and association with an educational platform specializing in AI.

Let me know if you have any more questions or if there's anything else you'd like to know!

**This code sets up a second connection to Google's Generative AI (`genai`) service. It imports the `genai` library and creates a `Client` object, initialized with the API key fetched from `userdata`. The `http_options` dictionary carries additional settings; here it selects the `v1alpha` version of the API, which this notebook uses for the real-time (Live) audio calls that follow. The model itself is still chosen by the `MODEL` variable, which is set to `"gemini-2.0-flash-exp"`. With this setup, the code can send requests to Gemini 2.0 Flash Experimental and receive its responses.**

In [None]:
from google import genai
client = genai.Client(
    api_key=userdata.get("GOOGLE_API_KEY"),
    http_options= {'api_version': 'v1alpha'}
  )
MODEL = "gemini-2.0-flash-exp"


**The following code helps manage audio recording and processes data asynchronously. First, it imports necessary modules: `contextlib` to manage context (like opening and closing files), `wave` to work with `.wav` audio files, and `IPython.display` for audio playback.**

**The `async_enumerate` function is an asynchronous generator, the `async` counterpart of the built-in `enumerate`. It takes an asynchronous iterable (`it`), loops through it with `async for`, and yields each item along with its index (`n`). This gives non-blocking iteration, which is useful when items arrive over time, as streamed audio data does.**

**Next, the `audio_recording` function is a context manager, defined using `@contextlib.contextmanager`. A context manager ensures that resources, like files, are properly opened and closed. In this case, it opens a `.wav` audio file with specific properties, such as the sample width (size of each audio sample), number of channels (1 for mono, 2 for stereo), and the sample rate (the number of samples per second, 24000 in this case). Inside the context manager, the `wave.open` function creates or overwrites the file, and the `yield wf` statement temporarily returns the `wave` file object, allowing for audio data to be written into it. Once the block of code that uses the `audio_recording` function finishes, the file is automatically closed.**

**Overall, the code sets up a framework for recording and processing audio in a `.wav` format asynchronously, making it suitable for audio data to be handled efficiently.**

In [None]:
import contextlib
import wave
from IPython.display import display, Audio

async def async_enumerate(it):
  n = 0
  async for item in it:
    yield n, item
    n +=1

@contextlib.contextmanager
def audio_recording(filename,channels=1, rate = 24000, sample_width = 2):
  with wave.open(filename, "wb") as wf:
     wf.setsampwidth(sample_width)
     wf.setnchannels(channels)
     wf.setframerate(rate)
     yield wf
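
**To see the two helpers working together without calling the API, the sketch below feeds them a simulated stream. `fake_audio_stream` and `record` are hypothetical names used only for this illustration; the stream yields 16-bit mono PCM chunks of a sine tone, which is assumed to resemble the raw audio chunks written later in the notebook. The helpers are repeated so the block runs on its own.**

```python
import asyncio
import contextlib
import math
import os
import struct
import tempfile
import wave

async def async_enumerate(it):
    n = 0
    async for item in it:
        yield n, item
        n += 1

@contextlib.contextmanager
def audio_recording(filename, channels=1, rate=24000, sample_width=2):
    with wave.open(filename, "wb") as wf:
        wf.setsampwidth(sample_width)
        wf.setnchannels(channels)
        wf.setframerate(rate)
        yield wf

async def fake_audio_stream(chunks=4, frames_per_chunk=6000, rate=24000, freq=440.0):
    """Simulated stream: yields 16-bit mono PCM chunks of a sine tone."""
    for c in range(chunks):
        start = c * frames_per_chunk
        samples = (
            int(12000 * math.sin(2 * math.pi * freq * (start + i) / rate))
            for i in range(frames_per_chunk)
        )
        yield b"".join(struct.pack("<h", s) for s in samples)
        await asyncio.sleep(0)  # hand control back to the event loop

async def record(filename):
    # Write every chunk as it arrives, then reopen the file to inspect it.
    with audio_recording(filename) as wf:
        async for n, chunk in async_enumerate(fake_audio_stream()):
            if n == 0:
                print("first chunk received")
            wf.writeframes(chunk)
    with wave.open(filename, "rb") as rf:
        return rf.getnframes(), rf.getframerate()

demo_path = os.path.join(tempfile.gettempdir(), "tone_demo.wav")
# In a plain script asyncio.run works; inside a notebook cell, `await record(demo_path)` is the usual form.
frames, rate = asyncio.run(record(demo_path))
print(frames / rate, "seconds of audio written")
```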

**The `config` dictionary has a single key, `"generation_config"`, whose value is another dictionary of configuration details. Inside it, `"response_modalities"` is a list of the response types the model should generate. Here it contains only `"AUDIO"`, so the response comes back in audio format. This configuration is passed to the API when the connection is opened in the next cell.**

In [None]:
config = {"generation_config":{"response_modalities":["AUDIO"]}}

**This code sets up an asynchronous process to send a message, receive an audio response, and save the response to a `.wav` file. It starts by connecting to an AI model using `client.aio.live.connect`, which allows the program to interact with the model in real-time. The `config` variable specifies that the response should be in audio format, and `MODEL` refers to the specific AI model being used.**

**Once the connection is made, it defines a file called `greeting.wav` to store the audio response. The `audio_recording(filename)` context manager opens the file for writing and closes it automatically when the block ends. Inside the context manager, the message `"Hi there myself Saim Hassan Akhtar, tell me about Agentic Artificial Intelligence "` is sent to the AI model using `session.send(msg, end_of_turn=True)`. This method sends the message and signals the end of the user's input.**

**Next, the program waits for the AI's response using `session.receive()`, which retrieves the response from the model. The response is processed asynchronously using the `async_enumerate` function, which allows for efficient, non-blocking iteration over the response data. As each chunk of response data is received, it is written into the `greeting.wav` file using `wf.writeframes(response_chunks.data)`.**

**If there is data in the response (`if response_chunks.data is not None`), it gets written into the file. The `n == 0` check ensures that a message is printed when the first chunk of audio data is received, indicating that the audio response is being generated. After all the data is written, the audio file is ready, and it is played using `display(Audio(filename, autoplay=True))`, which automatically plays the generated audio response.**

In [None]:
async with client.aio.live.connect(model=MODEL, config=config) as session:
  filename = "greeting.wav"
  with audio_recording(filename) as wf:
    msg = "Hi there myself Saim Hassan Akhtar, tell me about Agentic Artificial Intelligence "
    print(msg)
    # Send the prompt and mark the end of the user's turn.
    await session.send(msg, end_of_turn=True)
    response = session.receive()

    # Append each audio chunk to the WAV file as it arrives.
    async for n, response_chunks in async_enumerate(response):
      if response_chunks.data is not None:
        wf.writeframes(response_chunks.data)
        if n == 0:
          print(".......Your Audio Response Is Generating......")

# Play the finished recording in the notebook.
display(Audio(filename, autoplay=True))

Hi there myself Saim Hassan Akhtar, tell me about Agentic Artificial Intelligence 
.......Your Audio Response Is Generating......
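
**If no audio chunks arrive, the recording is still a valid `.wav` file, just with zero frames, so playback is silent. One quick check, sketched here under the assumption that the file uses the same format as `audio_recording`, is to compute its duration as frames divided by frame rate. `wav_duration_seconds` is a hypothetical helper, and the demo file is generated locally so the block runs without the API.**

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Return the playing time of a WAV file, computed as frames / frame rate."""
    with wave.open(path, "rb") as rf:
        return rf.getnframes() / rf.getframerate()

# Build a half-second silent file with the same format the notebook records in.
demo_path = os.path.join(tempfile.gettempdir(), "silence_demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit samples
    wf.setframerate(24000)   # 24 kHz
    wf.writeframes(b"\x00\x00" * 12000)

duration = wav_duration_seconds(demo_path)
print(duration)
```

**Running `wav_duration_seconds("greeting.wav")` after the cell above would show whether any audio was actually saved.**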


# Customized Audio Response

**Next, I customized the code above so that the prompt comes from the user. This code defines a function `generate_audio_response` that sends a text message to an AI model, receives an audio response, and saves the audio to a `.wav` file. The function takes four parameters: `message` (the message to send), `model` (the AI model to use), `config` (configuration for the model), and `file_name` (the name of the audio file in which to save the response).**

**The function starts by establishing an asynchronous connection to the AI model using `client.aio.live.connect()`. Inside the connection, it uses the `audio_recording` context manager to open the output `.wav` file for writing the audio data.**

**The `msg` variable holds the text message to send to the AI model. The message is sent using `await session.send(msg, end_of_turn=True)`, which also signals the end of the user's message. The program then calls `session.receive()` to obtain the AI's reply as a stream.**

**The response is processed asynchronously using `async_enumerate()`, which suits streaming responses that arrive in pieces. As chunks of response data are received, they are written to the file using `wf.writeframes(response_chunks.data)`. When the first chunk arrives, the message ".......Your Audio Response Is Generating......" is printed to indicate progress.**

**Once all data is written to the audio file, the function plays it automatically using `display(Audio(..., autoplay=True))`, so the user hears the AI's response as soon as it is complete. The whole process is asynchronous, meaning it doesn't block other tasks while waiting for the response.**

In [None]:
async def generate_audio_response(message: str, model=MODEL, config=config, file_name='audio.wav'):
    # Use the function's own arguments instead of globals or hard-coded values.
    async with client.aio.live.connect(model=model, config=config) as session:
        with audio_recording(file_name) as wf:
            msg = message
            print(msg)
            # Send the prompt and mark the end of the user's turn.
            await session.send(msg, end_of_turn=True)
            response = session.receive()

            # Append each audio chunk to the WAV file as it arrives.
            async for n, response_chunks in async_enumerate(response):
                if response_chunks.data is not None:
                    wf.writeframes(response_chunks.data)
                if n == 0:
                    print(".......Your Audio Response Is Generating......")

    # Play the finished recording in the notebook.
    display(Audio(file_name, autoplay=True))

**The code below lets you send a message to the AI model (Gemini) and get an audio response in return. It starts by importing the necessary libraries, `asyncio` and `nest_asyncio`. `asyncio` is used to run asynchronous tasks. `nest_asyncio` is needed because a Jupyter notebook already runs its own event loop, and `asyncio.run()` normally refuses to start inside a loop that is already running; `nest_asyncio.apply()` patches `asyncio` to allow this nesting.**

**The `messages` variable collects a query from the user through `input()`, asking what they want to ask the AI. Then, `nest_asyncio.apply()` makes sure that the `asyncio` event loop can run properly in the notebook. After that, `asyncio.run(generate_audio_response(messages))` is called, which runs the `generate_audio_response` function asynchronously, sending the user’s message to the AI and getting the audio response.**

**This setup ensures the program waits for the AI’s reply without blocking other tasks, and once the response is ready, it plays the audio. The user can type their question and immediately hear the AI's spoken reply.**

# Feel Free To Test
# Thanks !

In [None]:
import asyncio
import nest_asyncio

messages = input("Enter your query here that you want to ask from Gemini  ")
nest_asyncio.apply()
asyncio.run(generate_audio_response(messages))
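
**The reason `nest_asyncio` is needed can be reproduced with the standard library alone. The sketch below, meant to be run as a plain script, calls `asyncio.run()` from inside a loop that is already running, which is the situation a notebook cell is in. `answer` and `call_from_running_loop` are hypothetical names for this illustration.**

```python
import asyncio

async def answer():
    return "done"

async def call_from_running_loop():
    """Try asyncio.run() while a loop is already running, as in a notebook cell."""
    coro = answer()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        nested_error = str(exc)
    else:
        nested_error = None
    # Awaiting directly works because we are already inside the loop.
    return nested_error, await answer()

nested_error, result = asyncio.run(call_from_running_loop())
print(nested_error)
print(result)
```

**Without any patching, the first value is the error message raised by the nested `asyncio.run()` call. After `nest_asyncio.apply()`, the nested call should succeed and the first value should be `None`. In a notebook, awaiting the coroutine directly, as in `await generate_audio_response(messages)`, is an alternative that needs no patching.**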
