
In [1]:
# Install (or upgrade) the Google Gen AI SDK
!pip install --upgrade --quiet google-genai



In [4]:
from google.colab import userdata

# Read the Gemini API key from Colab's Secrets panel
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')

if GOOGLE_API_KEY:
    print("Key found")
else:
    print("Key not found")

Key found
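
Outside Colab, `google.colab.userdata` cannot be imported, so the cell above only works inside a Colab runtime. A minimal sketch of a fallback, assuming the key is also exported as an environment variable with the same name; the helper name `load_api_key` is hypothetical and not part of any SDK:

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Colab Secrets if possible, else from the environment."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass
    # Fallback (assumption): the key was exported as an environment variable.
    return os.environ.get(name)


GOOGLE_API_KEY = load_api_key()
print("Key found" if GOOGLE_API_KEY else "Key not found")
```

Keeping the lookup in one function means the rest of the notebook does not need to know where the key came from.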


In [9]:
from google import genai

# Create the client once and reuse it for every request below
client = genai.Client(api_key=GOOGLE_API_KEY)
model = "gemini-2.0-flash-exp"

In [39]:
from google.genai.types import GenerateContentResponse
from IPython.display import display, Markdown

In [11]:
response: GenerateContentResponse = client.models.generate_content(
    model=model,
    contents='How does AI work?',
)
display(Markdown(response.text))

That's a great question! It's a complex topic, but let's break down how AI works in a way that's easier to understand. Instead of focusing on the technical jargon, we'll talk about the core concepts.

Think of AI as trying to get computers to learn and solve problems like humans do (or even better in some cases!). Here's a simplified overview of the main components:

**1. Data: The Foundation of AI**

* **Data is the "fuel" for AI.** Just like humans learn from experiences, AI systems learn from data.
* **Different Types of Data:** This can include text, images, videos, audio, numbers, sensor readings, and much more.
* **The More Data, the Better (Usually):** Generally, the more relevant and high-quality data an AI system has, the better it can learn and perform.

**2. Algorithms: The Learning Rules**

* **Algorithms are sets of instructions.** They're like recipes that tell the computer how to process data and learn from it.
* **Machine Learning:** A crucial type of algorithm that allows AI systems to learn from data *without explicit programming*. This is often where the "intelligence" comes from.
* **Different Types of Machine Learning:**
    * **Supervised Learning:** The AI is given labeled data (e.g., pictures of cats with the label "cat"). It learns to predict the label for new, unseen data.
    * **Unsupervised Learning:** The AI is given unlabeled data and tries to find patterns and structures within it (e.g., grouping customers based on their purchasing behavior).
    * **Reinforcement Learning:** The AI learns by trial and error, receiving rewards or penalties based on its actions (e.g., a game-playing AI learning to win).
* **Deep Learning:** A powerful type of machine learning that uses artificial neural networks (inspired by the structure of the human brain) to learn very complex patterns from massive amounts of data. This is behind many of the recent advancements in AI.

**3. Models: The Learned Representations**

* **A model is the result of the learning process.** It's like a mathematical representation of the patterns and relationships that the AI has discovered in the data.
* **Example:** An image recognition model might store patterns of edges, shapes, and colors that it has learned are associated with a particular object (e.g., a cat).
* **Using the Model:** Once a model is trained, it can be used to make predictions or decisions on new, unseen data.

**4. Processing Power: The Engine**

* **AI requires a lot of computational power.** Processing large amounts of data and training complex models is very resource-intensive.
* **GPUs (Graphics Processing Units) are often used** because they can perform many calculations in parallel, speeding up the training process.
* **Cloud computing platforms** provide access to the necessary resources for AI development and deployment.

**Analogy: Training a Dog**

Let's use an analogy of training a dog:

* **Data:** You show your dog many pictures of balls and tell them "ball."
* **Algorithm:** The training method you use is like the machine learning algorithm.
* **Model:** The dog's understanding of what a ball is (based on its training) becomes its internal model.
* **Prediction:** The dog sees a new ball and brings it to you.

**In Summary:**

1. **Gather Data:** Collect relevant data that is used to train the AI model.
2. **Choose an Algorithm:** Decide which learning method is best for your needs.
3. **Train the Model:** The algorithm learns patterns from the data and creates a model.
4. **Test and Refine:** Evaluate the model's performance and make adjustments to improve it.
5. **Deploy the Model:** Use the trained model to solve problems and make predictions on new data.

**Key Things to Remember:**

* **AI is not magic.** It's based on mathematical algorithms and statistical methods.
* **AI is constantly evolving.** Research is constantly pushing the boundaries of what's possible.
* **AI is not a single thing.** There are many different types of AI, each with its own strengths and weaknesses.
* **AI is a tool.** It can be used for good or bad, depending on how it's implemented.

**Is it perfect?** No, not yet! AI can be prone to biases present in the data it learns from, and it might make mistakes. It's important to understand both the power and limitations of AI.

This is a simplified overview, but hopefully, it gives you a good understanding of the fundamental concepts behind how AI works. Do you have any more specific questions about AI that I can help you with? Perhaps you'd like to learn more about a particular type of AI or its application?
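
The gather-data, train, predict loop summarized in the response above can be made concrete with a very small supervised-learning sketch. This is a nearest-centroid classifier written in plain Python, chosen only because it is short; the data points and the function names `train` and `predict` are invented for illustration and are unrelated to how Gemini itself is trained.

```python
from collections import defaultdict


def train(samples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((c - x) ** 2 for c, x in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy labeled data (invented): (weight_kg, height_cm) -> label
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
]

centroids = train(training_data)       # the "model" is just two mean points
print(predict(centroids, (4.5, 24.0)))   # prints: cat
print(predict(centroids, (28.0, 58.0)))  # prints: dog
```

Here the labeled pairs play the role of the data, `train` is the algorithm, and the returned dictionary of centroids is the learned model used for prediction.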


In [55]:
# Upload a video file from the local machine into the Colab runtime
from google.colab import files
uploaded = files.upload()


Saving VID_20241224_010932.mp4 to VID_20241224_010932.mp4


In [64]:
# Upload the video using the File API.
# This can take a couple of minutes, because the video must be processed and tokenized.

import time

def upload_video(video_file_name):
    """Upload a video and block until the File API has finished processing it."""
    # Note: newer google-genai releases name this argument `file` instead of `path`;
    # check the signature of the version you have installed.
    video_file = client.files.upload(path=video_file_name)
    while video_file.state == "PROCESSING":
        print('Waiting for video to be processed.')
        time.sleep(10)
        video_file = client.files.get(name=video_file.name or "")

    if video_file.state == "FAILED":
        raise ValueError(f"Video processing failed: {video_file.state}")
    print('Video processing complete: ' + (video_file.uri or ""))

    return video_file

my_video = upload_video("VID_20241224_010932.mp4")

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/xrekv0dlnpo
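
The polling loop in `upload_video` waits forever if a file never leaves the `PROCESSING` state. A minimal sketch of the same loop with a timeout follows. It is deliberately independent of the SDK: `fetch_state` is a hypothetical callable that returns the current state as a string, and `wait_for_state` is an illustrative name, not a library function.

```python
import time


def wait_for_state(fetch_state, poll_seconds=10, timeout_seconds=300,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it stops returning "PROCESSING" or the timeout expires."""
    deadline = clock() + timeout_seconds
    state = fetch_state()
    while state == "PROCESSING":
        if clock() >= deadline:
            raise TimeoutError(f"Still processing after {timeout_seconds} seconds")
        sleep(poll_seconds)
        state = fetch_state()
    if state == "FAILED":
        raise ValueError("Processing failed")
    return state


# Demonstration with a fake state sequence instead of a real upload.
fake_states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
print(wait_for_state(lambda: next(fake_states), sleep=lambda seconds: None))  # prints: ACTIVE
```

In this notebook the callable could be something like `lambda: client.files.get(name=my_video.name).state`, assuming the state compares equal to these strings as it does in `upload_video`.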


In [65]:
from google.genai.types import Content, Part

prompt = """For each scene in this video,
            generate captions that describe the scene along with any spoken text placed in quotation marks.
            Place each caption into an object with the timecode of the caption in the video.
         """

response = client.models.generate_content(
    model=model,
    contents=[
        Content(
            role="user",
            parts=[
                Part.from_uri(
                    file_uri=my_video.uri or "",
                    mime_type=my_video.mime_type or "",
                ),
            ],
        ),
        prompt,
    ],
)

Markdown(response.text)

```json
[
  {
    "timecode": "0:00",
    "caption": "A person wearing a black hoodie with a red logo and drawstring is centered in the frame. They have dark hair and a beard, and appear to be talking.  The background consists of what looks like a tiled wall. The text spoken is, “Hi dad I am Mohammad Suhail”"
  },
  {
    "timecode": "0:04",
    "caption": "The same person from the previous scene is still in the frame, looking directly at the camera.  They continue speaking.  The spoken text is, “and I am a full-stack developer and now I'm learning a gentic chain generative AI.”"
  },
  {
    "timecode": "0:11",
    "caption": "The same individual remains centered and looks at the camera.  They are still speaking. The text spoken is, “Suggest me what content I should learn in agentic AI.”"
  },
  {
    "timecode": "0:17",
    "caption": "The speaker continues to face the camera. The text spoken is, “or generative AI”"
  },
  {
    "timecode": "0:20",
    "caption": "The person is still in the frame and speaking. Their hands come up to the red drawstrings of their hoodie. The spoken text is, “Suggest me some”"
  },
  {
    "timecode": "0:24",
    "caption": "The speaker continues to hold the red drawstrings and still speaks. The spoken text is, “Suggest me some content that I learn agentic AI or”"
  },
  {
    "timecode": "0:28",
    "caption": "The person is still looking at the camera and still speaking. The text spoken is, “generative AI”"
  }
]
```
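
The model returns the captions as JSON wrapped in a Markdown code fence, so `response.text` cannot be passed to `json.loads` directly. A minimal sketch of one way to unwrap and parse such a reply, using only the standard library; `parse_captions` and `timecode_to_seconds` are hypothetical helper names, and the sample string is a shortened stand-in for the real output:

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example fence-safe
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_captions(response_text):
    """Parse a JSON reply, stripping a surrounding Markdown code fence if present."""
    match = FENCED_BLOCK.search(response_text)
    payload = match.group(1) if match else response_text
    return json.loads(payload)


def timecode_to_seconds(timecode):
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timecode.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Shortened sample in the same shape as the output above.
sample = FENCE + 'json\n[{"timecode": "0:04", "caption": "They continue speaking."}]\n' + FENCE
for item in parse_captions(sample):
    print(timecode_to_seconds(item["timecode"]), item["caption"])
```

With the real reply, `parse_captions(response.text)` should yield a list of dictionaries with `timecode` and `caption` keys, provided the model keeps to the format shown above; that is not guaranteed, so production code would want to handle `json.JSONDecodeError`.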

In [67]:


# Analyze the video (visual and audio components)
def analyze_video(video_file):
    """
    Analyzes both the visual and audio components of the uploaded video.
    """
    prompt = """
    Analyze the uploaded video and provide:
    1. A summary of the visual elements (e.g., scenes, actions, objects).
    2. A summary of the audio content, including spoken words, tone, and any significant background sounds.
    """
    try:
        response = client.models.generate_content(
            model=model,
            contents=[
                Content(
                    role="user",
                    parts=[
                        Part.from_uri(
                            file_uri=video_file.uri or "",
                            mime_type=video_file.mime_type or ""
                        )
                    ]
                ),
                prompt
            ]
        )
        print("Analysis Response:")
        display(Markdown(response.text))
        return response.text
    except Exception as e:
        print(f"Error analyzing video: {e}")
        return None


# Call the analysis function
analysis_result = analyze_video(my_video)

# Interact with the LLM based on the analysis
def interact_with_llm(analysis_text):
    """
    Interacts with the LLM by asking questions based on the video analysis.
    """
    question_prompt = """
    Based on the provided analysis, answer the following:
    1. What is the main message or theme of the video?
    2. How do the visual and audio components complement each other in conveying the message?
    3. Are there any notable emotional tones or themes in the spoken content?
    4. What does he request, and how should one respond to it?
    """
    try:
        response = client.models.generate_content(
            model=model,
            contents=[
                analysis_text,
                question_prompt
            ]
        )
        print("LLM Interaction Response:")
        display(Markdown(response.text))
    except Exception as e:
        print(f"Error during LLM interaction: {e}")


# Proceed if analysis was successful
if analysis_result:
    interact_with_llm(analysis_result)


Analysis Response:


Okay, here's an analysis of the video you've provided:

**1. Summary of Visual Elements:**

*   **Scene:** The video is a close-up shot of a man's face and upper torso. He appears to be indoors, against a plain white or off-white tiled wall, with a wooden slat or strip visible at the top of the frame. There might be a blue towel in the background.
*   **Action:** The man is looking directly at the camera and speaking. He adjusts the drawstrings on his hoodie at the end of his request.
*   **Objects:** The man is wearing a black hoodie with the word "NIKE" prominently displayed in red. He also has white wireless earbuds in his ears. The most prominent object is the red drawstring of his hoodie, as he fiddles with it.
*   **Camera Angle:** The camera angle is direct, straight at the man, and fairly close, showing his face and upper chest. It is slightly askew, suggesting the video was taken while the camera was tilted. The man is slightly rotated, as well.
*   **Lighting:** The lighting is bright, primarily coming from the front, causing the man's face to be well-lit with some natural shadows and also revealing any marks on the man's face. 
*   **Composition:** The man is the central focus of the frame. The background is simple and unobtrusive. 

**2. Summary of Audio Content:**

*   **Spoken Words:** The man begins by saying, "Hi Dad, I am Mohammad Suhail." He continues:
    * "And I am a full stack developer and now I am learning agentic AI, generative AI."
    *  "Suggest me what content I should learn in Agentic AI or Generative AI."
    * "Suggest me some content that I learn in Agentic AI or Generative AI."
*   **Tone:** The man's tone is conversational, casual, and slightly inquisitive. He sounds like he is genuinely seeking advice and is interested in the content suggestions. His speech has a slight accent (likely from South Asia or the Middle East).
*   **Background Sounds:** There is no significant background sound. The audio is clear and focused on the man's voice.

**In Summary**

The video is a short, personal clip of a man named Mohammad Suhail, who identifies as a full-stack developer, seeking suggestions on what to learn in Agentic or Generative AI. The visual focus is directly on him, and the audio captures his request clearly. He has a casual and friendly tone throughout his request. The background is simple, which allows the viewer to focus on him and his words.

LLM Interaction Response:


Okay, here are the answers to your questions based on the analysis:

**1. What is the main message or theme of the video?**

The main message of the video is a request for guidance and learning resources related to Agentic and Generative AI. Mohammad Suhail, the speaker, is a full-stack developer who is currently exploring these areas and is directly seeking suggestions from his father (implied by his opening line, "Hi Dad"). The core theme is one of **learning and seeking expert advice** in a specific technical domain.

**2. How do the visual and audio components complement each other in conveying the message?**

The visual and audio components work well together to convey the message:

*   **Visuals:** The close-up shot of Mohammad Suhail, looking directly at the camera, creates a sense of intimacy and directness. It makes the viewer feel like they are being personally addressed. His casual attire (hoodie, earbuds) signals a relaxed, informal tone. The simple background ensures that there is no distraction from his request. His manipulation of the hoodie strings at the end might imply a slight bit of nervousness, perhaps, when asking for advice.
*   **Audio:** The clear audio focus on his voice ensures that the message is easily understandable. His conversational and inquisitive tone emphasizes his earnest desire to learn. The specific mention of "Agentic AI" and "Generative AI" immediately establish the context of the request. The slightly accented English, while being a characteristic, is completely clear and does not hinder comprehension.

In essence, the visuals put a face to the request, while the audio clearly articulates the specific need. The combination makes the request personal, focused, and easily understandable.

**3. Are there any notable emotional tones or themes in the spoken content?**

Yes, there are some notable emotional tones and themes in his spoken content:

*   **Inquisitiveness:** The primary tone is one of inquisitiveness. He is seeking suggestions and clearly interested in learning more about the AI topics.
*   **Respect/Affection (implied):** By addressing his father ("Hi Dad"), there is an underlying tone of respect, and potentially even affection. There is an assumed relationship where he values his father's opinion and guidance.
*   **Humility:** While he is a full-stack developer, he acknowledges that he is still learning in these new areas, suggesting a humility in his approach to knowledge acquisition.
*   **Earnestness:** His tone is genuine, showing he really does want to get the resources needed to begin his studies.

**4. What he requests and how to respond to it?**

*   **Request:** Mohammad Suhail requests suggestions for content to learn in Agentic AI or Generative AI. He specifically asks what content he should learn in these areas. This is essentially a request for recommendations on learning resources, such as courses, books, tutorials, or specific topics to focus on.

*   **How to Respond:** Here's how to respond to his request, keeping his perspective and the context in mind:

    1.  **Acknowledge his effort:** Start by acknowledging his initiative. For example, "Hi Mohammad, it's great to see you're exploring Agentic and Generative AI!"
    2.  **Provide structured recommendations:**  Rather than listing everything at once, it is a good idea to categorize recommendations. You could offer:
        *   **Foundational Concepts:** Recommend resources on the basic principles of AI, machine learning, and deep learning if he needs a review. This might include courses from platforms like Coursera, edX, or Khan Academy. Specifically, mention some courses on Neural Networks or Deep Learning basics before diving into Agentic AI.
        *   **Agentic AI Resources:** Suggest specific materials on Agentic AI (also known as autonomous agents or intelligent agents). Examples could include research papers, blogs, or courses focusing on this. Some key aspects would be to understand the concepts of environments, states, actions, policies, rewards, and RL (Reinforcement Learning).
        *   **Generative AI Resources:** Offer recommendations on Generative AI, such as courses or resources covering GANs (Generative Adversarial Networks), VAEs (Variational Autoencoders), or diffusion models, as well as practical frameworks such as Transformers.
        *   **Specific Tools/Frameworks:** Suggest specific libraries and frameworks that are relevant to his interests, such as TensorFlow, PyTorch, or specialized libraries for RL or Generative AI, and provide links to tutorials.
        *   **Project Ideas:** Propose some project ideas. This can be crucial for him to get hands-on experience.
        *   **Staying Up-To-Date:** Mention that the AI landscape is dynamic and it is essential to follow the research and updates in the field, providing suggestions on blogs or journals to follow.
    3.  **Offer further assistance:** Encourage him to ask further questions and offer to help as he progresses. For instance, "Let me know if you have more questions along the way." or "If you get stuck on something feel free to reach out".
    4.  **Personalized touch:** You can tailor the response further if you have an understanding of his current skill levels and preferences.

By using this approach, the response will be helpful, well-structured, and encouraging. It directly addresses the user's request and fosters further engagement.
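
The `analyze_video` and `interact_with_llm` functions earlier in this notebook catch every exception and only print it, so a transient failure ends the run without a result. One option is to retry the call a few times with exponential backoff before giving up. The sketch below is generic and makes no assumption about which exceptions the SDK raises; `call_with_retries` is a hypothetical helper name.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after a failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({error}); retrying in {delay} s")
            sleep(delay)


# Demonstration with a function that fails twice before succeeding.
outcomes = iter([RuntimeError("temporary"), RuntimeError("temporary"), "ok"])

def flaky_call():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

print(call_with_retries(flaky_call, sleep=lambda seconds: None))  # prints: ok
```

In this notebook the wrapped callable could be, for example, `lambda: client.models.generate_content(model=model, contents=[analysis_text, question_prompt])`. Retrying every exception type is a simplification; narrowing the `except` clause to errors known to be transient would be safer.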
