<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Video Analyzer with Google Gemini
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial'>Video understanding, the extraction of insights from video, is valuable across many industries and applications. It improves content analysis and management by automatically generating metadata, categorizing content, and making videos more searchable. The resulting insights also support decision-making, enhance user experiences, and improve operational efficiency.</p>

<p style = 'font-size:16px;font-family:Arial'>Google’s Gemini 2.0 model brings significant advancements to this field. Beyond its improvements in language processing, the model can handle an input context of up to 1 million tokens. Gemini 2.0 is also trained as a multimodal model that natively processes text, images, audio, and video. This combination of varied input types and a large context size makes it possible to process long videos effectively.</p>

<p style = 'font-size:16px;font-family:Arial'>In this notebook, we will dive into how Gemini 2.0 can be leveraged for generating valuable video insights, transforming the way we understand and utilize video content across different domains.</p>

<p style = 'font-size:16px;font-family:Arial'>Below are the steps needed:</p>
<ul>
<li style = 'font-size:16px;font-family:Arial'>Install dependencies</li>
<li style = 'font-size:16px;font-family:Arial'>Set up the Gemini API key</li>
<li style = 'font-size:16px;font-family:Arial'>Import the libraries</li>
<li style = 'font-size:16px;font-family:Arial'>Save uploaded files</li>
<li style = 'font-size:16px;font-family:Arial'>Upload videos to the Files API</li>
<li style = 'font-size:16px;font-family:Arial'>Declare functions to be used</li>
<li style = 'font-size:16px;font-family:Arial'>Define tools using the functions created</li>
<li style = 'font-size:16px;font-family:Arial'>Execute the functions to get the video analysis</li>
<li style = 'font-size:16px;font-family:Arial'>Clean up</li>
</ul>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial'>1. Configuring the environment</b>

<p style = 'font-size:18px;font-family:Arial'><b>1.1 Install the required libraries</b></p>

In [None]:
%%capture
!pip install -r requirements.txt --quiet

In [None]:
!pip install -q google-genai

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Note: </b><i>Please restart the kernel after executing these two lines. The simplest way to restart the kernel is to press the zero key twice: <b> 0 0</b></i></p>
</div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
from IPython.display import Markdown, Audio
from teradataml import *
import getpass
from typing import List
from google import genai, generativeai
from google.genai import types

# Limit teradataml DataFrame output to 5 rows
display.max_rows = 5

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial'>2. Connect to Vantage</b>
<p style = 'font-size:18px;font-family:Arial'><b>2.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial'>3. Setup API key for Google Gemini</b>
<p style = 'font-size:16px;font-family:Arial'>Please enter your Google API key. If you don't have one, you can get it from <a href = 'https://ai.google.dev/gemini-api/docs/api-key'>here</a>.</p>

In [None]:
GOOGLE_API_KEY = getpass.getpass(prompt = 'Please enter GOOGLE_API_KEY: ')
generativeai.configure(api_key = GOOGLE_API_KEY)
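
<p style = 'font-size:16px;font-family:Arial'>When the notebook is run repeatedly, typing the key each time can become tedious. The following is a minimal sketch, not part of the original flow, that first looks for the key in an environment variable and only prompts when it is missing. The helper name <code>load_api_key</code> and the use of an environment variable named <code>GOOGLE_API_KEY</code> are assumptions made for illustration.</p>

```python
import getpass
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, or prompt for it if unset.

    Hypothetical helper: the environment variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to an interactive prompt so the key is never echoed.
        key = getpass.getpass(prompt=f"Please enter {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty; an API key is required.")
    return key
```

<p style = 'font-size:16px;font-family:Arial'>With this helper, the assignment above could be written as <code>GOOGLE_API_KEY = load_api_key()</code>.</p>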

<p style = 'font-size:16px;font-family:Arial'>Specify the Gemini model to be used for video analysis. </p>

In [None]:
MODEL_ID = "gemini-2.0-flash-exp" # @param ["gemini-1.5-flash-8b","gemini-1.5-flash-002","gemini-1.5-pro-002","gemini-2.0-flash-exp"] {"allow-input":true}

<p style = 'font-size:16px;font-family:Arial'>Create genai client using the API key. </p>

In [None]:
client = genai.Client(api_key=GOOGLE_API_KEY)

<p style = 'font-size:16px;font-family:Arial'>The Gemini API accepts video files directly. The Files API supports files up to 2GB in size and allows storage of up to 20GB per project. Uploaded files remain available for 2 days and cannot be downloaded from the API. The following cell uploads every file found in the <code>./videos</code> folder.</p>

In [None]:
import os

# Specify the folder path containing the files you want to upload
folder_path = "./videos"

# Get a list of all files in the folder (optional: filter by file extension if needed)
file_paths = [os.path.join(folder_path, filename) for filename in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, filename))]

# file_paths
# Loop through each file path and upload it
uploaded_files = []
for file_path in file_paths:
    uploaded_file = client.files.upload(file=file_path)
    uploaded_files.append(uploaded_file)

<p style = 'font-size:16px;font-family:Arial'>We have uploaded 3 files. The contents of <code>uploaded_files</code> are shown below.</p>

In [None]:
uploaded_files

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial'>4. Setting up the prompts and calling the model on the videos</b>
<p style = 'font-size:16px;font-family:Arial'>We will set up the SYSTEM and USER prompts.</p>

In [None]:
SYSTEM_PROMPT = "When given a video and a query, call the relevant function only once with the appropriate timecodes, name and text for the video"

In [None]:
USER_PROMPT = "For each scene in this video, generate text that describes the scene. Place each text into an object with the time of the text in the video."

<p style = 'font-size:16px;font-family:Arial'>The <code>generate_content</code> method generates a model response for a given request. We call it with the SYSTEM_PROMPT and USER_PROMPT defined above. Safety ratings and content filtering are reported for the prompt in <code>GenerateContentResponse.prompt_feedback</code> and for each candidate in <code>finishReason</code> and <code>safetyRatings</code>. The API:</p>
<ul>
<li style = 'font-size:16px;font-family:Arial'>Returns either all requested candidates or none of them</li>
<li style = 'font-size:16px;font-family:Arial'>Returns no candidates at all only if there was something wrong with the prompt (check <code>promptFeedback</code>)</li>
<li style = 'font-size:16px;font-family:Arial'>Reports feedback on each candidate in <code>finishReason</code> and <code>safetyRatings</code></li>
</ul>

In [None]:
from google.genai import types

data = ''
for file_upload in uploaded_files:
    print(file_upload.uri)
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[
            # The uploaded video is referenced by URI; the prompt follows as plain text
            types.Content(
                role="user",
                parts=[
                    types.Part.from_uri(
                        file_uri=file_upload.uri,
                        mime_type=file_upload.mime_type),
                    ]),
            USER_PROMPT,
        ],
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,
            temperature=0.0,
        ),
    )
    print(response.text)

    # response.text can be None when no text part is returned, so fall back to ''
    data += response.text or ''

In [None]:
data
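
<p style = 'font-size:16px;font-family:Arial'>The rules described above (no candidates means a prompt problem, otherwise inspect each candidate) can be turned into a small defensive check. The following is a sketch under stated assumptions: <code>extract_text</code> is a hypothetical helper, it assumes the Python response object exposes <code>candidates</code>, <code>prompt_feedback</code>, and <code>finish_reason</code> attributes, and the <code>SimpleNamespace</code> objects are stand-ins used only so that the example runs without calling the API.</p>

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Return the text of the first candidate, or raise with the reported reason."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        # No candidates at all points at the prompt, so surface its feedback.
        feedback = getattr(response, "prompt_feedback", None)
        raise ValueError(f"No candidates returned; prompt feedback: {feedback}")
    candidate = candidates[0]
    content = getattr(candidate, "content", None)
    parts = getattr(content, "parts", None) or []
    texts = [part.text for part in parts if getattr(part, "text", None)]
    if not texts:
        reason = getattr(candidate, "finish_reason", None)
        raise ValueError(f"Candidate has no text; finish reason: {reason}")
    return "".join(texts)


# Stand-in objects that mimic the assumed response shape (illustration only).
ok = SimpleNamespace(
    candidates=[SimpleNamespace(
        content=SimpleNamespace(parts=[SimpleNamespace(text="00:00 A car ")]),
        finish_reason="STOP",
    )],
    prompt_feedback=None,
)
blocked = SimpleNamespace(candidates=[], prompt_feedback="BLOCKED (stand-in value)")

print(extract_text(ok))
```

<p style = 'font-size:16px;font-family:Arial'>Calling <code>extract_text(blocked)</code> raises a <code>ValueError</code> that carries the prompt feedback, which makes a filtered prompt easier to spot than a silent empty string.</p>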

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial'>5. Function calls to get the data back in the expected structure</b>

<p style = 'font-size:16px;font-family:Arial'>Using the Gemini API function calling feature, you can provide custom function definitions to the model. The model doesn't directly invoke these functions, but instead generates structured output that specifies a function name and suggested arguments. You can then use the function name and arguments to call an external API, and you can incorporate the resulting API output into a further query to the model, enabling the model to provide a more comprehensive response and take additional actions.</p>

<p style = 'font-size:16px;font-family:Arial'>Function calling empowers users to interact with real-time information and services like databases, customer relationship management systems, and document repositories. The feature also enhances the model's ability to provide relevant and contextual answers. Function calling is best for interacting with external systems. If your use case requires the model to perform computation but doesn't involve external systems or APIs, you should consider using code execution instead.</p>

In [None]:
set_timecodes = types.FunctionDeclaration(
    name="set_timecodes",
    description="Set the timecodes for the video with associated text",
    parameters={
        "type": "OBJECT",
        "properties": {
            "timecodes": {
                "type": "ARRAY",
                "items": {
                    "type": "OBJECT",
                    "properties": {
                        "time": {"type": "STRING"},
                        "text": {"type": "STRING"},
                    },
                    "required": ["time", "text"],
                }
            }
        },
        "required": ["timecodes"]
    }
)

set_timecodes_with_objects = types.FunctionDeclaration(
    name="set_timecodes_with_objects",
    description="Set the timecodes for the video with associated text and object list",
    parameters={
        "type": "OBJECT",
        "properties": {
            "timecodes": {
                "type": "ARRAY",
                "items": {
                    "type": "OBJECT",
                    "properties": {
                        "time": {"type": "STRING"},
                        "text": {"type": "STRING"},
                        "objects": {
                            "type": "ARRAY",
                            "items": {"type": "STRING"},
                        },
                    },
                    "required": ["time", "text", "objects"],
                }
            }
        },
        "required": ["timecodes"],
    }
)

set_timecodes_with_numeric_values = types.FunctionDeclaration(
    name="set_timecodes_with_numeric_values",
    description="Set the timecodes for the video with associated numeric values",
    parameters={
        "type": "OBJECT",
        "properties": {
            "timecodes": {
                "type": "ARRAY",
                "items": {
                    "type": "OBJECT",
                    "properties": {
                        "time": {"type": "STRING"},
                        "value": {"type": "NUMBER"},
                    },
                    "required": ["time", "value"],
                }
            }
        },
        "required": ["timecodes"],
    }
)

set_timecodes_with_descriptions = types.FunctionDeclaration(
    name="set_timecodes_with_descriptions",
    description="Set the timecodes for the video with associated spoken text and visual descriptions",
    parameters={
        "type": "OBJECT",
        "properties": {
            "timecodes": {
                "type": "ARRAY",
                "items": {
                    "type": "OBJECT",
                    "properties": {
                        "time": {"type": "STRING"},
                        "spoken_text": {"type": "STRING"},
                        "visual_description": {"type": "STRING"},
                    },
                    "required": ["time", "spoken_text", "visual_description"],
                }
            }
        },
        "required": ["timecodes"]
    }
)
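
<p style = 'font-size:16px;font-family:Arial'>The four declarations above share the same outer shape: an object with a required <code>timecodes</code> array whose items always contain <code>time</code> plus a few extra fields, all of them required. As a possible simplification, the shared part could be produced by one function. The name <code>make_timecode_schema</code> is hypothetical, and the sketch only builds plain dictionaries.</p>

```python
def make_timecode_schema(extra_properties: dict) -> dict:
    """Build a parameters schema: an array of objects with "time" plus extras.

    Every property is marked as required, matching the declarations above.
    """
    properties = {"time": {"type": "STRING"}, **extra_properties}
    return {
        "type": "OBJECT",
        "properties": {
            "timecodes": {
                "type": "ARRAY",
                "items": {
                    "type": "OBJECT",
                    "properties": properties,
                    "required": list(properties),
                },
            }
        },
        "required": ["timecodes"],
    }


# Schemas equivalent to the first two declarations above.
text_schema = make_timecode_schema({"text": {"type": "STRING"}})
objects_schema = make_timecode_schema({
    "text": {"type": "STRING"},
    "objects": {"type": "ARRAY", "items": {"type": "STRING"}},
})
```

<p style = 'font-size:16px;font-family:Arial'>Each resulting dictionary could then be passed as the <code>parameters</code> argument of <code>types.FunctionDeclaration</code>, in the same way as the literal dictionaries above.</p>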

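<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Using the function declarations above, we create a tool for the video analysis. Only three of the four declarations are registered here; <code>set_timecodes_with_descriptions</code> is declared but not included in the tool.</p>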

In [None]:
video_tools = types.Tool(
    function_declarations=[set_timecodes, set_timecodes_with_objects, set_timecodes_with_numeric_values],
)

In [None]:

def set_timecodes_func(timecodes):
    return [{**t, "text": t["text"].replace("\\'", "'")} for t in timecodes]

def set_timecodes_with_objects_func(timecodes):
    return [{**t, "text": t["text"].replace("\\'", "'")} for t in timecodes]

def set_timecodes_with_descriptions_func(timecodes):
    return [{**t, "text": t["spoken_text"].replace("\\'", "'")} for t in timecodes]
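
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As described earlier, the model only returns a function name and suggested arguments; calling the function is left to our code. The following is a minimal sketch of that dispatch step. The <code>HANDLERS</code> registry and <code>dispatch_function_call</code> are hypothetical names, and a single handler is repeated here so that the example is self-contained.</p>

```python
def set_timecodes_func(timecodes):
    # Same clean-up as above: turn an escaped apostrophe (\') into a plain one.
    return [{**t, "text": t["text"].replace("\\'", "'")} for t in timecodes]


# Hypothetical registry mapping declared function names to local handlers.
HANDLERS = {
    "set_timecodes": set_timecodes_func,
}


def dispatch_function_call(name: str, args: dict):
    """Look up the handler for a returned function name and call it with its args."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"No local handler registered for function '{name}'")
    return handler(**args)


# Example with hand-written arguments standing in for a model response.
example_args = {"timecodes": [{"time": "00:05", "text": "It\\'s raining"}]}
print(dispatch_function_call("set_timecodes", example_args))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the notebook, <code>name</code> and <code>args</code> would come from the <code>function_call</code> part of a response.</p>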

In [None]:
USER_PROMPT

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will generate a response for each of the 3 uploaded video files and build a dataframe from the function-call arguments that are returned.</p>

In [None]:
import pandas as pd
from google.genai import types

frames = []  # one DataFrame per video
for i, file_upload in enumerate(uploaded_files):
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[
            types.Content(
                role="user",
                parts=[
                    types.Part.from_uri(
                        file_uri=file_upload.uri,
                        mime_type=file_upload.mime_type),
                    ]),
            USER_PROMPT,
        ],
        config=types.GenerateContentConfig(
            system_instruction=SYSTEM_PROMPT,
            tools=[video_tools],
            temperature=0,
        )
    )

    # The first part is expected to carry the function call chosen by the model
    function_call = response.candidates[0].content.parts[0].function_call
    if function_call is None:
        print(f'No function call returned for {file_upload.name}; skipping.')
        continue

    # Copy the arguments and tag them with a per-video label (video_0, video_1, ...)
    record = dict(function_call.args)
    record['video_no'] = 'video_' + str(i)

    # One row per timecode entry, with video_no repeated on every row
    frames.append(pd.json_normalize(record, "timecodes", "video_no"))

df1 = pd.concat(frames)
df1

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The data in the dataframe above can be stored in a Vantage table using the <code>copy_to_sql</code> function. It can then be used for further analysis with the in-database ClearScape Analytics functions available in Vantage.</p>
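
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <code>time</code> values are returned as strings, which are awkward to sort or aggregate once the data is stored. As a hedged sketch, they could be converted to seconds before storing. The helper name <code>timecode_to_seconds</code> is hypothetical, and the sketch assumes the model returns timecodes shaped like <code>SS</code>, <code>MM:SS</code>, or <code>HH:MM:SS</code>; the notebook does not guarantee that format.</p>

```python
def timecode_to_seconds(timecode: str) -> int:
    """Convert "SS", "MM:SS" or "HH:MM:SS" into a number of seconds.

    Assumption: the timecode contains only integer fields separated by colons.
    """
    parts = timecode.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unexpected timecode format: {timecode!r}")
    seconds = 0
    for part in parts:
        # Each field to the left is worth 60 times the field to its right.
        seconds = seconds * 60 + int(part)
    return seconds
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>With the dataframe above, a numeric column could be added with <code>df1["seconds"] = df1["time"].map(timecode_to_seconds)</code>.</p>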

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For reference, we display one of the video files here so that the text generated by the Gemini model can be checked against the video.</p>

In [None]:
from IPython.display import Video
Video("./videos/RoadAccidents004_x264.mp4")

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>6. Cleanup</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up the video files uploaded to Google Gemini.</p>

In [None]:
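for file_upload in uploaded_files:
    try:
        # Delete through the same client that uploaded the file
        client.files.delete(name=file_upload.name)
        print(f'Deleted {file_upload.name}.')
    except Exception as e:
        # Report the failure and continue with the remaining files
        print(f'Could not delete {file_upload.name}: {e}')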

In [None]:
remove_context()

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>