## Introduction to Video Analysis

This Python script uses Google Gemini, a multimodal large language model (LLM), to analyze a video, detect its scene changes (shot boundaries), and pick the ones best suited for ad insertion. The prompt asks Gemini to select the 10 most suitable scene transitions, distributed across the beginning, middle, and end of the video, so that the chosen ad slots interrupt viewers as little as possible.
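Gemini does not guarantee that the returned scene changes are actually spread across the video, so it can be worth checking the distribution after the fact. The sketch below is one way to do that, assuming the model returns timestamps as `SS`, `MM:SS`, or `HH:MM:SS` strings (the response schema only constrains them to be strings) and that you know the video's duration in seconds from another source. The helper names `timestamp_to_seconds` and `bucket_by_third` are hypothetical.

```python
def timestamp_to_seconds(ts: str) -> float:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' (fractional seconds allowed) to seconds.

    Assumption: Gemini returns colon-separated timestamps; anything else raises ValueError.
    """
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def bucket_by_third(timestamps, duration_s: float) -> dict:
    """Count how many timestamps fall in the beginning, middle, and end thirds of the video."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    counts = {"beginning": 0, "middle": 0, "end": 0}
    for ts in timestamps:
        t = timestamp_to_seconds(ts)
        if t < duration_s / 3:
            counts["beginning"] += 1
        elif t < 2 * duration_s / 3:
            counts["middle"] += 1
        else:
            counts["end"] += 1
    return counts


# Example: three scene changes in a hypothetical 10-minute (600 s) video.
print(bucket_by_third(["00:45", "04:10", "07:55"], duration_s=600))
```

A heavily lopsided count (for example, all 10 timestamps in the first third) is a signal to re-prompt or post-filter rather than trust the placement as-is.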

For each identified scene change, Gemini returns metadata: the timestamp, the reason it is a good ad slot, a summary of the preceding scene, the feeling the transition evokes, the transition type, the narrative role, the dialogue intensity, the character types, and the scene categories. Because the request uses Gemini's controlled generation (a JSON response MIME type plus a response schema), this metadata comes back in a consistent JSON structure that is ready for downstream processing without additional transformations. The script converts this JSON into a pandas DataFrame, which can then be stored in a database such as BigQuery for later analysis.
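Controlled generation should already enforce the schema, but a cheap defensive check before building the DataFrame makes incomplete records visible instead of silently producing NaN columns. This is a minimal sketch, assuming the response text is a JSON array of objects and that the required fields are the nine the prompt below asks for; `REQUIRED_FIELDS` and `parse_scene_changes` are hypothetical names, not part of the Vertex AI SDK.

```python
import json

# The nine fields requested in the prompt (assumption: kept in sync with response_schema by hand).
REQUIRED_FIELDS = [
    "timestamp", "reason", "summary", "transition_feeling", "transition_type",
    "narrative_type", "dialogue_intensity", "characters_type", "scene_categories",
]


def parse_scene_changes(response_text: str):
    """Parse Gemini's JSON array and split it into complete and incomplete records."""
    records = json.loads(response_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of scene changes")
    complete, incomplete = [], []
    for record in records:
        # A record is complete only if it is an object containing every required field.
        if isinstance(record, dict) and all(field in record for field in REQUIRED_FIELDS):
            complete.append(record)
        else:
            incomplete.append(record)
    return complete, incomplete
```

In the notebook this could be used as `complete, incomplete = parse_scene_changes(response.text)` followed by `df_response = pd.DataFrame(complete)`, logging or re-requesting anything left in `incomplete`.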

In [40]:
import json
import pandas as pd

import vertexai  # Import the Vertex AI library for initializing and using the Gemini model
from google.cloud import storage  # Google Cloud Storage client for handling GCS
from vertexai.generative_models import (
    GenerationConfig,        # Configuration settings for Gemini's response generation
    GenerativeModel,         # Class representing the generative model used for generating responses
    Part,                    # Used to define parts of multimodal content like videos
    Content,                 # Represents the content used in the generation request
    GenerationResponse,      # Structure to handle the response from Gemini
)

# Import the ImageGenerationModel from the Vertex AI preview vision models package.
# This model is used for generating images based on user prompts, leveraging GCP's Vertex AI services.
from vertexai.preview.vision_models import ImageGenerationModel

In [2]:
# Set the GCP project and location where Vertex AI is being used.
# Replace the project ID below with your own Google Cloud project ID.
project_id = "qwiklabs-gcp-02-32addd023448"
location = "us-central1"

# Initialize Vertex AI with the project ID and location to use the Gemini model
vertexai.init(project=project_id, location=location)



In [3]:
# Define the prompt that will be sent to Gemini.
# This prompt explains the task to analyze the video and identify the best scene changes for ad placement.
prompt = '''
    I have a video that I need you to analyze for ad placement by detecting scene changes,
    also known as shot boundaries. Identify the 10 best scene changes across the entire movie,
    that is, the best potential points for ad placement because they minimize interruptions
    for viewers. Select these scene changes from all parts of the movie: the beginning, the
    middle, and the very end. Distribute the selected scene changes evenly across the entire movie.
    For each of these scene changes, provide:

    timestamp: The exact timestamp where the scene change occurs. The timestamp must match the
    scene change's position in the original movie exactly.

    reason: Why this is a scene change and why it is a good location for ad placement. The reason
    should be very specific: summarize the story before and after the scene change and explain
    why the point between these two scenes is a good place for an ad.

    summary: A brief summary of the scene before the change.

    transition_feeling: The main feeling the transition evokes in viewers, such as excitement, peace, or fear.

    transition_type: The method used to switch from one scene to another, such as a cut, fade, or dissolve.

    narrative_type: The main role or significance of the scene in the storyline, such as pivotal, climactic, or conflict.

    dialogue_intensity: The amount and intensity of dialogue in the scene, such as monologue, dialogue, narration, or debate.

    characters_type: The type of the most important character involved in the scene transition, such as protagonist, antagonist, or supporting.

    scene_categories: The category of the scene before the change, such as action, drama, or comedy.
    '''

# Define the expected response schema so the output JSON is structured consistently.
# Every field requested in the prompt must appear here, or controlled output may drop it.
response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "timestamp": {"type": "string"},
            "reason": {"type": "string"},
            "summary": {"type": "string"},
            "transition_feeling": {"type": "string"},
            "transition_type": {"type": "string"},
            "narrative_type": {"type": "string"},
            "dialogue_intensity": {"type": "string"},
            "characters_type": {"type": "string"},
            "scene_categories": {"type": "string"},
        },
        # Ensure that all of these properties are always present in the output
        "required": ["timestamp", "reason", "summary", "transition_feeling", "transition_type",
                     "narrative_type", "dialogue_intensity", "characters_type", "scene_categories"],
    },
}
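The cell above lists each field twice, once in the prompt text and once in the schema, which makes it easy for the two to drift apart. One possible refactor, sketched here as an assumption rather than the notebook's approach, is to keep a single field-to-description mapping and derive both the schema and the per-field prompt instructions from it. `SCENE_FIELDS`, `build_response_schema`, and `build_field_instructions` are hypothetical names.

```python
# Single source of truth: field name -> instruction shown to the model (hypothetical).
SCENE_FIELDS = {
    "timestamp": "The exact timestamp where the scene change occurs.",
    "reason": "Why this is a scene change and a good location for ad placement.",
    "summary": "A brief summary of the scene before the change.",
    "transition_feeling": "The main feeling the transition evokes in viewers.",
    "transition_type": "The method used to switch scenes, such as a cut, fade, or dissolve.",
    "narrative_type": "The role of the scene in the storyline, such as pivotal or climactic.",
    "dialogue_intensity": "The amount and intensity of dialogue in the scene.",
    "characters_type": "The type of the most important character in the transition.",
    "scene_categories": "The category of the scene before the change, such as action or drama.",
}


def build_response_schema(fields: dict) -> dict:
    """Build an array-of-objects schema in which every field is a required string."""
    return {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in fields},
            "required": list(fields),
        },
    }


def build_field_instructions(fields: dict) -> str:
    """Render one 'name: description' line per field for inclusion in the prompt."""
    return "\n".join(f"{name}: {description}" for name, description in fields.items())
```

With this layout, adding a field to `SCENE_FIELDS` updates the prompt and the schema together, so a field like `summary` cannot be requested in one place and missing from the other.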


In [4]:
# Specify the version of the Gemini model to be used
model_id = "gemini-1.5-pro-001"  
model = GenerativeModel(model_id)

# Set up the generation configuration to control Gemini's response
generation_config = GenerationConfig(
    temperature=0,  # Temperature 0 makes the output as repeatable as possible across runs
    response_mime_type="application/json",  # Expect the response to be in JSON format
    response_schema=response_schema  # Use the predefined response schema for structured output
)

In [5]:
# Define the URL of the video file stored in Google Cloud Storage
video_file_url = 'gs://video_demo_test/wakeup_princess.mp4'

# Load the video file from Google Cloud Storage as an input to Gemini
video_file = Part.from_uri(video_file_url, mime_type="video/mp4")

# Combine the video file and the prompt into a single input to the Gemini model
contents = [video_file, prompt]

# Generate content from Gemini by passing the contents and the generation configuration
response = model.generate_content(contents, generation_config=generation_config)


In [6]:
# Parse the JSON response from Gemini into a Python list of dictionaries (one per scene change)
json_response = json.loads(response.text)

# Convert the JSON response into a Pandas DataFrame for easier analysis and storage
df_response = pd.DataFrame(json_response)
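To move these records into BigQuery, one common route is newline-delimited JSON (one object per line), which BigQuery can load directly. The sketch below produces that format with the standard library only; `to_ndjson` is a hypothetical helper, and the output file name is an assumption. With pandas already loaded, `df_response.to_json(path, orient="records", lines=True)` produces the same layout.

```python
import json


def to_ndjson(records) -> str:
    """Serialize a list of dicts as newline-delimited JSON (one object per line)."""
    # ensure_ascii=False keeps non-ASCII dialogue or character names readable in the file.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Hypothetical usage: write the scene changes to a local file for a BigQuery load job.
# with open("scene_changes.jsonl", "w", encoding="utf-8") as f:
#     f.write(to_ndjson(json_response))
```

The resulting file can then be loaded with, for example, `bq load --source_format=NEWLINE_DELIMITED_JSON` into a table whose columns match the response schema.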

In [38]:
# Preview the first few scene changes returned by Gemini
df_response.head()

## Introduction to Image Generation

This section uses Vertex AI's Imagen image generation models to create images from a user-defined prompt. Two primary inputs customize the output:
1. **Image Prompt**: A description of the desired scene or elements to be generated.
2. **Negative Prompt**: Specifies objects or elements to avoid in the generated images.

The script can generate up to 4 images per request, controlled by the `number_of_images` parameter. You can choose between several pre-trained models, such as:
- **imagen-3.0-generate-001**: The default model for image generation.
- **imagen-3.0-fast-generate-001**: A low-latency version for faster image generation.
- **imagegeneration@002**: An earlier version of the image generation model.

Once images are generated, they can be accessed, displayed, and saved to disk. The script also sets the **language** of the prompt and the **aspect ratio** of the generated images.
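Before calling the API, it can help to validate the request parameters in one place. The sketch below checks the model ID against the three models listed above, enforces the 1 to 4 image limit, and joins negative-prompt terms into the single comma-separated string that `generate_images` takes as `negative_prompt`. The helper `build_image_request` is hypothetical, and the set of accepted aspect ratios is an assumption based on the ratios Imagen commonly supports; check the current Vertex AI documentation for your model.

```python
# Models named in this notebook (treated as the allowed set; an assumption).
KNOWN_MODELS = {"imagen-3.0-generate-001", "imagen-3.0-fast-generate-001", "imagegeneration@002"}
# Assumption: aspect ratios commonly accepted by Imagen on Vertex AI.
ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}


def build_image_request(prompt, negative_terms=(), number_of_images=1, language="en",
                        aspect_ratio="1:1", model_id="imagen-3.0-generate-001"):
    """Validate parameters and return (model_id, kwargs) for generate_images()."""
    if model_id not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model_id}")
    if not 1 <= number_of_images <= 4:
        raise ValueError("number_of_images must be between 1 and 4")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if isinstance(negative_terms, str):
        # Treat a plain string as a single, already-formatted negative prompt.
        negative_terms = [negative_terms]
    return model_id, {
        "prompt": prompt.strip(),
        # An empty list becomes None so no negative prompt is sent.
        "negative_prompt": ", ".join(negative_terms) or None,
        "number_of_images": number_of_images,
        "language": language,
        "aspect_ratio": aspect_ratio,
    }
```

Hypothetical usage with the notebook's objects: `model_id, kwargs = build_image_request(image_prompt, ["tea pot", "banana"], number_of_images=4)` and then `images = ImageGenerationModel.from_pretrained(model_id).generate_images(**kwargs)`.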


In [None]:
# Initialize the image generation model from GCP's pre-trained model 'imagen-3.0-generate-001'.
image_model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")

# Other available image generation models:
# 1. imagen-3.0-fast-generate-001: Low-latency image generation model for faster results.
# 2. imagegeneration@002: An earlier Imagen model version, available for use if needed.

# Define the image prompt describing the desired scene.
image_prompt = '''
    Having banana cake with hot tea while watching the rain from a rustic window in Paris with a view of the Eiffel Tower
'''

# Define the negative prompt to exclude specific elements from the generated images.
# Here, we instruct the model to avoid 'tea pot' and 'banana' in the output images.
negative_prompt = 'tea pot, banana'

# Generate up to 4 images using the model, based on the provided image and negative prompts.
images = image_model.generate_images(
    prompt=image_prompt,              # The prompt describing the desired image.
    negative_prompt=negative_prompt,  # Elements to exclude from the generated images.
    number_of_images=4,               # How many images to generate (up to 4).
    language="en",                    # Language of the prompt (English in this case).
    aspect_ratio="1:1",               # Aspect ratio of the generated images (square format).
)

# 'images' is an iterable object containing the generated images.


In [39]:
# Show the first generated image.
images[0].show()

# Save the first generated image to disk; save() requires a target file path.
images[0].save(location="generated_image_0.png")
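To keep every generated image rather than only the first, you can loop over the response, which the notebook describes as iterable. This is a minimal sketch, assuming each item exposes a `save(location=...)` method as `GeneratedImage` does; `save_all` is a hypothetical helper, and the `.png` extension assumes Imagen's default PNG output.

```python
def save_all(images, prefix: str = "generated_image") -> list:
    """Save each image as <prefix>_<index>.png and return the file paths written."""
    paths = []
    for index, image in enumerate(images):
        path = f"{prefix}_{index}.png"
        image.save(location=path)  # Same call the notebook uses for images[0].
        paths.append(path)
    return paths


# Hypothetical usage: save_all(images, prefix="paris_cafe")
```

Passing a prefix that includes an existing directory (for example, `"outputs/paris_cafe"`) writes the files there.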
