# Gemini<>Skreens Multimodal Use Cases



## Overview and Goal

Skreens is a cloud-based live video editing service provider.

This notebook explores "the art of the possible": it showcases several use cases that enrich and advance live video streaming personalization by leveraging Gemini's multimodal capabilities.

### Vertex AI Gemini API

- **Gemini 1.5 Flash** (`gemini-1.5-flash`): Gemini 1.5 Flash was purpose-built as our fastest, most cost-efficient model yet for high volume tasks, at scale, to address developers’ feedback asking for lower latency and cost. 

Please note: This notebook is designed to help conceptualize and visualize use cases. Evaluation is done by humans, and while we strive for accuracy, there's no guarantee that all hallucinations (incorrect outputs) have been eliminated.

### Install Vertex AI SDK for Python and authenticate the user (Colab only)


In [None]:
from google.colab import auth as google_auth
google_auth.authenticate_user()

In [19]:
! pip3 install --upgrade --user google-cloud-aiplatform



### Restart runtime


In [20]:
import IPython

# Restart the kernel so the newly installed packages are picked up.
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

### Initialize the project and import the SDK


In [1]:
PROJECT_ID = "jz-amigo-1"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

In [2]:
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Image,
    Part,
)

## Use the Gemini 1.5 Flash model

In [3]:
multimodal_model = GenerativeModel("gemini-1.5-flash")

### Define helper functions


In [4]:
import http.client
import typing
import urllib.request

import IPython.display
from PIL import Image as PIL_Image
from PIL import ImageOps as PIL_ImageOps


def display_images(
    images: typing.Iterable[Image],
    max_width: int = 600,
    max_height: int = 350,
) -> None:
    for image in images:
        pil_image = typing.cast(PIL_Image.Image, image._pil_image)
        if pil_image.mode != "RGB":
            # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)
            pil_image = pil_image.convert("RGB")
        image_width, image_height = pil_image.size
        if max_width < image_width or max_height < image_height:
            # Resize to display a smaller notebook image
            pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))
        IPython.display.display(pil_image)


def get_image_bytes_from_url(image_url: str) -> bytes:
    with urllib.request.urlopen(image_url) as response:
        response = typing.cast(http.client.HTTPResponse, response)
        image_bytes = response.read()
    return image_bytes


def load_image_from_url(image_url: str) -> Image:
    image_bytes = get_image_bytes_from_url(image_url)
    return Image.from_bytes(image_bytes)


def display_content_as_image(content: str | Image | Part) -> bool:
    if not isinstance(content, Image):
        return False
    display_images([content])
    return True


def display_content_as_video(content: str | Image | Part) -> bool:
    if not isinstance(content, Part):
        return False
    part = typing.cast(Part, content)
    file_path = part.file_data.file_uri.removeprefix("gs://")
    video_url = f"https://storage.googleapis.com/{file_path}"
    print(video_url)
    IPython.display.display(IPython.display.Video(video_url, width=600))
    return True


def print_multimodal_prompt(contents: list[str | Image | Part]):
    """
    Given contents that would be sent to Gemini,
    output the full multimodal prompt for ease of readability.
    """
    for content in contents:
        if display_content_as_image(content):
            continue
        if display_content_as_video(content):
            continue
        print(content)

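The video file used in this notebook has spaces and parentheses in its object name, so the URL built in `display_content_as_video` contains unescaped characters. The following is a minimal sketch of a percent-encoding variant; the helper name `gcs_uri_to_https_url` and the example bucket are hypothetical, and the resulting URL only plays if the object is publicly readable.

```python
from urllib.parse import quote


def gcs_uri_to_https_url(gcs_uri: str) -> str:
    """Convert a gs:// URI into a percent-encoded storage.googleapis.com URL."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri!r}")
    object_path = gcs_uri.removeprefix("gs://")
    # Keep "/" as the bucket/object separator; encode spaces, parentheses, etc.
    return "https://storage.googleapis.com/" + quote(object_path, safe="/")


# Hypothetical bucket and object name, for illustration only.
print(gcs_uri_to_https_url("gs://my-bucket/Game Highlights (5_31_24).mp4"))
# → https://storage.googleapis.com/my-bucket/Game%20Highlights%20%285_31_24%29.mp4
```
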
## Use Case 1: Viewer Live Interaction


In [5]:
prompt = """
- What is shown in this video?
- Where is the location of this video?
- What is the score at the beginning of the video?
- What is the score at the end of the video?
- who's on deck?
- What was the last pitch?

"""
video = Part.from_uri(
    uri="gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

-------Prompt--------

- What is shown in this video?
- Where is the location of this video?
- What is the score at the beginning of the video?
- What is the score at the end of the video?
- who's on deck?
- What was the last pitch?


https://storage.googleapis.com/gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4



-------Response--------
- This video is a baseball game between the Boston Red Sox and the Detroit Tigers.
- The location of the video is Fenway Park in Boston.
- The score at the beginning of the video is 0-0.
- The score at the end of the video is 7-3 in favor of the Red Sox.
- The batter on deck is Connor Wong.
- The last pitch was a slider that was called a strike three. 

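For a live-interaction overlay, each viewer question needs to be matched with its answer. The sketch below pairs the bulleted questions in a prompt with the bulleted answers in a response by position. It assumes the model returns exactly one `- ` bullet per question, in order, which is not guaranteed; the function names are hypothetical.

```python
def bullet_lines(text: str) -> list[str]:
    """Return the content of each line that starts with '- ', without the marker."""
    return [
        line.strip()[2:].strip()
        for line in text.splitlines()
        if line.strip().startswith("- ")
    ]


def pair_questions_with_answers(prompt: str, response_text: str) -> list[tuple[str, str]]:
    """Pair bulleted questions with bulleted answers by position."""
    questions = bullet_lines(prompt)
    answers = bullet_lines(response_text)
    if len(questions) != len(answers):
        # The model may merge, skip, or reorder answers; fail loudly instead of mispairing.
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; cannot pair by position"
        )
    return list(zip(questions, answers))


sample_prompt = """
- What is shown in this video?
- Where is the location of this video?
"""
sample_response = """- This video is a baseball game between the Boston Red Sox and the Detroit Tigers.
- The location of the video is Fenway Park in Boston.
"""
for question, answer in pair_questions_with_answers(sample_prompt, sample_response):
    print(f"Q: {question}\nA: {answer}\n")
```

In the notebook, `response_text` could be built by joining `response.text` across the streamed chunks instead of printing each chunk as it arrives.
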

## Use Case 2: Entity Extraction for Integration
- Fantasy sports platforms: real-time statistics and player performance data to update fantasy leagues
- Sports betting platforms: game statistics that can inform betting odds in real time
- Social media platforms: highlights, key moments, and hashtags to generate buzz and engagement
- Sports news channels: game data for news broadcasts
- E-commerce integration: advertising

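Integrations like these need structured data rather than free text. The sketch below parses JSON from model output that may be wrapped in a Markdown code fence, and returns `None` when the text is not valid JSON (for example, when the response is cut off). The function name `extract_json` is hypothetical; the field names follow the sample response in this notebook, and the model is not guaranteed to use them.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker


def extract_json(response_text: str):
    """Parse JSON from model text, tolerating an optional Markdown code fence."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, response_text, re.DOTALL)
    payload = match.group(1) if match else response_text
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None  # e.g. the response was truncated mid-object


sample = (
    FENCE + "json\n"
    + '{"baseball_players": [{"name": "Rafael Devers", "number": 11}]}\n'
    + FENCE
)
data = extract_json(sample)
numbers_by_player = {p["name"]: p["number"] for p in data["baseball_players"]}
print(numbers_by_player)
# → {'Rafael Devers': 11}
```

Returning `None` instead of raising lets the caller decide whether to retry the request or skip the update.
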

In [6]:
prompt = """
Answer the following questions using the video only:
- What are the names and their numbers of baseball players appeared in this video?
- What are commercial logos appeared in this video?
Provide the answer JSON.
"""
video = Part.from_uri(
    uri="gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

-------Prompt--------

Answer the following questions using the video only:
- What are the names and their numbers of baseball players appeared in this video?
- What are commercial logos appeared in this video?
Provide the answer JSON.

https://storage.googleapis.com/gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4



-------Response--------
```json
{
 "baseball_players": [
  {
   "name": "Tanner Houck",
   "number": 89
  },
  {
   "name": "Wencesel Perez",
   "number": 5
  },
  {
   "name": "Vaughn Grissom",
   "number": 16
  },
  {
   "name": "Riley Greene",
   "number": 31
  },
  {
   "name": "Willier Abreu",
   "number": 30
  },
  {
   "name": "Rafael Devers",
   "number": 11
  },
  {
   "name": "Connor Wong",
   "number": 81
  },
  {
   "name": "Ceddanne Rafaela",
   "number": 43
  },
  {
   "name": "Colt Keith",
   "number": 33
  },
  {
   "name": "Justin Slate",
   "number": 57
  },
  {
   "name": "Kenta Maeda",
   "number": 18
  }
 ],
 "commercial_logos": [
  {
   "name": "MLB app",
   "count": 5
  },
  {
   "name": "BETMGM",
   "count": 8
  },
  {
   "name": "MassMutual",
   "count": 6
  },
  {
   "name": "PlayBall Org",
   "count": 4
  },
  {
   "name": "At Bat",
   "count": 4
  },
  {
   "name": "Chevrolet",
   "count": 6
  },
  {
   "name": "Build Submarines",
   "count": 3
  },
  {
```

## Use Case 3: Advanced Video Insights (One-Shot / Few-Shot Prompting)
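This use case asks the model for timestamps in `MM:SS` form, either single points (ad placements) or `MM:SS - MM:SS` ranges (highlight clips). An editing pipeline would need those as numeric offsets. The following is a minimal sketch of a parser, assuming each entry sits on its own line and may be wrapped in Markdown bold markers; the function names and dictionary keys are hypothetical.

```python
import re

# Matches "07:08: text", "00:13 - 00:22: text", and "**00:17 - 00:18:** text".
TIMESTAMP_LINE = re.compile(
    r"^\**(\d{1,2}):(\d{2})(?:\s*-\s*(\d{1,2}):(\d{2}))?:?\**\s*(.*)$"
)


def to_seconds(minutes: str, seconds: str) -> int:
    """Convert MM and SS strings into a total number of seconds."""
    return int(minutes) * 60 + int(seconds)


def parse_timestamped_lines(text: str) -> list[dict]:
    """Extract start/end offsets (in seconds) and descriptions from model output."""
    segments = []
    for line in text.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if not match:
            continue  # headings, blank lines, and other prose are skipped
        start = to_seconds(match.group(1), match.group(2))
        end = to_seconds(match.group(3), match.group(4)) if match.group(3) else None
        segments.append({"start_s": start, "end_s": end, "text": match.group(5).strip()})
    return segments


sample_output = """## Optimal Ad Placements:

**00:17 - 00:18:**  This is a good spot for ads.
07:08: it is a good spot for ads because it is right after exciting moment
"""
for segment in parse_timestamped_lines(sample_output):
    print(segment)
```

Timestamps returned by the model should still be checked against the actual video length before they are used to cut clips or schedule ads.
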


In [10]:
prompt = """

- Suggest optimal ad placements (MM:SS) in this video for maximum viewer attention, with explanations?

<EXAMPLE>
INPUT: gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4
OUTPUT: 

07:08: it is a good spot for ads because it is right after exciting moment, viewers often watch the replay
03:56: this is a good spot for ads, because it is during pitching changes, provides a brief pause for a seamless ads transition

</EXAMPLE>

-  Condense this video's most exciting and crucial moments into a highlight reel so TV channels can use during their evening news program for summary of the day's news? 


<EXAMPLE>

INPUT: gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4
OUTPUT:

00:13 - 00:22:  Gio Urshela of the Red Sox hit his second home run, including a sizzling line drive that went deep into the monster seats

01:18 - 01:25:  Despite Torkelson's struggles at the plate, Manuel Valdez delivered with two home runs for the Red Sox, showcasing his ability to perform under pressure.

02:19 - 02:28:  Riley Greene of the Tigers dominated the opening, hitting a home run, stealing a base


</EXAMPLE>
"""
video = Part.from_uri(
    uri="gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

-------Prompt--------


- Suggest optimal ad placements (MM:SS) in this video for maximum viewer attention, with explanations?

<EXAMPLE>
INPUT: gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4
OUTPUT: 

07:08: it is a good spot for ads because it is right after exciting moment, viewers often watch the replay
03:56: this is a good spot for ads, because it is during pitching changes, provides a brief pause for a seamless ads transition

</EXAMPLE>

-  Condense this video's most exciting and crucial moments into a highlight reel so TV channels can use during their evening news program for summary of the day's news? 


<EXAMPLE>

INPUT: gs://gemini-bucket-373/Tigers vs. Red Sox Game Highlights (5_31_24) _ MLB Highlights.mp4
OUTPUT:

00:13 - 00:22:  Gio Urshela of the Red Sox hit his second home run, including a sizzling line drive that went deep into the monster seats

01:18 - 01:25:  Despite Torkelson's struggles at the plate, Manuel Valdez deliver


-------Response--------
## Optimal Ad Placements:

**00:17 - 00:18:**  This is a good spot for ads because it is during a replay of a key moment (Von Grisson's catch). Viewers are likely to be engaged and willing to watch a short ad before moving on to the next play.

**01:04 - 01:05:** This is another opportunity during a replay of an exciting play (the steal by Riley Greene).  The brief pause before the replay is ideal for a seamless ad transition.

**01:24 - 01:25:**  This is a good spot for ads, as it occurs during a pitching change.  Viewers typically use this time to catch their breath and will likely be receptive to a brief ad break.

**01:49 - 01:50:** This is a good ad placement due to a replay review for a close play. Viewers are likely to be engrossed in the replay, making it an opportune moment for a short ad break.

**02:25 - 02:26:** This is an excellent ad opportunity during a pitching change that follows a noteworthy play (Rafael Devers' near-miss catch).

**02:55 - 02