<a href="https://colab.research.google.com/github/deep-diver/Vid2Persona/blob/ipynb%2Fvid2desc/notebooks/Vid2Desc_Gemini_1_0_Pro_Vision.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ask about a video clip with Gemini 1.0 Pro Vision on Vertex AI

In [None]:
!pip install --upgrade google-cloud-aiplatform

## Authenticate to Vertex AI with `gcloud`

In [None]:
!gcloud auth application-default login

# Alternatively, authenticate without the interactive prompt:
#
# export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service_account_key.json"
# gcloud auth application-default login --client-id-file=/path/to/your/service_account_key.json

## Set up GCP project and location

In [None]:
GCP_PROJECT_ID="gde-prj"
GCP_PROJECT_LOCATION="us-central1"
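
Hard-coded values are fine for a personal notebook. If the notebook is shared, the same two settings could instead be read from environment variables. This is a minimal sketch: `resolve_gcp_settings` and the variable names `GCP_PROJECT_ID` and `GCP_PROJECT_LOCATION` are assumptions made for illustration, not names the Vertex AI SDK reads on its own. The values above serve as fallbacks.

```python
import os
from typing import Tuple

def resolve_gcp_settings(
    default_project: str = "gde-prj", default_location: str = "us-central1"
) -> Tuple[str, str]:
    """Return (project_id, location), preferring environment variables when set."""
    # The environment variable names are hypothetical, chosen for this sketch
    project_id = os.environ.get("GCP_PROJECT_ID", default_project)
    location = os.environ.get("GCP_PROJECT_LOCATION", default_location)
    return project_id, location

GCP_PROJECT_ID, GCP_PROJECT_LOCATION = resolve_gcp_settings()
```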

## Call Gemini 1.0 Pro Vision

### Define general function

In [None]:
import base64
from typing import Iterable, Optional, Union

import vertexai
from vertexai.generative_models import GenerativeModel, Part, GenerationResponse

def initi_vertexai(project_id: str, location: str) -> None:
    # Point the Vertex AI SDK at the given project and region
    vertexai.init(project=project_id, location=location)

def ask_gemini(
    prompt: Optional[str]=None, gcs: Optional[str]=None, base64_encoded: Optional[bytes]=None,
    stream: bool=False, generation_config: Optional[dict]=None
) -> Union[GenerationResponse, Iterable[GenerationResponse]]:
    # Exactly one video source is required: a GCS URI or in-memory video data
    if gcs is None and base64_encoded is None:
        raise ValueError("Either a GCS URI or the base64-encoded bytes of the video must be provided")

    if gcs is not None and base64_encoded is not None:
        raise ValueError("Only one of gcs or base64_encoded may be provided")

    if gcs is not None:
        video = Part.from_uri(gcs, mime_type="video/mp4")
    else:
        video = Part.from_data(data=base64_encoded, mime_type="video/mp4")

    if prompt is None:
        prompt = "What is in the video?"

    if generation_config is None:
        # Default sampling settings
        generation_config = {
            "max_output_tokens": 2048,
            "temperature": 0.4,
            "top_p": 1,
            "top_k": 32
        }

    vision_model = GenerativeModel("gemini-1.0-pro-vision")
    # With stream=True the result is an iterable of partial responses
    return vision_model.generate_content(
        [video, prompt],
        generation_config=generation_config, stream=stream
    )
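
`ask_gemini` accepts an optional `generation_config` dictionary and is meant to fall back to the defaults listed above when none is given. To change one setting without retyping the rest, the defaults can be merged with overrides. This is a minimal sketch: `DEFAULT_GENERATION_CONFIG` and `build_generation_config` are hypothetical names that do not appear in the notebook.

```python
# Same default values as in ask_gemini
DEFAULT_GENERATION_CONFIG = {
    "max_output_tokens": 2048,
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
}

def build_generation_config(**overrides) -> dict:
    """Return a copy of the defaults with any keyword overrides applied."""
    return {**DEFAULT_GENERATION_CONFIG, **overrides}

# Example: lower the temperature while keeping the other defaults
config = build_generation_config(temperature=0.0)
# response = ask_gemini(gcs="gs://cloud-samples-data/video/animals.mp4", generation_config=config)
```

The merge returns a new dictionary, so the shared defaults are never mutated between calls.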

### Ask about a video on GCS in non-streaming mode

In [None]:
initi_vertexai(GCP_PROJECT_ID, GCP_PROJECT_LOCATION)
try:
    response = ask_gemini(gcs="gs://cloud-samples-data/video/animals.mp4")
except Exception as e:
    # Report the underlying error instead of hiding it
    print(f"something went wrong: {e}")

In [None]:
print(response.text)

 The video is an advertisement for the movie Zootopia. It features a sloth, a fox, and a rabbit taking selfies with a Google Pixel phone. The ad highlights the phone's camera quality and its ability to take great photos even in low-light conditions. The ad also features the tagline "See more at g.co/ZootopiaSelfies".
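
`ask_gemini` always passes `mime_type="video/mp4"` to `Part.from_uri`. For other containers, the MIME type could be derived from the file extension of the GCS URI first. This is a sketch under assumptions: `guess_video_mime_type` is a hypothetical helper, and the extension table is illustrative only, so check the Vertex AI documentation for the formats the model actually accepts.

```python
import posixpath

# Illustrative mapping only; not an authoritative list of supported formats
_VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".webm": "video/webm",
    ".mpeg": "video/mpeg",
}

def guess_video_mime_type(gcs_uri: str) -> str:
    """Map a gs:// URI to a video MIME type based on its file extension."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri!r}")
    extension = posixpath.splitext(gcs_uri)[1].lower()
    if extension not in _VIDEO_MIME_TYPES:
        raise ValueError(f"Unrecognized video extension: {extension!r}")
    return _VIDEO_MIME_TYPES[extension]

mime_type = guess_video_mime_type("gs://cloud-samples-data/video/animals.mp4")
```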


### Ask about a video on GCS in streaming mode

In [None]:
initi_vertexai(GCP_PROJECT_ID, GCP_PROJECT_LOCATION)
try:
    response = ask_gemini(gcs="gs://cloud-samples-data/video/animals.mp4", stream=True)
except Exception as e:
    # Report the underlying error instead of hiding it
    print(f"something went wrong: {e}")

In [None]:
for response_piece in response:
    print(response_piece.text)
    print()

 It is a commercial for the movie Zootopia. It shows a sloth, a fox, and a rabbit in a city. It also shows a tiger,

 an elephant, and a seal. The animals are taking pictures of each other. The commercial is funny because it shows the animals doing human things.
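
With `stream=True`, the answer arrives as a sequence of pieces, each exposing a `.text` attribute, as the loop above shows. To end up with a single string while still printing pieces as they arrive, the pieces can be collected. This is a minimal sketch: `collect_stream` is a hypothetical helper, and stand-in objects are used so the snippet runs without calling Vertex AI.

```python
from types import SimpleNamespace
from typing import Iterable

def collect_stream(pieces: Iterable, echo: bool = True) -> str:
    """Concatenate the .text of each streamed piece into one string."""
    texts = []
    for piece in pieces:
        if echo:
            # Print each piece as soon as it arrives, without extra newlines
            print(piece.text, end="", flush=True)
        texts.append(piece.text)
    return "".join(texts)

# Stand-in pieces that mimic the shape of streamed responses
fake_stream = [
    SimpleNamespace(text="It is a commercial "),
    SimpleNamespace(text="for a movie."),
]
full_text = collect_stream(fake_stream, echo=False)
```

With a real call, `collect_stream(ask_gemini(gcs=..., stream=True))` would take the place of the stand-in list.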



### Ask about a base64-encoded video in non-streaming mode

In [None]:
!gsutil cp gs://cloud-samples-data/video/animals.mp4 ./

Copying gs://cloud-samples-data/video/animals.mp4...
/ [1 files][ 16.1 MiB/ 16.1 MiB]                                                
Operation completed over 1 objects/16.1 MiB.                                     


In [None]:
import base64

with open("animals.mp4", "rb") as video_file:
    video_data = video_file.read()

encoded_string = base64.b64encode(video_data)
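
Base64 maps every 3 bytes of input to 4 ASCII characters, so the encoded payload is about a third larger than the file, and decoding recovers the original bytes exactly. This small sketch uses stand-in bytes instead of `animals.mp4`, so it runs without the download.

```python
import base64

# 768 bytes standing in for video data
sample_bytes = bytes(range(256)) * 3

encoded = base64.b64encode(sample_bytes)
decoded = base64.b64decode(encoded)

# 768 input bytes become 768 / 3 * 4 = 1024 encoded bytes
print(len(sample_bytes), len(encoded))
print(decoded == sample_bytes)
```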

In [None]:
initi_vertexai(GCP_PROJECT_ID, GCP_PROJECT_LOCATION)
try:
    response = ask_gemini(base64_encoded=encoded_string)
    print(response.text)
except Exception as e:
    # Report the underlying error instead of hiding it
    print(f"something went wrong: {e}")

 The video is an advertisement for the movie Zootopia. It features a sloth, a fox, and a rabbit taking selfies with a Google Pixel phone. The ad highlights the phone's camera quality and its ability to take great photos even in low-light conditions. The video ends with the tagline "See more at g.co/ZootopiaSelfies".


### Ask about a base64-encoded video in streaming mode

In [None]:
initi_vertexai(GCP_PROJECT_ID, GCP_PROJECT_LOCATION)
try:
    response = ask_gemini(base64_encoded=encoded_string, stream=True)

    # Iterate inside the try block so a failed call never reads a stale response
    for response_piece in response:
        print(response_piece.text)
        print()
except Exception as e:
    print(f"something went wrong: {e}")

 This is a commercial for the movie Zootopia. It features a sloth, a fox, and a rabbit taking selfies at the Los Angeles Zoo. The commercial

 was released in 2016.

