# REST API Video Samples

## Objective
Run question answering (Q&A) over video inputs with GPT-4V.

## Time

You should expect to spend 5-10 minutes running this sample.

## Before you begin

### Installation

In [None]:
%pip install -r ../requirements.txt

### Parameters
Set the following configuration values: `GPT-4V_DEPLOYMENT_NAME`, `OPENAI_API_BASE`, `OPENAI_API_VERSION`, and `VISION_API_ENDPOINT`. The cells below read them with `os.getenv` after calling `load_dotenv()`, so they can live in a `.env` file or in your environment.

Also set `OPENAI_API_KEY` and `VISION_API_KEY` as environment variables, using your Azure OpenAI key and your Vision key as their values.
      
      WINDOWS Users: 
         setx OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"
         setx VISION_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"

      MACOS/LINUX Users: 
         export OPENAI_API_KEY="REPLACE_WITH_YOUR_KEY_VALUE_HERE"
         export VISION_API_KEY="REPLACE_WITH_YOUR_KEY_VALUE_HERE"

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True
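Once `load_dotenv()` has run, it can help to confirm that every setting named in the Parameters section is actually present before going further. The following is a minimal sketch, not part of the original sample; the helper name `missing_settings` is hypothetical.

```python
import os

# Variable names taken from the Parameters section of this sample.
REQUIRED_VARS = [
    "GPT-4V_DEPLOYMENT_NAME",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "VISION_API_ENDPOINT",
    "OPENAI_API_KEY",
    "VISION_API_KEY",
]


def missing_settings(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All settings found.")
```

Checking up front tends to give a clearer message than a failed request later in the notebook.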

In [2]:
# Setting up the deployment name
deployment_name: str = os.getenv("GPT-4V_DEPLOYMENT_NAME")
# The base URL for your Azure OpenAI resource. e.g. "https://<your resource name>.openai.azure.com"
openai_api_base: str = os.getenv("OPENAI_API_BASE")
openai_api_key: str = os.getenv("OPENAI_API_KEY")
# The Azure OpenAI API version to call, read from the environment.
# Version strings follow the YYYY-MM-DD date structure; preview versions add a "-preview" suffix.
openai_api_version: str = os.getenv("OPENAI_API_VERSION")

In [3]:
# The base URL for your vision resource endpoint, e.g. "https://<your-resource-name>.cognitiveservices.azure.com"
vision_api_endpoint: str = os.getenv("VISION_API_ENDPOINT")

# Insert your video SAS URL, e.g. https://<your-storage-account-name>.blob.core.windows.net/<your-container-name>/<your-video-name>?<SAS-token>
video_SAS_url: str = "https://gpt4vsamples.blob.core.windows.net/videos/Microsoft%20Copilot%20Short.mp4"
# This index name must be unique
# It must start with alphanumeric, can contain hyphens but they must be followed by alphanumeric (no consecutive hyphens or trailing hyphen).
# It must be 24 characters or less
video_index_name: str = "copilot-video-demo-index"
# This video ID must be unique
video_id: str = "copilot-video-1"

should_cleanup: bool = False
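The index naming rules in the comments above (starts with an alphanumeric character, hyphens only between alphanumeric runs, at most 24 characters) can be checked locally before calling the service. This is a sketch under those stated rules only; the helper `is_valid_index_name` is hypothetical, and the service may enforce further constraints (for example on letter case) that are not listed here.

```python
import re

MAX_INDEX_NAME_LENGTH = 24

# One or more alphanumeric runs joined by single hyphens. This rules out a
# leading hyphen, a trailing hyphen, and consecutive hyphens.
_INDEX_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_index_name(name: str) -> bool:
    """Check a candidate index name against the rules stated in this sample."""
    if len(name) > MAX_INDEX_NAME_LENGTH:
        return False
    return _INDEX_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_index_name("copilot-video-demo-index"))  # True (exactly 24 characters)
print(is_valid_index_name("copilot--video"))            # False (consecutive hyphens)
```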

## Connect to your project
To start, create a config file with your project details. This file can be used in this sample or in other samples to connect to your workspace.

In [4]:
import json
from pathlib import Path

config = {
    "GPT-4V_DEPLOYMENT_NAME": deployment_name,
    "OPENAI_API_BASE": openai_api_base,
    "OPENAI_API_VERSION": openai_api_version,
    "VISION_API_ENDPOINT": vision_api_endpoint,
}

p = Path("../config.json")

with p.open(mode="w") as file:
    file.write(json.dumps(config))
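Another sample could read the file back along these lines. This is a minimal sketch, assuming the file keeps the four keys written above; `load_config` is a hypothetical helper and is not part of `shared_functions`.

```python
import json
from pathlib import Path

# The keys written by the cell above.
REQUIRED_KEYS = (
    "GPT-4V_DEPLOYMENT_NAME",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "VISION_API_ENDPOINT",
)


def load_config(path: Path) -> dict:
    """Load the JSON config file and fail early if a key is absent."""
    with path.open(mode="r") as file:
        config = json.load(file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Config file is missing keys: {missing}")
    return config


# Example usage (uncomment once ../config.json exists):
# config = load_config(Path("../config.json"))
# deployment_name = config["GPT-4V_DEPLOYMENT_NAME"]
```

Note that the API keys are deliberately left out of the file, so they still need to come from the environment.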

## Run this Example

In [5]:
import os
import re
import sys
from pathlib import Path

# Make the parent directory importable so that shared_functions can be found
parent_dir = Path.cwd().parent
sys.path.append(str(parent_dir))
from shared_functions import call_GPT4V_video, process_video_indexing

# Read the vision resource key from the environment; never hardcode or print it
vision_api_key: str = os.getenv("VISION_API_KEY")

### Create Video Index


In [6]:
# You only need to run this cell once to create the index
process_video_indexing(vision_api_endpoint, vision_api_key, video_index_name, video_SAS_url, video_id)

201 {"name":"copilot-video-demo-index","userData":{},"features":[{"name":"vision","modelVersion":"2023-05-31","domain":"surveillance"},{"name":"speech","modelVersion":"2023-06-30","domain":"generic"}],"eTag":"\"ab198b40b10042ce8f990af3fb411e52\"","createdDateTime":"2024-01-08T12:42:19.9841630Z","lastModifiedDateTime":"2024-01-08T12:42:19.9841630Z"}
202 {"name":"my-ingestion","state":"Running","batchName":"806945f0-9154-4ba3-b39f-022b693b4e0f","createdDateTime":"2024-01-08T12:42:21.4841613Z","lastModifiedDateTime":"2024-01-08T12:42:21.7341652Z"}
{'value': [{'name': 'my-ingestion', 'state': 'Completed', 'batchName': '806945f0-9154-4ba3-b39f-022b693b4e0f', 'createdDateTime': '2024-01-08T12:42:21.4841613Z', 'lastModifiedDateTime': '2024-01-08T12:42:47.8591808Z'}]}
Ingestion completed.
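The output above shows the ingestion moving from `Running` to `Completed`, which suggests a polling loop. The internals of `process_video_indexing` are not shown in this sample, so the following is only a sketch of that pattern: `wait_for_ingestion` and `fetch_status` are hypothetical names, the response shape is copied from the output above, and the `Failed` state is an assumption.

```python
import time
from typing import Callable


def wait_for_ingestion(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until the first ingestion reaches a terminal state.

    `fetch_status` is any callable returning a dict shaped like the output
    above: {"value": [{"name": ..., "state": ...}]}.
    """
    for _ in range(max_polls):
        ingestions = fetch_status().get("value", [])
        state = ingestions[0]["state"] if ingestions else "Unknown"
        # "Completed" appears in the sample output; "Failed" is an assumption.
        if state in ("Completed", "Failed"):
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Ingestion did not finish after {max_polls} polls")


# Simulated responses stand in for real HTTP calls in this sketch.
fake_responses = iter(
    [
        {"value": [{"name": "my-ingestion", "state": "Running"}]},
        {"value": [{"name": "my-ingestion", "state": "Completed"}]},
    ]
)
print(wait_for_ingestion(lambda: next(fake_responses), poll_seconds=0))  # Completed
```

Passing the fetch step in as a callable keeps the loop independent of any particular HTTP client.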


### Call GPT-4V API with Video Index

In [7]:
# System messages and user prompt
sys_message = """
Your task is to assist in analyzing and optimizing creative assets. 
You will be presented with advertisement videos for products. 
First describe the video in detail paying close attention to Product characteristics highlighted, 
Background images, Lighting, Color Palette and Human characteristics for persons in the video. 
Finally provide a summary of the video and talk about the main message the advertisement video tries to convey to the viewer. 
"""
user_prompt = "Summarize the ad video"

messages = [
    {"role": "system", "content": [{"type": "text", "text": sys_message}]},
    {"role": "user", "content": [{"type": "acv_document_id", "acv_document_id": video_id}]},
    {"role": "user", "content": [{"type": "text", "text": user_prompt}]},  # Prompt for the user
]

vision_api_config = {"endpoint": vision_api_endpoint, "key": vision_api_key}

video_config = {
    "video_SAS_url": video_SAS_url,
    "video_index_name": video_index_name,
}

# Call GPT-4V API and print the response
try:
    response = call_GPT4V_video(messages, vision_api=vision_api_config, video_index=video_config)
    text = response["choices"][0]["message"]["content"]
    sentences = re.split(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s", text)
    for sentence in sentences:  # Print the content of the response
        print(sentence)
except Exception as e:
    print(f"Failed to call GPT-4V API. Error: {e}")

The advertisement video is a visual showcase of a digital product, specifically an AI companion app named Copilot, presented by Microsoft.
The video uses vibrant and surreal background imagery, such as whimsical landscapes and fantastical flora, to depict various use cases of the app.
The color palette is soft and pastel, with predominant shades of blue, pink, and purple, creating a calming and inviting atmosphere.

Each frame highlights different features of the app with text prompts such as "Inspire new ideas," "Research a topic," and "Organize my plans," suggesting the app's versatility in assisting with creative, informational, and organizational tasks.
The app interface is intermittently displayed, featuring a clean and user-friendly design, with simulated screens hinting at a customizable and interactive experience.
Additionally, the video illustrates the dark mode feature, emphasizing comfort and accessibility.

There are no human characteristics displayed, as the focus remains 
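The regular expression used in the cell above to break the response into sentences is fairly dense, so a small standalone example may help. It splits on whitespace that follows a `.` or `?`, except after abbreviations such as `e.g.` or titles such as `Dr.`. The sample text and the name `split_sentences` are illustrative only.

```python
import re

# Same pattern as in the call above:
#   (?<!\w\.\w.)       do not split after abbreviations like "e.g."
#   (?<![A-Z][a-z]\.)  do not split after titles like "Dr."
#   (?<=\.|\?)         only split after a period or question mark
#   \s                 the whitespace character that is consumed by the split
SENTENCE_BOUNDARY = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s")


def split_sentences(text: str) -> list:
    """Split text into sentences using the sample's boundary pattern."""
    return SENTENCE_BOUNDARY.split(text)


sample = "Use tools, e.g. Copilot. Dr. Smith agrees? Yes."
for sentence in split_sentences(sample):
    print(sentence)
# Use tools, e.g. Copilot.
# Dr. Smith agrees?
# Yes.
```

One limitation worth knowing: the pattern does not treat `!` as a sentence boundary, so exclamations stay attached to the sentence that follows.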

## Cleaning up

To clean up the Azure resources used in this example, delete the individual resources you created for it.

If you made a resource group specifically to run this example, you could instead [delete the resource group](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/delete-resource-group).

In [None]:
if should_cleanup:
    # {{TODO: Add resource cleanup}}
    pass