# Video Captioning
This notebook shows how to use VideoCaptioningChain, which is implemented using LangChain's ImageCaptionLoader and AssemblyAI to produce .srt files.

This system autogenerates both subtitles and closed captions from a video URL.
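An .srt file is a sequence of numbered cues. Each cue has an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line and the caption text, followed by a blank line. The chain logs its timings in milliseconds (see the example run below). The sketch below shows how such millisecond values map onto SRT cues. It is not the chain's own formatting code, and the helpers `ms_to_srt_timestamp` and `format_srt_cue` are hypothetical.

```python
def ms_to_srt_timestamp(ms: float) -> str:
    """Convert a time in milliseconds to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(ms))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_srt_cue(index: int, start_ms: float, end_ms: float, text: str) -> str:
    """Render one numbered SRT cue, terminated by a blank line (hypothetical helper)."""
    return (
        f"{index}\n"
        f"{ms_to_srt_timestamp(start_ms)} --> {ms_to_srt_timestamp(end_ms)}\n"
        f"{text}\n\n"
    )


# Using the first subtitle logged in the example run:
print(format_srt_cue(1, 250, 4106, "Any chives? What were you eating?"))
# 1
# 00:00:00,250 --> 00:00:04,106
# Any chives? What were you eating?
```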

## Installing Dependencies

In [1]:
# !pip install ffmpeg-python

## Imports

In [2]:
from langchain.chains.video_captioning import VideoCaptioningChain
from langchain.chat_models.openai import ChatOpenAI

## Setting up API Keys

In [3]:
import getpass

# Prompt for the keys instead of hard-coding them in the notebook.
OPENAI_API_KEY = getpass.getpass("OpenAI API Key:")
ASSEMBLYAI_API_KEY = getpass.getpass("AssemblyAI API Key:")
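To avoid typing the keys on every run while still keeping them out of the notebook, you could read them from environment variables and prompt only when a variable is unset. This is a minimal sketch. The helper `get_api_key` is hypothetical, and the environment variable names you pass to it (for example `OPENAI_API_KEY` and `ASSEMBLYAI_API_KEY`) are assumptions about your setup.

```python
import getpass
import os


def get_api_key(env_var: str, prompt: str) -> str:
    """Return the value of env_var, prompting interactively only if it is unset or empty.

    Hypothetical helper; the environment variable names are up to you.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    return getpass.getpass(prompt)
```

Assigning `OPENAI_API_KEY = get_api_key("OPENAI_API_KEY", "OpenAI API Key:")` (and likewise for AssemblyAI) then prompts only when the variable is missing.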

**Required parameters:**

* llm: The language model this chain uses to get suggestions on how to refine the closed captions
* assemblyai_key: The API key for AssemblyAI, used to generate the subtitles

**Optional parameters:**

* verbose (Default: True): Sets verbose mode for downstream chain calls
* use_logging (Default: True): Logs the chain's progress through the run manager
* frame_skip (Default: None): Sets how many video frames to skip during processing. Higher values make execution faster but the results less accurate. If None, the frame skip is calculated automatically from the video's frame rate. Set this to 0 to sample every frame
* image_delta_threshold (Default: 3000000): Sets how sensitive the image processor is to a change of scene in the video, which is used to delimit closed captions. Higher values make it less sensitive
* closed_caption_char_limit (Default: 20): Sets the character limit on closed captions
* closed_caption_similarity_threshold (Default: 80): Sets how similar, as a percentage, two closed caption models must be in order to be clustered into one longer closed caption
* use_unclustered_video_models (Default: False): If True, closed captions that could not be clustered are also included. This may cause erratic closed-caption behaviour, such as very short-lived or rapidly changing captions. Enabling this is experimental and not recommended
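The interplay between frame_skip and image_delta_threshold can be pictured as follows. Only some frames are examined, and a new closed-caption segment starts whenever a sampled frame differs from the previous sample by more than the threshold. This is a simplified sketch under stated assumptions and not the chain's implementation. Frames are modelled as flat lists of grayscale pixel values. The difference metric (sum of absolute pixel differences) is assumed. "Skip `frame_skip` frames between samples" is an assumed reading of the parameter. `frame_delta` and `find_scene_boundaries` are hypothetical names.

```python
from typing import List, Optional, Sequence


def frame_delta(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute per-pixel differences between two equally sized frames (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b))


def find_scene_boundaries(
    frames: Sequence[Sequence[int]],
    frame_skip: int,
    image_delta_threshold: int,
) -> List[int]:
    """Return the indices of sampled frames that start a new scene.

    Assumption: `frame_skip` frames are skipped between samples, so 0 samples every frame.
    """
    step = frame_skip + 1
    boundaries: List[int] = []
    previous: Optional[Sequence[int]] = None
    for index in range(0, len(frames), step):
        frame = frames[index]
        # The first sampled frame always opens a segment; later samples open one
        # only when they differ from the previous sample by more than the threshold.
        if previous is None or frame_delta(previous, frame) > image_delta_threshold:
            boundaries.append(index)
        previous = frame
    return boundaries


# Toy 4-pixel "video": frames 0-3 are dark, frames 4-7 are bright.
frames = [[10, 10, 10, 10]] * 4 + [[200, 200, 200, 200]] * 4
print(find_scene_boundaries(frames, frame_skip=1, image_delta_threshold=100))
# [0, 4]
```

With `image_delta_threshold=1000` the same call returns `[0]`, because the 760-unit jump no longer counts as a scene change. This matches the parameter description, where a higher threshold means less sensitive detection.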

## Example run

In [5]:
# Other sample videos you can try:
# https://ia804703.us.archive.org/27/items/uh-oh-here-we-go-again/Uh-Oh%2C%20Here%20we%20go%20again.mp4
# https://ia601200.us.archive.org/9/items/f58703d4-61e6-4f8f-8c08-b42c7e16f7cb/f58703d4-61e6-4f8f-8c08-b42c7e16f7cb.mp4

chain = VideoCaptioningChain(
    llm=ChatOpenAI(
        model="gpt-4",
        max_tokens=4000,
        openai_api_key=OPENAI_API_KEY,
    ),
    assemblyai_key=ASSEMBLYAI_API_KEY,
)

srt_content = chain.run(
    video_file_path="https://ia904700.us.archive.org/22/items/any-chibes/X2Download.com-FXX%20USA%20%C2%ABPromo%20Noon%20-%204A%20Every%20Day%EF%BF%BD%EF%BF%BD%C2%BB%20November%202021%EF%BF%BD%EF%BF%BD-%281080p60%29.mp4"
)

print(srt_content)



> Entering new VideoCaptioningChain chain...
Loading processors...
Finished loading processors.
Generating subtitles from audio...
Finished generating subtitles:
start_time: 250, end_time: 4106, subtitle_text: Any chives? What were you eating?
start_time: 4218, end_time: 8240, subtitle_text: Chives. We are not going off on some wild goose chase. Who's in?
start_time: 9010, end_time: 12078, subtitle_text: How easy? A little tighter. In at the couch. Yeah, it's a little tight in
start_time: 12084, end_time: 14620, subtitle_text: the couch. Did you eat?
Generating closed captions from video...
15.0
Finished generating closed captions:
start_time: 0, end_time: 1083.3333333333333, image_description: an image of a man with a mustache and mustache
start_time: 1083.3333333333333, end_time: 1333.3333333333333, image_description: an image of a man with a mustache and mustache
start_time: 1333.3333333333333, end_time: 1583.3333333333333, image_description: an image of a man with a mustache and mustache
start_time: 1583.3333333333333, end_time: 1833.3333333333333, image_description: an image of a man with a mustache and mustache
start_time: 1833.3333333333333, end_time: 2083.3333333333335, image_description: an image of a man with a mustache and a mustache
start_time: 2083.3333333333335, end_time: 2333.3333333333335, image_description: an image of a man with a mustache in an office
start_time: 2333.3333333333335, end_time: 2583.3333333333335, image_description: an image of two people standing in front of a house
start_time: 2583.3333333333335, end_time: 2833.3333333333335, image_descripti
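The consecutive, nearly identical image descriptions in the log above are the kind of redundancy that closed_caption_similarity_threshold is meant to collapse. One way to picture the clustering step is to merge adjacent captions whose text similarity, expressed as a percentage, meets the threshold. This sketch uses Python's `difflib.SequenceMatcher` ratio as the similarity measure. That is an assumption, and the chain may use a different metric. The `Caption` tuple, `similarity_percent` and `cluster_captions` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class Caption(NamedTuple):
    start_ms: float
    end_ms: float
    text: str


def similarity_percent(a: str, b: str) -> float:
    """Text similarity from 0 to 100 (difflib ratio scaled to a percentage; assumed metric)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def cluster_captions(captions: List[Caption], threshold: float = 80) -> List[Caption]:
    """Merge adjacent captions whose text is at least `threshold` percent similar.

    A merged caption keeps the first caption's text and spans from the first
    start time to the last end time.
    """
    clustered: List[Caption] = []
    for caption in captions:
        if clustered and similarity_percent(clustered[-1].text, caption.text) >= threshold:
            last = clustered[-1]
            clustered[-1] = Caption(last.start_ms, caption.end_ms, last.text)
        else:
            clustered.append(caption)
    return clustered


# The first five closed captions from the log above:
log = [
    Caption(0, 1083.3333333333333, "an image of a man with a mustache and mustache"),
    Caption(1083.3333333333333, 1333.3333333333333, "an image of a man with a mustache and mustache"),
    Caption(1333.3333333333333, 1583.3333333333333, "an image of a man with a mustache and mustache"),
    Caption(1583.3333333333333, 1833.3333333333333, "an image of a man with a mustache and mustache"),
    Caption(1833.3333333333333, 2083.3333333333335, "an image of a man with a mustache and a mustache"),
]
merged = cluster_captions(log, threshold=80)
print(len(merged), merged[0].start_ms, merged[0].end_ms)
# 1 0 2083.3333333333335
```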

## Writing output to .srt file

In [6]:
# Write with an explicit encoding so non-ASCII caption text is preserved.
with open("output.srt", "w", encoding="utf-8") as file:
    file.write(srt_content)
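To sanity-check the written file, you could read it back and split it into cues. This minimal parser assumes well-formed SRT: cues separated by blank lines, each with an index line, a timing line and one or more text lines. `parse_srt` and `SrtCue` are hypothetical helpers, not part of LangChain.

```python
import re
from typing import List, NamedTuple

# HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


class SrtCue(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp components to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> List[SrtCue]:
    """Split SRT content into cues; raise ValueError on a malformed timing line."""
    cues: List[SrtCue] = []
    # Normalise Windows line endings, then split on the blank lines between cues.
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        if not block.strip():
            continue
        lines = block.split("\n")
        match = TIMING.match(lines[1]) if len(lines) > 1 else None
        if match is None:
            raise ValueError(f"malformed cue: {block!r}")
        groups = match.groups()
        cues.append(
            SrtCue(
                index=int(lines[0]),
                start_ms=_to_ms(*groups[:4]),
                end_ms=_to_ms(*groups[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


# Usage, assuming output.srt was written by the cell above:
# with open("output.srt", encoding="utf-8") as f:
#     for cue in parse_srt(f.read()):
#         print(cue.index, cue.start_ms, cue.end_ms, cue.text)
```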