# Video Captioning
This notebook shows how to use VideoCaptioningChain, which combines LangChain's ImageCaptionLoader with AssemblyAI to produce .srt files.

Given a video URL, the chain automatically generates both subtitles (from the audio track) and closed captions (from the video frames).
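For readers unfamiliar with the .srt format: each cue is a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the cue text, followed by a blank line. The sketch below is not part of VideoCaptioningChain. The helper names `ms_to_srt_timestamp` and `to_srt` are hypothetical, and it assumes timestamps arrive in milliseconds, as they appear in the run output later in this notebook.

```python
def ms_to_srt_timestamp(ms: float) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(ms))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    """Render (start_ms, end_ms, text) tuples as SRT cue blocks."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{ms_to_srt_timestamp(start_ms)} --> {ms_to_srt_timestamp(end_ms)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Illustrative cues taken from the subtitle output shown in the example run.
cues = [
    (250, 4106, "Any chives? What were you eating?"),
    (4218, 7274, "Chives. We are not going off on some wild goose chase."),
]
print(to_srt(cues))
```

Rounding to whole milliseconds is a design choice here, since the closed-caption timestamps in the output are fractional.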

## Installing Dependencies

In [None]:
# !pip install ffmpeg-python

**Required parameters:**

* llm: The language model the chain uses to get suggestions on how to refine the closed captions
* assemblyai_key: The AssemblyAI API key, used to generate the subtitles

**Optional Parameters:**

* verbose (Default: True): Enables verbose mode for downstream chain calls
* use_logging (Default: True): Logs the chain's processes in the run manager
* frame_skip (Default: 3): Number of video frames to skip during processing. Higher values run faster but give less accurate results
* image_delta_threshold (Default: 3000000): Sensitivity for what the image processor considers a change of scene in the video, which is used to delimit closed captions. Higher = less sensitive
* closed_caption_char_limit (Default: 20): Character limit for closed captions
* closed_caption_similarity_threshold (Default: 90): How similar, as a percentage, two closed captions must be in order to be clustered into one longer closed caption
* use_unclustered_video_models (Default: False): If True, closed captions that could not be clustered are included. This may cause erratic behaviour, such as very short-lived or rapidly changing captions. This option is experimental and not recommended
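
As a rough illustration of what closed_caption_similarity_threshold controls, the sketch below merges consecutive captions whose text similarity is at or above a percentage threshold. It is not the chain's actual implementation: the helper names `similarity_percent` and `cluster_captions`, and the use of `difflib.SequenceMatcher` as the similarity metric, are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings (assumed metric)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def cluster_captions(captions, threshold=90.0):
    """Merge consecutive (start_ms, end_ms, text) captions with similar text."""
    clustered = []
    for start, end, text in captions:
        if clustered and similarity_percent(clustered[-1][2], text) >= threshold:
            # Similar enough: extend the previous caption instead of adding a new one.
            prev_start, _, prev_text = clustered[-1]
            clustered[-1] = (prev_start, end, prev_text)
        else:
            clustered.append((start, end, text))
    return clustered


# Illustrative captions based on the closed-caption output in the example run
# (timestamps rounded to whole milliseconds).
captions = [
    (0, 1067, "an image of a man with a mustache and mustache"),
    (1067, 1400, "an image of a man with a mustache and mustache"),
    (1400, 1733, "an image of a man with a mustache and mustache"),
    (2400, 2733, "an image of two people standing in front of a building"),
]
for caption in cluster_captions(captions):
    print(caption)
```

Under this sketch, a lower threshold merges more aggressively and yields fewer, longer captions, while a higher threshold keeps near-duplicates separate.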

## Example run

In [2]:
import os

from langchain.chains.video_captioning import VideoCaptioningChain
from langchain.chat_models.openai import ChatOpenAI

# Read API keys from environment variables instead of hardcoding them in the notebook.
# ASSEMBLYAI_API_KEY is an assumed variable name; use whatever name you export.
openai_api_key = os.environ["OPENAI_API_KEY"]
assemblyai_key = os.environ["ASSEMBLYAI_API_KEY"]

# This example currently takes about 3 minutes to run, but the runtime may vary with your computer's specs.
chain = VideoCaptioningChain(
    llm=ChatOpenAI(model="gpt-4", max_tokens=4000, openai_api_key=openai_api_key),
    assemblyai_key=assemblyai_key,
    frame_skip=20,
    image_delta_threshold=3000000,
    closed_caption_similarity_threshold=90,
)

# Other sample video URLs:
# https://ia804703.us.archive.org/27/items/uh-oh-here-we-go-again/Uh-Oh%2C%20Here%20we%20go%20again.mp4
# https://ia601200.us.archive.org/9/items/f58703d4-61e6-4f8f-8c08-b42c7e16f7cb/f58703d4-61e6-4f8f-8c08-b42c7e16f7cb.mp4
result = chain.run(video_file_path="https://ia904700.us.archive.org/22/items/any-chibes/X2Download.com-FXX%20USA%20%C2%ABPromo%20Noon%20-%204A%20Every%20Day%EF%BF%BD%EF%BF%BD%C2%BB%20November%202021%EF%BF%BD%EF%BF%BD-%281080p60%29.mp4")

print(result)

> Entering new VideoCaptioningChain chain...
Loading processors...
Finished loading processors.
Generating subtitles from audio...
Finished generating subtitles:
start_time: 250, end_time: 4106, subtitle_text: Any chives? What were you eating?
start_time: 4218, end_time: 7274, subtitle_text: Chives. We are not going off on some wild goose chase.
start_time: 7322, end_time: 10378, subtitle_text: Who's in? How easy? A little tighter.
start_time: 10394, end_time: 12830, subtitle_text: In at the couch. Yeah, it's a little tight in the couch.
start_time: 13970, end_time: 14620, subtitle_text: Did you eat?
Generating closed captions from video...
Finished generating closed captions:
start_time: 0, end_time: 1066.6666666666667, image_description: an image of a man with a mustache and mustache
start_time: 1066.6666666666667, end_time: 1400.0, image_description: an image of a man with a mustache and mustache
start_time: 1400.0, end_time: 1733.3333333333335, image_description: an image of a man with a mustache and mustache
start_time: 1733.3333333333335, end_time: 2066.6666666666665, image_description: an image of a man with a mustache and a face
start_time: 2066.6666666666665, end_time: 2400.0, image_description: an image of a man with a mustache in an office
start_time: 2400.0, end_time: 2733.3333333333335, image_description: an image of two people standing in front of a building
start_time: 2733.3333333333335, end_time: 4116.666666666666, image_description: an image of two people standing in front of a house
start_time: 4116.666666666666, end_time: 4450.0, image_description: an image of two people standing in front of a house
st