In [1]:
from IPython.display import Audio, Video

from src.utils import (
    download_youtube_video,
    process_video,
    generate_descriptions,
    generate_narration,
    generate_audio,
    overlay_audio_on_video,
    get_video_length,
    generate_subtitles,
    add_subtitles_to_video
)

from src.prompts import (
    BRAZILLIAN_NARRATIVE, 
    SNAPSHOT_ANALYSIS,
    JAPANESE_NARRATIVE
)

In [2]:
youtube_url = 'https://www.youtube.com/watch?v=nEE2ZWwmqBc&ab_channel=Theguywiththelockface'  # replace with your YouTube video URL
video_path = download_youtube_video(youtube_url, 'videos')
base64Frames = process_video(video_path)
video_length = get_video_length(video_path)

print(len(base64Frames), "frames read.")
print("Video length:", video_length, "seconds.")

550 frames read.
Video length: 18.85 seconds.


In [3]:
Video(video_path)

## Prompts

See `src/prompts.py` for more. 

In [4]:
print(SNAPSHOT_ANALYSIS)

Analyze this standalone image, detailing the football action it captures. Describe player movements, ball location, and immediate context, like a tackle, pass, or goal attempt, as if this is the only moment you're aware of. Be structured and concise, and avoid referencing past or future events. Max 50 words.


In [5]:
print(BRAZILLIAN_NARRATIVE)

These are frames of a video. Create a short voiceover script in the style of a super excited brazilian sports narrator who is narrating his favorite match. Your output must be in english. You must only output the narration since all the text output will be synthesized to audio. When the ball goes into the net, you must scream GOL either once or multiple times.


In [6]:
descriptions = generate_descriptions(
    base64Frames,
    frame_sampling_rate=30,
    prompt=SNAPSHOT_ANALYSIS,
)
print(descriptions)

Frame 0: The image shows a football match in progress with a player in blue in possession of the ball, potentially looking to make a forward pass or dribble. No immediate contact from opponents suggests open play, not a specific set piece like a tackle or goal attempt.
Frame 30: The image captures a match in progress with players in blue defending their half. A player in yellow appears in possession, possibly looking to initiate an attack. No immediate challenges are visible, suggesting a build-up play rather than a direct offensive or defensive action. The ball is central.
Frame 60: The image shows a football match with one team in yellow and another in blue. A player in yellow is in possession near the center circle, facing the blue team's half. No immediate challenge is visible; it may be a moment before a pass or move forward. Ball is at their feet.
Frame 90: A player in yellow is in possession, dribbling forward. Three opponents in blue are nearby, with one approaching to challeng
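
The frame labels above (0, 30, 60, 90) suggest that `frame_sampling_rate=30` keeps every 30th frame. The following is a minimal sketch of that sampling under that assumption; `sampled_frame_indices` is a hypothetical helper, not a function from `src/utils`, and the timestamps are simply the video length divided evenly over the frame count.

```python
def sampled_frame_indices(total_frames: int, frame_sampling_rate: int) -> list[int]:
    """Return the index of every `frame_sampling_rate`-th frame, starting at 0."""
    if frame_sampling_rate < 1:
        raise ValueError("frame_sampling_rate must be at least 1")
    return list(range(0, total_frames, frame_sampling_rate))


# Numbers taken from the run above: 550 frames, 18.85 seconds.
indices = sampled_frame_indices(total_frames=550, frame_sampling_rate=30)
seconds_per_frame = 18.85 / 550

print(len(indices), "frames sampled")
for i in indices[:4]:
    # Approximate position of each sampled frame in the video
    print(f"Frame {i}: ~{i * seconds_per_frame:.2f}s")
```

With these numbers, roughly one frame per second of video is sent for description, so lowering the sampling rate increases both detail and API cost.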

In [7]:
voiceover_narration = generate_narration(
    descriptions,
    prompt=BRAZILLIAN_NARRATIVE,
    max_tokens=100, # change this to control the length of the narration
)
print(voiceover_narration)

Ladies and gentlemen, hold onto your seats because we are witnessing an absolutely thrilling football match here! Our boys in blue have possession and look at that, what a smooth pass forward, oh the skill, the finesse! The tension is palpable as the yellow team picks up the pace. They're charging forward—what an attack build-up, my heart is racing!

We're back at the center circle, could this be the moment we've all been waiting for? And there he goes,
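
Since `max_tokens` controls the narration length, it can help to pick a value that matches the video length. This is a rough sketch only: `narration_budget` is a hypothetical helper, and both the speaking rate (150 words per minute) and the ratio of 1.3 tokens per English word are assumptions, not measured values. An excited sports narrator may well speak faster.

```python
import math


def narration_budget(
    video_seconds: float,
    words_per_minute: float = 150.0,  # assumed speaking rate
    tokens_per_word: float = 1.3,     # assumed average for English text
) -> tuple[int, int]:
    """Estimate how many words (and tokens) fit in a video of the given length."""
    max_words = math.floor(video_seconds * words_per_minute / 60)
    max_tokens = math.ceil(max_words * tokens_per_word)
    return max_words, max_tokens


words, tokens = narration_budget(18.85)
print(f"~{words} words, ~{tokens} tokens")
```

Under these assumptions, a budget of 100 tokens produces more speech than fits in an 18.85 second clip, which is consistent with the narration above being cut off mid-sentence.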


In [9]:
audio = generate_audio(voiceover_narration, model='tts-1-hd', voice='echo', output_path='outputs/audio/narration.mp3')
Audio(audio)

In [10]:
overlay_audio_on_video(video_path, audio, 'outputs/output.mp4')

Moviepy - Building video outputs/output.mp4.
MoviePy - Writing audio in outputTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video outputs/output.mp4

Moviepy - Done !
Moviepy - video ready outputs/output.mp4


'outputs/output.mp4'

In [11]:
Video('outputs/output.mp4')

## Other Languages

Let's try other languages for fun! Unfortunately, Japanese doesn't work as well: the narration gets cut off at the same token limit, so it needs a workaround.

In [12]:
print(JAPANESE_NARRATIVE)

These are frames of a video. 超ワクワクする日本のスポーツ実況アナウンサー風に、彼のお気に入りの試合を実況する短いボイスオーバー台本を作成してください。出力は日本語でなければなりません。テキスト出力はすべて音声合成されるため、実況のみを出力してください。ボールがネットに入ったときは、「ゴール」と一回または複数回叫ぶ必要があります。


In [13]:
japanese_voiceover_narration = generate_narration(
    descriptions,
    prompt=JAPANESE_NARRATIVE,
    max_tokens=100, # Japanese narration cuts off early at this limit, likely because Japanese text uses more tokens per character than English
)
print(japanese_voiceover_narration)

実況開始！青ユニフォームの選手がボールを持って前を見据えています。いい動きかもしれませんね！ここから攻撃が始まるか――そして黄色の選手、ボールを持ってゆっくりと前進しています。どうやらチャンスメイ
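
Both the English and the Japanese narration above end mid-sentence. A simple check can flag this before the text is sent to speech synthesis. The sketch below is a heuristic, not part of `src/utils`: `looks_truncated` is a hypothetical helper, and it only checks whether the text ends with sentence-ending punctuation (Latin or Japanese).

```python
# Characters that plausibly end a complete sentence, including closing quotes
SENTENCE_ENDINGS = ".!?。！？…\"'」』"


def looks_truncated(text: str) -> bool:
    """Return True if the text is empty or does not end with sentence-ending punctuation."""
    stripped = text.rstrip()
    return not stripped or stripped[-1] not in SENTENCE_ENDINGS


print(looks_truncated("And there he goes,"))
print(looks_truncated("どうやらチャンスメイ"))
print(looks_truncated("これぞスポーツの醍醐味！"))
```

When the check returns `True`, one option is to regenerate the narration with a larger `max_tokens`.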


In [31]:
# may require a few tries to get the narration/audio to generate correctly

JAPANESE_NARRATIVE_2 = JAPANESE_NARRATIVE + """
Output the full narration in 日本語 first, then add a ---, and after that add the same narration in English. Both narrations should be the same length.
Ignore frames that may not be relevant. Below is an example of what the output should look like:
[日本語の実況]
---
[English Narration]
EOF
"""

voiceover_narrations = generate_narration(
    descriptions,
    prompt=JAPANESE_NARRATIVE_2,
    max_tokens=300, 
)

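# Split once on the first "---": Japanese before it, English after it.
# If the model omits the separator, english_narration is an empty string.
japanese_narration, _, english_narration = voiceover_narrations.partition("---")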

print(japanese_narration)
print(english_narration)

この瞬間はまさにサッカーの魔法だ！青いユニフォームの選手がボールをコントロールし、相手の黄色いディフェンダーを華麗にかわしていく。そして、そこからのシュート！ゴール！ゴール！これは見事な得点ですね。チームは歓喜に包まれ、観客も大興奮です！これぞフットボール、これぞスポーツの醍醐味！


This moment is pure football magic! The player in blue controls the ball, beautifully bypassing the yellow defenders. And there, the shot! Goal! Goal! What a splendid score. The team is engulfed in joy, and the audience is thrilled! This is football, this is the thrill of sports!
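
Because the model does not always follow the requested layout, the split on `---` can fail or leave stray text such as the `EOF` marker from the example. The following is a minimal sketch of a stricter parser; `split_bilingual` is a hypothetical helper, and the separator and end marker are taken from the prompt above.

```python
def split_bilingual(
    text: str, separator: str = "---", end_marker: str = "EOF"
) -> tuple[str, str]:
    """Split model output into (first_language, second_language) narrations.

    Raises ValueError when the separator is missing or either half is empty,
    so the caller can retry the generation.
    """
    body = text.strip()
    # Drop a trailing end marker if the model copied it from the example
    if body.endswith(end_marker):
        body = body[: -len(end_marker)].rstrip()
    first, found, second = body.partition(separator)
    if not found or not first.strip() or not second.strip():
        raise ValueError("output does not contain two narrations separated by '---'")
    return first.strip(), second.strip()


sample = "ゴール！ゴール！\n---\nGoal! Goal!\nEOF"
japanese, english = split_bilingual(sample)
print(japanese)
print(english)
```

Raising an error instead of returning a partial result makes it easy to wrap the generation call in a retry loop.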


In [32]:
japanese_audio = generate_audio(japanese_narration, model='tts-1-hd', voice='echo', output_path='outputs/audio/japanese_narration.mp3')
Audio(japanese_audio)

In [33]:
overlay_audio_on_video(video_path, japanese_audio, 'outputs/japanese_output.mp4')

Moviepy - Building video outputs/japanese_output.mp4.
MoviePy - Writing audio in japanese_outputTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video outputs/japanese_output.mp4

Moviepy - Done !
Moviepy - video ready outputs/japanese_output.mp4

'outputs/japanese_output.mp4'

In [34]:
Video('outputs/japanese_output.mp4')

## Subtitles

In [35]:
srt_file = generate_subtitles(japanese_narration, video_length, output_path="outputs/subtitles.srt", language="ja-JP")
add_subtitles_to_video("outputs/japanese_output.mp4", srt_file, "outputs/japanese_subtitles.mp4")

ffmpeg version N-109758-gbdc76f467f Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/HEAD-bdc76f4_4 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzi
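
The implementation of `generate_subtitles` is not shown here. As an illustration of the SRT format it writes, the sketch below splits a narration into sentences and spreads them over the video length in proportion to their character count. `format_timestamp` and `build_srt` are hypothetical helpers, and the proportional timing is an assumption; the real function takes a `language` argument and may time subtitles differently.

```python
import re


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(narration: str, video_length: float) -> str:
    """Build SRT text with one entry per sentence, timed by sentence length."""
    # Split after sentence-ending punctuation (Latin or Japanese)
    pattern = r"(?<=[.!?。！？])(?![.!?。！？])\s*"
    sentences = [s.strip() for s in re.split(pattern, narration) if s.strip()]
    if not sentences:
        return ""
    total_chars = sum(len(s) for s in sentences)
    entries = []
    start = 0.0
    for number, sentence in enumerate(sentences, start=1):
        # Longer sentences stay on screen proportionally longer
        end = start + video_length * len(sentence) / total_chars
        entries.append(
            f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{sentence}\n"
        )
        start = end
    return "\n".join(entries)


print(build_srt("And there, the shot! Goal! Goal!", video_length=18.85))
```

Splitting on punctuation without requiring whitespace matters for Japanese, where sentences are not separated by spaces.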

In [39]:
Video('outputs/japanese_subtitles.mp4')

In [42]:
srt_file = generate_subtitles(voiceover_narration, video_length, output_path="outputs/eng_subtitles.srt", language="en-US")
add_subtitles_to_video("outputs/output.mp4", srt_file, "outputs/english_subtitles.mp4")

In [43]:
Video('outputs/english_subtitles.mp4')