UPDATE 2023: This project is completely outdated. Thanks to the progress in Deep Learning, we now have new powerful tools!

My new unpublished project works exclusively with YouTube URLs and produces a final video with a new English audio track. The process involves the following steps:

Downloading the video using yt_dlp.
Extracting the audio to WAV format using ffmpeg.
Utilizing WhisperX for text recognition.
Saving the recognized text as .SRT and in a special .TXT format for further processing.
Correcting recognition errors in the text using ChatGPT with a specific prompt.
Translating the text into English using ChatGPT with a dedicated prompt that provides the video's topic as context.
Generating WAV files for each subtitle block using TTS (TorToiSe).
Processing all these WAV files with WhisperX to compare the recognized text with the subtitle text.
Concatenating all WAV files into a FINAL.WAV file, arranging them based on target time labels on the timeline.
Rendering the FINAL.MP4 file using moviepy, combining the original video with the new audio track and background music.
Generating XX thumbnails to choose from, using random video frames and adding text using Pillow.
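A minimal sketch of the step that saves the recognized text as SRT subtitles. It assumes the WhisperX output has already been reduced to a list of segments, each with `start`/`end` times in seconds and a `text` field. The helper names `srt_timestamp` and `segments_to_srt` are hypothetical and do not come from the project.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from segments that have 'start', 'end' and 'text' keys (assumed layout)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT blocks are separated by a blank line.
    return "\n".join(blocks)
```

Each numbered block becomes one "subtitle block", which is also the unit the later TTS step works on.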

Therefore, the input for this project is a URL, and the output is the FINAL.MP4 file.
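The first two steps (download and audio extraction) could look roughly like this. It is a sketch rather than the project's code: it assumes yt_dlp's `YoutubeDL` Python API and an `ffmpeg` binary on `PATH`; the file names `video.mp4`/`audio.wav`, the 16 kHz mono output and the helper names are illustrative choices.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_wav_cmd(video: Path, wav: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output file without asking
        "-i", str(video),
        "-vn",                     # ignore the video stream
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # resample (16 kHz is an assumed choice)
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        str(wav),
    ]


def download_and_extract(url: str, workdir: Path) -> Path:
    """Download a YouTube video with yt_dlp, then extract its audio to WAV with ffmpeg."""
    import yt_dlp  # imported lazily so the command builder works without yt_dlp installed

    workdir.mkdir(parents=True, exist_ok=True)
    # "mp4" asks yt_dlp for the best single-file format with that extension.
    options = {"format": "mp4", "outtmpl": str(workdir / "video.%(ext)s")}
    with yt_dlp.YoutubeDL(options) as ydl:
        ydl.download([url])
    video = workdir / "video.mp4"
    wav = workdir / "audio.wav"
    subprocess.run(ffmpeg_extract_wav_cmd(video, wav), check=True)
    return wav
```

Keeping the ffmpeg command in its own function makes it easy to inspect or tweak without running a download.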

The text below is merely historical and can be disregarded.

Audio track generator from subtitles on YouTube

The main use case

You have a video with non-English audio, but you have English subtitles for it (or are going to prepare them). Now you are ready to generate a new English audio track for this video.

Set your parameters in 'Setting' (e.g., the video ID from the URL) and run this Jupyter notebook step by step, skipping the optional cells (e.g., DeepSpeech testing of the generated audio).

Download the captions for the video ID and save them permanently to a local pickle file. (To re-download, delete the pickle file and repeat this step.)
Concatenate all caption texts, then clean the result before passing it to the sentence tokenizer.
Split the text into sentences using NLTK (Natural Language Toolkit).
Synthesize a WAV file for each sentence with Mozilla TTS.
Compose new captions from these sentences. Arrange the start point of each audio segment by matching its text against the original subtitles. Visualize the subtitles before and after arrangement as a heat-map image: rows = minutes, columns = seconds, numbers in cells = audio segment numbers.
Concatenate all audio segments into final.wav.
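A minimal sketch of the grid behind the heat-map described above (rows = minutes, columns = seconds, cells = segment numbers). It assumes each segment is given as a `(start, end)` pair in seconds and is marked in every second it covers, with 0 meaning "no audio"; the function name `timeline_grid` is hypothetical.

```python
import math
from typing import Sequence


def timeline_grid(segments: Sequence[tuple[float, float]], minutes: int) -> list[list[int]]:
    """Return a minutes x 60 grid; each cell holds the 1-based number of the
    segment occupying that second, or 0 if the second is silent."""
    grid = [[0] * 60 for _ in range(minutes)]
    for number, (start, end) in enumerate(segments, start=1):
        # Mark every whole second the segment touches; a later segment
        # overwrites an earlier one where they overlap.
        for second in range(int(start), math.ceil(end)):
            minute, sec = divmod(second, 60)
            if minute < minutes:
                grid[minute][sec] = number
    return grid
```

Building the grid once from the original timings and once from the arranged timings gives the two images to compare; each can be drawn with, for example, `matplotlib.pyplot.imshow(grid)`.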

Example of subtitle arrangement (another video, where the final text was edited and simplified): heat-maps before and after arrangement.

If the corresponding text is not found in the original subtitles, the segment is moved right until free space is found. So consider whether you need to re-calibrate the timing in a video editor (especially if you made many manual changes to the text).
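The arrangement rule can be sketched as a single left-to-right pass. This is an illustration under assumptions, not the notebook's implementation: `targets[i]` is the start time of the matched text in the original subtitles (or `None` when no match was found), segments keep their order, and a segment is only ever pushed right, never left. The name `arrange` is hypothetical.

```python
from typing import Optional, Sequence


def arrange(durations: Sequence[float], targets: Sequence[Optional[float]]) -> list[float]:
    """Return a start time (in seconds) for each audio segment."""
    starts = []
    free_from = 0.0  # earliest time not yet occupied by a placed segment
    for duration, target in zip(durations, targets):
        # Unmatched segments go to the first free spot; matched ones keep
        # their target unless it would overlap the previous segment.
        start = free_from if target is None else max(target, free_from)
        starts.append(start)
        free_from = start + duration
    return starts
```

For example, `arrange([2.0, 3.0, 1.0], [0.0, 1.0, None])` yields `[0.0, 2.0, 5.0]`: the second segment is pushed from 1.0 s to 2.0 s, and the unmatched third one follows directly after it. Such shifts accumulate, which is why heavily edited text may need re-calibration.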

The English audio track may turn out longer than the original soundtrack (you can notice this if nothing changes after arrangement). You can ignore it and fix it in a video editor, or try a faster TTS model.
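A quick way to check for this is to compare the two durations with Python's standard `wave` module. The sketch assumes both tracks are available as WAV files; the helper names and file paths are hypothetical.

```python
import wave


def wav_duration(path: str) -> float:
    """Duration of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def overrun_seconds(final_wav: str, original_wav: str) -> float:
    """Seconds by which the generated track runs past the original (0.0 if it fits)."""
    return max(0.0, wav_duration(final_wav) - wav_duration(original_wav))
```

A non-zero result from `overrun_seconds("final.wav", "original.wav")` means some segments could not be placed within the original running time.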

KMiNT21/subtitles-to-audio-track
