Subtitles Extraction

Key-frame extraction is based on work by Amanpreet Walia.

This project extracts subtitles from a video. First, key frames are extracted from the video. Then the subtitle area of each key frame is cropped, and its text is recognized by OCR.
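The key-frame step can be pictured with a minimal NumPy sketch, assuming the frames are already decoded into equally sized arrays. It measures the mean absolute difference between consecutive frames, smooths that curve, and keeps its local maxima as the frames where new content (such as a new subtitle line) appears. The function names, the Hanning-weighted smoothing and the window size of 5 are assumptions, not the project's code. The peak test mirrors SciPy's signal.argrelextrema with np.greater, which the project lists as a dependency.

```python
import numpy as np

def frame_differences(frames):
    """Mean absolute pixel difference between each pair of consecutive frames."""
    # Convert to float so that uint8 subtraction cannot wrap around.
    stack = np.asarray(frames, dtype=np.float32)
    per_pixel = np.abs(np.diff(stack, axis=0))
    # Average over every axis except the frame axis.
    return per_pixel.mean(axis=tuple(range(1, stack.ndim)))

def smooth(values, window=5):
    """Hanning-weighted moving average (the window size is an assumption)."""
    kernel = np.hanning(window)
    kernel /= kernel.sum()
    return np.convolve(values, kernel, mode="same")

def key_frame_indices(frames, window=5):
    """Indices of frames where the smoothed difference curve peaks."""
    diffs = smooth(frame_differences(frames), window)
    # Strict interior local maxima, like scipy.signal.argrelextrema(diffs, np.greater).
    inner = diffs[1:-1]
    peaks = np.nonzero((inner > diffs[:-2]) & (inner > diffs[2:]))[0] + 1
    # diffs[i] compares frame i with frame i + 1, so the new content is frame i + 1.
    return peaks + 1
```

For twelve flat frames whose brightness jumps at frames 4 and 8, key_frame_indices returns [4, 8].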

Getting Started

Install the following dependencies:

OpenCV-Python (basic video processing, e.g. reading the frame stream, cropping, frame differencing, and the GUI)
PyTesseract (only its image_to_string(img, lang) is used)
NumPy (smoothing filter)
SciPy (signal.argrelextrema)
StrsimPy (NormalizedLevenshtein string similarity)
Matplotlib (stem plot of frame differences)
ProgressBar
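Consecutive key frames often carry the same subtitle with small OCR differences, which is where a normalized Levenshtein similarity helps. Below is a minimal pure-Python sketch of that idea, written without StrsimPy so it runs on its own. It assumes the similarity is 1 minus the edit distance divided by the length of the longer string, which is my understanding of StrsimPy's NormalizedLevenshtein. The (start, end, text) tuples, the 0.7 threshold and the rule of keeping the first text of a merged run are hypothetical choices, not the project's.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]

def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def merge_similar(entries, threshold=0.7):
    """Merge consecutive (start, end, text) entries whose texts are similar."""
    merged = []
    for start, end, text in entries:
        if merged and normalized_similarity(merged[-1][2], text) >= threshold:
            # Hypothetical rule: extend the previous entry and keep its text.
            prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, prev_text)
        else:
            merged.append((start, end, text))
    return merged
```

Here "Hello world" followed by the OCR variant "Hello worid" (similarity 1 - 1/11) collapses into a single entry spanning both time ranges.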

Install any missing dependencies first with pip install -r requirements.txt.

Install Tesseract OCR

Download and install Tesseract, then try running it. Select the language packs you need during installation; tesseract --list-langs lists the languages that are installed.
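With Tesseract installed, recognizing one subtitle area amounts to cropping the frame and calling pytesseract.image_to_string(img, lang). The following is a minimal sketch, assuming frames are NumPy arrays (as OpenCV returns them) and the subtitle box is given as hypothetical (x, y, width, height) pixel values. It does not reproduce how the -crop option's syntax is parsed, and the function names are illustrative.

```python
import numpy as np

def crop_region(frame: np.ndarray, x: int, y: int, width: int, height: int) -> np.ndarray:
    """Return the sub-image whose top-left corner is (x, y)."""
    # NumPy images are indexed [row, column], i.e. [y, x].
    return frame[y:y + height, x:x + width]

def recognize_subtitle(frame: np.ndarray, box, lang: str = "eng") -> str:
    """OCR the subtitle box of one frame with Tesseract."""
    # Imported here so crop_region works even where Tesseract is not installed.
    import pytesseract
    x, y, width, height = box
    subtitle = crop_region(frame, x, y, width, height)
    return pytesseract.image_to_string(subtitle, lang=lang).strip()
```

The lang argument takes the same language codes that tesseract --list-langs prints.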

Run

λ python extract_subtitles.py <videopath>

./extract_subtitles.py -crop '0(907,940)[101,77]' -lang eng a.flv; ./timeline_ops.py merge frames/a.flv/timeline.txt 0 | ./timeline_ops.py to-lrc 1 srt > a.srt
# For SRT / Audacity label operations, refer to my GH:mkey/yy/atag2srt
# and GH:tv/arecog.sh, GH:MontagePy/_1c
# export OPENCV_OPENCL_DEVICE='Clover' # if VideoReader fails
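The last command in the pipeline above writes the merged timeline as SRT via timeline_ops.py to-lrc 1 srt. For reference, here is a minimal sketch of the SRT layout itself: a numbered block per cue, an HH:MM:SS,mmm --> HH:MM:SS,mmm time range, the text, and a blank line between cues. It assumes the timeline has already been reduced to (start_seconds, end_seconds, text) tuples. That tuple layout and the function names are hypothetical, not timeline_ops.py's internals.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(entries) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(entries, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, srt_timestamp(3661.5) gives "01:01:01,500".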

./extract_subtitles.py -crop '0(883,484)[134,120]' -lang index  --chunk-size 600 --crop-debug a.flv

./montage1_c.py -font /usr/share/fonts/wenquanyi/*/wqy-zenhei.ttc a.mp4 --subtitle b.srt -font-size 25 --mon-background '#6e6c39' --subtitle-placeholder 好 -spacing :5,3 -key-color '#000000' --key-thres 3

License

This project is licensed under the MIT License; see LICENSE for details.

Repository: duangsuse-valid-projects/extract-subtitles

Folders and files

Latest commit

History

Repository files navigation

Subtitles Extraction

Getting Started

Install following dependences

Install Tesseract OCR

Run

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages