
Extract Subtitles From Video: video subtitle extraction, key-frame detection by inter-frame differencing, and OCR text recognition

duangsuse-valid-projects/extract-subtitles

Subtitles Extraction

The key-frame extraction is adapted from Amanpreet Walia's work.

This project extracts subtitles from a video. First, key frames are extracted from the video; then the subtitle area of each key frame is cropped, and its text is recognized by OCR.
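The key-frame step can be illustrated with NumPy alone. This is a minimal sketch, assuming the frames are already decoded into equally sized arrays (for example read with OpenCV): it scores each frame by its mean absolute difference from the previous frame, smooths those scores with a Hanning window, and keeps the strict local maxima. That is the role the dependency list assigns to NumPy's smoothing filter and scipy.signal.argrelextrema; here a plain NumPy comparison stands in for argrelextrema. The function names and the window length are hypothetical, not taken from extract_subtitles.py.

```python
import numpy as np


def frame_differences(frames):
    """Mean absolute difference between each pair of consecutive frames.

    Element i compares frame i + 1 with frame i.
    """
    # Convert to float first so uint8 subtraction cannot wrap around.
    stack = np.asarray(frames, dtype=np.float32)
    return np.abs(np.diff(stack, axis=0)).mean(axis=tuple(range(1, stack.ndim)))


def smooth(signal, window=5):
    """Hanning-window moving average (window length is a hypothetical default)."""
    signal = np.asarray(signal, dtype=np.float64)
    if window < 3 or len(signal) < window:
        return signal  # too short to smooth meaningfully
    kernel = np.hanning(window)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")


def local_maxima(signal):
    """Indices strictly greater than both neighbours.

    Behaves like scipy.signal.argrelextrema(signal, np.greater)[0]:
    the first and last samples are never reported.
    """
    s = np.asarray(signal)
    return np.nonzero((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:]))[0] + 1


def key_frame_indices(frames, window=5):
    """Frames where the (smoothed) change from the previous frame peaks."""
    scores = smooth(frame_differences(frames), window)
    # A peak at scores[i] means frame i + 1 differs most from its predecessor.
    return (local_maxima(scores) + 1).tolist()
```

For ten synthetic 4×4 frames whose brightness jumps at frames 4 and 7, key_frame_indices returns [4, 7]. On real video the differences are noisy, which is why smoothing before peak picking matters.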

Getting Started

Install the following dependencies:

  • OpenCV-Python (basic video processing, e.g. reading the frame stream, cropping, frame differencing, and the processing GUI)
  • PyTesseract (only its image_to_string(img, lang) is used)
  • NumPy (smoothing filter)
  • SciPy (signal.argrelextrema)
  • StrsimPy (NormalizedLevenshtein string similarity)
  • Matplotlib (draws a stem plot of the frame differences)
  • ProgressBar
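StrsimPy's NormalizedLevenshtein is listed for comparing strings. One plausible use, sketched here as an assumption rather than the project's actual logic, is merging consecutive key frames whose OCR text is nearly identical, so one subtitle line is not emitted several times. The sketch reimplements the metric in plain Python: edit distance divided by the longer string's length, which is how we understand NormalizedLevenshtein to normalize. merge_similar and its 0.8 threshold are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / max(len); identical (or both empty) strings give 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def merge_similar(entries, threshold=0.8):
    """Merge runs of similar OCR results (hypothetical helper).

    entries: (frame_index, text) pairs sorted by frame_index.
    Returns (first_frame, last_frame, text) tuples, keeping the first text of each run.
    """
    merged = []
    for idx, text in entries:
        text = text.strip()
        if merged and normalized_similarity(merged[-1][2], text) >= threshold:
            start, _, kept = merged[-1]
            merged[-1] = (start, idx, kept)
        else:
            merged.append((idx, idx, text))
    return merged
```

For example, "Hello world" and the OCR misread "Hello wor1d" differ by one edit out of 11 characters (similarity about 0.91), so they collapse into one entry.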

Install any missing dependencies first with pip install -r requirements.txt

Install Tesseract OCR

Download Tesseract and check that it runs. If you need languages other than English, install their language data; tesseract --list-langs shows which languages are available.
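Once Tesseract works, the crop-then-OCR step can be sketched as below. This is an illustration under stated assumptions: the rectangle is a hypothetical (x, y, width, height) tuple in pixels (the project's own -crop syntax is not explained here), and pytesseract is imported inside the function so the crop helper runs without it. pytesseract.image_to_string(image, lang=...) is the one PyTesseract call the dependency list says the project uses.

```python
import numpy as np


def crop(frame: np.ndarray, x: int, y: int, w: int, h: int) -> np.ndarray:
    """Return the subtitle region of a frame.

    (x, y) is the top-left corner and (w, h) the size; this rectangle
    convention is an assumption for the example. NumPy images are
    indexed [row, column], i.e. [y, x].
    """
    return frame[y:y + h, x:x + w]


def ocr_subtitle(frame: np.ndarray, rect, lang: str = "eng") -> str:
    """OCR the cropped subtitle area (hypothetical helper).

    Requires the Tesseract binary installed as described above and a
    language name that appears in `tesseract --list-langs`.
    """
    import pytesseract  # imported lazily: only needed for the OCR call itself

    return pytesseract.image_to_string(crop(frame, *rect), lang=lang).strip()
```

Cropping before OCR keeps Tesseract from reading unrelated on-screen text and makes each call cheaper.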

Run

λ python extract_subtitles.py <videopath>

./extract_subtitles.py -crop '0(907,940)[101,77]' -lang eng a.flv
./timeline_ops.py merge frames/a.flv/timeline.txt 0 | ./timeline_ops.py to-lrc 1 srt > a.srt
# for SRT and Audacity label operations, refer to my GH:mkey/yy/atag2srt
# and GH:tv/arecog.sh, GH:MontagePy/_1c
# export OPENCV_OPENCL_DEVICE='Clover' # if VideoReader fails
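The last stage of that pipeline writes an SRT file. As a rough sketch of what the output format requires, assuming timeline entries have already been turned into (start_seconds, end_seconds, text) tuples (the real timeline.txt layout and the timeline_ops.py internals are not documented here), SRT rendering looks like this; a frame index converts to seconds as index / fps. srt_timestamp and to_srt are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(entries) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues.

    The tuple layout is an assumption for this example, not the
    format of frames/<video>/timeline.txt.
    """
    blocks = []
    for n, (start, end, text) in enumerate(entries, 1):
        # Each cue: sequence number, time range, text, then a blank line.
        blocks.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

SRT separates seconds from milliseconds with a comma, which is the main detail that distinguishes it from similar formats such as WebVTT.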

./extract_subtitles.py -crop '0(883,484)[134,120]' -lang index  --chunk-size 600 --crop-debug a.flv

./montage1_c.py -font /usr/share/fonts/wenquanyi/*/wqy-zenhei.ttc a.mp4 --subtitle b.srt -font-size 25 --mon-background '#6e6c39' --subtitle-placeholder 好 -spacing :5,3 -key-color '#000000' --key-thres 3

License

This project is licensed under the MIT License; see LICENSE for details.