README

Some scripts for processing movie subtitles


srt2xml ....... convert subtitles in SRT format to a simple OPUS-style XML
		format (includes sentence splitting and tokenization).
		Tokenization uses the nonbreaking_prefix.* files, which
		are copies of the files distributed with version 3 of
		the Europarl corpus.

		Note that subtitle files are usually DOS files, whereas
		srt2xml expects UNIX-style text files!
		--> run dos2unix before piping the text into srt2xml
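
The line-ending note above can be illustrated with a small Python
sketch. It is not part of this package and only mimics the basic effect
of dos2unix (CRLF becomes LF); the function name dos_to_unix and the
sample subtitle block are made up for illustration.

```python
def dos_to_unix(data: bytes) -> bytes:
    """Replace DOS line endings (CRLF) with UNIX line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical subtitle block with DOS line endings.
sample = b"1\r\n00:00:01,000 --> 00:00:03,500\r\nHello there.\r\n\r\n"
converted = dos_to_unix(sample)
```

Working on bytes rather than decoded text leaves the character encoding
of the subtitle file untouched, which matters because subtitle files
come in many encodings.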


srtalign ...... align SRT files that have been converted to XML with
		srt2xml (requires time stamps!).
		For more information on this script and its options,
		see the header of the script.
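
To give a rough idea of why time stamps are needed for alignment, here
is a minimal Python sketch that links each source segment to the target
segment with the largest time overlap. This is an assumption about the
general technique, not a description of what srtalign actually does;
all function names and the sample intervals are hypothetical.

```python
def to_seconds(stamp: str) -> float:
    """Convert an SRT time stamp of the form 'HH:MM:SS,mmm' to seconds."""
    clock, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def overlap(a, b):
    """Length in seconds of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def align_by_time(source, target):
    """Link each source interval to the target interval it overlaps most.

    Returns a list of (source_index, target_index) pairs; target_index
    is None if no target interval overlaps the source interval.
    """
    links = []
    for i, src in enumerate(source):
        best, best_overlap = None, 0.0
        for j, trg in enumerate(target):
            shared = overlap(src, trg)
            if shared > best_overlap:
                best, best_overlap = j, shared
        links.append((i, best))
    return links


# Hypothetical (start, end) intervals of two subtitle files.
source = [(to_seconds("00:00:01,000"), to_seconds("00:00:03,500")),
          (to_seconds("00:00:04,000"), to_seconds("00:00:06,000"))]
target = [(to_seconds("00:00:01,200"), to_seconds("00:00:03,400")),
          (to_seconds("00:00:04,100"), to_seconds("00:00:06,300"))]
links = align_by_time(source, target)
```

A sketch like this only works when both files share the same timeline;
if one file is shifted or runs at a different speed, the time stamps
have to be synchronized first.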

share/dic ..... This directory contains word alignment dictionaries
		obtained by aligning the OpenSubtitles corpus from OPUS.
		These dictionaries can be used to improve sentence
		alignment: time stamps are synchronized with the help of
		anchor points, which are found by matching dictionary
		entries against word pairs in the subtitle pair.
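
The anchor-point idea can be sketched in Python as follows. This is a
simplified illustration under stated assumptions, not the code used in
this package: the dictionary is assumed to be a set of (source word,
target word) pairs, each segment is a (start time, word list) pair, and
the time mapping is assumed to be linear. All names and data are
hypothetical.

```python
def find_anchors(source, target, dictionary):
    """Collect (source_start, target_start) pairs of segments that
    contain at least one word pair listed in the dictionary.

    For each source segment, only the first matching target segment
    is used (a deliberately naive choice for this sketch).
    """
    anchors = []
    for src_start, src_words in source:
        for trg_start, trg_words in target:
            if any((s, t) in dictionary for s in src_words for t in trg_words):
                anchors.append((src_start, trg_start))
                break
    return anchors


def fit_time_mapping(first, last):
    """Fit source_time = scale * target_time + offset through two anchors."""
    (src1, trg1), (src2, trg2) = first, last
    scale = (src2 - src1) / (trg2 - trg1)
    offset = src1 - scale * trg1
    return scale, offset


# Hypothetical data: start times in seconds and tokenized segments.
dictionary = {("hello", "hallo"), ("night", "nacht")}
source = [(10.0, ["hello", "world"]), (110.0, ["good", "night"])]
target = [(12.0, ["hallo", "welt"]), (117.0, ["gute", "nacht"])]

anchors = find_anchors(source, target, dictionary)
scale, offset = fit_time_mapping(anchors[0], anchors[-1])


def synchronize(trg_time):
    """Map a target time stamp onto the source timeline."""
    return scale * trg_time + offset
```

Using the first and the last anchor spreads the fit over the whole
film, so that both a constant shift (offset) and a speed difference
(scale) between the two subtitle files are covered.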