- ruby 1.9.x; it's a must
- GNU/Linux or Mac OS X; Windows complains about encoding and I don't know how to fix it; no incentive, either
- in case of problems, blame yourself and read this file again.
ruby scriptname.rb inputfolder outputfolder
nota bene, aka, N.B.
This is a quick-and-dirty solution for a specific use case. So lots of things are hardcoded and there are global variables all over the place. Yes, it's ugly. But it GTD!
Also the following assumptions are made:
- input files are encoded in UTF-8; if not, please re-encode your file. On Windows, use Notepad++
- output files are encoded in UTF-8 with BOM, because Excel on Windows needs the BOM header to recognize that a CSV file is encoded in UTF-8; please blame Microsoft for this
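For the curious, writing UTF-8 with BOM can be sketched roughly like this. It is not lifted from the script: `write_csv_with_bom` is a made-up helper name, and the sketch only relies on Ruby's standard `csv` library and on the fact that the BOM is the character U+FEFF written first.

```ruby
require "csv"

# Hypothetical helper (name and signature are assumptions, not the
# script's actual code): write rows as UTF-8 CSV, BOM first.
def write_csv_with_bom(path, rows)
  File.open(path, "w:UTF-8") do |file|
    file.write("\uFEFF")                       # BOM, bytes EF BB BF
    rows.each { |row| file.write(row.to_csv) } # one CSV line per row
  end
end
```

Call it like `write_csv_with_bom("out.csv", [["time", "en"], ["00:00:05", "Hello"]])` and Excel on Windows should pick up the encoding.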
Want a flexible solution? Fork and DIY!
internal logic sum up
Treat the input file as a text stream.
The original transcript and its translation can be on separate lines, but they share the same timestamp.
Parse each line, get the timestamp, lang code, and transcript text, then make it a hash.
Sort the hash by timestamp. This turns the hash into an array.
Write the array to CSV.
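The steps above, as a minimal sketch. Big assumption: the line format `00:00:05 en Hello` is invented for illustration, since the real input format is not described here. `LINE_PATTERN`, `parse_lines` and `to_rows` are hypothetical names too; the actual script uses hardcoded values and globals instead.

```ruby
require "csv"

# ASSUMPTION: each line looks like "00:00:05 en Hello there".
# This pattern is hypothetical; adapt it to the real input.
LINE_PATTERN = /\A(\d{2}:\d{2}:\d{2})\s+(\w+)\s+(.+)\z/

# Build { timestamp => { lang_code => text } } from a stream of lines.
# Lines sharing a timestamp end up in the same inner hash.
def parse_lines(lines)
  entries = {}
  lines.each do |line|
    match = LINE_PATTERN.match(line.strip)
    next unless match # skip lines that do not fit the assumed format
    timestamp, lang, text = match.captures
    (entries[timestamp] ||= {})[lang] = text
  end
  entries
end

# Sorting a hash yields an array of [timestamp, texts] pairs.
# Each pair becomes one row: timestamp first, then one column per lang.
def to_rows(entries, langs)
  entries.sort.map do |timestamp, texts|
    [timestamp] + langs.map { |lang| texts[lang] }
  end
end

sample = [
  "00:00:09 en Good morning",
  "00:00:05 en Hello",
  "00:00:05 fr Bonjour",
]
rows = to_rows(parse_lines(sample), %w[en fr])
csv_text = rows.map(&:to_csv).join
```

A timestamp with no translation simply gets an empty cell in that column.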