crypto-jeronimo/aws-transcription-parser

Automated parsing of Amazon-Transcribe-annotated episode transcripts

This is an automatic parser of Amazon Transcribe jobs - of podcast episodes - which outputs to HTML.

Compatible with Python 2.7 and Python 3+.

Populate an input file named input.txt with one line per episode. Each line has two semicolon-separated fields: the name of the Amazon Transcribe output JSON file, and a comma-separated, ordered list of speakers.

For example, the following input.txt file results in the files episode_1.json, episode_2.json and episode_3.json being processed in turn. On each line, the listed speaker names replace the automatically generated placeholders spk_0, spk_1 and spk_2, in order. Each output HTML file is named after the jobName in its input JSON file.

episode_1.json;speaker_1,speaker_2
episode_2.json;speaker_2,speaker_3
episode_3.json;speaker_1,speaker_2,speaker_3
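To make the line format concrete, here is a minimal sketch of how one such line could be split into a file name and a placeholder-to-speaker mapping. The function name parse_input_line is hypothetical and is not taken from process_aws_output.py. The sketch also assumes that the n-th listed name replaces the placeholder spk_n.

```python
def parse_input_line(line):
    """Split 'file.json;name_a,name_b' into a file name and a placeholder map.

    Hypothetical helper for illustration; not part of process_aws_output.py.
    """
    # Everything before the first semicolon is the JSON file name.
    json_file, _, speakers = line.strip().partition(";")
    # The remainder is a comma-separated, ordered list of speaker names.
    names = [name.strip() for name in speakers.split(",") if name.strip()]
    # Assumption: the n-th listed name replaces the placeholder spk_n.
    mapping = {"spk_{}".format(i): name for i, name in enumerate(names)}
    return json_file.strip(), mapping


if __name__ == "__main__":
    print(parse_input_line("episode_2.json;speaker_2,speaker_3"))
```

Under that assumption, the second line of the example maps spk_0 to speaker_2 and spk_1 to speaker_3.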

Once you've created your input.txt file and moved it into the same directory as the process_aws_output.py file, run the script with Python:

$ python process_aws_output.py
SUCCESS!

A SUCCESS! message is expected, signifying that all HTML outputs have been stored in the same directory.
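As a rough illustration of the naming and relabelling described above, the sketch below derives an output file name from jobName and substitutes speaker names for placeholders. The helper names are hypothetical, and the field layout (jobName at the top level, speaker_label entries under results.speaker_labels.segments) is an assumption about the Amazon Transcribe output format, not something read from process_aws_output.py.

```python
def output_html_name(transcribe_job):
    """Name the HTML output after the job's jobName (hypothetical helper)."""
    return "{}.html".format(transcribe_job["jobName"])


def named_segment_speakers(transcribe_job, mapping):
    """Return the speaker of each segment, with placeholders replaced.

    Assumes segments live under results.speaker_labels.segments and carry
    a speaker_label such as 'spk_0'. Unmapped labels are left unchanged.
    """
    segments = transcribe_job["results"]["speaker_labels"]["segments"]
    return [mapping.get(seg["speaker_label"], seg["speaker_label"])
            for seg in segments]


if __name__ == "__main__":
    # Tiny hand-made stand-in for a Transcribe output file, for illustration.
    job = {
        "jobName": "episode_1",
        "results": {"speaker_labels": {"segments": [
            {"speaker_label": "spk_0"},
            {"speaker_label": "spk_1"},
        ]}},
    }
    names = {"spk_0": "speaker_1", "spk_1": "speaker_2"}
    print(output_html_name(job))
    print(named_segment_speakers(job, names))
```

Leaving unmapped labels unchanged is a design choice of this sketch, so that a line listing too few speakers still yields readable output.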

Please don't hesitate to ask questions or request changes or improvements via the Issues section.

If you're feeling generous, donations are welcome:

BTC: 1QFNgTV3GQby8uv3mXwLKBHAgKUEenSREd

ETH: 0xa7350d9fb3c6193759b587bb984f0dfe3568c8ed

LTC: LW3SNJ61CXUfRQTpehpDfV7vv1iVdLh9En

ADA: DdzFFzCqrhtBbS7o5LQ3u1ZxFVz3Q6b2bQ86FEYanf6UsRgK6D3So4grpZEHPXcitQWEuRfnAA7jzi3xmj9Md6kng2UiVn4QLxEsAefK

BCH: 1QFNgTV3GQby8uv3mXwLKBHAgKUEenSREd
