College de France automated audio transcripts

A worker and an Elasticsearch instance for producing and searching automated transcripts of College de France audio recordings.

Worker

The worker periodically polls Datastore for scheduled transcriptions. For each one it finds, it downloads the MP3 file from the College de France website, converts it to FLAC, stores the FLAC file in a Google Cloud Storage bucket, and sends a Speech-to-Text request. It then stores the resulting transcription in the same bucket and indexes the transcript in an Elasticsearch instance running in the same Kubernetes cluster.

A separate periodic job computes overall statistics about the transcriptions, because Datastore offers only limited support for this kind of aggregate query.
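As a rough illustration of what such a job does, the sketch below aggregates on the client side over records that have already been fetched. The `lecture` and `stats` types and their fields are assumptions made for this example; the statistics the real job computes may differ.

```go
package main

import "fmt"

// lecture is a hypothetical record as it might be read from the datastore.
type lecture struct {
	DurationSec int  // audio length in seconds (assumed field)
	Transcribed bool // whether a transcript exists (assumed field)
}

// stats holds the aggregates computed in one pass over all records.
type stats struct {
	Total          int // number of lectures seen
	Transcribed    int // number of lectures with a transcript
	TranscribedSec int // total seconds of transcribed audio
}

// computeStats scans every record once and accumulates the totals,
// standing in for aggregation the datastore does not do itself.
func computeStats(lectures []lecture) stats {
	var s stats
	for _, l := range lectures {
		s.Total++
		if l.Transcribed {
			s.Transcribed++
			s.TranscribedSec += l.DurationSec
		}
	}
	return s
}

func main() {
	s := computeStats([]lecture{
		{DurationSec: 3600, Transcribed: true},
		{DurationSec: 1800, Transcribed: false},
	})
	fmt.Printf("%+v\n", s)
}
```

Running this periodically and saving the result avoids rescanning every record each time the statistics are displayed.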

Elasticsearch

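Elasticsearch runs as a single combined master and data node in a Kubernetes cluster. With only one node, replica shards have nowhere to be allocated, so the cluster health is "yellow". The node does full-text indexing of the transcripts using the French analyzer.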


attwad/cdf
