thesmith/podiki

podiki

There are hundreds of thousands of podcasts out there, each with hundreds of episodes chock-full of really useful information and awesome songs. The problem is that it's all locked up behind a lifetime's worth of audio, unindexed and unsearchable.

Podiki detects songs and transcribes speech in podcasts, so that their contents can be searched, linked up, indexed and updated.

Podiki has two parts: podcast processing and a wiki.

Podcast Processing

http://github.com/thesmith/podiki

Once a podcast has been submitted, its new episodes are crawled and all the speech and song data is extracted. As users correct the text, their corrections feed back into the linguistic model used to transcribe future episodes.
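To illustrate the crawling step, here is a minimal sketch of how new episodes might be picked out of a podcast's RSS feed. It is not Podiki's actual code (the real processing is written in Scala); it is shown in Python only for brevity. The function name `new_episode_urls`, the `seen_urls` set and the sample feed are all hypothetical, and the sketch assumes a standard RSS 2.0 feed where each episode's audio is referenced by an `<enclosure>` element.

```python
import xml.etree.ElementTree as ET


def new_episode_urls(feed_xml, seen_urls):
    """Return audio URLs from the feed that are not in seen_urls, in feed order.

    Assumes RSS 2.0: each <item> is an episode and its audio file is the
    `url` attribute of an <enclosure> child element.
    """
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item without audio, e.g. a news post
        url = enclosure.get("url")
        if url and url not in seen_urls and url not in urls:
            urls.append(url)
    return urls


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Podcast</title>
<item><title>Episode 1</title>
<enclosure url="http://example.com/ep1.mp3" type="audio/mpeg"/></item>
<item><title>Episode 2</title>
<enclosure url="http://example.com/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

# Episode 1 has already been processed, so only episode 2 is returned.
print(new_episode_urls(SAMPLE_FEED, {"http://example.com/ep1.mp3"}))
# → ['http://example.com/ep2.mp3']
```

In a setup like this, each returned URL would then be queued for song detection and transcription, and added to the set of seen URLs once processed.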

Song information is determined using EchoPrint; speech detection and transcription use the Sphinx4 library.

The background processing is written in Scala and is backed by Redis (for now).

Wiki

http://github.com/thesmith/podiki-web

The wiki lets users submit podcasts for processing, edit the detected songs and transcribed text, and add links to the things being talked about in the podcast.

The wiki web app is also built in Scala, using Play, and tracks are linked via the Spotify API.

TODO

It is better to be done than anywhere near perfect. This is only just done.

Currently, the wiki only allows the text and a few other bits to be edited, and the feedback loop to the linguistic model isn't working.

About

Podcast Wiki hack for MusicHackDay
