Edyson - a tool for exploring audio quickly

Edyson is a human-in-the-loop framework for exploring, and to some extent annotating, large amounts of audio efficiently. It uses the openSMILE feature extration tool and unsupervised machine learning to help the user browse and listen to their data. The project is a prototypical work in progress and should by no means be viewed as a finished product.

ONLINE DEMO (use Chrome)

VIDEO

Edyson is built on the idea of Temporally Disassembled Audio, as such it segments an audio file into short snippets of equal length and distributes them on a 2D map according to some chosen feature. See publications below for further information on this.

Getting started

Clone the repository and install the requirements. Tested on python 3.6:

pip install -r requirements.txt

Install openSMILE, name the root folder "opensmile" and put it inside the Edyson directory (alternatively change the pointers in python/audio_processing.py).

Start the flask server from inside /web:

python app.py

Navigate to localhost:3134, preferably using Chrome. You might need to zoom out ("cmd/ctrl" + "-"") and reload the page for everything to fit.

Before processing audio of your own I recommend testing out the included examples, choose one from the header tab. See functionality and controls here.

Docker

Alternatively, to run the app using Docker, first install Docker and clone this repo (git clone https://github.com/perfall/Edyson.git). Then build your image by running docker build -t edyson . from the root of the cloned repo. Then you can start a container with your image by running docker run -d --name edyson -p 3134:3134 edyson.

Process your audio

Choose an audio file, wav or mp3. Select segment size (duration of snippet), step size (the duration between two consecutive snippets), and features (currently only MFCC and mean-normalized MFCC). Click process audio.

The time it takes for processing is mostly dependent on how many resulting datapoints are generated. As an example, an hour of audio with a segment size and step size of 1000ms (3600 datapoints) currently takes around 3 minutes on a maxed out 2016 MacBook Pro.

The processed audio can be stored to avoid having to process it a second time. Simply take a note of the ID in the end of the url, alternatively add it to the "examples.txt" file for convenient access.

Publications

Human-in-the-Loop Efficiency Analysis for Binary Classification in Edyson

Speech technology for swedish: Current impact areas for applications and edyson, an innovative tool for accessing speech data

How to annotate 100 hours in 45 minutes (good start)

Towards fast browsing of found audio data: 11 presidents

Bringing Order to Chaos: A Non-Sequential Approach for Browsing Large Sets of Found Audio Data (early version of the method)

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
docs		docs
misc		misc
python		python
web		web
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
uwsgi.ini		uwsgi.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs

docs

misc

misc

python

python

web

web

.gitignore

.gitignore

Dockerfile

Dockerfile

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

uwsgi.ini

uwsgi.ini

Repository files navigation

Edyson - a tool for exploring audio quickly

Getting started

Docker

Process your audio

Publications

Screenshot

About

Releases

Packages

Contributors 2

Languages

License

perfall/Edyson

Folders and files

Latest commit

History

Repository files navigation

Edyson - a tool for exploring audio quickly

Getting started

Docker

Process your audio

Publications

Screenshot

About

Resources

License

Stars

Watchers

Forks

Languages