AudioTagger

This project provides components for classifying audio input and displaying its corresponding spectrogram representation. The repository contains a backend and a sample frontend Kivy app which periodically calls the backend REST endpoints for prediction and spectrogram updates.

Packages

/server
     directory containing all backend components
/viewer
     directory with a sample Python GUI

Installation

Backend

Python 3 and Anaconda are recommended to get the code running.
Python version used during development: 3.6.7

  1. Create the environment: conda env create -f environment.yml
  2. Install the package: python setup.py develop

GUI

We provide a simple GUI written in Kivy. To fulfil its requirements on Ubuntu, please install the following packages:

sudo apt-get install -y \
    build-essential \
    git \
    libsdl2-dev \
    libsdl2-image-dev \
    libsdl2-mixer-dev \
    libsdl2-ttf-dev \
    libportmidi-dev \
    libswscale-dev \
    libavformat-dev \
    libavcodec-dev \
    zlib1g-dev

Architectural backend design

The figure below gives a basic overview of the information flow and the components involved in the running backend system.

Configuration file

The file config.py includes some options which can be set before backend startup. Before starting the backend, please set the variable PROJECT_ROOT to the absolute path of the project's root directory.
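
For illustration, the relevant line in config.py might look as follows (a minimal sketch; the path is a placeholder for your own checkout location):

# config.py (excerpt)
PROJECT_ROOT = "/home/user/audio-tagger"  # absolute path to the cloned repository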

Usage

The current predictions and spectrograms can be accessed via a REST interface. The following command starts the backend:

python server/webserver.py

REST interface

In order to stay independent of any particular programming language, all output can be accessed by calling URL endpoints.

Get current spectrogram

Http-Method: GET
Response Content-Type: JPEG
URL: http://127.0.0.1:5000/live_visual
Returns a JPEG of the current visualization (e.g. a spectrogram)

There is an additional endpoint to display the same content in the browser:

http://127.0.0.1:5000/live_visual_browser
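
Programmatic clients can poll the /live_visual endpoint instead; a minimal Python sketch using the requests library (not part of this repository):

import requests

# fetch the current visualization as a JPEG and store it to disk
response = requests.get("http://127.0.0.1:5000/live_visual")
response.raise_for_status()
with open("spectrogram.jpg", "wb") as f:
    f.write(response.content)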


Get current prediction

Http-Method: GET
Response Content-Type: JSON
URL: http://127.0.0.1:5000/live_pred
Returns a 2D array of the most recent class probabilities with respect to the currently selected predictor

Example response: [["Acoustic_guitar", 0.0006955251446925104, 0], ["Applause", 0.0032770668622106314, 1], ...]
1. element: category name
2. element: predicted probability for this class
3. element: positional argument (can be used if a special order of displayed classes is desired)
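
A client could poll this endpoint periodically, for instance to display the classes with the highest probabilities; a minimal Python sketch using the requests library:

import requests

# fetch the most recent class probabilities
predictions = requests.get("http://127.0.0.1:5000/live_pred").json()

# each entry is [category name, probability, position]; print the five most likely classes
for name, probability, _ in sorted(predictions, key=lambda entry: entry[1], reverse=True)[:5]:
    print(f"{name}: {probability:.4f}")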

Get available audio files

Http-Method: GET
Response Content-Type: JSON
URL: http://127.0.0.1:5000/audiofile_list
Returns the available audio files as a list of JSON objects

Example response: [{"id": 0, "displayname": "Trumpets"}, {"id": 1, "displayname": "Song1"}, {"id": 2, "displayname": "Song2"}, ...]

Get available predictors

Http-Method: GET
Response Content-Type: JSON
URL: http://127.0.0.1:5000/pred_list
Returns the available prediction models as a list of JSON objects

Example response: [{"id": 0, "displayname": "DCASEPredictor", "classes": "41", "description": "sample description for dcase"}, {"id": 1, "displayname": "SportsPredictor", "classes": "3", "description": "sample description for detecting sports"}, ...]
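
These two list endpoints can be used to discover the ids expected by the settings endpoint described below; a minimal Python sketch using the requests library:

import requests

BASE_URL = "http://127.0.0.1:5000"

# discover the selectable audio files and prediction models
audio_files = requests.get(f"{BASE_URL}/audiofile_list").json()
predictors = requests.get(f"{BASE_URL}/pred_list").json()

for entry in audio_files:
    print("audio file", entry["id"], entry["displayname"])
for entry in predictors:
    print("predictor", entry["id"], entry["displayname"], f"({entry['classes']} classes)")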

Change audio input source and predictor

The audio tagger backend implements another endpoint to change the audio source as well as the currently active prediction model on the fly.

Http-Method: POST
Request Body: JSON
URL: http://127.0.0.1:5000/settings

Example request body: {'isLive': 1, 'file': 0, 'predictor': 1} where
isLive: 1 -> true, 0 -> false
file: id of the selected audio file
predictor: id of the selected predictor
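
For example, switching to playback of audio file 0 with predictor 1 could look like this (a minimal Python sketch using the requests library; it assumes isLive = 0 selects playback of the chosen file):

import requests

# select audio file 0 as input and activate predictor 1
settings = {"isLive": 0, "file": 0, "predictor": 1}
response = requests.post("http://127.0.0.1:5000/settings", json=settings)
response.raise_for_status()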

Adding predictors

One can add new predictors by editing the CSV file predictors.csv.

Steps for building predictor wrapper

The next few steps show how to integrate a predictor into the backend system of the audio tagger:

  1. Extend predictors.csv with the properties of the new predictor

    • Important note: Make sure that the given path in column predictorClassPath correctly identifies the path to the wrapper class. Otherwise, the backend cannot find the new predictor. Two predictors are already included; have a look at them for reference.
  2. Implement a predictor class that inherits from PredictorContract (a sketch is shown after this list).

  3. Inform the manager once a new prediction has been made by calling onNewPredictionCalculated(probabilities)

    • parameter probabilities: [["class1", 0.0006955251446925104, 0], ["class2", 0.0032770668622106314, 1], ...]
      1. element: category name
      2. element: probability of prediction for this class
      3. element: positional argument (can be used if a special order of displayed classes is desired)
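
The following sketch illustrates the rough shape of such a wrapper. Only PredictorContract, predictors.csv and onNewPredictionCalculated(probabilities) are taken from the steps above; the import path, the hook method name and the manager attribute are assumptions and have to be adapted to the actual contract defined in this repository:

# my_predictor.py -- illustrative sketch, not part of the repository
from predictor_contract import PredictorContract  # assumed import path


class MyPredictor(PredictorContract):
    """Minimal example predictor that emits dummy probabilities for two classes."""

    def on_new_audio_chunk(self, spectrogram):  # hypothetical hook name
        # run the actual model here instead of returning constants
        probabilities = [["class1", 0.7, 0], ["class2", 0.3, 1]]
        # inform the manager about the new prediction (step 3 above)
        self.manager.onNewPredictionCalculated(probabilities)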

Note:

Consumers should rely on the global timing variable tGroundTruth, which is provided by AudioTaggerManager. This counter variable guarantees synchronization among consumers.
For further information, read the corresponding documentation and have a look at the existing predictors.

Adding audio files

One can equip the backend with new selectable WAV files by editing the CSV file sources.csv.
The CSV file has the following form:

id;displayname;path  
0;ExampleFile1;pathToWAVfile/file1.wav  
1;ExampleFile2;pathToWAVfile/file2.wav  
2;ExampleFile3;pathToWAVfile/file3.wav  

Example GUI

The /viewer directory contains a sample GUI for the audio tagger backend. The app is based on the Python framework Kivy. The GUI can be started with the following command:

python viewer/startup.py

Important: The GUI requires a running instance of server/webserver.py.

Documentation

If you want to dive deeper into the source code of the audio tagger backend, please have a look at the detailed documentation.

Support

Please contact the Institute for Computational Perception at Johannes Kepler University in Linz for questions regarding usage and code.
