Automatic detection of multi-speaker fragments with high time resolution

This is a Python implementation of the project described in the paper "Automatic detection of multi-speaker fragments with high time resolution". The aim of the project is to detect fragments of audio files in which more than one speaker is talking.

How to use

First step: compute a spectrogram from the input audio --> specs/SPEC.jpeg
python2 SpecCreator.py [--audio-path=AUDIO_PATH] [--dir=DIRECTORY_WITH_AUDIOS]
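For intuition, here is a minimal sketch of what this step might look like for a mono WAV input. The window size, overlap, scaling, and input path below are illustrative assumptions, not necessarily what SpecCreator.py actually does:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Read a mono WAV file (assumption; SpecCreator.py may support more formats).
rate, samples = wavfile.read('input.wav')

# Short-time spectrogram; nperseg and noverlap here are illustrative guesses.
freqs, times, spec = spectrogram(samples, fs=rate, nperseg=512, noverlap=384)

# Log-compress the dynamic range and save as an image,
# flipped so low frequencies appear at the bottom.
log_spec = np.log1p(spec)
plt.imsave('specs/SPEC.jpeg', log_spec[::-1], cmap='gray')
```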

Second step: process the spectrogram with the CNN --> results/RESULTS.json
python2 VoiceCounter.py [--spec-path=SPECTROGRAM_PATH] [--dir=DIRECTORY_WITH_SPECTROGRAMS]
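As a rough sketch, applying a saved MXNet checkpoint to a spectrogram image could look like the following. The checkpoint prefix, epoch, and input layout are assumptions; the real VoiceCounter.py may preprocess the spectrogram or slide a window over it differently:

```python
import mxnet as mx
import numpy as np
from skimage import io

# Load the spectrogram as a float array in (batch, channel, H, W) layout.
spec = io.imread('specs/SPEC.jpeg', as_grey=True).astype(np.float32)
batch = spec[np.newaxis, np.newaxis, :, :]

# Load a saved checkpoint; 'model/voice-counter' and epoch 0 are assumptions.
sym, arg_params, aux_params = mx.model.load_checkpoint('model/voice-counter', 0)
mod = mx.mod.Module(symbol=sym, context=mx.cpu(),
                    data_names=['data'], label_names=None)
mod.bind(data_shapes=[('data', batch.shape)], for_training=False)
mod.set_params(arg_params, aux_params)

# Forward pass; the output is interpreted as per-frame probabilities.
mod.forward(mx.io.DataBatch([mx.nd.array(batch)]), is_train=False)
probs = mod.get_outputs()[0].asnumpy()
print(probs)
```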

The output is a JSON file containing, for each frame, the probability that more than one speaker is talking.

For full information about the parameters, see
python2 SpecCreator.py --help and python2 VoiceCounter.py --help

Requirements

  • Linux
  • python2
  • numpy, scipy, scikit-image, matplotlib, mxnet, tqdm

Future updates

The output JSON file may need different post-processing depending on your aims; one simple approach, thresholding the per-frame probabilities into time segments, is sketched below. The code to obtain the results from section 2.3 of the paper will be provided later (write to the authors if needed).
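A minimal sketch of such post-processing follows. The JSON layout, the field names `probabilities` and `frame_step`, and the threshold value are assumptions; inspect your results/RESULTS.json to see the actual structure:

```python
import json

with open('results/RESULTS.json') as f:
    results = json.load(f)

probs = results['probabilities']       # hypothetical field: per-frame probabilities
step = results.get('frame_step', 0.1)  # hypothetical field: seconds per frame

# Merge consecutive frames above the threshold into (start, end) segments.
threshold = 0.5
segments, start = [], None
for i, p in enumerate(probs):
    if p >= threshold and start is None:
        start = i * step
    elif p < threshold and start is not None:
        segments.append((start, i * step))
        start = None
if start is not None:
    segments.append((start, len(probs) * step))

for begin, end in segments:
    print('multi-speaker: %.2f s -- %.2f s' % (begin, end))
```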

Authors and citation

When using this code, please cite the paper "Automatic detection of multi-speaker fragments with high time resolution".

Authors: Belyaev Andrey, Kazimirova Evdokia

Support: Neurodatalab LLC, USA

Contact: e.kazimirova@neurodatalab.com
