This is a Python implementation of the project described in the paper "Automatic detection of multi-speaker fragments with high time resolution". The aim of the project is to detect fragments of audio files in which more than one speaker is talking.
First step: compute a spectrogram from the input audio --> specs/SPEC.jpeg
python2 SpecCreator.py [--audio-path=AUDIO_PATH] [--dir=DIRECTORY_WITH_AUDIOS]
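As a rough illustration of what the spectrogram step does (a sketch only; the window and FFT parameters below are assumptions, not the ones used by SpecCreator.py), using scipy:

```python
import numpy as np
from scipy import signal

# One second of a synthetic 440 Hz tone at 16 kHz, standing in for
# audio loaded from AUDIO_PATH.
sr = 16000
t = np.arange(sr) / float(sr)
audio = np.sin(2 * np.pi * 440.0 * t)

# Short-time Fourier transform; nperseg/noverlap here are illustrative
# assumptions, not the project's actual parameters.
freqs, times, spec = signal.spectrogram(audio, fs=sr, nperseg=512, noverlap=256)

# Log-scale the magnitudes, as is typical before feeding a CNN.
log_spec = np.log(spec + 1e-10)
print(log_spec.shape)  # (frequency bins, time frames)
```

SpecCreator.py additionally saves the result as a JPEG image under specs/.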
Second step: process the spectrogram with a CNN --> results/RESULTS.json
python2 VoiceCounter.py [--spec-path=SPECTROGRAM_PATH] [--dir=DIRECTORY_WITH_SPECTROGRAMS]
The output is a JSON file containing, for each frame, the probability that more than one speaker is talking.
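A minimal sketch of consuming such output (the key name "frames" is a hypothetical placeholder, since the schema of results/RESULTS.json is not documented here):

```python
import json

# Hypothetical results in the format described above: a per-frame
# probability of overlapping speech. The key name is an assumption.
results_text = '{"frames": [0.1, 0.2, 0.85, 0.9, 0.3]}'
results = json.loads(results_text)

# Flag frames whose multi-speaker probability exceeds a chosen threshold.
threshold = 0.5
flagged = [i for i, p in enumerate(results["frames"]) if p > threshold]
print(flagged)  # indices of frames likely containing overlapping speech
```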
For full information about the parameters, see
python2 SpecCreator.py --help
and python2 VoiceCounter.py --help
Requirements:
- Linux
- python2
- numpy, scipy, scikit-image, matplotlib, mxnet, tqdm
The output JSON file may need different post-processing depending on your aims. The code to reproduce the results from the main paper (section 2.3) will be provided later (write to the authors if needed).
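One common way to turn per-frame probabilities into fragments, shown here as a generic sketch and not necessarily the procedure from section 2.3 of the paper, is to threshold the probabilities and merge consecutive flagged frames:

```python
def frames_to_fragments(probs, threshold=0.5):
    """Merge consecutive above-threshold frames into (start, end) index
    pairs, end exclusive. A generic post-processing sketch, not the
    paper's method."""
    fragments = []
    start = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a multi-speaker run begins
        elif p <= threshold and start is not None:
            fragments.append((start, i))  # the run ends
            start = None
    if start is not None:
        fragments.append((start, len(probs)))  # run extends to the end
    return fragments

print(frames_to_fragments([0.1, 0.8, 0.9, 0.2, 0.7]))  # [(1, 3), (4, 5)]
```

Frame indices can then be converted to timestamps using the frame hop of the spectrogram.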
When using this code, please cite the paper.
Authors: Belyaev Andrey, Kazimirova Evdokia
Support: Neurodatalab LLC, USA
Contact: e.kazimirova@neurodatalab.com