RE:VERB is a speaker diarization system: it lets the user send or record audio of a conversation and receive timestamps of who spoke when.
RE:VERB is our final project in Magshimim, and consists of a web client and a server.
- The client can record audio and display the timestamp results graphically.
- The server exposes a simple REST API, so it can be used by many other clients.
- Vue.js - The front-end framework used
- Wavesurfer.js - A library for waveform visualization
- PyTorch - A Python deep learning library with great GPU support through CUDA
- Express.js - A Node.js web server framework
The project contains the server and the web client (a CLI client also exists for debugging purposes).
The server is located at `./server` and the web client at `./client/website`.
The model, alongside the scripts for downloading and training and the weights from our training, is located at `./server/speech_diarization/model`.
We used Docker to create a cross-platform environment to run the server on.
The server is made up of:
- a container for the web server
- a container for the diarization process
- a container for a Redis database that lets the other two communicate

Docker Compose runs and manages all three at once.
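For illustration only, the diarization container's side of that communication could be sketched like this in Python (a minimal sketch; the service name, queue name, and message format here are assumptions, not the project's actual protocol):

```python
import json
import redis

def diarize(audio_path):
    # Stand-in for the real model call; returns {speaker: [[start, end], ...]}.
    return {"0": [[40, 120]], "1": [[1260, 1660]]}

# The service name "redis" and the key names below are assumptions.
r = redis.Redis(host="redis", port=6379)

# Worker loop: block until the web server pushes a job, run diarization,
# then store the result under a job-specific key for the server to fetch.
while True:
    _, raw = r.blpop("diarization:jobs")
    job = json.loads(raw)
    result = diarize(job["audio_path"])
    r.set("diarization:result:" + job["id"], json.dumps(result))
```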
Docker and docker-compose need to be installed in order to build and run the server; everything else is taken care of automatically.
```bash
cd server
docker-compose up
```
This will run all three containers and install their dependencies.
If you make a change in the server, rebuild with:

```bash
docker-compose up --build
```
Sending an HTTP POST request with an audio file to the server at `http://localhost:1337/upload` (the default port and URL) returns a JSON object with the timestamps in milliseconds:

```json
{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
```
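For example, uploading a file from Python might look like this (a minimal sketch using the `requests` library; the form field name `audio` is an assumption, so check the server's upload handler for the actual one):

```python
import requests

# The field name "audio" is an assumption for this sketch.
with open("conversation.wav", "rb") as f:
    response = requests.post("http://localhost:1337/upload", files={"audio": f})

# Keys are speaker labels; values are [start, end] pairs in milliseconds.
for speaker, segments in response.json().items():
    for start, end in segments:
        print(f"Speaker {speaker}: {start} ms - {end} ms")
```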
The client needs npm or yarn to be installed; more info about the client can be found here.
To install:

```bash
cd client/website
npm install
```
Afterwards, you can run a development server with:

```bash
npm run serve
```
- Ofir Naccache - ofirnaccache
- Matan Yesharim - Tralfazz
This project is licensed under the MIT License - see the LICENSE.md file for details
- The diarization algorithm is an implementation of this research; we also used their implementation of the spectral clustering (see the sketch after this list).
- We took inspiration and some code from Harry Volek's implementation of a different but similar problem, speaker verification.
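For orientation, the clustering step of such a pipeline can be sketched roughly like this (a minimal illustration with scikit-learn, not the project's actual code; the embedding shape, number of speakers, and parameters are all assumptions):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy stand-in for per-segment speaker embeddings; in the real
# pipeline these would come from the trained model.
embeddings = np.random.rand(20, 256)

# Group the segment embeddings by speaker; n_clusters=2 and the
# affinity choice are assumptions for this sketch.
labels = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",
).fit_predict(embeddings)

print(labels)  # one cluster index (speaker) per segment
```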
- We had problems training on the AMI corpus, so we used the TIMIT corpus for the provided model.
- We plan to train again on the VoxCeleb 1 and 2 datasets, which contain a lot more data, and hopefully improve the feature extraction.
- We want to add integration with a speech-to-text service and transcribe the created segments.