

ABnet3 is a representation learning package that uses side information. It is a system for subword modeling for the Zero Resource Speech Challenge.


ABnet3 builds representations for speech frames based on side information. The package is composed of several modules.


Installation of the package

Using conda

To install the ABnet3 package, you can use Anaconda, and either create a conda environment:

conda env create --name abnet3 python=3.6 -f environment.yml

or update a Python 3 conda environment you already have:

conda env update -f environment.yml

To install with GPU support (replace cuda75 with your version of CUDA):

conda install pytorch=0.2 cuda75 -c pytorch

Using pip

  • install the version 0.2.0 of pytorch for your hardware (

  • install the pip packages: pip install -r requirements.txt

Run abnet3 installation

Once all the necessary packages are installed, simply launch:

python setup.py build && python setup.py install

If you want to work on ABnet3 and develop your own modules, instead of:

python setup.py install

you can launch:

python setup.py develop
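After installing, it can be useful to confirm that the dependencies are visible to the interpreter you intend to use. The snippet below is a minimal sketch that relies only on the standard library. The import names `torch` and `tensorboardX` are the usual ones for those packages, while `abnet3` as an import name is an assumption based on the package name and may differ.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    report = {}
    for name in names:
        # find_spec returns None when a top-level module is not installed;
        # it locates the module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


# "abnet3" is an assumed import name; adjust it if the package uses another one.
for name, found in check_packages(["torch", "tensorboardX", "abnet3"]).items():
    print(name, "->", "found" if found else "NOT FOUND")
```

Run it with the Python interpreter of the conda environment you installed into, so the check reflects that environment rather than the system Python.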

TensorBoard visualization

The package tensorboardX needs to be installed to train the model: pip install tensorboardX.

The package saves the train / dev loss during training. To visualize them:

  • Install tensorboard (conda install tensorflow tensorflow-tensorboard)

  • run tensorboard --logdir path/to/logdir. The default logdir is ./run in the current directory.
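For readers unfamiliar with tensorboardX, the following sketch shows the general pattern for writing per-epoch train and dev losses that TensorBoard can then display. It is not taken from the ABnet3 code: the tag names `loss/train` and `loss/dev`, the helper names, and the loss values are all hypothetical and only illustrate the `SummaryWriter.add_scalar` call.

```python
def log_losses(writer, train_losses, dev_losses):
    """Write one train and one dev loss value per epoch to `writer`.

    `writer` is any object with an add_scalar(tag, value, step) method,
    such as tensorboardX.SummaryWriter.
    """
    for epoch, (train, dev) in enumerate(zip(train_losses, dev_losses)):
        # Hypothetical tag names; TensorBoard groups tags by their prefix.
        writer.add_scalar("loss/train", train, epoch)
        writer.add_scalar("loss/dev", dev, epoch)


def write_example_run(logdir="./run"):
    """Write made-up loss values to `logdir` (requires tensorboardX)."""
    # Imported lazily so that log_losses stays usable without tensorboardX.
    from tensorboardX import SummaryWriter

    writer = SummaryWriter(logdir)
    # Illustrative numbers only, not real training results.
    log_losses(writer, train_losses=[0.9, 0.6, 0.4], dev_losses=[1.0, 0.7, 0.6])
    writer.close()
```

Calling `write_example_run()` and then running tensorboard --logdir ./run should show two scalar curves under a "loss" group.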


You can see examples for running the grid search and replicating our results in the repository.

The CLI documentation is here


The package comes with a unit test suite. To run it, first install pytest in your Python environment, then run the tests:

pip install pytest
pytest test/


.. [1] Riad, R., Dancette, C., Karadayi, J., Zeghidour, N., Schatz, T., Dupoux, E.
       *Sampling strategies in Siamese Networks for unsupervised speech representation learning.*
       In Nineteenth Annual Conference of the International Speech Communication Association

.. [2] Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., & Dupoux, E.
       *A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling.*
       In Sixteenth Annual Conference of the International Speech Communication Association

.. [3] Zeghidour, N., Synnaeve, G., Usunier, N. & Dupoux, E.
       *Joint Learning of Speaker and Phonetic Similarities with Siamese Networks.*
       In: INTERSPEECH-2016, (pp 1295-1299)


Part of the code is inspired by the previous Theano version of ABnet and by the PyTorch examples.