
WARNING: The official repository was moved to https://github.com/bepierre/SpeechVGG!

To make sure you are using the most recent version, please follow the link above.

While most of what follows still holds, the code, data and links may be outdated. Apologies for the inconvenience.

Mikolaj Kegler, 4th May 2020.


SpeechVGG: A deep feature extractor for speech processing

For some context... Here we present the code underlying SpeechVGG (sVGG), a simple yet efficient feature extractor for training deep learning frameworks for speech processing through deep feature losses. In Kegler et al. (2019) we first applied sVGG to improve the performance of a network for recovering missing parts of speech spectrograms (i.e. speech inpainting). In the follow-up paper (Beckmann et al., 2019) we present a systematic analysis of how the sVGG parameters influence the main framework.

To summarize quickly: we showed how a VGG-inspired speech-to-word classifier, once trained, can be used to extract high-level deep feature losses.

The trained network can then be used to train another network on a different task, reusing the representations learned during the word classification task.
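To illustrate the idea, here is a minimal sketch of a deep feature loss built around a trained, frozen sVGG model in Keras; the function and layer names are placeholders chosen for the example, not names taken from this repository:

import keras.backend as K
from keras.models import Model

def build_deep_feature_loss(svgg, layer_names):
    # Freeze the classifier so it only provides fixed feature maps.
    svgg.trainable = False
    extractor = Model(inputs=svgg.input,
                      outputs=[svgg.get_layer(name).output for name in layer_names])

    def loss(y_true, y_pred):
        feats_true = extractor(y_true)
        feats_pred = extractor(y_pred)
        if not isinstance(feats_true, list):
            feats_true, feats_pred = [feats_true], [feats_pred]
        # Sum of mean absolute differences between the activations of each selected layer.
        return sum(K.mean(K.abs(ft - fp))
                   for ft, fp in zip(feats_true, feats_pred))

    return loss

# Hypothetical usage: the layer names depend on how the sVGG model is defined.
# dfl = build_deep_feature_loss(svgg_model, ['block3_conv2', 'block4_conv2'])
# inpainting_model.compile(optimizer='adam', loss=dfl)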

Now we are going to walk you through how to use it!

Train your own...

Requirements

Packages:

  • Python 3.6.8
  • numpy 1.16.4
  • h5py 2.8.0
  • SoundFile 0.10.2
  • SciPy 1.2.1
  • Tensorflow 1.13.1
  • Keras 2.2.4
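
With Python 3.6.8 available, one way to install the listed package versions is with pip (just an example command, not an official requirements file):

pip install numpy==1.16.4 h5py==2.8.0 SoundFile==0.10.2 scipy==1.2.1 tensorflow==1.13.1 keras==2.2.4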

Data:

You should create a folder 'LibriSpeech' with the following folders:

LibriSpeech
    |_ word_labels
    |_ split
        |_ test-clean
        |_ test-other
        |_ dev-clean
        |_ dev-other
        |_ train-clean-100
        |_ train-clean-360
        |_ train-other-500

The word_labels folder should contain the aligned word labels; this folder can be downloaded here.

The split folder should contain the extracted LibriSpeech datasets, which can be downloaded here.

Generate dataset

First, preprocess the data (here, LibriSpeech for example):

python preprocess.py --data ./LibriSpeech --dest_path ./LibriSpeechWords

Then, obtain the mean and standard deviation of the desired dataset (for normalization):

python compute_dataset_props.py --data ./LibriSpeechWords/train-clean-100/ --output_folder ./

The parameters will be saved to a dataset_props_log.h5 file. Here we attach a version obtained from the training part of the LibriSpeech data.
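
For illustration, the saved statistics can then be loaded with h5py and used to standardize log-magnitude spectrograms; the key names 'mean' and 'std' below are assumptions about the file layout, so check compute_dataset_props.py for the actual ones:

import h5py
import numpy as np

# Load the precomputed dataset statistics (the key names are an assumption).
with h5py.File('dataset_props_log.h5', 'r') as f:
    mean = np.array(f['mean'])
    std = np.array(f['std'])

def normalize(log_spectrogram):
    # Standardize a log-magnitude spectrogram with the dataset statistics.
    return (log_spectrogram - mean) / std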

Train

Now you can train the model using the training script:

python train.py --name my_model_name --train ./LibriSpeechWords/train-clean-100/ --test ./LibriSpeechWords/test-clean/ --weight_path ./results/

Finally, the weights of the model will be saved in the specified directory, here './results/'. Subsequently, you can use the trained model, for example, to obtain deep feature losses (as we did in Kegler et al., 2019 and Beckmann et al., 2019).
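
As a rough sketch of that last step, the trained model can be reloaded and used as a feature extractor; this assumes train.py saves a complete Keras model file, and the file name and layer name below are placeholders:

import numpy as np
from keras.models import load_model, Model

# Reload the trained classifier (the exact file name is an assumption).
svgg = load_model('./results/my_model_name.h5')

# Expose an intermediate layer as the deep feature representation
# (the layer name 'block4_conv2' is a placeholder).
feature_extractor = Model(inputs=svgg.input,
                          outputs=svgg.get_layer('block4_conv2').output)

# Dummy batch of normalized log-spectrogram patches matching the model input shape.
dummy_batch = np.zeros((1,) + svgg.input_shape[1:])
deep_features = feature_extractor.predict(dummy_batch)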

Or... use our pre-trained models!

Available here, for all the configurations considered in (Beckmann et al., 2019).

Links:

Original papers:

  1. Kegler, Mikolaj, Pierre Beckmann, and Milos Cernak. "Deep speech inpainting of time-frequency masks." arXiv preprint arXiv:1910.09058 (2019)
  2. Beckmann, Pierre, Mikolaj Kegler, Hugues Saltini, and Milos Cernak. "Speech-VGG: A deep feature extractor for speech processing." arXiv preprint arXiv:1910.09909 (2019)
