Document the DB directory structure #7

Closed

turian opened this issue Jul 3, 2020 · 8 comments

turian commented Jul 3, 2020

For people who don't want to use VoxCeleb + VoxCeleb2, it is hard to figure out what the directory structure under DB should be. Could you please document it?

Even better, if there were a simple-to-download audio dataset (e.g. from torchaudio) that the script laid out in the right way, people could immediately try your repo and see whether it works on their GPU.


turian commented Jul 3, 2020

Duplicate of #5

turian closed this as completed Jul 3, 2020
turian reopened this Jul 3, 2020

turian commented Jul 3, 2020

I am re-opening this because train.py now wants DB/wav, DB/eval_wav, and DB/wav.

Could you please explain what the directory structure should be? I have different WAV files that I want to train on.


Jungjee commented Jul 4, 2020

By default, the script is written to read all files with the "wav" extension under 'DB/VoxCeleb2/wav' for training, and under 'DB/VoxCeleb1/eval_wav/' for speaker embedding extraction in the test phase.

Put your dataset under the aforementioned directories, or pass "DB", "DB_vox2", and "dev_wav" as arguments when running the scripts :)
In the code, the PyTorch Dataset uses args.DB_vox2 + args.dev_wav as self.base_dir and reads utterances from there.
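
As a rough illustration (the argument names come from the code, but the default values below are only placeholders, not the actual defaults in train.py), the path construction and file listing look something like this:

import argparse, glob, os

# Placeholder defaults; the real defaults live in the training script.
parser = argparse.ArgumentParser()
parser.add_argument('--DB', default='DB/')
parser.add_argument('--DB_vox2', default='DB/VoxCeleb2/')
parser.add_argument('--dev_wav', default='wav/')
args = parser.parse_args()

# self.base_dir is the concatenation of DB_vox2 and dev_wav, so the
# training utterances are all *.wav files found anywhere below it.
base_dir = args.DB_vox2 + args.dev_wav
train_utts = glob.glob(os.path.join(base_dir, '**', '*.wav'), recursive=True)
print(len(train_utts), 'training wav files under', base_dir)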


turian commented Jul 4, 2020

So there are still some details that I am missing, and unfortunately I am having difficulty understanding the code.

It's more than just putting all WAV files under DB/VoxCeleb2/wav for training and DB/VoxCeleb1/eval_wav/ for evaluation; it seems you need one subdirectory per speaker? (I'm not doing speaker identification.) What goes in DB/VoxCeleb1/veri_test.txt and DB/VoxCeleb1/val_trial.txt?

Here's what I tried, just to create a rough directory structure and get the code running:

!mkdir -p DB/VoxCeleb2/wav/everyone
!mkdir -p DB/VoxCeleb1/eval_wav/everyone
!cp -R train-small/* DB/VoxCeleb2/wav/everyone
!cp -R train-small/* DB/VoxCeleb1/eval_wav/everyone
!find DB/VoxCeleb1/eval_wav/ -name \*.wav > DB/VoxCeleb1/val_trial.txt
!find DB/VoxCeleb1/eval_wav/ -name \*.wav > DB/VoxCeleb1/veri_test.txt

But that still doesn't work; it crashes with:

  File "/usr/local/lib/python3.6/dist-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening 'DB/VoxCeleb2/wav/wav/everyone/soundsofsocrates - leads - lead-e VARIATION-002-019.wav': System error.

So it's constructing the directory paths wrong somehow.
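
My guess is that the doubled wav/wav comes from that base_dir concatenation; something like this would reproduce it (these values are only my guess, not necessarily what train.py actually uses):

# Guessed values, not the actual defaults from train.py.
DB_vox2 = 'DB/VoxCeleb2/wav/'   # if this default already ends in wav/
dev_wav = 'wav/'                # ...then appending dev_wav adds a second wav/
print(DB_vox2 + dev_wav + 'everyone/example.wav')
# -> DB/VoxCeleb2/wav/wav/everyone/example.wav, matching the error above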

I set up a little Google Colab notebook.

Or if you could run:

find DB

to show your directory structure, and also share the contents of DB/VoxCeleb1/veri_test.txt and DB/VoxCeleb1/val_trial.txt, that would be great.


turian commented Jul 4, 2020

What would be most helpful would be a simple Google Colab notebook that demonstrates how to set up the data and run the code :)
RawNet2 is very cool work based on my reading of it; I am just struggling with the code because I want to try it on a different dataset, and also because VoxCeleb is hard to get and the directory structure is not well documented on their webpage or in this code :(


Jungjee commented Jul 6, 2020

I have added filetrees. You can discard val_trial.txt as it is unofficial (I just used it for model validation), and veri_test.txt is in the "trials" folder.
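
For reference, each line of the official VoxCeleb1 veri_test.txt is one verification trial of the form "label enrolment_wav test_wav" (1 = same speaker, 0 = different speakers), roughly like the following (paths shown only for illustration):

1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00008.wav
0 id10270/x6uYqmx31kE/00001.wav id10300/ize_eiCFEg0/00003.wav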


Jungjee commented Jul 6, 2020

I agree that adding documentation and making the code usable with other datasets would definitely help researchers from other domains :) However, I'm not sure I can do it right now.
I'll update the code ASAP :)


Jungjee commented Jul 6, 2020

In the meantime, by revising get_utt_list in utils.py, and the Dataset class's self.base_dir and y = self.labels[ID.split('/')[0]] (line 52 of dataloaders.py), I suppose you can train on your own dataset :)
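
For example, here is a rough sketch of those changes for a layout like DB/mydata/wav/<speaker>/<utterance>.wav (the directory name and details are placeholders, not the exact code in this repo):

import os

# Sketch of get_utt_list in utils.py: return utterance IDs relative to
# src_dir, keeping the speaker folder as the first path component.
def get_utt_list(src_dir):
    utt_list = []
    for root, _, files in os.walk(src_dir):
        for f in files:
            if f.endswith('.wav'):
                rel = os.path.relpath(os.path.join(root, f), src_dir)
                utt_list.append(rel.replace(os.sep, '/'))
    return utt_list

# In the Dataset class, self.base_dir would point at your own data
# (placeholder path below), and y = self.labels[ID.split('/')[0]] then
# maps each utterance's speaker folder name to an integer label.
base_dir = 'DB/mydata/wav/'
utts = get_utt_list(base_dir)
labels = {spk: i for i, spk in enumerate(sorted({u.split('/')[0] for u in utts}))}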

Jungjee closed this as completed Jul 20, 2020