
DAMP-VSEP dataset #222

Closed
wants to merge 20 commits into from
Conversation

groadabike (Contributor)

This is my current version of the DAMP-VSEP dataset.
It relies on pre-constructed files with the details of the train, valid and test sets.

If you run dampvsep/test_dataloader/test_dataloader.py, it will construct and save the metadata on the first run.

@mpariente mpariente left a comment
Overall, it looks great, thanks!
I have this concern about on-the-fly resampling that I'd like to discuss.
Also, instead of passing the root_dir, I would probably use sed to prepend it to all the paths in the csv files. But we might get to that when working on the recipe!

import json
import librosa
import warnings
warnings.filterwarnings('ignore')
mpariente (Collaborator)
What does this filter?
A comment would be welcome.

groadabike (Contributor, Author)

> What does this filter?

By default, librosa uses soundfile, but if it can't open the file with soundfile it falls back to audioread.
Every time it falls back to audioread it shows a warning.
librosa/librosa#1015

> A comment would be welcome.

Ok, I will add a comment pointing to the librosa issue.
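
For illustration, the promised comment could look like this (a sketch, mirroring the two lines from the PR):

import warnings

# librosa tries soundfile first and falls back to audioread for formats
# soundfile cannot decode (e.g. M4A), emitting a UserWarning each time.
# Silence those warnings; see librosa/librosa#1015.
warnings.filterwarnings('ignore')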

Comment on lines 168 to 174
{'mean': f"{mix.mean():.16f}",
'std': f"{mix.std():.16f}",
'scaler': f"{amplitude_scaler:.16f}",
'vocal': sample['vocal_path'],
'background': sample['background_path'],
'vocal_duration': vocal_dur,
'background_duration': back_dur})
mpariente (Collaborator)

Better to declare the dict first and then return it; it's more readable.
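
A sketch of the suggested refactor, reusing the names from the snippet above (the wrapping function name is hypothetical):

def _build_metadata_entry(mix, amplitude_scaler, sample, vocal_dur, back_dur):
    # Declare the metadata dict first, then return it: easier to read and extend.
    metadata = {
        'mean': f"{mix.mean():.16f}",
        'std': f"{mix.std():.16f}",
        'scaler': f"{amplitude_scaler:.16f}",
        'vocal': sample['vocal_path'],
        'background': sample['background_path'],
        'vocal_duration': vocal_dur,
        'background_duration': back_dur,
    }
    return metadata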

@staticmethod
def _build_metadata(inputs):
    sample, root, sample_rate = inputs
    back, _ = librosa.load(root / sample['background_path'],
mpariente (Collaborator)

We usually use soundfile for loading, but it doesn't resample.
Is resampling needed? If yes, isn't it expensive to resample on the fly when it could be fixed beforehand?

groadabike (Contributor, Author), Aug 21, 2020

> We usually use soundfile for loading, but it doesn't resample.

This is a very good question. First, you need to know that the dataset provides three audio files: mixtures.m4a, background.m4a and vocal.ogg.
Soundfile can't handle M4A files but can handle OGG. In contrast, audioread can handle both.
Librosa tries soundfile first and, on error, falls back to audioread. It also allows resampling and converting to mono.

> Is resampling needed? If yes, isn't it expensive to resample on the fly when it could be fixed beforehand?

Fixing the sample rate would mean creating a copy of the dataset. For my subset it is not a big deal; in fact, I have a recipe that makes a 16 kHz WAV copy of the corpus in stage 0. But personally, I don't like to keep several copies of my datasets.
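
For illustration, a single librosa.load call covers the decode, the resample and the mono downmix (the filename here is hypothetical):

import librosa

# Decodes via soundfile (or audioread for M4A), resamples to 16 kHz, downmixes to mono.
audio, sr = librosa.load('background.m4a', sr=16000, mono=True)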

mpariente (Collaborator)

Ok, understood. Out of curiosity, what about torchaudio?
I would also resample beforehand for simplicity, but this is a choice.

If we merge this, we'll need librosa as a dependency. I guess we can't avoid that ^^
Let's add the dependency then; I'll let you do it.
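
For reference, the torchaudio route would look roughly like this. Note that torchaudio.load does not resample either, and whether it can decode M4A depends on the installed backend, so this is a sketch rather than a drop-in replacement:

import torchaudio

# torchaudio returns the file's native sample rate; resampling is a separate step.
waveform, orig_sr = torchaudio.load('vocal.ogg')  # M4A support depends on the backend
resample = torchaudio.transforms.Resample(orig_freq=orig_sr, new_freq=16000)
waveform_16k = resample(waveform)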

groadabike (Contributor, Author)

Librosa is also used in AVSpeech, and they have it as a requirement in the recipe.
I could do the same.


def __init__(self, root_path, task, split, samples_per_track=1,
             random_segments=False, sample_rate=16000,
             segment=None, silence_in_segment=None, num_workers=None, norm=None):
mpariente (Collaborator)

Would need docs for the split and norm arguments.

groadabike (Contributor, Author)

Oh, I missed these
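
A hypothetical docstring sketch for those two arguments (the accepted values are assumptions based on this thread, not confirmed; the class name is a placeholder):

class DAMPVSEPDataset:
    """DAMP-VSEP dataset (docstring sketch only).

    Args:
        split (str): Which pre-constructed metadata list to load,
            e.g. ``'train'``, ``'valid'`` or ``'test'``.
        norm (str, optional): Normalisation strategy for the mixture
            (the metadata stores ``mean``/``std``/``scaler`` values);
            ``None`` disables normalisation.
    """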


dataset_name = 'DAMP-VSEP'

def __init__(self, root_path, task, split, samples_per_track=1,
mpariente (Collaborator)

How about having a default for task and split?
Positional args are always confusing in the long term.

groadabike (Contributor, Author)

Ok, will add:
task='enh_both'
split=None

v: ndarray, vocal.
b: ndarray, background.
snr: float, SNR. Default=0
Outputs:
mpariente (Collaborator)

Returns

groadabike (Contributor, Author)

True


parser.add_argument(
    '--root', type=str, help='root path of dataset',
    default='/media/gerardo/TOSHIBA/DataSets/DAMP/DAMP-VSEP'
mpariente (Collaborator)

We try to remove all user-specific absolute paths.
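
One common fix (a sketch, not the PR's final code) is to drop the default and require the flag:

import argparse

parser = argparse.ArgumentParser()
# No user-specific default: each user points at their own copy of the data.
parser.add_argument('--root', type=str, required=True,
                    help='Root path of the DAMP-VSEP dataset')
args = parser.parse_args()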

groadabike (Contributor, Author)

> Overall, it looks great, thanks!
> I have this concern about on-the-fly resampling that I'd like to discuss.

The problem here is that the audio files have different sample rates (48000 Hz, 44100 Hz, 22050 Hz) and in my task I am working with 16000 Hz.
I'm resampling on the fly to avoid creating a copy of the data and to save space.

> Also, instead of passing the root_dir, I would probably use sed to prepend it to all the paths in the csv files. But we might get to that when working on the recipe!

Ok.

@mpariente mpariente left a comment

I think this is ready to merge. I'll go through the changes again after the last fixes.
Thanks again!

@@ -150,7 +156,7 @@ def get_tracks(self):

@staticmethod
def _build_metadata(inputs):
-        sample, root, sample_rate = inputs
+        sample, root, sample_rate, snr = inputs
mpariente (Collaborator)

It looks weird to unpack like this, but it's a private method, so ok, why not.
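
For context, this packed-tuple signature is the usual shape for a function mapped over a multiprocessing pool, which passes exactly one argument per item. A sketch of that pattern (the pool usage and the values below are assumptions, not code from the PR):

from multiprocessing import Pool

def _build_metadata(inputs):
    # Pool.map passes a single argument per item, hence the packed tuple.
    sample, root, sample_rate, snr = inputs
    return sample['id'], sample_rate  # placeholder for the real metadata work

if __name__ == '__main__':
    jobs = [({'id': i}, '/data/DAMP-VSEP', 16000, 0.0) for i in range(4)]
    with Pool(2) as pool:
        print(pool.map(_build_metadata, jobs))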

"""

dataset_name = 'DAMP-VSEP'

-    def __init__(self, root_path, task, split, samples_per_track=1,
+    def __init__(self, root_path, task='enh_both', split=None, samples_per_track=1,
mpariente (Collaborator)

What happens if split is None then?
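
One way to make that explicit (a hypothetical guard, not from the PR):

VALID_SPLITS = ('train', 'valid', 'test')

def _check_split(split):
    # Fail early instead of silently loading nothing when split is None.
    if split not in VALID_SPLITS:
        raise ValueError(f"`split` must be one of {VALID_SPLITS}, got {split!r}")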


@mpariente mpariente added the "hackathon" (PyTorch Summer Hackathon contributions) label on Aug 23, 2020
mpariente (Collaborator)

Hey @groadabike, could we get this in?
Thanks! =D

groadabike (Contributor, Author)

> Hey @groadabike, could we get this in?
> Thanks! =D

Hey @mpariente, honestly, I made some modifications and I am running some experiments.
I need to recheck some things and commit those changes before we can merge.
I hope to finish before Wednesday next week.

groadabike (Contributor, Author)

Hi @mpariente, I didn't have much time to do all the modifications I wanted.
I think it would be better if I close this PR and create a new one that includes the data loader and the ConvTasNet recipe.
Is that ok?

mpariente (Collaborator)

Thanks for getting back! Yes, that's completely fine.

groadabike (Contributor, Author)

Closing this PR. It will be resubmitted with updated code and the recipe.
