
DAMP-VSEP dataset #222

Closed
wants to merge 20 commits into from
Conversation

groadabike (Contributor)

This is my current version of the DAMP-VSEP dataset.
It relies on pre-constructed files with the details of the train, valid and test sets.

If you run dampvsep/test_dataloader/test_dataloader.py, it will construct and save the metadata on the first run.

@mpariente mpariente left a comment
Overall, it looks great, thanks!
I have this concern about on-the-fly resampling that I'd like to discuss.
Also, instead of passing the root_dir, I would probably use sed to prepend it to all the paths in the csv files. But we might get to that when working on the recipe!

import json
import librosa
import warnings
warnings.filterwarnings('ignore')
mpariente (Collaborator)
What does this filter?
A comment would be welcome.

groadabike (Contributor, Author)

> What does this filter?

By default, librosa uses soundfile, but if it can't open the file with soundfile it falls back to audioread.
Every time it falls back to audioread it shows a warning.
librosa/librosa#1015

> A comment would be welcome.

Ok, I will add a comment pointing to the librosa issue.
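
For illustration, the promised comment could look like this (a sketch, mirroring the two lines from the PR):

import warnings

# librosa tries soundfile first and falls back to audioread for formats
# soundfile cannot decode (e.g. M4A), emitting a UserWarning each time.
# Silence those warnings; see librosa/librosa#1015.
warnings.filterwarnings('ignore')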

Comment on lines 168 to 174
{'mean': f"{mix.mean():.16f}",
'std': f"{mix.std():.16f}",
'scaler': f"{amplitude_scaler:.16f}",
'vocal': sample['vocal_path'],
'background': sample['background_path'],
'vocal_duration': vocal_dur,
'background_duration': back_dur})
mpariente (Collaborator)

Better to declare the dict first and then return it; it's more readable.
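
A sketch of the suggested refactor, reusing the names from the snippet above (the wrapping function name is hypothetical):

def _build_metadata_entry(mix, amplitude_scaler, sample, vocal_dur, back_dur):
    # Declare the metadata dict first, then return it: easier to read and extend.
    metadata = {
        'mean': f"{mix.mean():.16f}",
        'std': f"{mix.std():.16f}",
        'scaler': f"{amplitude_scaler:.16f}",
        'vocal': sample['vocal_path'],
        'background': sample['background_path'],
        'vocal_duration': vocal_dur,
        'background_duration': back_dur,
    }
    return metadata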

@staticmethod
def _build_metadata(inputs):
    sample, root, sample_rate = inputs
    back, _ = librosa.load(root / sample['background_path'],
mpariente (Collaborator)

We usually use soundfile for loading, but it doesn't resample.
Is resampling needed? If yes, isn't it expensive to resample on the fly when it could be fixed beforehand?

groadabike (Contributor, Author), Aug 21, 2020

> We usually use soundfile for loading, but it doesn't resample.

This is a very good question. First, you need to know that the dataset provides three audio files: mixtures.m4a, background.m4a and vocal.ogg.
Soundfile can't handle M4A files but can handle OGG. In contrast, audioread can handle both.
Librosa tries soundfile first and, on error, falls back to audioread. It also allows resampling and converting to mono.

> Is resampling needed? If yes, isn't it expensive to resample on the fly when it could be fixed beforehand?

Fixing the sample rate would mean creating a copy of the dataset. For my subset it is not a big deal; in fact, I have a recipe that makes a 16 kHz WAV copy of the corpus in stage 0. But personally, I don't like to keep several copies of my datasets.
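
For illustration, a single librosa.load call covers the decode, the resample and the mono downmix (the filename here is hypothetical):

import librosa

# Decodes via soundfile (or audioread for M4A), resamples to 16 kHz, downmixes to mono.
audio, sr = librosa.load('background.m4a', sr=16000, mono=True)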

mpariente (Collaborator)

Ok, understood. Out of curiosity, what about torchaudio?
I would also resample beforehand for simplicity, but this is a choice.

If we merge this, we'll need librosa as a dependency. I guess we can't avoid that ^^
Let's add the dependency then; I'll let you do it.
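
For reference, the torchaudio route would look roughly like this. Note that torchaudio.load does not resample either, and whether it can decode M4A depends on the installed backend, so this is a sketch rather than a drop-in replacement:

import torchaudio

# torchaudio returns the file's native sample rate; resampling is a separate step.
waveform, orig_sr = torchaudio.load('vocal.ogg')  # M4A support depends on the backend
resample = torchaudio.transforms.Resample(orig_freq=orig_sr, new_freq=16000)
waveform_16k = resample(waveform)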

groadabike (Contributor, Author)

Librosa is also used in AVSpeech, and they have it as a requirement in the recipe.
I could do the same.


def __init__(self, root_path, task, split, samples_per_track=1,
             random_segments=False, sample_rate=16000,
             segment=None, silence_in_segment=None, num_workers=None, norm=None):
mpariente (Collaborator)

Would need docs for the split and norm arguments.

groadabike (Contributor, Author)

Oh, I missed these
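
A hypothetical docstring sketch for those two arguments (the accepted values are assumptions based on this thread, not confirmed; the class name is a placeholder):

class DAMPVSEPDataset:
    """DAMP-VSEP dataset (docstring sketch only).

    Args:
        split (str): Which pre-constructed metadata list to load,
            e.g. ``'train'``, ``'valid'`` or ``'test'``.
        norm (str, optional): Normalisation strategy for the mixture
            (the metadata stores ``mean``/``std``/``scaler`` values);
            ``None`` disables normalisation.
    """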


dataset_name = 'DAMP-VSEP'

def __init__(self, root_path, task, split, samples_per_track=1,
mpariente (Collaborator)

How about having a default for task and split?
Positional args are always confusing in the long term.

groadabike (Contributor, Author)

Ok, will add:
task='enh_both'
split=None

v: ndarray, vocal.
b: ndarray, background.
snr: float, SNR. Default=0
Outputs:
mpariente (Collaborator)

Returns

groadabike (Contributor, Author)

True


parser.add_argument(
    '--root', type=str, help='root path of dataset',
    default='/media/gerardo/TOSHIBA/DataSets/DAMP/DAMP-VSEP'
mpariente (Collaborator)

We try to remove all user-specific absolute paths.
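
One common fix (a sketch, not the PR's final code) is to drop the default and require the flag:

import argparse

parser = argparse.ArgumentParser()
# No user-specific default: each user points at their own copy of the data.
parser.add_argument('--root', type=str, required=True,
                    help='Root path of the DAMP-VSEP dataset')
args = parser.parse_args()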

groadabike (Contributor, Author)

> Overall, it looks great, thanks!
> I have this concern about on-the-fly resampling that I'd like to discuss.

The problem here is that the audio files have different sample rates (48000 Hz, 44100 Hz, 22050 Hz) and in my task I am working with 16000 Hz.
I'm resampling on the fly to avoid creating a copy of the data and to save space.

> Also, instead of passing the root_dir, I would probably use sed to prepend it to all the paths in the csv files. But we might get to that when working on the recipe!

Ok.

@mpariente mpariente left a comment

I think this is ready to merge. I'll go through the changes again after the last fixes.
Thanks again!

@@ -150,7 +156,7 @@ def get_tracks(self):

@staticmethod
def _build_metadata(inputs):
-        sample, root, sample_rate = inputs
+        sample, root, sample_rate, snr = inputs
mpariente (Collaborator)

It looks weird to unpack like this, but it's a private method, so ok, why not.
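
For context, this packed-tuple signature is the usual shape for a function mapped over a multiprocessing pool, which passes exactly one argument per item. A sketch of that pattern (the pool usage and the values below are assumptions, not code from the PR):

from multiprocessing import Pool

def _build_metadata(inputs):
    # Pool.map passes a single argument per item, hence the packed tuple.
    sample, root, sample_rate, snr = inputs
    return sample['id'], sample_rate  # placeholder for the real metadata work

if __name__ == '__main__':
    jobs = [({'id': i}, '/data/DAMP-VSEP', 16000, 0.0) for i in range(4)]
    with Pool(2) as pool:
        print(pool.map(_build_metadata, jobs))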

"""

dataset_name = 'DAMP-VSEP'

-    def __init__(self, root_path, task, split, samples_per_track=1,
+    def __init__(self, root_path, task='enh_both', split=None, samples_per_track=1,
mpariente (Collaborator)

What happens if split is None then?
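
One way to make that explicit (a hypothetical guard, not from the PR):

VALID_SPLITS = ('train', 'valid', 'test')

def _check_split(split):
    # Fail early instead of silently loading nothing when split is None.
    if split not in VALID_SPLITS:
        raise ValueError(f"`split` must be one of {VALID_SPLITS}, got {split!r}")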


@mpariente mpariente added the "hackathon" (PyTorch Summer Hackathon contributions) label on Aug 23, 2020
mpariente (Collaborator)

Hey @groadabike, could we get this in?
Thanks! =D

groadabike (Contributor, Author)

> Hey @groadabike, could we get this in?
> Thanks! =D

Hey @mpariente, honestly, I made some modifications and I am running some experiments.
I need to recheck some things and commit those changes before we can merge.
I hope to finish before Wednesday next week.

groadabike (Contributor, Author)

Hi @mpariente, I didn't have much time to do all the modifications I wanted.
I think it would be better if I close this PR and create a new one that includes the data loader and the ConvTasNet recipe.
Is that ok?

mpariente (Collaborator)

Thanks for getting back! Yes, that's completely fine.

groadabike (Contributor, Author)

Closing this PR. It will be resubmitted with updated code and the recipe.
