[Discussion] Confusion about different sample rates #503

expectopatronum · 2020-10-09T09:29:12Z

Hi!
Until quite recently I assumed that the models were trained using 44100 Hz, since this is also the parameter in the config. I noticed that for some of the tracks I used, there is actually a difference between the mix and the summed sources. I read all the issues related to this (#2, #15, #106) and tried to follow what's written there and also the section in the FAQ but as you can read in my comment the outputs still don't sum up to the mix.

I have a few questions:

1. First of all, why use 44100 in the config (for loading/training?) when the model is only trained up to 11 kHz?
2. In the FAQ you mention different configurations, but separator = Separator('spleeter:5stems-16khz') fails with spleeter.SpleeterError: No embedded configuration 5stems-16khz found. Should this work, or does it only work with the command line interface?
3. Could you add an example of how to use spleeter in Python? It took me a while to figure out how to load the data to work with separator.separate(...) (and maybe it's still wrong, see below).
4. Is the following the correct way of doing it? (Apparently not, but I don't know what else to change.)

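import librosa
import numpy as np
from spleeter.separator import Separator

separator = Separator('/share/home/verena/experiments/spleeter/5stems_average.json')

def separate_and_report(audio_path):
    # librosa returns (channels, samples); swap to (samples, channels) for Spleeter.
    y, _ = librosa.load(audio_path, mono=False, sr=44100)
    y = np.swapaxes(y, 0, 1)
    prediction = separator.separate(y)
    # Sum all separated stems; list(...) is needed because np.stack rejects a dict view.
    summed_thing = np.stack(list(prediction.values())).sum(axis=0)
    return y - summed_thing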

In 5stems_average.json I set "mask_extension":"average", and tried "F" with 1536 and 1024.

For some input audios the diff is 0 but for most tracks it is not.

Thanks a lot for the great package and pretrained models! I hope there is an easy fix for my problem.

Thanks and best regards
Verena


romi1502 · 2020-10-09T15:19:52Z

Hi @expectopatronum
Thanks for your questions. I'll try to answer them:

First of all, why use 44100 in the config (for loading/training?) when the model is only trained up to 11 kHz?

These are two different things. The spectrogram model is trained up to 11 kHz, meaning that input spectrograms are cut at 11 kHz. Separation can still be performed up to 22 kHz, either by increasing the value of F at test time, up to 2048 (this is exactly what spleeter:stems-16kHz does), or by extending the mask with averaging (using the "average" option for "mask_extension", as you mentioned). Note that the models were trained at 44.1 kHz, and changing the sampling rate at test time may lead to lower separation performance.
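The relation between F and the frequency cutoff can be sketched with a bit of arithmetic. This is only an illustration: it assumes an STFT frame length of 4096 samples at 44.1 kHz (an assumption about the configuration, not something stated in this thread), and `cutoff_hz` is a hypothetical helper name.

```python
def cutoff_hz(n_bins, sample_rate=44100, frame_length=4096):
    """Upper frequency (Hz) covered by the first n_bins STFT bins.

    frame_length=4096 is an assumed value; check it against your own config.
    """
    # Each STFT bin spans sample_rate / frame_length Hz.
    return n_bins * sample_rate / frame_length


# Values of F discussed in this thread.
for n_bins in (1024, 1536, 2048):
    print(n_bins, cutoff_hz(n_bins))
```

Under these assumptions, F = 1024 corresponds to roughly 11 kHz, F = 1536 to roughly 16.5 kHz, and F = 2048 to the full 22.05 kHz band.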

In the FAQs you mention different configurations, but separator = Separator('spleeter:5stems-16khz') fails with spleeter.SpleeterError: No embedded configuration 5stems-16khz found. Should this work or does it only work with the command line interface?

separator = Separator('spleeter:5stems-16khz') should work unless you have a quite old version of Spleeter. I've just tested it with the most recent version and it works on my side.

Could you add an example of how to use spleeter in Python? It took me a while to figure out how to load the data to work with separator.separate(...) (and maybe it's still wrong - see below)

To perform separation in Python, you can either work at the file level using separator.separate_to_file (which behaves much like the command line) or process waveforms directly as numpy arrays with separator.separate. The wiki provides sample code for separator.separate usage.
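The two options above might look roughly like the sketch below. It is not the wiki's sample code: the helper names are hypothetical, loading with librosa follows the snippet in the question, and duplicating a mono signal to two channels is an assumption about what the model expects. The default descriptor 'spleeter:5stems' is also just an example.

```python
import numpy as np


def to_samples_by_channels(y):
    """Convert librosa's (channels, samples) layout to (samples, channels)."""
    y = np.asarray(y, dtype=np.float32)
    if y.ndim == 1:
        # Mono input: duplicate the channel (assumption: stereo is expected).
        y = np.stack([y, y], axis=0)
    return y.T


def separate_array(audio_path, descriptor="spleeter:5stems"):
    """Waveform-level separation; returns a dict of stem name -> array."""
    # Imported lazily so the helper above works without these packages.
    import librosa
    from spleeter.separator import Separator

    y, _ = librosa.load(audio_path, mono=False, sr=44100)
    separator = Separator(descriptor)
    return separator.separate(to_samples_by_channels(y))


def separate_files(audio_path, output_dir, descriptor="spleeter:5stems"):
    """File-level separation, similar to the command line."""
    from spleeter.separator import Separator

    Separator(descriptor).separate_to_file(audio_path, output_dir)
```

Keeping the layout conversion in its own function makes it easy to check the array shape before handing anything to the separator.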

Is the following the correct way of doing it? (Apparently not, but I don't know what else to change)

See 3.

Using "mask_extension":"average", you should get separated signals that sum up to the mix. For older versions of spleeter (i.e. <=1.5.3), you may see differences at the very beginning or the very end of the signal due to the way STFT/iSTFT were managed, but that should be fixed as of 1.5.4. Let us know if you still have trouble with recent versions.
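One way to check this is to compare the mix with the summed stems using a small tolerance instead of expecting an exact zero, since floating-point processing rarely reproduces a signal bit for bit. The following is a minimal sketch with hypothetical helper names; the tolerance of 1e-4 is an arbitrary illustrative value, not one recommended by Spleeter.

```python
import numpy as np


def reconstruction_residual(mix, stems):
    """Largest absolute difference between the mix and the summed stems."""
    # list(...) is needed because np.stack does not accept a dict view.
    summed = np.sum(np.stack(list(stems.values())), axis=0)
    return float(np.max(np.abs(mix - summed)))


def sums_to_mix(mix, stems, atol=1e-4):
    """True if the stems add up to the mix within an absolute tolerance."""
    # atol is an illustrative threshold; pick one that suits your data.
    return reconstruction_residual(mix, stems) <= atol


# Synthetic example: two "stems" whose sum is the mix by construction.
rng = np.random.default_rng(0)
a = rng.standard_normal((1000, 2)).astype(np.float32)
b = rng.standard_normal((1000, 2)).astype(np.float32)
print(sums_to_mix(a + b, {"a": a, "b": b}))
```

Looking at where the residual is largest (for example, only in the first or last frames) can also help tell boundary effects apart from a genuine mismatch.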

expectopatronum · 2020-10-15T04:49:49Z

Thanks for your reply! I just checked: I am using version 1.5.4, which I installed not too long ago using pip, so I assumed it should be recent enough (regarding question 2 and your remark about 'mask_extension'). I have now switched to 2.0, which I wasn't aware of. When was it released? The release page is quite outdated.

OK, I'll have to try again, but first I need to get 2 to work, which still results in an error:

After pip install spleeter and checking that I get the correct version, I run:

from spleeter.separator import Separator

separator = Separator('spleeter:5stems-16khz')

I get the following:

2020-10-15 06:37:28.228400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/verena/deployment/interpretable-audio-models/experiments/spleeter/test_spleeter_mix_vs_modelinput_example.py", line 23, in <module>
    separator = Separator('spleeter:5stems-16khz')
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/site-packages/spleeter/separator.py", line 94, in __init__
    self._params = load_configuration(params_descriptor)
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/site-packages/spleeter/utils/configuration.py", line 40, in load_configuration
    raise SpleeterError(f'No embedded configuration {name} found')
spleeter.SpleeterError: No embedded configuration 5stems-16khz found

How would it work anyway? There are no '-16khz' configs available in the config folder.

Thanks for the wiki link, I couldn't find that before!

Regarding your last comment: I tried with spleeter 1.5.4 and spleeter 2.0, but I still get differences. Should I take a delta into account when comparing?

jdsierral · 2021-04-13T00:44:18Z

I tried different configurations but haven't been able to get output files that are actually 48 kHz audio. Am I missing something, or is it not currently possible to make the whole chain work at 48 kHz, or even 88.2 or 96 kHz? (I understand that, since the models were trained at 44.1 kHz, performance might be lower at a different sample rate, but I'd still like to be able to produce 48 kHz files.)

Also, is there a reason for the WAV output to be 16-bit? Is there a way to make it 24-bit or 32-bit float?

jdsierral · 2021-04-13T00:50:58Z

Oh! I figured out the 48kHz :)

expectopatronum added the question Further information is requested label Oct 9, 2020

romi1502 closed this as completed Oct 13, 2020

lesecs mentioned this issue Nov 22, 2020

2 stems 22khz dont work but 16 kHz vers good #521

Closed
