[Discussion] Confusion about different sample rates #503

Closed
expectopatronum opened this issue Oct 9, 2020 · 4 comments
Labels
question Further information is requested

Comments

@expectopatronum

Hi!
Until quite recently, I assumed that the models were trained at 44100 Hz, since that is also the value in the config. I noticed that for some of the tracks I used, there is actually a difference between the mix and the summed sources. I read all the issues related to this (#2, #15, #106) and tried to follow what's written there, as well as the relevant section in the FAQ, but as you can read in my comment, the outputs still don't sum up to the mix.

I have a few questions:

  • First of all, why use 44100 in the config (for loading/training?) when the model is only trained up to 11 kHz?
  • In the FAQ you mention different configurations, but separator = Separator('spleeter:5stems-16khz') fails with spleeter.SpleeterError: No embedded configuration 5stems-16khz found. Should this work, or does it only work with the command-line interface?
  • Could you add an example of how to use Spleeter in Python? It took me a while to figure out how to load the data to work with separator.separate(...) (and maybe it's still wrong; see below).
  • Is the following the correct way of doing it? (Apparently not, but I don't know what else to change)
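import librosa
import numpy as np
from spleeter.separator import Separator

separator = Separator('/share/home/verena/experiments/spleeter/5stems_average.json')

def separate_and_report(audio_path):
    # librosa returns (n_channels, n_samples); Spleeter expects (n_samples, n_channels)
    y, _ = librosa.load(audio_path, mono=False, sr=44100)
    y = np.swapaxes(y, 0, 1)
    prediction = separator.separate(y)
    # np.stack needs a sequence, so convert the dict values to a list first
    summed_thing = np.stack(list(prediction.values())).sum(axis=0)
    # Residual between the mix and the sum of the stems (ideally all zeros)
    return y - summed_thing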

In 5stems_average.json, I set "mask_extension": "average" and tried "F" values of 1536 and 1024.

For some input files the difference is 0, but for most tracks it is not.

Thanks a lot for the great package and pretrained models! I hope there is an easy fix for my problem.

Thanks and best regards
Verena

@expectopatronum expectopatronum added the question Further information is requested label Oct 9, 2020
@romi1502
Member

romi1502 commented Oct 9, 2020

Hi @expectopatronum
Thanks for your questions. I'll try to answer them:

First of all, why use 44100 in the config (for loading/training?) when the model is only trained up to 11 kHz?

  1. These are two different things. The spectrogram model is trained up to 11 kHz, meaning input spectrograms are cut at 11 kHz, but separation can be performed up to 22 kHz, either by increasing the value of F at test time (this is exactly what spleeter:stems-16kHz does) up to 2048, or by extending the mask with averaging (using the "average" option for "mask_extension", as you mentioned). Note that the models were trained at 44.1 kHz, and changing the sampling rate at test time may lower separation performance.
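To make the 11 kHz / 22 kHz figures concrete: each STFT bin spans sample_rate / frame_length Hz, so keeping the first F bins covers roughly F * sample_rate / frame_length Hz. The sketch below assumes a frame_length of 4096 (check the value in your own JSON config); the helper name cutoff_hz is hypothetical.

```python
def cutoff_hz(F, sample_rate=44100, frame_length=4096):
    """Approximate highest frequency covered by the first F STFT bins.

    Assumes frame_length=4096; read the actual value from your config.
    """
    bin_width = sample_rate / frame_length  # spacing between adjacent bins, in Hz
    return F * bin_width


if __name__ == "__main__":
    for F in (1024, 1536, 2048):
        print(F, cutoff_hz(F))  # 11025.0, 16537.5, 22050.0
    # The same F=1024 bins map to higher frequencies if the sample rate changes
    print(cutoff_hz(1024, sample_rate=48000))  # 12000.0
```

Under these assumptions, F=1536 lands at about 16.5 kHz (the 16 kHz configurations) and F=2048 reaches the 22.05 kHz Nyquist limit. It also illustrates why a different sample rate can hurt: the bins the model saw during training then correspond to different frequencies.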

In the FAQs you mention different configurations, but separator = Separator('spleeter:5stems-16khz') fails with spleeter.SpleeterError: No embedded configuration 5stems-16khz found. Should this work or does it only work with the command line interface?

  2. separator = Separator('spleeter:5stems-16khz') should work unless you have a fairly old version of Spleeter. I've just tested it with the most recent version, and it works on my side.

Could you add an example of how to use spleeter in Python? It took me a while to figure out how to load the data to work with separator.separate(...) (and maybe it's still wrong - see below)

  3. To perform separation in Python, you can either work at the file level using separator.separate_to_file (which behaves much like the command line) or process waveforms directly as NumPy arrays with separator.separate. The wiki provides sample code for separator.separate usage.
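For the separator.separate route, the main pitfall is array layout: librosa.load(..., mono=False) returns (n_channels, n_samples) for multichannel files, whereas separator.separate expects (n_samples, n_channels). A minimal sketch, assuming librosa for loading as in the question; to_spleeter_layout and separate_file are hypothetical helper names, not part of Spleeter, and whether your Spleeter version accepts a single-channel (n_samples, 1) input is not verified here.

```python
import numpy as np


def to_spleeter_layout(y):
    """Convert a librosa-style array to shape (n_samples, n_channels).

    librosa.load(..., mono=False) returns (n_channels, n_samples) for
    multichannel files and a 1-D array for mono files.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        return y[:, np.newaxis]  # mono: (n_samples,) -> (n_samples, 1)
    # (n_channels, n_samples) -> (n_samples, n_channels), as a contiguous copy
    return np.ascontiguousarray(y.T)


def separate_file(separator, audio_path, sample_rate=44100):
    """Load a file with librosa and run separator.separate on it (hypothetical helper)."""
    import librosa  # imported lazily so to_spleeter_layout works without librosa

    y, _ = librosa.load(audio_path, sr=sample_rate, mono=False)
    waveform = to_spleeter_layout(y)
    return waveform, separator.separate(waveform)
```

Usage would then look like waveform, prediction = separate_file(Separator('spleeter:5stems'), 'song.wav'), where the file path is a placeholder.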

Is the following the correct way of doing it? (Apparently not, but I don't know what else to change)

  4. See 3.

With "mask_extension": "average", the separated signals should sum up to the mix. With older versions of Spleeter (i.e. <= 1.5.3), you may see differences at the very beginning or very end of the signal due to the way the STFT/iSTFT was handled, but that should be fixed from 1.5.4 onward. Let us know if you still have trouble with recent versions.
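Why "average" should give stems that sum to the mix can be illustrated with a small NumPy sketch. It assumes, as a simplification rather than Spleeter's actual code, that each source mask is a ratio mask (the masks sum to 1 across sources at every time-frequency bin) and that "average" fills the bins above F with each frame's mean mask value over frequency. Because averaging is linear, the extended masks still sum to 1, so the masked spectrograms add back up to the mixture spectrogram. The function name extend_mask_average is hypothetical.

```python
import numpy as np


def extend_mask_average(mask, total_bins):
    """Extend a (T, F, C) mask to (T, total_bins, C).

    Bins above F are filled with each frame's mean mask value over the
    first F bins (per channel). Simplified illustration, not Spleeter's code.
    """
    T, F, C = mask.shape
    mean_row = mask.mean(axis=1, keepdims=True)            # (T, 1, C)
    extension = np.tile(mean_row, (1, total_bins - F, 1))  # (T, total_bins - F, C)
    return np.concatenate([mask, extension], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sources, T, F, C, total_bins = 5, 4, 8, 2, 12
    # Ratio masks: non-negative and summing to 1 across sources at every bin
    raw = rng.random((n_sources, T, F, C))
    masks = raw / raw.sum(axis=0, keepdims=True)
    extended = np.stack([extend_mask_average(m, total_bins) for m in masks])
    print(extended.shape)                          # (5, 4, 12, 2)
    print(np.allclose(extended.sum(axis=0), 1.0))  # True
```

By contrast, filling the upper bins with zeros would remove everything above F from every stem, so their sum could not match the mix.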

@expectopatronum
Author

Thanks for your reply! I just checked: I am using version 1.5.4, which I installed via pip not long ago, so I assumed it would be recent enough (regarding question 2 and your remark about 'mask_extension'). I have now switched to 2.0, which I wasn't aware of. When was it released? The releases page is quite outdated.

  2. OK, I'll have to try again, but first I need to get 2.0 to work, which still results in an error:

After pip install spleeter and checking that I get the correct version, I run:

from spleeter.separator import Separator

separator = Separator('spleeter:5stems-16khz')

I get the following:

2020-10-15 06:37:28.228400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/verena/deployment/interpretable-audio-models/experiments/spleeter/test_spleeter_mix_vs_modelinput_example.py", line 23, in <module>
    separator = Separator('spleeter:5stems-16khz')
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/site-packages/spleeter/separator.py", line 94, in __init__
    self._params = load_configuration(params_descriptor)
  File "/home/verena/miniconda3/envs/py37-spleeter/lib/python3.7/site-packages/spleeter/utils/configuration.py", line 40, in load_configuration
    raise SpleeterError(f'No embedded configuration {name} found')
spleeter.SpleeterError: No embedded configuration 5stems-16khz found

How would this work anyway? There are no '-16khz' configs in the config folder.

  3. Thanks for the wiki link, I couldn't find it before!

Regarding your last comment: I tried with Spleeter 1.5.4 and Spleeter 2.0, but I still get differences. Should I allow some tolerance (a delta) when comparing?
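On the question of a delta: with float32 audio, the sum of the stems will rarely match the mix bit for bit, so comparing against a tolerance is more meaningful than testing for exact zeros. A minimal sketch; the helper name, the 1e-4 threshold, and the optional trim of samples at both ends (relevant to the edge effects mentioned for versions <= 1.5.3) are illustrative assumptions, not values recommended by the Spleeter maintainers.

```python
import numpy as np


def max_reconstruction_error(mix, stems, trim=0):
    """Largest absolute difference between the mix and the sum of the stems.

    mix:   array shaped (n_samples, n_channels)
    stems: dict mapping stem name -> array of the same shape
    trim:  number of samples to ignore at each end of the signal
    """
    summed = np.sum(np.stack(list(stems.values())), axis=0)
    residual = np.asarray(mix, dtype=np.float64) - summed
    if trim > 0:
        residual = residual[trim:-trim]
    return float(np.max(np.abs(residual)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mix = rng.uniform(-1, 1, size=(1000, 2)).astype(np.float32)
    # Fake "separation": split the mix into two scaled copies
    stems = {"a": 0.3 * mix, "b": 0.7 * mix}
    err = max_reconstruction_error(mix, stems)
    print(err, err < 1e-4)  # tiny float32 rounding error, within tolerance
```

With the code from the question, this would be called as max_reconstruction_error(y, prediction) after prediction = separator.separate(y).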

@jdsierral

jdsierral commented Apr 13, 2021

I tried different configurations, but I haven't been able to make the output files actually be 48 kHz audio. Am I missing something, or is it currently not possible to make the whole chain work at 48 kHz, or potentially 88.2 or 96 kHz? (I understand that since the models were trained at 44.1 kHz, performance might be lower at a different sample rate, but I'd still like to be able to produce 48 kHz files.)

Also, is there a reason the WAV output is 16-bit? Is there a way to make it 24-bit or even 32-bit float?
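One possible workaround for the bit depth (not something discussed by the maintainers here) is to call separator.separate and write each returned float array yourself instead of relying on Spleeter's own file writing. A minimal sketch using Python's standard wave module and NumPy to write 24-bit PCM; write_wav_24bit is a hypothetical helper, and it assumes float samples in [-1, 1] shaped (n_samples, n_channels). The wave module cannot write 32-bit float WAV; that needs a library supporting the IEEE float format, such as soundfile (subtype "FLOAT").

```python
import wave

import numpy as np


def write_wav_24bit(path, samples, sample_rate):
    """Write float samples in [-1, 1], shaped (n_samples, n_channels), as 24-bit PCM."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 1:
        samples = samples[:, np.newaxis]  # treat 1-D input as mono
    n_channels = samples.shape[1]
    # Scale to the signed 24-bit range and clip to avoid wrap-around
    ints = np.clip(np.round(samples * 8388607.0), -8388608, 8388607).astype("<i4")
    # Interleave frames, then keep the 3 low bytes of each little-endian int32
    raw = ints.reshape(-1).view(np.uint8).reshape(-1, 4)[:, :3].tobytes()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setsampwidth(3)  # 3 bytes per sample = 24-bit
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
```

For example, write_wav_24bit('vocals.wav', prediction['vocals'], 44100) would write the vocals stem, where the file name is a placeholder and the sample rate must match the rate of the array being written.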

@jdsierral

Oh! I figured out the 48kHz :)
