[Bug] Sum of output audio has missing high frequencies compared to original audio #106

Closed
zardini123 opened this issue Nov 16, 2019 · 6 comments
Labels
bug (Something isn't working), invalid (This doesn't seem right)

Comments

zardini123 commented Nov 16, 2019

The issue I'm experiencing is that the sum of spleeter's output audio is missing high frequencies compared to the original audio. The problem persists regardless of the input file's codec or the audio source. A full report on which frequencies the sum is missing is provided at the end of this bug report.

From my surface-level perspective, I can't tell whether this is a decoding issue (ffmpeg decoding the audio to a lower bitrate) or whether it is inherent in the trained models. I would expect part of the training criteria for an AI like this to be that the sum of the outputs equals the input stream almost exactly. I'd love to read the white paper to see whether this condition was actually part of the training criteria.

Steps to reproduce

  1. Create venv using python3
  2. Install spleeter via venv's pip3
  3. Run spleeter's executable using any model (2 through 5 stems)
  4. Spleeter outputs the correct number of audio files. Import all of them into your favorite audio editor (Audacity, Ableton)
  5. The sum of the outputs contains less high-frequency content than the original audio.

Output

Spleeter runs and reports as usual. No errors are raised. Audio data is reported to load successfully.

Environment

OS: MacOS
Installation type: pip
RAM available: 16 GB
Hardware spec: i9 9700k, Radeon RX 590

Experimentation with the issue

After following the steps above, I imported my original file and spleeter's output audio into Ableton. I grouped all of the spleeter output tracks and inverted the group's polarity. The sum of the original audio and the inverted group reveals the audio "missing" from the spleeter output. A spectrum analyzer set to an FFT size of 16384 shows a peculiar cutoff in frequencies at 11 kHz (see image).
[Image: spectrum analyzer screenshot, Screen Shot 2019-11-16 at 12 59 50 PM]
(Note: the audio used here is Earth, Wind & Fire's Let's Groove, provided as a WAV file. The y scale is in dB. Notice the almost 72 dB difference between the low and high shelves.)
This odd case of missing frequencies above 11 kHz applies to every audio file, codec, and sample rate I tried. Even after resampling EWF's Let's Groove from 44.1 kHz to 192 kHz, the same cutoff applies.
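
For anyone who wants to repeat this null test without a DAW, here is a minimal Python sketch of the same idea. It uses only the standard library and synthetic sine waves in place of real files. The helper names (`residual`, `band_energy`), the two made-up stems, and the 11025 Hz split point are assumptions chosen for illustration; nothing here calls spleeter itself.

```python
import cmath
import math


def residual(original, stems):
    """Null test: subtract the sample-wise sum of the stems from the original."""
    summed = [sum(samples) for samples in zip(*stems)]
    return [o - s for o, s in zip(original, summed)]


def band_energy(signal, sample_rate, f_lo, f_hi):
    """Sum of squared naive-DFT magnitudes for bins with f_lo <= freq < f_hi."""
    n = len(signal)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)
            )
            energy += abs(coeff) ** 2
    return energy


# Synthetic stand-in for a real track: a 1 kHz tone plus a 15 kHz tone.
sample_rate = 44_100
n = 441  # 10 ms of audio, so DFT bins fall on exact multiples of 100 Hz
t = [i / sample_rate for i in range(n)]
low_tone = [math.sin(2 * math.pi * 1_000 * x) for x in t]
high_tone = [0.5 * math.sin(2 * math.pi * 15_000 * x) for x in t]
original = [a + b for a, b in zip(low_tone, high_tone)]

# Hypothetical stems that together carry only the low-frequency content.
stem_a = [0.6 * a for a in low_tone]
stem_b = [0.4 * a for a in low_tone]

missing = residual(original, [stem_a, stem_b])
split_hz = 11_025  # assumed split point, matching the cutoff seen above
low = band_energy(missing, sample_rate, 0, split_hz)
high = band_energy(missing, sample_rate, split_hz, sample_rate / 2 + 1)

print(f"residual energy below {split_hz} Hz: {low:.3e}")
print(f"residual energy above {split_hz} Hz: {high:.3e}")
```

With real audio you would load the original and the stems as sample arrays of equal length and sample rate, then pass them to `residual` in the same way. A residual whose energy sits almost entirely above the split point matches what the spectrum analyzer shows.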

Interestingly, a sampling rate of 44.1 kHz divided by 4 is 11025 Hz, which is roughly 11 kHz. I have no idea whether this is a clue that it's a decoding issue (ffmpeg) or a model issue. Still, it's interesting to think about.
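
One way this arithmetic could arise, offered as a hedged sketch rather than a confirmed explanation: if a model works on a short-time Fourier transform and only processes the lowest bins of each frame, the highest frequency it can touch is `processed_bins * sample_rate / frame_length`. The frame length of 4096 samples and the bin counts below are assumed values for illustration, and `cutoff_hz` is a hypothetical helper, not part of any library.

```python
def cutoff_hz(processed_bins, frame_length, sample_rate):
    """Upper frequency limit when only the lowest `processed_bins` STFT bins are used."""
    return processed_bins * sample_rate / frame_length


sample_rate = 44_100   # Hz, the rate of the file discussed above
frame_length = 4_096   # ASSUMPTION: STFT frame size in samples
narrow_bins = 1_024    # ASSUMPTION: a quarter of the frame length
wide_bins = 1_536      # ASSUMPTION: a wider, purely illustrative setting

quarter = sample_rate / 4
narrow_cutoff = cutoff_hz(narrow_bins, frame_length, sample_rate)
wide_cutoff = cutoff_hz(wide_bins, frame_length, sample_rate)

print(quarter)        # 11025.0
print(narrow_cutoff)  # 11025.0
print(wide_cutoff)    # 16537.5
```

Under these assumptions the ceiling is tied to the sample rate the model works at, not to the sample rate of the input file, which would be consistent with the cutoff staying at 11 kHz after resampling to 192 kHz.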

zardini123 added the bug and invalid labels on Nov 16, 2019
zardini123 changed the title from "[Bug] Sum of output audio is missing high frequencies compared to original audio" to "[Bug] Sum of output audio has missing high frequencies compared to original audio" on Nov 16, 2019
CoderSear commented Nov 16, 2019

Able to consistently reproduce. Here's a spectrogram from Audacity of the output. Environment: Windows, installed from the git source & using Anaconda w/ Python 3.7
[Image: Audacity spectrogram of the output]

@CoderSear

Apparently this is a duplicate of closed issues and is the intended behavior. Cutoff can be changed here: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-

@zardini123
Author

@CoderSear Interesting find! That issue thread even links to two other issues reporting the same "problem." I think it would be a good idea for the maintainers of the repository to put that documentation somewhere easy for others to find!

@romi1502
Member

@zardini123, it is already documented in the FAQ section of the wiki.

@zardini123
Author

@romi1502 I believe it would be best to put a link to that specific FAQ entry in the readme. The option to change the behavior of the mask seems crucial for many users. I expect many more people will report this same "issue" in the future, simply because the information is not front and center.

@romi1502
Member

We've just updated the FAQ to provide a new way of performing separation above 11 kHz. Also, configs that perform separation up to 16 kHz are now packaged in spleeter, as mentioned in the wiki.


3 participants