[Bug] Sum of output audio has missing high frequencies compared to original audio #106
Comments
Apparently this is a duplicate of closed issues and is the intended behavior. The cutoff can be changed as described here: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-
@CoderSear Interesting find! That issue thread even references two other issues about the same "problem." I think it would be a good idea for the maintainers of the repository to put that documentation somewhere easy for others to find!
@zardini123, it is already documented in the FAQ section of the wiki.
@romi1502 I believe it would be best to put a link to that specific FAQ entry in the readme. The option to change the behavior of the mask seems crucial for many users. I expect many more people will report the same "issue" I did, simply because the information is not at the forefront.
The issue I'm experiencing is that the sum of spleeter's output audio is missing high frequencies compared to the original audio. The problem persists regardless of the input file's codec and the audio source. A full report on which frequencies the sum is missing is provided at the end of the bug report.
I can't tell from my surface-level perspective whether this is a decoding issue (ffmpeg decoding audio to a lower bitrate) or whether it is inherent to the trained models. I would expect part of the training criteria for an AI like this to be that the sum of the outputs equals the input stream almost exactly. I'd love to read the white paper to see whether this condition was actually in the training criteria.
Step to reproduce
Output
Spleeter runs and reports as usual. No errors are thrown. Audio data is reported to load successfully.
Environment
Experimentation with the issue
After following the steps I provided above, I imported my original file and the spleeter output audio into Ableton. I grouped all the spleeter output tracks and inverted the group's audio. The sum of the original audio and the inverted group reveals the audio "missing" from the spleeter output. A spectrum analyzer set to an FFT size of 16384 shows a peculiar cutoff in frequencies at 11 kHz (see image).
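The same null test can be sketched outside a DAW. The snippet below is a minimal illustration using NumPy only; the function names `residual` and `band_level_db` are made up for this example, and the signals are synthetic stand-ins (a 1 kHz and a 15 kHz tone) rather than real spleeter output. The "stems" here are constructed to contain only the low tone, so they mimic output with its high frequencies removed.

```python
import numpy as np

def residual(original, stems):
    """Null test: original minus the sum of all stems."""
    return original - np.sum(stems, axis=0)

def band_level_db(signal, sample_rate, lo_hz, hi_hz, n_fft=16384):
    """Mean power (in dB) of one Hann-windowed FFT frame within [lo_hz, hi_hz)."""
    frame = signal[:n_fft] * np.hanning(n_fft)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    band = (freqs >= lo_hz) & (freqs < hi_hz)
    power = np.mean(np.abs(spectrum[band]) ** 2)
    return 10.0 * np.log10(power + 1e-20)  # small offset avoids log(0)

# Synthetic stand-in for a real track: one low tone and one high tone.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
low_tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
high_tone = 0.5 * np.sin(2 * np.pi * 15000 * t)
original = low_tone + high_tone

# Hypothetical stems that together contain only the low tone,
# imitating separated output with no content above the cutoff.
stems = np.stack([0.6 * low_tone, 0.4 * low_tone])

missing = residual(original, stems)
low_db = band_level_db(missing, sample_rate, 0, 11025)
high_db = band_level_db(missing, sample_rate, 11025, 22050)
print(f"residual below 11025 Hz: {low_db:.1f} dB")
print(f"residual above 11025 Hz: {high_db:.1f} dB")
```

With real files, `original` and `stems` would be sample arrays of equal length and sample rate loaded from the WAV files; a residual that is quiet below the cutoff and loud above it matches what the analyzer showed.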
(Note: the audio used here is Earth, Wind & Fire's Let's Groove, provided as a WAV file. The y scale is in dB. Notice the almost 72 dB difference between the low and high shelves.)
This odd case of missing frequencies above 11 kHz applies to any audio, any codec, and any sample rate I tried. Even after resampling EWF's Let's Groove from 44.1 kHz to 192 kHz, the same cutoff applies.
Interestingly, a sampling rate of 44.1 kHz divided by 4 gives 11025 Hz, which is roughly 11 kHz. I have no idea whether this is a clue that it's a decoding issue (ffmpeg) or a model issue. Still, it's interesting to think about.
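One way a quarter-of-the-sample-rate cutoff could arise is if a model only processed the lowest bins of each spectrogram frame. The sketch below just works through that arithmetic. The frame length of 4096 and the 1024 kept bins are assumed values chosen for illustration, not settings confirmed anywhere in this thread, and `stft_cutoff_hz` is a hypothetical helper.

```python
def stft_cutoff_hz(sample_rate, frame_length, kept_bins):
    """Upper frequency covered by the first `kept_bins` bins of an STFT frame."""
    bin_width_hz = sample_rate / frame_length  # spacing between adjacent bins
    return kept_bins * bin_width_hz

# Assumed, illustrative parameters (not taken from spleeter's configuration).
sample_rate = 44100
frame_length = 4096
kept_bins = 1024

cutoff = stft_cutoff_hz(sample_rate, frame_length, kept_bins)
quarter_rate = sample_rate / 4
print(cutoff, quarter_rate)
```

Under these assumptions the cutoff lands exactly on the sample rate divided by 4, which would point toward the model's processing rather than the decoder.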