[Bug] Sum of output audio has missing high frequencies compared to original audio #106

zardini123 · 2019-11-16T18:11:28Z

The issue I'm experiencing is that the sum of spleeter's output audio has missing high frequencies compared to original audio. Regardless of the file input codec (and audio source), this problem seems to persist. A full report on what frequencies the sum is missing is provided at the end of the bug report.

I can't tell from my surface-level perspective whether this is a decoding issue (ffmpeg decoding audio to a lower bitrate) or whether it is inherent in the trained models. I would expect part of the training criteria for an AI like this to be that the sum of the outputs equals the input stream almost exactly. I'd love to read the white paper to see if this condition was actually in the training criteria.

Steps to reproduce

Create venv using python3
Install spleeter via venv's pip3
Run spleeter's executable using any model (2 through 5 stems)
Spleeter outputs correct number of audio files. Import all audio files into your favorite audio editor (Audacity, Ableton)
Output sum does not contain as much high-frequency content as the original audio.

Output

Spleeter runs and reports as normal. No issues are thrown. Audio data is reported to load successfully.

Environment


OS	MacOS
Installation type	pip
RAM available	16 GB
Hardware spec	i9 9700k, Radeon RX 590

Experimentation with the issue

After following the steps I provided above, I imported my original file and the spleeter output audio into Ableton. I grouped all the spleeter output audio and inverted the group's audio. The sum of the original audio and the inverted group reveals the audio "missing" from the spleeter output. A spectrum analyzer set to an FFT size of 16384 shows a peculiar frequency cutoff at 11 kHz (see image).

(Note: the audio used here is Earth, Wind & Fire's Let's Groove, provided as a WAV file. The y scale is in dB. Notice the almost 72 dB difference between the low and high shelves.)
This odd case of missing frequencies above 11 kHz applies to any audio, any codec, at any sample rate I tried. Even after resampling EWF's Let's Groove from 44.1k to 192k, the same cutoff applies.
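
The null test described above can also be sketched outside a DAW. The following is a minimal pure-Python illustration on synthetic signals, not the procedure used for the screenshot. The helper names (`null_test_residual`, `dft_magnitudes`, `lowest_residual_frequency`) and the two fake "stems" are made up for the example; a real check would load the original file and the spleeter outputs and use an FFT library rather than the naive DFT.

```python
import cmath
import math


def null_test_residual(original, stems):
    """Return original minus the sum of the stems, sample by sample."""
    return [x - sum(s[i] for s in stems) for i, x in enumerate(original)]


def dft_magnitudes(signal):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for a short demo)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def lowest_residual_frequency(original, stems, sample_rate, threshold=1e-6):
    """Lowest frequency (Hz) at which the residual holds noticeable energy, or None."""
    residual = null_test_residual(original, stems)
    n = len(residual)
    for k, magnitude in enumerate(dft_magnitudes(residual)):
        if magnitude / n > threshold:
            return k * sample_rate / n
    return None


# Synthetic demo: a 1 kHz tone plus a 15 kHz tone.
# The fake "stems" split the 1 kHz tone between them and drop the 15 kHz tone.
sample_rate = 44100
n = 441  # 10 ms of audio, so DFT bins are 100 Hz apart
low = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]
high = [0.5 * math.sin(2 * math.pi * 15000 * i / sample_rate) for i in range(n)]
original = [a + b for a, b in zip(low, high)]
stems = [[0.6 * v for v in low], [0.4 * v for v in low]]

cutoff = lowest_residual_frequency(original, stems, sample_rate)
print(cutoff)  # 15000.0
```

If the stems summed back to the original, the function would return `None`; here the residual only contains the tone that the fake stems left out.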

Interestingly, a sampling rate of 44.1k divided by 4 results in 11025 Hz, which is roughly 11k. I have no idea whether this is a clue that it's a decoding issue (ffmpeg) or a model issue. Though, it's interesting to think about.
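
The arithmetic behind that observation, as a small sketch. The `frame_length` of 4096 and the 1024 or 1536 retained frequency bins are assumptions based on spleeter's published default and 16kHz configuration files, and `stft_cutoff_hz` is a hypothetical helper, so check the config you actually run.

```python
def stft_cutoff_hz(sample_rate, frame_length, kept_bins):
    """Upper frequency covered when only the first `kept_bins` STFT bins are processed."""
    bin_width_hz = sample_rate / frame_length
    return kept_bins * bin_width_hz


print(44100 / 4)                          # 11025.0
print(stft_cutoff_hz(44100, 4096, 1024))  # 11025.0 (assumed default config)
print(stft_cutoff_hz(44100, 4096, 1536))  # 16537.5 (assumed 16kHz config)
```

Under these assumptions the cutoff comes from how many spectrogram bins the model processes rather than from ffmpeg decoding, which would also be consistent with the cutoff staying put after resampling the input.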

CoderSear · 2019-11-16T22:10:17Z

Able to consistently reproduce. Here's a spectrogram from Audacity of the output. Environment: Windows, installed from the git source & using Anaconda w/ Python 3.7

CoderSear · 2019-11-16T23:08:13Z

Apparently this is a duplicate of closed issues and is the intended behavior. The cutoff can be changed as described here: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-

zardini123 · 2019-11-16T23:47:19Z

@CoderSear Interesting find! The issue thread even references two other issues about the same "problem." I think it would be a good idea for the managers of the repository to put that documentation somewhere easy to find for others!

romi1502 · 2019-11-17T08:17:14Z

@zardini123, it is already documented in the FAQ section of the wiki.

zardini123 · 2019-11-17T17:03:31Z

@romi1502 I believe it would be best to put a link to that specific FAQ entry in the readme. The option to change the behavior of the mask seems crucial for many. I'd expect many more people to report the same "issue" I did, simply because the information is not at the forefront.

romi1502 · 2019-12-27T14:24:43Z

We've just updated the FAQ to provide a new way of performing separation above 11kHz. Also, configs that perform separation up to 16kHz were packaged in spleeter, as mentioned in the wiki.
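
For readers who want to try the packaged 16kHz configs from Python, here is a minimal sketch. It assumes the configs are named `spleeter:2stems-16kHz`, `spleeter:4stems-16kHz` and `spleeter:5stems-16kHz`, and that `Separator.separate_to_file` behaves as in the spleeter README; `config_name_16khz` and `separate_up_to_16khz` are hypothetical helpers, so consult the wiki for the authoritative instructions.

```python
def config_name_16khz(stems):
    """Build the name of an (assumed) packaged 16kHz config for 2, 4 or 5 stems."""
    if stems not in (2, 4, 5):
        raise ValueError("spleeter ships 2, 4 and 5 stem models")
    return f"spleeter:{stems}stems-16kHz"


def separate_up_to_16khz(audio_path, output_dir, stems=2):
    """Separate `audio_path` with a 16kHz config and write the stems to `output_dir`."""
    # Imported lazily so the helper above can be used without spleeter installed.
    from spleeter.separator import Separator

    separator = Separator(config_name_16khz(stems))
    separator.separate_to_file(audio_path, output_dir)


print(config_name_16khz(2))  # spleeter:2stems-16kHz
```

Calling `separate_up_to_16khz("audio_example.mp3", "output")` would then run the separation, given a working spleeter installation and a file at that (example) path.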

zardini123 added bug invalid labels Nov 16, 2019

zardini123 changed the title ~~[Bug] Sum of output audio is missing high frequencies compared to original audio~~ [Bug] Sum of output audio has missing high frequencies compared to original audio Nov 16, 2019

romi1502 closed this as completed Nov 17, 2019

romi1502 reopened this Dec 27, 2019

romi1502 closed this as completed Dec 27, 2019

expectopatronum mentioned this issue Oct 9, 2020

[Discussion] Confusion about different sample rates #503

Closed
