Quality issues? #75

Closed
bluenote10 opened this issue Nov 2, 2020 · 5 comments
bluenote10 commented Nov 2, 2020

Out of curiosity I did a test similar to this on the various resampling methods available in librosa. The test resamples an exponentially swept sine from 96kHz down to 44.1kHz. The test signal is 8 seconds long, and at around 7.2 seconds the swept sine passes the Nyquist frequency of the downsampled rate.

Considering that the resampy algorithm has a sound theoretical background straight from a JOS (Julius O. Smith) publication and is librosa's default, I expected it to be of very high quality. However, what I got was somewhat surprising:

[Spectrograms of the resampled sweep, one per method:]

  • resampy_best
  • resampy_fast
  • scipy.signal.resample
  • scipy.signal.resample_poly
  • samplerate.converters.resample_sinc_best
  • samplerate.converters.resample_sinc_medium
  • samplerate.converters.resample_sinc_fastest
  • samplerate.converters.resample_linear
  • samplerate.converters.resample_zero_order_hold

Full reproduction code:
import time

import numpy as np
import matplotlib.pyplot as plt

import librosa
import librosa.display


def exp_swept_sine(f1, f2, sr, amp=1.0, t=1.0):
    # Exponential sine sweep from f1 to f2 Hz over t seconds at sample rate sr;
    # the instantaneous frequency is f1 * exp(ts / L).
    num_samples = int(t * sr)
    ts = np.arange(num_samples) / sr
    L = t / np.log(f2 / f1)
    wave = amp * np.sin(2.0 * np.pi * f1 * L * (np.exp(ts / L) - 1))
    return wave


def analyze_and_plot(wave, sr, method_name, runtime):
    hop_length = 256
    S = librosa.stft(wave, hop_length=hop_length)

    fig, ax = plt.subplots(1, 1, figsize=(20, 10))
    img = librosa.display.specshow(
        librosa.amplitude_to_db(np.abs(S), ref=np.max, amin=1e-10, top_db=180.0),
        y_axis='log',
        x_axis='time',
        sr=sr,
        ax=ax,
        hop_length=hop_length,
    )
    plt.colorbar(img, ax=ax)
    fig.suptitle("{} ({:.1f} ms)".format(method_name, runtime * 1000))
    fig.tight_layout()
    fig.savefig("/tmp/{}.png".format(method_name))
    plt.show()


def multi_check():
    sr1 = 96000
    sr2 = 44100
    wave1 = exp_swept_sine(f1=20, f2=sr1/2, sr=sr1, amp=0.5, t=8.0)

    # positional (orig_sr, target_sr) arguments follow the librosa API at the
    # time (0.8); newer librosa versions require keyword arguments here
    methods = [(
        "resampy_best",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="kaiser_best")
    ), (
        "resampy_fast",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="kaiser_fast")
    ), (
        "scipy.signal.resample",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="scipy")
    ), (
        "scipy.signal.resample_poly",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="polyphase")
    ), (
        "samplerate.converters.resample_sinc_best",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="sinc_best")
    ), (
        "samplerate.converters.resample_sinc_medium",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="sinc_medium")
    ), (
        "samplerate.converters.resample_sinc_fastest",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="sinc_fastest")
    ), (
        "samplerate.converters.resample_linear",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="linear")
    ), (
        "samplerate.converters.resample_zero_order_hold",
        lambda: librosa.resample(wave1, sr1, sr2, res_type="zero_order_hold")
    )]

    for method_name, func in methods:
        t1 = time.time()
        wave2 = func()
        t2 = time.time()
        runtime = t2 - t1
        print("{:<50s} runtime: {:.1f} ms".format(method_name, runtime * 1000))
        analyze_and_plot(wave2, sr2, method_name, runtime)


if __name__ == "__main__":
    multi_check()

Any ideas why resampy shows such strong distortion?

bmcfee commented Nov 3, 2020

Concerning indeed! I remember doing some tests along these lines when initially developing resampy, and it was never quite as good as libsamplerate because I never bothered with the iterative optimization procedure for producing the window parameters. In principle, we could do that at some point, and should get ~equivalent results.

The noise differences are in the -90dB range (from what I can tell), so not audible, but not great to see. I wonder if there are some numerical-precision factors at play here? I haven't done a source-dive on libsamplerate, but it wouldn't surprise me if they're operating internally at higher precision than resampy is, and those round-off errors could add up. (Just a guess!)

Failing that, it's entirely possible that we have some kind of a bug, but for now I'd chalk it up to less-than-optimal windowing.

bmcfee commented Jun 27, 2022

Following up on this - I don't think it's actually a numerical precision discrepancy, though I suppose it's possible.

A couple of observations here:

  • Artifacts do not appear when upsampling (e.g. 44100→96000), only downsampling appears to be affected.
  • Downsampling at an integer ratio (e.g. 44100→22050) is also unaffected, and in fact marginally better than libsamplerate's analogous configuration (a quick check is sketched below).
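
Both can be spot-checked with a small variation of the benchmark script above; this is a hypothetical sketch reusing its exp_swept_sine helper, not the exact code I ran:

    # hypothetical variation of the benchmark above, checking both observations
    wave_44k = exp_swept_sine(f1=20, f2=22050, sr=44100, amp=0.5, t=8.0)

    # upsampling (44100 -> 96000): no visible artifacts
    up = librosa.resample(wave_44k, 44100, 96000, res_type="kaiser_best")

    # integer-ratio downsampling (44100 -> 22050): also clean
    down = librosa.resample(wave_44k, 44100, 22050, res_type="kaiser_best")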

Both of these suggest to me that the algorithm itself is implemented properly, and it all comes down to the precomputed window coefficients. Per the docstring in our filters module:

- `kaiser_best` : 64 zero-crossings, a Kaiser window with beta=14.769656459379492,
and a roll-off frequency of Nyquist * 0.9475937167399596.
- `kaiser_fast` : 16 zero-crossings, a Kaiser window with beta=8.555504641634386,
and a roll-off frequency of Nyquist * 0.85.

So there are three things at play here: the number of zero crossings retained, the beta parameter of the Kaiser filter, and the roll-off frequency. There's also the precision parameter to think about, which controls how many coefficients to retain per zero crossing.
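
For intuition, here's a rough sketch (not resampy's exact implementation) of how those parameters combine into the precomputed half-filter of Kaiser-windowed sinc coefficients:

    import numpy as np

    # rough sketch: Kaiser-windowed sinc half-filter from the kaiser_best parameters
    num_zeros = 64                # zero crossings retained on each side
    precision = 9                 # 2**precision coefficients per zero crossing
    beta = 14.769656459379492     # Kaiser window shape parameter
    rolloff = 0.9475937167399596  # cutoff as a fraction of Nyquist

    n = num_zeros * 2**precision            # coefficients in the half-filter
    t = np.linspace(0, num_zeros, n + 1)    # time axis in zero-crossing units
    taper = np.kaiser(2 * n + 1, beta)[n:]  # right half of the Kaiser window
    half_filter = rolloff * np.sinc(rolloff * t) * taper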

It's not exactly trivial to compare this to the libsamplerate implementation, as it's parametrized a little differently, but we can try. For example, adding

    (
        "resampy_custom",
        lambda: resampy.resample(wave1, sr1, sr2, filter='sinc_window', num_zeros=69, precision=15)
    ),

to the benchmark script above (with `import resampy` added) gives a decent improvement, mainly by using finer filter interpolation:
[spectrogram: resampy_custom]
compared to stock:
[spectrogram: kaiser_best]

At a glance, this drops the artifacts from -90dB down to around -110dB. Pushing the precision up to 20 gives
[spectrogram: resampy_custom, precision=20]
bringing artifacts down to around -150dB (at a significant performance hit).

We could probably tune this further to find a more optimal setting of precision and filter shape. IIRC, libsamplerate did this with some kind of automated parameter search. That might be a nice thing to implement here as well, but I'm fairly convinced at this point that the observed behavior is not a "bug" per se.

bmcfee commented Jun 27, 2022

Following up on this a bit: the libsamplerate Octave code for parameter tuning is here: https://github.com/libsndfile/libsamplerate/tree/master/Octave

If you unpack this a bit, it looks like they're using beta=16.05 for their kaiser-best filter (compared to our 14.76). There's also a bit of over-sampling (fudge factor) in their filter, which is only reported in the docs for the fast filter (not best).
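
For context, beta maps to stopband attenuation through the standard Kaiser design formula beta = 0.1102 * (A - 8.7) for A > 50 dB, so the gap between the two betas works out to roughly 12 dB of extra stopband rejection:

    # stopband attenuation (dB) implied by each beta, inverting the standard
    # Kaiser design formula beta = 0.1102 * (A - 8.7), valid for A > 50 dB
    def kaiser_attenuation(beta):
        return beta / 0.1102 + 8.7

    print(kaiser_attenuation(14.769656459379492))  # resampy kaiser_best: ~142.7 dB
    print(kaiser_attenuation(16.05))               # libsamplerate best:  ~154.3 dB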

bmcfee commented Jun 28, 2022

After a bit of a dive into the libsamplerate filters, I think we can get some easy mileage out of changing the balance between the number of zero crossings (64 for kaiser_best) and the precision (number of interpolation points) in our filters. For reference, the libsamplerate high-quality filter has a half-length of 340239, whereas our current "best" filter has about 1/10 as many: 32769 (64 zeros and precision=9). Adding three bits of precision gets us in the same coefficient ballpark, but we can actually do better by going further with precision and removing some zeros from the filter.
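
The length bookkeeping, assuming a half-length of num_zeros * 2**precision + 1 (which matches the 32769 figure above):

    # half-filter length as a function of zero crossings and precision bits
    def half_length(num_zeros, precision):
        return num_zeros * 2**precision + 1

    print(half_length(64, 9))   # 32769  -> current kaiser_best
    print(half_length(64, 12))  # 262145 -> three more bits, near libsamplerate's 340239
    print(half_length(40, 13))  # 327681 -> the prototype described below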

Here's a prototype using 40 zeros, a precision of 13 bits, beta≈12.56, and rolloff≈0.90. (Beta and rolloff were optimized using the method noted in #96, with some modifications to come.) I've adjusted the colormap for these plots so that the midpoint at -120dB is black; anything tinted red is louder, blue is quieter. The idea here is that artifacts in blue should be tolerable.
[spectrogram: prototype filter]
compared to our current kaiser_best:
[spectrogram: kaiser_best]
(note the similar runtimes)
and libsamplerate (much slower, but higher quality):
[spectrogram: libsamplerate best]
(Side note: resampy runtimes here are using the new parallel implementation, hence the speedup relative to other algorithms.)
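
For concreteness, here's a hypothetical sketch of how such a prototype could be invoked through resampy's sinc_window filter; passing the Kaiser taper via a partial is my assumption about the window-callable convention:

    import functools
    import resampy
    import scipy.signal

    # hypothetical invocation of the prototype filter described above
    wave2 = resampy.resample(
        wave1, sr1, sr2,
        filter='sinc_window',
        num_zeros=40,
        precision=13,
        rolloff=0.90,
        window=functools.partial(scipy.signal.windows.kaiser, beta=12.56),
    )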

TLDR is that we can bring the noise level down to the -120dB range without any appreciable loss in efficiency. It remains to be seen how far we can push the low-quality filter, as the trade-off between length and interpolation might be qualitatively different in that regime.

bmcfee commented Jun 29, 2022

Closing this one out now - 0.3 release improves things a bit, still not perfect, but within reasonable tolerances.

bmcfee closed this as completed Jun 29, 2022