Improve default filters and make the whole process reproducible #96

bmcfee · 2022-06-27T17:43:18Z

Issue #75 raised some questions about the pre-packaged default filters that we ship with, and whether they could be improved. (I expect the answer is "yes".)

Previously, the filter optimization was done with a Gaussian process hyperparameter optimization (#8), as implemented in this gist: https://gist.github.com/4aa4c959bb0d310e3f12cdedf91d7661

The above notebook worked well enough given the constraints and tools of the time, but I did have to dredge it out of an old laptop. Properly, this functionality should be included in the repository and be fully reproducible (with RNG seeds and all). Doing this will make it easier to improve the filters going forward. It would also make it possible to experiment with building a larger parameter search into the process.

If we reimplement this, it probably makes sense to discuss the window design objective (a little ad hoc at the moment) and look into more modern tooling for GP search (e.g., hyperopt).
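As a rough illustration of the "fully reproducible, with RNG seeds" goal, here is a minimal NumPy-only sketch of a seeded parameter search. It uses plain random search as a stand-in for the GP search discussed above, and the `objective` function, its weighting, and the search ranges are hypothetical placeholders rather than the design objective actually used for the shipped filters.

```python
import numpy as np


def half_window(beta, rolloff, num_zeros=16, precision=4):
    """Right half of a Kaiser-windowed sinc low-pass filter (illustrative)."""
    n = num_zeros * 2**precision
    t = np.linspace(0, num_zeros, num=n + 1, endpoint=True)
    taper = np.kaiser(2 * n + 1, beta)[n:]  # right half of a symmetric Kaiser window
    return rolloff * np.sinc(rolloff * t) * taper


def objective(beta, rolloff, precision=4):
    """Hypothetical objective: share of energy above Nyquist, plus a small
    penalty for a narrow pass band. The 0.01 weight is arbitrary."""
    h = half_window(beta, rolloff, precision=precision)
    full = np.concatenate([h[:0:-1], h])  # mirror to get the symmetric filter
    n_fft = 8 * len(full)
    power = np.abs(np.fft.rfft(full, n=n_fft)) ** 2
    # Frequencies in cycles per zero-crossing interval; Nyquist sits at 0.5.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / 2**precision)
    stop_fraction = power[freqs > 0.5].sum() / power.sum()
    return stop_fraction + 0.01 * (1.0 - rolloff)


def search(seed, n_trials=50):
    """Seeded random search over (beta, rolloff); returns (score, beta, rolloff)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        beta = rng.uniform(2.0, 20.0)
        rolloff = rng.uniform(0.8, 1.0)
        score = objective(beta, rolloff)
        if best is None or score < best[0]:
            best = (score, beta, rolloff)
    return best


first = search(seed=20220627)
second = search(seed=20220627)
assert first == second  # same seed, same selected parameters
print(first)
```

Because all randomness flows through one explicitly seeded generator, rerunning the search selects the same parameters, which is the property that would make a committed generation script auditable.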

avalentino · 2022-06-27T21:36:43Z

Dear @bmcfee, I'm currently working on the packaging of resampy for Debian (hope you don't mind).
According to the Debian policy, it would be important to be able to regenerate the data file containing the filter(s) during the build process.
For this reason, having this issue closed would be the ideal solution.

Do you plan to implement this in the near future? Do you already have a date in mind for the v0.3.0 release?
If this is the case I will wait for the next release before submitting the upload request.
Otherwise I will need to figure out some workaround that allows me to be compliant with the Debian policy.

bmcfee · 2022-06-27T21:54:30Z

> I'm currently working on the packaging of resampy for Debian (hope you don't mind).

Not at all - thanks for putting in the effort!

> Do you plan to implement this in the near future? Do you already have a date in mind for the v0.3.0 release?
> If this is the case I will wait for the next release before submitting the upload request.
> Otherwise I will need to figure out some workaround that allows me to be compliant with the Debian policy.

I think so, yes. I took a bit of time this afternoon to prototype a newer version of the parameter solver using optuna. (The previous version used https://github.com/craffel/simple_spearmint/ which was never properly packaged.) As it currently stands, it reliably produces filter parameters that are pretty close to what the old version did. I want to experiment with it a bit more to see if I can bring the noise down following the thread in #75, but I think this will be doable for the 0.3 release.

avalentino · 2022-06-27T22:11:44Z

Thanks for your quick reply.
Unfortunately optuna is not currently available in Debian, so I'm not sure that the new script would solve my problem.

Would it be possible to have a copy of the data saved in txt format in the repo?
Probably this could help.

bmcfee · 2022-06-27T22:15:24Z

> Unfortunately optuna is not currently available in Debian, so I'm not sure that the new script would solve my problem.

Is that strictly necessary though? It wouldn't be a run-time dependency.

> Would it be possible to have a copy of the data saved in txt format in the repo?

Is .npz not sufficient for this?

avalentino · 2022-06-27T22:22:21Z

> > Unfortunately optuna is not currently available in Debian, so I'm not sure that the new script would solve my problem.

> Is that strictly necessary though? It wouldn't be a run-time dependency.

The idea is that the Debian package should be rebuilt entirely from sources in a Debian environment, without any access to the internet. Not having optuna in Debian is a blocker in this sense.
Of course, one could also create a Debian package for optuna, but this would require more effort.

> > Would it be possible to have a copy of the data saved in txt format in the repo?

> Is .npz not sufficient for this?

I fear it is not.
I will check again the policy and discuss with debian developers.

bmcfee · 2022-06-27T23:09:09Z

> The idea is that the Debian package should be rebuilt entirely from sources in a Debian environment, without any access to the internet.

I think we still satisfy that requirement if the data is provided. The packaged filter coefficients are just a cache of something you could compute directly with an explicit parametrization. While I agree that it would be great in principle to have this all end-to-end, it seems way overkill IMO. They wouldn't require this for something like icons or audio excerpts, right? What makes this any different?

> > Is .npz not sufficient for this?

> I fear it is not. I will check again the policy and discuss with debian developers.

That also seems weird to me. It's an open format, and generally preferable to a text-based encoding (which may be lossy via float<->decimal conversion).
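To make the float<->decimal point concrete, here is a small sketch comparing a binary `.npz` round trip with text round trips at two precisions. The array is random stand-in data, not the actual filter coefficients, and the `half_window` key name is only an illustrative choice.

```python
import io

import numpy as np

rng = np.random.default_rng(0)
coeffs = rng.standard_normal(1024)  # stand-in for float64 filter coefficients

# Binary round trip through an in-memory .npz archive.
buf = io.BytesIO()
np.savez(buf, half_window=coeffs)
buf.seek(0)
restored = np.load(buf)["half_window"]
npz_exact = np.array_equal(coeffs, restored)


def text_round_trip(values, fmt):
    """Format each value as decimal text, then parse it back to float64."""
    return np.array([float(format(v, fmt)) for v in values])


# A short decimal format drops low-order bits.
short_exact = np.array_equal(coeffs, text_round_trip(coeffs, ".8f"))

# 17 significant digits are enough to recover any float64 exactly.
long_exact = np.array_equal(coeffs, text_round_trip(coeffs, ".17g"))

print(npz_exact, short_exact, long_exact)
```

So a text encoding is only safe if it is written with enough significant digits, whereas the binary format preserves the values without any such care.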

avalentino · 2022-06-28T05:37:08Z

> > The idea is that the Debian package should be rebuilt entirely from sources in a Debian environment, without any access to the internet.

> I think we still satisfy that requirement if the data is provided. The packaged filter coefficients are just a cache of something you could compute directly with an explicit parametrization. While I agree that it would be great in principle to have this all end-to-end, it seems way overkill IMO. They wouldn't require this for something like icons or audio excerpts, right? What makes this any different?

Sorry, just so I understand: is it something that I can compute using the resampy.filters.sinc_window function?
If so, it is probably just a matter of documenting the parameters somewhere, e.g. in a dedicated README in the data folder.

bmcfee · 2022-06-28T13:34:24Z

> Sorry, just so I understand: is it something that I can compute using the resampy.filters.sinc_window function?

Basically yes. "kaiser_best" and "kaiser_fast" are cached versions of filters constructed by sinc_window. The concern of this issue is the code which selects which parametrization (beta, rolloff, maybe other parameters) should be cached, and this only needs to happen once (in 2016 😬). There is no runtime dependency, or even build-time dependency on this parameter optimization whatsoever.

avalentino · 2022-06-29T05:22:47Z

> > Sorry, just so I understand: is it something that I can compute using the resampy.filters.sinc_window function?

> Basically yes. "kaiser_best" and "kaiser_fast" are cached versions of filters constructed by sinc_window. The concern of this issue is the code which selects which parametrization (beta, rolloff, maybe other parameters) should be cached, and this only needs to happen once (in 2016 😬). There is no runtime dependency, or even build-time dependency on this parameter optimization whatsoever.

OK, do you have the parameters used to generate "kaiser_best" and "kaiser_fast" in resampy v0.2.2?
If my understanding is correct, the parameters documented in #98 are the new ones, right?

Having the parameters would completely solve my problem with the Debian packaging, because I can generate the binary files during the build process with a very simple script.
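A build-time regeneration script along these lines might look like the sketch below. It is only an assumption of how such a script could be structured: the `kaiser_sinc_half_window` helper is a NumPy-only stand-in written for this example (a real build would presumably call `resampy.filters.sinc_window` instead), the `example_filter` parameters are placeholders rather than the shipped values, and the `.npz` key names are illustrative.

```python
import os
import tempfile

import numpy as np


def kaiser_sinc_half_window(num_zeros, precision, beta, rolloff):
    """Right half of a Kaiser-windowed sinc (stand-in helper, not resampy's code).

    Returns the half window and the number of samples per zero crossing.
    """
    samples_per_zero = 2**precision
    n = num_zeros * samples_per_zero
    t = np.linspace(0, num_zeros, num=n + 1, endpoint=True)
    taper = np.kaiser(2 * n + 1, beta)[n:]
    return rolloff * np.sinc(rolloff * t) * taper, samples_per_zero


# Placeholder parameters: substitute the documented values for each real filter.
FILTERS = {
    "example_filter": {"num_zeros": 16, "precision": 9, "beta": 8.0, "rolloff": 0.85},
}


def write_filters(out_dir, filters=FILTERS):
    """Regenerate every filter from its parameters and save one .npz per filter."""
    paths = []
    for name, params in sorted(filters.items()):
        half, samples_per_zero = kaiser_sinc_half_window(**params)
        path = os.path.join(out_dir, name + ".npz")
        np.savez(path, half_window=half, precision=samples_per_zero,
                 rolloff=params["rolloff"])
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    (path,) = write_filters(tmp)
    with np.load(path) as data:
        half_window = data["half_window"]
        stored_precision = int(data["precision"])
        stored_rolloff = float(data["rolloff"])

print(len(half_window), stored_precision, stored_rolloff)
```

Nothing here needs network access or the parameter optimizer: the only inputs are the documented parameters, which is why the optimization step does not have to be a build dependency.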

avalentino · 2022-06-29T05:25:35Z

> OK, do you have the parameters used to generate "kaiser_best" and "kaiser_fast" in resampy v0.2.2?

Sorry, I have just realized that the parameters are already documented in the (current) docstring.
Probably only the precision is missing, but I can retrieve it anyway.

bmcfee · 2022-06-29T11:39:13Z

> Probably only the precision is missing, but I can retrieve it anyway.

Yeah, sorry for that - the open PR #98 documents this more fully. The precision values are stored in the data files though, so all the information is there.

avalentino · 2022-06-29T12:19:33Z

Thanks a lot, @bmcfee.
The package is now ready.
It should hopefully go into the main archive in a couple of weeks.

bmcfee · 2022-06-29T12:55:56Z

Very cool - thanks!

I'll also plan to have the 0.3.0 release done up soon, and the upgrade process should be pretty easy.

avalentino · 2022-06-29T15:14:15Z

Yes, after the first upload to the Debian archive I should be able to update to new versions very quickly.

bmcfee added functionality packaging labels Jun 27, 2022

bmcfee added this to the 0.3.0 milestone Jun 27, 2022

bmcfee mentioned this issue Jun 28, 2022

Quality issues? #75

Closed

bmcfee added a commit that referenced this issue Jun 28, 2022

fixing #96, #75 - update filter generation and new filters

a927af7

bmcfee mentioned this issue Jun 28, 2022

Parameter generation #98

Merged

bmcfee closed this as completed in #98 Jun 29, 2022
