Improve default filters and make the whole process reproducible #96

Closed
bmcfee opened this issue Jun 27, 2022 · 14 comments · Fixed by #98

Comments

@bmcfee
Owner

bmcfee commented Jun 27, 2022

Issue #75 raised some questions about the pre-packaged default filters that we ship with, and whether they could be improved. (I expect the answer is "yes".)

Previously, the filter parameters were selected by a Gaussian process hyperparameter optimization (#8), as implemented in this gist: https://gist.github.com/4aa4c959bb0d310e3f12cdedf91d7661

The above notebook worked well enough given the constraints and tools of the time, but I did have to dredge it out of an old laptop. Properly this functionality should be included in the repository, and be fully reproducible (with rng seeds and all). Doing this will make it easier to improve the filters going forward. It would also make it possible to experiment with building a larger parameter search into the process.

If we reimplement this, it probably makes sense to discuss the window design objective (a little ad hoc at the moment) and look into more modern tooling for GP search (e.g., hyperopt).
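
To illustrate what "reproducible, with rng seeds and all" could look like, here is a minimal sketch. It uses plain random search from the Python standard library rather than a GP search, and `toy_objective`, the search ranges, and the function names are hypothetical placeholders, not the objective or code used for the shipped filters.

```python
import random


def toy_objective(beta, rolloff):
    # Hypothetical stand-in for a window design objective (lower is better).
    # It is NOT the objective used to select resampy's filters.
    return (beta - 12.0) ** 2 / 100.0 + (rolloff - 0.9) ** 2


def seeded_random_search(seed, n_trials=200):
    # A private Random instance keeps the search independent of global state,
    # so the same seed always yields the same sequence of candidates.
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        beta = rng.uniform(1.0, 30.0)      # assumed search range
        rolloff = rng.uniform(0.5, 1.0)    # assumed search range
        score = toy_objective(beta, rolloff)
        if best is None or score < best[0]:
            best = (score, beta, rolloff)
    return best


first = seeded_random_search(seed=20220627)
second = seeded_random_search(seed=20220627)
print(first == second)  # identical runs for identical seeds
```

Recording the seed alongside the selected parameters would let anyone re-run the search and check that they land on the same values.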

@bmcfee bmcfee added this to the 0.3.0 milestone Jun 27, 2022
@avalentino
Contributor

Dear @bmcfee, I'm currently working on the packaging of resampy for Debian (hope you don't mind).
According to the Debian policy, it would be important to be able to regenerate the data files containing the filters during the build process.
For this reason, having this issue closed would be the ideal solution.

Do you plan to implement it in the near future? Do you already have a date in mind for the v0.3.0 release?
If so, I will wait for the next release before submitting the upload request.
Otherwise, I will need to figure out a workaround that allows me to comply with the Debian policy.

@bmcfee
Owner Author

bmcfee commented Jun 27, 2022

> I'm currently working on the packaging of resampy for Debian (hope you don't mind).

Not at all - thanks for putting in the effort!

> Do you plan to implement it in the near future? Do you already have a date in mind for the v0.3.0 release?
> If so, I will wait for the next release before submitting the upload request.
> Otherwise, I will need to figure out a workaround that allows me to comply with the Debian policy.

I think so, yes. I took a bit of time this afternoon to prototype a newer version of the parameter solver using optuna. (The previous version used https://github.com/craffel/simple_spearmint/ which was never properly packaged.) As it currently stands, it reliably produces filter parameters that are pretty close to what the old version did. I want to experiment with it a bit more to see if I can bring the noise down following the thread in #75, but I think this will be doable for the 0.3 release.

@avalentino
Contributor

Thanks for your quick reply.
Unfortunately, optuna is not available in Debian currently, so I'm not sure that the new script would solve my problem.

Would it be possible to have a copy of the data saved in txt format in the repo?
That would probably help.

@bmcfee
Owner Author

bmcfee commented Jun 27, 2022

> Unfortunately, optuna is not available in Debian currently, so I'm not sure that the new script would solve my problem.

Is that strictly necessary though? It wouldn't be a run-time dependency.

> Would it be possible to have a copy of the data saved in txt format in the repo?

Is .npz not sufficient for this?

@avalentino
Contributor

> > Unfortunately, optuna is not available in Debian currently, so I'm not sure that the new script would solve my problem.
>
> Is that strictly necessary though? It wouldn't be a run-time dependency.

The idea is that the Debian package should be rebuilt entirely from sources in a Debian environment, without any access to the internet. Not having optuna in Debian is a blocker in this sense.
Of course, one could also create a Debian package for optuna, but this would require more effort.

> > Would it be possible to have a copy of the data saved in txt format in the repo?
>
> Is .npz not sufficient for this?

I fear it is not.
I will check the policy again and discuss with the Debian developers.

@bmcfee
Owner Author

bmcfee commented Jun 27, 2022

> The idea is that the Debian package should be rebuilt entirely from sources in a Debian environment, without any access to the internet.

I think we still satisfy that requirement if the data is provided. The packaged filter coefficients are just a cache of something you could compute directly with an explicit parametrization. While I agree that it would be great in principle to have this all end-to-end, it seems way overkill IMO. They wouldn't require this for something like icons or audio excerpts, right? What makes this any different?

> > Is .npz not sufficient for this?
>
> I fear it is not. I will check the policy again and discuss with the Debian developers.

That also seems weird to me. It's an open format, and generally preferable to a text-based encoding (which may be lossy via float<->decimal conversion).
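
As a small illustration of the float<->decimal point, the sketch below checks whether a value survives a round trip through text. The helper name `survives_text_roundtrip` is hypothetical. The loss only occurs when too few digits are written: 17 significant digits are enough to round-trip any 64-bit float.

```python
def survives_text_roundtrip(value, fmt):
    # Format the float as decimal text, parse it back, and compare exactly.
    return float(format(value, fmt)) == value


x = 0.1 + 0.2  # 0.30000000000000004 as a 64-bit float

print(survives_text_roundtrip(x, ".6f"))   # False: six decimals drop the low bits
print(survives_text_roundtrip(x, ".17g"))  # True: 17 significant digits round-trip
```

So a text format can be made lossless, but only if whoever writes the file picks a sufficient precision; a binary format such as .npz avoids the question entirely.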

@avalentino
Contributor

> > The idea is that the Debian package should be rebuilt entirely from sources in a Debian environment, without any access to the internet.
>
> I think we still satisfy that requirement if the data is provided. The packaged filter coefficients are just a cache of something you could compute directly with an explicit parametrization. While I agree that it would be great in principle to have this all end-to-end, it seems way overkill IMO. They wouldn't require this for something like icons or audio excerpts, right? What makes this any different?

Sorry, just for me to understand: is it something that I can compute using the resampy.filters.sinc_window function?
If so, it is probably just a matter of documenting the parameters somewhere, e.g. in a dedicated README in the data folder.

@bmcfee
Owner Author

bmcfee commented Jun 28, 2022

> Sorry, just for me to understand: is it something that I can compute using the resampy.filters.sinc_window function?

Basically yes. "kaiser_best" and "kaiser_fast" are cached versions of filters constructed by sinc_window. The concern of this issue is the code which selects which parametrization (beta, rolloff, maybe other parameters) should be cached, and this only needs to happen once (in 2016 😬). There is no runtime dependency, or even build-time dependency on this parameter optimization whatsoever.
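
As a rough illustration of what "a filter constructed from an explicit parametrization" means, the sketch below builds the right half of a Kaiser-windowed sinc filter in pure Python. It is not resampy's implementation; the function names and the example parameter values are assumptions chosen for illustration, not the values behind "kaiser_best" or "kaiser_fast".

```python
import math


def bessel_i0(x):
    # Modified Bessel function of the first kind, order 0, via its power
    # series: I0(x) = sum_k ((x / 2)^k / k!)^2.
    total, term, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_sinc_half(num_zeros, precision, rolloff, beta):
    # Right half of a symmetric filter: 2**precision samples per zero
    # crossing, covering num_zeros zero crossings of the sinc.
    n = (2 ** precision) * num_zeros
    i0_beta = bessel_i0(beta)
    taps = []
    for i in range(n + 1):
        t = num_zeros * i / n                  # position in zero-crossing units
        x = math.pi * rolloff * t
        sinc = 1.0 if x == 0.0 else math.sin(x) / x
        r = i / n                              # 0 at the center, 1 at the edge
        taper = bessel_i0(beta * math.sqrt(1.0 - r * r)) / i0_beta
        taps.append(rolloff * sinc * taper)
    return taps


# Illustrative values only (assumed, not the packaged ones).
taps = kaiser_sinc_half(num_zeros=4, precision=3, rolloff=0.9, beta=8.0)
print(len(taps), taps[0])
```

Given the same four numbers, the coefficients come out the same every time, which is why documenting the parameters is enough to regenerate a cached filter.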

@avalentino
Contributor

> > Sorry, just for me to understand: is it something that I can compute using the resampy.filters.sinc_window function?
>
> Basically yes. "kaiser_best" and "kaiser_fast" are cached versions of filters constructed by sinc_window. The concern of this issue is the code which selects which parametrization (beta, rolloff, maybe other parameters) should be cached, and this only needs to happen once (in 2016 😬). There is no runtime dependency, or even build-time dependency on this parameter optimization whatsoever.

OK, do you have the parameters used to generate "kaiser_best" and "kaiser_fast" in resampy v0.2.2?
If my understanding is correct, the parameters documented in #98 are the new ones, right?

Having the parameters would completely solve my problem with the Debian packaging, because I could generate the binary files during the build process with a very simple script.

@avalentino
Contributor

> OK, do you have the parameters used to generate "kaiser_best" and "kaiser_fast" in resampy v0.2.2?

Sorry, I have just realized that the parameters are already documented in the (current) docstring.
Probably only the precision is missing, but I can retrieve it anyway.

@bmcfee
Owner Author

bmcfee commented Jun 29, 2022

> Probably only the precision is missing, but I can retrieve it anyway.

Yeah, sorry for that - the open PR #98 documents this more fully. The precision values are stored in the data files though, so all the information is there.

@avalentino
Contributor

Thanks a lot @bmcfee.
The package is now ready.
It should hopefully go into the main archive in a couple of weeks.

@bmcfee
Owner Author

bmcfee commented Jun 29, 2022

Very cool - thanks!

I'll also plan to have the 0.3.0 release done up soon, and the upgrade process should be pretty easy.

@avalentino
Contributor

Yes, after the first upload to the Debian archive I should be able to update to new versions very quickly.


2 participants