Enhancement Request, Display Int values in print statement, cast int values to GP process #13

Closed
mpearmain opened this issue Mar 1, 2016 · 7 comments

Comments

@mpearmain

Is it possible to have a better way to print int values?
(Even better would be the ability to pass them to the GP as ints.)

Simple example, with Random Forests:

RFC(n_estimators=int(n_estimators),
        min_samples_split=int(min_samples_split),
        max_features=min(max_features, 0.999),
        random_state=2)

When running it may print out:
n_estimators = 10.3456, min_samples_split = 2.35643, max_features=0.99

I would expect it to print out:
n_estimators = 10, min_samples_split = 2, max_features=0.99

Going deeper, because floats are being passed, it may search the 'same' space again:

n_estimators = 10.3456, min_samples_split = 2.35643, max_features=0.99
n_estimators = 10.6334, min_samples_split = 2.12329, max_features=0.99

It seems rather wasteful to search int spaces as continuous floats.

I say this with no idea of how this affects the underlying GP process that is being called.
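
To illustrate the point about searching the 'same' space, here is a minimal sketch that casts the two float suggestions from the example the same way the RFC call does, and shows that they collapse to one configuration. The helper name `effective_params` is hypothetical and not part of any package.

```python
def effective_params(n_estimators, min_samples_split, max_features):
    """Return the values the model actually receives after casting."""
    return (
        int(n_estimators),
        int(min_samples_split),
        min(max_features, 0.999),
    )


# The two points suggested by the optimizer in the example.
first = effective_params(10.3456, 2.35643, 0.99)
second = effective_params(10.6334, 2.12329, 0.99)

print(first)
print(second)
print(first == second)  # both describe the same model
```

Both calls yield the same tuple, so the second evaluation would fit an identical model.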

@fmfn
Member

fmfn commented Mar 2, 2016

I totally agree with that and the int(param) hack always rubbed me the wrong way. To be honest, I haven't really been able to come up with a simple and elegant way to deal with this issue.

In theory I imagine that it should be possible to define an appropriate kernel for the GP while also constraining the acquisition function optimization (or at least its output) to natural numbers. There's definitely a way, I just need to come up with something straightforward to keep in line with this package's theme.

In practice I find that adding a nugget to the "direction" of integer parameters helps the GP by providing it with some extra flexibility. I suppose I could wrap this in a single option when creating the object.

Let me see what I can do.

@gordania

gordania commented Jan 3, 2017

Hi @fmfn, is there any update on this issue? I'm experiencing the same problem, and it's causing unnecessary iterations to be run, as @mpearmain outlined.

Thanks for your code!!

@fmfn
Member

fmfn commented Jan 4, 2017

To be honest, I'm not sure how, or even whether, I want to tackle this. I can think of a couple of hacks to mitigate some of the issues, such as repeated sampling; however, I am not aware of any simple and elegant solution that can scale to arbitrarily many integer parameters.

Since users who need a more comprehensive and robust toolset will probably favour something like spearmint, I tend to avoid adding features to this package unless they add more value than overhead.
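
One possible hack for the repeated-sampling issue, sketched under assumptions: cache the objective by its integer-cast parameters so a repeated configuration is not re-evaluated. This is not necessarily the mitigation meant above. `make_cached_objective` and `int_params` are hypothetical names, and the toy objective stands in for a real model fit.

```python
def make_cached_objective(objective, int_params):
    """Wrap `objective` so integer parameters are cast and results reused."""
    cache = {}
    calls = {"evaluated": 0}

    def wrapped(**params):
        # Cast only the parameters declared as integers.
        cast = {
            name: int(value) if name in int_params else value
            for name, value in params.items()
        }
        key = tuple(sorted(cast.items()))
        if key not in cache:
            calls["evaluated"] += 1
            cache[key] = objective(**cast)
        return cache[key]

    wrapped.calls = calls
    return wrapped


def toy_objective(n_estimators, max_features):
    # Placeholder for fitting and scoring a model.
    return -(n_estimators - 12) ** 2 + max_features


cached = make_cached_objective(toy_objective, int_params={"n_estimators"})
a = cached(n_estimators=10.3456, max_features=0.99)
b = cached(n_estimators=10.6334, max_features=0.99)

print(a == b, cached.calls["evaluated"])
```

This only saves the cost of the expensive evaluation. The optimizer still sees two distinct float points with the same target, so the GP side of the problem is untouched.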

I'll keep looking around though, see if I find any papers on this subject, maybe someone has figured out a nice solution and I just need to find it.

@jvmncs

jvmncs commented Mar 1, 2017

Once constraints are implemented into the package, I might be able to make a go at this. The idea would be to keep a running count of how many times a point has been searched, and once that count has reached a certain threshold, you add a constraint that eliminates a precisely defined neighborhood around that point from the search space (something like abs(x_i-y_i)-.5>=0). Of course, we'd have to keep the user abreast of the situation with appropriate printing. This will make more sense once I finish my constraints idea. I'll return to this once that's done.

It's possible that this would add too many constraints for SciPy's optimizer to handle. Not sure how its performance works, but we'll see.
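
A rough sketch of the counting idea, independent of any optimizer: points that cast to the same integers share one cell, and once a cell has been visited `threshold` times, candidates inside the box around its center are rejected. The class `VisitTracker`, the use of cell centers, and the "at least one coordinate is 0.5 away" reading of the constraint are all assumptions made for illustration. It also assumes every dimension is integer-valued.

```python
import math
from collections import Counter


class VisitTracker:
    """Count visits per integer cell and flag cells visited too often."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, point):
        # Points that cast to the same integers share one cell.
        cell = tuple(math.floor(v) for v in point)
        self.counts[cell] += 1

    def excluded_centers(self):
        # Center of each over-visited cell, e.g. cell (10, 2) -> (10.5, 2.5).
        return [
            tuple(c + 0.5 for c in cell)
            for cell, n in self.counts.items()
            if n >= self.threshold
        ]

    def constraint(self, candidate, center):
        # Non-negative when the candidate lies outside the box around
        # `center`: max_i |x_i - y_i| - 0.5 >= 0
        return max(abs(x - y) for x, y in zip(candidate, center)) - 0.5

    def is_allowed(self, candidate):
        return all(
            self.constraint(candidate, center) >= 0
            for center in self.excluded_centers()
        )


tracker = VisitTracker(threshold=2)
tracker.record((10.3456, 2.35643))
tracker.record((10.6334, 2.12329))

print(tracker.excluded_centers())
print(tracker.is_allowed((10.9, 2.7)))  # same cell as the visited points
print(tracker.is_allowed((11.2, 2.7)))  # different n_estimators cell
```

Each excluded cell adds one constraint, so the list grows with the number of over-visited cells, which is where the concern about the optimizer's capacity comes in.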

@sharthZ23

@fmfn How about just adding a callback for printing?

@fmfn
Member

fmfn commented Jan 11, 2018

That could be an option.

@fmfn
Member

fmfn commented Nov 24, 2018

This can now be accomplished by building a logging observer (or extending the existing one). In the upcoming 1.0 release this feature might be included by default. Worst case scenario, instructions on how to do it will be available.
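
A minimal sketch of what such an observer might look like. It assumes the observer receives `update(event, instance)` calls and that `instance.res` is a list of dicts with `"target"` and `"params"` keys. The class name `IntAwareLogger`, its `int_params` argument, and the `FakeOptimizer` stand-in are hypothetical, so check the package documentation for the actual observer interface.

```python
class IntAwareLogger:
    """Observer-style object that prints selected parameters as integers."""

    def __init__(self, int_params):
        self.int_params = set(int_params)
        self.lines = []

    def format_params(self, params):
        parts = []
        for name, value in params.items():
            if name in self.int_params:
                # Show the value the model will actually receive.
                parts.append(f"{name} = {int(value)}")
            else:
                parts.append(f"{name} = {value:.4g}")
        return ", ".join(parts)

    def update(self, event, instance):
        # Assumption: the latest result is the last entry of `instance.res`.
        latest = instance.res[-1]
        line = (
            f"target = {latest['target']:.4g} | "
            f"{self.format_params(latest['params'])}"
        )
        self.lines.append(line)
        print(line)


class FakeOptimizer:
    """Stand-in for the optimizer so the sketch runs without the package."""

    res = [
        {
            "target": 0.8731,
            "params": {
                "n_estimators": 10.3456,
                "min_samples_split": 2.35643,
                "max_features": 0.99,
            },
        }
    ]


logger = IntAwareLogger(int_params=["n_estimators", "min_samples_split"])
logger.update("optimization:step", FakeOptimizer())
```

With the real package, the logger would presumably be attached with something like `optimizer.subscribe(Events.OPTIMIZATION_STEP, logger)`. That call is assumed here and should be verified against the released version.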

@fmfn fmfn closed this as completed Nov 24, 2018