Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for named parameters #386

Merged
merged 9 commits into from
Jun 23, 2021
Merged

Support for named parameters #386

merged 9 commits into from
Jun 23, 2021

Conversation

tmcclintock
Copy link
Contributor

This PR closes #385 by giving support for named parameters in the EnsembleSampler.

Specifically this PR:

  • allows a user to pass in a list of names (types as str but really can be anything) for the parameters to the constructor for EnsembleSampler. Checks are made to ensure that the number of unique names are ndim
  • In the EnsembleSampler.compute_log_prob method, these names are used to reshape the parameters passed in to the method (having previously been generated by the Move) so that instead of an ndarray of shape (nwalkers, ndim) it becomes a list of length nwalkers of dictionaries of length ndim, with each key-value pair being the names and parameters
  • A helper function was written to perform the ndarray -> List[Dict[str, np.number]] conversion, so that it could be tested separately from the compute_log_prob method (higher testable surface area)
  • test of this functionality were placed in the newly created src/emcee/tests/unit/test_ensemble.py file. I tested the helper method, compute_log_prob, and run_mcmc
  • very small bits of cleanup in ensemble.py to be more PEP8 compliant/more pythonic

As discussed in #385, this PR makes no assumptions about the kinds of likelihood functions or underlying models being utilized. It does not take gradients of the likelihood, nor does it break the API (frontend or backend) in any way. I believe the unit test in in the TestNamedParametes tests make it clear why this is useful. Positional bugs will be a thing of the past!

Support for named parameters could go a bit further. For example:

  1. I didn't allow vectorize=True when naming the parameters. I guess the right thing to allow for would be if vectorize=True then we produce one dictionary where the keys are the parameter names and the values are 1D arrays of each parameter.
  2. I did not implement support for parameter names in the PTSampler. IMO doing this would be "double work" and from a software perspective it would make more sense (IMO) to make an abstract Sampler class that handled the names, and that both the EnsembleSampler and PTSampler would subclass. But, this is obviously a non-trivial refactor (that I would be happy to contribute to).

Thanks @dfm for the discussion on #385. I look forward to your review.

@dfm
Copy link
Owner

dfm commented Apr 25, 2021

Thanks @tmcclintock: This looks like a good start!

I'll take a closer look ASAP, but I think I'm overall very happy with this approach. I think the one thing I would like to have would be to allow each parameter to have an arbitrary shape. This will complicate the logic a bit, but I think it would make this more impactful.

@dfm
Copy link
Owner

dfm commented Apr 25, 2021

Also: scipy needs to be an optional dependency even for the tests.

@tmcclintock
Copy link
Contributor Author

tmcclintock commented Apr 25, 2021

@dfm thanks for the quick response.

RE each parameter have an arbitrary shape - Going by this docstring I was under the impression that each parameter was a scalar. Are you saying you want to change the signature to EnsembleSampler.sample such that initial_state could be (List[Union[np.number, np.ndarray]]) (i.e. params[0] was a scalar but params[1] could be an array)? If not, can you please clarify the pattern you are requesting?

RE scipy in all tests - can you elaborate? Do you mean just import scipy at the top? That should not be necessary unless there is some magic in the github actions you set up. EDIT: nevermind I see what you mean. The CI/CD failed because I used scipy, but we don't want to assume users have it installed. I will fix this.

@dfm
Copy link
Owner

dfm commented Apr 25, 2021

RE each parameter have an arbitrary shape - Going by this docstring I was under the impression that each parameter was a scalar. Are you saying you want to change the signature to EnsembleSampler.sample such that initial_state could be (List[Union[np.number, np.ndarray]]) (i.e. params[0] was a scalar but params[1] could be an array)? If not, can you please clarify the pattern you are requesting?

I'm not suggesting changing the syntax, I'm just saying that if you're adding names to elements of the vector space, there's no reason why there needs to be one name per dimension. My parameters are often multivariate, and it would be annoying to manually add param_0, param_1, ... as names manually, vs. calling it param and providing a shape. Right now, you're requiring that a name maps to a scalar and that seems artificial if we're adding this logic anyways!

If this isn't clear, I'll try to put together a little demo this week.

@tmcclintock
Copy link
Contributor Author

tmcclintock commented Apr 26, 2021

I see, so for instance in a GMM you would essentially like to do:

params["weights"].shape
# (2,)  # component weights
params["means"].shape
# (2,)
params["cov"].shape
# (3,)  # independent elements of the cov matrix

Correct?

For an MVP this is simple -- if we restrict keywords to be either single values or consecutive numbers in the parameter array then this is easy. If instead we wanted to allow a keyword to map to an arbitrary set of indices in the params array, this is tricky (like if params["a"] mapped to the first, fifth, and tenth parameters in the params array, this may be a tricky pattern to support in general).

Are you OK with allowing only consecutive parameters?

@dfm
Copy link
Owner

dfm commented Apr 26, 2021

Yes this sounds about right and I would be totally happy with requiring contiguous values!

@tmcclintock
Copy link
Contributor Author

@dfm I have updated the PR. Now the following pattern is supported:

def ln_prob(params):
    mean1, mean2 = params["means"]
    var1, var2 = params["vars"]
    constant = params["constant"]
    ...


sampler = EnsembleSampler(..., parameter_names = {"means": [0, 1], "vars": [2, 3], "constant": 4})

That is, paramter_names can be any of:

  • a list of names ["a", "b", "c"]
  • a dictionary where the values can be integer positions of parameters or list of integers {"a": [1, 3, 5], "b": 0, "c": [2, 4]}

So, I was able to support non-contiguous parameter indexes.

Let me know if this all looks good. If you'd like docs work done I can either do that here or in another PR.

@tmcclintock
Copy link
Contributor Author

@dfm can you push the button on the workflow? I'd like to tackle any problems that might occur ahead of your review.

@dfm
Copy link
Owner

dfm commented May 3, 2021

@tmcclintock: Weird - I thought I already had. Maybe I have to do it each time?

@dfm
Copy link
Owner

dfm commented May 3, 2021

Don't worry about the coverage failure.

@dfm dfm merged commit 0311f0b into dfm:main Jun 23, 2021
@dfm
Copy link
Owner

dfm commented Jun 23, 2021

@tmcclintock: Sorry about the delay on this! I think this looks awesome so I've merged and we can address issues as they come up in other threads. If you'd be up for adding a tutorial about this, that would be awesome. Want to start another PR whenever you have a chance?

Thanks again for this!

@tmcclintock
Copy link
Contributor Author

@dfm no problem, and sure, I'd be happy to.

@cluhmann
Copy link

Sorry to revive this, but did the tutorial mentioned ever get put together? If so, could you point me to it? I can probably figure it out on my own, but I figured I would check. I'm super excited about this feature. Thanks!

@tmcclintock
Copy link
Contributor Author

@cluhmann I have not written the tutorial. My bad :/. You can see an example of how to set up a function with named parameters in the unit test found in src/emcee/tests/unit/test_ensemble.py.

@cluhmann
Copy link

No worries. I figured it out (just being lazy), but the tests help. Thanks again for the feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] support for dictionaries and/or namedtuples
3 participants