Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantity Support in Modeling [continued] #4615

Closed
wants to merge 33 commits into from

Conversation

astrofrog
Copy link
Member

Read me first!

This is a rebased continuation of @embray's #3852, and is still a work in progress, but I'll be pushing from time to time to make sure the CI doesn't break.

At this stage please only review the tests, which show a proposed API - do not comment on implementation (which is incomplete)

To-dos (some to-do list items are taken from @embray's list) - not all have to be done for this PR

# TODO: numerical test of results

# A more complex test with an equivalency that depends on x values
g_init.output_equivalencies = u.brightness_temperature, (3e-5 * u.sr, 'x')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comma here is intentional and required, i.e. g_init.output_equivalencies = u.brightness_temperature(3e-5 * u.sr, 'x') will not work because x needs to be interpreted by the fitter? This is a bit subtle and easy-to-miss. Maybe output_equivalencies should be required to be a dict of {'equivalencies': ..., 'kwargs': ...} or similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is needed because 'x' is not known until the model is evaluated. I described this a little more in test_output_units_equivalencies_with_parameters. There may indeed be better ways to specify this syntax-wise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope the implementation doesn't put any knowledge of quantities into the actual fitters (or at the very least simple wrappers around them).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perrygreenfield - indeed, the idea in the fitters will be to get rid of units as soon as possible and add them back later

@cdeil
Copy link
Member

cdeil commented Feb 20, 2016

@astrofrog - Thanks! I skimmed the tests and this looks intuitive and nice.

What about parameter errors? I think astropy.modeling has some support for parameter errors or at least retrieving the covariance matrix. The matrix, if stored as a numpy array, has to be without units (because every parameter can have different units), and then if a user computes a parameter error you'd have to copy over the unit from the parameter? Or maybe there's nothing to do because computing parameter errors from the covariance matrix or in other ways is not part of the astropy.modeling API?

@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2016

@cdeil, @astrofrog - for covariance matrices with different units for each element, there is an issue about this: #3777. As is, one can "cheat" and construct an ndarray filled with Quantity objects.

@astrofrog
Copy link
Member Author

Thanks everyone for the feedback so far!

Here are my own thoughts on this API, triggered by your comments. The current system (including the unit spec, etc.) proposed here may be unnecessarily complicated in terms of setting up models that will be used to fit data with units. The main issue is that this API currently requires the model units to be fully spec-ed out before one can fit a model to data with units. In other words,

m_init = Polynomial(degree=5)
m = fit(m_init, x * u.s, y * u.m)

would not work (at least the way I've written the fitting tests at the moment). However, I think it's pretty clear that this should work and furthermore that m should end up being a model with parameters that have units attached to them.

One crucial piece of the puzzle to get this to work would be a way to transform a unitless model into a unitful model given the units of the data alone. In other words, if I have:

p = Polynomial1D(degree=5)

I should be able to do (with a better method name):

pu = p.with_units_from_data(u.s, u.m)

a pu would then be a model with parameters that are quantities. What this would do is simply attach units to the parameters without changing any values. This then allows two use cases. First:

m_init = Polynomial1D(degree=5)
m = fit(m_init, x * u.s, y * u.m)

In this case, internally, the fitting method can drop the data units, do the fitting the normal way, then at the end call with_units_from_data and return a model with units on the parameters (which means there would be almost no performance overhead to this kind of fitting).

The other thing this allows is for users to specify initial values in a unitful way:

m_init = Polynomial(degree=5).with_units_from_data(x * u.s, y * u.m)
m_init.c5 = 3 * u.m
m = fit(m_init, x * u.s, y * u.m)

The question here is, can the units of the parameters always be determined from the units of the data the model is fit to? There are cases where the units of the model could be arbitrary and not related to the final data units (e.g. angle) but in those cases we can just pick arbitrarily. Are there cases where this logic wouldn't work?

In terms of models alone (forgetting about fitting) we could allow parameters to be set to quantities and quantities to be passed as input to models in the same way as any other function, but without having any constraints hard-coded in the model (I think this is what @hamogu and @mhvk are suggesting) - the quantity arithmetic in evaluate would take care of checking consistency.

Note that this and the previous suggestion about adding units to models then simplifies things a lot in some cases - for instance, if you're dealing with a Gaussian, you no longer need to worry about having to spec out that the mean and sigma need the same units. If you're just evaluating the model, this is taken care of by the arithmetic in evaluate, and if you're fitting, then with_units_from_data simply sets the units of mean and sigma to the units of x.

There are a couple more things one can consider:

  • with_units_from_data could also work on unitful models, converting a model in one set of units to another set of units.
  • We could have a mirror method without_units that returns a models with all the units stripped.

@hamogu @mhvk @perrygreenfield @cdeil @hcferguson @Cadair @keflavich - what do you think about these suggestions?

# Instantiate with a different, but compatible unit
m = TestModel(2.0 * u.pc)
assert m.a.unit == u.pc
assert m.a.value == 2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it doesn't convert to the default units in this case?

@cdeil
Copy link
Member

cdeil commented Feb 22, 2016

The question here is, can the units of the parameters always be determined from the units of the data the model is fit to? There are cases where the units of the model could be arbitrary and not related to the final data units (e.g. angle) but in those cases we can just pick arbitrarily. Are there cases where this logic wouldn't work?

Is this possible / does this make sense? What would be the unit of the amplitude parameter for the PowerLaw model "inferred from data"?
I'm using power-laws that represent rates (units: s^-1), fluxes (units: cm^-2 s^-1 TeV^-1), surface brightness (units: cm^-2 s^-1 TeV^-1 sr^-1) ... how can this unit be inferred?

@astrofrog
Copy link
Member Author

Is this possible / does this make sense? What would be the unit of the amplitude parameter for the PowerLaw model "inferred from data"?

@cdeil - for a power law of the form y = amplitude * (x/x0) ^ alpha, the units are:

  • amplitude has the units of y
  • x0 has the units of x
  • alpha has the units of u.one

Does this make sense? There may be cases I'm not thinking of where this is not possible though?

@astrofrog
Copy link
Member Author

So just to clarify, if you are fitting data in units of rates (units: s^-1) then the amplitude would be a rate.

@cdeil
Copy link
Member

cdeil commented Feb 22, 2016

@astrofrog - Yes, that makes sense. Thanks for explaining.

As to whether this is a good idea to infer units from data automatically --- I don't know.
I guess it can be convenient, but this is also worth considering:

>>> import this
...
Explicit is better than implicit.
...

@astrofrog
Copy link
Member Author

@cdeil - in the case of a 5th order polynomial, you probably wouldn't want to have to set the units of all the coefficients manually though. Do you mean that the following should return a model without units unless explicitly requested with an optional argument in fit?

m_init = Polynomial1D(degree=5)
m = fit(m_init, x * u.s, y * u.m)

@cdeil
Copy link
Member

cdeil commented Feb 22, 2016

I agree that manually setting units for parameters of a polynomial model could be painful.
But that's a bit of a special case, no?
And maybe automatically inferring units there doesn't buy you much to the alternative: drop units for the polynomial evaluate / fit?

@astrofrog
Copy link
Member Author

But that's a bit of a special case, no?

Not necessarily - imagine you have a model with two gaussians and one polynomial background - in the current API described in this PR, you'd have to set the units on each of them before you try and fit the data, which is a fair number of parameters (yes the units would be simpler, but there's still a few...)

@cdeil
Copy link
Member

cdeil commented Feb 22, 2016

@astrofrog You seem pretty sure that inferring the parameter units is a good idea. I'm just a sceptical guy and especially grumpy today. So please ignore me and go ahead!

@pllim
Copy link
Member

pllim commented May 2, 2016

I moved over the 1.2 milestone from #3852, feel free to change it as appropriate.

@astrofrog
Copy link
Member Author

See #4855 for a simplified version of this PR

@astrofrog astrofrog removed this from the v1.2.0 milestone May 13, 2016
nden added a commit that referenced this pull request Jun 10, 2017
Implementation of units in models [simplified version of #4615]
@astrofrog
Copy link
Member Author

This was superseded by #4855, which has now been merged

@astrofrog astrofrog closed this Jun 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet