Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of units in models [simplified version of #4615] #4855

Merged
merged 86 commits into from
Jun 10, 2017

Conversation

astrofrog
Copy link
Member

@astrofrog astrofrog commented May 9, 2016

This is a simplified version of #4615 - in particular it no longer includes all the unit spec implementation, and I've tried to remove any changes unrelated to the functionality being added here

Feedback needed! it would be great if people could try and clone this branch and try out various cases with the Gaussian1D model. In addition, I would really appreciate code reviews!

NOTE: please read my latest comment before reviewing this!

This pull request finishes the basic implementation of support for units in models and fitting. I'll try and give some background to the main changes here. At the moment, I've set units as opt-in on a per-model basis, and only the Gaussian1D model supports units for now.

Units on parameters and unit-friendly evaluation

For models that support units, parameters can now be quantities:

>>> from astropy.modeling.models import Gaussian1D
>>> g = Gaussian1D(amplitude=1 * u.J, mean=1 * u.m, stddev=0.1 * u.m)

Support for units of parameters and evaluation of models with parameters that have units can be enhanced by adding a propery called input_units that returns the units that the input should be in (and None can be used to indicate that any unit can be passed in). For Gaussian1D, this is:

    @property
    def input_units(self):
        if self.mean.unit is None:
            return {'x': dimensionless_unscaled}
        else:
            return {'x': self.mean.unit}

This allows us to check that the input has the correct units:

>>> g(1)
...
UnitsError: Units of input 'x', (dimensionless), could not be converted to required input units of m (length)

>>> g(3 * u.cm)
Out[13]: <Quantity 3.703531977652397e-21 J>

Note that input_units is just a convenience that allows us to give clearer error messages, but even if it is not present, quantities will still be passed to the evaluation function which is free to raise its own errors (this was a point that @hamogu and @mhvk made in #4615 (comment)).

Equivalencies on parameters

Equivalencies on parameters are deliberately not supported. This is because it would introduce conceptual difficulties: a Gaussian in wavelength space is not the same as a Gaussian in frequency space.

Equivalencies on input

However, one place where equivalencies can be unambiguous is when evaluating the model, for the input data values. These equivalencies are passed through a new keyword argument equivalencies in the model call:

>>> g(3 * u.Hz, equivalencies={'x': u.spectral()})
<Quantity 0.0 J>

This requires the input_units property to exist and converts the input values to the input_units units.

Fitting models/data with units

Adding support for fitting models to data with units is done by adding a method named _parameter_units_for_data_units (for lack of a better name) that takes the units of the data and returns the units that each parameter would have to make sure all the units were consistent. For Gaussian1D, this looks like:

    def _parameter_units_for_data_units(self, xunit, yunit, zunit):
        return OrderedDict([('mean', xunit),
                            ('stddev', xunit),
                            ('amplitude', yunit)])

In order to implement support for models and/or data with units for a particular fitter, one simply needs to add the fitter_unit_support decorator to the __call__ function. With that in place, we can do things like:

>>> import numpy as np
>>> from astropy import units as u
>>> from astropy.modeling import models, fitting

>>> x = np.linspace(-5., 5., 2000)
>>> y = 3 * np.exp(-0.5 * (x - 1.3)**2 / 0.8**2)
>>> y += np.random.normal(0., 0.2, x.shape)
>>> x = x * u.m
>>> y = y * u.Jy

>>> g_init = models.Gaussian1D(amplitude=1. * u.mJy, mean=3 * u.cm, stddev=2 * u.mm)
>>> fit_g = fitting.LevMarLSQFitter()
>>> g = fit_g(g_init, x, y)

>>> print(g)
Model: Gaussian1D
Inputs: ('x',)
Outputs: ('y',)
Model set size: 1
Parameters:
     amplitude        mean         stddev    
         Jy            m             m       
    ------------ ------------- --------------
    3.0161309672 1.30493337628 0.797252253802

Behind the scenes, this does the following:

  1. Determine the units the parameters would need to be in so that if we stripped the units from the data and model, everything would be correct relative to each other.
  2. Create a copy of the model, then convert the values to the units determined in the previous step, then drop the units, and drop the units on the data too
  3. Do the fitting (with no units)
  4. Add back the units determined in step 1 to the model copy

return outputs[0]
else:
return tuple(outputs)
return tuple(outputs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reasoning of this change? It seems unrelated to unit support at a glance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was one of the original changes by @embray (I forgot to revert that for now and put it in a separate PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. Now I see that this logic is moved further up.

@pllim
Copy link
Member

pllim commented May 10, 2016

Some comments based on your examples:

  1. When input unit is defined and I type g(1), I would expect the 1 to be treated as having g.mean.unit by default without me having to override input_units property. Or is it just me?
  2. When I set up the model as Gaussian1D(amplitude=1, mean=5000 * u.AA, stddev=100), it did not complain that mean was assigned an unit, but not stddev, which might lead an user to believe that stddev is automatically given g.mean.unit. However, this is clearly not the case because I get TypeError on evaluation.
  3. Linear1D(3 * u.m) did not raise the error as you indicated in your example above.

@pllim
Copy link
Member

pllim commented May 10, 2016

Also, this breaks the bounding_box property:

>>> g.bounding_box
.../astropy/modeling/core.pyc in bounding_box(self)
   1037             # with any optional parameters)
   1038             # (In other words, in this case self._bounding_box is a *class*)
-> 1039             bounding_box = self._bounding_box((), _model=self)()
   1040             return self._bounding_box(bounding_box, _model=self)
   1041 

.../astropy/modeling/core.py in __call__(self, factor)
    354             kwargs.append((param.name, param.default))
    355 
--> 356         __call__ = make_function_with_signature(__call__, ('self',), kwargs)
    357 
    358         return type(str('_{0}BoundingBox'.format(cls.name)), (_BoundingBox,),

.../astropy/modeling/core.pyc in __call__(self, **kwargs)
    336 
    337         def __call__(self, **kwargs):
--> 338             return func(self._model, **kwargs)
    339 
    340         kwargs = []

.../astropy/modeling/functional_models.pyc in bounding_box(self, factor)
    145         dx = factor * self.stddev
    146 
--> 147         return (x0 - dx, x0 + dx)
    148 
    149     @staticmethod

.../astropy/units/quantity.pyc in __array_prepare__(self, obj, context)
    343                                      "argument is not a quantity (unless the "
    344                                      "latter is all zero/infinity/nan)"
--> 345                                      .format(function.__name__))
    346             except TypeError:
    347                 # _can_have_arbitrary_unit failed: arg could not be compared

UnitsError: Can only apply 'subtract' function to dimensionless quantities when
other argument is not a quantity (unless the latter is all zero/infinity/nan)

@pllim
Copy link
Member

pllim commented May 10, 2016

Also, this breaks n_models > 1:

>>> g = Gaussian1D([1 * u.J, 2.0], [1 * u.m, 5000 * u.AA], [0.1 * u.m, 100 * u.AA],
...                n_models=2)
.../astropy/modeling/core.py in __init__(self, amplitude, mean, stddev, **kwargs)
    432 
    433             new_init = make_function_with_signature(
--> 434                     __init__, args, kwargs, varkwargs='kwargs')
    435             update_wrapper(new_init, cls)
    436             cls.__init__ = new_init

.../astropy/modeling/core.pyc in __init__(self, *params, **kwargs)
    429 
    430             def __init__(self, *params, **kwargs):
--> 431                 return super(cls, self).__init__(*params, **kwargs)
    432 
    433             new_init = make_function_with_signature(

.../astropy/modeling/core.pyc in __init__(self, *args, **kwargs)
    691         # Parameter values must be passed in as keyword arguments in order to
    692         # distinguish them
--> 693         self._initialize_parameters(args, kwargs)
    694 
    695     def __repr__(self):

.../astropy/modeling/core.pyc in _initialize_parameters(self, args, kwargs)
   1547                     continue
   1548                 else:
-> 1549                     params[param_name] = np.asanyarray(value, dtype=np.float)
   1550 
   1551         if kwargs:

.../numpy/core/numeric.pyc in asanyarray(a, dtype, order)
--> 525     return array(a, dtype, copy=False, order=order, subok=True)
    526 
    527 def ascontiguousarray(a, dtype=None):

ValueError: setting an array element with a sequence.

When units are not involved, it works:

>>> g = Gaussian1D([1, 2.0], [1, 5000], [0.1, 100], n_models=2)
>>> g
<Gaussian1D(amplitude=[ 1., 2.], mean=[    1., 5000.], stddev=[   0.1, 100. ], n_models=2)>

@pllim
Copy link
Member

pllim commented May 10, 2016

Also... Once models can handle units, one of the first things I am going to do is to add a blackbody model (there was an attempt via #1480). So, it would be desirable to make sure that this PR can handle something like astrofrog#58.

@mhvk
Copy link
Contributor

mhvk commented May 12, 2016

@astrofrog - top level comment first: if we do equivalencies at all, the data will have to converted to the parameter's units rather than the other way around in order for a Gaussian to work (or the Gaussian has to turn itself in another function...), which has obvious problems with multiple models possibly requiring different transformations. I think it makes more sense to start by omitting equivalencies.


model = self.copy()

xunit = x.unit if isinstance(x, Quantity) else dimensionless_unscaled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just xunit = getattr(x, 'unit', dimensionless_unscaled)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mhvk
Copy link
Contributor

mhvk commented May 12, 2016

Along the line of equivalencies: logarithmic units are going to be problematic here. Not a deal-breaker, but it makes me worry slightly about the approach. Possibly, it is better (even if much slower) to just let a model upon evaluation convert the data as appropriate for the parameters.

for name, unit in parameter_units.items():
parameter = getattr(model, name)
if parameter.unit is not None:
parameter.value *= parameter.unit.to(unit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all transformations are multiplicative... Instead: parameter.value = parameter.to(unit).value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mhvk
Copy link
Contributor

mhvk commented May 12, 2016

Ai, this is not going to be a little bit of work... have to run...

@astrofrog
Copy link
Member Author

Along the line of equivalencies: logarithmic units are going to be problematic here. Not a deal-breaker, but it makes me worry slightly about the approach. Possibly, it is better (even if much slower) to just let a model upon evaluation convert the data as appropriate for the parameters.

The issue is that the units of the data can't easily be passed to the data since the model is called from e.g. scipy optimizers which won't know to pass through the units. I think you're still going to have to wrap the fitters and modify the model in some way to let it know about the data units. I'm not convinced the current approach can't be extended to deal with log units - I will give it a try.

@astrofrog
Copy link
Member Author

@nden - as promised, this is now rebased, and ready for you to try out

@MSeifert04
Copy link
Contributor

Seems like CircleCI doesn't like #5319 and the posix.types import.

@astrofrog
Copy link
Member Author

@MSeifert04 - see #5498

@MSeifert04
Copy link
Contributor

👍

@nden
Copy link
Contributor

nden commented Nov 23, 2016

Thanks @astrofrog. I'll give it a try in the next few days.

@pllim
Copy link
Member

pllim commented Nov 23, 2016

My concerns above still stand.

@astrofrog
Copy link
Member Author

@pllim - regarding your first three comments:

When input unit is defined and I type g(1), I would expect the 1 to be treated as having g.mean.unit by default without me having to override input_units property. Or is it just me?

I don't think this would be very safe - sometimes you are not the one setting up the model - for instance if we have the blackbody function which requires wavelength and is defined in Astropy, you wouldn't want that function to be able to take unitless input for the evaluation. So I'd rather not have g(1) work in this case.

When I set up the model as Gaussian1D(amplitude=1, mean=5000 * u.AA, stddev=100), it did not complain that mean was assigned an unit, but not stddev, which might lead an user to believe that stddev is automatically given g.mean.unit. However, this is clearly not the case because I get TypeError on evaluation.

The prior implementations of units in modeling included a lot of checking upon parameter setting which complicated the code base a lot. This PR tries to simplify this following the suggestion many people made that the consistency of units between parameters should be done at evaluation time as in your example here. One way to get around this without re-introducing a whole load of consistency checking between parameters would be to evaluate the model for one scalar value once any parameter is changed. There is a small performance penalty to this, so this could potentially be an option that could be turned off in the model init, for instance Gaussian1D(amplitude=1, mean=5000 * u.AA, stddev=100, check_parameter_units=False). What do you think?

Linear1D(3 * u.m) did not raise the error as you indicated in your example above.

I'll investigate!

@astrofrog
Copy link
Member Author

@pllim - I've fixed the bounding box property for Gaussian1D in e39dd6f when units are present. I will need to add a test for all the different models to make sure this works well since it is up to the bounding_box property (like evaluate) to support units.

I've also fixed the case for n_models > 1 with units in 6b4266c.

@astrofrog
Copy link
Member Author

@pllim - regarding:

Linear1D(3 * u.m) did not raise the error as you indicated in your example above.

Indeed this is not the case (because if units work in the evaluation, the model automatically supports units, so some models such as Linear1D 'just work'). So I've removed this from the PR description higher up.

@astrofrog
Copy link
Member Author

astrofrog commented Jan 17, 2017

I've decided to remove equivalencies from parameters, because these introduce conceptual difficulties: a Gaussian in wavelength space is not a Gausssian in frequency space. So the model should be defined in a well defined unit system. However, it's still possible to specify equivalencies when evaluating the model, e.g. g(1 * u.Hz, equivalencies=u.spectral()), which causes the input to be converted to input_units (as @mhvk was suggesting). I think this is unambiguous.

@astrofrog
Copy link
Member Author

@pllim - I've opened #6160 to remind us to deprecate and tidy up the blackbody stuff. This PR is becoming harder and harder to rebase so I'd like to merge it soon and make improvements like this in separate smaller PRs.

@nden - I've opened #6161 to remind myself to look at the performance implications. I think it will be easier to do this in a separate PR.

@astrofrog
Copy link
Member Author

This PR now includes docs. One last thing I want to do is to add tests for the Blackbody1D model.

@bsipocz bsipocz added this to the v2.0.0 milestone Jun 6, 2017
@astrofrog
Copy link
Member Author

On the other hand, if a parameter previously defined without units is given a
Quantity with a unit, this works because it is unambiguous::

>>> g0 = Gaussian1D(mean=3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking, but the name g0 is unintuitive when all the rest of the names are ordered, looking in the middle of the docs, I assumed it was already been used before g1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@pllim
Copy link
Member

pllim commented Jun 6, 2017

80 commits! 😱 I am okay with merging now and improve later as long as @nden approves. Also, deprecating analytic_functions already has an issue as #5780.

@bsipocz
Copy link
Member

bsipocz commented Jun 6, 2017

@pllim - I would say 80 commits over 2 years by 4 authors for a net change of 1600+ lines is rather good ratio. Keeping the history in cases like this is quite valuable I think.

@astrofrog
Copy link
Member Author

astrofrog commented Jun 6, 2017

@pllim @nden (and anyone else interested) - this is basically ready for final review. The rendered docs can be seen here:

http://astrofrog-debug.readthedocs.io/en/latest/modeling/units.html

It would be good if we could merge at some point within the next couple of days as I'll need to open a few other smaller PRs that will need to be ideally merged before feature freeze ❄️

Also here is a mini-goat as a reward for anyone reading this PR:

goat

@nden nden merged commit 4f5ad36 into astropy:master Jun 10, 2017
@nden
Copy link
Contributor

nden commented Jun 10, 2017

Thanks for the hard work @astrofrog! 🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

10 participants