Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantity objects should accept pandas Series #11247

Closed
janerigby opened this issue Jan 13, 2021 · 11 comments · Fixed by #15904
Closed

Quantity objects should accept pandas Series #11247

janerigby opened this issue Jan 13, 2021 · 11 comments · Fixed by #15904

Comments

@janerigby
Copy link
Contributor

Where Astropy expects a Quantity object, it rejects as input a pandas.Series * astropy unit. I believe this is unnecessarily strict, since a numpy array * astropy unit would be permitted. Here is an example of astropy rejecting what I believe should be a valid input of pandas.Series() * u.unit:

----> 1 s1723_spec = specutils.Spectrum1D(spectral_axis = df_s1723['wave'] * u.AA, flux=df_s1723['fnu'] * u.Unit('Jansky'))

~/anaconda3/envs/astroconda/lib/python3.6/site-packages/specutils/spectra/spectrum1d.py in init(self, flux, spectral_axis, wcs, velocity_convention, rest_value, redshift, radial_velocity, bin_specification, **kwargs)
97 if flux is not None:
98 if not isinstance(flux, u.Quantity):
---> 99 raise ValueError("Flux must be a Quantity object.")
100 elif flux.isscalar:
101 flux = u.Quantity([flux])

ValueError: Flux must be a Quantity object.

The kludgy workaround is to put .values after each Series:
s1723_spec = specutils.Spectrum1D(spectral_axis = df_s1723['wave'].values * u.AA, flux=df_s1723['fnu'].values * u.Unit('Jansky'))

However, the workaround is kludgy, and I think unfair to pandas users. A Pandas.Series() will act like a numpy array when it needs to. Here, Astropy is rejecting it before it can.

Please see astropy/specutils#740 , because I discovered the issue in the specutils.Spectrum1D class, although I believe the problem actually lies with Quantities.

@pllim
Copy link
Member

pllim commented Jan 13, 2021

astropy integration with pandas is basically limited to Table.from_pandas and Table.to_pandas. To implement this might open a can of worms, but certainly warrants discussions.

@janerigby
Copy link
Contributor Author

I'm talking here about a very specific case, that if Quantities would accept a numpy array, then it should accept a pandas.series(). A pandas series will act like a numpy array when it needs to. If you're worried it won't, then astropy could japply pandas.series.to_numpy() or pandas.series.values and keep going.

@taldcroft
Copy link
Member

@mhvk should have the final say, but it's worth noting that Quantity is not just "accepting" numpy arrays, it is actually a subclass of the numpy ndarray class. In other words the two classes are fundamentally intertwined. Pandas.Series is not in that category, so much more apples and oranges (though both fruits).

Generally speaking I agree with @pllim that astropy is based on numpy. There are many many parts of astropy that simply won't work with Pandas Series because ndarray is entirely different (even if they look vaguely similar). After fixing this Quantity issue you will undoubtedly run into other things that don't work, so my inclination is to not even start down that slippery path.

@mhvk
Copy link
Contributor

mhvk commented Jan 13, 2021

I'm actually a bit surprised it does not work, as inside Quantity it just tries to convert any otherwise unrecognized input to ndarray and then turn it into a Quantity. So, perhaps good to see what actually happens:

import pandas as pd, numpy as np, astropy.units as u
a = pd.Series([1, 2., 3.])
np.array(a)  # works
# array([1., 2., 3.])
u.Quantity(a, u.m)  # should work too. It does!
# <Quantity [1., 2., 3.] m>
b = a * u.m
b  # Oops
# UnitConversionError: Can only apply 'greater' function to dimensionless quantities when other argument is not a quantity (unless the latter is all zero/infinity/nan)
type(b)  # Oops
# pandas.core.series.Series
b.values  # Interesting
<Quantity [1., 2., 3.] m>
u.m.__rmul__(a)  # works
# <Quantity [1., 2., 3.] m>

So, the problem seems to be that for multiplication, pd.Series thinks it can handle units, even though obviously it cannot. In that sense, arguably the problem is with pandas rather than astropy; it should really return NotImplemented (or rely less on arbitrary multiplication to give something that it still understands -- note that this really is not a trivial problem, I can definitely see the reasoning behind the approach they took...).

Anyway, bottom line is that the exact request is not something we can solve in astropy: pandas just takes care of the multiplcation and we get no chance to take control. But there is a good work around for converting a Series to a Quantity: one can just use the class initializer (which will work for everything else too, of course).

Interestingly, what works as well (somewhat coincidentally, just because it happens not to be defined by pandas) is the in-place shift operator:

a << u.m
# <Quantity [1., 2., 3.] m>

@apurvapatkar
Copy link

apurvapatkar commented Mar 15, 2021

I am intending to contribute to this issue. I have analyzed it to some extent. I understand that the problem is with multiplication pd.Series([1, 2., 3.]) * u.m.

However, I could not understand the workaround (for converting a Series to a Quantity: one can just use the class initializer) suggested by @mhvk in the above comment entirely.

@mhvk could you please explain a workaround to me?

@mhvk
Copy link
Contributor

mhvk commented Mar 15, 2021

@apurvapatkar - I don't think there is anything to do except perhaps to document that series * unit does not work (and that we cannot make that work), but series << unit does work, and that the best explicit way to do it is Quantity(series, unit). I now added labels to the issue to make that clearer.

@apurvapatkar
Copy link

Thanks for the reply. I would be happy to assist in documenting this issue and the approach.

@MridulS
Copy link
Contributor

MridulS commented Jan 16, 2024

An update on running the examples above again:

>>> import pandas as pd
>>> import astropy.units as u
>>> a = pd.Series([1, 2., 3.])
>>> a * u.m
0    1.0
1    2.0
2    3.0
dtype: float64

runs now. It doesn't give you the right result but it doesn't error out. series.__mul__(unit) seems to return a series object now.

tested with pandas 2.1.4

@mhvk
Copy link
Contributor

mhvk commented Jan 16, 2024

@MridulS - thanks for the update! I reproduced your result. Also, this time the unit is truly gone:

(a * u.m).values
# array([1., 2., 3.])

Since this remains out of our control, the only thing we can do is document it...

@MridulS
Copy link
Contributor

MridulS commented Jan 17, 2024

This behaves like np.full/zeros/ones now. We can just add this to https://docs.astropy.org/en/latest/known_issues.html#quantities-lose-their-units-with-some-operations ?

@mhvk
Copy link
Contributor

mhvk commented Jan 17, 2024

This behaves like np.full/zeros/ones now. We can just add this to https://docs.astropy.org/en/latest/known_issues.html#quantities-lose-their-units-with-some-operations ?

Yes, makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants