Skip to content

Commit

Permalink
DOC: Added note about groupby excluding Decimal columns by default (p…
Browse files Browse the repository at this point in the history
  • Loading branch information
Patrick Park authored and Pingviinituutti committed Feb 28, 2019
1 parent 8986e3c commit 4f5d64e
Showing 1 changed file with 27 additions and 0 deletions.
27 changes: 27 additions & 0 deletions doc/source/groupby.rst
Original file line number Diff line number Diff line change
Expand Up @@ -995,6 +995,33 @@ Note that ``df.groupby('A').colname.std().`` is more efficient than
is only interesting over one column (here ``colname``), it may be filtered
*before* applying the aggregation function.

.. note::
Any object column, also if it contains numerical values such as ``Decimal``
objects, is considered as a "nuisance" columns. They are excluded from
aggregate functions automatically in groupby.

If you do wish to include decimal or object columns in an aggregation with
other non-nuisance data types, you must do so explicitly.

.. ipython:: python
from decimal import Decimal
df_dec = pd.DataFrame(
{'id': [1, 2, 1, 2],
'int_column': [1, 2, 3, 4],
'dec_column': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')]
}
)
# Decimal columns can be sum'd explicitly by themselves...
df_dec.groupby(['id'])[['dec_column']].sum()
# ...but cannot be combined with standard data types or they will be excluded
df_dec.groupby(['id'])[['int_column', 'dec_column']].sum()
# Use .agg function to aggregate over standard and "nuisance" data types at the same time
df_dec.groupby(['id']).agg({'int_column': 'sum', 'dec_column': 'sum'})
.. _groupby.observed:

Handling of (un)observed Categorical values
Expand Down

0 comments on commit 4f5d64e

Please sign in to comment.