ENH: Accept no fields for groupby by #61160

Open
1 of 3 tasks
simonaubertbd opened this issue Mar 21, 2025 · 10 comments · May be fixed by #61168
Assignees
Labels
Enhancement Groupby Needs Discussion Requires discussion from core team before further action

Comments

@simonaubertbd

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hello,

Sometimes there are no fields to group by when aggregating. I know the aggregate function covers that case, but for more dynamic code it would help if groupby could be called without any grouping field, instead of raising this error:

Image

Best regards,

Simon

Feature Description

Just the ability to select no fields in the by argument

Alternative Solutions

A conditional function that uses groupby or aggregate

Additional Context

No response

@simonaubertbd simonaubertbd added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 21, 2025
@udit710

udit710 commented Mar 23, 2025

take

@udit710

udit710 commented Mar 23, 2025

Hi @simonaubertbd, what's the expected output here? Given that you listed aggregate as the alternative solution, should it work the same as aggregate?

@simonaubertbd
Author

Hello @udit710 and thanks for your answer.

Let's say I have:

bird            year  weight  age
Wilbur          1992  3       50
Donald Duck     1991  2       70
Scrooge McDuck  1993  2       100

aggregate1 = inlineInput1.groupby([]).agg(weight_max=('weight', 'max')).reset_index()

would be simply 3
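To make the request concrete, here is a minimal sketch that rebuilds the sample data, tries the empty key list, and constructs the requested result by hand. The frame name `birds` and the variable names are illustrative, and the assumption that an empty key list raises `ValueError` reflects current pandas releases as far as I know, not a guarantee.

```python
import pandas as pd

# Sample data copied from the table in this comment.
birds = pd.DataFrame(
    {
        "bird": ["Wilbur", "Donald Duck", "Scrooge McDuck"],
        "year": [1992, 1991, 1993],
        "weight": [3, 2, 2],
        "age": [50, 70, 100],
    }
)

# Assumption: an empty key list is rejected with a ValueError today.
try:
    birds.groupby([]).agg(weight_max=("weight", "max"))
    empty_keys_error = None
except ValueError as exc:
    empty_keys_error = str(exc)

# The requested result, built by hand: one row holding the overall maximum.
requested = pd.DataFrame({"weight_max": [birds["weight"].max()]})

print(empty_keys_error)
print(requested)
```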

@udit710 udit710 linked a pull request Mar 23, 2025 that will close this issue
5 tasks
@snitish
Member

snitish commented Mar 24, 2025

Any thoughts on this @rhshadrach? I concur with OP that this feature would be helpful in cases where the grouping columns are dynamically determined.

@Delengowski
Contributor

So you want the groupby to be a no-op and just return the dataframe if no grouping columns are specified? I sort of get it: you don't want to split the dataframe on anything, so it just passes through.

I don't see how overloading the method and muddying the API here is worthwhile. Just do the check yourself. I think it would be strange for groupby to return a DataFrame in some cases and a GroupBy object in others.

@snitish
Member

snitish commented Mar 24, 2025

The resulting object can still be a GroupBy object, as @udit710 implemented in #61168.

@rhshadrach
Member

If I'm understanding the request right, @simonaubertbd wants df.groupby([]).agg(...) to behave the same as df.agg(...). Here I am -1; the work it would take to get these to agree, and the ongoing maintenance to support it, seems to me to be a non-starter.

On the other hand, I'm a bit more receptive to df.groupby([]) behaving the same as df.groupby(pd.Series(0, index=df.index)) (perhaps grouping by np.zeros would be better? I would have to run some benchmarks). Even here, though, it seems to me that having this live in user code rather than in pandas is more explicit and readable.
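A small sketch of the two constant-key variants mentioned above, for comparison. The sample frame and variable names are illustrative, and nothing here says which variant is faster; that would need the benchmarks referred to in the comment.

```python
import numpy as np
import pandas as pd

# Illustrative frame based on the sample data earlier in the thread.
birds = pd.DataFrame(
    {
        "bird": ["Wilbur", "Donald Duck", "Scrooge McDuck"],
        "weight": [3, 2, 2],
        "age": [50, 70, 100],
    }
)

# Variant 1: a constant Series aligned on the frame's index.
by_series = birds.groupby(pd.Series(0, index=birds.index)).agg(
    weight_max=("weight", "max")
)

# Variant 2: a constant NumPy array with one entry per row.
by_array = birds.groupby(np.zeros(len(birds), dtype=int)).agg(
    weight_max=("weight", "max")
)

# Both put every row into a single group labelled 0.
print(by_series)
print(by_array)
```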

@rhshadrach rhshadrach added Groupby Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 26, 2025
@simonaubertbd
Author

Hello @rhshadrach

My bad, I may have been unclear. Behaving exactly like df.agg(...) would mean one row per aggregation, if I'm right, and that is precisely what I don't want. What I have in mind is more like:

aggregate1 = inlineInput1.assign(d=0).groupby('d').agg(Age_max=('Age', 'max'), FirstName_count=('FirstName', 'count'), LastName_count=('LastName', 'count')).reset_index(drop=True)

Image
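The difference in shape between the two approaches can be sketched as follows. This uses the bird sample data from earlier in the thread rather than the Age/FirstName/LastName columns above, since that frame is not shown; the names `frame_level` and `group_level` are illustrative.

```python
import pandas as pd

birds = pd.DataFrame(
    {
        "bird": ["Wilbur", "Donald Duck", "Scrooge McDuck"],
        "weight": [3, 2, 2],
        "age": [50, 70, 100],
    }
)

aggs = {"weight_max": ("weight", "max"), "age_min": ("age", "min")}

# DataFrame.agg with named aggregations: one row per aggregation,
# one column per source column, NaN where they do not intersect.
frame_level = birds.agg(**aggs)

# Constant-key groupby: a single row, one column per aggregation.
group_level = birds.assign(d=0).groupby("d").agg(**aggs).reset_index(drop=True)

print(frame_level)
print(group_level)
```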

About having this in user code: it took me a lot of time to deal with it, both in development and in testing.

@rhshadrach
Member

Thanks @simonaubertbd - then I believe your comment in #61160 (comment) should not be "simply 3" (the scalar), but rather a DataFrame with 3 as the value.

> About having this in user code, it took me a lot of time to deal with it, in developing as well as testing.

I'm sympathetic, but still think adding a call to assign and reset_index is not onerous.
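For dynamically determined keys, the user-code approach could look roughly like the sketch below. `agg_optional_keys` is a hypothetical helper, not part of pandas, and it uses a constant Series key instead of `assign` so that no temporary column is added to the frame.

```python
import pandas as pd


def agg_optional_keys(df, keys, **named_aggs):
    """Hypothetical helper: group by `keys`, or aggregate the whole
    frame into a single row when `keys` is empty."""
    if keys:
        return df.groupby(list(keys)).agg(**named_aggs).reset_index()
    # No keys: put every row into one group and drop the dummy label.
    constant_key = pd.Series(0, index=df.index)
    return df.groupby(constant_key).agg(**named_aggs).reset_index(drop=True)


birds = pd.DataFrame(
    {
        "bird": ["Wilbur", "Donald Duck", "Scrooge McDuck"],
        "year": [1992, 1991, 1993],
        "weight": [3, 2, 2],
    }
)

total = agg_optional_keys(birds, [], weight_max=("weight", "max"))
per_year = agg_optional_keys(birds, ["year"], weight_max=("weight", "max"))

print(total)
print(per_year)
```

Both calls return a DataFrame with a default integer index, so downstream code does not need to branch on whether any keys were supplied.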

@simonaubertbd
Author

@rhshadrach This is a little more complex than that ;) The project I'm on is a Python code generator, so I had to deal with some conditional TypeScript, and even finding the solution wasn't that easy. (Well, I'm obviously not talking about days, more about hours. Also, I must acknowledge I'm kind of a newbie with pandas, but when I asked more experienced devs, it wasn't that obvious for them either.)

That said, yes, it was a DataFrame with 3 as the value. Thanks for your remark, I should have been more specific.

Best regards,

Simon


5 participants