Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential regression induced by commit 924f246 #58285

Open
3 tasks
rhshadrach opened this issue Apr 17, 2024 · 4 comments
Open
3 tasks

Potential regression induced by commit 924f246 #58285

rhshadrach opened this issue Apr 17, 2024 · 4 comments
Labels
Output-Formatting __repr__ of pandas objects, to_string Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@rhshadrach
Copy link
Member

PR #57915 may have induced a performance regression. If it was a necessary behavior change, this may have been expected and everything is okay.

Please check the links below. If any ASVs are parameterized, the combinations of parameters that a regression has been detected for appear as subbullets.

Subsequent benchmarks may have skipped some commits. The link below lists the commits that are between the two benchmark runs where the regression was identified.

Commit Range

cc @NickCrews

@rhshadrach rhshadrach added Output-Formatting __repr__ of pandas objects, to_string Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version labels Apr 17, 2024
@rhshadrach rhshadrach added this to the 3.0 milestone Apr 17, 2024
@NickCrews
Copy link
Contributor

Repr'ing like this shouldn't be in the hot path for any application, so a 30% slowdown that is only a few milliseconds seems like we shouldn't care at all

@rhshadrach
Copy link
Member Author

No disagreement on the severity of the regression, but the nature of it surprises me. In the time_to_html_mixed benchmark, I think the only things that hit the lines changed in the linked PR are where the object in question is None, int, str, or dict, so the code paths taken are the same. I'm seeing testing isinstance(..., dict) vs isinstance(..., Mapping) take 20ns vs 80ns on my machine, so for a 4ms slowdown these lines would need to be hit about ~60k times.

@NickCrews
Copy link
Contributor

hmm, yeah that is weird. I haven't looked at the benchmark in detail. Perhaps it is because the instance(..., dict) is able to short circuit with some optimization or c-only code, but the Mapping version actually requires going out to the python world and checking if the python object contains a .keys() method or something?

@rhshadrach
Copy link
Member Author

Right - I'm not too surprised dict is highly optimized here - it's a data structure that is used all over CPython itself. Rather, the difference in isinstance(..., dict) and isinstance(..., Mapping) does not seem sufficient to explain the regression that showed up in the ASVs. I'll dig into this a little more this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting __repr__ of pandas objects, to_string Performance Memory or execution speed performance Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

No branches or pull requests

2 participants