perf(postprocessing): improve pivot postprocessing operation #23465

Merged
merged 1 commit into from
Mar 24, 2023

Commits on Mar 23, 2023

  1. perf(postprocessing): improve pivot postprocessing operation

    Executing a pivot with `drop_missing_columns=False` and lots of resulting columns can increase the postprocessing time by seconds or even minutes for large datasets.
    The main culprit is the `df.drop(...)` call inside the for loop. We can refactor this slightly, without any change to the results, and push the postprocessing time
    down to seconds instead of minutes for large datasets (millions of columns).
    
    Fixes apache#23464
    Usiel committed Mar 23, 2023
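
To illustrate the kind of refactor the commit message describes, here is a minimal sketch, not the actual change from this PR. It assumes the slow path drops unwanted columns one at a time inside a loop; the function names `drop_in_loop` and `drop_once` and the sample data are hypothetical. Each `df.drop(...)` call returns a new DataFrame, so calling it once per column repeats that work for every column, while collecting the labels first and dropping them in a single call does it once.

```python
import pandas as pd


def drop_in_loop(df: pd.DataFrame, unwanted: list[str]) -> pd.DataFrame:
    # Slow pattern: every iteration builds a new DataFrame via drop().
    for col in unwanted:
        if col in df.columns:
            df = df.drop(columns=col)
    return df


def drop_once(df: pd.DataFrame, unwanted: list[str]) -> pd.DataFrame:
    # Faster pattern: collect the labels first, then call drop() a single time.
    present = [col for col in unwanted if col in df.columns]
    return df.drop(columns=present)


# Hypothetical data: 200 columns, of which every second one is unwanted.
df = pd.DataFrame({f"c{i}": [i, i + 1] for i in range(200)})
unwanted = [f"c{i}" for i in range(0, 200, 2)]

slow = drop_in_loop(df, unwanted)
fast = drop_once(df, unwanted)

# Both variants produce the same result; only the amount of work differs.
assert slow.equals(fast)
```

Both functions return a new frame and leave the input untouched, which matches the commit's claim that the refactor does not change the results.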
    20bc99a