(enhancement): Apply modin repartitioning where required only #1701
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
LGTM
@jaidisido This would be a good addition to Modin as well. Would you be willing to contribute this back to Modin?
@csnatarajan do you mean the @modin_repartition decorator?
@kukushking is there anything related to that decorator that would be good to include in Modin? Is the purpose of the decorator to work around bugs like modin-project/modin#3435?
@kukushking also, would it help if Modin allowed users to configure the column and row partitioning separately as here? That way you wouldn't have to keep undoing the column partitioning in
Yes @mvashishtha, the purpose of the decorator was precisely to work around issues like the one that occurs when accessing columns from another column-axis partition. We also noticed that we need to have the df in this shape before operations like groupby. Perhaps @jaidisido can explain better.
Feature or Bugfix
Detail
Currently, Modin repartitioning is applied as soon as a write method (to_csv, to_parquet) is called. This is wasteful performance-wise, and it does not remove the need to repartition again if the dataframe is altered within the method (e.g. if a column is cast to a new type). Instead, repartitioning should be applied only where it is actually required, that is, immediately before a Modin group by.
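The change described above can be illustrated with a minimal sketch. This is not the code in this PR: `FakeFrame`, `repartition_first`, `cast_column`, `group_and_count` and `write` are hypothetical names, and `FakeFrame.repartition` is a stand-in that only counts calls. A real implementation would operate on Modin dataframes instead. The point is that only the group-by step is decorated, so a write that alters the frame first still repartitions exactly once.

```python
import functools
from typing import Any, Callable, Dict, Iterable, List


class FakeFrame:
    """Hypothetical stand-in for a distributed dataframe (not a Modin class)."""

    def __init__(self, rows: Iterable[Any]) -> None:
        self.rows: List[Any] = list(rows)
        self.repartition_calls = 0

    def repartition(self) -> "FakeFrame":
        # Assumption: a real frame would collapse column-axis partitions here.
        # This stub only records that the (expensive) step was requested.
        self.repartition_calls += 1
        return self


def repartition_first(func: Callable[..., Any]) -> Callable[..., Any]:
    """Repartition the frame passed as first argument, then call func."""

    @functools.wraps(func)
    def wrapper(df: FakeFrame, *args: Any, **kwargs: Any) -> Any:
        return func(df.repartition(), *args, **kwargs)

    return wrapper


def cast_column(df: FakeFrame) -> FakeFrame:
    # Alters the frame; deliberately NOT decorated, no repartition needed.
    df.rows = [str(value) for value in df.rows]
    return df


@repartition_first
def group_and_count(df: FakeFrame) -> Dict[Any, int]:
    # The only step that needs the repartitioned layout.
    counts: Dict[Any, int] = {}
    for value in df.rows:
        counts[value] = counts.get(value, 0) + 1
    return counts


def write(df: FakeFrame) -> Dict[Any, int]:
    # Hypothetical write path: no repartition on entry, only before group by.
    df = cast_column(df)
    return group_and_count(df)


frame = FakeFrame([1, 1, 2])
result = write(frame)
```

Here `write` triggers a single repartition, placed after the cast rather than at the start of the method. Decorating `write` itself would repartition before the cast and, under the behavior described in this PR, still leave a second repartition necessary before the group by.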
Relates
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.