Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(enhancement): Apply modin repartitioning where required only #1701

Merged
merged 3 commits into from
Oct 25, 2022

Conversation

jaidisido
Copy link
Contributor

Feature or Bugfix

  • Enhancement

Detail

Currently modin repartitioning is applied as soon as a write method (to_csv, to_parquet) is called. This is not only a wasteful operation performance-wise but also does not prevent the need to repartition once more if the dataframe is altered within the method (e.g. if a column is cast to a new type).

Instead, repartitioning should be applied when it's currently required, that is before a modin group by only.

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jaidisido jaidisido self-assigned this Oct 21, 2022
@jaidisido jaidisido added this to the 3.0.0 milestone Oct 21, 2022
@jaidisido jaidisido added enhancement New feature or request major release Will be addressed in the next major release performance labels Oct 21, 2022
@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 01934e2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 01934e2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 01934e2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

tests/load/test_s3.py Outdated Show resolved Hide resolved
@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 72c3903
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 72c3903
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 72c3903
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@LeonLuttenberger LeonLuttenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaidisido jaidisido merged commit 4028c08 into release-3.0.0 Oct 25, 2022
@jaidisido jaidisido deleted the enhancement/modin-repartition branch October 25, 2022 08:43
@kukushking kukushking moved this from In Review to Done in AWS SDK for pandas roadmap Oct 25, 2022
@csnatarajan
Copy link

@jaidisido This would be a good addition to Modin as well, would you be willing to contribute this back to Modin?

@kukushking
Copy link
Contributor

@csnatarajan do you mean @modin_repartition decorator?

@mvashishtha
Copy link

mvashishtha commented Dec 1, 2022

@kukushking is there anything related to that decorator that would be good to include in Modin? Is the purpose of the decorator to work around bugs like modin-project/modin#3435?

@mvashishtha
Copy link

@kukushking also, would it help if modin allowed users to configure the column and row partitioning separately as here? That way you wouldn't have to keep undoing the column partitioning in modin_repartition (from a cursory look I think that's what that decorator does).

@kukushking
Copy link
Contributor

kukushking commented Dec 5, 2022

Yes @mvashishtha the purpose of the decorator was precisely to work around the issues like when accessing columns from another column-axis partition. We also noticed that we need to have the df in this shape before operations like groupby - perhaps @jaidisido can explain better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request major release Will be addressed in the next major release performance
Development

Successfully merging this pull request may close these issues.

None yet

6 participants