(feat) add distributed s3 write parquet #1526

Merged
merged 6 commits into from
Sep 14, 2022

Conversation

kukushking

@kukushking kukushking commented Aug 17, 2022

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kukushking kukushking force-pushed the feat-3.0/distributed-s3-write-parquet branch from 9e1d161 to ec0dfff Compare August 26, 2022 16:09
@kukushking kukushking marked this pull request as ready for review September 6, 2022 14:11
@@ -55,13 +55,14 @@ def _validate_args(
description: Optional[str],
parameters: Optional[Dict[str, str]],
columns_comments: Optional[Dict[str, str]],
distributed: Optional[bool] = False,

Very minor nitpick, but I'm curious why you're passing this parameter rather than just using config.distributed in the condition below.

Contributor Author

It was causing a circular dependency, but I'll look for other ways to resolve it.
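
For readers following this thread, here is a minimal sketch of the pattern being discussed: the validation helper takes `distributed` as an argument, so its module never has to import the config module. Everything here is an assumption for illustration (`validate_args`, `Config`, `write`, and the prefix rule are hypothetical); it is not the code from this PR.

```python
class Config:
    """Hypothetical stand-in for a config object defined in another module."""

    distributed = True


def validate_args(path: str, distributed: bool = False) -> None:
    """Validate write arguments without importing any config module.

    The caller supplies `distributed`, so this function has no dependency
    on wherever the configuration lives.
    """
    # Hypothetical rule, for illustration only: a distributed write targets a prefix.
    if distributed and not path.endswith("/"):
        raise ValueError("A distributed write needs a prefix ending in '/'")


def write(path: str) -> str:
    # The caller reads the flag once and passes it down explicitly.
    validate_args(path, distributed=Config.distributed)
    return path
```

The trade-off is an extra parameter on the helper in exchange for a one-way dependency between the two modules.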

@kukushking kukushking force-pushed the feat-3.0/distributed-s3-write-parquet branch from 5d87f6f to d5a5d67 Compare September 8, 2022 11:14
@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 66f2b23
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: da4a3df
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: da4a3df
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@kukushking kukushking force-pushed the feat-3.0/distributed-s3-write-parquet branch from da4a3df to a5d671a Compare September 12, 2022 10:46
* add type mappings to avoid inference
* Refactoring - separate distributed write_parquet implementation
* Replace group iteration with apply() optimized for distributed scenario
* Fix test regressions
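
As a rough illustration of the "group iteration vs `apply()`" commit above, the sketch below compares the two styles in plain pandas. It assumes pandas is installed; `describe_partition` is a hypothetical stand-in for the per-partition write and is not taken from this PR.

```python
import pandas as pd


def describe_partition(values: pd.Series) -> str:
    # Hypothetical stand-in for "write this partition": returns a summary string.
    return f"{len(values)} rows"


df = pd.DataFrame({"year": [2021, 2021, 2022], "value": [1.0, 2.0, 3.0]})

# Style 1: iterate over the groups in driver-side Python code.
iterated = {
    int(key): describe_partition(group["value"])
    for key, group in df.groupby("year")
}

# Style 2: hand the function to apply() and let the dataframe engine call it per group.
applied = df.groupby("year")["value"].apply(describe_partition)
```

Both styles give the same per-group result here. The point of the second is that the per-group call is expressed through the dataframe API instead of an explicit Python loop, which leaves a distributed dataframe implementation free to schedule it.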
@kukushking kukushking force-pushed the feat-3.0/distributed-s3-write-parquet branch from a5d671a to 91387d0 Compare September 12, 2022 11:01
@jaidisido jaidisido left a comment

Love the refactoring overall, particularly splitting between the standard and distributed implementations, thanks. Left a few comments for clarification.

awswrangler/s3/_write.py
awswrangler/s3/_write_dataset.py
awswrangler/s3/_write_dataset.py
awswrangler/s3/_write_parquet.py
@aws aws deleted 25 comments from malachi-constant Sep 12, 2022
@kukushking
Contributor Author

Cleaned up older CodeBuild comments

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 91387d0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 6aefbcb
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 6aefbcb
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 6aefbcb
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

awswrangler/s3/_write_parquet.py
@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 6aefbcb
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 4d00bc7
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 4d00bc7
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 4d00bc7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 9508531
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 9508531
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 25d2dfc
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 25d2dfc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 25d2dfc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@kukushking kukushking merged commit 2865c85 into release-3.0.0 Sep 14, 2022
@malachi-constant malachi-constant deleted the feat-3.0/distributed-s3-write-parquet branch September 14, 2022 15:20
@jaidisido jaidisido linked an issue Sep 16, 2022 that may be closed by this pull request
@kukushking kukushking self-assigned this Nov 28, 2022
Successfully merging this pull request may close these issues.

Distributed Ray wr.s3.write_parquet implementation
4 participants