Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distribute s3 wait_objects #1539

Merged
merged 9 commits into from
Sep 7, 2022
Merged

Conversation

LeonLuttenberger
Copy link
Contributor

@LeonLuttenberger LeonLuttenberger commented Aug 23, 2022

Feature or Bugfix

  • Feature

Detail

  • Modified the S3 wait_objects functions to distribute across Ray
  • Refactored current functions to make use of the thread pool definition

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@LeonLuttenberger LeonLuttenberger added this to the 3.0.0 milestone Aug 23, 2022
@LeonLuttenberger LeonLuttenberger added enhancement New feature or request major release Will be addressed in the next major release labels Aug 23, 2022
@LeonLuttenberger LeonLuttenberger self-assigned this Aug 23, 2022
@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: a7f74c3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: a7f74c3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: a7f74c3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@jaidisido jaidisido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the batching suggestion not viable in the end?
And could we add a test under tests/load to test this at scale?

@LeonLuttenberger LeonLuttenberger changed the title Distributed s3 wait_objects Refactor s3 wait_objects Aug 24, 2022
@LeonLuttenberger LeonLuttenberger marked this pull request as ready for review August 24, 2022 22:15
@LeonLuttenberger
Copy link
Contributor Author

Due to some offline discussion regarding how suitable Ray is for this, I'm changing this CR to only include the refactor to reuse _ThreadPoolExecutor. From this point on, adding support for Ray merely involves replacing _ThreadPoolExecutor with _get_executor, and adding Ray annotations.

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 4eeddef
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 4eeddef
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 4eeddef
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@LeonLuttenberger LeonLuttenberger changed the title Refactor s3 wait_objects Distribute s3 wait_objects Aug 30, 2022
@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: d69849a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: d69849a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: d69849a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

waiter.wait(Bucket=bucket, Key=key, WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts})


def _wait_object_concurrent(
path: Tuple[str, str], waiter_name: str, delay: int, max_attempts: int, boto3_primitives: _utils.Boto3PrimitivesType
@ray_remote
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to specify here that the task is only allowed to take 25% of the cluster's available CPUs. This can be specified with the vanilla ray.remote function. However, we need to add support for it in our own ray_remote decorator. I've created an issue here: #1563

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: a3aa719
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: a3aa719
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: a3aa719
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: cbdec69
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: cbdec69
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: cbdec69
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 9f70ab0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 9f70ab0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubStandardCodeBuild8C06-llutOAimTATs
  • Commit ID: 9f70ab0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@jaidisido jaidisido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LeonLuttenberger LeonLuttenberger merged commit c465169 into release-3.0.0 Sep 7, 2022
@LeonLuttenberger LeonLuttenberger deleted the distributed-s3-wait branch September 7, 2022 14:14
@LeonLuttenberger LeonLuttenberger moved this from In Review to Done in AWS SDK for pandas roadmap Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request major release Will be addressed in the next major release
Development

Successfully merging this pull request may close these issues.

None yet

3 participants