Distributed s3 delete objects #1474

Merged
merged 23 commits into from
Aug 10, 2022

malachi-constant
Contributor

Feature or Bugfix

  • Refactor s3.delete_objects to run in a distributed fashion.
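For context, here is a minimal sketch of a batched, parallel delete. It is not the PR's implementation; per the commit log, that version dispatches work through Ray. This sketch makes three assumptions: S3's limit of 1,000 keys per DeleteObjects request, a thread pool standing in for Ray tasks, and a hypothetical `delete_batch` callable standing in for the real boto3 call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

# S3 DeleteObjects accepts at most 1,000 keys per request.
MAX_KEYS_PER_REQUEST = 1000


def chunked(keys: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `keys` with at most `size` items each."""
    for start in range(0, len(keys), size):
        yield list(keys[start : start + size])


def delete_objects_parallel(
    keys: Sequence[str],
    delete_batch: Callable[[List[str]], int],
    chunk_size: int = MAX_KEYS_PER_REQUEST,
    max_workers: int = 8,
) -> int:
    """Split keys into chunks and delete the chunks concurrently.

    `delete_batch` is a hypothetical callable that deletes one chunk
    (e.g. with a single DeleteObjects request) and returns how many keys it removed.
    """
    if not 0 < chunk_size <= MAX_KEYS_PER_REQUEST:
        raise ValueError("chunk_size must be between 1 and 1000")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One delete_batch call per chunk; results come back in submission order.
        return sum(pool.map(delete_batch, chunked(keys, chunk_size)))
```

Calling it with 2,500 keys and a stub `delete_batch` that returns `len(batch)` issues three requests (1,000, 1,000 and 500 keys) and returns 2500. Chunking first keeps every request within the S3 limit, so parallelism only changes how many requests are in flight at once.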

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@malachi-constant malachi-constant added the WIP Work in progress label Jul 24, 2022
@malachi-constant malachi-constant self-assigned this Jul 24, 2022
@malachi-constant malachi-constant marked this pull request as draft July 24, 2022 00:36
@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: da502a3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido jaidisido added this to In progress in AWS SDK for pandas roadmap Aug 1, 2022
@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 885a15c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant malachi-constant added the enhancement New feature or request label Aug 3, 2022
@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: fd9ad26
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant malachi-constant removed the WIP Work in progress label Aug 3, 2022
@malachi-constant malachi-constant moved this from In progress to In Review in AWS SDK for pandas roadmap Aug 3, 2022
@malachi-constant malachi-constant marked this pull request as ready for review August 3, 2022 15:38
@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: d71b671
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

awswrangler/s3/_delete.py (outdated, resolved)
awswrangler/s3/_delete.py (resolved)
awswrangler/s3/_delete.py (outdated, resolved)
tests/test_s3.py (outdated, resolved)
awswrangler/s3/_delete.py (outdated, resolved)
@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 87a92e2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: babf835
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 9fd3f4c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 9b85b44
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

awswrangler/s3/_delete.py (resolved)
awswrangler/s3/_delete.py (resolved)
awswrangler/s3/_delete.py (outdated, resolved)
load_tests/_utils.py (resolved)
load_tests/test_s3.py (outdated, resolved)
load_tests/test_s3.py (outdated, resolved)
@aws aws deleted a comment from jaidisido Aug 5, 2022
@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 0affa4f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 9548826
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

malachi-constant commented Aug 7, 2022

Also, I need to figure out what's going on in our Load Tests job:

  1. It's not failing the CodeBuild run when pytest reports failures.
  2. The delete test errors out during setup of the fixtures it requires:
___________________ ERROR at setup of test_s3_delete_objects ___________________

    @pytest.fixture(scope="session")
    def cloudformation_outputs():
>       return extract_cloudformation_outputs()

load_tests/conftest.py:8: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
load_tests/_utils.py:16: in extract_cloudformation_outputs
    client = boto3.client("cloudformation")
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/boto3/__init__.py:92: in client
    return _get_default_session().client(*args, **kwargs)
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/boto3/session.py:299: in client
    return self._session.create_client(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/session.py:950: in create_client
    client = client_creator.create_client(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/client.py:123: in create_client
    client_args = self._get_client_args(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/client.py:466: in _get_client_args
    return args_creator.get_client_args(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/args.py:87: in get_client_args
    final_args = self.compute_client_args(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/args.py:183: in compute_client_args
    endpoint_config = self._compute_endpoint_config(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/args.py:278: in _compute_endpoint_config
    return self._resolve_endpoint(**resolve_endpoint_kwargs)
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/args.py:381: in _resolve_endpoint
    return endpoint_bridge.resolve(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/client.py:566: in resolve
    resolved = self.endpoint_resolver.construct_endpoint(
anaconda3/envs/ray_cp38/lib/python3.8/site-packages/botocore/regions.py:205: in construct_endpoint
    result = self._endpoint_for_partition(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <botocore.regions.EndpointResolver object at 0x7f24687f53a0>
partition = OrderedDict([('defaults', OrderedDict([('hostname', '{service}.{region}.{dnsSuffix}'), ('protocols', ['https']), ('sig...ict([('variants', [OrderedDict([('hostname', 'xray-fips.us-west-2.amazonaws.com'), ('tags', ['fips'])])])]))]))]))]))])
service_name = 'cloudformation', region_name = None
use_dualstack_endpoint = None, use_fips_endpoint = None, force_partition = False

    def _endpoint_for_partition(
        self,
        partition,
        service_name,
        region_name,
        use_dualstack_endpoint,
        use_fips_endpoint,
        force_partition=False,
    ):
        partition_name = partition["partition"]
        if (
            use_dualstack_endpoint
            and partition_name in self._UNSUPPORTED_DUALSTACK_PARTITIONS
        ):
            error_msg = (
                "Dualstack endpoints are currently not supported"
                " for %s partition" % partition_name
            )
            raise EndpointVariantError(tags=['dualstack'], error_msg=error_msg)
    
        # Get the service from the partition, or an empty template.
        service_data = partition['services'].get(
            service_name, DEFAULT_SERVICE_DATA
        )
        # Use the partition endpoint if no region is supplied.
        if region_name is None:
            if 'partitionEndpoint' in service_data:
                region_name = service_data['partitionEndpoint']
            else:
>               raise NoRegionError()
E               botocore.exceptions.NoRegionError: You must specify a region.

It looks like I need to pass AWS_DEFAULT_REGION through to Ray, or something similar.

UPDATE: Both issues have been resolved in #1506 and #1507.
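One way to avoid the `NoRegionError` above is to resolve the region once in the driver and forward it to Ray workers as an environment variable. This is a minimal sketch under a few assumptions: it relies on Ray's `runtime_env` `env_vars` option, and `build_runtime_env` and its `default_region` fallback are hypothetical. It is not necessarily what #1506 or #1507 did.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_runtime_env(
    env: Mapping[str, str] = os.environ, default_region: Optional[str] = None
) -> Dict[str, Any]:
    """Build a Ray runtime_env that forwards the driver's AWS region to workers.

    Hypothetical helper: it fails fast with a clear message instead of letting
    botocore raise NoRegionError later inside a fixture or remote task.
    """
    region = env.get("AWS_DEFAULT_REGION") or default_region
    if not region:
        raise RuntimeError("Set AWS_DEFAULT_REGION before starting the load tests")
    # Ray sets every entry under "env_vars" in each worker process's environment.
    return {"env_vars": {"AWS_DEFAULT_REGION": region}}
```

The result would be passed as `ray.init(runtime_env=build_runtime_env())`. Alternatively, a fixture like `cloudformation_outputs` could call `boto3.client("cloudformation", region_name=...)` with an explicit region, which sidesteps environment lookup entirely.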

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: FAILED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: b1995de
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: ccdaf0e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@malachi-constant
Contributor Author

AWS CodeBuild CI Report

  • CodeBuild project: GitHubLoadTests5656BB24-s6u9F3qN9oFy
  • Commit ID: 71a6534
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido jaidisido merged commit 748215c into release-3.0.0 Aug 10, 2022
@jaidisido jaidisido deleted the distributed-s3-delete-objects branch August 10, 2022 09:39
jaidisido added a commit that referenced this pull request Aug 11, 2022
commit 748215c
Author: Lucas Hanson <lucascchanson@gmail.com>
Date:   Wed Aug 10 02:39:00 2022 -0700

    Distributed s3 delete objects (#1474)

    * first draft of delete_objects (distributed)

    * removing concurrent function, potentially not needed..

    * flake8

    * Fixing fixed iterable arg

    * restoring test script

    * Fixing typing

    * remove retry logic, redundant with botocore retry

    * Module name

    * Refactoring _delete_objects

    * ray get added

    * updating load tests with configuration and s3 delete test

    * reverting isort bad update

    * reverting isort bad update

    * changing chunk size

    * typing

    * pylint and test count

    * adding region to conftest

    * changing chunk size

    * updating load test

    * flake8

    * adding ExecutionTime context manager for benchmarking load tests

    * updating benchmark for s3 delete
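The last commits mention an `ExecutionTime` context manager for benchmarking the load tests. Its code is not shown on this page, so what follows is a hypothetical stand-in that only assumes the general idea: time a block with `time.perf_counter` and expose the elapsed seconds. It is not the version that was merged.

```python
import time
from types import TracebackType
from typing import Optional, Type


class ExecutionTime:
    """Context manager that records wall-clock seconds spent inside its block.

    Hypothetical sketch: the result is stored on `elapsed_time` after exit.
    """

    def __init__(self, label: str = "") -> None:
        self.label = label
        self.elapsed_time: Optional[float] = None
        self._start = 0.0

    def __enter__(self) -> "ExecutionTime":
        # perf_counter is monotonic and high-resolution, which suits benchmarks.
        self._start = time.perf_counter()
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Record the duration even if the block raised; returning None re-raises.
        self.elapsed_time = time.perf_counter() - self._start
```

A load test could wrap the call under test, e.g. `with ExecutionTime("delete") as timer: wr.s3.delete_objects(path)`, and then compare `timer.elapsed_time` against a benchmark threshold of its choosing.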
@malachi-constant malachi-constant moved this from In Review to Done in AWS SDK for pandas roadmap Aug 15, 2022
Labels
enhancement New feature or request

3 participants