Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use list_objects_v2 API for S3 #302

Merged
merged 21 commits into from Dec 18, 2022
Merged

Use list_objects_v2 API for S3 #302

merged 21 commits into from Dec 18, 2022

Conversation

pjbull
Copy link
Member

@pjbull pjbull commented Dec 18, 2022

This is primarily a fix for #155, but it should simplify and stabilize listing dirs from S3 in a way that makes a number of the other directory handling tasks easier (like #295, #181, #176, and #51).

One of the concerns is that this change may affect performance since we change both the recursive and non-recursive apis, so this PR includes an initial perf test suite for listing directories. See results below (TLDR: negligible change if any).

Bonus fixes:

  • Fix Smoke tests for non-all installations #171 (approved code would have introduced an error here, so worth it to fix)
  • Perf test suite
  • test-debug command stops debugger at failing tests from previous run
  • Check existence of buckets/containers when permitted by permissions
  • Add pytest-xdist to requirements, which will run the test suite in parallel and results in decent speed up. Comparison of latest master run and this branch shaves off ~10 minutes (makes sense since gh action machines have 2 cores):

image

Closes #151
Closes #171
Closes #291

Perf from same machine

Before changes

                                 Performance suite results: (2022-12-17T16:11:32.734517)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Test Name    ┃ Config Name                ┃ Iterations ┃           Mean ┃              Std ┃            Max ┃ N Items ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ List Folders │ List shallow recursive     │         10 │ 0:00:01.425719 │ ± 0:00:00.172902 │ 0:00:01.788437 │   5,500 │
│ List Folders │ List shallow non-recursive │         10 │ 0:00:01.630402 │ ± 0:00:00.187300 │ 0:00:01.882856 │   5,500 │
│ List Folders │ List normal recursive      │         10 │ 0:00:02.168442 │ ± 0:00:00.039756 │ 0:00:02.219833 │   7,877 │
│ List Folders │ List normal non-recursive  │         10 │ 0:00:00.039067 │ ± 0:00:00.010252 │ 0:00:00.067796 │     113 │
│ List Folders │ List deep recursive        │         10 │ 0:00:03.237524 │ ± 0:00:00.060835 │ 0:00:03.317934 │   7,955 │
│ List Folders │ List deep non-recursive    │         10 │ 0:00:00.030883 │ ± 0:00:00.001670 │ 0:00:00.033317 │      31 │
└──────────────┴────────────────────────────┴────────────┴────────────────┴──────────────────┴────────────────┴─────────┘

With new changes

                                 Performance suite results: (2022-12-17T17:11:50.937390)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Test Name    ┃ Config Name                ┃ Iterations ┃           Mean ┃              Std ┃            Max ┃ N Items ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ List Folders │ List shallow recursive     │         10 │ 0:00:01.293559 │ ± 0:00:00.027168 │ 0:00:01.336273 │   5,500 │
│ List Folders │ List shallow non-recursive │         10 │ 0:00:01.289645 │ ± 0:00:00.037398 │ 0:00:01.348425 │   5,500 │
│ List Folders │ List normal recursive      │         10 │ 0:00:01.874748 │ ± 0:00:00.079183 │ 0:00:02.006050 │   7,877 │
│ List Folders │ List normal non-recursive  │         10 │ 0:00:00.033232 │ ± 0:00:00.004852 │ 0:00:00.045547 │     113 │
│ List Folders │ List deep recursive        │         10 │ 0:00:03.193576 │ ± 0:00:00.180428 │ 0:00:03.455747 │   7,955 │
│ List Folders │ List deep non-recursive    │         10 │ 0:00:00.030362 │ ± 0:00:00.001615 │ 0:00:00.032398 │      31 │
└──────────────┴────────────────────────────┴────────────┴────────────────┴──────────────────┴────────────────┴─────────┘

@github-actions
Copy link
Contributor

github-actions bot commented Dec 18, 2022

@codecov-commenter
Copy link

codecov-commenter commented Dec 18, 2022

Codecov Report

Merging #302 (5a950bf) into master (289baeb) will decrease coverage by 0.5%.
The diff coverage is 91.4%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #302     +/-   ##
========================================
- Coverage    94.9%   94.4%   -0.6%     
========================================
  Files          21      21             
  Lines        1341    1363     +22     
========================================
+ Hits         1273    1287     +14     
- Misses         68      76      +8     
Impacted Files Coverage Δ
cloudpathlib/s3/s3client.py 91.3% <88.0%> (-4.3%) ⬇️
cloudpathlib/azure/azblobclient.py 94.4% <100.0%> (+0.1%) ⬆️
cloudpathlib/client.py 87.9% <100.0%> (ø)
cloudpathlib/gs/gsclient.py 93.2% <100.0%> (-1.4%) ⬇️

Copy link
Member

@jayqi jayqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One comment typo, and a few minor comments.

Makefile Outdated Show resolved Hide resolved
tests/mock_clients/mock_azureblob.py Outdated Show resolved Hide resolved
tests/performance/cli.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants