-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle and test for s3 fake directories #190
Conversation
Codecov Report
@@ Coverage Diff @@
## master #190 +/- ##
========================================
+ Coverage 94.1% 94.3% +0.1%
========================================
Files 21 21
Lines 1189 1193 +4
========================================
+ Hits 1120 1125 +5
+ Misses 69 68 -1
|
tests/test_s3_specific.py
Outdated
|
||
Ref: https://github.com/boto/boto3/issues/377 | ||
""" | ||
if live_s3_like_rig.live_server: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than this if
inside the test body, I think it would be clearer to skip the whole test.
https://docs.pytest.org/en/latest/how-to/skipping.html#id1
@pytest.mark.skipif(os.getenv("USE_LIVE_CLOUD") == "1")
def test_fake_directories(live_s3_like_rig):
or maybe we can set a global flag USE_LIVE_CLOUD = True
in this if block in conftest.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing tests used this format, so I cleaned up all of them to do if not live_s3_like_rig.live_server: pytest.skip()
.
I think this is best of both worlds since now we could have different env vars per rig/backend at some point if we wanted without any changes at the test level, but we still show skip rather than pass if we don't run it.
I actually didn't realize / maybe forgot that we could explicitly create objects with trailing from pathlib import Path
p1 = Path("foo")
p1
#> PosixPath('foo')
p1.parts
#> ('foo',)
p2 = Path("foo/")
p2
#> PosixPath('foo')
p2.parts
#> ('foo',)
p1 == p2
#> True
from cloudpathlib import CloudPath
cp1 = CloudPath("s3://bucket/foo")
cp1.key
#> 'foo'
cp1.name
#> 'foo'
cp1.parts
#> ('s3://', 'bucket', 'foo')
cp2 = CloudPath("s3://bucket/foo/")
cp2.key
#> 'foo/'
cp2.name
#> 'foo'
cp2.parts
#> ('s3://', 'bucket', 'foo')
cp1 == cp2
#> False
cp2 == CloudPath("s3://bucket/foo/bar").parent
#> False
cp2 / "bar"
#> S3Path('s3://bucket/foo/bar')
cp2 == (cp2 / "bar").parent
#> False Created at 2022-01-05 18:04:04 EST by reprexlite v0.4.2 |
#51 is the existing issue. I guess one learning is that I was wrong when I said:
|
@jayqi updated to address comments + add note to changelog |
#148 reported inconsistent
is_dir
behavior between having a trailing/
on S3 and not having that/
.The root cause of this is that AWS does create "fake" folders. According to this boto3 thread these are just zero-size keys ending with
/
. This fix changes our file/dir check on s3 to count these zero-size keys as directories rather than files.Note: since we use
load
to check for existence, this change does not incur any additional network calls.Testing:
is_dir
/is_file
test (note: I discovered that these pass without the change since we do not create these fake folders normally, but may be useful so I left them in)Closes #148