VR-3505: Capture versioning information from S3 #526
The concern of adopting something like
Ahhh, I see. I'd been trying to figure out what the conceptual distinction was!
s3 = pytest.importorskip("boto3").client('s3')
S3_PATH = verta.dataset.S3._S3_PATH

bucket = "verta-versioned-bucket"
I'm assuming you already put some data there.
There is data there! I almost forgot:
@ravishetye fyi, you had created this bucket to test versioning with. I'm now using this bucket for our automated Client tests. If this would pose any problem for you, I can create a separate bucket.
Go for it. Just a related data point: when we version stuff and delete an object, the object is not actually deleted. We need to delete the version as well to keep the S3 cost under control.
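For anyone cleaning up test data in this bucket, here is a rough sketch of what "delete the version" could look like. It assumes a boto3 S3 client is passed in; `collect_versions` and `purge_object` are hypothetical helper names, not part of the Verta client. It relies only on the `list_object_versions` paginator and `delete_object` with a `VersionId`.

```python
def collect_versions(page, key):
    """Return the version IDs (including delete markers) of `key` in one listing page."""
    entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
    # Prefix listing can also return other keys that merely start with `key`.
    return [entry["VersionId"] for entry in entries if entry["Key"] == key]


def purge_object(client, bucket, key):
    """Permanently delete every stored version of `key`; return how many were removed.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = client.get_paginator("list_object_versions")
    removed = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        for version_id in collect_versions(page, key):
            # Passing VersionId removes that version for good,
            # instead of just adding a delete marker on top.
            client.delete_object(Bucket=bucket, Key=key, VersionId=version_id)
            removed += 1
    return removed
```

A plain `delete_object` call without `VersionId` on a versioned bucket only inserts a delete marker, which is why the older versions keep accruing storage cost.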
Still needs to be able to accept user-specified `version_id`s. Figuring out the API is complicated because `verta.dataset.S3()` accepts paths that can be either buckets or object keys. I could separate out the multiple argument types into `S3.read_bucket(bucket)` and `S3.read_object(key)` utils similar to Python, but that would break the existing API and also make it more verbose to use.
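One possible middle ground, sketched below under my own assumptions, is to keep a single entry point and normalize each path into a (bucket, key, version ID) triple internally. `S3Location` and `parse_s3_path` are hypothetical names for illustration only, and the rule that a version ID is only valid alongside an object key is an assumption, not something this PR has settled.

```python
from collections import namedtuple

# Hypothetical container; `key` is None when the path names a whole bucket.
S3Location = namedtuple("S3Location", ["bucket", "key", "version_id"])


def parse_s3_path(path, version_id=None):
    """Split an 's3://bucket[/key]' path and attach an optional version ID."""
    prefix = "s3://"
    if not path.startswith(prefix):
        raise ValueError("expected a path starting with 's3://', got {!r}".format(path))
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name in {!r}".format(path))
    key = key or None  # treat 's3://bucket' and 's3://bucket/' as bucket-only
    if version_id is not None and key is None:
        # Assumption: a version ID identifies one object, never a whole bucket.
        raise ValueError("version_id requires an object key, not a bare bucket")
    return S3Location(bucket, key, version_id)
```

With something like this, the existing call style would not have to change, and a version could be supplied only where it makes sense, e.g. `parse_s3_path("s3://verta-versioned-bucket/data.csv", version_id="abc123")`.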