Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VR-3505: Capture versioning information from S3 #526

Merged
merged 12 commits into from
Apr 17, 2020
Merged

Conversation

convoliution
Copy link
Contributor

@convoliution convoliution commented Apr 16, 2020

Still needs to be able to accept user-specified version_ids.
Figuring out the API is complicated because verta.dataset.S3() accepts either

  • a single string
  • a list of strings

that can be either

  • a bucket
  • a specific key!

I could separate out the multiple argument types into S3.read_bucket(bucket) and S3.read_object(key) utils similar to Python, but that would break the existing API and also make it more verbose to use.

@conradoverta
Copy link
Contributor

The concern of adopting something like Python style isn't breaking or verbosity, but the fact that they represent very different styles. Python is divided because you are talking about multiple sources of information that would be confusing if they overlapped. On the other hand, giving a path to a bucket, or to an item, or to a list of them falls naturally inside the same class of "set of locations to register". So having an API similar to the one we have makes things simpler from user POV. We can expand what a location means besides just a string though.

@convoliution
Copy link
Contributor Author

Ahhh, I see. I'd been trying to figure out what the conceptual distinction was!

client/verta/verta/dataset/_s3.py Outdated Show resolved Hide resolved
client/verta/verta/dataset/_s3.py Outdated Show resolved Hide resolved
@convoliution convoliution marked this pull request as ready for review April 17, 2020 16:13
s3 = pytest.importorskip("boto3").client('s3')
S3_PATH = verta.dataset.S3._S3_PATH

bucket = "verta-versioned-bucket"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you already put some data there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is data there! I almost forgot:
@ravishetye fyi, you had created this bucket to test versioning with. I'm now using this bucket for our automated Client tests. If this would pose any problem for you, I can create a separate bucket.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go for it. just a related datapoint when we version stuff and delete the object, the object is not actually deleted. We need to delete the version to keep the s3 cost in control.

@convoliution convoliution merged commit cf8926a into master Apr 17, 2020
@convoliution convoliution deleted the client/ml/s3-ver branch April 17, 2020 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants