Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use latest config in S3Path's unless passed explicitly #175

Merged
merged 16 commits into from Aug 17, 2021
Merged

use latest config in S3Path's unless passed explicitly #175

merged 16 commits into from Aug 17, 2021

Conversation

ericphanson
Copy link
Member

@ericphanson ericphanson commented Jul 19, 2021

Closes #168

  • needs tests
  • decide if its breaking (edit: likely should be breaking to ensure there aren't subtle errors downstream, even though it's somewhat unlikely this behavior is being relied on)

As an aside, I think this plus #161 will make S3Paths very ergonomic for reproducible usage, e.g.

module MyModellingPackage

const MyDataSet = S3Path(bucket, key; version)

...fun modelling work...

end # module

Then, since it's hardcoded, bumping the dataset means making a new commit, so the code and data versions are always tied for reproducibility (assuming you commit a manifest of course).

@ericphanson
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2021
@bors
Copy link
Contributor

bors bot commented Jul 19, 2021

try

Build failed:

Project.toml Show resolved Hide resolved
src/AWSS3.jl Outdated Show resolved Hide resolved
@ericphanson
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2021
@bors
Copy link
Contributor

bors bot commented Jul 19, 2021

@ericphanson
Copy link
Member Author

No tests failed, which is a bit encouraging that it's not breaking. Basically, the case that will fail is:

global_aws_config(; arguments_1...)
path = S3Path(bucket, key) # only valid with `arguments_1`

global_aws_config(; arguments_2...)
read(path) # works on master/release; fails on this PR because it picks up the new config

On release/master, path will hold the config with arguments_1 so this will work, whereas with this PR, it will pick up the latest global config (with arguments_2) and fail.

However, if you pass a config like

config = global_aws_config(; arguments_1...)
path = S3Path(bucket, key; config) # only valid with `arguments_1`

global_aws_config(; arguments_2...)
read(path) # still works

then it will "freeze" that config in like it does now. So it's only about the default when you don't pass a config explicitly.

I think the current behavior is surprising/unintuitive; most users expect a "path" to just be a stateless indicator/pointer/etc to an asset, not something stateful that depends on when it was created. Therefore, I think there's a decent argument to be made that this is a bugfix and likely anyone who wanted to create their paths with a particular config (different from the global one they'd be using later) would already be passing a config explictly. However, given AWSS3 likely has a large set of users, I also wouldn't be that surprised if someone was relying on freezing the global config in a path and then changing the global config.

If anyone has any thoughts about whether this should be a breaking release or not please share them. I think we shouldn't be too afraid to make breaking releases that improve the semantics, however, especially since we are pre-1.0 anyway, so I would be fine with just doing the breaking version bump here.

@ericphanson
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2021
@bors
Copy link
Contributor

bors bot commented Jul 19, 2021

try

Build failed:

@ericphanson
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2021
@bors
Copy link
Contributor

bors bot commented Jul 19, 2021

@ExpandingMan
Copy link
Contributor

In my opinion this should be a breaking change because the default behavior has changed. Had you left the defaults alone, it would be non-breaking.

On second thought, it makes me a little nervous to change the default at all... it seems reasonable to expect S3Path not to have to suffer the overhead of making this additional request every time it is used... Not sure if anyone has any strong opinions about it, but after a few moments of reflection my instinct would be not to make this a new default.

@ericphanson
Copy link
Member Author

There is no additional request; once the global config has been set once it is just deferencing a ref: https://github.com/JuliaCloud/AWS.jl/blob/76619b840682cea50c4069f89e2e36ee6f63b802/src/AWS.jl#L44-L62

@ExpandingMan
Copy link
Contributor

Clearly I did not read this carefully enough. That sounds great, I suppose this is non-breaking then. Might be worth waiting for someone more lucid than I to chime in before merging. I'm not sure if this will break anything for anyone, but I suspect that if it does it will be in a subtle and infuriating way.

@ericphanson
Copy link
Member Author

Ok, I think breaking is a reasonable choice then. That will help make sure no one is caught off guard by this change, and those relying on the previous behavior will have the opportuntity to update (by passing in the config as e.g. changing S3Path(bucket, key) to S3Path(bucket, key; config = global_aws_config())).

To help justify why I think this change is still worth a breaking release: the current behavior can be pretty unexpected, and if you aren't in us-east-1 which is the default region, the wrong config means 301 errors when you try to use the path. Users often don't expect the config to be "baked in", so they are confused when they create a path (maybe at precompile time), then set the region (maybe at __init__ time), then use the path (maybe at runtime), and then get an error. Region is just one example; any other choice requiring a custom config will be similar. Additionally, one can always pass a config object directly in order to recover the previous behavior.

@ericphanson ericphanson requested review from rofinn and omus July 19, 2021 23:43
@rofinn
Copy link
Member

rofinn commented Jul 20, 2021

Hmm, I wonder if this would be better handled in AWS.jl with a LazyAWSConfig <: AbstractAWSConfig type? Introducing that change wouldn't be breaking to either package and would allow you to use that config outside of the S3Path context. Thoughts?

@ericphanson
Copy link
Member Author

Hm, interesting idea. Are there other objects that "bake in" the config like this? Most of what I've used AWS.jl for has been with S3Paths or explicit API calls like

@service CloudWatch_Logs
CloudWatch_Logs.describe_log_streams(log_group, params)

which gets the latest config automatically (or you can pass one with aws_config).

The other concern I have with that idea is that I don't think it would work to support this kind of use-case:

module MyModellingPackage

const MyDataSet = S3Path(bucket, key)

function __init__()
    global_aws_config(; region="us-east-2")
    return nothing
end

...fun modelling work...

end # module

Here, the S3Path is created at precompilation time, which means it bakes in the wrong region (since the "right" region in this scenario is set at __init__ time). If we created a LazyAWSConfig, we could set it in __init__, but it's too late at that point, since the S3Path is created at precompile time with the wrong config.

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I agree with being able to decouple an S3Path from a specific AWS configuration. One thought on this: is it useful to have an S3Path with an embedded config? I think the dynamic behaviour allowed by nothing here is probably what most people want. I suspect including the config as field was done mainly as a convenience to avoid having to pass in a configuration when making a call such as stat(::S3Path).

This makes me think if we're making a breaking change anyway possibly we should remove the config field from S3Path and allow passing in an alternate config when one is actually used.

src/s3path.jl Outdated
@@ -1,4 +1,4 @@
struct S3Path{A<:AbstractAWSConfig} <: AbstractPath
struct S3Path{A<:Union{Nothing, AbstractAWSConfig}} <: AbstractPath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be breaking but there may be some incompatibilities with packages that dispatch on S3Path{<:AbstractAWSConfig}

src/s3path.jl Outdated Show resolved Hide resolved
src/s3path.jl Outdated Show resolved Hide resolved
src/AWSS3.jl Outdated Show resolved Hide resolved
@rofinn
Copy link
Member

rofinn commented Jul 21, 2021

@omus Removing the config field defeats the point of the type though. At that point you'll need to know it's an S3Path at the call-site which removes the entire benefit of this type. I still think we could replace the config fields with a Function that returns a config. If folks want to memoize that function for efficiency they can.

@omus
Copy link
Member

omus commented Jul 21, 2021

I'm definitely not going to push on removing config as part of this PR as it'll most definitely cause this to stall out. We can discuss the idea further in the original issue: #168

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

src/s3path.jl Outdated Show resolved Hide resolved
src/s3path.jl Outdated Show resolved Hide resolved
src/s3path.jl Outdated Show resolved Hide resolved
src/s3path.jl Show resolved Hide resolved
@christopher-dG
Copy link
Member

@ericphanson 0.8.8 is now out, so this can move forward. needs some conflict resolution to start.

@ericphanson
Copy link
Member Author

very exciting! if your schedule allows it please feel free to pick this up, otherwise I will try to get to it soon

@christopher-dG
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Aug 17, 2021
@bors
Copy link
Contributor

bors bot commented Aug 17, 2021

try

Build failed:

@christopher-dG
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Aug 17, 2021
@bors
Copy link
Contributor

bors bot commented Aug 17, 2021

try

Build failed:

@christopher-dG
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Aug 17, 2021
@bors
Copy link
Contributor

bors bot commented Aug 17, 2021

@christopher-dG
Copy link
Member

Tests are passing. Not sure what else needs to be done for this, if anything.

Copy link
Member

@mattBrzezinski mattBrzezinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member Author

@ericphanson ericphanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last changes looks good to me, thanks @christopher-dG ! Not sure how we want to do merging for v0.9, there's a few breaking items on the milestone so it would be good to get them in together. Should we just call master v0.9 and start merging stuff in? (And not tag a release until the milestone is completed? We can of course still backport stuff on a branch if something comes up).

Copy link
Member

@omus omus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/s3path.jl Outdated Show resolved Hide resolved
test/s3path.jl Outdated Show resolved Hide resolved
@omus
Copy link
Member

omus commented Aug 17, 2021

Not sure how we want to do merging for v0.9, there's a few breaking items on the milestone so it would be good to get them in together. Should we just call master v0.9 and start merging stuff in? (And not tag a release until the milestone is completed? We can of course still backport stuff on a branch if something comes up).

I would just merge the changes we want to be in v0.9 into master, bump the version, then tag a release.

Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
@christopher-dG
Copy link
Member

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 17, 2021

@bors bors bot merged commit 4a3b7a4 into JuliaCloud:master Aug 17, 2021
@omus omus deleted the eph/default-config-nothing branch August 18, 2021 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

S3Paths: get config at use-time, not construction time?
6 participants