-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding anonymous flag to s3 #70
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea as it has already come up once before. It's a clear win for the case where we were explicitly using a filesystem already. However, for the other recipe, I think we were intentionally demonstrating how to use a URI (instead of creating a filesystem). If that is the case I'd leave the temperamental approach for now. If that wasn't the intent of the recipe then I'm fine with both changes. I don't feel too strongly either way.
python/source/io.rst
Outdated
dataset = ds.dataset("s3://ursa-labs-taxi-data/2011", | ||
dataset = ds.dataset("2011", | ||
filesystem=s3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose I feel this changes the intent a little if the goal was to demonstrate the ability to create a dataset from a URI.
That being said, I don't know how to make this test work effectively with a URI. If anonymous
is not provided then credentials will be fetched from by the AWS SDK (e.g. from ~/.aws/credentials). If no credentials at all are found then the request will fail. If you configure any kind of credentials then the request will pass. Perhaps the C++ library could catch the "no credentials found" error and try again with anonymous access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace I understand the reasoning about the URI. Perhaps, instead of changing this to filesystem
we do, as you suggest, add ~/.aws/credentials
as an explicit step before this code? I think this would make sure this code works even if credentials are not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace I changed the code back to using URI and added a warning that if this code fails most likely it's because of the AWS credentials not being set. Added instructions how to set these properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm good with this approach. If amol- or jorisvandenbossche have better ideas when they get back we can fix it then.
.. warning:: | ||
|
||
If the above code throws an error most likely the reason is your | ||
AWS credentials are not set. Follow these instructions to get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on #56, it seems you can also run into errors if you have set AWS credentials? (in which case adding anonymous=True can help, so it might still be worth mentioning it here in the warning?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need to revert to adding filesystem
parameter rather than a anonymous
flag directly in the dataset
object. Unless I'm doing something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wouldn't suggest changing the actual example again to use filesystem
instead of a URI. But in the warning box it might still be worth mentioning the anonymous
flag (and that you can then pass it with filesystem=
), as based on my reading of #56, it might solve similar errors.
Now, it's not really clear to me when what is a exactly giving problems. In #56 it's having credentials set that makes reading the public bucket not work, except if you use anonymous flag, while the explanation added in the PR above seems to indicate that setting credentials can help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll update the warning.
However, looks like #56 was having the reverse of my problems: having the credentials
file but still unable to connect to a public bucket. I wonder if the auth code on our end is buggy or the AWS throws...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented and it seems, for me at least, the S3 URL works if I have no credentials file (presumably it is anonymous in that case) and it works if I have credentials belonging to the Ursa organization in S3. It fails if I have invalid credentials (not an interesting use case) and if I have valid credentials that aren't part of the Ursa organization (this is the case I think that people keep running into unexpectedly, which is aided by the anonymous=True flag)
Without the
anonymous=True
flag I was not able to read data from theursa-labs-taxi-data
S3 bucket. I updated the S3 section so this should work every time.