Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding anonymous flag to s3 #70

Merged
merged 3 commits into from
Sep 10, 2021
Merged

Conversation

drabastomek
Copy link
Contributor

Without the anonymous=True flag I was not able to read data from the ursa-labs-taxi-data S3 bucket. I updated the S3 section so this should work every time.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea as it has already come up once before. It's a clear win for the case where we were explicitly using a filesystem already. However, for the other recipe, I think we were intentionally demonstrating how to use a URI (instead of creating a filesystem). If that is the case I'd leave the temperamental approach for now. If that wasn't the intent of the recipe then I'm fine with both changes. I don't feel too strongly either way.

@amol- @jorisvandenbossche

Comment on lines 422 to 426
dataset = ds.dataset("s3://ursa-labs-taxi-data/2011",
dataset = ds.dataset("2011",
filesystem=s3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I suppose I feel this changes the intent a little if the goal was to demonstrate the ability to create a dataset from a URI.

That being said, I don't know how to make this test work effectively with a URI. If anonymous is not provided then credentials will be fetched from by the AWS SDK (e.g. from ~/.aws/credentials). If no credentials at all are found then the request will fail. If you configure any kind of credentials then the request will pass. Perhaps the C++ library could catch the "no credentials found" error and try again with anonymous access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonpace I understand the reasoning about the URI. Perhaps, instead of changing this to filesystem we do, as you suggest, add ~/.aws/credentials as an explicit step before this code? I think this would make sure this code works even if credentials are not present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonpace I changed the code back to using URI and added a warning that if this code fails most likely it's because of the AWS credentials not being set. Added instructions how to set these properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm good with this approach. If amol- or jorisvandenbossche have better ideas when they get back we can fix it then.

@westonpace westonpace merged commit 9750a64 into apache:main Sep 10, 2021
.. warning::

If the above code throws an error most likely the reason is your
AWS credentials are not set. Follow these instructions to get
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on #56, it seems you can also run into errors if you have set AWS credentials? (in which case adding anonymous=True can help, so it might still be worth mentioning it here in the warning?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need to revert to adding filesystem parameter rather than a anonymous flag directly in the dataset object. Unless I'm doing something wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wouldn't suggest changing the actual example again to use filesystem instead of a URI. But in the warning box it might still be worth mentioning the anonymous flag (and that you can then pass it with filesystem=), as based on my reading of #56, it might solve similar errors.

Now, it's not really clear to me when what is a exactly giving problems. In #56 it's having credentials set that makes reading the public bucket not work, except if you use anonymous flag, while the explanation added in the PR above seems to indicate that setting credentials can help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll update the warning.

However, looks like #56 was having the reverse of my problems: having the credentials file but still unable to connect to a public bucket. I wonder if the auth code on our end is buggy or the AWS throws...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented and it seems, for me at least, the S3 URL works if I have no credentials file (presumably it is anonymous in that case) and it works if I have credentials belonging to the Ursa organization in S3. It fails if I have invalid credentials (not an interesting use case) and if I have valid credentials that aren't part of the Ursa organization (this is the case I think that people keep running into unexpectedly, which is aided by the anonymous=True flag)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants