
merge from databricks #653

Open

wants to merge 14 commits into master
Conversation

kahing (Owner) commented Aug 16, 2021

No description provided.

dotslash and others added 14 commits August 15, 2021 16:58
After #496, goofys gets confused about directories in Azure if there is a `dir/` blob. This is how the bug looks:
- `ls root/dir`: fails
- `ls root/; ls root/dir`: works

This PR addresses this issue.
…82)

In both S3 and Azure, it is possible that when we call listPrefix with limit=N, we get a result whose size is smaller than N but which still has a continuation token. This behaviour does no harm when we are listing the fuse files under a directory. But during dir checks, i.e. (1) testing whether a given path is a directory and (2) testing whether a given directory is empty, it can make goofys wrongly think a directory is empty or that a given prefix is not a directory.

Add a wrapper in list.go that handles this: if the backend returns fewer items than requested and the response has a continuation token, the wrapper uses the token to fetch more items.
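A minimal sketch of such a wrapper, in the spirit of the description above. The type and method names are modeled loosely on goofys's backend interface but are assumptions; the actual code in list.go may differ (for instance, the real listing output also carries common prefixes, omitted here).

```go
package backend

// Hypothetical types standing in for goofys's backend interface.
type BlobItemOutput struct {
	Key *string
}

type ListBlobsInput struct {
	Prefix            *string
	ContinuationToken *string
	MaxKeys           *uint32
}

type ListBlobsOutput struct {
	Items                 []BlobItemOutput
	NextContinuationToken *string
	IsTruncated           bool
}

type Backend interface {
	ListBlobs(param *ListBlobsInput) (*ListBlobsOutput, error)
}

// listBlobsWrapped keeps following the continuation token while the backend
// returns fewer items than requested but reports that more are available.
func listBlobsWrapped(b Backend, param *ListBlobsInput) (*ListBlobsOutput, error) {
	res, err := b.ListBlobs(param)
	if err != nil {
		return nil, err
	}
	for param.MaxKeys != nil &&
		uint32(len(res.Items)) < *param.MaxKeys &&
		res.IsTruncated &&
		res.NextContinuationToken != nil {

		remaining := *param.MaxKeys - uint32(len(res.Items))
		next, err := b.ListBlobs(&ListBlobsInput{
			Prefix:            param.Prefix,
			ContinuationToken: res.NextContinuationToken,
			MaxKeys:           &remaining,
		})
		if err != nil {
			return nil, err
		}
		// Merge the follow-up page into the first result.
		res.Items = append(res.Items, next.Items...)
		res.NextContinuationToken = next.NextContinuationToken
		res.IsTruncated = next.IsTruncated
	}
	return res, nil
}
```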
* [impl] add logging for nil adlv1 blob last modified and size fields

* [impl] log full path of the blob

* [fix] change log level to warn from debug

* [fix] dereference directly when not nil
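A hypothetical illustration of the pattern these commits describe (field and function names are assumptions, not goofys's actual code): warn, including the blob's full path, when ADLv1 returns nil metadata, and dereference directly once the nil check has passed.

```go
package main

import (
	"log"
	"time"
)

// blobMeta stands in for the ADLv1 response fields; names are assumptions.
type blobMeta struct {
	LastModified *time.Time
	Size         *uint64
}

// describeBlob logs at warn level (not debug) with the blob's full path
// when a field is nil, and dereferences directly when it is not.
func describeBlob(fullPath string, m blobMeta) {
	if m.LastModified == nil {
		log.Printf("WARN: adlv1 blob %v has nil last modified", fullPath)
	} else {
		log.Printf("blob %v modified at %v", fullPath, *m.LastModified)
	}
	if m.Size == nil {
		log.Printf("WARN: adlv1 blob %v has nil size", fullPath)
	} else {
		log.Printf("blob %v size %v", fullPath, *m.Size)
	}
}

func main() {
	now := time.Now()
	describeBlob("root/dir/file", blobMeta{LastModified: &now})
}
```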
add a bg mount option that delays init
Currently, after committing a blob for Azure, we keep renewing its lease. This PR fixes that behavior (see the sketch after the commit notes below).
* add logs
* more logs
* fix break
* clean up logs
* just return
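A minimal sketch of the intended behavior, under the assumption that lease renewal runs on a timer; the names here are illustrative, not goofys's actual code. The point is simply that the renewal loop exits once the blob has been committed.

```go
package main

import (
	"fmt"
	"time"
)

// renewLease is a stand-in for the Azure lease-renewal call.
func renewLease(blob string) { fmt.Println("renew lease on", blob) }

// startRenewal renews the lease periodically until stop is closed,
// e.g. once the blob has been committed.
func startRenewal(blob string, stop <-chan struct{}) {
	ticker := time.NewTicker(30 * time.Second)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				renewLease(blob)
			case <-stop:
				return // committed: just return, stop renewing
			}
		}
	}()
}

func main() {
	stop := make(chan struct{})
	startRenewal("container/blob", stop)
	// ... upload parts, then commit the block list ...
	close(stop) // after commit, the renewal loop exits
	time.Sleep(100 * time.Millisecond)
}
```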
For AWS PrivateLink, host-style access resolves to the correct private link IP (after some delay), whereas s3.amazonaws.com may not be accessible.

The exception is when the bucket isn't a valid DNS name, or when it contains `.`, because the latter will fail SSL verification.

Example:
- An endpoint typically looks like this: `s3.us-east-2.amazonaws.com`.
- We were doing `HEAD https://s3.us-east-2.amazonaws.com`
- instead of one of the following:
  - `HEAD https://mybucket.s3.us-east-2.amazonaws.com`
  - `HEAD https://s3.us-east-2.amazonaws.com/mybucket`

see:
1. https://docs.aws.amazon.com/general/latest/gr/s3.html
2. https://docs.databricks.com/data/data-sources/aws/amazon-s3.html#configuration
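A sketch of the idea, not the actual goofys change: pick host-style addressing when the bucket name is DNS-safe, falling back to path style otherwise.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
)

// dnsSafe is a simplified check (an assumption, not goofys's exact rule):
// lowercase letters, digits, and hyphens only. Buckets containing `.` are
// deliberately excluded because the wildcard TLS certificate for
// *.s3.<region>.amazonaws.com does not cover dotted bucket names.
var dnsSafe = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$`)

// headBucketURL builds the HEAD target: host style when safe (so the
// PrivateLink DNS name resolves to the right IP), path style otherwise.
func headBucketURL(endpoint, bucket string) string {
	if dnsSafe.MatchString(bucket) {
		return fmt.Sprintf("https://%s.%s/", bucket, endpoint)
	}
	return fmt.Sprintf("https://%s/%s", endpoint, bucket)
}

func main() {
	url := headBucketURL("s3.us-east-2.amazonaws.com", "mybucket")
	resp, err := http.Head(url)
	if err != nil {
		fmt.Println("HEAD failed:", err)
		return
	}
	resp.Body.Close()
	fmt.Println(url, "->", resp.Status)
}
```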
The previous retry code that attempted to reuse the body buffer never worked and instead dropped the body on retry. This fixes that and adds a test.

Also improved adlv2 logging when debug is on, and always dump the response if the request failed.
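One common way to make a request body survive retries in Go, shown for illustration only (the fix in this PR may do it differently): buffer the body once and wrap a fresh reader over it before each attempt, so a retried request never goes out with an empty body.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

// doWithRetry sends req up to attempts times, re-attaching the buffered
// body before each try. Simplified: a production version would also set
// req.GetBody and be more careful about which errors are retryable.
func doWithRetry(client *http.Client, req *http.Request, body []byte, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		// Give each attempt a fresh reader over the same buffer.
		req.Body = io.NopCloser(bytes.NewReader(body))
		req.ContentLength = int64(len(body))

		resp, err := client.Do(req)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if err == nil {
			resp.Body.Close()
			lastErr = fmt.Errorf("server error: %s", resp.Status)
		} else {
			lastErr = err
		}
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
	}
	return nil, lastErr
}

func main() {
	req, _ := http.NewRequest("PUT", "https://example.com/upload", nil)
	body := []byte("payload")
	resp, err := doWithRetry(http.DefaultClient, req, body, 3)
	if err != nil {
		fmt.Println("all attempts failed:", err)
		return
	}
	resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```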
Otherwise it only works for STS in the default region (us-east-1).
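For context, aws-sdk-go v1 lets a client opt into the regional STS endpoint rather than the global one (which is served from us-east-1). A sketch of that opt-in; whether the PR uses this exact mechanism is an assumption.

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/endpoints"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sts"
)

func main() {
	// Opt into the regional STS endpoint (sts.us-west-2.amazonaws.com)
	// instead of the global sts.amazonaws.com, which lives in us-east-1.
	sess := session.Must(session.NewSession(&aws.Config{
		Region:              aws.String("us-west-2"),
		STSRegionalEndpoint: endpoints.RegionalSTSEndpoint,
	}))
	svc := sts.New(sess)
	out, err := svc.GetCallerIdentity(&sts.GetCallerIdentityInput{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("caller:", aws.StringValue(out.Arn))
}
```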