-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix directory permissions for multi-directory globs. Follow-up #50559 #52839
Fix directory permissions for multi-directory globs. Follow-up #50559 #52839
Conversation
This is an automated comment for commit 9e89055 with description of existing statuses. It's updated for the latest CI running ✅ Click here to open a full report in a separate page Successful checks
|
skip_permission_denied
for multi-directory globs, follow up #50559skip_permission_denied
for multi-directory globs. Follow-up #50559
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
…se into zvonand-globs-small-fix
src/Storages/StorageFile.cpp
Outdated
@@ -121,7 +121,7 @@ void listFilesWithFoldedRegexpMatchingImpl(const std::string & path_for_ls, | |||
return; | |||
|
|||
const fs::directory_iterator end; | |||
for (fs::directory_iterator it(path_for_ls); it != end; ++it) | |||
for (fs::directory_iterator it(path_for_ls, fs::directory_options::skip_permission_denied); it != end; ++it) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zvonand : In the example, SELECT *, _path, _file FROM file('{data1/f1,data2/f2}.csv', CSV)
if user doesn't have rights to data2
looks like we would just skip it without throwing any error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you're right, didn't think of this.
I'll take a look again.
Will think of a nice way to check if the checked path matches the provided glob. If not -- ignore error, otherwise throw it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out it's not an easy thing to implement, and I'm not sure it is worth it.
AFAIK there is no easy and good way to match string to be a prefix of regex. This would require introducing a rather complex FSM into the code. It may also be very slow, as it would be a lot of creating and modifying strings.
The question is how critical this silent ignore is?
Globs already work like this, even w/o my previous PR: for a query like SELECT *, _path, _file FROM file('{data1,data2}/{f1,f2}.csv', CSV)
there will be no error thrown in case one of specified files or directories don't exist. If at least one matching file exists, it will keep silence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we could append the docs with something like a glob with '/' in curly braces will ignore directories that cannot be accessed
.
I don't think that making the code twice as unreadable and three times as complex is a good idea for such a small issue.
Also, it doesn't affect old workflow, it is only about {foo/bar,x/y/z}
globs.
I think it is fine to set some limits for a newly introduced feature :)
skip_permission_denied
for multi-directory globs. Follow-up #50559
Throwing an exception is better than silently ignoring the files. |
In general, sure it is. But this limits the use cases, as there are often inaccessible directories where you don't need to go :). Also, I added a docs update to this PR. It does not affect behavior prior to #50559. So I don't think it is a big deal to modify a thing that no one has used yet. However, a setting may be a good idea. But there are already tons of them, so I'm afraid it would be difficult to find :) |
In the example from the description:
Why this query will even attempt to access test_root? |
See this comment tldr: because there is no way to match a regex prefix against a string (kind of "reverse" matching -- match a prefix of a regex against a given string) the only working approach is to just go recursively inside all directories that are there (until some estimated depth limit is hit) and match the path against a full regex |
Ok. So, we need to traverse the whole directory tree to match by paths (rather than by separate components). That's understandable. |
But it is still a problem - we are doing it the wrong way.
|
We will have to rewrite all of this in the future and drop the setting you've introduced :( |
This comment was marked as outdated.
This comment was marked as outdated.
I will do it. |
Hi @zvonand, What is the proposed fix? |
Yes, exactly like this |
@alexey-milovidov could you please put a |
Follow-up for #50559.
Add setting
ignore_access_denied_multidirectory_globs
to avoidpermission denied
in case there are inaccessible directories/files.Explanation
Having the following structure in
user_files
:Where
data1
,data2
are accessible by CH, but no rights to readtest_root
.CH would throw:
Code: 1001. DB::Exception: std::__1::__fs::filesystem::filesystem_error: filesystem error: in directory_iterator::directory_iterator(...): Permission denied
for a query like
SELECT *, _path, _file FROM file('{data1/f1,data2/f2}.csv', CSV)
.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add setting
ignore_access_denied_multidirectory_globs
.