-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download bulk raw data #2104
Download bulk raw data #2104
Conversation
@@ -77,8 +77,6 @@ def test_download_study(self): | |||
with open(tgz, 'w') as f: | |||
f.write('\n') | |||
|
|||
self._clean_up_files.append(tmp_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm cause it's twice, see line 65
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of questions
qiita_pet/handlers/download.py
Outdated
for i, (fid, path, data_type) in enumerate(a.filepaths): | ||
# validate access only of the first artifact filepath, | ||
# the rest have the same permissions | ||
if (i == 0 and not vfabu(user, fid)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A user can only have access to only some of the filepaths associated with a study?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, validate_filepath_access_by_user checks that the specific user has access to the specific filepath id based on it's data type so we are checking that the user has access to that filepath and if it does it allows download to all the other filepaths within that artifact. Remember an artifact can be formed by several files (filepath ids).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be an artifact associated with a study that is inaccessible to a user who has access to the study? That seems weird to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We "normally" use validate_filepath_access_by_user to check that a user has access to a giving filepath for downloading specific files; with that in mind we are reusing it for bulk downloads but we don't really need to as we are checking that the user has "full" access to the artifact. Now, we can leave to be over cautious or remove ... I don't have a preference, @wasade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about the permissions model to understand the implications. Can someone else who is more familiar weigh in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the check on line 175 I don't think there is any case where validate_filepath_access_by_user
will return false. I don't have a preference on leaving or removing.
self.set_header('Cache-Control', 'no-cache') | ||
self.set_header('X-Archive-Files', 'zip') | ||
self.set_header('Content-Disposition', | ||
'attachment; filename=%s' % zip_fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the file created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nowhere, is created on the fly by nginx. Basically, we send a list of filepaths to nginx using the "protected" filepath (only nginx has access) and the zip is created during download.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thought that might be the case. Can you add a comment since it's implicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is deployed in the qiita-test env but we have some permissions issues (for nginx all files/folders should have read access) that @jdereus is fixing.
qiita_pet/handlers/download.py
Outdated
for i, (fid, path, data_type) in enumerate(a.filepaths): | ||
# validate access only of the first artifact filepath, | ||
# the rest have the same permissions | ||
if (i == 0 and not vfabu(user, fid)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, validate_filepath_access_by_user checks that the specific user has access to the specific filepath id based on it's data type so we are checking that the user has access to that filepath and if it does it allows download to all the other filepaths within that artifact. Remember an artifact can be formed by several files (filepath ids).
self.set_header('Cache-Control', 'no-cache') | ||
self.set_header('X-Archive-Files', 'zip') | ||
self.set_header('Content-Disposition', | ||
'attachment; filename=%s' % zip_fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nowhere, is created on the fly by nginx. Basically, we send a list of filepaths to nginx using the "protected" filepath (only nginx has access) and the zip is created during download.
qiita_pet/handlers/download.py
Outdated
for i, (fid, path, data_type) in enumerate(a.filepaths): | ||
# validate access only of the first artifact filepath, | ||
# the rest have the same permissions | ||
if (i == 0 and not vfabu(user, fid)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the check on line 175 I don't think there is any case where validate_filepath_access_by_user
will return false. I don't have a preference on leaving or removing.
Just looking for that one comment |
ah, missed it, thanks @antgonza for linking directly |
Depends on #2102, please review/merge that one first.