Download bulk raw data #2104

Merged
merged 13 commits into from
Apr 11, 2017

Conversation

antgonza
Member

Depends on #2102, please review/merge that one first.

@@ -77,8 +77,6 @@ def test_download_study(self):
        with open(tgz, 'w') as f:
            f.write('\n')

        self._clean_up_files.append(tmp_dir)
Member Author

Removed because it's there twice; see line 65.

Contributor

@wasade wasade left a comment

Couple of questions

for i, (fid, path, data_type) in enumerate(a.filepaths):
    # validate access only of the first artifact filepath,
    # the rest have the same permissions
    if (i == 0 and not vfabu(user, fid)):
Contributor

Can a user have access to only some of the filepaths associated with a study?

Member Author

To clarify, validate_filepath_access_by_user checks that the specific user has access to the specific filepath id based on its data type. So we check that the user has access to that one filepath and, if so, allow download of all the other filepaths within that artifact. Remember that an artifact can be formed by several files (filepath ids).
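
For readers following the thread, the check described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `collect_artifact_paths` and the `has_access` callable are hypothetical names, with `has_access` standing in for validate_filepath_access_by_user.

```python
def collect_artifact_paths(filepaths, user, has_access):
    """Return the paths of one artifact, or None if access is denied.

    filepaths: iterable of (filepath_id, path, data_type) tuples.
    has_access: hypothetical stand-in for validate_filepath_access_by_user,
        called as has_access(user, filepath_id) and returning a bool.
    """
    paths = []
    for i, (fid, path, data_type) in enumerate(filepaths):
        # Only the first filepath is validated; the remaining files of the
        # artifact are assumed to share the same permissions.
        if i == 0 and not has_access(user, fid):
            return None
        paths.append(path)
    return paths
```

The trade-off is one permission lookup per artifact instead of one per file, which only holds if every file in an artifact really does share the same permissions.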

Contributor

There can be an artifact associated with a study that is inaccessible to a user who has access to the study? That seems weird to me.

Member Author

Good point. We "normally" use validate_filepath_access_by_user to check that a user has access to a given filepath when downloading specific files. With that in mind, we are reusing it for bulk downloads, but we don't really need to, as we are already checking that the user has "full" access to the artifact. Now, we can leave it in to be overcautious or remove it ... I don't have a preference, @wasade?

Contributor

I don't know enough about the permissions model to understand the implications. Can someone else who is more familiar weigh in?

Contributor

Given the check on line 175 I don't think there is any case where validate_filepath_access_by_user will return false. I don't have a preference on leaving or removing.

self.set_header('Cache-Control', 'no-cache')
self.set_header('X-Archive-Files', 'zip')
self.set_header('Content-Disposition',
                'attachment; filename=%s' % zip_fn)
Contributor

Where is the file created?

Member Author

Nowhere; it is created on the fly by nginx. Basically, we send nginx a list of filepaths using the "protected" filepath (only nginx has access), and the zip is created during download.
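
To make the implicit step concrete, here is a hedged sketch of the kind of response body nginx would receive. It assumes the zip is assembled by the nginx mod_zip module, which (as far as I know) is triggered by the `X-Archive-Files: zip` header and expects one `<crc-32> <size> <location> <filename>` line per file, with `-` for an unknown CRC-32. The function name and the `/protected/` prefix are hypothetical and not taken from this PR.

```python
from urllib.parse import quote


def build_zip_manifest(files, protected_prefix='/protected/'):
    """Build a file list in the format assumed for nginx mod_zip.

    files: iterable of (relative_path, size_in_bytes, name_in_zip) tuples.
    protected_prefix: hypothetical internal nginx location serving the files.
    """
    lines = []
    for rel_path, size, zip_name in files:
        # The location must be URL-encoded; '/' is left untouched by quote().
        location = quote(protected_prefix + rel_path)
        # '-' tells nginx the CRC-32 is not known by the application.
        lines.append('- %d %s %s' % (size, location, zip_name))
    return '\n'.join(lines) + '\n'
```

A handler would write this string as the response body after setting the headers shown in the snippet above; no zip file is ever written to disk by the application.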

Contributor

Ah, thought that might be the case. Can you add a comment since it's implicit?

Contributor

eh?

Member Author

@antgonza antgonza left a comment

Note that this is deployed in the qiita-test env, but we have some permissions issues (nginx needs read access to all files and folders) that @jdereus is fixing.

@coveralls

Coverage Status

Coverage decreased (-0.06%) to 91.929% when pulling 7d1ea83 on antgonza:download-raw-data into 56a20b4 on biocore:master.

@wasade
Contributor

wasade commented Apr 11, 2017

Just looking for that one comment

@wasade
Contributor

wasade commented Apr 11, 2017

ah, missed it, thanks @antgonza for linking directly

@wasade wasade merged commit bef4640 into qiita-spots:master Apr 11, 2017
4 participants