New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DMPTool Waterbutler provider #204
Conversation
… the code might not be called by dmptool
…der.py and is fully async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class DmptoolFileMetadata(metadata.BaseFileMetadata): | ||
|
||
def __init__(self, raw): | ||
metadata.BaseFileMetadata.__init__(self, raw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use super().__init__()
?
|
||
@property | ||
def extra(self): | ||
return super(DmptoolFileMetadata, self).extra.update({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to pass arguments to super()
for 3.5 classes.
|
||
return result | ||
|
||
async def plans_full(self, id_=None, format_='json'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for the underscores at the end of the kwarg names? Just curious, I've haven't seen that before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method, like plans()
, should probably be split into two methods
try: | ||
plan_id = path.parts[1].raw | ||
pdf = await self._dmptool_plan_pdf(plan_id) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried downloading a "Test or Practice"-visible plan, but got the following message: {"response": "You are not authorized to look at this content."}
. Are those not available via the API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible bug that won't be corrected: The DMPTool API does not return the PDF for plans with test visibility.
""" | ||
raise exceptions.ReadOnlyProviderError(self) | ||
|
||
async def _do_intra_move_or_copy(self, dest_provider, src_path, dest_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this method, it's not called by anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you will want to add move()
and copy()
methods that respect the readonly-ness of DMPTool. See the btibucket provider for examples: https://github.com/CenterForOpenScience/waterbutler/pull/198/files#diff-af4a691a976fbf59e9aabd632386d206R191
else: | ||
resp = await self.get_url_async('plans/{}'.format(id_)) | ||
r = await resp.json() | ||
result = r.get('plan') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: since the only commonality between these two paths is return result
, how about splitting this into two methods, _get_plans()
and _get_plan(id)
?
def plans_templates(self): | ||
return self._unroll(self.get_url('plans_templates').json()) | ||
|
||
def institutions_plans_count(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the previous two methods don't seem to be used anywhere. Are these stubs for future development, or just code that hasn't been trimmed yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About using id_ and format_ as parameter names, I've gotten in the habit of trying to avoid names that conflict with functions in the builtins. Some discussion at https://stackoverflow.com/questions/16523789/naming-conflict-with-built-in-function. I'm ok with taking the _
out since the code should work without the underscore (I think!).
else: | ||
return None | ||
|
||
async def plans_owned(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably already on your todo list, but can you clarify the difference between plans()
, plans_full()
, and plans_owned()
?
removed unused method: plans_templates, institutions_plans_count
I thought the Download-as-a-zip function was working but although I got a valid zip file with several files with the right names and |
Another bug for which I'd appreciate help debugging: on a page for loading a specific DMPTool pdf (e.g., http://localhost:5000/kjayw/), the files sidebar doesn't finish loading: The error in the console:
When I dug into the problem, the problem boils down to osf.io/index.js at 732b9fff76492ea22867d41780f7cb3eb31c9505 · rdhyee/osf.io in which Any suggestion about how to debug this problem? |
Sorry, for replying out-of-band, but for some reason GH won't let me comment on #204 (comment) directly (the issue with DAZ not working). Yes, I suspect setting the size to 0 is your problem. When we build a DAZ zipfile, we use the streaming zipfile spec, which requires that we pass correct file size information in the zip central directory data structure. If setting the correct file size doesn't work, PLMK, and I'll dig deeper. |
Re: #204 (comment) (storageAddon) -- those are defined in https://github.com/CenterForOpenScience/osf.io/blob/develop-backup/website/static/storageAddons.json. You probably just need to add a key for |
Re need to calculate the length of the files: I will likely need to download a file to calculate to size of the file. Is there a smart way to keep my addon from downloading a file multiple times in the construction of the zip file? That is, is there any caching mechanism available for me to use? On further reflection: I'm going to look at whether doing a |
I may have spoke to soon about the file sizes. DAZ does download the file, and keeps a tally of its size as it goes, which I think is what is fed to the zipfile builder. For some reason, I was thinking you were overriding that and passing 0 as the size. I'm not sure where I got that idea from. I'll have another look. |
About file size: I think I have been setting the length to 0 somewhere (I can chase down the reference). It is possible to not set the length at all? |
You can override the Content-Length header when downloading a file, but we actually keep a separate tally when building the zip, since providers are pretty unreliable. Manually setting the length to zero sounds weird to me, but it shouldn't affect the building of the zipfile. I wonder if the files are getting truncated, or if maybe the deflater is corrupting them? |
…ded but all the files have file names with the id (with no .pdf suffix)
Re: adding A few things I don't understand:
Any hints as to how to debug the problem would be appreciated |
Re: 1.) Good catch! I'm not sure where the wording for that is set, so I'll need to dig into the code to figure it out. I suspect it's something that needs to be changed in the osf PR. 2.) That exception is okay. When a process takes too long to complete it gets shunted off into celery, and we throw the 3+4.) You definitely need the waterbutler celery worker running for this. Once the process is sent to WB's celery, any debugging output will appear in its terminal window. 5.) Sorry, what v2 api methods are you referring to? |
|
1.) debugging celery: does the request get logged in the celery worker window? Or does it just remain blank? I just tried it locally using your branch and was able to successfully copy a plan to osfstorage. 2.) Hah! ETOOMANYAPIS: that's referring to the OSF API, for which v2 is the latest version. 3.) They should be appearing. Did you restart the celery task after you added those? |
…Path. The zip file now has working pdf files with the right names.
…vidual files from DMPTool, the zip file, and in the copy to OSF Storage
Changes Unknown when pulling b6e8254 on rdhyee:feature/dmptool into ** on CenterForOpenScience:develop**. |
Upstream DMPTool will be releasing a new version of their API soon. This PR has been archived as the |
Paired with CenterForOpenScience/osf.io#7030