-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OSF Data Download and Upload Scripts #25
Conversation
The unit tests for TEMUpload are currently failing as they require an authorization token from OSF.io to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks very clean. I liked the typing.
It would be good to include some screenshots of what the code does in the PR. Ie how things look like on a browser once they are uploaded.
Also, let's try to find some fix for the private token... So the issue is that we don't want anyone to be able to interact with OSF, but only those with the token. And if we put the token on github, then it's compromised...
There's some issues with importing. Have a look at the details of the failed Linting and Testing
|
You need to install simSPI in https://github.com/compSPI/ioSPI/blob/dataset/upload/environment.yml from github See how in simSPI we import ioSPI from git hub: https://github.com/compSPI/simSPI/blob/master/environment.yml#L23 |
Converted to draft to review and refactor |
I'm confused. Are we going ahead with PR #40, or is a major move of tem specific code from ioSPI --> simSPI called for? |
We are currently in the process of adjusting this PR to match up with master (which has had PR#46 merged in). That PR (refactoring) moved things into ioSPI that were TEM-specific - the major move has already happened. All the code we will push into ioSPI will be TEM-agnostic, whereas TEM-specific items will be pushed into their relevant places in simSPI. Unfortunately this will take us some time to get sorted which has caused some confusion - apologies! |
Codecov Report
@@ Coverage Diff @@
## master #25 +/- ##
==========================================
- Coverage 97.53% 96.43% -1.09%
==========================================
Files 4 5 +1
Lines 121 168 +47
==========================================
+ Hits 118 162 +44
- Misses 3 6 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, great job on connecting with OSF through the tokens!! 🙌 not an easy task.
Important remarks:
- As mentioned in the previous code reviews: ioSPI cannot have elements that are simulator-specific: ioSPI is simulation agnostic: the name TEMUpload does not go in this direction and should be changed, there should be no sim_config.yml, etc.
- This PR also does not seem to respect the design that we discussed about ioSPI:
- the io functions go in files that are named after the biological structures they deal with: in this case, if you are downloading or uploading atomic models ("molecules"), then these functions should go in the atomic_models.py (your file osf_upload.py is misnamed, we do not name files in terms of where we are pulling the data from).
- we also decided that we would respect a write/read convention to upload or download data: is there any reason why you are not using this? (you use "post", "get", etc?)?
Let me know?
ioSPI/osf_upload.py
Outdated
|
||
|
||
class TEMUpload: | ||
"""Class to upload data to OSF.io generated by simSPI TEM Simulator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ioSPI module is agnotic from the TEM simulator (and any type of simulator).
Either:
- put only functions that know how to upload to OSF (independently of where the data initially came from)
- if this is TEM-specific, it should go in simSPI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty much this whole file is already simSPI agnostic. I think you can just change some docstrings and function / class names for generality. I didn't notice much code at all that was intimately connected to the type of data that will be uploaded.
ioSPI/osf_upload.py
Outdated
Returns | ||
------- | ||
dict of type str : str | ||
Returns dictionary of node labels mapped to node GUIDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Missing . at the end
ioSPI/osf_upload.py
Outdated
See Also | ||
-------- | ||
Protein Data Bank(PDB) : https://www.rcsb.org/ | ||
EM DataR esouce(EMDB) : https://www.emdataresource.org/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
ioSPI/osf_upload.py
Outdated
Parameters | ||
---------- | ||
token : str | ||
Personal token from OSF.io with access to cryoEM dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could generalize Personal token from OSF.io with access to dataset (e.g. cryoEM, etc)
.
tests/test_files/path_config.yaml
Outdated
# absolute paths | ||
pdb_file: './test_files/4v6x.pdb' | ||
mrc_keyword: '_randomrot' | ||
output_dir: './test_files' | ||
local_sim_dir: './TEM-simulator' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ninamiolane this seems fine to me. While the ioSPI code is agnostic to simulator, it needs to be tested on something concretely.
tests/test_files/sim_config.yaml
Outdated
@@ -0,0 +1,39 @@ | |||
molecular_model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the "test meta data" that is uploaded. It makes sense to me to include it in the test to be uploaded.
tests/test_osf_upload.py
Outdated
random.choice(string.ascii_letters) for i in range(5) | ||
) | ||
|
||
print(f"Creating test node CryoEM Dataset -> internal -> {test_node_label} ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't your code generalize beyond cryoEM datasets? If so, just refer to Dataset
not CryoEM Dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generalize for any dataset as @ninamiolane is emphasizing. It seems that it shouldn't be to much work.
Ok so don't merge any more code that isn't aligned with the vision of ioSPI. |
…d instances of molecules to structures
@ninamiolane @geoffwoollard Thank you for the review! As pointed out while the script was TEM-agnostic, the docstrings/method names certainly weren't. I've made a few changes based on the feedback above.
Please let me know if these changes work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thank you very much!
@thisTyler @jedyeo @calhep
TEMUpload uploads data generated by TEMSimulator in the structure to CryoEM-dataset page the discussed here.
The implementation for
generate_tags_from_tem()
andtest_post_files()
will be merged in an upcoming PR.