Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYNPY-1398] Support .[version] syntax for input SynIDs #1047

Merged
merged 3 commits into from
Jan 25, 2024

Conversation

jaymedina
Copy link
Collaborator

@jaymedina jaymedina commented Jan 19, 2024

problem

syncToSynapse fails for provenance pointing to a specific version of a synID using syn[id].[version] notation (e.g. syn1234.5). This revisits the discussion on the Python client adding support for synID.[version] notation.

image


solution

The scope of this PR is constrained to allowing .version syntax in syncTo/FromSynapse syn.get and synapse get for the CLI

  • syncToSynapse supports .[version] notation
  • syncFromSynapse supports .[version] notation
  • syn.get supports .[version] notation
  • synapse get command supports .[version] notation
  • Corresponding documentation is updated to show new support
  • New unit/integration tests for feature coverage and error handling

testing & preview

support for syn.get(...) :

image

support for synapse get for CLI :

(synapsePythonClient) (synapseclient) bash-3.2$ synapse get syn53182713
Downloaded file: foo.txt
Creating /Users/jmedina/Documents/forks/synapsePythonClient/jout/foo.txt
(synapsePythonClient) (synapseclient) bash-3.2$ mv foo.txt foo_v2.txt
(synapsePythonClient) (synapseclient) bash-3.2$ synapse get syn53182713.1
Downloaded file: foo.txt
Creating /Users/jmedina/Documents/forks/synapsePythonClient/jout/foo.txt
(synapsePythonClient) (synapseclient) bash-3.2$ cat foo.txt
VERSION 1
(synapsePythonClient) (synapseclient) bash-3.2$ cat foo_v2.txt 
VERSION 2
(synapsePythonClient) (synapseclient) bash-3.2$ 

support for syncFromSynapse :
image

support for syncToSynapse :
image

@jaymedina jaymedina self-assigned this Jan 19, 2024
@pep8speaks
Copy link

pep8speaks commented Jan 19, 2024

Hello @jaymedina! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 1027:89: E501 line too long (120 > 88 characters)
Line 1028:89: E501 line too long (117 > 88 characters)
Line 1031:89: E501 line too long (104 > 88 characters)

Line 112:20: E711 comparison to None should be 'if cond is None:'

Comment last updated at 2024-01-25 22:27:28 UTC

@jaymedina
Copy link
Collaborator Author

I had a to-do item in the description for deprecating the version arg in synapse.get but I think it's best to leave that out until the full codebase supports .version syntax, what do you think @BryanFauble

@thomasyu888
Copy link
Member

thomasyu888 commented Jan 24, 2024

I had a to-do item in the description for deprecating the version arg in synapse.get but I think it's best to leave that out until the full codebase supports .version syntax, what do you think @BryanFauble

I think that is a good idea, but I think that would be a pretty big breaking change, and would require stakeholder feedback. Let's hold off for now until I can gather some of that feedback.

One thought I had was to suggest the API itself to support the dot notation. I think both the web and this current feature branch are extracting the version and feeding it to the API.

Another thing is this:

entity = syn.get("syn1234.3")
# entity.id = "syn1234"

The get API endpoint returns this object: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/FileEntity.html.

@jaymedina
Copy link
Collaborator Author

One thought I had was to suggest the API itself to support the dot notation.

I think it's ideal for the Python client behavior to reflect the API and web client as closely as possible, so this makes sense. Do we have anyone from PLATFORM team coming to the demo on Friday? It might also be good to get more feedback from users on how valuable the .version support would be, as it's starting to look like a bigger undertaking than previously thought.

@jaymedina
Copy link
Collaborator Author

jaymedina commented Jan 24, 2024

This test seems to be failing because there is already a File entity called foobarbat in Project syn12914772. I was able to re-create the error in my local machine after running syn.store on my File entity foobarbat consecutively:

image

After deleting my foobarbat file from my Project that I'm testing on, the syn.store call works fine again:

image image

It looks like the Projects don't get deleted after integration tests run. I'm wondering why that is? I notice when I run integration tests locally the Projects remain in my Synapse Projects page indefinitely. Should we start an initiative to update tests so that we have complete teardowns after integration tests complate/fail, or is there a reason for Entities to stay up after the fact? @BryanFauble @thomasyu888

@jaymedina jaymedina force-pushed the SYNPY-1398-support-synid-version branch from d5e7c5b to 8c5ca72 Compare January 25, 2024 14:55
@jaymedina jaymedina force-pushed the SYNPY-1398-support-synid-version branch from 8c5ca72 to 126c3e9 Compare January 25, 2024 15:05
@jaymedina
Copy link
Collaborator Author

jaymedina commented Jan 25, 2024

Another thing is this:

entity = syn.get("syn1234.3")
# entity.id = "syn1234"

The get API endpoint returns this object: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/FileEntity.html.

I'm thinking of handling this in two ways:

  1. Start a new PR to fix syncToSynapse that excludes .version support for syncFromSynapse, syn.get and the CLI synapse get. This PR would be for the larger initiative to support .version across the whole codebase. I'm pretty sure I can patch up syncToSynapse without needing to change the others. The only thing I don't like about this is that I think syncTo/FromSynapse should always have very similar functionalities, since a user who uses one will probably use the other. And for .version to be supported in one and not the other is confusing.
  2. Update the documentation letting people know that their input id may not be reflected in the output Entity object, and that the version is its own property. But this isn't ideal since I think it defeats the purpose of supporting .version notation in the input.

Aside from this, I think this PR is in a good state for a final review.

Let me know if there are other ideas @thomasyu888 @BryanFauble

@BryanFauble
Copy link
Contributor

The get API endpoint returns this object: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/FileEntity.html.

I'm thinking of handling this in two ways:

1. Start a new PR to fix `syncToSynapse` that excludes .version support for `syncFromSynapse`, `syn.get` and the CLI `synapse get`. This PR would be for the larger initiative to support .version across the whole codebase. I'm pretty sure I can patch up `syncToSynapse` without needing to change the others. The only thing I don't like about this is that I think `syncTo/FromSynapse` should always have very similar functionalities, since a user who uses one will probably use the other. And for .version to be supported in one and not the other is confusing.
2. Update the documentation letting people know that their input `id` may not be reflected in the output Entity object, and that the `version` is its own property. But this isn't ideal since I think it defeats the purpose of supporting .version notation in the input.

I agree that it'll be confusing, but IMO we should let folks use .version notation for inputs (As in support it), but don't return that format. I'm thinking that we should always return back 1 expected object format from our APIs. By this I mean we have an ID field for syn123 and we have a versionNumber field denoting what version the entity is. I want those using the API to have confidence that all the information we're returning will always be consistent.

I also put together the following code to show explicit examples of what you are talking about with syn.get and what you are currently supporting. I am in agreement with how you implemented the changes.

import synapseclient
from synapseclient import Entity


syn = synapseclient.Synapse()
syn.login()

MY_ENTITY = "syn53369103"

# Works
by_id = syn.get(MY_ENTITY, downloadFile=False)
print(f"by_id: \n{by_id}")

# Works
by_entity = syn.get(Entity(id=MY_ENTITY), downloadFile=False)
print(f"by_entity: \n{by_entity}")

# Works
by_id_and_version = syn.get(f"{MY_ENTITY}.10", downloadFile=False)
print(f"by_id_and_version: \n{by_id_and_version}")

# Works - uses version 11 instead of 10
by_id_and_version_input_argument = syn.get(
    f"{MY_ENTITY}.10", version=11, downloadFile=False
)
print(f"by_id_and_version_input_argument: \n{by_id_and_version_input_argument}")

# Works
by_entity_and_version_field = syn.get(
    Entity(id=f"{MY_ENTITY}", versionNumber=10), downloadFile=False
)
print(f"by_entity_and_version_field: \n{by_entity_and_version_field}")

# Does not work
by_entity_and_version_in_id = syn.get(Entity(id=f"{MY_ENTITY}.10"), downloadFile=False)
print(f"by_entity_and_version_in_id: \n{by_entity_and_version_in_id}")

For a final version, should we support creating an Entity object like: Entity(id=f"{MY_ENTITY}.10")? Specifically I am thinking that we automatically split .10 from the ID and put it into the versionNumber field. Handle conflicts in the same way you are in .get().

@jaymedina
Copy link
Collaborator Author

For a final version, should we support creating an Entity object like: Entity(id=f"{MY_ENTITY}.10")? Specifically I am thinking that we automatically split .10 from the ID and put it into the versionNumber field. Handle conflicts in the same way you are in .get().

I think that makes sense, and yes I can recycle the logic that splits up the synid from the version number. Are you suggesting folding this into the PR as well?

@jaymedina
Copy link
Collaborator Author

We've agreed on how I implemented the changes for syn.get, synapse get (CLI), syncTo/FromSynapse + tests pass, and I've completed the checklist, so I'll mark this as RFR and continue addressing comments as they come. Thank you Bryan and Tom for your comments so far.

@jaymedina jaymedina marked this pull request as ready for review January 25, 2024 18:42
@jaymedina jaymedina requested a review from a team as a code owner January 25, 2024 18:42
@BryanFauble
Copy link
Contributor

For a final version, should we support creating an Entity object like: Entity(id=f"{MY_ENTITY}.10")? Specifically I am thinking that we automatically split .10 from the ID and put it into the versionNumber field. Handle conflicts in the same way you are in .get().

I think that makes sense, and yes I can recycle the logic that splits up the synid from the version number. Are you suggesting folding this into the PR as well?

@jaymedina It's an option, but I don't think it's required for us to support at this exact moment.

@jaymedina
Copy link
Collaborator Author

For a final version, should we support creating an Entity object like: Entity(id=f"{MY_ENTITY}.10")? Specifically I am thinking that we automatically split .10 from the ID and put it into the versionNumber field. Handle conflicts in the same way you are in .get().

I think that makes sense, and yes I can recycle the logic that splits up the synid from the version number. Are you suggesting folding this into the PR as well?

@jaymedina It's an option, but I don't think it's required for us to support at this exact moment.

I've added this item to the Jira ticket for broader .version support across synpy: https://sagebionetworks.jira.com/browse/SYNPY-699

Copy link
Contributor

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jaymedina
Copy link
Collaborator Author

jaymedina commented Jan 25, 2024

I need to leave this open for now because the syncFromSynapse and syn.get calls for synid.version input returns an empty etag for some reason. Need to investigate why.

image

@BryanFauble
Copy link
Contributor

I need to leave this open for now because the syncFromSynapse and syn.get calls for synid.version input returns an empty etag for some reason. Need to investigate why.
image

@jaymedina I believe 00000s is the intended eTag when you are grabbing the non-latest version of an Entity.

@jaymedina
Copy link
Collaborator Author

jaymedina commented Jan 25, 2024

That's right! Whew. Thanks! I'll go ahead and merge this now

image

@jaymedina jaymedina force-pushed the SYNPY-1398-support-synid-version branch from 067a29d to f02ec28 Compare January 25, 2024 22:27
Copy link

sonarcloud bot commented Jan 25, 2024

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

6 New issues
0 Security Hotspots
96.9% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

@jaymedina jaymedina merged commit eb01121 into develop Jan 25, 2024
38 checks passed
Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaymedina Just wanted to say this is great work and great reviews by @BryanFauble . I left some nits just for future reference, but no action required. Thanks everyone!

"""
if utils.is_synapse_id_str(entity) or is_synapse_entity(entity):
return self.restGET("/entity/%s/benefactor" % id_of(entity))
synid, _ = utils.get_synid_and_version(entity)
return self.restGET("/entity/%s/benefactor" % synid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in the future, when we see % lets switch to f-string interpolation.

if "versionNumber" in obj:
version = obj["versionNumber"]

return id, version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Just a thought for the future, but tuples can be a bit risky because the order really matters. Look into namedtuples for something more explicit.

# https://sagebionetworks.jira.com/browse/SYNPY-1425


def test_get_synid_and_version():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not to over-engineer the solution, but in the future, these are technically different test cases. For example, right now it would be harder to determine if Test 2 or Test 3 failed, but if you split them into separate tests OR use pytest.parametrize (it it makes sense to) then it'd be easier to navigate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants