[SYNPY-1357] Allow multiple values in manifest TSV #1030

Merged
merged 13 commits into develop from SYNPY-1357-multiple-values-in-manifest
Jan 12, 2024

Conversation

BryanFauble
Contributor

@BryanFauble BryanFauble commented Dec 20, 2023

Problem:

  1. When using the syncFromSynapse function we were only storing the first indexed value for an annotation that had multiple values.
  2. When using the syncToSynapse function we had no way to specify that multiple values are supposed to be added to an Annotation.

Solution:

  1. In syncFromSynapse, I now write multiple annotation values as a comma-delimited list wrapped in brackets. This matches the format that already works when syncing data to a FileView in Synapse.
  2. In syncToSynapse, I now split any field that is a comma-delimited list wrapped in brackets, so that multiple values can be applied to a single annotation key.
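
The bracketed format described above can be sketched as follows. This is only an illustration, not the code merged in this PR: `format_multi_value` and `parse_multi_value` are hypothetical names, and the sketch leans on Python's standard `csv` module for quote handling, which may differ from how synapseutils treats edge cases such as backslashes.

```python
import csv
import io
from typing import List


def format_multi_value(values: List[str]) -> str:
    """Join values into a bracket-wrapped, comma-delimited cell (hypothetical helper)."""
    if len(values) == 1:
        # Assumption: a single value is written bare, without brackets.
        return values[0]
    buffer = io.StringIO()
    # QUOTE_MINIMAL only quotes values that contain a comma or a double quote.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="")
    writer.writerow(values)
    return f"[{buffer.getvalue()}]"


def parse_multi_value(cell: str) -> List[str]:
    """Split a bracket-wrapped cell back into values (hypothetical helper)."""
    text = cell.strip()
    if not (text.startswith("[") and text.endswith("]")):
        # Not a bracketed list: treat the whole cell as one value.
        return [text]
    inner = text[1:-1]
    if not inner:
        return []
    # csv.reader keeps commas that sit inside double quotes.
    return next(csv.reader(io.StringIO(inner)))
```

Applied to the string column of the sample manifest row in this PR, `parse_multi_value` would keep `with,commas,in,quotes` together as one value, because the commas are inside double quotes.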

Testing:

  1. I updated the unit tests and integration tests to cover this functionality.
  2. I manually tested both functions and verified that annotations were updated as expected, and that a run with no changes left the annotations unmodified:

image

This manifest TSV generated the screenshot below:

path	parent	name	id	synapseStore	contentType	used	executed	activityName	activityDescription	my_string_with_a_backslash	multi_value_annotation_string	multi_value_annotation_date	multi_value_booleans	multi_value_longs	multi_value_ints	single_value_int
/home/bfauble/my_synapse_downloads/ParentFolder/another_file.txt	syn52570245	file_1.txt	syn52570249	True	text/plain					my\ first string	[hello,my,annotation,"with quotes","with,commas,in,quotes"]	[2000-01-01T00:00:00.000Z,2001-01-01T01:00:00.000Z,2002-01-01T02:00:00.000Z]	[false,FalSe,tRuE,TRUE]	[1.2,3.4,5.6]	[1,2,3,4]	[1]

image
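
The sample row mixes several value types: case-insensitive booleans, integers, decimals, and ISO 8601 timestamps. How the client converts each value is not shown in this conversation. The sketch below is one plausible ordering (boolean, then integer, then float, then timestamp, otherwise string), and `coerce_value` is a hypothetical name rather than a synapseclient function.

```python
from datetime import datetime


def coerce_value(raw: str):
    """Guess a Python type for one manifest value (hypothetical helper)."""
    text = raw.strip()
    lowered = text.lower()
    # Booleans are matched case-insensitively, e.g. "FalSe" or "tRuE".
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        pass
    try:
        # Older Python versions do not accept a trailing "Z", so map it to UTC.
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        # Anything else stays a plain string.
        return text
```

Under these assumptions, `[false,FalSe,tRuE,TRUE]` would become four booleans after splitting, and a value such as `hello` would fall through unchanged as a string.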

@BryanFauble BryanFauble requested a review from a team as a code owner December 20, 2023 18:06
@pep8speaks

pep8speaks commented Dec 20, 2023

Hello @BryanFauble! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 568:89: E501 line too long (99 > 88 characters)
Line 573:89: E501 line too long (90 > 88 characters)

Line 109:89: E501 line too long (89 > 88 characters)
Line 890:89: E501 line too long (108 > 88 characters)
Line 895:89: E501 line too long (91 > 88 characters)
Line 904:89: E501 line too long (90 > 88 characters)
Line 905:89: E501 line too long (89 > 88 characters)
Line 1167:89: E501 line too long (89 > 88 characters)
Line 1168:89: E501 line too long (90 > 88 characters)
Line 1177:89: E501 line too long (90 > 88 characters)
Line 1266:89: E501 line too long (89 > 88 characters)

Line 38:89: E501 line too long (233 > 88 characters)
Line 40:89: E501 line too long (374 > 88 characters)
Line 195:89: E501 line too long (104 > 88 characters)
Line 222:89: E501 line too long (89 > 88 characters)

Comment last updated at 2024-01-11 21:05:33 UTC

Member

@thomasyu888 thomasyu888 left a comment

Awesome start, left some comments.

synapseutils/sync.py (outdated)
synapseutils/sync.py (outdated)
synapseutils/sync.py
docs/explanations/manifest_tsv.md
synapseclient/core/utils.py
synapseutils/sync.py (outdated)
synapseclient/core/utils.py (outdated)
self.row1 = (
'%s %s %s "%s;https://www.example.com" provName bar 2020-01-01 2023-12-04T07:00:00Z 2023-12-05 23:37:02.995000+00:00 2023-12-05 07:00:00+00:00\n'
'%s %s %s "%s;https://www.example.com" provName bar 2020-01-01 2023-12-04T07:00:00Z 2023-12-05 23:37:02+00:00 2023-12-05 07:00:00+00:00 a,b,c,d 2020-01-01,2023-12-04T07:00:00.111Z,2023-12-05 23:37:02.333+00:00,2023-12-05 07:00:00+00:00 fAlSe,False,tRuE,True 1,2,3,4 1.2,3.4,5.6,7.8\n'
Member

Do we have a test case for a string with escaped commas? What does it look like right now when there's a table value such as 'my sentence, comma, has commas'?

Contributor Author

I added this.

This is what this looks like in google sheets for this situation:
image

synapseutils/sync.py (outdated)
"""
values_to_return = []

cell_values = re.split(pattern=COMMA_PATTERN, string=cell)
Member

@thomasyu888 thomasyu888 Dec 21, 2023

Is it confusing that setting multiple annotations via a fileview is different from setting multiple annotations via the entity itself?

Future work: do we need to standardize it this way, or is that overkill?

my_file_ent.multiple_annot = "my,multiple,annot"
my_file_ent.annot_with_comma = "my sentence\\,is broken"
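
If the backslash-escaping convention suggested by the two assignments above were adopted, splitting could look like the sketch below. This is hypothetical: neither `split_escaped` nor this convention is part of the client as far as this thread shows, and the sketch does not handle an escaped backslash that sits immediately before a comma.

```python
import re
from typing import List

# A comma that is NOT preceded by a backslash separates values.
UNESCAPED_COMMA = re.compile(r"(?<!\\),")


def split_escaped(value: str) -> List[str]:
    """Split on unescaped commas, then unescape the rest (hypothetical helper)."""
    parts = UNESCAPED_COMMA.split(value)
    return [part.replace("\\,", ",") for part in parts]
```

With this sketch, `"my,multiple,annot"` would yield three values, while `"my sentence\\,is broken"` would stay a single value containing a literal comma.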

Contributor Author

The biggest difference is around escaping the comma in this case.

Edit annotation in file view:
image

Edit annotation on the File:
image

Annotation in the Manifest TSV, which, yes, is different from the two cases above:
image

Annotation in Python:
image

@BryanFauble
Contributor Author

Adding comments here around internal discussions:

This implementation is going to require a major version update to the client, because users who currently have commas or backslashes in annotations managed through the Manifest TSV file will need to escape those characters.

We will meet internally again in January to settle on these changes and propose a path forward, possibly tying this to the release that supports only the personal access token for authentication and lays other groundwork for future releases.

Member

@thomasyu888 thomasyu888 left a comment

🔥 nice! Going to approve here - we can create a ticket to revisit in January but this is a working solution.

From Milen's comment in Slack, it does seem like he wants to make it easier for users.

I think this means that, as you suggested, we should create a platform ticket asking how often commas appear in annotations right now, and then decide what the least breaking change is for all of our users (not just schematic).

sonarcloud bot commented Jan 11, 2024

Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
99.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Member

@thomasyu888 thomasyu888 left a comment

🛸 LGTM! I gave most of my feedback during the first round of PR.

@BryanFauble BryanFauble merged commit fc96fd9 into develop Jan 12, 2024
38 checks passed
@BryanFauble BryanFauble deleted the SYNPY-1357-multiple-values-in-manifest branch January 12, 2024 16:53