[SYNPY-1794] Modified EntityView creation of strings and list cols#1340

Merged
andrewelamb merged 8 commits into develop from SYNPY-1794
Mar 20, 2026

Conversation

@andrewelamb
Contributor

@andrewelamb andrewelamb commented Mar 19, 2026

Problem:

When creating an EntityView for file-based validation, a maximum size was being set on string and string-list columns, and users were running into these limits. In addition, column type detection did not handle schemas where the type is an array of possible types:

"type": ["string", "null"]

Solution:

When creating the columns for the EntityView:

  • If the type is a string, or can't be determined, the Synapse type is set to MEDIUMTEXT.
  • If the type is an array of types (e.g. ["string", "null"]) and exactly one of them is non-null, that type is used; otherwise the Synapse type is set to MEDIUMTEXT.
  • The maximum_size parameter is no longer used.
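
The rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `resolve_json_type` and `synapse_column_type` and the `JSON_TO_SYNAPSE` mapping are hypothetical, and the mapping for non-string types is an assumption. List columns are left out because the description does not spell out how they are mapped.

```python
from typing import Any, Optional

# Hypothetical mapping from JSON Schema primitive types to Synapse column
# type names. The actual mapping used by synapseclient may differ.
JSON_TO_SYNAPSE = {
    "string": "MEDIUMTEXT",
    "integer": "INTEGER",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}
DEFAULT_TYPE = "MEDIUMTEXT"


def resolve_json_type(type_value: Any) -> Optional[str]:
    """Return the single usable JSON Schema type, or None if it is ambiguous."""
    if isinstance(type_value, str):
        return type_value
    if isinstance(type_value, list):
        # Ignore "null" so that ["string", "null"] resolves to "string".
        non_null = [t for t in type_value if t != "null"]
        if len(non_null) == 1 and isinstance(non_null[0], str):
            return non_null[0]
    # Missing type, several non-null types, or an unexpected shape.
    return None


def synapse_column_type(prop: dict) -> str:
    """Pick a Synapse column type for one schema property (no max size is set)."""
    json_type = resolve_json_type(prop.get("type"))
    # Unknown or undetermined types fall back to MEDIUMTEXT.
    return JSON_TO_SYNAPSE.get(json_type, DEFAULT_TYPE)
```

Under these assumptions, `{"type": ["integer", "null"]}` resolves to `INTEGER`, while `{"type": ["integer", "string"]}` and a property with no `type` both fall back to `MEDIUMTEXT`.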

Testing:

Extended the unit tests to cover the new and changed functions.

@andrewelamb andrewelamb requested a review from a team as a code owner March 19, 2026 17:31
@andrewelamb andrewelamb marked this pull request as draft March 19, 2026 17:31
@andrewelamb andrewelamb changed the title from "modified entotyview creation of strings and list cols" to "[SYNPY-1794] Modified EntityView creation of strings and list cols" Mar 19, 2026
@andrewelamb andrewelamb marked this pull request as ready for review March 19, 2026 19:46
@thomasyu888 thomasyu888 requested review from aditigopalan and linglp and removed request for aditigopalan March 19, 2026 20:49
Member

@thomasyu888 thomasyu888 left a comment

Do some of the jsonschema examples need to be updated in the test directory?

@andrewelamb
Contributor Author

> Do some of the jsonschema examples need to be updated in the test directory?

I don't think so, we're still expecting the same inputs to this.

Member

@thomasyu888 thomasyu888 left a comment

🔥 LGTM! From a product perspective, I ran this code:

from synapseclient import Synapse
from synapseclient.extensions.curator import create_file_based_metadata_task  # import path assumed

syn = Synapse()
syn.login()  # assumes Synapse credentials are already configured

entity_view_id, task_id = create_file_based_metadata_task(
    synapse_client=syn,
    folder_id="syn74212632",  # Folder containing your data files
    curation_task_name="test different defaults",  # Must be unique within the project
    instructions="Annotate each file with metadata according to the schema requirements.",
    attach_wiki=False,  # Creates a wiki in the folder with the entity view (defaults to False)
    entity_view_name="Animal Study Files View",
    schema_uri="HTAN2Organization-BulkWESLevel1-1.3.0",  # Schema found in Step 2
)
print(f"Created EntityView: {entity_view_id}")
print(f"Created CurationTask: {task_id}")

And here's the schema:

[Image: screenshot of the schema]

I'll defer to @linglp for final review

@thomasyu888
Member

thomasyu888 commented Mar 20, 2026

@andrewelamb should the tests be re-run? You may want to pull from upstream, as the tests have been fixed in develop (I think).

Contributor

@linglp linglp left a comment

LGTM! Just a minor comment.

@andrewelamb andrewelamb merged commit fa8f6fa into develop Mar 20, 2026
21 checks passed
@andrewelamb andrewelamb deleted the SYNPY-1794 branch March 20, 2026 19:09

3 participants