Skip to content

[GEN-2381] Pandas handling of nullable cells#1272

Merged
BryanFauble merged 12 commits intodevelopfrom
gen-2381-table-typing
Nov 20, 2025
Merged

[GEN-2381] Pandas handling of nullable cells#1272
BryanFauble merged 12 commits intodevelopfrom
gen-2381-table-typing

Conversation

@BryanFauble
Copy link
Copy Markdown
Member

@BryanFauble BryanFauble commented Nov 6, 2025

  • Verified that the system correctly handles null values during upserts and that the data is stored and retrieved accurately.

Problem:

  • When querying for, inserting, or updating cells of data that can be nullable things start to break as described in https://sagebionetworks.jira.com/browse/GEN-2381
  • Using convert_dtypes introduces int64 data not serialized by json error and attributes not matching error in integration tests such as StringDtype vs object.

Solution:

  • Use the suggestion from @danlu1 to use both the convert_dtypes and the dtype argument when reading in a CSV to pandas DF

Testing:

  • Unit tests and integration tests are extended and run though successfully.
  • More testing within Genie is needed

…ts and that the data is stored and retrieved accurately.\
@BryanFauble BryanFauble requested review from danlu1 and rxu17 November 6, 2025 18:44
Comment thread synapseclient/models/mixins/table_components.py
Comment thread synapseclient/models/mixins/table_components.py Outdated
Comment thread synapseclient/models/mixins/table_components.py
Comment thread synapseclient/models/mixins/table_components.py Outdated
Comment thread synapseclient/models/mixins/table_components.py Outdated
Comment thread synapseclient/models/mixins/table_components.py Outdated
Comment thread synapseclient/models/mixins/table_components.py Outdated
@danlu1 danlu1 marked this pull request as ready for review November 13, 2025 23:46
@danlu1 danlu1 requested a review from a team as a code owner November 13, 2025 23:46
Comment thread synapseclient/models/mixins/table_components.py Outdated
Comment thread synapseclient/models/mixins/table_components.py Outdated
pd.testing.assert_series_equal(
results["column_string"], pd.DataFrame(dict_data)["column_string"]
results["column_string"],
pd.DataFrame(dict_data)["column_string"].convert_dtypes(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For discussion: Should expected dataframes in these tests be created with the dtypes set so that we are testing the fact that convert_dtypes is being done?

The concern here is if there are issues with "convert_dtypes" function, it won't be caught because we are using the function itself.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I added convert_dtypes is to make data types match between columns. But I forgot that we can use check_dtype=False instead. I would prefer to use check_dtype because it would be a quicker fix.

Comment thread synapseclient/models/mixins/table_components.py Outdated
@BryanFauble BryanFauble requested a review from rxu17 November 18, 2025 16:35
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still expected that the result of calling query will return object type for something that is integer type and has nulls.

Then using convert_dtypes() would convert it to the correct pandas dtype (in this case Int64)What we are trying to prevent was that previously query would return something from a table that is integer with nulls -> float.

If that is the case, then i think this is good to go for the genie scenario.

Copy link
Copy Markdown
Contributor

@rxu17 rxu17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a comment

* Adding support for Python 3.14 and dropping support for python 3.9
@BryanFauble BryanFauble merged commit ee64f1d into develop Nov 20, 2025
16 of 20 checks passed
@thomasyu888 thomasyu888 deleted the gen-2381-table-typing branch April 16, 2026 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants