@linglp linglp commented Aug 14, 2025

Problem:

Deprecate the following:

tableQuery
_queryTable
_queryTableNext
_uploadCsv
_check_table_transaction_response
_queryTableCsv
downloadTableColumns
_build_table_download_file_handle_list
_get_default_view_columns
_get_annotation_view_columns

Solution:

Map the following to new methods:

tableQuery -> query or query_async
_queryTable -> query_part_mask
_queryTableNext -> query_part_mask
_uploadCsv -> _chunk_and_upload_csv
_check_table_transaction_response -> internal function, no replacement
_uploadCSV -> store_rows_async
_queryTableCsv -> query_async (with downloadLocation parameter)
downloadTableColumns -> not seeing a direct replacement?
_build_table_download_file_handle_list -> internal function, no replacement
_get_default_view_columns -> internal function, no replacement
_get_annotation_view_columns -> internal function, no replacement
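
For callers migrating off the old names, the mapping above can be pictured as a thin deprecation shim. The following is a minimal sketch using only the standard-library `warnings` module; `LegacyClient`, `deprecated_in_favor_of`, and the stand-in `query` method are hypothetical names for illustration and are not the synapseclient implementation.

```python
import functools
import warnings


def deprecated_in_favor_of(replacement: str):
    """Hypothetical decorator: warn on use and name the replacement method."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class LegacyClient:
    """Stand-in class for illustration only."""

    def query(self, sql: str) -> str:
        # Placeholder for a new-style entry point.
        return f"ran: {sql}"

    @deprecated_in_favor_of("query or query_async")
    def tableQuery(self, sql: str) -> str:
        # The old name keeps working but forwards to the new method.
        return self.query(sql)
```

With a shim like this, existing callers keep working while the warning tells them which replacement to adopt.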

New data classes:

SumFileSizes
QueryResultOutput (created by using the old QueryBundleRequest)
Row
RowSet
SelectColumn
ActionRequiredCount
Query
QueryResult
QueryResultBundle
QueryNextPageToken
QueryJob
QueryBundleRequest

Testing:

  1. Make sure all the new data classes have unit tests
  2. Added TestQueryTableRowSet to test _query_table_row_set
  3. Added TestQueryTableNextPage to test _query_table_next_page
  4. Added TestQueryTableCsv to test _query_table_csv

list_columns = []
dtype = {}

for select_column in self.headers:
Contributor Author

@linglp linglp Aug 15, 2025

How to deal with self.headers?

Member

The headers for this logic come from the response of this API:
https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/DownloadFromTableResult.html

We shouldn't need to store this data on the Table-like class, so once you get the DownloadFromTableResult, you should be able to pass the result wherever it's needed, such as this asDataFrame method. It shouldn't need to be exposed to the user querying for data on the Synapse Tables.

Contributor Author

I reviewed the current code and it looks like headers go through two transformations:

  1. In this section, the headers from DownloadFromTableResult (initially dictionaries) are converted into SelectColumn objects.

  2. Later, in this section and this one, the column headers are transformed again.

If we’re not storing this information in CsvResult (or CsvFileTable), do we still need to convert headers into SelectColumn objects? Or are we moving away from SelectColumn entirely and planning to just treat headers as dictionaries?

To match our current behavior, I have:

        class CsvResult:
            def __init__(self, file_path, include_row_id_and_row_version=True):
                self.file_path = file_path
                self.include_row_id_and_row_version = include_row_id_and_row_version

                if result and result.get("headers"):
                    headers = result.get("headers")
                    headers = [SelectColumn(**header) for header in headers]
                    self.headers = self.set_column_headers(headers)
                else:
                    self.headers = None

Based on your suggestion, it sounds like we don’t need self.headers at all. Instead, in asDataFrame we could just do something like:

if result.get("headers") is not None:
    headers = result["headers"]
    for column in headers:
        xxx

Member

The only things we need to return to the user depend on which method they call:
query_async -> The DataFrame, OR the path to the downloaded CSV
query_part_mask_async -> The results wrapped in the QueryResultBundle object

We do not have to maintain how the previous code worked at all for any of the intermediate objects or class types.

With that being said, in our case:

If we’re not storing this information in CsvResult (or CsvFileTable), do we still need to convert headers into SelectColumn objects? Or are we moving away from SelectColumn entirely and planning to just treat headers as dictionaries?

We do not need to maintain a CsvResult, SelectColumn, or any of the concepts the original table had implemented. In fact, the new Tables class also got rid of the SchemaBase and inheritance structure that was previously in place.

Based on your suggestion, it sounds like we don’t need self.headers at all. Instead, in asDataFrame we could just do something like:

if result.get("headers") is not None:
    headers = result["headers"]
    for column in headers:
        xxx

That is exactly right, we could do something like that.

In the end - If we are not exposing the interface to an end user, we have quite a bit more flexibility with how we need to maintain the "guts", or actual implementation of the function/method. However - Consider this, if it makes our lives easier for development, then there is no harm in creating dataclasses/classes for ourselves.
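
As an illustration of the approach agreed on above, the headers can stay as plain dictaries' worth of data passed into the DataFrame-building step instead of being stored on an object. This is a rough sketch, not the merged implementation: `split_headers` is a hypothetical helper, and the `name` and `columnType` keys and the `STRING` and `*_LIST` type names are assumptions based on the Synapse SelectColumn model.

```python
from typing import Any, Dict, List, Tuple


def split_headers(result: Dict[str, Any]) -> Tuple[List[str], Dict[str, type]]:
    """Hypothetical helper: derive list columns and dtypes from raw header dicts."""
    list_columns: List[str] = []
    dtype: Dict[str, type] = {}

    # Treat a missing or null "headers" entry as "no headers".
    for column in result.get("headers") or []:
        column_type = column.get("columnType") or ""
        if column_type == "STRING":
            # Read string columns as str so values like "001" are not coerced.
            dtype[column["name"]] = str
        elif column_type.endswith("_LIST"):
            # Assumed convention: list-valued types end in "_LIST".
            list_columns.append(column["name"])

    return list_columns, dtype
```

Because the helper takes the result as an argument, nothing about the headers needs to be exposed on a user-facing class.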

"quoteCharacter": self.quote_character,
"escapeCharacter": self.escape_character,
"lineEnd": self.line_end,
"isFirstLineHeader": self.is_file_line_header,
Contributor Author

This is a typo in the original CsvTableDescriptor class. It should be is_first_line_header.
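
To make the naming fix concrete, here is a small sketch of a descriptor whose attribute is spelled `is_first_line_header` and is mapped to the `isFirstLineHeader` request key shown in the diff. `CsvDescriptorSketch` is a hypothetical stand-in, and its defaults and the choice to drop unset values are assumptions, not the library's behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CsvDescriptorSketch:
    """Hypothetical stand-in; not the library's CsvTableDescriptor."""

    quote_character: Optional[str] = None
    escape_character: Optional[str] = None
    line_end: Optional[str] = None
    is_first_line_header: bool = True  # corrected spelling (was is_file_line_header)

    def to_synapse_request(self) -> Dict[str, Any]:
        request = {
            "quoteCharacter": self.quote_character,
            "escapeCharacter": self.escape_character,
            "lineEnd": self.line_end,
            "isFirstLineHeader": self.is_first_line_header,
        }
        # Assumption: omit unset values so server-side defaults apply.
        return {key: value for key, value in request.items() if value is not None}
```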

headers: Optional[List[SelectColumn]] = None
"""The list of SelectColumns that describes the rows of this set."""

rows: Optional[List[Row]] = field(default_factory=list)
Contributor Author

Initialize this as a list per conversation
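
For context on why `field(default_factory=list)` is used here: a dataclass rejects a bare mutable default such as `rows: list = []` with a `ValueError`, and `default_factory` gives every instance its own fresh list. A minimal sketch, where `RowSetSketch` is a hypothetical stand-in and not the library's RowSet:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RowSetSketch:
    """Hypothetical stand-in for illustration only."""

    # default_factory builds a new list per instance, so instances never share state.
    rows: Optional[List[dict]] = field(default_factory=list)


first = RowSetSketch()
second = RowSetSketch()
first.rows.append({"values": ["a"]})  # only `first` is modified
```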

@linglp linglp marked this pull request as ready for review September 2, 2025 22:04
@linglp linglp requested a review from a team as a code owner September 2, 2025 22:04
query_job_request = QueryJob(
entity_id=entity_id,
sql=query,
write_header=header,
Contributor Author

The definition of write_header based on the documentation of DownloadFromTableRequest is: Should the first line contain the columns names as a header in the resulting file? Set to 'true' to include the headers else, 'false'. The default value is 'true'.

There's also an isFirstLineHeader parameter under CsvTableDescriptor. I think both parameters mean the same thing. As you can see here, I set both: write_header=header and is_first_line_header=header
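
The idea of driving both settings from one flag can be sketched as follows. This is only an illustration: `build_csv_download_request` is a hypothetical helper, and the `entityId`, `sql`, `writeHeader`, and `csvTableDescriptor` keys are assumed from the DownloadFromTableRequest model rather than taken from this PR's code.

```python
from typing import Any, Dict


def build_csv_download_request(
    entity_id: str, sql: str, header: bool = True
) -> Dict[str, Any]:
    """Hypothetical helper: a single `header` flag sets both header options."""
    return {
        "entityId": entity_id,
        "sql": sql,
        # Whether the server writes column names as the first line of the file.
        "writeHeader": header,
        # Whether the CSV's first line should be interpreted as a header.
        "csvTableDescriptor": {"isFirstLineHeader": header},
    }
```

Keeping the two values tied to one argument avoids a file that is written with a header but described as having none, or the reverse.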

Member

@thomasyu888 thomasyu888 left a comment

🔥 LGTM! I'm going to defer to @BryanFauble for final review, but thanks for doing the giant deprecation.

Will there be a tutorial page we are going to update to use the new functions?

@thomasyu888 thomasyu888 requested a review from Copilot September 3, 2025 15:38
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR deprecates multiple methods from the Synapse class and table.py module while introducing new table component data classes as part of modernizing the table querying API. The changes map deprecated methods to new implementations and provide comprehensive test coverage for the new functionality.

  • Deprecated 11 methods from synapseclient.client.Synapse class related to table operations
  • Added 11 new data classes for structured table operations: SumFileSizes, QueryResultOutput, Row, RowSet, SelectColumn, ActionRequiredCount, Query, QueryResult, QueryResultBundle, QueryNextPageToken, QueryJob, QueryBundleRequest
  • Migrated table query functionality to use new async-based implementations

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/unit/synapseclient/mixins/unit_test_table_components.py | Added comprehensive unit tests for all new data classes and query functions |
| tests/integration/synapseclient/models/synchronous/test_table.py | Updated test expectations to reflect changes in batch processing behavior |
| tests/integration/synapseclient/models/async/test_table_async.py | Updated test expectations and refined spy behavior for async table operations |
| tests/integration/synapseclient/models/async/test_entityview_async.py | Enhanced test to capture call stack information for better verification |
| synapseclient/table.py | Added deprecation warnings to row_labels functions |
| synapseclient/models/table_components.py | Added 11 new data classes with complete REST API mappings and type conversions |
| synapseclient/models/mixins/table_components.py | Implemented new query functions and converted existing methods to use new data structures |
| synapseclient/models/mixins/asynchronous_job.py | Added endpoint mappings for new query request types |
| synapseclient/models/__init__.py | Exported new data classes for public API |
| synapseclient/core/constants/concrete_types.py | Added concrete type constants for new REST API models |
| synapseclient/client.py | Added deprecation decorators and migration examples to 11 deprecated methods |
| docs/reference/experimental/sync/table.md | Added documentation references for new data classes |
| docs/reference/experimental/async/table.md | Added documentation references for new data classes |
| .pre-commit-config.yaml | Updated bandit version from 1.7.5 to 1.8.0 |

This result is modeled from: <https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/table/QueryResultBundle.html>
"""

concrete_type: str = QUERY_TABLE_CSV_REQUEST
Copilot AI Sep 3, 2025

The default concrete_type for QueryResultBundle should be the bundle type, not the CSV request type. This should be QUERY_BUNDLE_REQUEST or a dedicated query result bundle constant, not QUERY_TABLE_CSV_REQUEST.

Suggested change
concrete_type: str = QUERY_TABLE_CSV_REQUEST
concrete_type: str = QUERY_BUNDLE_REQUEST

@BryanFauble
Member

🔥 LGTM! I'm going to defer to @BryanFauble for final review, but thanks for doing the giant deprecation.

Will there be a tutorial page we are going to update to use the new functions?

Yes, https://sagebionetworks.jira.com/browse/SYNPY-1377 should capture the work to write up the tutorial page.

@linglp Do you mind reviewing this Jira and adding to the topic anything we should cover in addition to what's there (if anything)?

Member

@BryanFauble BryanFauble left a comment

I appreciate all your hard work that you've put into this!

@linglp
Contributor Author

linglp commented Sep 5, 2025

I appreciate all your hard work that you've put into this!

I appreciate your review. Thank you again, Bryan!

@linglp linglp merged commit 0b11782 into develop Sep 5, 2025
28 checks passed
@linglp linglp deleted the synpy-1632 branch September 5, 2025 14:11
3 participants