Fix empty TAG column result in to_dataframe when querying table model.#730
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (no change in any metric):
| Metric | develop | #730 |
|---|---|---|
| Coverage | 62.02% | 62.02% |
| Files | 700 | 700 |
| Lines | 40142 | 40142 |
| Branches | 5650 | 5650 |
| Hits | 24897 | 24897 |
| Misses | 14551 | 14551 |
| Partials | 694 | 694 |
Pull request overview
This pull request fixes an issue where querying a table model with only TAG or ATTRIBUTE columns (no FIELD columns) would return empty results. The fix introduces a no_data_query flag to detect when only non-FIELD columns are requested, and in such cases, queries all columns from the underlying data source and then filters the result to include only the requested columns plus the time column.
Changes:
- Added logic to detect queries with no FIELD columns (`no_data_query` flag)
- When no FIELD columns are requested, the query retrieves all columns and filters the result
- Added support for case-insensitive column and table name handling
- Added tests to verify correct behavior when querying single FIELD columns or time columns
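The detection and filtering steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `is_no_data_query`, `keep_requested`, and the `categories` mapping are hypothetical names, not the code in `python/tsfile/utils.py`, and a plain dict stands in for a dataframe row.

```python
TIME_COLUMN = "time"  # assumed name of the default time column


def is_no_data_query(requested, categories):
    """Return True when none of the requested columns is a FIELD column.

    `categories` is a hypothetical mapping of column name -> category
    ("TAG", "ATTRIBUTE" or "FIELD"); names are compared case-insensitively.
    """
    if requested is None:
        return False  # no explicit selection means every column is read
    lowered = {name.lower(): cat for name, cat in categories.items()}
    return not any(lowered.get(name.lower()) == "FIELD" for name in requested)


def keep_requested(row, requested):
    """Keep only the time column plus the requested columns from a full row."""
    wanted = [TIME_COLUMN] + [
        name.lower() for name in requested if name.lower() != TIME_COLUMN
    ]
    return {name: row[name] for name in wanted}


categories = {"Device": "TAG", "region": "ATTRIBUTE", "temperature": "FIELD"}
full_row = {"time": 1, "device": "d1", "region": "eu", "temperature": 21.5}

tag_only = is_no_data_query(["DEVICE"], categories)
filtered = keep_requested(full_row, ["DEVICE"])
```

Here a request for only `DEVICE` is flagged as a no-data query, so the full row is read and then reduced to the time column and the requested TAG column.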
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/tsfile/utils.py | Added no_data_query flag and column filtering logic to handle queries with only TAG/ATTRIBUTE columns; added TIME_COLUMN import; converted table/column names to lowercase for case-insensitive matching |
| python/tests/test_load_tsfile_from_iotdb.py | Added test cases to verify correct behavior when querying single columns and time columns |
    _column_names.insert(0, TIME_COLUMN)
    dataframe = dataframe[_column_names]
This code modifies the caller's original list by calling insert on _column_names. Since _column_names is a reference to the column_names parameter (after lowercasing at line 128), this mutates the caller's list. This mutation will cause issues if the same list is reused across multiple calls or if the caller expects their list to remain unchanged. Additionally, this mutation happens inside the iterator loop, so it will occur for every batch of data, repeatedly inserting TIME_COLUMN at position 0, which will cause incorrect behavior. The insertion should be done on a copy of the list, not on the original reference.
Suggested change:

    - _column_names.insert(0, TIME_COLUMN)
    - dataframe = dataframe[_column_names]
    + columns_with_time = [TIME_COLUMN] + list(_column_names)
    + dataframe = dataframe[columns_with_time]
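To make the mutation problem concrete, here is a small pure-Python sketch. The two helper functions are hypothetical stand-ins for the batch loop, not the actual code in `utils.py`; they only record which column list would be used to index each batch.

```python
TIME_COLUMN = "time"  # assumed value of the constant, per the review comments


def select_mutating(batches, column_names):
    """Problematic pattern: inserts into the caller's list on every batch."""
    used = []
    for _ in batches:
        column_names.insert(0, TIME_COLUMN)
        used.append(list(column_names))  # snapshot of what would index the batch
    return used


def select_copying(batches, column_names):
    """Suggested pattern: build a fresh list and leave the argument untouched."""
    used = []
    for _ in batches:
        columns_with_time = [TIME_COLUMN] + list(column_names)
        used.append(columns_with_time)
    return used


requested = ["tag1"]
mutated = select_mutating(range(2), requested)

safe_requested = ["tag1"]
copied = select_copying(range(2), safe_requested)
```

After two batches, `requested` has grown to `["time", "time", "tag1"]`, while `safe_requested` is still `["tag1"]` and every batch is indexed with `["time", "tag1"]`.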
python/tsfile/utils.py
Outdated
    if no_data_query and _column_names is not None:
        _column_names.insert(0, TIME_COLUMN)
This line inserts TIME_COLUMN (the constant "time") instead of the actual time_column variable. If the table schema defines a time column with a different name (e.g., "id"), and that column name is not explicitly requested by the user, the dataframe will have a column with the schema's time column name, not "time". When trying to filter the dataframe by _column_names at line 186, pandas will raise a KeyError because "time" doesn't exist in the dataframe. This should use time_column instead of TIME_COLUMN. Additionally, this insertion should only happen if time_column is not already in _column_names.
Suggested change:

    - if no_data_query and _column_names is not None:
    -     _column_names.insert(0, TIME_COLUMN)
    + if no_data_query and _column_names is not None and time_column is not None:
    +     if time_column not in _column_names:
    +         _column_names.insert(0, time_column)
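One way to combine this suggestion with the earlier point about not mutating the caller's list is a small helper that returns a new list. This is a sketch only: `with_time_column` is a hypothetical name and does not exist in the PR.

```python
def with_time_column(column_names, time_column):
    """Return a new list with the schema's time column first, added at most once.

    Neither argument is modified. If either is None, the selection is
    returned unchanged.
    """
    if column_names is None or time_column is None:
        return column_names
    if time_column in column_names:
        return list(column_names)  # already requested; keep the caller's order
    return [time_column] + list(column_names)


requested = ["tag1"]
custom = with_time_column(requested, "id")         # schema names its time column "id"
already = with_time_column(["id", "tag1"], "id")   # time column requested explicitly
```

Using the schema's actual time column name means the later column selection only refers to columns that exist in the dataframe, which avoids the `KeyError` described in the comment.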
No description provided.