-
Notifications
You must be signed in to change notification settings - Fork 0
QDB-16881 - Exception when reading empty table with qdbpd.read_dataframe #101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QDB-16881 - Exception when reading empty table with qdbpd.read_dataframe #101
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses a ValueError when calling qdbpd.read_dataframe on an existing but empty table by forcing evaluation of the generator and returning an empty DataFrame.
- Modified
read_dataframeto materialize the generator and handle empty results. - Added a new test
test_read_dataframe_empty_table_sc16881to ensure empty tables yield an empty DataFrame.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_pandas.py | Added test_read_dataframe_empty_table_sc16881. |
| quasardb/pandas/init.py | Changed read_dataframe to list(...) the stream and return empty DataFrame if no batches. |
Comments suppressed due to low confidence (1)
tests/test_pandas.py:686
- Consider adding assertions to verify that the empty DataFrame has the expected column names (e.g.,
['d']or including the timestamp column) to strengthen this test.
assert df.empty
quasardb/pandas/__init__.py
Outdated
| # we need to evaluate the generator first, and then concatenate if result set is not empty. | ||
| dfs = list(stream_dataframe(conn, table, **kwargs)) | ||
|
|
||
| if len(dfs) == 0: |
Copilot
AI
Jun 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Use if not dfs: for a more idiomatic and concise empty-list check in Python.
| if len(dfs) == 0: | |
| if not dfs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(dfs) == 0: is a check taken from pandas.concat
| dfs = list(stream_dataframe(conn, table, **kwargs)) | ||
|
|
||
| if len(dfs) == 0: | ||
| return pd.DataFrame() |
Copilot
AI
Jun 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an empty DataFrame without any columns may break downstream code expecting the original table schema; consider initializing with the table's column names to preserve schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same behavior as qdbpd.query function
quasardb/pandas/__init__.py
Outdated
| # stream_dataframe is *lazy* - is a generator, not a list. | ||
| # if result of stream_dataframe is empty this could result in ValueError on pd.concat() | ||
| # we need to evaluate the generator first, and then concatenate if result set is not empty. |
Copilot
AI
Jun 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] These three comment lines could be consolidated into a single concise remark to reduce verbosity and improve readability.
| # stream_dataframe is *lazy* - is a generator, not a list. | |
| # if result of stream_dataframe is empty this could result in ValueError on pd.concat() | |
| # we need to evaluate the generator first, and then concatenate if result set is not empty. | |
| # stream_dataframe is lazy (a generator, not a list). Evaluate it first to avoid ValueError on pd.concat() if the result is empty. |
quasardb/pandas/__init__.py
Outdated
| # stream_dataframe is *lazy* - is a generator, not a list. | ||
| # if result of stream_dataframe is empty this could result in ValueError on pd.concat() | ||
| # we need to evaluate the generator first, and then concatenate if result set is not empty. | ||
| dfs = list(stream_dataframe(conn, table, **kwargs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pulls everything into memory at once -- will this cause additional memory consumption when used in conjunction with pd.concat ?
The beautiful thing about using a lazy-evaluated generator for pd.concat is that we don't need to pull all individual dataframes into memory before pd.concat merges them into a single dataframe.
Assuming that pd.concat does not copy dataframes, this does not matter. If it does copy the data, this will suddenly cause a doubling of the required memory.
Could you look into this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pondering further, we really only need to know whether there's at least 1 item -- we really don't need to evaluate all of them.
Is there perhaps a more elegant way to inspect lazily with functools or some other generator helper function/library whether there's at least one item?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by default pd.concat() will make a copy of the data. we can specify copy argument to False and override this behavior https://pandas.pydata.org/docs/reference/api/pandas.concat.html
as for the second comment i didn't find an easy way to peek a generator
- we can manually call next() once on the generator, check if first element returns or raises
StopIterationexception (or we can specify default value if generator reaches an end). if there is no exception we can useitertools.chainto combine first (already evaluated) element with rest of the generator and pass this new "chain" topd.concat()
https://stackoverflow.com/questions/661603/how-do-i-know-if-a-generator-is-empty-from-the-start - just wrap
pd.concat()withtry|except ValueErrorblock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this should be fixed:
- should not make a copy, should take the dataframes by reference instead (= faster);
- I think try/except this specific exception is the best approach, it's the simplest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree, lets not over complicate this
solatis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left comment, would be good to clarify upon that.
Story details: https://app.shortcut.com/quasardb/story/16881