
Conversation

@igorniebylski requested review from Copilot and solatis on June 23, 2025 07:43
Copilot AI left a comment


Pull Request Overview

This PR addresses a ValueError when calling qdbpd.read_dataframe on an existing but empty table by forcing evaluation of the generator and returning an empty DataFrame.

  • Modified read_dataframe to materialize the generator and handle empty results.
  • Added a new test test_read_dataframe_empty_table_sc16881 to ensure empty tables yield an empty DataFrame.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File descriptions:
  • tests/test_pandas.py: Added test_read_dataframe_empty_table_sc16881.
  • quasardb/pandas/__init__.py: Changed read_dataframe to list(...) the stream and return an empty DataFrame if there are no batches.
Comments suppressed due to low confidence (1)

tests/test_pandas.py:686

  • Consider adding assertions to verify that the empty DataFrame has the expected column names (e.g., ['d'] or including the timestamp column) to strengthen this test.
    assert df.empty
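Copilot's suppressed suggestion could be sketched as follows with plain pandas (no quasardb connection needed); the frame below is a hypothetical stand-in for the result of qdbpd.read_dataframe on an empty table, and the column name "d" is taken from the suggestion above:

```python
import pandas as pd

# Stand-in for qdbpd.read_dataframe(conn, table) on an empty table;
# purely illustrative, not the actual test fixture.
df = pd.DataFrame(columns=["d"])

assert df.empty
assert list(df.columns) == ["d"]  # checks the schema, not just emptiness
```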

# we need to evaluate the generator first, and then concatenate if result set is not empty.
dfs = list(stream_dataframe(conn, table, **kwargs))

if len(dfs) == 0:

Copilot AI Jun 23, 2025


[nitpick] Use if not dfs: for a more idiomatic and concise empty-list check in Python.

Suggested change
if len(dfs) == 0:
if not dfs:

Contributor Author


if len(dfs) == 0: is the same check that pandas.concat itself uses.

dfs = list(stream_dataframe(conn, table, **kwargs))

if len(dfs) == 0:
return pd.DataFrame()

Copilot AI Jun 23, 2025


Returning an empty DataFrame without any columns may break downstream code expecting the original table schema; consider initializing with the table's column names to preserve schema.

Contributor Author


This is the same behavior as the qdbpd.query function.
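For comparison, Copilot's schema-preservation idea could look like the sketch below. The helper name is hypothetical, and `table.list_columns()` / `col.name` are assumptions about the quasardb table API; the stub table exists only to make the sketch runnable:

```python
import pandas as pd
from types import SimpleNamespace

def empty_frame_with_schema(table):
    # `table.list_columns()` and `col.name` are assumed API shapes,
    # used here only to illustrate preserving the schema on the empty path.
    names = [col.name for col in table.list_columns()]
    return pd.DataFrame(columns=names)

# Illustrative stand-in for a real quasardb table with one column "d".
stub = SimpleNamespace(list_columns=lambda: [SimpleNamespace(name="d")])

df = empty_frame_with_schema(stub)
assert df.empty and list(df.columns) == ["d"]
```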

Comment on lines 347 to 349
# stream_dataframe is *lazy* - is a generator, not a list.
# if result of stream_dataframe is empty this could result in ValueError on pd.concat()
# we need to evaluate the generator first, and then concatenate if result set is not empty.

Copilot AI Jun 23, 2025


[nitpick] These three comment lines could be consolidated into a single concise remark to reduce verbosity and improve readability.

Suggested change
# stream_dataframe is *lazy* - is a generator, not a list.
# if result of stream_dataframe is empty this could result in ValueError on pd.concat()
# we need to evaluate the generator first, and then concatenate if result set is not empty.
# stream_dataframe is lazy (a generator, not a list). Evaluate it first to avoid ValueError on pd.concat() if the result is empty.

dfs = list(stream_dataframe(conn, table, **kwargs))
Contributor


This pulls everything into memory at once -- will this cause additional memory consumption when used in conjunction with pd.concat?

The beautiful thing about using a lazy-evaluated generator for pd.concat is that we don't need to pull all individual dataframes into memory before pd.concat merges them into a single dataframe.

Assuming that pd.concat does not copy dataframes, this does not matter. If it does copy the data, this will suddenly cause a doubling of the required memory.

Could you look into this?

Contributor


Pondering further, we really only need to know whether there's at least one item -- we don't need to evaluate all of them.

Is there perhaps a more elegant way, with functools or some other generator helper function/library, to lazily check whether there's at least one item?

Contributor Author

@igorniebylski, Jun 24, 2025


By default, pd.concat() will make a copy of the data; we can set the copy argument to False to override this behavior: https://pandas.pydata.org/docs/reference/api/pandas.concat.html

As for the second comment, I didn't find an easy way to peek at a generator:

  • We can manually call next() once on the generator and check whether it returns a first element or raises StopIteration (or we can pass a default value for when the generator is exhausted). If there is no exception, we can use itertools.chain to combine the first (already evaluated) element with the rest of the generator and pass this new "chain" to pd.concat().
    https://stackoverflow.com/questions/661603/how-do-i-know-if-a-generator-is-empty-from-the-start
  • Just wrap pd.concat() in a try/except ValueError block.
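The first option above could be sketched like this; `concat_lazily` is a hypothetical helper name for illustration, not part of the actual change:

```python
import itertools
import pandas as pd

def concat_lazily(dfs):
    """Peek one element; if the generator is empty, return an empty frame.

    Otherwise stitch the peeked element back on with itertools.chain so
    pd.concat still consumes the rest of the stream lazily.
    """
    it = iter(dfs)
    try:
        first = next(it)
    except StopIteration:
        return pd.DataFrame()
    return pd.concat(itertools.chain([first], it))

# Tiny usage example: an empty stream and a three-batch stream.
assert concat_lazily(iter([])).empty
out = concat_lazily(pd.DataFrame({"d": [i]}) for i in range(3))
assert len(out) == 3
```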

Contributor


Ok, this should be fixed:

  • it should not make a copy, and should take the dataframes by reference instead (= faster);
  • I think try/except on this specific exception is the best approach -- it's the simplest.
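A minimal sketch of the agreed approach, under stated assumptions: the helper name is hypothetical, and copy=False is used as discussed (note that recent pandas versions deprecate this keyword in favor of copy-on-write):

```python
import pandas as pd

def read_all(stream):
    try:
        # Take the dataframes by reference; pd.concat consumes the lazy
        # stream directly, so nothing is materialized up front.
        return pd.concat(stream, copy=False)
    except ValueError:
        # pd.concat raises ValueError ("No objects to concatenate")
        # when the stream yields nothing -- the empty-table case.
        return pd.DataFrame()

assert read_all(iter([])).empty
assert len(read_all(pd.DataFrame({"d": [i]}) for i in range(2))) == 2
```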

Contributor Author


I agree, let's not over-complicate this.

Contributor

@solatis left a comment


Left a comment; it would be good to clarify that.

@igorniebylski requested a review from solatis on June 24, 2025 10:54
@igorniebylski merged commit 3676ac7 into master on June 24, 2025
2 checks passed
@igorniebylski deleted the sc-16881/exception-when-reading-empty-table-with-qdbpd-read-dataframe branch on June 24, 2025 14:03