Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duckdb_from_timestamp throws for infinity and -infinity #10138

Closed
1 task done
Giorgi opened this issue Jan 4, 2024 · 16 comments · Fixed by #10157
Closed
1 task done

duckdb_from_timestamp throws for infinity and -infinity #10138

Giorgi opened this issue Jan 4, 2024 · 16 comments · Fixed by #10157

Comments

@Giorgi
Copy link
Contributor

Giorgi commented Jan 4, 2024

What happens?

When trying to read infinity and -infinity timestamps with C Api, duckdb_from_timestamp throws an exception

To Reproduce

    duckdb_database db;
    duckdb_connection con;
    duckdb_state state;
    duckdb_result result;

    if (duckdb_open(NULL, &db) == DuckDBError) {
        // handle error
    }
    if (duckdb_connect(db, &con) == DuckDBError) {
        // handle error
    }

    state = duckdb_query(con, "SELECT timestamp_array FROM test_all_types()", &result);
    duckdb_data_chunk chunk = duckdb_result_get_chunk(result, 0);
    idx_t row_count = duckdb_data_chunk_get_size(chunk);
    duckdb_vector vector = duckdb_data_chunk_get_vector(chunk, 0);
    duckdb_vector childVector= duckdb_list_vector_get_child(vector);

    duckdb_timestamp* timeData = (duckdb_timestamp *)duckdb_vector_get_data(childVector);

    //Works fine for the first element which is 1970, 1, 1 but throws for second and third
    duckdb_timestamp_struct timestampStruct = duckdb_from_timestamp(timeData[0]);
    timestampStruct = duckdb_from_timestamp(timeData[1]);

// cleanup
    duckdb_destroy_data_chunk(&chunk);
    duckdb_destroy_result(&result);
    duckdb_disconnect(&con);
    duckdb_close(&db);

image
image

OS:

Windows 11

DuckDB Version:

0.9.2

DuckDB Client:

C

Full Name:

Giorgi Dalakishvili

Affiliation:

Space International

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@Giorgi Giorgi changed the title duckdb_from_timestamp thows for infinity and -infinity duckdb_from_timestamp throws for infinity and -infinity Jan 4, 2024
@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

I can also successfully read the last element:

image

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

I think the problem here is that while duckdb_timestamp can represent the infinite values (they are just the extreme µs counts) the duckdb_timestamp_struct tries to split it into duckdb_date_struct and duckdb_time_struct, which is not possible. You should probably check for infinite values before attempting to break it apart like that.

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

I think the real issue is that we need duckdb_is_finite_timestamp(duckdb_timestamp) and duckdb_is_finite_date(duckdb_date) APIs.

@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

In that case what are the minimum and maximum values that are supported by duckdb_timestamp_struct? I don't really need to read these infinity values, I was adding test_all_types based tests to the C# library and encountered this behaviour.

@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

I think the real issue is that we need duckdb_is_finite_timestamp(duckdb_timestamp) and duckdb_is_finite_date(duckdb_date) APIs.

I don't have same issue with duckdb_date and can read infinity dates with the C API. I have issue only with timestamps.

Also, how does Julia read infinite timestamps?

df.timestamp_array,

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

Not familiar with Julia, but the values are in the timestamp.hpp header

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

I don't have same issue with duckdb_date and can read infinity dates with the C API. I have issue only with timestamps.

That's probably because you are not trying to split it into two parts?

@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

I call duckdb_from_date and it gives me correct value for infinite dates. I don't split anything myself in either case.

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

Ah well that's a bug! Date::Convert should be checking and it isn't.

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

I call duckdb_from_date and it gives me correct value for infinite dates. I don't split anything myself in either case.

duckdb_from_date splits the day count into year/month/day. The values you get back are not really meaningful.

@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

I'm not sure what you mean by meaningful but the dates that I get back from duckdb_from_date for infinity dates are the same that Julia tests use so if it's not meaningful for me, it's not meaningful for Julia tests either.

C# test: https://github.com/Giorgi/DuckDB.NET/blob/40d5d7aa1db9d6eb2ffc0326c1ad16cf4aa6aa15/DuckDB.NET.Test/DuckDBDataReaderTestAllTypes.cs#L296

VerifyDataList<DuckDBDateOnly>("date_array", 34, new List<List<DuckDBDateOnly?>> { new(), new()
{
    new DuckDBDateOnly(1970, 1, 1),
    new DuckDBDateOnly(5881580, 7, 11),
    new DuckDBDateOnly(-5877641, 6, 24),
    null,
    new DuckDBDateOnly(2022,5,12),
} });

Julia test:

@test isequal(
        df.date_array,
        [
            [],
            [
                Dates.Date(1970, 1, 1),
                Dates.Date(5881580, 7, 11),
                Dates.Date(-5877641, 6, 24),
                missing,
                Dates.Date(2022, 5, 12)
            ],
            missing
        ]
    )

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

I mean that infinite dates can't be broken apart into year/month/day. The fact that the Julia tests work just means those tests are broken.

@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

I think all_types_test should have a parameter to indicate to return meaningful dates. The values returned currently cannot be represented in most programming languages.

@hawkfish
Copy link
Contributor

hawkfish commented Jan 4, 2024

They can be represented in Postgres SQL. Usually we translate them into language-specific types (e.g., Python min and max dates). But when that is not possible, we usually adjust the test query (e.g., Java does not support TIME values of 24:00:00).

As the Java example shows, we have no idea what the calling language supports, so a simple flag is not really possible. Another example is Python, where we support a wider range of valid dates than the datetime library. Or the HUGEINT type, which has to get translated to floating point in some languages, And so on.

This means that the onus is on the application to translate these types into something meaningful, just like they have to handle NULL values (e.g., C doesn't support NULL for integers). The basic C bindings produce all values - your problem came when you tried to use a helper function to break up an infinite date into parts. We should provide APIs that let you handle this situation (which could come up in any context), hence my original comment about needing special helper functions.

@Giorgi
Copy link
Contributor Author

Giorgi commented Jan 4, 2024

Python doesn't support these dates either:

# We replace these values since the extreme ranges are not supported in native-python.

If the purpose of test_all_types is to test various bindings, why not add a flag to return only dates that can be represented in these languages? But that's a conversation for a new discussion.

@Mytherin
Copy link
Collaborator

Mytherin commented Jan 5, 2024

I think the real issue is that we need duckdb_is_finite_timestamp(duckdb_timestamp) and duckdb_is_finite_date(duckdb_date) APIs.

I don't have same issue with duckdb_date and can read infinity dates with the C API. I have issue only with timestamps.

Also, how does Julia read infinite timestamps?

df.timestamp_array,

duckdb_from_timestamp and duckdb_to_timestamp are helper functions that are not used by Julia. The duckdb_timestamp value is defined as microseconds since epoch (1970-01-01), we convert this value directly to the internal Julia representation for timestamps (which is similar).

I agree with @hawkfish that we should add duckdb_is_finite_timestamp/duckdb_is_finite_date to the C API, that seems useful.

hawkfish added a commit to hawkfish/duckdb that referenced this issue Jan 5, 2024
Add two C APIs for detecting whether a date or a timestamp is finite.

fixes: duckdb#10138
fixes: duckdblabs/duckdb-internal#1005
hawkfish added a commit to hawkfish/duckdb that referenced this issue Jan 8, 2024
PR feedback: Add tests and fix API argument.
hawkfish added a commit to hawkfish/duckdb that referenced this issue Jan 9, 2024
Add tests for non-infinite values.
Mytherin added a commit that referenced this issue Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants