Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Add support for +/- infinity datetime objects #8294

Merged
merged 7 commits into from
Jul 27, 2023

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Jul 18, 2023

This PR fixes #8143 and #8214

To fix 8214, we now add double quotes around the identifier when performing the internal lookup

To fix 8143 we now produce datetime.datetime.max and datetime.date.max values for INFINITY timestamps and dates respectively.
For -INFINITY we produce the .min variant of the datetime object.

This also adds support for reading those values as INFINITY and -INFINITY

@@ -405,12 +448,31 @@ py::object PythonObject::FromValue(const Value &val, const LogicalType &type) {
case LogicalTypeId::TIMESTAMP_TZ: {
D_ASSERT(type.InternalType() == PhysicalType::INT64);
auto timestamp = val.GetValueUnsafe<timestamp_t>();

InfinityType infinity = InfinityType::NONE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly concerned with the existing code that we can overflow Python's more limited date range. Is that handled elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do anything to prevent that no

def test_getattr_spaces(self):
rel = duckdb.sql('select 42 as "hello world"')
assert rel['hello world'].fetchall()[0][0] == 42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue, but If you use .editorconfig, it will add trailing line feeds to Python files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably run a formatter/linter on Python code in the repo at this point, @Mause any thoughts on that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Elliana suggested we should use black to format the python code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have a pr with that I was looking to submit once the big python PR's had been merged (because otherwise we'll end up a bunch of conflicts)

@Mytherin Mytherin requested a review from Mause July 19, 2023 12:21
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! LGTM. Perhaps @Mause wants to have a look as well?

Copy link
Member

@Mause Mause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple of minor comments

def test_getattr_spaces(self):
rel = duckdb.sql('select 42 as "hello world"')
assert rel['hello world'].fetchall()[0][0] == 42
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have a pr with that I was looking to submit once the big python PR's had been merged (because otherwise we'll end up a bunch of conflicts)

@@ -434,6 +496,12 @@ py::object PythonObject::FromValue(const Value &val, const LogicalType &type) {

auto date = val.GetValueUnsafe<date_t>();
int32_t year, month, day;
if (!duckdb::Date::IsFinite(date)) {
if (date == date_t::infinity()) {
return py::module::import("datetime").attr("date").attr("max");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not caching the min/max values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think our import cache does anything other than remove a few extra string comparisons.
We haven't measured this, but from working with the code for a while I am confident pybind caches the imports itself

'TIMESTAMP_MS',
'TIMESTAMP_S'
]:
# This query is not supported in core duckdb but doesn't deterministically produce an error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why it doesn't deterministically error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No clue, I think it has to do with an earlier under/overflow from an unchecked expression.
I noticed INFINITY and -INFINITY are not really implemented for TIMESTAMP_S/MS/NS, as I am having to check for the truncated versions of the timestamp_t::infinity value.

struct timestamp_ms_t : public timestamp_t { // NOLINT
	static constexpr timestamp_t infinity() {
		return timestamp_t(int64_t(NumericLimits<int64_t>::Maximum() / 1000) * 1000);
	}
	static constexpr timestamp_t ninfinity() {
		return timestamp_t(int64_t(-NumericLimits<int64_t>::Maximum() / 1000) * 1000);
	}
};

So it might be a result of that

};
struct timestamp_ms_t : public timestamp_t { // NOLINT
static constexpr timestamp_t infinity() {
return timestamp_t(int64_t(NumericLimits<int64_t>::Maximum() / 1000) * 1000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's always use timestamp_t::infinity.

@hawkfish could you perhaps have a look at this, it looks like there is an issue when casting infinity to non-microsecond timestamp types, e.g.:

D select 'infinity'::timestamp_ms as ts;
┌───────────────────────────┐
│            ts             │
│       timestamp_ms        │
├───────────────────────────┤
│ 294247-01-10 04:00:54.775 │
└───────────────────────────┘

@github-actions github-actions bot marked this pull request as draft July 21, 2023 12:09
@Tishj Tishj marked this pull request as ready for review July 23, 2023 15:20
@github-actions github-actions bot marked this pull request as draft July 24, 2023 14:14
@Tishj Tishj marked this pull request as ready for review July 24, 2023 14:55
@Mytherin Mytherin merged commit 2138aa9 into duckdb:master Jul 27, 2023
53 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TIMESTAMP field with infinity value breaks DuckDBPyRelation fetchall() method.
4 participants