Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime #3973

jeff-hiner · 2024-03-29T17:02:45Z

Addresses #3581

weiznich · 2024-03-29T17:28:21Z

Thanks for opening this PR. This is really appreciated.
Can you elaborate on how to solve potential issues around time-zones that might occur here, especially which time zone should be chosen for the offset and whether that choice is always sensible?

Ten0

I think we should not impl ToSql/FromSql<Timestamp, _> for OffsetDateTime, see below for in-depth explanations.

I would however be ok with an impl ToSql/FromSql<Timestamptz, _> for OffsetDateTime

Ten0 · 2024-03-29T17:46:50Z

diesel/src/pg/types/date_and_time/time.rs

@@ -44,6 +49,23 @@ impl ToSql<Timestamp, Pg> for PrimitiveDateTime {
    }
 }

+// Delegate offset datetimes in terms of UTC primitive datetimes; this stores everything in the DB as UTC
+#[cfg(all(feature = "time", feature = "postgres_backend"))]
+impl ToSql<Timestamp, Pg> for OffsetDateTime {


OffsetDateTime is timezone-aware, Timestamp isn't.

OffsetDateTime should be allowed to map to Timestamptz.
PrimitiveDateTime should be allowed to map to Timestamp.

However, implementing ToSql<Timestamp, _> for OffsetDateTime would let users implicitly drop timezone information by writing table::non_timezone_aware_field.eq(timezone_aware_data), which is bug-prone.
This goes strongly against the general principle of non-implicit-cast otherwise used in Rust.

=> I would prefer that we do not do this, and let users write table::non_timezone_aware_field.eq(timezone_aware_data.forget_offset()) if they really want to - which they probably don't: they probably want to update their database schema to be timezone-aware instead (that is, by using the Timestamptz type instead of Timestamp, if they are feeding actually feeding timezone-aware data to their database).

NB: I couldn't find an actual OffsetDateTime::forget_offset function in time (or one named differently that would do that), which I find very surprising (I don't use time), but that would be an issue to solve via a PR in time, not in diesel. (But then maybe such a function doesn't exist because the behavior of a function with a such signature may be unclear because choosing whether to convert by going to UTC or by forgetting offset is arbitrary.) (In any case you can write PrimitiveDateTime::new(dt.date(), dt.time()) until that gets merged)

While strictly true, this winds up being incredibly verbose in practice. In particular where users of diesel don't control their own schema, it's convenient to specify the UTC convention at the row mapping. This is probably why the MySql impl exists.

When deriving Queryable for large structs on Pg you currently have to write the field mapping as PrimitiveDateTime. When doing almost anything with the data later (including serializing for REST output) you wind up with a lot of assume_utc() calls. This becomes confusing to read, because "hey this Postgres time is in UTC" annotation is separated from the database call that populated the PrimitiveDateTime.

In the other direction when storing rows, you wind up having to write wrapper fns to manually convert OffsetDateTime into PrimitiveDateTime for each and every insertion. This is easy to get wrong: if you simply call date() and time() without first explicitly converting to UTC you can easily wind up inadvertently storing an incorrect timestamp. There is no impl in time to perform this conversion, making it almost deliberately awkward. I assume the time crate author has a philosophical reasoning here, in that most people should store and operate on OffsetDateTime instead of PrimitiveDateTime.

In this case where you don't control your database schema, I think you have two reasonable options:

Use serialize_as/serialize_fn/deserialize_as/deserialize_fn on your Insertable/Queryable struct - this removes the need for a conversion at every struct construction (resp. usage)

Since it turns out that the actual wire serialization for the Timestamptz type is precisely that of serializing a Timestamp as Utc (cf the module this PR updates), you could update your schema.rs to replace Timestamp by Timestamptz there after the CLI generates it, in effect asserting that when a Timestamp is used in this table, it is indeed expected to be an UTC timestamp. You would then appropriately use timezone-aware throughout your rust application, guaranteeing that these are read and inserted as such, and avoiding non-timezone-aware types in your application (which is best in any case).

This is probably why the MySql impl exists

As outlined by weiznich the reason is more likely that MySQL has a different meaning than Postgres for the Timestamp type, where it is actually timezone-aware (in that it converts provided time from local to UTC for storage):
https://dev.mysql.com/doc/refman/8.0/en/datetime.html

So it looks like on the MySQL side, Timestamp should be allowed to map from timezone-aware types, and DateTime should be allowed to map from non-timezone-aware types. (It looks like it may be possible to extract the appropriate information to behave properly in every case from the MysqlTime that gets deserialized. I'm surprised that we assume utc instead of reading time_zone_displacement (doc: https://dev.mysql.com/worklog/task/?id=10828) but that may be correct.)

This is interesting. I didn't realize the Timestamp behavior was that different, but I guess it does make sense here.

you could update your schema.rs to replace Timestamp by Timestamptz there after the CLI generates it, in effect asserting that when a Timestamp is used in this table, it is indeed expected to be an UTC timestamp.

This is a really clever approach. I was able to make this switch in my own code, and things generally worked as you've described. The only place I ran into issues was with a gt comparison against now... to me that indicates the NOW() typing in Postgres actually isn't correct either. The Postgres documentation seems to indicate that the type returned should be Timestamptz and not naked Timestamp. I was able to work around it by doing a local now_utc and doing a local subtraction, but probably something to add to a corrections list.

IIRC now has AsExpression implementations for both Timestamp and Timestamptz, so one shouldn't hit any error if .into_sql()ing it to Timestamptz. I'm surprised that doesn't work implicitly anymore... It's maybe a matter of Nullable/not Nullable...

Ten0 · 2024-03-29T17:49:29Z

diesel/src/pg/types/date_and_time/time.rs

@@ -17,6 +17,11 @@ use crate::sql_types::{Date, Time, Timestamp, Timestamptz};
 // Postgres timestamps start from January 1st 2000.
 const PG_EPOCH: PrimitiveDateTime = datetime!(2000-1-1 0:00:00);

+fn to_primitive_datetime(dt: OffsetDateTime) -> PrimitiveDateTime {
+    let dt = dt.to_offset(UtcOffset::UTC);
+    PrimitiveDateTime::new(dt.date(), dt.time())


That seems arbitrary.
I think I would have expected the one that just drops timezone information (turns into naive) without converting to UTC, but arguably that's arbitrary as well.

In fact, it seems that there's really no non-arbitrary way to implicitly drop timezone information.

=> I would prefer that we do not do this

It is arbitrary, but this is also just mirroring the current behavior of the existing MySql implementation.

If all your timestamps are sanitized internally to UTC it's reasonably common for applications to store without timezone to save space. Since local tz obviously doesn't work, the only reasonable assumption is UTC.

The main difference to mysql here is that postgresql has a distinct timestamp type (Timestamp with time zone), which does exactly what you describe. So it seems to be a bit odd to reimplement that behavior here.

Given that there is no non-arbitrary way to implicitly drop timezone information I personally would prefer to keep that step explicit for the user. I would go as far and argue that accepting the mysql impl was a mistake back then. Maybe we should put a #[deprecated] on that impl instead?

does trait implementation deprecation work now? 🤩
(Or do we just want to use the without-deprecated feature?)

I'm not sure if rustc will show the deprecation warning, but we have the without-deprectaed feature flag for such cases.

diesel/src/pg/types/date_and_time/time.rs

jeff-hiner · 2024-03-29T18:18:20Z

Thanks for opening this PR. This is really appreciated. Can you elaborate on how to solve potential issues around time-zones that might occur here, especially which time zone should be chosen for the offset and whether that choice is always sensible?

Commented inline above, but I'm more or less copying the existing code from the MySql impl that assumes UTC. It's a reasonable default, as long as it's consistent on both insertion and selection.

Existing code:
https://github.com/diesel-rs/diesel/blob/master/diesel/src/mysql/types/date_and_time/time.rs#L112-L113
https://github.com/diesel-rs/diesel/blob/master/diesel/src/mysql/types/date_and_time/time.rs#L120-L121

Ten0 · 2024-03-29T18:18:47Z

diesel/src/pg/types/date_and_time/time.rs

+        Ok(prim.assume_utc())
+    }
+}
+
 #[cfg(all(feature = "time", feature = "postgres_backend"))]
 impl FromSql<Timestamptz, Pg> for PrimitiveDateTime {


Side note: I see that we already do implement this, both here and in the chrono module, and I don't like it for the same reasons.
But here we probably don't want to remove it until the next major release, because that would otherwise be breaking.

jeff-hiner · 2024-03-29T18:30:56Z

I think we should not impl ToSql/FromSql<Timestamp, _> for OffsetDateTime, see below for in-depth explanations.

I would however be ok with an impl ToSql/FromSql<Timestamptz, _> for OffsetDateTime

In that case, you may want to PR the removal of the existing MySql impl here:

https://github.com/diesel-rs/diesel/blob/master/diesel/src/mysql/types/date_and_time/time.rs#L108-L138

jeff-hiner · 2024-03-29T18:41:01Z

Maybe a reasonable compromise here would be to gate the "assume UTC on Timestamp" impls behind a feature flag? It's unfortunately not possible for users to impl this on their own because of foreign trait rules, and it does make dealing with implicit-UTC Timestamp fields a lot more ergonomic.

Ten0 · 2024-03-29T19:16:30Z

Maybe a reasonable compromise here would be to gate the "assume UTC on Timestamp" impls behind a feature flag?

That may be a reasonable compromise, depending on whether other solutions turn out to be acceptable without introducing additional maintenance complexity with feature flags.
The way in which features are enabled (for all crates in workspace if one has it) may hurt users who have only one project in their workspace that needs it, but the rest doesn't (they wouldn't see compilation errors for all the rest of their workspace), but that's enough of an edge-case at this point that I'd be ok with it. (At least that wouldn't negatively affect me 😅)

Ten0 · 2024-03-29T20:00:18Z

Similarly I don't think these impls:

diesel/diesel/src/pg/types/date_and_time/time.rs

Lines 47 to 59 in 877611a

    
           #[cfg(all(feature = "time", feature = "postgres_backend"))] 
        
           impl FromSql<Timestamptz, Pg> for PrimitiveDateTime { 
        
               fn from_sql(bytes: PgValue<'_>) -> deserialize::Result<Self> { 
        
                   FromSql::<Timestamp, Pg>::from_sql(bytes) 
        
               } 
        
           } 
        
           #[cfg(all(feature = "time", feature = "postgres_backend"))] 
        
           impl ToSql<Timestamptz, Pg> for PrimitiveDateTime { 
        
               fn to_sql<'b>(&'b self, out: &mut Output<'b, '_, Pg>) -> serialize::Result { 
        
                   ToSql::<Timestamp, Pg>::to_sql(self, out) 
        
               } 
        
           }

or these impls:

diesel/diesel/src/pg/types/date_and_time/chrono.rs

Lines 50 to 62 in 877611a

    
           #[cfg(all(feature = "chrono", feature = "postgres_backend"))] 
        
           impl FromSql<Timestamptz, Pg> for NaiveDateTime { 
        
               fn from_sql(bytes: PgValue<'_>) -> deserialize::Result<Self> { 
        
                   FromSql::<Timestamp, Pg>::from_sql(bytes) 
        
               } 
        
           } 
        
           #[cfg(all(feature = "chrono", feature = "postgres_backend"))] 
        
           impl ToSql<Timestamptz, Pg> for NaiveDateTime { 
        
               fn to_sql<'b>(&'b self, out: &mut Output<'b, '_, Pg>) -> serialize::Result { 
        
                   ToSql::<Timestamp, Pg>::to_sql(self, out) 
        
               } 
        
           }

should exist (especially the ToSql one).

Mainly for me the risk is that people start mistakenly inserting non timezone-aware datetimes as timezone-aware data in a Timestamptz field of the database (as if it was utc, when it's in fact not explicitly expressed anywhere - they should have called .assume_utc() at some point and used timezone-aware types from then on).

jeff-hiner · 2024-04-02T05:37:20Z

I'm going to withdraw this PR. I think the plan to deprecate the mismatched stuff makes a lot more sense. Let's see if there's a good way to document that it's possible on Pg to replace Timestamp with Timestamptz.

jeff-hiner · 2024-04-03T19:59:25Z

@Ten0 to follow up, modifying schema.rs to s/Timestamp/Timestamptz/ worked great... right up until a CI build tried to set up an integration test db with diesel migration run --locked-schema and barfed. I'm going to try the other option now.

Ten0 · 2024-04-03T20:09:36Z

You should use the schema patch feature where you can set a patch file for the schema in the diesel.toml, I would expect that locked-schema makes its comparison after. Or you can avoid locked-schema and run the diff check yourself in CI, after e.g. sed ing Timestamp to Timestamptz

weiznich · 2024-04-03T20:12:09Z

A patch file is the correct solution. The lock check is performed after modifying the generated schema via the patch. See the relevant guide for details.

jeff-hiner · 2024-04-04T04:55:56Z

I considered a patch file, but it doesn't scale when folks on the team add new tables with Timestamp fields-- they need to know how to regenerate the patch file. I'm guessing there's no way to run it like a sed and just substitute all?

weiznich · 2024-04-04T05:37:51Z

There is currently no way to replace all occurrence of one type with another while generating your schema.rs file. I think it might be possible for you to just add that a script check (via grep or whatever) to your CI setup and just reject any schema.rs file that contains a Timestamp field??

As a more general note: From time to time I think about introducing a more flexible overwrite mechanism that can be configured via the diesel.toml file. Something along the lines that would allow you to specify a match pattern (table, column, type, etc) and a replacement. That might be useful for sqlite (with the weak type system on their end) and also for such use cases. That's currently mostly blocked on capacity. If you are interested in helping out with that we can certainly discuss possible designs, although I should probably add that this is likely not a feature that's implemented in a day or two.

Ten0 · 2024-04-04T18:41:24Z

What we do is we only migration run without --locked-schema, then run our patches (we actually run cargo fmt on the generated file, and apply other patches manually - this code dates from before I knew there was a patch feature integrated to diesel, but I like that we run our custom formatting before applying patches, since that makes patches homogeneous with final file), then we run git diff --exit-code to make sure the file hasn't changed.

So you could do:

diesel migration run
sed -i 's/Timestamp /Timestamptz /' src/schema.rs
git diff --exit-code src/schema.rs

That doesn't require any additional Diesel feature, and does make the CI fail if the schema has changed (as a bonus, it prints out the diff between the expected and actual schema).

jeff-hiner · 2024-04-05T17:00:45Z

Yeah, I think I'll set up a make target to just autogen/overwrite the patch file from sed output. Then it'll show up as a diff in git, and devs can commit it along with changes to the schema itself.

@weiznich I don't wanna make more work for something that's likely only a problem for me. The existing patch functionality should be enough.

jeff-hiner force-pushed the pg_offsetdatetime branch from 0ce6ea6 to d53143c Compare March 29, 2024 17:17

Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime

5d6beda

jeff-hiner force-pushed the pg_offsetdatetime branch from d53143c to 5d6beda Compare March 29, 2024 17:22

weiznich requested a review from a team March 29, 2024 17:26

Ten0 requested changes Mar 29, 2024

View reviewed changes

Ten0 reviewed Mar 29, 2024

View reviewed changes

diesel/src/pg/types/date_and_time/time.rs Show resolved Hide resolved

Ten0 reviewed Mar 29, 2024

View reviewed changes

Ten0 mentioned this pull request Mar 29, 2024

Deserialization layer seems too permissive with regards to checking the actual types received #3868

Open

jeff-hiner closed this Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime #3973

Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime #3973

jeff-hiner commented Mar 29, 2024

weiznich commented Mar 29, 2024

Ten0 left a comment •

edited

Loading

Ten0 Mar 29, 2024 •

edited

Loading

jeff-hiner Mar 29, 2024

Ten0 Mar 29, 2024 •

edited

Loading

jeff-hiner Apr 1, 2024

Ten0 Apr 2, 2024

Ten0 Mar 29, 2024 •

edited

Loading

jeff-hiner Mar 29, 2024

weiznich Mar 29, 2024 •

edited by Ten0

Loading

Ten0 Mar 29, 2024 •

edited

Loading

weiznich Mar 29, 2024

jeff-hiner commented Mar 29, 2024

Ten0 Mar 29, 2024 •

edited

Loading

jeff-hiner commented Mar 29, 2024

jeff-hiner commented Mar 29, 2024

Ten0 commented Mar 29, 2024 •

edited

Loading

Ten0 commented Mar 29, 2024 •

edited

Loading

jeff-hiner commented Apr 2, 2024

jeff-hiner commented Apr 3, 2024

Ten0 commented Apr 3, 2024 •

edited

Loading

weiznich commented Apr 3, 2024

jeff-hiner commented Apr 4, 2024

weiznich commented Apr 4, 2024

Ten0 commented Apr 4, 2024

jeff-hiner commented Apr 5, 2024

Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime #3973

Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime #3973

Conversation

jeff-hiner commented Mar 29, 2024

weiznich commented Mar 29, 2024

Ten0 left a comment • edited Loading

Choose a reason for hiding this comment

Ten0 Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

jeff-hiner Mar 29, 2024

Choose a reason for hiding this comment

Ten0 Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

jeff-hiner Apr 1, 2024

Choose a reason for hiding this comment

Ten0 Apr 2, 2024

Choose a reason for hiding this comment

Ten0 Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

jeff-hiner Mar 29, 2024

Choose a reason for hiding this comment

weiznich Mar 29, 2024 • edited by Ten0 Loading

Choose a reason for hiding this comment

Ten0 Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

weiznich Mar 29, 2024

Choose a reason for hiding this comment

jeff-hiner commented Mar 29, 2024

Ten0 Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

jeff-hiner commented Mar 29, 2024

jeff-hiner commented Mar 29, 2024

Ten0 commented Mar 29, 2024 • edited Loading

Ten0 commented Mar 29, 2024 • edited Loading

jeff-hiner commented Apr 2, 2024

jeff-hiner commented Apr 3, 2024

Ten0 commented Apr 3, 2024 • edited Loading

weiznich commented Apr 3, 2024

jeff-hiner commented Apr 4, 2024

weiznich commented Apr 4, 2024

Ten0 commented Apr 4, 2024

jeff-hiner commented Apr 5, 2024

Ten0 left a comment •

edited

Loading

Ten0 Mar 29, 2024 •

edited

Loading

Ten0 Mar 29, 2024 •

edited

Loading

Ten0 Mar 29, 2024 •

edited

Loading

weiznich Mar 29, 2024 •

edited by Ten0

Loading

Ten0 Mar 29, 2024 •

edited

Loading

Ten0 Mar 29, 2024 •

edited

Loading

Ten0 commented Mar 29, 2024 •

edited

Loading

Ten0 commented Mar 29, 2024 •

edited

Loading

Ten0 commented Apr 3, 2024 •

edited

Loading