Source Postgres: lost milliseconds precision when a timestamp/datetime is converted to string #9157
Labels: area/connectors, community, connectors/destination/snowflake, connectors/destinations-warehouse, connectors/source/postgres, connectors/sources-database, priority/critical, type/bug
Environment
Current Behavior
A Postgres timestamp is converted to a string and its millisecond precision is lost. This affects all timestamp fields and especially the cursor behavior, because the converted string value is what gets saved. Because the precision is lost during the extraction phase from the source, every connection that uses this source is affected. Most importantly, it affects how Airbyte syncs data: depending on the last cursor value, Airbyte will lose or duplicate data.
If the millisecond part of the last selected cursor timestamp in Postgres is lower than .500, the cursor value saved in Airbyte is truncated; if it is equal to or greater than .500, the saved cursor is rounded up.

Example of a selected cursor with the last value lower than .500:
Value in Postgres: 2021-11-18 15:28:26.499
Cursor saved in Airbyte: 2021-11-18T15:28:26Z
Records with a cursor value greater than 2021-11-18T15:28:26.000 would be resynced even though the cursor and data don't change.

Example of a selected cursor with the last value equal to or greater than .500:
Value in Postgres: 2021-11-18 15:28:26.500
Cursor saved in Airbyte: 2021-11-18T15:28:27Z
Records with a cursor value greater than 2021-11-18 15:28:26.500 but equal to or lower than 2021-11-18T15:28:27.000 will be lost.

Expected Behavior
Timestamps should keep millisecond precision.
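To make the two cases from Current Behavior concrete, here is a minimal sketch that reproduces them in isolation. It is not Airbyte code: `toSecondsPrecision` and `rowsFetchedByNextSync` are hypothetical helpers, and the sketch assumes an incremental sync fetches rows whose cursor column is strictly greater than the saved cursor.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.stream.Collectors;

class CursorPrecisionDemo {

    // Hypothetical stand-in for the behavior reported in this issue:
    // below .500 the value is truncated, at or above .500 it is rounded up.
    static LocalDateTime toSecondsPrecision(LocalDateTime ts) {
        LocalDateTime truncated = ts.truncatedTo(ChronoUnit.SECONDS);
        long millis = ChronoUnit.MILLIS.between(truncated, ts);
        return millis >= 500 ? truncated.plusSeconds(1) : truncated;
    }

    // Assumption: the next sync selects rows with cursor > savedCursor.
    static List<LocalDateTime> rowsFetchedByNextSync(List<LocalDateTime> rows,
                                                     LocalDateTime savedCursor) {
        return rows.stream()
                .filter(row -> row.isAfter(savedCursor))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Case 1: last value below .500. The saved cursor falls behind the
        // row it came from, so that row is fetched again on the next sync.
        LocalDateTime low = LocalDateTime.parse("2021-11-18T15:28:26.499");
        LocalDateTime cursorLow = toSecondsPrecision(low);
        System.out.println("saved cursor: " + cursorLow
                + ", fetched again: " + rowsFetchedByNextSync(List.of(low), cursorLow));

        // Case 2: last value at .500. The saved cursor jumps ahead, so a row
        // written later within the same second is never fetched.
        LocalDateTime high = LocalDateTime.parse("2021-11-18T15:28:26.500");
        LocalDateTime later = LocalDateTime.parse("2021-11-18T15:28:26.700");
        LocalDateTime cursorHigh = toSecondsPrecision(high);
        System.out.println("saved cursor: " + cursorHigh
                + ", fetched: " + rowsFetchedByNextSync(List.of(high, later), cursorHigh));
    }
}
```

In the first case the already-synced row is returned again (duplicate); in the second case the later row is silently skipped (loss).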
Logs
Steps to Reproduce
Selected cursor with the last value lower than .500
Selected cursor with the last value equal to or greater than .500
Are you willing to submit a PR?
I've tried to fix it myself, but it looks like it should be fixed in a library that is used, or can be used, by other Java JDBC connections: AbstractJdbcCompatibleSourceOperations.java. I also think other field types (like time) from the same or different source connectors may be affected even when they don't use this airbyte-db lib, because using only seconds precision looks like a design decision. I want to collaborate, but I think this is something that should be discussed by core committers. I am willing to participate if you think I can help.
I started following the code from here, in source-postgres:
airbyte/airbyte-integrations/connectors/source-postgres/src/main/java/io/airbyte/integrations/source/postgres/PostgresSourceOperations.java
Line 100 in 141cbc5
Proposed solution:
We should modify the file airbyte-db/lib/src/main/java/io/airbyte/db/jdbc/AbstractJdbcCompatibleSourceOperations.java.
Change line 125, which uses the DataTypeUtils.toISO8601String function:
airbyte/airbyte-db/lib/src/main/java/io/airbyte/db/jdbc/AbstractJdbcCompatibleSourceOperations.java
Line 125 in 141cbc5
to use another existing function from the same lib, DataTypeUtils.toISO8601StringWithMilliseconds:
airbyte/airbyte-db/lib/src/main/java/io/airbyte/db/DataTypeUtils.java
Line 47 in 141cbc5
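As a rough illustration of the proposed change, the sketch below compares a seconds-only format with a milliseconds format. The helpers `formatSeconds` and `formatMillis` and both patterns are assumptions made for this example; they stand in for the two DataTypeUtils functions named above and are not copied from the Airbyte source.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimestampFormatDemo {

    // Assumed seconds-only pattern: the fractional part is dropped on output.
    static final DateTimeFormatter SECONDS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    // Assumed milliseconds pattern: three fractional digits are kept.
    static final DateTimeFormatter MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    // Hypothetical stand-in for a seconds-precision conversion.
    static String formatSeconds(long epochMillis) {
        return SECONDS.format(Instant.ofEpochMilli(epochMillis));
    }

    // Hypothetical stand-in for a milliseconds-precision conversion.
    static String formatMillis(long epochMillis) {
        return MILLIS.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long original = Instant.parse("2021-11-18T15:28:26.499Z").toEpochMilli();

        String seconds = formatSeconds(original);
        String millis = formatMillis(original);

        // Parsing each string back shows whether the value survives the round trip.
        boolean secondsRoundTrips = Instant.parse(seconds).toEpochMilli() == original;
        boolean millisRoundTrips = Instant.parse(millis).toEpochMilli() == original;

        System.out.println(seconds + " round-trips: " + secondsRoundTrips);
        System.out.println(millis + " round-trips: " + millisRoundTrips);
    }
}
```

Only the milliseconds format survives the round trip, which is the property a saved cursor needs in order to resume exactly where the previous sync stopped.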
FYI
I raised this issue on Slack first to find out whether it was my mistake or a real issue. Marcos Marx, Liren Tu, and Augustin participated in the discussion.