
DateTime64(9) values are inserted incorrectly (near the Unix epoch) when using the KafkaConnect TimestampConverter to convert from string to org.apache.kafka.connect.data.Time #406

Closed
greenlion opened this issue Jun 19, 2024 · 9 comments · Fixed by #413

@greenlion

I want to encode the following document as protobuf with a schema and insert it into the database through Kafka:

{ "Date":"2024-04-02 03:59:00.072229523","EventType":"QUOTE BID","Ticker":"AAPL","Price":0.00,"Quantity":0,"Exchange":"EDGX","Conditions":"00000008" }

Note that the Date field is a DateTime64(9) field encoded as a string.

The database schema is as follows:
```sql
CREATE TABLE kafka_test.kafka_test
(
    Date DateTime64(9),
    EventType LowCardinality(String),
    Ticker LowCardinality(String),
    Price Float64,
    Quantity Int32,
    Exchange LowCardinality(String),
    Conditions String
)
ENGINE = MergeTree
ORDER BY (Ticker, Date, EventType, Exchange)
SETTINGS index_granularity = 8192
```

To convert from the string representation to org.apache.kafka.connect.data.Time, I use a TimestampConverter in the sink config:
```json
{
  "name": "clickhouse-connect",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "1",
    "database": "kafka_test",
    "errors.retry.timeout": "60",
    "exactlyOnce": "false",
    "hostname": "10.249.176.81",
    "port": "8123",
    "ssl": "false",
    "username": "default",
    "password": "",
    "topics": "kafka_test",
    "transforms": "TimestampConverter",
    "transforms.TimestampConverter.target.type": "Timestamp",
    "transforms.TimestampConverter.field": "Date",
    "transforms.TimestampConverter.format": "yyyy-MM-dd HH:mm:ss.SSSSSSSSS",
    "transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "value.converter": "io.confluent.connect.protobuf.ProtobufConverter",
    "value.converter.schemas.enable": "true",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "clickhouseSettings": ""
  }
}
```
I insert the protobuf documents into Kafka with a schema like so:

```bash
cat /data/AAPL.json | kafka-protobuf-console-producer \
  --bootstrap-server broker:9092 \
  --property schema.registry.url=http://schema-registry:8081 \
  --topic kafka_test \
  --property value.schema='syntax = "proto3"; message TAQRecord { string Date=1; string EventType=2; string Ticker=3; double Price=4; int32 Quantity=5; string Exchange=6; string Conditions=7; }'
```

Steps to reproduce

  1. Start the connector with the given configuration (update the hostname and credentials)
  2. Insert the sample document into kafka encoded with protobuf and the given schema
  3. Examine the record in the kafka_test table to see that the date is wrong (an example check query is sketched below)
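
For step 3, a minimal check query might look like the sketch below (the table and column names come from the DDL above; FORMAT Vertical just produces the row layout shown next):

```sql
-- Inspect the inserted row for the sample ticker (illustrative query).
SELECT Date, EventType, Ticker, Price, Quantity, Exchange, Conditions
FROM kafka_test.kafka_test
WHERE Ticker = 'AAPL'
ORDER BY Date DESC
LIMIT 1
FORMAT Vertical
```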

The record in ClickHouse is munged. The date is wrong:
```
Row 1:
──────
Date:      1969-12-31 19:28:32.031040032
EventType: QUOTE BID
Ticker:    AAPL
...
```

Expected behaviour

Dates are inserted correctly.

greenlion added the bug label Jun 19, 2024
@greenlion
Author

Any updates on this? Am I doing something wrong with the connector config or is there a bug in the connector around timestamp handling? We are currently blocked on this.

@Paultagoras
Contributor

> Any updates on this? Am I doing something wrong with the connector config or is there a bug in the connector around timestamp handling? We are currently blocked on this.

Hi! I'll take a look at this today and see if we can resolve it for you folks.

@Paultagoras
Contributor

@greenlion I'm definitely able to reproduce this, just tracking down how to resolve it for you folks

@Paultagoras
Contributor

@greenlion Alright, so using a UNIX timestamp I see the same behavior when I set the precision higher than 3 (which seems to be the default for DateTime64). While I'm looking at other options, do you folks need that higher level of precision? (3 => milliseconds, 6 => microseconds, 9 => nanoseconds)
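
(For illustration, a minimal ClickHouse sketch of that precision mismatch. The value 1712030340072 is assumed here: it is the sample Date, 2024-04-02 03:59:00.072 UTC, expressed in milliseconds since the Unix epoch; the 1969-12-31 19:28:32 shown above is presumably the same near-epoch instant rendered in a timezone behind UTC.)

```sql
-- The same integer read as milliseconds vs. nanoseconds since the epoch.
WITH toInt64(1712030340072) AS ms_since_epoch
SELECT
    fromUnixTimestamp64Milli(ms_since_epoch) AS read_as_milliseconds, -- 2024-04-02 03:59:00.072
    fromUnixTimestamp64Nano(ms_since_epoch)  AS read_as_nanoseconds   -- 1970-01-01 00:28:32.030340072 (UTC)
```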

@greenlion
Author

greenlion commented Jul 8, 2024 via email

Paultagoras linked a pull request Jul 10, 2024 that will close this issue
@Paultagoras
Contributor

@greenlion so I think this is happening because of a bug with the converter (and how Date works in Java) - I've created a PR to try to address this (by allowing you to pass the String value and set the formatter pattern), but it will need to be approved before I can release an updated version.

@greenlion
Author

greenlion commented Jul 10, 2024 via email

@Paultagoras
Contributor

@greenlion Just in case you didn't see it, we added the datetime format change to the recent release - See dateTimeFormats in https://clickhouse.com/docs/en/integrations/kafka/clickhouse-kafka-connect-sink#configuration-options for more details 🙂
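
For reference, a sketch of what that might look like for this issue's config, assuming the semicolon-separated field=format syntax described in the linked docs (the entry below is illustrative, not something confirmed in this thread):

```json
{
  "config": {
    "topics": "kafka_test",
    "dateTimeFormats": "Date=yyyy-MM-dd HH:mm:ss.SSSSSSSSS"
  }
}
```

With a format configured this way, the TimestampConverter transform would presumably be dropped so the original string value reaches the connector and is parsed against the configured pattern.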

Please let me know if you run into more issues!

@greenlion
Author

greenlion commented Jul 22, 2024 via email
