Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sink jdbc-clickhouse] avro (and possibly others) sink is broken #11457

Open
smazurov opened this issue Jul 26, 2021 · 6 comments
Open

[sink jdbc-clickhouse] avro (and possibly others) sink is broken #11457

smazurov opened this issue Jul 26, 2021 · 6 comments
Labels
lifecycle/stale type/bug The PR fixed a bug or issue reported a bug

Comments

@smazurov
Copy link
Contributor

smazurov commented Jul 26, 2021

Describe the bug

When publishing & consuming messages avro & json schema encoding works as expected. After creating a simple clickhouse sink (using config from the docs), the messages get written to the table, but every value is null, making it useless. Downgrading to 2.7.2 re-running the test it all works as expected. No avro, db schema changes or sink config changes are needed when downgrading.

To Reproduce
Steps to reproduce the behavior:

  1. Start up a clickhouse and pulsar-standalone
  2. Create a clickhouse sink attached to the test topic
  3. Publish some messages with avro schema (i used go client for this)
  4. See error

Expected behavior

I expect the sink to correctly populate my fields.

@Lucgarg
Copy link

Lucgarg commented Aug 17, 2021

I think the problem is that to enter data in JSON format, you have to specify the clickhouse type of JSONEachRow format. But i think the jdbc connector doesn't give that option. Now i probe with "kafka on pulsar" and configure the table how say in this web https://clickhouse.tech/docs/en/engines/table-engines/integrations/kafka/ and consume. But i dont like to use "kafka on pulsar"

@smazurov
Copy link
Contributor Author

I'm assuming it has to do with the way schemas are handled regardless of destination, here's another issue that makes it seem specific to golang & sql

@codelipenghui
Copy link
Contributor

The issue had no activity for 30 days, mark with Stale label.

@smazurov
Copy link
Contributor Author

smazurov commented Mar 7, 2022

The PR mentioned in linked issue deals with this downstream, but I do not know if underlying issue has been fixed.

@estebanz01
Copy link

I'm seeing a similar issue but without schema on pulsar 3.1 and JDBC Clickhouse sink with pulsar-io-jdbc-clickhouse-3.1.1.nar NAR file.

Here's my config:

configs:
  userName: "default"
  password: "default"
  jdbcUrl: "jdbc:clickhouse://localhost:8123/iot"
  tableName: "pulsar_data"
  useTransactions: "false"
  nonKey: "device_name, data_content, device_properties, datetime_info, topicName"

@estebanz01
Copy link

I managed to setup the following Avro schema and it still not able to render data, everything is null:

{
  "name": "clickhouse-schema",
  "type": "AVRO",
  "schema": "{\"type\": \"record\", \"namespace\": \"iot\",\"name\":\"exampleAvro\",\"fields\":[{\"name\": \"device_name\",\"type\": \"string\", \"default\": \"\"},{\"name\": \"data_content\", \"type\":{\"type\":\"map\", \"values\": \"string\", \"default\": {}}},{\"name\": \"device_properties\", \"type\":{ \"type\": \"map\", \"values\": \"string\", \"default\": {}}},{\"name\":\"datetime_info\", \"type\": \"string\", \"default\": \"\"}, {\"name\": \"topicName\", \"type\": \"string\", \"default\": \"\"}]}",
  "properties": {
    "__jsr310ConversionEnabled": "false"
  }
}

Running the sink in localrun mode I get the following error:

023-11-10T10:56:24,865-0500 [pool-3-thread-1] ERROR org.apache.pulsar.io.jdbc.JdbcAbstractSink - Got exception Cannot set null to non-nullable column #1 [datetime_info String] after 0 ms, failing 200 messages
java.sql.SQLException: Cannot set null to non-nullable column #1 [datetime_info String]
        at com.clickhouse.jdbc.SqlExceptionUtils.clientError(SqlExceptionUtils.java:73) ~[clickhouse-jdbc-0.4.6-all.jar:clickhouse-jdbc 0.4.6 (revision: dd91e17)]
        at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch(InputBasedPreparedStatement.java:340) ~[clickhouse-jdbc-0.4.6-all.jar:clickhouse-jdbc 0.4.6 (revision: dd91e17)]
        at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:113) ~[clickhouse-jdbc-0.4.6-all.jar:clickhouse-jdbc 0.4.6 (revision: dd91e17)]
        at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.execute(InputBasedPreparedStatement.java:312) ~[clickhouse-jdbc-0.4.6-all.jar:clickhouse-jdbc 0.4.6 (revision: dd91e17)]
        at org.apache.pulsar.io.jdbc.JdbcAbstractSink.flush(JdbcAbstractSink.java:289) ~[pulsar-io-jdbc-core-3.1.1.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
        at java.lang.Thread.run(Unknown Source) ~[?:?]

here's my clickhouse table definition:

SET allow_experimental_object_type = 1;
CREATE TABLE mqtt.pulsar_data(datetime_info String, topicName String, data_content JSON, device_properties JSON, device_name String) ENGINE = MergeTree ORDER BY(topicName, datetime_info);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

No branches or pull requests

4 participants