Background

We've been running into the same issue as described in #1861: we're using the JSON Schema converter to write ingested JSON messages as Parquet to S3. As mentioned in that issue:
> The JsonSchemaConverter outputs schemas with field names such as "io.confluent.connect.json.OneOf.field.0", which clash with the official Avro library. So eventually, when such schemas are converted to Avro in the AvroData class, it throws an exception.
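To illustrate (a simplified example based on our understanding of the converter's behavior), a value schema containing a `oneOf` such as:

```json
{
  "type": "object",
  "properties": {
    "id": {
      "oneOf": [
        { "type": "string" },
        { "type": "integer" }
      ]
    }
  }
}
```

is mapped to a Connect struct whose union fields carry dotted names along the lines of `io.confluent.connect.json.OneOf.field.0`, and the dots make them invalid Avro field names.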
PR #1873 by @rayokota addresses this issue and exposes the fix behind the `scrub.invalid.names` config.
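For context, at the `avro-data` level the fix can be toggled roughly like this (a sketch, not our actual converter path; the constant and constructor names are as we read them from PR #1873):

```java
import io.confluent.connect.avro.AvroData;
import io.confluent.connect.avro.AvroDataConfig;

import java.util.Map;

public class ScrubSketch {
  public static void main(String[] args) {
    // Sketch: construct AvroData with scrubbing enabled, which is what
    // scrub.invalid.names=true does for the AvroConverter.
    AvroData avroData = new AvroData(new AvroDataConfig(
        Map.of(AvroDataConfig.SCRUB_INVALID_NAMES_CONFIG, true)));
    // avroData.fromConnectSchema(connectSchema) should then rewrite
    // names like "io.confluent.connect.json.OneOf.field.0" into valid
    // Avro names instead of failing later in AvroSchemaConverter.
  }
}
```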
Problem
We're unable to set `scrub.invalid.names` to `true` when using the `JsonSchemaConverter`, as the option is only available for the `AvroConverter` and the `ProtobufConverter` (docs). However, the `JsonSchemaConverter` uses `avro-data` under the hood, and that is where the fix is implemented. We've tried to work around this by shading the `avro-data` jar and forcing the config on manually, but without success. The sketch below shows the kind of configuration we're after.
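Concretely (a trimmed sketch; the bucket, topic, and schema-registry URL are placeholders), we would like a sink configuration like the following to take effect, where the last property is the one the converter does not currently recognize:

```json
{
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "storage.class": "io.confluent.connect.s3.storage.S3Storage",
  "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
  "s3.bucket.name": "our-bucket",
  "topics": "ingested-json",
  "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter.scrub.invalid.names": "true"
}
```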
Is there a way to set `scrub.invalid.names` when using the `JsonSchemaConverter`?
For reference, this is the exception we eventually hit when the S3 sink converts the schema:
```
Caused by: java.lang.IllegalArgumentException: Avro schema must be a record.
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:124)
    at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:150)
    at org.apache.parquet.avro.AvroParquetWriter.access$200(AvroParquetWriter.java:36)
    at org.apache.parquet.avro.AvroParquetWriter$Builder.getWriteSupport(AvroParquetWriter.java:182)
    at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:563)
    at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)
    at io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:46)
    at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:107)
    at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:562)
    at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:311)
    at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:254)
    at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:205)
    at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
```