Issue on "Date Format" using the S3 Sink Connector from Avro to Parquet #92

Open
remynollet opened this issue Feb 3, 2022 · 1 comment

@remynollet

Hi, when using the connector to ship event data from PostgreSQL (via Debezium) to S3 through Kafka as Parquet, we are unable to get a proper date format in the output.

In Kafka, the payload is:
"created_date": 1643631507020,

The schema generated by Debezium is:

 {
      "name": "created_date",
      "type": {
        "type": "long",
        "connect.version": 1,
        "connect.name": "org.apache.kafka.connect.data.Timestamp",
        "logicalType": "timestamp-millis"
      }
    },

When using an S3 connector to write this data as Parquet files, we can configure an SMT to convert the field to a string:
"transforms.TsCreatedDate.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TsCreatedDate.field": "created_date",
"transforms.TsCreatedDate.format": "yyyy-MM-dd'T'HH:mm:ssZ",
"transforms.TsCreatedDate.target.type": "string",
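To illustrate what this SMT configuration would produce for the sample payload, here is a minimal standalone sketch that formats the value with the same pattern. It assumes the formatting is done in UTC with `SimpleDateFormat`; the class and method names are hypothetical and are not part of Kafka Connect.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper (not a Kafka Connect class): formats an epoch-millis value
// with a SimpleDateFormat pattern, assuming the UTC time zone.
class SmtStringFormatSketch {
    static String format(long epochMillis, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // Sample value and pattern taken from the issue text.
        System.out.println(format(1643631507020L, "yyyy-MM-dd'T'HH:mm:ssZ"));
        // prints 2022-01-31T12:18:27+0000
    }
}
```

The result is a plain string, so the field is written as a string column rather than as a date or timestamp type.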

However, the type we expect in Parquet is a date.
Instead we still get a long, or a string when the SMT is applied:
required int64 created_date;

The expected type is one of: DATE, TIMESTAMP_MILLIS, TIMESTAMP_MICROS
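For reference, the three Parquet logical types listed above annotate different physical values: DATE is an int32 count of days since the Unix epoch, while TIMESTAMP_MILLIS and TIMESTAMP_MICROS are int64 counts of milliseconds and microseconds since the epoch. The sketch below, with hypothetical class and method names, shows what the sample value would look like under each; it only illustrates the arithmetic and does not describe how the connector writes files.

```java
// Hypothetical helper: computes the physical value each Parquet logical type
// would hold for a given epoch-millis timestamp (UTC assumed).
class ParquetTemporalValuesSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // DATE: int32 days since 1970-01-01.
    static int asDateDays(long epochMillis) {
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_DAY));
    }

    // TIMESTAMP_MILLIS: int64 milliseconds since the epoch (unchanged).
    static long asTimestampMillis(long epochMillis) {
        return epochMillis;
    }

    // TIMESTAMP_MICROS: int64 microseconds since the epoch.
    static long asTimestampMicros(long epochMillis) {
        return Math.multiplyExact(epochMillis, 1000L);
    }

    public static void main(String[] args) {
        long createdDate = 1643631507020L; // sample value from the issue
        System.out.println(asDateDays(createdDate));        // prints 19023
        System.out.println(asTimestampMillis(createdDate)); // prints 1643631507020
        System.out.println(asTimestampMicros(createdDate)); // prints 1643631507020000
    }
}
```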

How can we resolve this?

@ivanyu
Contributor

ivanyu commented Jul 11, 2022

Hi @remynollet
Will setting the target type to Date help?
I experimented with this code:

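import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.transforms.TimestampConverter;

// Build a record whose "created_date" field is a plain int64 holding epoch milliseconds.
Schema originalSchema = SchemaBuilder.struct().field("created_date", Schema.INT64_SCHEMA).build();
Struct originalValue = new Struct(originalSchema).put("created_date", 1643631507020L);
SourceRecord original = new SourceRecord(null, null, "topic", 0, originalSchema, originalValue);

// Configure the SMT to convert that field to the Date target type.
TimestampConverter<SourceRecord> converter = new TimestampConverter.Value<>();
Map<String, String> config = new HashMap<>();
config.put("field", "created_date");
config.put("target.type", "Date");
converter.configure(config);

// Apply the transformation and print the value before and after.
SourceRecord transformed = converter.apply(original);

System.out.println(original.value());
System.out.println(transformed.value());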

...
Output:

Struct{created_date=1643631507020}
Struct{created_date=Mon Jan 31 02:00:00 EET 2022}
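The output above is consistent with the `Date` target type dropping the time of day: the value appears to be cut to midnight UTC, and `java.util.Date.toString()` then renders it in the JVM's default time zone. The sketch below reproduces that arithmetic with `java.time`. It assumes the truncation happens in UTC and uses `Europe/Helsinki` as a stand-in for an EET zone, since the actual zone of the machine is not stated; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: shows how midnight UTC looks when displayed in an EET zone.
class DateTruncationSketch {
    // Drop the time-of-day part, keeping midnight UTC of the same day.
    static Instant truncateToUtcMidnight(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).truncatedTo(ChronoUnit.DAYS);
    }

    // View an instant in a given time zone (assumed zone, for display only).
    static ZonedDateTime inZone(Instant instant, String zoneId) {
        return instant.atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        Instant midnight = truncateToUtcMidnight(1643631507020L);
        System.out.println(midnight); // prints 2022-01-31T00:00:00Z
        // In January, Europe/Helsinki is at UTC+2, so the local hour is 02:00.
        System.out.println(inZone(midnight, "Europe/Helsinki"));
    }
}
```

Under these assumptions, the 02:00:00 EET in the printed struct is midnight UTC of 2022-01-31, not a time taken from the original value.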
