-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Avro's decimal #69
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
I wonder if it would work better 'toDecimal' instead of 'decimal'
a second thought:
probably the table has many columns of this type, probably we could have ALSO a specific step 'convert Cassandra CDC logical types' that does the conversion for all the fields.
it will help a lot because it would be a no Brainerd and a simple checkbox in the UI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Amazing work @aymkhalil !
tests/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/AbstractDockerTest.java
Outdated
Show resolved
Hide resolved
tests/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/AbstractDockerTest.java
Outdated
Show resolved
Hide resolved
tests/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/AbstractDockerTest.java
Outdated
Show resolved
Hide resolved
...-transformations/src/main/java/com/datastax/oss/pulsar/functions/transforms/ComputeStep.java
Outdated
Show resolved
Hide resolved
...tions/src/main/java/com/datastax/oss/pulsar/functions/transforms/jstl/JstlTypeConverter.java
Show resolved
Hide resolved
Actually I originally had it
Yeah you are right this conversion could be reused multiple resulting in duplicate steps. It is like a "reasonable" work around for the CDC issue in specific. I was avoiding adding too specific of a conversion step to keep the transformation generic enough. I actually thought of adding another step like
but it turned out more complex that I though, however I can pursue this direction if we see value in it @cbornet what are your thoughts regarding both points above? |
I realized that in practice, BigDecimal would be of variable precision/scale. So what will happen in this patch is if scale changes and we reuse the schema, validation could pass if no rounding is required: https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Conversions.java#L122-L132, if we cache per field, then compatibility rules would apply. I'm not sure how practical the changes in the PR would be, yes it conforms to AVRO's logical type, but may not be very useful to CDC customers in particular (they could still use |
...ormations/src/main/java/com/datastax/oss/pulsar/functions/transforms/jstl/JstlEvaluator.java
Outdated
Show resolved
Hide resolved
@@ -239,6 +245,12 @@ protected byte[] coerceToBytes(Object value) { | |||
if (value instanceof byte[]) { | |||
return (byte[]) value; | |||
} | |||
if (value instanceof ByteBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we get ByteBuffer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It comes from the custom CDC CQL Type (the bigint part is encoded as ByteBuffer) so I mimicked the same in the tests. The coerceToBytes
from ByteBuffer will kick in when expression like value.cqlDecimalField.bigint
are evaluated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is only for tests ? Can we use instead the type that we'll get from the AVRO deserializer (probably byte[] ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only for testing. If the byte[] was encoded as ByteBuffer, the AVRO deserializer will return ByteBuffer even if the schema type is bytes.
This how CDC code converts it: https://github.com/datastax/cdc-apache-cassandra/blob/cd64bdd03af7a608687b6c54aa614021c62d8027/commons/src/main/java/com/datastax/oss/cdc/CqlLogicalTypes.java#L146
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how the info of which structure was used byte[]/ByteBuffer during serialization would be in the AVRO record...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized the conversation from ByteBuffer had been removed from the coerceToBytes
and is now limited to coerceToBigInteger
only. However, I'm not sure how the AVRO recognizes ByteBuffer under the hoods.
...tions/src/main/java/com/datastax/oss/pulsar/functions/transforms/jstl/JstlTypeConverter.java
Show resolved
Hide resolved
...s/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/util/CqlLogicalTypes.java
Show resolved
Hide resolved
...s/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/util/CqlLogicalTypes.java
Outdated
Show resolved
Hide resolved
...s/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/util/CqlLogicalTypes.java
Outdated
Show resolved
Hide resolved
...s/src/test/java/com/datastax/oss/pulsar/functions/transforms/tests/util/CqlLogicalTypes.java
Outdated
Show resolved
Hide resolved
...ormations/src/main/java/com/datastax/oss/pulsar/functions/transforms/jstl/JstlEvaluator.java
Outdated
Show resolved
Hide resolved
@cbornet I addressed your feedback, PTAL create table ks1.d2 (id int primary key, v1 decimal) with cdc=true;
insert into ks1.d2(id, v1) values (1, 1.334); assume the following transform is in place: {
"steps":[
{
"type":"compute",
"fields":[
{
"name":"value.v2",
"expression":"fn:decimal(value.v1.bigint, value.v1.scale)",
"type":"DECIMAL"
}
]
}
]
} works fine, results in the following schema: {
"name": "v2",
"type": [
"null",
{
"type": "bytes",
"logicalType": "decimal",
"precision": 4,
"scale": 3
}
]
} now decreasing the scale works fine as well (without changing the schema on the topic): insert into ks1.d2(id, v1) values (1, 1.3); however, increasing the scale is problematic: insert into ks1.d2(id, v1) values (1, 1.33499); will fail with
so basically depending on the very first decimal that was received, the scale will limit all future decimal values which is impractical. Otherwise, this patch is technically correct. Let me know WDYT. |
Is this bc it sets the schema on the topic with the first scale received ? |
Exactly. |
LGTM. |
Issue for tracking: #75 |
This patch adds support for an AVRO only type: decimal.
It would enable conversion from other types like CQL Decimal available in C* CDC.
For example, the use can use an ARRO decimal aware sink with an upstream topic that has a CQL Decimal type like the following:
can be converted to Avro's decimal with the following converion:
The output schema of the conversion would look like this:
In order to support the conversion, a new function is registered with the following sig: