[HUDI-2608] add json to avro converter #7463
base: release-0.11.0
@abhishekkh Thanks for your contribution! May I know why the destination branch is release-0.11.0 and not master?
String jsonType = property.get("type").getAsString();

switch (jsonType) {
  case "integer":
Aren't there some static constants / type enums defined in gson for these? If not, it would be better to define static constants in our code.
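In case the library offers no such constants, one way to centralise the type names is a small enum. This is only a sketch: the enum name `JsonSchemaType` and the Avro type chosen for each JSON type are assumptions made for illustration (for example, `integer` could equally map to `int`), not what the PR does.

```java
import java.util.Optional;

/**
 * JSON Schema primitive type names paired with an Avro type name.
 * Hypothetical helper: the Avro side of each pair is an assumption.
 */
enum JsonSchemaType {
    STRING("string", "string"),
    INTEGER("integer", "long"),
    NUMBER("number", "double"),
    BOOLEAN("boolean", "boolean"),
    NULL("null", "null");

    private final String jsonName;
    private final String avroName;

    JsonSchemaType(String jsonName, String avroName) {
        this.jsonName = jsonName;
        this.avroName = avroName;
    }

    String jsonName() {
        return jsonName;
    }

    String avroName() {
        return avroName;
    }

    /** Looks up a constant by the value of the JSON Schema "type" keyword. */
    static Optional<JsonSchemaType> fromJsonName(String name) {
        for (JsonSchemaType type : values()) {
            if (type.jsonName.equals(name)) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```

With this in place, the `switch` over raw string literals could switch over the enum constant instead, and an unknown type name surfaces as an empty `Optional` rather than falling through silently.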
@codope Yes, we could. This PR is not ready for review yet. I am still working on using the KafkaJsonSchemaDeserializer when deserializing events read from the Kafka source. Looking at the JsonKafkaSource class, everything seems to be converted into a String first and then to Avro later. That becomes a problem when reading decimal numbers, as the string conversion leaves the value as AA== for a decimal value of 0.00. When the value is then converted to Avro, it throws a wrong-datatype exception because it was expecting a double and not a string.
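A possible explanation for the AA== value, not confirmed anywhere in this thread: if the producer encodes the decimal as the two's-complement bytes of its unscaled value (as Avro's `decimal` logical type does), then 0.00 becomes the single byte 0x00, and Base64 of that byte is `AA==`. The sketch below reproduces this with the JDK only; the class name `DecimalBytesDemo` and the scale of 2 are assumptions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Base64;

/** Hypothetical demo of how a decimal can end up as a Base64 string. */
final class DecimalBytesDemo {

    /** Encodes the unscaled value as two's-complement bytes, then as Base64. */
    static String encode(BigDecimal value) {
        byte[] unscaled = value.unscaledValue().toByteArray();
        return Base64.getEncoder().encodeToString(unscaled);
    }

    /** Reverses encode(); the scale must be known from the schema. */
    static BigDecimal decode(String base64, int scale) {
        byte[] unscaled = Base64.getDecoder().decode(base64);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        System.out.println(encode(new BigDecimal("0.00"))); // prints AA==
        System.out.println(decode("AA==", 2));              // prints 0.00
    }
}
```

If that is what happens here, the string can only be turned back into a number when the scale from the schema is still available, which is an argument for keeping the type information until the Avro conversion.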
I am planning to introduce a JsonSchemaKafka source that reads the consumer records and keeps the type information, instead of converting them to strings before conversion to Avro. Let me know if you see a better way to do this.
    return "boolean";
  case "object":
    return jsonPropertiesToAvro(property.get(PROPERTIES).getAsJsonObject());
  // TODO: handle json array
Please create a ticket for this TODO.
Schema parse = new Schema.Parser().parse(avroSchema);

assert !parse.isError();
assert parse.getName().equals("tranlog");
We use Assertions.assertEquals and similar in our test cases. Also, let's cover more scenarios to test.
 * @return Object avro datatype
 * @throws Exception
 */
private static Object jsonTypeToAvroType(JsonObject property) throws Exception {
Does it also cover nested fields?
Yes, it does.
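For readers following along, nested fields are typically covered by recursion: an `object` property is converted by calling the record conversion again on its own `properties`. The sketch below illustrates the idea with plain `Map`s standing in for parsed JSON, so it runs on the JDK alone. The class and method names, and the primitive type mapping, are assumptions and do not reproduce the PR's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: maps stand in for parsed JSON Schema and Avro schema nodes. */
final class NestedSchemaSketch {

    /** Converts a JSON Schema "object" node into an Avro-style record description. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> toRecord(String name, Map<String, Object> objectSchema) {
        Map<String, Object> properties = (Map<String, Object>)
                objectSchema.getOrDefault("properties", Collections.emptyMap());

        List<Map<String, Object>> fields = new ArrayList<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            Map<String, Object> field = new LinkedHashMap<>();
            field.put("name", entry.getKey());
            field.put("type", toType(entry.getKey(), (Map<String, Object>) entry.getValue()));
            fields.add(field);
        }

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("type", "record");
        record.put("name", name);
        record.put("fields", fields);
        return record;
    }

    /** Maps one property; "object" recurses, which is what handles nesting. */
    static Object toType(String fieldName, Map<String, Object> property) {
        String jsonType = String.valueOf(property.get("type"));
        switch (jsonType) {
            case "string":
                return "string";
            case "integer":
                return "long"; // assumption: could also be "int"
            case "number":
                return "double";
            case "boolean":
                return "boolean";
            case "object":
                return toRecord(fieldName, property);
            default:
                // Arrays and other types are deliberately not handled in this sketch.
                throw new IllegalArgumentException("Unsupported JSON type: " + jsonType);
        }
    }
}
```

A test with at least two levels of nesting would make this guarantee explicit.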
I wasn't sure which was the stable release, as this is my first contribution. I can move it against master.
public static final String PROPERTIES = "properties";
public static final String FIELDS = "fields";
public static final String RECORD = "record";
public static final String MARQETA_JCARD_NAMESPACE = "com.marqeta.jcard";
This seems specific to your company or project. Can we make this more generic?
@@ -1044,8 +1044,7 @@
       <dependency>
         <groupId>com.google.code.gson</groupId>
         <artifactId>gson</artifactId>
-        <version>2.3.1</version>
-        <scope>test</scope>
+        <version>2.9.0</version>
We need to be careful with this change, since it changes the scope for all modules that do not currently override it.
 * @return table name
 */
private static String getTableName(String title) {
  Pattern pattern = Pattern.compile("cdc_marqeta_jcard_(.*).Envelope");
Similarly, this seems specific to a use case.
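One way to make both the namespace and the title pattern generic is to pass them in, for example from a config property, instead of hard-coding them. A minimal sketch follows; the class name `RecordNaming`, its parameters, and the example values are hypothetical. Note also that the `.` before `Envelope` in the original regex is unescaped, so it matches any character; the example escapes it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for naming rules that were hard-coded in the PR. */
final class RecordNaming {
    private final String namespace;
    private final Pattern titlePattern;

    /**
     * @param namespace  Avro namespace to use for generated records
     * @param titleRegex regex whose first capture group is the table name;
     *                   a regex without a capture group makes tableName() throw
     */
    RecordNaming(String namespace, String titleRegex) {
        this.namespace = namespace;
        this.titlePattern = Pattern.compile(titleRegex);
    }

    String namespace() {
        return namespace;
    }

    /** Returns the captured table name, or empty if the title does not match. */
    Optional<String> tableName(String title) {
        Matcher matcher = titlePattern.matcher(title);
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

Usage could look like `new RecordNaming("org.example", "topic_prefix_(.*)\\.Envelope").tableName(title)`, with both arguments read from the source's configuration.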
    avroFields.add(avroObject);
  }
} catch (Exception e) {
  System.out.println("exception: " + e.getCause().toString());
Let's use logging instead of printing.
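A related point: `e.getCause()` returns null when the exception has no cause, so the original line can itself throw a NullPointerException. Passing the exception to the logger avoids that and keeps the stack trace. The sketch below uses `java.util.logging` only so that it runs on the JDK alone; the project's own logging framework would be the natural choice in the actual patch, and the class and method names here are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper showing logging in place of System.out.println. */
final class ConversionLogging {
    private static final Logger LOG = Logger.getLogger(ConversionLogging.class.getName());

    /**
     * Runs one conversion step and logs a failure instead of printing it.
     *
     * @return true if the step completed, false if it threw
     */
    static boolean runAndLog(Runnable step) {
        try {
            step.run();
            return true;
        } catch (RuntimeException e) {
            // The exception is passed as-is, so a missing cause is not a problem.
            LOG.log(Level.WARNING, "Failed to convert JSON property to Avro", e);
            return false;
        }
    }
}
```

Whether a failed property should be skipped, as here, or should fail the whole conversion is a separate decision worth making explicit.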

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
Let's use Jackson instead of Gson, since we're already using Jackson for JSON deserialization.
Change Logs
Handle JSON Schema serialized Kafka messages as a Hudi source. See https://issues.apache.org/jira/browse/HUDI-2608 for details.
Impact
Describe any public API or user-facing feature change or any performance impact.
Risk level (write none, low, medium or high below)
If medium or high, explain what verification was done to mitigate the risks.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist