[HUDI-1659] Basic Implementation Of Spark Sql Support #2645
```diff
@@ -189,8 +189,12 @@ protected boolean isUpdateRecord(HoodieRecord<T> hoodieRecord) {
   private Option<IndexedRecord> getIndexedRecord(HoodieRecord<T> hoodieRecord) {
     Option<Map<String, String>> recordMetadata = hoodieRecord.getData().getMetadata();
     try {
-      Option<IndexedRecord> avroRecord = hoodieRecord.getData().getInsertValue(writerSchema);
+      Option<IndexedRecord> avroRecord = hoodieRecord.getData().getInsertValue(tableSchema,
+          config.getProps());
       if (avroRecord.isPresent()) {
+        if (avroRecord.get().equals(IGNORE_RECORD)) {
+          return avroRecord;
+        }
         // Convert GenericRecord to GenericRecord with hoodie commit metadata in schema
         avroRecord = Option.of(rewriteRecord((GenericRecord) avroRecord.get()));
         String seqId =
```
```diff
@@ -336,7 +340,7 @@ public void doAppend() {
   protected void appendDataAndDeleteBlocks(Map<HeaderMetadataType, String> header) {
     try {
       header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, instantTime);
-      header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, writerSchemaWithMetafields.toString());
+      header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, writeSchemaWithMetaFields.toString());
       List<HoodieLogBlock> blocks = new ArrayList<>(2);
       if (recordList.size() > 0) {
         blocks.add(HoodieDataBlock.getBlock(hoodieTable.getLogDataBlockFormat(), recordList, header));
```
```diff
@@ -444,7 +448,10 @@ private void writeToBuffer(HoodieRecord<T> record) {
     }
     Option<IndexedRecord> indexedRecord = getIndexedRecord(record);
     if (indexedRecord.isPresent()) {
-      recordList.add(indexedRecord.get());
+      // Skip the Ignore Record.
+      if (!indexedRecord.get().equals(IGNORE_RECORD)) {
+        recordList.add(indexedRecord.get());
+      }
     } else {
       keysToDelete.add(record.getKey());
     }
```
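The two hunks above work together as a sentinel-record pattern: `getIndexedRecord` may return a special shared `IGNORE_RECORD` instance (meaning "this payload resolved, but should produce no output"), and `writeToBuffer` filters it out so it is neither appended to the log nor treated as a delete. A minimal, self-contained sketch of that pattern, with illustrative names rather than Hudi's actual classes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SentinelFilterSketch {
    // Unique sentinel instance. Object.equals is identity-based, so
    // equals(IGNORE_RECORD) only ever matches the sentinel itself.
    static final Object IGNORE_RECORD = new Object();

    /**
     * Stand-in for getIndexedRecord: empty => delete, sentinel => skip,
     * otherwise a real record value.
     */
    static Optional<Object> getIndexedRecord(String payload) {
        if (payload == null) {
            return Optional.empty();           // no value at all: treat as a delete
        }
        if (payload.isEmpty()) {
            return Optional.of(IGNORE_RECORD); // resolved, but should produce no output
        }
        return Optional.of(payload);
    }

    public static void main(String[] args) {
        List<Object> recordList = new ArrayList<>();
        List<String> keysToDelete = new ArrayList<>();

        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("k1", "v1");   // normal upsert
        incoming.put("k2", "");     // resolves to IGNORE_RECORD: skipped entirely
        incoming.put("k3", null);   // resolves to empty: delete

        for (Map.Entry<String, String> e : incoming.entrySet()) {
            Optional<Object> indexedRecord = getIndexedRecord(e.getValue());
            if (indexedRecord.isPresent()) {
                // Skip the ignore record, as in the PR's writeToBuffer change.
                if (!indexedRecord.get().equals(IGNORE_RECORD)) {
                    recordList.add(indexedRecord.get());
                }
            } else {
                keysToDelete.add(e.getKey());
            }
        }

        System.out.println(recordList);   // prints [v1]
        System.out.println(keysToDelete); // prints [k3]
    }
}
```

The third outcome is the point of the sentinel: without it, a payload can only mean "write this value" or "delete this key", whereas MERGE INTO needs a way to say "touch nothing for this record".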
So switching to `config.getWriteSchema()` here would affect Hive sync? We use the schema from the commit file to sync to Hive. So if the write schema is a subset of the table schema, then we can have an issue here. Did you run into any issues like that? I think we could actually write both into the commit metadata?
Hi @vinothchandar, as I described below, the `writeSchema` is the same as the table schema, so there is no negative effect on the Hive sync. I have run this case in our production environment and it works well.
Below you mentioned that `writeSchema` will not be the same as `inputSchema`, right? I think `inputSchema` is what will be equal to the table schema, no?
Yes, the `writeSchema` may not equal the `inputSchema` for MERGE INTO.

The `inputSchema` is the schema of the incoming records (it comes from `hoodie.avro.schema`); we use it to parse the bytes into the Avro record in the HoodiePayload.

The `writeSchema` is always equal to the table schema; we use the `writeSchema` to write the record to the table. The `writeSchema` comes from `hoodie.write.schema` if we set that property; otherwise we get it from `hoodie.avro.schema`.

So here, we pass the `writeSchema` to the `HoodieCommitMetadata`.
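The fallback rule described above (use `hoodie.write.schema` when set, otherwise fall back to `hoodie.avro.schema`) can be sketched as follows. `getWriteSchema` here is a hypothetical helper, not Hudi's actual API; only the two property names are taken from the discussion above:

```java
import java.util.Properties;

public class WriteSchemaResolution {
    // Property keys as described in the discussion above.
    static final String AVRO_SCHEMA = "hoodie.avro.schema";
    static final String WRITE_SCHEMA = "hoodie.write.schema";

    /**
     * Hypothetical helper: returns the schema string used for writing.
     * Falls back to the input (avro) schema when no explicit write
     * schema is configured.
     */
    static String getWriteSchema(Properties props) {
        return props.getProperty(WRITE_SCHEMA, props.getProperty(AVRO_SCHEMA));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(AVRO_SCHEMA, "{\"type\":\"record\",\"name\":\"input\"}");
        // No explicit write schema: falls back to the input schema.
        System.out.println(getWriteSchema(props));

        // MERGE INTO case: an explicit write schema (the table schema)
        // differs from the input schema and takes precedence.
        props.setProperty(WRITE_SCHEMA, "{\"type\":\"record\",\"name\":\"table\"}");
        System.out.println(getWriteSchema(props));
    }
}
```

This is why the reply argues Hive sync is unaffected: whatever reaches the commit metadata through this path is the table schema, either explicitly via `hoodie.write.schema` or implicitly when the input schema already matches the table.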