[MINOR] Fix npe for get internal schema #9984

Merged
merged 1 commit into from
Nov 14, 2023

Conversation

watermelon12138
Contributor

@watermelon12138 watermelon12138 commented Nov 4, 2023

Change Logs

Related issue: #9902
Getting the internal schema may hit an NPE when parsing avroSchema. To avoid this, return InternalSchema.getEmptyInternalSchema() when avroSchema is null or empty.
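
The guard described above can be sketched as follows. This is a minimal illustration only, not the Hudi implementation: `InternalSchemaGuardSketch`, `SketchSchema`, `parse`, and `resolve` are hypothetical stand-ins for `InternalSchema`, the Avro parsing step, and the patched return expression, so that the example compiles without Hudi or Avro on the classpath.

```java
import java.util.Objects;

// Hypothetical stand-ins: none of these types are Hudi or Avro classes.
final class InternalSchemaGuardSketch {

    /** Minimal stand-in for an internal schema: it only carries its definition string. */
    static final class SketchSchema {
        static final SketchSchema EMPTY = new SketchSchema("");
        final String definition;

        SketchSchema(String definition) {
            this.definition = Objects.requireNonNull(definition);
        }

        boolean isEmptySchema() {
            return definition.isEmpty();
        }
    }

    /** Stand-in for the parsing step: it fails on null or empty input. */
    static SketchSchema parse(String avroSchema) {
        if (avroSchema == null || avroSchema.isEmpty()) {
            throw new IllegalArgumentException("Cannot parse schema");
        }
        return new SketchSchema(avroSchema);
    }

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    /**
     * Mirrors the shape of the patched expression: prefer the schema read from
     * the file; otherwise parse the Avro schema string only when it is usable,
     * and fall back to the empty schema when it is null or empty.
     */
    static SketchSchema resolve(SketchSchema fileSchema, String avroSchema) {
        if (!fileSchema.isEmptySchema()) {
            return fileSchema;
        }
        if (isNullOrEmpty(avroSchema)) {
            return SketchSchema.EMPTY;
        }
        return parse(avroSchema);
    }
}
```

With this shape, a caller that passes a null or empty schema string gets the empty schema back instead of an exception from the parser, and a non-empty file schema is returned unchanged.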

Impact

none

Risk level (write none, low, medium or high below)

none

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@watermelon12138
Contributor Author

@xiarixiaoyao

@danny0405 danny0405 self-assigned this Nov 4, 2023
@danny0405 danny0405 added the writer-core Issues relating to core transactions/write actions label Nov 4, 2023
@watermelon12138
Contributor Author

@hudi-bot run azure

? (StringUtils.isNullOrEmpty(avroSchema)
? InternalSchema.getEmptyInternalSchema()
: AvroInternalSchemaConverter.convert(HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(avroSchema))))
: fileSchema;
Contributor

Thanks for your fix.
Why is the avro schema null here?

Contributor Author

You can see it in the exception stack trace for this problem:

Caused by: org.apache.avro.SchemaParseException: Cannot parse schema
at org.apache.avro.Schema.parse(Schema.java:1633)
at org.apache.avro.Schema$Parser.parse(Schema.java:1430)
at org.apache.avro.Schema$Parser.parse(Schema.java:1418)
at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:220)
at org.apache.hudi.common.util.InternalSchemaCache.getInternalSchemaByVersionId(InternalSchemaCache.java:226)
at org.apache.hudi.table.action.commit.HoodieMergeHelper.composeSchemaEvolutionTransformer(HoodieMergeHelper.java:177)
at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:94)
at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:252)
at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:235)
at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64)
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:237)
at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$988df80a$1(HoodieCompactor.java:132)
at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:133)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Contributor Author

@xiarixiaoyao Hi, this problem is difficult to reproduce. I think we can guard against it in the code.

@watermelon12138 watermelon12138 force-pushed the FixNpeForGetInternalSchema branch 2 times, most recently from 63a77e3 to 23eb3d5 Compare November 7, 2023 13:17
- return fileSchema.isEmptySchema() ? AvroInternalSchemaConverter.convert(HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(avroSchema))) : fileSchema;
+ return fileSchema.isEmptySchema()
+     ? StringUtils.isNullOrEmpty(avroSchema)
+         ? InternalSchema.getEmptyInternalSchema()
Contributor

Is it because of a version upgrade or something? Is the null avro schema coming from a table written by an older Hudi version?

Contributor Author

@danny0405 Yes, some users hit this problem in the upgrade scenario (0.12.3 -> 0.14).

@watermelon12138
Contributor Author

@hudi-bot run azure

@hudi-bot

hudi-bot commented Nov 8, 2023

CI report:

@watermelon12138
Contributor Author

@hudi-bot run azure

@danny0405 danny0405 merged commit 00ece7b into apache:master Nov 14, 2023
27 of 28 checks passed
@zyclove

zyclove commented Nov 14, 2023

@danny0405 👍🏻
By the way, will there be a 0.14.1 release, or will 1.0.0 be released directly? Can you update the official roadmap?

@danny0405
Contributor

@zyclove 1.0.0-beta is already in the release process.
