
Hoodie Upsert fails with InvalidParquetMetadata because of a partially written parquet file #31

Closed
prazanna opened this issue Jan 6, 2017 · 0 comments · Fixed by #32
prazanna commented Jan 6, 2017

I believe we have figured out the root cause.

Step 1:
The executor starts writing 11337c2a-9acd-4c07-aa0e-de5e78e9f951_393_20170104223220.parquet, but fails partway through (reason unknown):

17/01/04 22:51:39 INFO executor.Executor: Running task 393.0 in stage 34.0 (TID 23337)
17/01/04 22:51:39 INFO io.HoodieUpdateHandle: Merging new data into oldPath /app/hoodie/schemaless/trifle-client_bills-tcb005/2016/11/25/11337c2a-9acd-4c07-aa0e-de5e78e9f951_1232_20170104210747.parquet, as newPath /app/hoodie/schemaless/trifle-client_bills-tcb005/2016/11/25/11337c2a-9acd-4c07-aa0e-de5e78e9f951_393_20170104223220.parquet
17/01/04 22:52:16 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
17/01/04 22:52:16 INFO storage.DiskBlockManager: Shutdown hook called

Step 2:
The driver notices the lost executor:

1253563 [sparkDriver-akka.actor.default-dispatcher-37] ERROR org.apache.spark.scheduler.cluster.YarnClusterScheduler - Lost executor 466 on hadoopworker674-sjc1.prod.uber.internal: remote Rpc client disassociated
1253564 [sparkDriver-akka.actor.default-dispatcher-37] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 393.0 in stage 34.0 (TID 23337, hadoopworker674-sjc1.prod.uber.internal): ExecutorLostFailure (executor 466 lost)

and schedules a retry of the task:

1253582 [sparkDriver-akka.actor.default-dispatcher-37] INFO  org.apache.spark.scheduler.TaskSetManager  - Starting task 393.1 in stage 34.0 (TID 23543, hadoopworker523-sjc1.prod.uber.internal, PROCESS_LOCAL, 1901 bytes)

Step 3:
The retry fails, perhaps because the datanode is down or was restarted:

17/01/04 22:56:48 ERROR io.HoodieUpdateHandle: Error in update task at commit 20170104223220
org.apache.spark.shuffle.FetchFailedException: Failed to connect to hadoopworker674-sjc1.prod.uber.internal/10.11.37.11:7337
...
	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
	at com.uber.hoodie.io.HoodieUpdateHandle.init(HoodieUpdateHandle.java:57)
	at com.uber.hoodie.io.HoodieUpdateHandle.<init>(HoodieUpdateHandle.java:48)
	at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:375)
	at com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:425)
	at com.uber.hoodie.HoodieWriteClient$5.call(HoodieWriteClient.java:181)
	at com.uber.hoodie.HoodieWriteClient$5.call(HoodieWriteClient.java:177)

Because of this error, HoodieUpdateHandle is not initialized properly:
https://code.uberinternal.com/diffusion/DAHOOD/browse/master/hoodie-client/src/main/java/com/uber/hoodie/io/HoodieUpdateHandle.java;1006e4b0b115ddfe246fe3f43024a9213a003637$93

The error is effectively swallowed and the task is still reported as successful (a sketch of this pattern follows the log below):

17/01/04 22:56:48 ERROR table.HoodieCopyOnWriteTable: Error in finding the old file path at commit 20170104223220
17/01/04 22:56:48 INFO table.HoodieCopyOnWriteTable: Upsert Handle has partition path as null null, WriteStatus {fileId=null, globalError='org.apache.spark.shuffle.FetchFailedException: Failed to connect to hadoopworker674-sjc1.prod.uber.internal/10.11.37.11:7337', hasErrors='false', errorCount='0', errorPct='NaN'}
17/01/04 22:56:48 INFO storage.MemoryStore: ensureFreeSpace(3142) called with curMem=2185001, maxMem=1111369973
17/01/04 22:56:48 INFO storage.MemoryStore: Block rdd_53_393 stored as bytes in memory (estimated size 3.1 KB, free 1057.8 MB)
17/01/04 22:56:48 INFO executor.Executor: Finished task 393.1 in stage 34.0 (TID 23543). 1807 bytes result sent to driver
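To make the control flow concrete, here is a minimal sketch of the swallow-and-continue pattern described above. The class and field names (WriteStatusSketch, UpdateHandleSketch) are hypothetical stand-ins, not the actual Hoodie classes; the point is only that init() catches the exception, records it on the write status, and leaves fileId/partitionPath null, so neither the caller nor Spark ever sees a failure.

import java.util.Iterator;

// Hypothetical stand-in for WriteStatus: every field stays null when init() fails,
// except globalError, which ends up being the only trace of the problem.
class WriteStatusSketch {
    String fileId;
    String partitionPath;
    String globalError;
}

// Hypothetical stand-in for HoodieUpdateHandle showing the swallow-and-continue pattern.
class UpdateHandleSketch {
    final WriteStatusSketch writeStatus = new WriteStatusSketch();

    void init(Iterator<String> recordsToMerge) {
        try {
            // Iterating the shuffled records is what throws FetchFailedException in Step 3.
            while (recordsToMerge.hasNext()) {
                recordsToMerge.next();
            }
            writeStatus.partitionPath = "2016/11/25";
            writeStatus.fileId = "11337c2a-9acd-4c07-aa0e-de5e78e9f951";
        } catch (Exception e) {
            // The exception is recorded but not rethrown, so the task carries on and is
            // eventually reported to the driver as successful.
            writeStatus.globalError = e.toString();
        }
    }
}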

Step 4:
When the partition path is null on the write stat, it is omitted from the commit JSON (a sketch of this follows the link below):
https://code.uberinternal.com/diffusion/DAHOOD/browse/master/hoodie-common/src/main/java/com/uber/hoodie/common/model/HoodieCommitMetadata.java;1006e4b0b115ddfe246fe3f43024a9213a003637$55
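A rough illustration of the omission, again using hypothetical names rather than the real HoodieCommitMetadata code: write stats are keyed by partition path when the commit is built, and a stat whose partition path is null is simply skipped, so the file never appears in the commit JSON.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HoodieCommitMetadata: stats with a null partition path are
// dropped, so the partially written file from Step 1 is never tracked (or cleaned up).
class CommitMetadataSketch {
    private final Map<String, List<String>> partitionToFileIds = new HashMap<>();

    void addWriteStat(String partitionPath, String fileId) {
        if (partitionPath == null) {
            return; // omitted from the commit JSON
        }
        partitionToFileIds.computeIfAbsent(partitionPath, p -> new ArrayList<>()).add(fileId);
    }
}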

Hence we do not see that fileId in the commit metadata, HoodieUpdateHandle does not clean up the partially written file from Step 1, and subsequent bloom filter index checks fail on that file with InvalidParquetMetadata.

Actual problem:
The task succeeds even when HoodieUpdateHandle fails in init(); if init() fails, it should surface an UpsertException instead (see the sketch below).
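One possible shape of the fix, sketched against the hypothetical classes above rather than the real code: have init() rethrow the failure wrapped in an upsert exception, so Spark marks the task attempt as failed and retries it, instead of letting it report success with a half-initialized handle.

// Hypothetical exception type; the real class name/package in the codebase may differ.
class UpsertExceptionSketch extends RuntimeException {
    UpsertExceptionSketch(String message, Throwable cause) {
        super(message, cause);
    }
}

// Same init() as in the sketch above, but failing fast instead of swallowing the error.
class FailFastUpdateHandleSketch {
    final WriteStatusSketch writeStatus = new WriteStatusSketch();

    void init(java.util.Iterator<String> recordsToMerge) {
        try {
            while (recordsToMerge.hasNext()) {
                recordsToMerge.next();
            }
        } catch (Exception e) {
            writeStatus.globalError = e.toString();
            // Surface the failure so the task attempt is marked failed and retried.
            throw new UpsertExceptionSketch("Error initializing update handle at commit 20170104223220", e);
        }
    }
}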

@prazanna prazanna added this to the 0.2.5 milestone Jan 6, 2017
@prazanna prazanna self-assigned this Jan 6, 2017
@prazanna prazanna added the bug label Jan 6, 2017