
[SUPPORT] Exception on snapshot query on MOR table (hudi 0.6.0) #2285

Closed
zherenyu831 opened this issue Nov 26, 2020 · 18 comments
Labels
priority:minor everything else; usability gaps; questions; feature reqs

Comments

@zherenyu831
Contributor

zherenyu831 commented Nov 26, 2020


To Reproduce

Steps to reproduce the behavior:

  1. Have a table with 100GB of data that is under compaction
  2. Kill the Spark job
  3. Try to read the data with a snapshot query:
val df = spark.read.format("org.apache.hudi")
  .option("hoodie.datasource.query.type", "snapshot")
  .load("s3://path_to_data/*")


Environment Description

  • Hudi version : 0.6.0

  • Spark version : 2.4.4

  • Hive version : not using

  • Hadoop version : 3.2.1

  • Storage (HDFS/S3/GCS..) : s3

  • Running on Docker? (yes/no) : no


Stacktrace

Exception: Task failed while writing rows.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:257)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:177)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 4191
	at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainDoubleDictionary.decodeToDouble(PlainValuesDictionary.java:208)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToDouble(ParquetDictionary.java:46)
	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getDouble(OnHeapColumnVector.java:460)
	at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getDouble(MutableColumnarRow.java:126)
	at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.get(MutableColumnarRow.java:178)
	at org.apache.hudi.HoodieMergeOnReadRDD$$anon$2.$anonfun$createRowWithRequiredSchema$1(HoodieMergeOnReadRDD.scala:239)
	at org.apache.hudi.HoodieMergeOnReadRDD$$anon$2.$anonfun$createRowWithRequiredSchema$1$adapted(HoodieMergeOnReadRDD.scala:237)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
	at org.apache.hudi.HoodieMergeOnReadRDD$$anon$2.createRowWithRequiredSchema(HoodieMergeOnReadRDD.scala:237)
	at org.apache.hudi.HoodieMergeOnReadRDD$$anon$2.hasNext(HoodieMergeOnReadRDD.scala:197)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:636)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:244)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:242)
	... 9 more
@zherenyu831
Contributor Author

(Screenshots attached: 2020-11-27 11:54:34, 11:54:45, and 11:58:13.)

@zherenyu831
Contributor Author

but the doubleDictionaryContent array only has 3000 elements, which is what caused the problem

@bvaradar
Contributor

bvaradar commented Dec 1, 2020

Wondering if the parquet version has anything to do with this. Can you check whether the hadoop installation has the 1.10.1 parquet bundles?
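
One quick way to check which parquet-column jar the driver actually loaded is to print the code source of a parquet class that appears in the stacktrace; a rough sketch, run from spark-shell:

// Prints the jar that PlainValuesDictionary (seen in the stacktrace above) was loaded from,
// which tells you the parquet-column version on the runtime classpath.
println(
  classOf[org.apache.parquet.column.values.dictionary.PlainValuesDictionary]
    .getProtectionDomain.getCodeSource.getLocation)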

@zherenyu831
Contributor Author

@bvaradar
here are all the parquet jars on my EMR cluster:

/usr/lib/spark/jars/parquet-column-1.10.1-spark-amzn-1.jar
/usr/lib/spark/jars/parquet-common-1.10.1-spark-amzn-1.jar
/usr/lib/spark/jars/parquet-encoding-1.10.1-spark-amzn-1.jar
/usr/lib/spark/jars/parquet-format-2.4.0.jar
/usr/lib/spark/jars/parquet-hadoop-1.10.1-spark-amzn-1.jar
/usr/lib/spark/jars/parquet-hadoop-bundle-1.6.0.jar
/usr/lib/spark/jars/parquet-jackson-1.10.1-spark-amzn-1.jar

we also checked whether there is any difference between these Amazon-built jars and the official ones;
they seem fine...

FYI: by using a read-optimized query, we can read all the values
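
(For reference, the read-optimized query looks roughly like the sketch below; read_optimized is the Hudi 0.6.0 datasource option value, and the path is the same placeholder as in the reproduction steps.)

// Read-optimized query sketch: reads only the compacted base parquet files
// and skips merging the MOR log files, so it avoids the HoodieMergeOnReadRDD path above.
val roDf = spark.read.format("org.apache.hudi")
  .option("hoodie.datasource.query.type", "read_optimized")
  .load("s3://path_to_data/*")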

@bvaradar
Contributor

bvaradar commented Dec 7, 2020

cc @umehrot2 : Wondering why there is parquet-hadoop-bundle-1.6.0.jar along with parquet-hadoop-1.10.1-spark-amzn-1.jar. Wouldn't they conflict?

@zherenyu831
Contributor Author

@bvaradar
I deleted parquet-hadoop-bundle-1.6.0.jar and tried again, but the error still happens.
Then I replaced all the parquet libs with the official ones, but that didn't work either.

@bvaradar
Contributor

bvaradar commented Dec 9, 2020

@n3nash : Can you look at this ?

@zherenyu831 : As the integration tests for compaction are passing, I suspect this still has to do with a parquet version mismatch. Would it be possible to replicate this using the docker setup: https://hudi.apache.org/docs/docker_demo.html ?

@adaniline-paytm

I have the same sporadic issue, using the standard Spark 2.4.7 distribution and Hudi 0.6:

$ ls -l /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-*
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-column-1.10.1.jar
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-common-1.10.1.jar
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-encoding-1.10.1.jar
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-format-2.4.0.jar
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-hadoop-1.10.1.jar
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-jackson-1.10.1.jar

the only workaround we found is to disable the VectorizedReader:

      rc.spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
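
For completeness, a minimal sketch of the same workaround on a plain SparkSession, followed by the snapshot read from the reproduction steps (the S3 path is the same placeholder as above):

// Disable Spark's vectorized parquet reader for this session, then re-run the snapshot query.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

val df = spark.read.format("org.apache.hudi")
  .option("hoodie.datasource.query.type", "snapshot")
  .load("s3://path_to_data/*")

The same setting can also be passed at submit time with --conf spark.sql.parquet.enableVectorizedReader=false.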

@nsivabalan
Contributor

nsivabalan commented Jan 24, 2021

@zherenyu831 : can you please respond with any updates on your end?
@n3nash : can you please take a look when you have time? If you were able to narrow down the issue, please file a jira and add the "user-support-issues" label.

@vinothchandar
Member

cc @garyli1019 as well

@zherenyu831
Contributor Author

@bvaradar
Hi @bvaradar, it will be a little difficult to replicate the problem, since it only happens on huge amounts of data.

@nsivabalan
Contributor

@n3nash : would you be able to help here?

@nsivabalan
Contributor

@zherenyu831 : a few quick questions as we triage the issue.

  • Were you running an older version of Hudi and encountered this while trying to upgrade to the latest version?
  • Is this affecting your production? Trying to gauge the severity.
  • Or are you trying out a POC, and this is your first time trying out Hudi?

@nsivabalan added the priority:major (degraded perf; unable to move forward; potential bugs) and priority:critical (production down; pipelines stalled; need help asap) labels and removed the priority:major label on Feb 6, 2021
@zherenyu831
Contributor Author

zherenyu831 commented Feb 9, 2021

@nsivabalan

Were you running an older version of Hudi and encountered this while trying to upgrade to the latest version?

We faced the problem using Hudi 0.6.0; we haven't tried with Hudi 0.7.0.

Is this affecting your production? Trying to gauge the severity.
Or are you trying out a POC, and this is your first time trying out Hudi?

Not really, because it only happens when reading a huge table while it is being compacted.
Compaction doesn't happen all the time, so a retry usually works for us.
We have been using Hudi for about one year; before that we used the read-optimized query (since it was the only query type supported by the Spark datasource for MOR tables before 0.6.0).

@nsivabalan
Contributor

@vinothchandar @n3nash @bvaradar : One of the customers mentioned that disabling the vectorized reader fixed the issue for them. Hopefully that is an acceptable workaround? And do we need to make a note of this in the FAQ or somewhere?

@vinothchandar assigned vinothchandar and unassigned n3nash on Feb 9, 2021
@vinothchandar added the priority:major (degraded perf; unable to move forward; potential bugs) label and removed the priority:critical (production down; pipelines stalled; need help asap) label on Mar 1, 2021
@vinothchandar
Member

I see a lot of general Spark issues reported like this. Making this sev:high for now, as we figure out more.

@vinothchandar changed the title from "[SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)" to "[SUPPORT] Exception on snapshot query on MOR table (hudi 0.6.0)" on Jun 5, 2021
@nsivabalan added the priority:minor (everything else; usability gaps; questions; feature reqs) label and removed the priority:major (degraded perf; unable to move forward; potential bugs) label on Aug 31, 2021
@nsivabalan
Contributor

@zherenyu831 : We made some fixes to the spillable map that's used in the compaction path, which should help with large datasets. Can you give it a try? Else, feel free to close this out if it's not an issue anymore. Thanks!

@vinothchandar
Member

Closing since the fix has landed.
