@iemejia
Member

@iemejia iemejia commented Jul 31, 2022

What changes were proposed in this pull request?

Update the Avro version to 1.11.1

Why are the changes needed?

To stay up to date with upstream

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

@HyukjinKwon
Member

The test failure looks related:

AvroV1Suite.support user provided avro schema for writing non-nullable enum type
java.lang.AssertionError: assertion failed: Exception tree doesn't contain the expected exception of type org.apache.avro.AvroTypeException with message: Not an enum: null
org.apache.avro.AvroTypeException: value null is not a SuitEnumType
	at org.apache.avro.generic.GenericDatumWriter.writeEnum(GenericDatumWriter.java:269)
	at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:66)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:148)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:158)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:245)
	at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:117)
	at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:184)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:234)
	at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:92)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:158)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314)
	at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.write(SparkAvroKeyOutputFormat.java:86)
	at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.write(SparkAvroKeyOutputFormat.java:63)
	at org.apache.spark.sql.avro.AvroOutputWriter.write(AvroOutputWriter.scala:86)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:323)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1524)
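
The failure above is a message mismatch: the test looks through the exception tree for an `AvroTypeException` containing `Not an enum: null`, while the exception actually thrown carries `value null is not a SuitEnumType`. A minimal sketch of that kind of cause-chain check follows, in plain Java. The names `CauseChain` and `containsCause` are hypothetical and this is not Spark's `assertExceptionMsg`; `IllegalStateException` stands in for `AvroTypeException` so the sketch runs without Avro on the classpath, and the outer exception message is made up for illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not Spark's assertExceptionMsg: it only illustrates
// searching a cause chain for an exception type plus a message fragment.
final class CauseChain {
    private CauseChain() {}

    static boolean containsCause(Throwable root,
                                 Class<? extends Throwable> type,
                                 String fragment) {
        // Track visited throwables so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = root; t != null && seen.add(t); t = t.getCause()) {
            String message = t.getMessage();
            if (type.isInstance(t) && message != null && message.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // IllegalStateException is a stand-in for AvroTypeException (assumption);
        // the outer message is illustrative only.
        Throwable failure = new RuntimeException("write task failed",
            new IllegalStateException("value null is not a SuitEnumType"));

        // Message the test expected: not found in the chain.
        System.out.println(containsCause(failure, IllegalStateException.class,
            "Not an enum: null"));                  // prints "false"
        // Message from the stack trace above: found in the chain.
        System.out.println(containsCause(failure, IllegalStateException.class,
            "value null is not a SuitEnumType"));   // prints "true"
    }
}
```

Because the check matches on message text, any library upgrade that rewords an error message can break it even when the exception type and behavior are unchanged.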

@xkrogen
Contributor

xkrogen commented Aug 1, 2022

Is this an official release? The documentation links you updated (such as this one) give a 404 error and I don't see Avro 1.11.1 listed on the Avro Releases page.

Member

@dongjoon-hyun dongjoon-hyun left a comment

@AmplabJenkins

Can one of the admins verify this patch?

@dongjoon-hyun
Member

Gentle ping, @iemejia .

@martin-g
Member

martin-g commented Aug 4, 2022

> I don't see Avro 1.11.1 listed on the Avro Releases page.

Also, we are working on replacing the Avro website with a new one. It will probably be done later this week!

@iemejia iemejia force-pushed the SPARK-39927-avro-1.11.1 branch from 6da9901 to e919d1b Compare August 4, 2022 10:07
@LuciferYang
Contributor

It seems this test is still failing: support user provided non-nullable avro schema for nullable catalyst schema without any null record *** FAILED *** (181 milliseconds)

@dongjoon-hyun
Member

Yes, it does. @LuciferYang .

To @iemejia , as I mentioned here, please take a look at the test case. Otherwise, Avro 1.11.1 cannot pass the CIs.

@iemejia iemejia force-pushed the SPARK-39927-avro-1.11.1 branch from e919d1b to 48ba166 Compare August 11, 2022 08:05
@iemejia iemejia force-pushed the SPARK-39927-avro-1.11.1 branch from 48ba166 to 5b8783a Compare August 11, 2022 21:32
 .option("avroSchema", avroSchema).save(s"$tempDir/${UUID.randomUUID()}")
 }
-assertExceptionMsg[AvroTypeException](e1, "Not an enum: null")
+assertExceptionMsg[AvroTypeException](e1, "value null is not a SuitEnumType")
Member

Thank you for the fix.

 assert(message.contains("Caused by: java.lang.NullPointerException: "))
-assert(message.contains(
-"null of string in string in field Name of test_schema in test_schema"))
+assert(message.contains("null in string in field Name"))
Member

ditto

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. I verified Avro module.

$ build/sbt "avro/test"
...
[info] Test run org.apache.spark.sql.avro.JavaAvroFunctionsSuite started
[info] Test org.apache.spark.sql.avro.JavaAvroFunctionsSuite.testToAvroFromAvro started
[info] Test run org.apache.spark.sql.avro.JavaAvroFunctionsSuite finished: 0 failed, 0 ignored, 1 total, 0.633s
[info] ScalaTest
[info] Run completed in 2 minutes, 42 seconds.
[info] Total number of tests run: 288
[info] Suites: completed 13, aborted 0
[info] Tests: succeeded 288, failed 0, canceled 0, ignored 2, pending 0
[info] All tests passed.
[info] Passed: Total 289, Failed 0, Errors 0, Passed 289, Ignored 2
[success] Total time: 530 s (08:50), completed Aug 11, 2022 3:04:35 PM

@dongjoon-hyun
Member

Merged to master for Apache Spark 3.4.0.

@dongjoon-hyun
Member

Thank you, @iemejia , @HyukjinKwon , @xkrogen , @martin-g , @LuciferYang .

@iemejia iemejia deleted the SPARK-39927-avro-1.11.1 branch August 11, 2022 22:17
@iemejia
Member Author

iemejia commented Aug 11, 2022

Thanks as usual @dongjoon-hyun !

leejaywei pushed a commit to Kyligence/spark that referenced this pull request Mar 9, 2023
### What changes were proposed in this pull request?
Update the Avro version to 1.11.1

### Why are the changes needed?
To stay up to date with upstream

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests

Closes apache#37352 from iemejia/SPARK-39927-avro-1.11.1.

Authored-by: Ismaël Mejía <iemejia@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

# Conflicts:
#	dev/deps/spark-deps-hadoop-2.7-hive-2.3
#	dev/deps/spark-deps-hadoop-3-hive-2.3
#	docs/sql-data-sources-avro.md
#	external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala
#	external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
#	pom.xml
#	project/SparkBuild.scala
#	sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala
leejaywei added a commit to Kyligence/spark that referenced this pull request Mar 10, 2023
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Mar 16, 2023
leejaywei added a commit to Kyligence/spark that referenced this pull request Mar 20, 2023
leejaywei added a commit to Kyligence/spark that referenced this pull request Mar 24, 2023
leejaywei added a commit to Kyligence/spark that referenced this pull request Mar 24, 2023
hellozepp pushed a commit to hellozepp/spark that referenced this pull request Aug 10, 2023
…herrypick
zheniantoushipashi pushed a commit to Kyligence/spark that referenced this pull request Aug 21, 2023
RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023
7 participants