
[SUPPORT] Support Apache Spark 3.2 #4202

Closed
melin opened this issue Dec 3, 2021 · 6 comments
Labels: feature-enquiry, spark

Comments


melin commented Dec 3, 2021

@pengzhiwei2018

[INFO] Compiling 67 source files to /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/target/classes at 1638515566715
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:562: error: value literals is not a member of org.apache.spark.sql.execution.datasources.PartitioningUtils.PartitionValues
[ERROR]           partitionValues.map(_.literals.map(_.value))
[ERROR]                                 ^
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:563: error: missing argument list for method fromSeq in object InternalRow
[ERROR] Unapplied methods are only converted to functions when a function type is expected.
[ERROR] You can make this conversion explicit by writing `fromSeq _` or `fromSeq(_)` instead of `fromSeq`.
[ERROR]             .map(InternalRow.fromSeq)
[ERROR]                              ^
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/avro/HoodieAvroDeserializer.scala:28: error: overloaded method constructor AvroDeserializer with alternatives:
[ERROR]   (rootAvroType: org.apache.avro.Schema,rootCatalystType: org.apache.spark.sql.types.DataType,datetimeRebaseMode: String)org.apache.spark.sql.avro.AvroDeserializer <and>
[ERROR]   (rootAvroType: org.apache.avro.Schema,rootCatalystType: org.apache.spark.sql.types.DataType,positionalFieldMatch: Boolean,datetimeRebaseMode: org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy.Value,filters: org.apache.spark.sql.catalyst.StructFilters)org.apache.spark.sql.avro.AvroDeserializer
[ERROR]  cannot be applied to (org.apache.avro.Schema, org.apache.spark.sql.types.DataType)
[ERROR]   extends AvroDeserializer(rootAvroType, rootCatalystType) {
[ERROR]           ^
[WARNING] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala:169: warning: non-variable type argument org.apache.spark.sql.catalyst.expressions.Literal in type pattern Seq[org.apache.spark.sql.catalyst.expressions.Literal] (the underlying of Seq[org.apache.spark.sql.catalyst.expressions.Literal]) is unchecked since it is eliminated by erasure
[WARNING]       case In(attribute: AttributeReference, list: Seq[Literal]) =>
[WARNING]                                                    ^
[WARNING] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala:178: warning: non-variable type argument org.apache.spark.sql.catalyst.expressions.Literal in type pattern Seq[org.apache.spark.sql.catalyst.expressions.Literal] (the underlying of Seq[org.apache.spark.sql.catalyst.expressions.Literal]) is unchecked since it is eliminated by erasure
[WARNING]       case Not(In(attribute: AttributeReference, list: Seq[Literal])) =>
[WARNING]                                                        ^
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala:427: error: wrong number of arguments for pattern org.apache.spark.sql.execution.command.ShowPartitionsCommand(tableName: org.apache.spark.sql.catalyst.TableIdentifier,output: Seq[org.apache.spark.sql.catalyst.expressions.Attribute],spec: Option[org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec])
[ERROR]       case ShowPartitionsCommand(tableName, specOpt)
[ERROR]                                 ^
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/AlterHoodieTableAddColumnsCommand.scala:90: error: overloaded method value checkDataColNames with alternatives:
[ERROR]   (provider: String,schema: org.apache.spark.sql.types.StructType)Unit <and>
[ERROR]   (table: org.apache.spark.sql.catalyst.catalog.CatalogTable,schema: org.apache.spark.sql.types.StructType)Unit
[ERROR]  cannot be applied to (org.apache.spark.sql.catalyst.catalog.CatalogTable, Seq[String])
[ERROR]     DDLUtils.checkDataColNames(table, colsToAdd.map(_.name))
[ERROR]              ^
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/CreateHoodieTableCommand.scala:201: error: not found: value DATASOURCE_SCHEMA_NUMPARTS
[ERROR]     properties.put(DATASOURCE_SCHEMA_NUMPARTS, parts.size.toString)
[ERROR]                    ^
[ERROR] /Users/huaixin/Documents/codes/bigdata/hudi/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:206: error: wrong number of arguments for pattern org.apache.spark.sql.catalyst.expressions.Cast(child: org.apache.spark.sql.catalyst.expressions.Expression,dataType: org.apache.spark.sql.types.DataType,timeZoneId: Option[String],ansiEnabled: Boolean)
[ERROR]       case Cast(attr: AttributeReference, _, _) if sourceColumnName.find(resolver(_, attr.name)).get.equals(targetColumnName) => true
[ERROR]                ^
[WARNING] two warnings found
[ERROR] 7 errors found
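
Most of these errors come from Spark 3.2 changing the shape of catalyst case classes and command extractors that Hudi pattern-matches on: per the log above, Cast gained an ansiEnabled field, ShowPartitionsCommand gained an output parameter, and AvroDeserializer and DDLUtils.checkDataColNames changed signatures. As a rough illustration only (not the actual Hudi fix, and the helper name is made up), a match written against the Spark 3.1 Cast extractor needs one extra wildcard to compile against the 3.2 signature reported above:

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Cast, Expression}

// Illustrative sketch only. On Spark 3.1 the extractor was
// Cast(child, dataType, timeZoneId), so `case Cast(attr: AttributeReference, _, _)`
// compiled. On Spark 3.2 the pattern is Cast(child, dataType, timeZoneId, ansiEnabled)
// and needs a fourth wildcard, as the "wrong number of arguments" error indicates.
def castedAttributeName(expr: Expression): Option[String] = expr match {
  case Cast(attr: AttributeReference, _, _, _) => Some(attr.name)
  case attr: AttributeReference                => Some(attr.name)
  case _                                       => None
}
```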


xushiyan commented Dec 4, 2021

@melin Understood the need for 3.2 support. We have been tracking this in https://issues.apache.org/jira/browse/HUDI-2811 and will be prioritizing it in the next release.
cc @YannByron

xushiyan closed this as completed on Dec 4, 2021.
xushiyan added the jira-filed, priority:critical, spark, and feature-enquiry labels and removed the priority:critical label on Dec 4, 2021.

maddy2u commented Dec 26, 2021

Facing the same issue. When will this be fixed? Is it part of the next minor release, 0.10.1?

xushiyan (Member) commented:

> Facing the same issue. When will this be fixed? Is it part of the next minor release, 0.10.1?

@maddy2u Please let me clarify: Spark 3.2 support is a feature to be added, not an issue.
And no, 0.10.1 will be a bug-fix release; no new features should be added there. This is expected in the 0.11.0 major release.


maddy2u commented Dec 27, 2021

If I try to compile the master branch, the build fails on Spark 3.2. I have to revert to the earlier release (0.10.0) to get it to compile. Not sure if it is something I am doing wrong with the configuration.

xushiyan (Member) commented:

> If I try to compile the master branch, the build fails on Spark 3.2. I have to revert to the earlier release (0.10.0) to get it to compile. Not sure if it is something I am doing wrong with the configuration.

It can be compiled now. Spark 3.2 support was added in #4270.
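
For anyone verifying the upgrade, a minimal read smoke test against a Spark 3.2 build is just the standard Hudi DataSource usage; the table path below is only an example and the app name is made up, so adjust both to your environment:

```scala
import org.apache.spark.sql.SparkSession

// Minimal smoke test, assuming a Hudi table already exists at the example path below.
val spark = SparkSession.builder()
  .appName("hudi-spark-3.2-smoke-test")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

// Read the table back through the Hudi data source and print a few rows.
val df = spark.read.format("hudi").load("/tmp/hudi_trips_cow")
df.show(10, truncate = false)
```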


maddy2u commented Mar 11, 2022

It works. Thanks.
