[HUDI-4452] Include hudi-aws to hudi-spark-bundle to fix cloudwatch reporter issue#6183

Closed
rahil-c wants to merge 1 commit into apache:master from rahil-c:rahil-c/cloudwatch-hudi-spark

@rahil-c rahil-c commented Jul 22, 2022

What is the purpose of the pull request

When running hudi-spark-bundle on an EMR cluster, I saw the following error after enabling the CloudWatch (CW) metrics reporter.

Enabled configs

hoodie.metrics.on = true
hoodie.metrics.reporter.type = CLOUDWATCH
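
As a minimal sketch, the two settings above could be collected into an options map before being handed to the writer (for example through Spark's `DataFrameWriter.options(Map)`). The class and method names here are hypothetical and not part of Hudi; only the two keys and values come from this report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Hudi class: gathers the metrics settings from this report.
class HudiCloudWatchOptions {

    // Returns the two options that were enabled when the error was observed.
    static Map<String, String> metricsOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.metrics.on", "true");
        opts.put("hoodie.metrics.reporter.type", "CLOUDWATCH");
        return opts;
    }

    public static void main(String[] args) {
        // Print the options in key=value form, matching the list above.
        metricsOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```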

error

2/07/22 06:47:58 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
java.lang.NoClassDefFoundError: org/apache/hudi/aws/cloudwatch/CloudWatchReporter
  at org.apache.hudi.metrics.cloudwatch.CloudWatchMetricsReporter.createCloudWatchReporter(CloudWatchMetricsReporter.java:57)
  at org.apache.hudi.metrics.cloudwatch.CloudWatchMetricsReporter.<init>(CloudWatchMetricsReporter.java:47)
  at org.apache.hudi.metrics.MetricsReporterFactory.createReporter(MetricsReporterFactory.java:82)
  at org.apache.hudi.metrics.Metrics.<init>(Metrics.java:50)
  at org.apache.hudi.metrics.Metrics.init(Metrics.java:96)
  at org.apache.hudi.metrics.HoodieMetrics.<init>(HoodieMetrics.java:61)
  at org.apache.hudi.client.BaseHoodieWriteClient.<init>(BaseHoodieWriteClient.java:177)
  at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:95)
  at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:79)
  at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:194)
  at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$12(HoodieSparkSqlWriter.scala:302)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:301)
  at 

It seems that we do not include hudi-aws in the hudi-spark-bundle, whereas we do include hudi-aws in the Flink bundle by default: https://github.com/apache/hudi/blob/master/packaging/hudi-flink-bundle/pom.xml#L87
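
One way to confirm this kind of diagnosis is to probe the classpath for the class named in the stack trace. The snippet below is only a sketch: `ClasspathCheck` and `isClassPresent` are hypothetical names, and the only detail taken from the report is the class name `org.apache.hudi.aws.cloudwatch.CloudWatchReporter`. It relies on the standard `Class.forName(String, boolean, ClassLoader)` call and does not initialize the class.

```java
// Hypothetical diagnostic helper, not part of Hudi.
class ClasspathCheck {

    // Returns true if the named class can be loaded by this class's loader.
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or one of its own dependencies could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the NoClassDefFoundError in the stack trace.
        String reporter = "org.apache.hudi.aws.cloudwatch.CloudWatchReporter";
        System.out.println(reporter + " present: " + isClassPresent(reporter));
    }
}
```

Running this with the same classpath as the Spark job would be expected to report whether the hudi-aws classes are visible, assuming the job and the check share a class loader.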


rahil-c commented Jul 22, 2022

cc @umehrot2 @zhedoubushishi

@codope codope self-assigned this Jul 22, 2022
@codope codope added the dependencies (Dependency updates) and priority:blocker (Production down; release blocker) labels Jul 22, 2022

codope commented Jul 22, 2022

@rahil-c Ideally, we need a way to completely decouple the two; as it stands, we would be pulling in this dependency just for one class. Let's ensure that it is in provided scope so that other upstream libraries that use hudi-spark-bundle do not include hudi-aws. On EMR, the hudi-aws jar can be pre-installed on the classpath. wdyt?


codope commented Jul 22, 2022

I am ok with landing this PR if we ensure hudi-aws is in provided scope. You can create a JIRA under HUDI-3529 for the decoupling task which can be tackled later on.

@rahil-c rahil-c changed the title Include hudi-aws to hudi-spark-bundle to fix cloudwatch reporter issue [HUDI-4452] Include hudi-aws to hudi-spark-bundle to fix cloudwatch reporter issue Jul 22, 2022

rahil-c commented Jul 22, 2022

Thanks @codope for taking a look. I think we will update our EMR docs to tell customers to pass the hudi-aws bundle on the Spark classpath instead of adding it to the Spark bundle.

@rahil-c rahil-c closed this Jul 22, 2022

codope commented Jul 23, 2022

I think we will update our EMR docs to tell customers to pass the hudi-aws bundle on the Spark classpath instead of adding it to the Spark bundle.

Sounds good. Thanks @rahil-c
