error occurred during lineage processing for excelPlugin? #652

Closed
jinmu0410 opened this issue Apr 17, 2023 · 9 comments · Fixed by #704

jinmu0410 commented Apr 17, 2023

java.lang.NoSuchFieldException: org.apache.hadoop.hdfs.client.HdfsDataInputStream.file
	at za.co.absa.commons.reflect.ValueExtractor.$anonfun$extract$2(ValueExtractor.scala:39)
	at scala.Option.getOrElse(Option.scala:189)
	at za.co.absa.commons.reflect.ValueExtractor.extract(ValueExtractor.scala:39)
	at za.co.absa.commons.reflect.ReflectionUtils$.extractValue(ReflectionUtils.scala:140)
	at za.co.absa.commons.reflect.ReflectionUtils$.extractFieldValue(ReflectionUtils.scala:116)
	at za.co.absa.commons.reflect.ReflectionUtils$.extractValue(ReflectionUtils.scala:146)
	at za.co.absa.spline.harvester.plugin.embedded.ExcelPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(ExcelPlugin.scala:46)
	at za.co.absa.spline.harvester.plugin.embedded.ExcelPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(ExcelPlugin.scala:41)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.plugin.embedded.ElasticSearchPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(ElasticSearchPlugin.scala:39)
	at za.co.absa.spline.harvester.plugin.embedded.ElasticSearchPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(ElasticSearchPlugin.scala:39)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.plugin.embedded.CobrixPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(CobrixPlugin.scala:34)
	at za.co.absa.spline.harvester.plugin.embedded.CobrixPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(CobrixPlugin.scala:34)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.plugin.embedded.CassandraPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(CassandraPlugin.scala:38)
	at za.co.absa.spline.harvester.plugin.embedded.CassandraPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(CassandraPlugin.scala:38)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.plugin.embedded.BigQueryPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(BigQueryPlugin.scala:50)
	at za.co.absa.spline.harvester.plugin.embedded.BigQueryPlugin$$anonfun$baseRelationProcessor$1.applyOrElse(BigQueryPlugin.scala:50)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.plugin.composite.LogicalRelationPlugin$$anonfun$1.applyOrElse(LogicalRelationPlugin.scala:37)
	at za.co.absa.spline.harvester.plugin.composite.LogicalRelationPlugin$$anonfun$1.applyOrElse(LogicalRelationPlugin.scala:34)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at za.co.absa.spline.harvester.plugin.embedded.SQLPlugin$$anonfun$1.applyOrElse(SQLPlugin.scala:48)
	at za.co.absa.spline.harvester.plugin.embedded.SQLPlugin$$anonfun$1.applyOrElse(SQLPlugin.scala:48)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.plugin.embedded.DataSourceV2Plugin$$anonfun$1.applyOrElse(DataSourceV2Plugin.scala:43)
	at za.co.absa.spline.harvester.plugin.embedded.DataSourceV2Plugin$$anonfun$1.applyOrElse(DataSourceV2Plugin.scala:43)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
	at za.co.absa.spline.harvester.builder.read.PluggableReadCommandExtractor$$anonfun$1.applyOrElse(PluggableReadCommandExtractor.scala:48)
	at za.co.absa.spline.harvester.builder.read.PluggableReadCommandExtractor$$anonfun$1.applyOrElse(PluggableReadCommandExtractor.scala:46)
	at scala.PartialFunction$Lifted.apply(PartialFunction.scala:228)
	at scala.PartialFunction$Lifted.apply(PartialFunction.scala:224)
	at scala.PartialFunction$.condOpt(PartialFunction.scala:292)
	at za.co.absa.spline.harvester.builder.read.PluggableReadCommandExtractor.asReadCommand(PluggableReadCommandExtractor.scala:46)
	at za.co.absa.spline.harvester.LineageHarvester.createOperationBuilder(LineageHarvester.scala:191)
	at za.co.absa.spline.harvester.LineageHarvester.$anonfun$createOperationBuildersRecursively$1(LineageHarvester.scala:167)
	at scala.Option.getOrElse(Option.scala:189)
	at za.co.absa.spline.harvester.LineageHarvester.traverseAndCollect$1(LineageHarvester.scala:167)
	at za.co.absa.spline.harvester.LineageHarvester.createOperationBuildersRecursively(LineageHarvester.scala:186)
	at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$4(LineageHarvester.scala:63)
	at scala.Option.flatMap(Option.scala:271)
	at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:61)
	at za.co.absa.spline.agent.SplineAgent$$anon$1.$anonfun$handle$1(SplineAgent.scala:91)
	at za.co.absa.spline.agent.SplineAgent$$anon$1.withErrorHandling(SplineAgent.scala:100)
	at za.co.absa.spline.agent.SplineAgent$$anon$1.handle(SplineAgent.scala:72)
	at za.co.absa.spline.harvester.listener.QueryExecutionListenerDelegate.onSuccess(QueryExecutionListenerDelegate.scala:28)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1(SplineQueryExecutionListener.scala:41)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1$adapted(SplineQueryExecutionListener.scala:41)
	at scala.Option.foreach(Option.scala:407)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:41)
	at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:165)
	at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:135)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:135)
	at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:147)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1446)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
23/04/17 14:30:57 INFO KafkaProducer: [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.

jinmu0410 (Author) commented

The Excel path looks like hdfs://lake-node1:8020/jinmu/test/test_simple.xlsx

cerveada (Contributor) commented

What versions of Spark and Spline Agent were used?

jinmu0410 (Author) commented

@cerveada 1.0

jinmu0410 (Author) commented

Spark 3.3.1

wajda added the bug label Apr 17, 2023
wajda added this to the 1.1.1 milestone Apr 17, 2023
cerveada self-assigned this Apr 17, 2023

jinmu0410 (Author) commented

A local path like file:///Users/jinmu/Downloads/test_simple.xlsx works, but an hdfs://..... path produces the error.
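
The stack trace shows a reflective lookup of a field named `file` failing on `org.apache.hadoop.hdfs.client.HdfsDataInputStream`, which would explain why the outcome depends on the path scheme: the object being inspected has a different runtime class for HDFS. The sketch below only illustrates that general failure mode. `LocalStream`, `RemoteStream`, and `readField` are hypothetical names invented for the illustration; they are not Hadoop or Spline classes, and this is not the ExcelPlugin's actual code.

```scala
// Hypothetical stand-ins for two stream implementations (NOT Hadoop classes).
class LocalStream(val file: String)  // compiles to a class with a field named "file"
class RemoteStream(val uri: String)  // has no field named "file"

object FieldLookupDemo {
  // Reads a declared field by name via Java reflection.
  // Returns Left with the exception when the runtime class lacks that field.
  def readField(target: AnyRef, name: String): Either[Throwable, AnyRef] =
    try {
      val field = target.getClass.getDeclaredField(name)
      field.setAccessible(true)
      Right(field.get(target))
    } catch {
      case e: NoSuchFieldException => Left(e)
    }

  def main(args: Array[String]): Unit = {
    // Succeeds: the runtime class declares "file".
    println(readField(new LocalStream("/tmp/test_simple.xlsx"), "file"))
    // Fails with NoSuchFieldException: same lookup, different runtime class.
    println(readField(new RemoteStream("hdfs://host/test_simple.xlsx"), "file"))
  }
}
```

Code that extracts values by field name is tied to the concrete class it happens to receive, so it can work for one file system implementation and break for another.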

cerveada (Contributor) commented

That is what I thought. I will try to reproduce the issue and fix it.

jinmu0410 (Author) commented

Thanks.

cerveada added a commit that referenced this issue Apr 21, 2023
cerveada added 5 commits that referenced this issue Jun 16, 2023

cerveada (Contributor) commented

@jinmu0410 I was able to reproduce the issue. Unfortunately, the needed URL is an argument of a lambda expression, and I don't know how to extract it. I would need more time to do that, which I don't have right now.

However, spark-excel also supports Spark's Data Source V2 (DSV2), which should work out of the box. I added some tests and even tested it on HDFS, and it worked fine. So I recommend using DSV2, which should fix the lineage issues as well.

See:
https://github.com/crealytics/spark-excel#excel-api-based-on-datasourcev2
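
A minimal sketch of what switching to DSV2 could look like, assuming Spark and a spark-excel build that includes the V2 implementation are on the classpath. Per the spark-excel README linked above, the V2 reader is selected with the short format name `excel` instead of `com.crealytics.spark.excel`. The object and method names (`ExcelV2ReadSketch`, `readExcelV2`) and the `header` setting are illustrative choices, not part of this issue; the path is the one reported earlier in the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExcelV2ReadSketch {
  // Reads an Excel file through spark-excel's DataSourceV2 implementation.
  // "excel" is the V2 format name; "com.crealytics.spark.excel" is the V1 name.
  def readExcelV2(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("excel")
      .option("header", "true") // assumption: the first row holds column names
      .load(path)

  def main(args: Array[String]): Unit = {
    // Assumes the session is launched with the Spline agent already configured.
    val spark = SparkSession.builder().appName("excel-v2-sketch").getOrCreate()
    val df = readExcelV2(spark, "hdfs://lake-node1:8020/jinmu/test/test_simple.xlsx")
    df.show()
    spark.stop()
  }
}
```

Only the format name changes relative to a V1 read, so existing jobs should need little more than that one-line edit, though reader options may differ between the two implementations.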

wajda modified the milestones: 1.1.1, 1.2.0 Jun 16, 2023

jinmu0410 (Author) commented

OK, thank you. I will try it.
