Excel Relation is not supported #588

Closed
Qurashetufail opened this issue Feb 13, 2020 · 21 comments
@Qurashetufail

Hi Team,

Spline version: 0.4.1

I am facing this issue while accessing data from an .xlsx file. Please find the error below:

java.lang.RuntimeException: Relation is not supported: ExcelRelation(data/input/batch/Lumendata_modified.xlsx,Some(RLD_ENGINE_CLAIM_DETAIL_MV),true,false,false,false,None,0,2147483647,None,None)
     [java]     at scala.sys.package$.error(package.scala:27)
     [java]     at za.co.absa.spline.harvester.builder.read.ReadCommandExtractor$$anonfun$asReadCommand$1.applyOrElse(ReadCommandExtractor.scala:73)
     [java]     at za.co.absa.spline.harvester.builder.read.ReadCommandExtractor$$anonfun$asReadCommand$1.applyOrElse(ReadCommandExtractor.scala:42)
     [java]     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
     [java]     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
     [java]     at scala.PartialFunction$.condOpt(PartialFunction.scala:286)
     [java]     at za.co.absa.spline.harvester.builder.read.ReadCommandExtractor.asReadCommand(ReadCommandExtractor.scala:42)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.za$co$absa$spline$harvester$LineageHarvester$$createOperationBuilder(LineageHarvester.scala:142)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$6.apply(LineageHarvester.scala:119)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$6.apply(LineageHarvester.scala:119)
     [java]     at scala.Option.getOrElse(Option.scala:121)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.traverseAndCollect$1(LineageHarvester.scala:119)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.za$co$absa$spline$harvester$LineageHarvester$$createOperationBuildersRecursively(LineageHarvester.scala:138)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$harvest$1.apply(LineageHarvester.scala:70)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$harvest$1.apply(LineageHarvester.scala:68)
     [java]     at scala.Option.flatMap(Option.scala:171)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:68)
     [java]     at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:41)
     [java]     at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:37)
     [java]     at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:37)
     [java]     at scala.Option.foreach(Option.scala:257)
     [java]     at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:37)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:114)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:135)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:133)
     [java]     at scala.collection.immutable.List.foreach(List.scala:392)
     [java]     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
     [java]     at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager.org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling(QueryExecutionListener.scala:133)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply$mcV$sp(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager.readLock(QueryExecutionListener.scala:146)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager.onSuccess(QueryExecutionListener.scala:112)
     [java]     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:611)
     [java]     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
     [java]     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
     [java]     at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:508)
     [java]     at za.co.absa.spline.example.batch.CumminsJob1$.delayedEndpoint$za$co$absa$spline$example$batch$CumminsJob1$1(CumminsJob1.scala:249)
     [java]     at za.co.absa.spline.example.batch.CumminsJob1$delayedInit$body.apply(CumminsJob1.scala:19)
     [java]     at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
     [java]     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
     [java]     at scala.App$$anonfun$main$1.apply(App.scala:76)
     [java]     at scala.App$$anonfun$main$1.apply(App.scala:76)
     [java]     at scala.collection.immutable.List.foreach(List.scala:392)
     [java]     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
     [java]     at scala.App$class.main(App.scala:76)
     [java]     at za.co.absa.spline.example.SparkApp.main(SparkApp.scala:27)
     [java]     at za.co.absa.spline.example.batch.CumminsJob1.main(CumminsJob1.scala)

The code snippet is shown below:

val dfRawEngClaimDetail = dfRawData.
      filter((col("payment_date") >= "2016-01-01") &&
        (col("payment_date") < current_date) &&
        (col("total_fail_count_num") === 1) &&
        (col("fail_mode_code") =!= "NF") &&
        (col("fail_mode_code") =!= "WP") &&
        (col("program_group_name") =!= "CAMP/TRP") &&
        (col("program_group_name") =!= "FIELD TEST") &&
        (col("program_group_name") =!= "OTHERS")).
      select("engine_serial_num",
      "build_date",
      "build_year",
      "build_qtr",
      "build_month",
      "plant_id_code",
      "engine_family_code",
      "engine_group_desc",
      "engine_name_desc",
      "design_phase_code",
      "variable_timing_id_num",
      "mktg_hsp_num",
      "mktg_rpm_num",
      "ship_date",
      "oem_code",
      "application_code",
      "in_service_date",
      "fail_service_date_diff",
      "failure_date",
      "fail_code_and_mode",
      "fail_code",
      "fail_mode_code",
      "engine_miles",
      "engine_hrs",
      "original_unit_of_measure",
      "dealer_code",
      "distributor_code",
      "claim_num",
      "claim_year",
      "claim_id_seq",
      "payment_date",
      "payment_year",
      "payment_qtr",
      "payment_month",
      "program_account_code",
      "program_group_name",
      "labor_hours",
      "net_amount",
      "materials_amount",
      "markup_amount",
      "repair_labor_amount",
      "region_code",
      "territory_code",
      "design_config_num",
      "mktg_config_num",
      "mktg_config_name",
      "vin_num",
      "authorization_num",
      "shop_order_num",
      "cpl_num",
      "total_fail_count_num",
      "claim_detail_id_seq",
      "fail_system_code",
      "fail_component_code",
      "user_appl_code",
      "travel_lodging_amount",
      "travel_to_site_amount",
      "travel_labor_amount",
      "beyond_fact_charge",
      "tax_amount",
      "other_expense_amount",
      "deductible_amount",
      "cummins_administration_amount",
      "undetailed_parts_amount",
      "dollar_differ_amount").
      withColumn("month",substring(col("failure_date"),6,2)).
      withColumn("failyear", year(col("failure_date")).cast("String")).
      withColumn("quarter", quarter(col("failure_date"))).
      withColumn("failmonth",(concat(col("failyear"),col("month")))).
      withColumn("failqtr",(concat(col("failyear"),col("quarter")))).
      withColumn("travelamt",col("travel_lodging_amount") + col("travel_to_site_amount") + col("travel_labor_amount")).
      withColumn("otheramt",col("beyond_fact_charge") + col("tax_amount") + col("other_expense_amount") + col("deductible_amount") + col("cummins_administration_amount") + col("undetailed_parts_amount") + col("dollar_differ_amount")).
      withColumn("config",col("design_config_num"))

Could you please let me know why this issue is happening?

Thanks,
Tufail

@wajda wajda removed the question label Feb 13, 2020
@wajda wajda modified the milestones: bug, 0.5.0 Feb 13, 2020
@wajda wajda added the bug label Feb 13, 2020
@wajda wajda modified the milestones: 0.5.0, 0.4.2 Feb 13, 2020
@cerveada (Contributor)

Hello Tufail,
Spline currently doesn't support every Spark command or every Relation. The most common ones are covered, but there is still more to be done.

We will try to include this relation support in the next Spline release.
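
For context on why this fails loudly rather than being skipped: the stack trace shows scala.PartialFunction$.condOpt driving ReadCommandExtractor.asReadCommand, so the harvester pattern-matches known BaseRelation subtypes and errors out on everything else. Below is a minimal sketch of that style; the ReadCommand shape and the exact cases are illustrative assumptions, not Spline's actual code:

import scala.PartialFunction.condOpt
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

// Illustrative result shape; Spline's real ReadCommand carries more detail.
case class ReadCommand(sourceType: String, uris: Seq[String])

def asReadCommand(plan: LogicalPlan): Option[ReadCommand] =
  condOpt(plan) {
    case lr: LogicalRelation =>
      lr.relation match {
        case hfs: HadoopFsRelation => // file-based sources: parquet, csv, ...
          ReadCommand(hfs.fileFormat.toString, hfs.location.rootPaths.map(_.toString))
        // Spline 0.4.1 had no case for spark-excel's ExcelRelation, so an
        // unmatched relation fell through to a catch-all like this one:
        case other =>
          sys.error(s"Relation is not supported: $other")
      }
  }

Adding support for a new source then amounts to adding a case for its BaseRelation subtype.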

@Qurashetufail (Author)

Thank you @cerveada for the information. Can you please confirm when the next version of Spline will be released?

@wajda (Contributor) commented Feb 14, 2020

@cerveada I think we could fix it in 0.4 and release an update asap. Should be easy, no?

@cerveada (Contributor)

It should. OK, I will take a look at the issue on Monday, so the bugfix version should be released next week.

@cerveada cerveada self-assigned this Feb 14, 2020
@Qurashetufail (Author) commented Feb 14, 2020

Thank you @cerveada, that would be great. Also, this issue is with the Excel data source. If I change the data source to Oracle, would this issue still occur?

Thanks,
Tufail

@cerveada (Contributor)

  • I think for Oracle, Spark will return a JDBCRelation, which is supported (see the sketch below).
  • There is no ExcelRelation in Spark itself. Were you using the crealytics/spark-excel library or something else for that?
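
For reference, here is a minimal sketch of an Oracle read through Spark's built-in JDBC source, which Spark plans as a JDBCRelation; all connection details below are hypothetical placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-read-sketch").getOrCreate()

// Spark's generic JDBC reader; the resulting scan is a JDBCRelation,
// which the Spline agent already recognizes.
val oracleDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical URL
  .option("dbtable", "RLD_ENGINE_CLAIM_DETAIL_MV")       // hypothetical table
  .option("user", "app_user")                            // hypothetical credentials
  .option("password", "app_password")
  .option("driver", "oracle.jdbc.OracleDriver")
  .load()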

@Qurashetufail (Author)

  • I will try with Oracle today.
  • I am using com.crealytics.spark.excel to read the Excel file. I have data in multiple tabs in a single Excel workbook (a read along the lines of the sketch below).
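
For reference, such a read typically looks like the sketch below, using the path and sheet name from the error message above. Option names changed between spark-excel releases (0.9.x used sheetName/useHeader; newer releases use dataAddress), so treat the options as version-dependent:

// Sketch of a crealytics/spark-excel read (0.9.x-style options).
val dfRawData = spark.read
  .format("com.crealytics.spark.excel")
  .option("sheetName", "RLD_ENGINE_CLAIM_DETAIL_MV") // tab to read
  .option("useHeader", "true")                       // first row as header
  .option("inferSchema", "true")
  .load("data/input/batch/Lumendata_modified.xlsx")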

@cerveada cerveada changed the title Relation is not supported Excel Relation is not supported Feb 17, 2020
@cerveada (Contributor)

Capturing the Excel write operation currently seems to be impossible. We will support only reads for now.

More here: nightscape/spark-excel#209

@Qurashetufail (Author)

@cerveada Thank you for keeping the ticket open. Please let me know when this becomes available.

@wajda I understand what you are saying. It worked with the Spark 3 preview by chance. I have now switched to Spark 2.4 for my POC. The Oracle issue is still there; I will update the ticket when I figure out why that error is happening.

@Qurashetufail (Author)

Hi @wajda

I solved the dependency issue while using Spline with Docker and Spark 2.2.2. The code works fine with Oracle as the database. However, with Spark 2.3 and 2.4 the code was throwing the said error. I modified the parent/pom.xml and the pom.xml in the examples folder under 0.4.1 to add the additional dependencies. Please let me know if this is the right fix for the issue.

Thanks,
Tufail

@cerveada (Contributor) commented Feb 20, 2020

@Qurashetufail I will ask you once more: if you have any questions or comments unrelated to Excel, please create a new ticket for them, or use a ticket that is related to your problem.

Each problem should have its own dedicated ticket. Oracle and Excel are unrelated things. You can link the other tickets if you need to provide context.

This will make searching for information in the future much easier for everybody.

@AbsaOSS AbsaOSS deleted a comment from Qurashetufail Feb 21, 2020
@AbsaOSS AbsaOSS deleted a comment from Qurashetufail Feb 21, 2020
@AbsaOSS AbsaOSS deleted a comment from Qurashetufail Feb 21, 2020
@AbsaOSS AbsaOSS deleted a comment from cerveada Feb 21, 2020
@wajda (Contributor) commented Feb 21, 2020

@Qurashetufail,
"Not a version: 9" issue has been moved to #598

cerveada added a commit that referenced this issue Feb 25, 2020
cerveada added a commit that referenced this issue Feb 25, 2020
@wajda wajda closed this as completed Feb 25, 2020
@wajda (Contributor) commented Feb 25, 2020

@Qurashetufail,
Spline 0.4.2 is out. Can you try it and see if the issue is fixed now?
Thanks.

@Qurashetufail (Author)

Hi @wajda,
I ran the script on Spline 0.4.2 and am encountering a new issue now:

     [java] Exception in thread "main" java.lang.NoSuchMethodError: com.crealytics.spark.excel.ExcelRelation.workbookReader()Lcom/crealytics/spark/excel/WorkbookReader;
     [java]     at za.co.absa.spline.harvester.builder.read.ReadCommandExtractor$$anonfun$asReadCommand$1.applyOrElse(ReadCommandExtractor.scala:75)
     [java]     at za.co.absa.spline.harvester.builder.read.ReadCommandExtractor$$anonfun$asReadCommand$1.applyOrElse(ReadCommandExtractor.scala:44)
     [java]     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
     [java]     at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
     [java]     at scala.PartialFunction$.condOpt(PartialFunction.scala:286)
     [java]     at za.co.absa.spline.harvester.builder.read.ReadCommandExtractor.asReadCommand(ReadCommandExtractor.scala:44)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.za$co$absa$spline$harvester$LineageHarvester$$createOperationBuilder(LineageHarvester.scala:143)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$6.apply(LineageHarvester.scala:120)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$6.apply(LineageHarvester.scala:120)
     [java]     at scala.Option.getOrElse(Option.scala:121)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.traverseAndCollect$1(LineageHarvester.scala:120)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.za$co$absa$spline$harvester$LineageHarvester$$createOperationBuildersRecursively(LineageHarvester.scala:139)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$harvest$1.apply(LineageHarvester.scala:71)
     [java]     at za.co.absa.spline.harvester.LineageHarvester$$anonfun$harvest$1.apply(LineageHarvester.scala:69)
     [java]     at scala.Option.flatMap(Option.scala:171)
     [java]     at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:69)
     [java]     at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:41)
     [java]     at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:37)
     [java]     at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener$$anonfun$onSuccess$1.apply(SplineQueryExecutionListener.scala:37)
     [java]     at scala.Option.foreach(Option.scala:257)
     [java]     at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:37)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:114)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1$$anonfun$apply$mcV$sp$1.apply(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:135)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling$1.apply(QueryExecutionListener.scala:133)
     [java]     at scala.collection.immutable.List.foreach(List.scala:392)
     [java]     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
     [java]     at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager.org$apache$spark$sql$util$ExecutionListenerManager$$withErrorHandling(QueryExecutionListener.scala:133)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply$mcV$sp(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager$$anonfun$onSuccess$1.apply(QueryExecutionListener.scala:113)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager.readLock(QueryExecutionListener.scala:146)
     [java]     at org.apache.spark.sql.util.ExecutionListenerManager.onSuccess(QueryExecutionListener.scala:112)
     [java]     at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:611)
     [java]     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
     [java]     at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
     [java]     at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:508)
     [java]     at za.co.absa.spline.example.batch.CumminsJob1$.delayedEndpoint$za$co$absa$spline$example$batch$CumminsJob1$1(CumminsJob1.scala:248)
     [java]     at za.co.absa.spline.example.batch.CumminsJob1$delayedInit$body.apply(CumminsJob1.scala:19)
     [java]     at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
     [java]     at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
     [java]     at scala.App$$anonfun$main$1.apply(App.scala:76)
     [java]     at scala.App$$anonfun$main$1.apply(App.scala:76)
     [java]     at scala.collection.immutable.List.foreach(List.scala:392)
     [java]     at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
     [java]     at scala.App$class.main(App.scala:76)
     [java]     at za.co.absa.spline.example.SparkApp.main(SparkApp.scala:27)
     [java]     at za.co.absa.spline.example.batch.CumminsJob1.main(CumminsJob1.scala)
     [java] 20/02/26 16:20:41 INFO SparkContext: Invoking stop() from shutdown hook
     [java] 20/02/26 16:20:41 INFO SparkUI: Stopped Spark web UI at http://192.168.1.3:4040
     [java] 20/02/26 16:20:41 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
     [java] 20/02/26 16:20:42 INFO MemoryStore: MemoryStore cleared
     [java] 20/02/26 16:20:42 INFO BlockManager: BlockManager stopped
     [java] 20/02/26 16:20:42 INFO BlockManagerMaster: BlockManagerMaster stopped
     [java] 20/02/26 16:20:42 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

I ran the script on all the compatible Spark versions.

Thanks,
Tufail

@cerveada (Contributor)

Hello, which version of the spark-excel library are you using?

@Qurashetufail (Author) commented Feb 26, 2020

I am running this with the dependency below:

<dependency>
  <groupId>com.crealytics</groupId>
  <artifactId>spark-excel_2.11</artifactId>
  <version>0.9.6</version>
</dependency>

First I ran the code with the pom.xml provided in 0.4.2, and I started facing this issue: nightscape/spark-excel#119

@cerveada (Contributor)

Could you try updating spark-excel to the latest version, 0.13.0?

@Qurashetufail (Author) commented Feb 26, 2020

After updating, I am getting the same issue mentioned above: nightscape/spark-excel#119.
I will try to check why it is occurring and will update the case when I do.

Thanks,
Tufail

@cerveada (Contributor)

OK, I added support for spark-excel, but that doesn't mean it will work for all past versions. I expect it will work at least for 0.12.x and 0.13.x.
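
The NoSuchMethodError above is a binary-compatibility problem: the agent was compiled against a spark-excel version whose ExcelRelation.workbookReader() accessor does not match the one on the runtime classpath. One way such drift can be tolerated, shown here as a hypothetical sketch rather than Spline's actual implementation, is to read the relation's properties reflectively so a missing accessor becomes absent data instead of a linkage error:

import scala.util.Try

// Look up a zero-argument accessor by name at runtime instead of linking to
// it at compile time; a missing method yields None, not NoSuchMethodError.
def reflectiveGet(target: AnyRef, accessor: String): Option[AnyRef] =
  Try(target.getClass.getMethod(accessor).invoke(target)).toOption

// Hypothetical usage against a relation taken from a LogicalRelation:
// val maybeReader = reflectiveGet(excelRelation, "workbookReader")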

@Qurashetufail (Author)

Thank you for your help. The remaining issue is now with the Excel package itself: nightscape/spark-excel#137
I will update the case with a working confirmation.
Thanks,
Tufail

@Qurashetufail (Author)

Thanks mate, it's working now.
