
saving multiple datasets as different sheets in excel file #43

Closed
antonkulaga opened this issue Dec 16, 2017 · 17 comments

@antonkulaga

saving multiple datasets as different sheets in excel file

@zcking

zcking commented Feb 8, 2018

Are there any plans for this? I need this specific functionality for my use case: I have a handful of DataFrames and want to save each of them as a sheet in a single Excel workbook. Perhaps "append" mode support for the DataFrameWriter is how this would be done?

@nightscape
Collaborator

We currently don't have any plans to do this ourselves, but we're open to PRs.
I guess the easiest approach would be to add an artificial sheet column to the DataFrame which, if set, determines which sheet the data is written to.
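
A hedged sketch of what that proposal could look like from the caller's side, assuming a hypothetical `sheet` column and an append capability which, as the next comments show, did not exist at the time (the `sheetName` option name is the 0.10.x-era one):

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical: split one DataFrame into per-sheet chunks via an
// artificial "sheet" column, then issue one write per distinct value.
def writePerSheet(df: DataFrame, path: String): Unit = {
  val sheetNames = df.select("sheet").distinct().collect().map(_.getString(0))
  sheetNames.foreach { name =>
    df.filter(df("sheet") === name)
      .drop("sheet")
      .write
      .format("com.crealytics.spark.excel")
      .option("sheetName", name) // 0.10.x-era option name
      .mode("append")            // would require append support
      .save(path)
  }
}
```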

@sbvarre

sbvarre commented Feb 23, 2018

With only "overwrite" available, how is the above solution viable? I created a column called sheet and tried it, but it just overwrites the file and doesn't write the output to individual sheets based on the value of sheet. Any suggestions greatly appreciated.

@nightscape
Collaborator

@sbvarre My answer was a hint at how this could be implemented, not how it currently works 😜

@Tasselmi

I ran into the same problem. I just want to write lots of Datasets into one file; compared to Python, the writing experience in Spark is quite painful here.

@nightscape
Collaborator

I'm just publishing 0.11.0-beta1. Once it has landed on Maven Central you can give it a try using the new .mode("append") option.
This allows you to write Datasets into existing files, so you should be able to write different Datasets into different sheets by calling .write multiple times with different dataAddresses (see the corresponding section in the README).
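
As a concrete illustration, here is a minimal sketch, assuming two already-computed DataFrames df1 and df2 and a placeholder output path (option names follow the 0.11.0-beta1 README):

```scala
// Sketch: two Datasets, two sheets, one workbook.
// The first call creates the file; the second appends a new sheet to it.
df1.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'Summary'!A1") // sheet name + top-left cell
  .option("useHeader", "true")
  .mode("overwrite")
  .save("/tmp/report.xlsx")

df2.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'Details'!A1") // a different sheet, same file
  .option("useHeader", "true")
  .mode("append") // keeps the sheets already in the file
  .save("/tmp/report.xlsx")
```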

@hbenzineb

Hi,
When I use .mode("append"), I get this error: java.lang.RuntimeException: Append mode is not supported by com.crealytics.spark.excel.DefaultSource
In the Scala code of com.crealytics.spark.excel.DefaultSource (here: https://github.com/crealytics/spark-excel/pull/25/files) I found that append mode is not available :(
Do you have another solution?
Thanks

@nightscape
Collaborator

@hbenzineb did you actually use 0.11.0-beta1?
And what does #25 have to do with this?

@hbenzineb

@nightscape: thank you for your answer.
I was using 0.10.0. I changed the Maven dependency to 0.11.0-beta1, but now I get a new error:

```
18/11/09 13:58:06 ERROR ApplicationMaster: User class threw exception: scala.MatchError: Map(path -> /project/cdn/dwh/tomfir/trt/Worktime2.xlsx, useheader -> true, timestampformat -> mm-dd-yyyy hh:mm:ss, dateformat -> yy-mmm-d, sheetname -> 01-Code partenaire) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
scala.MatchError: Map(path -> /project/cdn/dwh/tomfir/trt/Worktime2.xlsx, useheader -> true, timestampformat -> mm-dd-yyyy hh:mm:ss, dateformat -> yy-mmm-d, sheetname -> 01-Code partenaire) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
at com.crealytics.spark.excel.DataLocator$.apply(DataLocator.scala:52)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:59)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:471)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
at com.socgen.tomfir.spark.collect.ControlBsre$.controlAgregateBsre(ControlBsre.scala:35)
at com.socgen.tomfir.spark.collect.CollectTomfir$.main(CollectTomfir.scala:138)
at com.socgen.tomfir.spark.collect.CollectTomfir.main(CollectTomfir.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
```

@nightscape
Collaborator

nightscape commented Nov 9, 2018

0.11.0-beta1 has a different way of addressing cells. Please read the links in my message above your first message or the corresponding CHANGELOG.
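
For reference, a sketch of the addressing change, assuming some DataFrame df (the 0.10.x option name, path, and sheet name are taken from the MatchError above):

```scala
// 0.10.x: the target sheet had its own option; this combination is
// exactly what 0.11.0-beta1 no longer matches (hence the MatchError):
df.write
  .format("com.crealytics.spark.excel")
  .option("sheetName", "01-Code partenaire")
  .option("useHeader", "true")
  .save("/project/cdn/dwh/tomfir/trt/Worktime2.xlsx")

// 0.11.0-beta1: sheet name and start cell are folded into dataAddress:
df.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'01-Code partenaire'!A1")
  .option("useHeader", "true")
  .save("/project/cdn/dwh/tomfir/trt/Worktime2.xlsx")
```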

@hbenzineb

Thanks, I'm starting to understand better :)
Which option can I use to add a new sheet to my file.xlsx and write into it starting from cell A1, please?

@nightscape
Collaborator

https://github.com/crealytics/spark-excel/blob/append_to_existing_file/README.md#write-a-dataframe-to-an-excel-file
If the sheet given in dataAddress does not exist yet, it will be created.

@hbenzineb

OK, I tried this syntax:

```scala
val dfCountCdPtn = dfData.groupBy("AT138_CODE_PTN").count().alias("NB_LINES")
dfCountCdPtn.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'01-Code partenaire'!A1:B1000")
  //.option("preHeader", "Pre-header\tin\tcells\nand\trows") // Optional, default None. If set adds rows (split by "\n") and cells (split by "\t") before the column headers.
  .option("useHeader", "true")
  //.option("dateFormat", "yy-mmm-d") // Optional, default: yy-m-d h:mm
  //.option("timestampFormat", "mm-dd-yyyy hh:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss.000
  .mode("overwrite")
  .save("/project/cdn/dwh/tomfir/trt/MAD BSRE Quotidienne.xlsx")

// Aggregate on code Banque
val dfCountByBq = dfData.groupBy("BQ").count().alias("Nombre d'enregistrements")
dfCountByBq.write
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'02-Code banque'!A1:B1000")
  //.option("preHeader", "Pre-header\tin\tcells\nand\trows") // Optional, default None. If set adds rows (split by "\n") and cells (split by "\t") before the column headers.
  .option("useHeader", "true")
  //.option("dateFormat", "yy-mmm-d") // Optional, default: yy-m-d h:mm
  //.option("timestampFormat", "mm-dd-yyyy hh:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss.000
  .mode("append")
  .save("/project/cdn/dwh/tomfir/trt/MAD BSRE Quotidienne.xlsx")
```

The first write succeeds, but the second fails with this error:
diagnostics: User class threw exception: java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.

I'm using this dependency:

```xml
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-compress</artifactId>
  <version>1.18</version>
</dependency>
```
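
For what it's worth, this error usually indicates that an older commons-compress (one predating the InputStreamStatistics interface, added in 1.17) is winning on the runtime classpath despite the declared 1.18, e.g. a copy shipped with the cluster. A small diagnostic sketch to check which jar the JVM actually resolved:

```scala
// Prints the jar that ZipArchiveInputStream was loaded from, revealing
// whether the declared commons-compress 1.18 is really the one in use.
val source = classOf[org.apache.commons.compress.archivers.zip.ZipArchiveInputStream]
  .getProtectionDomain.getCodeSource
println(s"commons-compress loaded from: ${Option(source).map(_.getLocation)}")
```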

@nightscape
Collaborator

@hbenzineb this is issue #93.
Let's continue the discussion regarding this specific problem there.

@nightscape
Collaborator

I just released 0.11.0. Please have a look at the corresponding CHANGELOG and the README section on the new .mode("append") feature, which allows you to write into existing files.

@navg223

navg223 commented Jun 17, 2020

I am also facing this issue:
Append mode is not supported by com.crealytics.spark.excel.DefaultSource

Using version 0.9.15

@nightscape
Collaborator

@navg223 please read the comment directly above yours 😉
