Add sheetNames to WorkbookReader #196

qfjp · 2020-01-24T16:08:19Z

Targeting #42

nightscape · 2020-01-26T00:55:21Z

Lgtm 😄
Ready to merge from your side?

qfjp · 2020-01-27T14:09:01Z

Seems to work okay for me, just waiting for an official merge and bump in version number.

nightscape · 2020-01-30T10:43:38Z

@qfjp I forgot to ask before merging: could you add a little documentation/example to README.md and an entry to CHANGELOG.md?

qfjp · 2020-01-30T17:52:34Z

Just requested to merge my additions in the readme and changelog. Let me know if it needs any changes.

nightscape · 2020-01-31T11:55:02Z

Released as 0.12.5. Thanks for contributing!

qfjp · 2020-02-03T16:09:41Z

No problem, thanks for such a quick turnaround.

ktpatrick · 2021-11-29T23:52:43Z

@qfjp, Could you share the pyspark syntax to return the sheet names?

E-HO · 2022-04-29T11:40:09Z

Hi,

Is there a way to do the same (get a list of sheet names) in Python?
I tried calling a ".sheetNames()" method on the Spark reader and setting a "dataAddress : sheetNames" option, but neither gave a proper result.

nightscape · 2022-04-29T19:40:26Z

Hi @E-HO, you would probably need to use an approach similar to this (on a phone, so can't test):

reader = spark._jvm.com.crealytics.spark.excel.WorkbookReader(
  {"path": "Worktime.xlsx"}, 
  spark.sparkContext.hadoopConfiguration
)

sheetnames = reader.sheetNames()

Alternatively, you could try reading the Excel file with a Python-based Excel reader to get the sheet names and use spark-excel to read the contents.
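The alternative mentioned above could look roughly like the sketch below. It is only an illustration, not something from spark-excel: the function name `xlsx_sheet_names` is made up, it uses just the Python standard library instead of a dedicated Excel package, and it assumes the file is an .xlsx (not .xls) that the driver can open from a local path.

```python
import zipfile
import xml.etree.ElementTree as ET


def xlsx_sheet_names(path):
    """Return the sheet names of a local .xlsx file, in workbook order.

    Hypothetical helper for illustration; not part of spark-excel.
    """
    with zipfile.ZipFile(path) as archive:
        # An .xlsx file is a ZIP archive; xl/workbook.xml lists the sheets.
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    names = []
    for element in root.iter():
        # Compare local tag names so the XML namespace does not matter.
        if element.tag.rsplit("}", 1)[-1] == "sheet":
            names.append(element.get("name"))
    return names
```

Each returned name could then be used to read that sheet with spark-excel. Files that live on HDFS or object storage would need to be fetched to a local path first, which this sketch does not cover.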

Fingolfin123 · 2022-05-24T00:28:26Z

Has anyone gotten this to work with Python? I tried nightscape's suggestion but am getting this attribute error:

nightscape · 2022-05-24T09:41:38Z

@Fingolfin123 a quick Google search yielded this:
https://donagh.io/2020/04/08/accessing-hadoop-configuration-from-pyspark.html

deonchia · 2022-05-27T07:14:57Z

Hello! I'm trying to use the WorkbookReader to read the sheet names of an Excel file in Python, so that I can programmatically read each sheet into a DataFrame.
Could I get some help with this? Currently I'm stuck with the following error:

py4j.protocol.Py4JError: An error occurred while calling None.com.crealytics.spark.excel.WorkbookReader. Trace:
py4j.Py4JException: Constructor com.crealytics.spark.excel.WorkbookReader([class java.util.HashMap, class org.apache.hadoop.conf.Configuration]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:834)

Below is a snippet of the code.

reader = spark._jvm.com.crealytics.spark.excel.WorkbookReader(
    {"path": "Worktime.xlsx"},
    spark.sparkContext._jsc.hadoopConfiguration()
    )
sheetnames = reader.sheetNames()

Thanks in advance!

nightscape · 2022-05-27T18:59:50Z

Ah, a Python dict does not get converted into a Scala Map, but into a Java one...
We'd probably need a second, more Java-/Python-friendly constructor.
Is anybody here familiar with Scala who could create a PR that adds a second constructor accepting a Java Map and delegating to the Scala one after calling .asScala?

deonchia · 2022-05-28T18:40:19Z

That's unfortunate; thanks for the clarification!

williamdphillips · 2022-10-17T21:41:20Z

Added a PR for this - @nightscape can you verify?

#664

nightscape · 2022-10-18T07:15:02Z

Merged 👍
Thanks @williamdphillips!

cometta · 2022-12-01T01:58:55Z

I'm still getting

py4j.Py4JException: Constructor com.crealytics.spark.excel.WorkbookReader([class java.util.HashMap, class org.apache.hadoop.conf.Configuration]) does not exist

May I know whether the merge is available in spark-excel_2.12-3.3.1_0.18.5?

nightscape · 2022-12-01T10:37:14Z

Hmm, it should definitely be in 0.18.5: https://github.com/crealytics/spark-excel/blob/main/src/main/scala/com/crealytics/spark/excel/WorkbookReader.scala#L58
I have no idea what's going wrong...

cometta · 2022-12-02T01:12:34Z

Hello @nightscape, I use this jar: https://repo1.maven.org/maven2/com/crealytics/spark-excel_2.12/3.3.1_0.18.5/spark-excel_2.12-3.3.1_0.18.5.jar. Has anyone else faced a similar issue?

cometta · 2022-12-27T02:22:21Z

Has anyone tested this?

mattoh91 · 2023-01-10T08:34:11Z

Hi, I am trying to use the WorkbookReader to dynamically obtain the sheet names of an Excel file with multiple sheets, but ran into this error:

An error occurred while calling None.com.crealytics.spark.excel.WorkbookReader. Trace:
py4j.Py4JException: Constructor com.crealytics.spark.excel.WorkbookReader([class java.util.HashMap, class org.apache.hadoop.conf.Configuration]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:750)

Any suggestion on how to resolve this?

nightscape · 2023-01-10T09:33:46Z

Ah, I think I know what the issue is. The method you need to call is actually not a constructor, but a static method.
Can you try com.crealytics.spark.excel.WorkbookReader.apply(...) instead of com.crealytics.spark.excel.WorkbookReader(...)?

You can also check the required signature like this

javap jar:file:///path/to/downloaded/spark-excel_2.12-3.3.1_0.18.5.jar!/com/crealytics/spark/excel/WorkbookReader.class
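Spelled out for PySpark, the suggestion above might look like the following sketch. It is untested against a real cluster: the helper name `jvm_sheet_names` is made up for illustration, the arguments simply mirror the ones used earlier in this thread, and the result is assumed to come back as a py4j handle to a JVM collection rather than a Python list.

```python
def jvm_sheet_names(spark, path):
    """Ask spark-excel's WorkbookReader for the sheet names of `path`.

    Hypothetical helper; `spark` is an existing SparkSession with the
    spark-excel jar on its classpath.
    """
    # Reach the JVM class through the SparkSession's py4j gateway.
    workbook_reader = spark._jvm.com.crealytics.spark.excel.WorkbookReader
    # In PySpark the Hadoop configuration lives on the Java SparkContext.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    # Call the static apply(...) rather than the constructor; py4j passes
    # the Python dict to the JVM as a java.util.HashMap.
    reader = workbook_reader.apply({"path": path}, hadoop_conf)
    # Returned as-is: assumed to be a JVM-side collection, not a Python list.
    return reader.sheetNames()
```

Turning the returned object into a Python list is left out here, since the right conversion depends on the Scala collection type the jar returns.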

mattoh91 · 2023-01-11T09:02:03Z

Thanks nightscape! The static method works!

nightscape · 2023-01-11T15:36:16Z

Great! Would you mind creating a PR to enhance the documentation?

Add sheetNames to WorkbookReader

b5ad8f1

nightscape merged commit 21b440f into crealytics:master Jan 27, 2020

qfjp pushed a commit to qfjp/spark-excel that referenced this pull request Jan 30, 2020

Add documentation for crealytics#196

d069122

qfjp mentioned this pull request Jan 30, 2020

Add documentation for sheetNames pull request #199

Merged

nightscape pushed a commit that referenced this pull request Jan 31, 2020

Add documentation for #196

a58a265

nightscape mentioned this pull request Feb 4, 2020

getting a List of all sheets in the excel file #42

Closed

nightscape mentioned this pull request Jan 10, 2023

How to read an xlsx file having multiple sheets? #107

Closed

Krukosz mentioned this pull request Apr 30, 2024

Extract sheet names using pyspark #856

Open

2 tasks
