Add sheetNames to WorkbookReader #196

Merged Jan 27, 2020

@qfjp qfjp commented Jan 24, 2020

Targeting #42

@nightscape
Collaborator

Lgtm 😄
Ready to merge from your side?

@qfjp
Author

qfjp commented Jan 27, 2020

Seems to work okay for me, just waiting for an official merge and bump in version number.

@nightscape nightscape merged commit 21b440f into crealytics:master Jan 27, 2020
@nightscape
Collaborator

@qfjp, I forgot to ask this before merging: could you add a little documentation/example to README.md and an entry to CHANGELOG.md?

qfjp pushed a commit to qfjp/spark-excel that referenced this pull request Jan 30, 2020
@qfjp
Author

qfjp commented Jan 30, 2020

Just requested to merge my additions in the readme and changelog. Let me know if it needs any changes.

nightscape pushed a commit that referenced this pull request Jan 31, 2020
@nightscape
Collaborator

Released as 0.12.5. Thanks for contributing!

@qfjp
Author

qfjp commented Feb 3, 2020

No problem, thanks for having such a quick turn-around

@ktpatrick

ktpatrick commented Nov 29, 2021

@qfjp, could you share the PySpark syntax to return the sheet names?

@E-HO

E-HO commented Apr 29, 2022

Hi,

Is there a way to do the same (get a list of sheet names) in Python?
I tried calling a ".sheetNames()" method on the Spark reader and setting a "dataAddress: sheetNames" option, but neither returned the sheet names.

@nightscape
Collaborator

Hi @E-HO, you would probably need to use an approach similar to this (on a phone, so can't test):

reader = spark._jvm.com.crealytics.spark.excel.WorkbookReader(
  {"path": "Worktime.xlsx"}, 
  spark.sparkContext.hadoopConfiguration
)

sheetnames = reader.sheetNames()

Alternatively, you could try reading the Excel file with a Python-based Excel reader to get the sheet names and use spark-excel to read the contents.
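
A minimal sketch of that alternative, assuming the workbook is an .xlsx file on the local filesystem. Instead of a third-party reader such as openpyxl, it uses only the standard library: an .xlsx file is a zip archive whose xl/workbook.xml part lists the sheets. The helper names (xlsx_sheet_names, data_address, read_all_sheets) are hypothetical, and the dataAddress form 'Sheet Name'!A1 follows the spark-excel README. Sheet names containing a single quote are not handled here.

```python
import zipfile
import xml.etree.ElementTree as ET


def xlsx_sheet_names(path):
    """Return the sheet names of a local .xlsx file, in workbook order."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Compare local tag names only, so the namespace URI does not matter.
    return [
        element.attrib["name"]
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "sheet"
    ]


def data_address(sheet_name, cell="A1"):
    """Build a spark-excel dataAddress such as 'My Sheet'!A1."""
    return f"'{sheet_name}'!{cell}"


def read_all_sheets(spark, path):
    """Read every sheet into its own DataFrame, keyed by sheet name.

    Assumes `path` is readable both by this Python process and by Spark.
    """
    return {
        name: (
            spark.read.format("com.crealytics.spark.excel")
            .option("header", "true")
            .option("dataAddress", data_address(name))
            .load(path)
        )
        for name in xlsx_sheet_names(path)
    }
```

Something like read_all_sheets(spark, "Worktime.xlsx") would then return one DataFrame per sheet. If the file lives on HDFS or cloud storage, the sheet-name step would need a local copy first.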

@Fingolfin123

Has anyone gotten this to work with Python? I tried nightscape's suggestion but am getting this attribute error:
[screenshot of the attribute error, image not preserved]

@nightscape
Collaborator

@deonchia

Hello! I'm trying to use the WorkbookReader to read the sheet names of an Excel file in Python so that I can programmatically read each sheet into a DataFrame.
May I get some help with this? Currently I'm stuck on the following error:

py4j.protocol.Py4JError: An error occurred while calling None.com.crealytics.spark.excel.WorkbookReader. Trace:
py4j.Py4JException: Constructor com.crealytics.spark.excel.WorkbookReader([class java.util.HashMap, class org.apache.hadoop.conf.Configuration]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:834)

Below is a snippet of the code.

reader = spark._jvm.com.crealytics.spark.excel.WorkbookReader(
    {"path": "Worktime.xlsx"},
    spark.sparkContext._jsc.hadoopConfiguration()
    )
sheetnames = reader.sheetNames()

Thanks in advance!

@nightscape
Collaborator

Ah, a Python dict does not get converted into a Scala Map, but into a Java one...
We'd probably need a second, more Java-/Python-friendly constructor.
Is anybody here familiar with Scala and able to create a PR that adds a second constructor accepting a Java Map and delegating to the Scala one after calling .asScala?

@deonchia

That's unfortunate; thanks for the clarification!

@williamdphillips
Collaborator

Added a PR for this - @nightscape can you verify?

#664

@nightscape
Collaborator

Merged 👍
Thanks @williamdphillips!

@cometta

cometta commented Dec 1, 2022

I'm still getting

py4j.Py4JException: Constructor com.crealytics.spark.excel.WorkbookReader([class java.util.HashMap, class org.apache.hadoop.conf.Configuration]) does not exist

May I know whether the merge is available in spark-excel_2.12-3.3.1_0.18.5?

@nightscape
Collaborator

Hmm, it should definitely be in 0.18.5: https://github.com/crealytics/spark-excel/blob/main/src/main/scala/com/crealytics/spark/excel/WorkbookReader.scala#L58
I have no idea what's going wrong...

@cometta

cometta commented Dec 2, 2022

@cometta

cometta commented Dec 27, 2022

Has anyone tested this?

@mattoh91

mattoh91 commented Jan 10, 2023

Hi, I am trying to use the WorkbookReader to dynamically obtain multiple sheet names from the same Excel file, but ran into this error:

An error occurred while calling None.com.crealytics.spark.excel.WorkbookReader. Trace:
py4j.Py4JException: Constructor com.crealytics.spark.excel.WorkbookReader([class java.util.HashMap, class org.apache.hadoop.conf.Configuration]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:750)

Any suggestions on how to resolve this?

@nightscape
Collaborator

nightscape commented Jan 10, 2023

Ah, I think I know what the issue is. The method you need to call is actually not a constructor, but a static method.
Can you try com.crealytics.spark.excel.WorkbookReader.apply(...) instead of com.crealytics.spark.excel.WorkbookReader(...)?

You can also check the required signature like this:

javap jar:file:///path/to/downloaded/spark-excel_2.12-3.3.1_0.18.5.jar!/com/crealytics/spark/excel/WorkbookReader.class

@mattoh91

Thanks nightscape! The static method works!
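
For anyone landing here later, a hedged sketch of what the working PySpark call might look like. It calls WorkbookReader.apply(...) as suggested above and takes the Hadoop configuration from spark.sparkContext._jsc, as in the earlier snippet. The helper name list_sheet_names is hypothetical. Converting the result to a Python list is an assumption: sheetNames() is taken to return a Scala Seq, and scala.collection.JavaConverters.seqAsJavaListConverter is taken to be available in the Scala version the jar was built against (2.12 in the jar named in this thread).

```python
def list_sheet_names(spark, path):
    """Return the sheet names of an Excel file via spark-excel's WorkbookReader."""
    jvm = spark._jvm
    # PySpark exposes the Hadoop configuration through the Java SparkContext.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    # Call the static apply(...) rather than the constructor; py4j passes
    # the Python dict across as a java.util.HashMap.
    reader = jvm.com.crealytics.spark.excel.WorkbookReader.apply(
        {"path": path}, hadoop_conf
    )
    # Assumption: sheetNames() returns a Scala Seq, which py4j does not turn
    # into a Python list by itself, so it is converted to a Java list first.
    scala_seq = reader.sheetNames()
    java_list = jvm.scala.collection.JavaConverters.seqAsJavaListConverter(
        scala_seq
    ).asJava()
    return list(java_list)
```

With an active session that has the spark-excel jar on its classpath, list_sheet_names(spark, "Worktime.xlsx") should then yield plain Python strings that can be fed into the dataAddress option one sheet at a time.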

@nightscape
Collaborator

Great! Would you mind creating a PR to enhance the documentation?

@Krukosz Krukosz mentioned this pull request Apr 30, 2024