-
Thanks for the message. You read the file in simple mode, which requires that all sheets have the same number of columns and the same column types, because Spark expects the data in tabular format. https://github.com/ZuInnoTe/hadoopoffice/blob/main/examples/scala-spark3-excel-in-ds/src/main/scala/org/zuinnote/spark/office/example/excel/SparkScalaExcelInDataSource.scala shows how you can process such files without simple mode. An alternative would be to ensure that your Excel file has the same columns in all sheets.
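A minimal, untested sketch of the non-simple-mode approach along the lines of the linked example: without `read.spark.simpleMode`, the datasource returns generic cell rows rather than one flat table, and you can split them by sheet yourself. The `rows` column layout (an array of cell structs with `formattedValue`, `address`, `sheetName` fields), the file path, and the app name are assumptions here, so check them against your version of HadoopOffice:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder().appName("excel-per-sheet").getOrCreate()

// Read WITHOUT simpleMode: each row holds an array of generic cell structs,
// so sheets with different layouts are not forced into one tabular schema.
val df = spark.read
  .format("org.zuinnote.spark.office.excel")
  .option("read.locale.bcp47", "en")
  .load("/path/to/file.xlsx") // hypothetical path

// Flatten the cell arrays and keep the fields needed to regroup by sheet.
val cells = df
  .select(explode(col("rows")).as("cell"))
  .select(col("cell.sheetName"), col("cell.address"), col("cell.formattedValue"))

// One DataFrame per sheet, keyed by sheet name.
val sheetNames = cells.select("sheetName").distinct().collect().map(_.getString(0))
val perSheet = sheetNames.map(name => name -> cells.filter(col("sheetName") === name)).toMap
```

Each entry of `perSheet` can then be processed with a schema appropriate to that sheet, instead of relying on simple mode to unify all three sheets.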
-
Here we are using `format("org.zuinnote.spark.office.excel").option("hadoopoffice.read.header.read", "true").option("read.locale.bcp47", "en").option("read.spark.simpleMode", true).load(path)`.
We need to read data from multiple sheets of an Excel file. With the approach above, the result only contains part of the information from the sheets, merged together incorrectly.
The .xlsx file we are reading has three sheets. The result contains the cell information of the first sheet, but the cell information of the other two sheets is clubbed together.