Reading multiple files #97

Closed
surtamikalai opened this issue Nov 29, 2018 · 1 comment

Comments

@surtamikalai

This is not an issue as such, but I don't know where the best place is to discuss project-related questions. Do you plan to add a feature to read multiple xlsx files in a directory, rather than only one at a time? Spark supports this for .csv files when you specify wildcards (glob patterns) in the path. You can then use the "input_file_name()" function, which returns a column with the source filename of every record.

@nightscape
Collaborator

Duplicate of #74

@nightscape nightscape marked this as a duplicate of #74 Nov 29, 2018
quanghgx added a commit to quanghgx/spark-excel that referenced this issue Jun 20, 2021
quanghgx added a commit to quanghgx/spark-excel that referenced this issue Aug 12, 2021
nightscape added a commit that referenced this issue Aug 21, 2021
* register data source for .format("excel")

* ignore .vscode

* V2 with new Spark Data Source API, uses FileDataSourceV2

* set header default to true, got 1st test passed

* ExcelHelper becomes options-aware

* handle string type for error-formula

* PlainNumberReadSuite is good now. Also fixed the issue in #285. This introduces a breaking change (good, I think)

* test-case for issue_285

* Handling Error Cells and Undefined Rows

* Test cases for #52 #74 #97 issues

* format & test cases for column pruning (projection)

* Added more test-cases for numerical types

* Stricter numerical types (Integer, Long and Double) in schema inferring. Issue #162

* preparing for final push on writing

* Apply format & Writing is working

* Added excel-row-number column for issues #40 #59 #115 and refactoring

* refactoring unit-tests

* preparing for MR

* Update all test-cases with ScalaTest 3.x

* Writing aware about dataAddress

* writing with dataAddress; No change on dependencies nor build script

* Schema Inferring Improvement: {Iterator instead of Seq; Use both samplingRatio and excerptSize}

* added more recent spark version to CI/CD

* support from spark 2.4.1 up

* Fix scalastyle check & enable non-ascii character due to native of unit-tests

* Update src/main/2.4/scala/com/crealytics/spark/v2/excel/ExcelDataSource.scala

Co-authored-by: Martin Mauch <martin.mauch@gmail.com>

* Update src/main/2.4/scala/com/crealytics/spark/v2/excel/ExcelDataSource.scala

Co-authored-by: Martin Mauch <martin.mauch@gmail.com>

* spark-excel examples in Jupyter Notebook

Co-authored-by: Martin Mauch <martin.mauch@gmail.com>