Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel add new sheet without overwriting the file #157

Merged
merged 7 commits into from
Jan 10, 2024

Conversation

LeandroC89
Copy link
Contributor

@LeandroC89 LeandroC89 commented Aug 28, 2022

Hello,
I've made a couple changes in order to be able to add sheets to an already existing file (without completely replacing it).

Overview

By using a keepFile parameter the user can now decide to add a new sheet to an Excel file, this parameter is optional and defaults to false to keep the current behavior by default.

Implementation

keepFile

  • Added the new parameter as an optional setting allowing the user to add a sheet to an existing file (if set as true), leaving it blank or setting as false will keep current behavior, completely overwriting the file.

Factory

I had to update the logic by which the workbook was determined for a couple reasons:

  1. Current setting would always create a new file, so a new flow had to be added
  2. Keeping the factory function was generating 0b files when trying to reuse existing file, so it is now returning a WorkBook (keeping the wb type logic as is for xls and xlsx files).

Please let me know if something here needs to be updated as this was the less easy part of the change

Test

  • Added test to test the feasibility of creating/overwriting an Excel file with data for a sheet1, and then the same data is written on a sheet2.
    • Test checks if both sheet contents are the same

.gitignore

  • Added new entry so the Excel file generated by the new test is not kept in version control, as it would have changes everytime the tests were rerun.

Final comments

I hope this helps, and please let me know if there is something I should update or that could have been done in a better way.
Have a good weekend!

Signed-off-by: LeandroC89 <leandroadao3@gmail.com>
Signed-off-by: LeandroC89 <leandroadao3@gmail.com>
 - Updated WorkBook factory as it was previously incompatible with reusing existing workbook
 - Revised test use generated random data

Signed-off-by: LeandroC89 <leandroadao3@gmail.com>
@koperagen
Copy link
Collaborator

Hi! Thank you for the PR! What do you think about creating Workbook in the user code?
Like

val wb = ...
df1.writeExcel(wb, "Sheet1")
df2.writeExcel(wb, "Sheet2")

I like your idea better, because this way people won't have to deal with factories, streams. But still interesting to understand alternatives

@LeandroC89
Copy link
Contributor Author

LeandroC89 commented Aug 29, 2022

Hi,
not a bad a idea, having null create and if a wb is provided it would add. I could even add that having writeExcel return the wb would help in this situation as I see most people would use this to create a new file and then add sheets to it, so this would be an easier way to get the wb.

However, this would require people to create the wb on their own and import the dependencies on their own (and apply the xls/xlsx logic that determines HSSF/XSSF). I feel this latest bit justifies leaving this to the function instead. Specially in Notebooks where the project can be imported with

%use dataframe

I'm not entirely sure as I haven't touched notebooks in a while but I believe in this situation the dependency to the workbook would also have to be added with the import statement which is not as straightforward.

Considering the pros and cons I would have this handled by the function writeExcel itself, although it can also be overloaded to allow both behaviors. (I can also submit this bit if needed)

@koperagen
Copy link
Collaborator

I agree, and writing several sheets is a common enough task to be solved by writeExcel itself. So keepFile is good. Could you also add a paragraph with an example here https://github.com/Kotlin/dataframe/blob/master/docs/StardustDocs/topics/write.md?
Create a new function in
org.jetbrains.kotlinx.dataframe.samples.api.Write and follow the instruction https://github.com/Kotlin/dataframe/blob/master/docs/contributions.md.

@@ -246,9 +246,9 @@ public fun <T> DataFrame<T>.writeExcel(
columnsSelector: ColumnsSelector<T, *> = { all() },
sheetName: String? = null,
writeHeader: Boolean = true,
factory: () -> Workbook
factory: Workbook
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this overload. I think i wanted to create an alias for () -> Workbook so that people could call this function like df.writeExcel(os, ..., WorkbookFactory.XLSX), and WorkbookFactory.XLSX here is a class owned by the dataframe. But i shelved the idea, so this overload adds no value compared to the one below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at it, should it be renamed to workBook now? Since it is what is really getting passed after this change. I can change it if so.

…Write for keepFile functionality

 - Added example of keepFile use in writeExcel to write.md

Signed-off-by: LeandroC89 <leandroadao3@gmail.com>
Signed-off-by: LeandroC89 <leandroadao3@gmail.com>
fun `write to new sheet when keepFile is true`() {
val names = (1..5).map { "column$it" }
val df = dataFrameOf(names).randomDouble(7)
val fileLoc = testResource("generated_wb.xlsx").toURI().toString().removeRange(0, 6)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing with NPE =( Let's use
Files.createTempFile("excel", ".xlsx").toFile()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks for the tip!

Signed-off-by: LeandroC89 <leandroadao3@gmail.com>
@Jolanrensen Jolanrensen added this to the 0.9.0 milestone Nov 28, 2022
@Jolanrensen
Copy link
Collaborator

@LeandroC89 Could you update the pr to the newest master? Then we might be able to merge it for the 0.9.0 release

@Jolanrensen Jolanrensen added the enhancement New feature or request label Nov 28, 2022
@Jolanrensen Jolanrensen removed this from the 0.9.0 milestone Dec 15, 2022
# Conflicts:
#	dataframe-excel/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/xlsx.kt
#	dataframe-excel/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/XlsxTest.kt
#	docs/StardustDocs/topics/write.md
#	tests/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Write.kt
@LeandroC89
Copy link
Contributor Author

Sorry it took so long. At the time there was an issue with Korro, and I haven't checked this one in a while after the import was removed! It touches a few more files now due to having taken all the changes since originally PR'ed.

@koperagen
Copy link
Collaborator

Run formatKotlin task please, then i can merge it

@koperagen koperagen merged commit a33e104 into Kotlin:master Jan 10, 2024
1 check was pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants