
Write to file as stream row by row? #29

Closed
Ryouku opened this issue Mar 10, 2017 · 15 comments
Labels: enhancement (New feature or request)

Comments

@Ryouku commented Mar 10, 2017

Hi,
would it be possible to have something like stream-writing to a file for the case of lots of rows? Like when hitting the Excel sheet row limit?

I'm not very familiar with Go's IO, but maybe there is no problem achieving this?

For example, I have millions of rows to put into Excel sheets, which is obviously a memory-heavy task. The idea is to read a slice of rows from the data source, say 100,000 at a time, then write them to the Excel file in a loop until all rows are written. That would produce either one Excel file with a few sheets or a few Excel files.

The primary idea is to keep memory consumption at a reasonable amount of RAM, a few gigabytes rather than tens.

What's your opinion?
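The batching pattern described above might look like the following sketch with today's excelize API. This is a hypothetical illustration, not part of the thread or the library: fetchRows stands in for the real data source, and the sheet-rollover logic is one possible way to stay under Excel's 1,048,576-row-per-sheet limit.

```go
package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

const (
	batchSize   = 100000  // rows pulled from the data source per iteration
	maxSheetRow = 1048576 // Excel's per-sheet row limit
)

// fetchRows is a hypothetical data-source callback: it returns the next
// batch of up to n rows, or an empty slice once the source is exhausted.
func fetchRows(n int) [][]interface{} { return nil }

func main() {
	f := excelize.NewFile()
	sheet, sheetNum, row := "Sheet1", 1, 1
	for batch := fetchRows(batchSize); len(batch) > 0; batch = fetchRows(batchSize) {
		for _, r := range batch {
			if row > maxSheetRow { // roll over to a fresh sheet at the limit
				sheetNum++
				sheet = fmt.Sprintf("Sheet%d", sheetNum)
				f.NewSheet(sheet)
				row = 1
			}
			cell, _ := excelize.CoordinatesToCellName(1, row)
			f.SetSheetRow(sheet, cell, &r)
			row++
		}
	}
	if err := f.SaveAs("big.xlsx"); err != nil {
		fmt.Println(err)
	}
}
```

Note that with the in-memory API this only bounds how much is read from the data source at once; the worksheet itself still lives in memory, which is the problem the stream writer added later in this thread addresses.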

xuri added the enhancement label Mar 10, 2017
@xuri (Member) commented Mar 10, 2017

@Ryouku Thanks for your issue. This is a performance problem that can be linked with issues #20 and #26. Next, I would like to design a streaming interface to solve it.

@Ryouku (Author) commented Mar 10, 2017

Thanks!

@beezir commented Mar 30, 2017

Regarding low-hanging fruit for performance: I was having trouble outputting a few thousand rows with roughly 30-40 columns each in any sort of reasonable time. I narrowed it down, and it appears the performance problems come from changes made in this commit (fixing a missing hyperlink): 6e1475a#diff-ec4216644fd4919ab5c2a68efb70d453

That change causes the entire spreadsheet data structure to be fully rebuilt every time cell data is set, instead of appending new rows only when needed. I haven't had time to dive into exactly what that commit fixed, which is why I'm not submitting a pull request right now, but if I use a version of completeRow similar to the one prior to that commit, my time for xlsx generation goes from upwards of a couple of minutes to about 3 seconds with identical output (20,000 rows, 30 columns). It still slows down quite a bit at higher row counts, but it's a target for massive improvement in the lower ranges as a starting point.
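To make the difference concrete, here is a simplified, hypothetical sketch of the two strategies (illustrative Go, not excelize's actual completeRow code): appending only the missing rows costs O(new rows) per call, while rebuilding the whole structure on every cell set costs O(existing rows) per call, which compounds to quadratic time over a large sheet.

```go
package main

import "fmt"

// Row is a stand-in for the worksheet row structure in this illustration.
type Row struct {
	Index int
	Cells []string
}

// completeRowAppend grows the slice only until the target row exists;
// rows that are already present are never touched again.
func completeRowAppend(rows []Row, target int) []Row {
	for len(rows) < target {
		rows = append(rows, Row{Index: len(rows) + 1})
	}
	return rows
}

// completeRowRebuild reallocates and copies every existing row on every
// call, the behavior that makes per-cell writes progressively slower.
func completeRowRebuild(rows []Row, target int) []Row {
	rebuilt := make([]Row, 0, target)
	for i := 1; i <= target; i++ {
		if i <= len(rows) {
			rebuilt = append(rebuilt, rows[i-1])
		} else {
			rebuilt = append(rebuilt, Row{Index: i})
		}
	}
	return rebuilt
}

func main() {
	var rows []Row
	for r := 1; r <= 20000; r++ {
		rows = completeRowAppend(rows, r) // swap in completeRowRebuild to feel the slowdown
	}
	fmt.Println("rows:", len(rows))
}
```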

@ghost commented Mar 30, 2017 via email

xuri added a commit that referenced this issue Mar 31, 2017
…. Relate issue #29. The benchmark report of the current version of this library is shown on the wiki page.
@xuri (Member) commented Mar 31, 2017

@beezir Thanks for your comments. I have made some optimizations and removed redundant XML element-checking logic. The benchmark report of the current version of this library is shown on the wiki page.

@mewben commented Jun 5, 2017

+1 to this. Is there a way to send the result row by row? I experienced around 5-10 minutes creating and downloading a 2MB xlsx file with 1000+ rows.

@Ryouku (Author) commented Jul 4, 2017

Hey @xuri, any update?

@xuri (Member) commented Jul 4, 2017

@Ryouku Yes, I have optimized the memory usage when parsing large files. Please upgrade the library to the latest version.

@Ryouku (Author) commented Jul 4, 2017

Thanks! Will give it a shot.

@pjmuller commented Oct 13, 2017

Hi xuri,

I've gone through the multiple GitHub issues related to performance / memory management. However, I could not figure out whether the following is already possible.

Situation: having to write > 100,000 rows.
Solution A: have an io interface which periodically writes to disk, so that memory consumption does not go through the roof.
Solution B: write to different xlsx files, and then have a way to merge them together (but again, without having to use big amounts of memory).

Does this exist, and could you give a snippet demonstrating it? It would also be nice to have in the standard documentation.
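A sketch of Solution B with excelize's in-memory API might look like this. writeChunk and the inline sample data are hypothetical, and merging the parts back into one workbook would still need a separate pass; Solution A is essentially what the stream writer announced later in this thread provides.

```go
package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

// writeChunk is a hypothetical helper: each chunk of rows becomes its own
// xlsx file, so memory holds only one chunk at a time.
func writeChunk(part int, rows [][]interface{}) error {
	f := excelize.NewFile()
	for i, row := range rows {
		cell, _ := excelize.CoordinatesToCellName(1, i+1)
		if err := f.SetSheetRow("Sheet1", cell, &row); err != nil {
			return err
		}
	}
	return f.SaveAs(fmt.Sprintf("part-%03d.xlsx", part))
}

func main() {
	// Placeholder batches; a real caller would pull these from the data source.
	chunks := [][][]interface{}{
		{{"a", 1}, {"b", 2}},
		{{"c", 3}, {"d", 4}},
	}
	for i, rows := range chunks {
		if err := writeChunk(i+1, rows); err != nil {
			fmt.Println(err)
			return
		}
	}
}
```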

@pjmuller commented

@xuri Can you check my comment above?

@kharism commented Mar 5, 2018

@pjmuller Maybe https://github.com/eaciit/hoboexcel will work for your needs? The lib uses buffered input/output and is custom-tailored to handle large, simple xlsx files.

@cemremengu commented

This is really causing issues... 😢 Memory usage goes through the roof.

Any updates?

@duffiye commented Sep 29, 2018

Any updates?

@xuri (Member) commented Dec 10, 2019

Hi all (@Ryouku, @beezir, @mewben, @pjmuller, @kharism, @cemremengu, @duffiye), sorry for my late reply. I have added a stream writer for generating a new worksheet with huge amounts of data.
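For reference, a minimal usage sketch of the stream writer, assuming the current github.com/xuri/excelize/v2 module path (the import path at the time of this comment was different). NewStreamWriter, SetRow, and Flush are the relevant calls, and rows must be written in ascending order:

```go
package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

func main() {
	f := excelize.NewFile()
	// The stream writer spools worksheet data to a temporary file instead of
	// keeping it in memory, so row count no longer drives RAM usage.
	sw, err := f.NewStreamWriter("Sheet1")
	if err != nil {
		fmt.Println(err)
		return
	}
	for rowID := 1; rowID <= 1000000; rowID++ {
		row := make([]interface{}, 10)
		for colID := range row {
			row[colID] = fmt.Sprintf("r%dc%d", rowID, colID+1)
		}
		cell, _ := excelize.CoordinatesToCellName(1, rowID)
		// Rows must be set in ascending order when streaming.
		if err := sw.SetRow(cell, row); err != nil {
			fmt.Println(err)
			return
		}
	}
	// Flush ends the streaming session before saving the workbook.
	if err := sw.Flush(); err != nil {
		fmt.Println(err)
		return
	}
	if err := f.SaveAs("Book1.xlsx"); err != nil {
		fmt.Println(err)
	}
}
```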
