Write to file as stream row by row? #29
Thanks!
Regarding low-hanging fruit for performance: I was having trouble outputting a few thousand rows with roughly 30-40 columns each in any sort of reasonable time. I narrowed it down, and it appears the performance problems come from changes made in this commit (fixing a missing hyperlink): 6e1475a#diff-ec4216644fd4919ab5c2a68efb70d453

That change causes the entire spreadsheet data structure to be fully rebuilt every time cell data is set, instead of appending new rows only when needed. I haven't had time to dive into exactly what that commit fixed, which is why I'm not submitting a pull request right now, but if I use a version of completeRow similar to the one prior to that commit, my time for xlsx generation drops from upwards of a couple of minutes to about 3 seconds with identical output (20,000 rows, 30 columns). It still slows down quite a bit at higher row counts, but it's a target for massive improvement in the lower ranges as a starting point.
why?
…. Related issue: #29. The benchmark report for the current version of this library is shown on the wiki page.
+1 to this. Is there a way to stream the result row by row? It took me around 5-10 minutes to create and download a 2 MB xlsx file with 1,000+ rows.
Hey @xuri, any update?
@Ryouku Yes, I have optimized memory usage when parsing large files. Please upgrade the library to the latest version.
Thanks! Will give it a shot.
Hi @xuri, I've gone through the multiple GitHub issues related to performance and memory management, but I couldn't figure out whether the following is already possible. Situation: I have to write more than 100,000 rows. Does this functionality exist, and could you give a snippet demonstrating it? It would also be a nice addition to the standard documentation.
@xuri Can you check my comment above?
@pjmuller maybe https://github.com/eaciit/hoboexcel will work for your needs? The lib uses buffered input/output and is custom-tailored to handle large, simple xlsx files.
This is really causing issues... 😢 Memory usage goes through the roof. Any updates?
Any updates? |
Hi all, @Ryouku, @beezir, @mewben, @pjmuller, @kharism, @cemremengu, @duffiye Sorry for my late reply. I have added a stream writer for generating a new worksheet with huge amounts of data. |
Hi,
would it be possible to have something like stream-writing to a file for the case of lots of rows, e.g. when hitting the Excel sheet row limit?
I'm not so familiar with Go's I/O, but maybe there is no problem achieving this?
For example, I have millions of rows to put into Excel sheets, which is obviously a memory-heavy task. The idea is to read a slice of rows from the data source, say 100,000 at a time, then write them to the Excel file in a loop until all rows have been written. This would produce either one Excel file with a few sheets or a few Excel files.
The primary idea is to keep memory consumption at a reasonable amount of RAM, like a few gigabytes rather than tens.
What's your opinion?