
Write to file as stream row by row? #29

Closed
Ryouku opened this issue Mar 10, 2017 · 15 comments
Labels: enhancement (New feature or request)

Comments

@Ryouku commented Mar 10, 2017

Hi,
would it be possible to have something like stream-writing to a file for the case of lots of rows? Like when hitting the Excel sheet row limit?

I'm not very familiar with Go's IO, but maybe there is no problem achieving this?

For example, I have millions of rows to put into Excel sheets, which is obviously a memory-heavy task. The idea is to read a slice of rows from the data source, say 100,000 at a time, then write them to the Excel file in a loop until all rows are written. That would produce either one Excel file with a few sheets or a few Excel files.

The primary idea is to keep memory consumption at a reasonable amount of RAM, a few gigabytes rather than tens.

What's your opinion?
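The batching pattern described above might look like the following sketch with today's excelize API. This is a hypothetical illustration, not part of the thread or the library: fetchRows stands in for the real data source, and the sheet-rollover logic is one possible way to stay under Excel's 1,048,576-row-per-sheet limit.

```go
package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

const (
	batchSize   = 100000  // rows pulled from the data source per iteration
	maxSheetRow = 1048576 // Excel's per-sheet row limit
)

// fetchRows is a hypothetical data-source callback: it returns the next
// batch of up to n rows, or an empty slice once the source is exhausted.
func fetchRows(n int) [][]interface{} { return nil }

func main() {
	f := excelize.NewFile()
	sheet, sheetNum, row := "Sheet1", 1, 1
	for batch := fetchRows(batchSize); len(batch) > 0; batch = fetchRows(batchSize) {
		for _, r := range batch {
			if row > maxSheetRow { // roll over to a fresh sheet at the limit
				sheetNum++
				sheet = fmt.Sprintf("Sheet%d", sheetNum)
				f.NewSheet(sheet)
				row = 1
			}
			cell, _ := excelize.CoordinatesToCellName(1, row)
			f.SetSheetRow(sheet, cell, &r)
			row++
		}
	}
	if err := f.SaveAs("big.xlsx"); err != nil {
		fmt.Println(err)
	}
}
```

Note that with the in-memory API this only bounds how much is read from the data source at once; the worksheet itself still lives in memory, which is the problem the stream writer added later in this thread addresses.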

xuri added the enhancement label Mar 10, 2017
@xuri (Member) commented Mar 10, 2017

@Ryouku Thanks for your issue. This is a performance problem that can be linked with issues #20 and #26. Next, I would like to design a streaming interface to solve it.

@Ryouku (Author) commented Mar 10, 2017

Thanks!

@beezir commented Mar 30, 2017

Regarding low-hanging fruit for performance: I was having trouble outputting a few thousand rows with roughly 30-40 columns each in any sort of reasonable time. I narrowed it down, and it appears the performance problems come from changes made in this commit (fixing a missing hyperlink): 6e1475a#diff-ec4216644fd4919ab5c2a68efb70d453

That change causes the entire spreadsheet data structure to be fully rebuilt every time cell data is set, instead of appending new rows only when needed. I haven't had time to dive into exactly what that commit fixed, which is why I'm not submitting a pull request right now, but if I use a version of completeRow similar to the one prior to that commit, my time for xlsx generation goes from upwards of a couple of minutes to about 3 seconds with identical output (20,000 rows, 30 columns). It still slows down quite a bit at higher row counts, but it's a target for massive improvement in the lower ranges as a starting point.
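To make the difference concrete, here is a simplified, hypothetical sketch of the two strategies (illustrative Go, not excelize's actual completeRow code): appending only the missing rows costs O(new rows) per call, while rebuilding the whole structure on every cell set costs O(existing rows) per call, which compounds to quadratic time over a large sheet.

```go
package main

import "fmt"

// Row is a stand-in for the worksheet row structure in this illustration.
type Row struct {
	Index int
	Cells []string
}

// completeRowAppend grows the slice only until the target row exists;
// rows that are already present are never touched again.
func completeRowAppend(rows []Row, target int) []Row {
	for len(rows) < target {
		rows = append(rows, Row{Index: len(rows) + 1})
	}
	return rows
}

// completeRowRebuild reallocates and copies every existing row on every
// call, the behavior that makes per-cell writes progressively slower.
func completeRowRebuild(rows []Row, target int) []Row {
	rebuilt := make([]Row, 0, target)
	for i := 1; i <= target; i++ {
		if i <= len(rows) {
			rebuilt = append(rebuilt, rows[i-1])
		} else {
			rebuilt = append(rebuilt, Row{Index: i})
		}
	}
	return rebuilt
}

func main() {
	var rows []Row
	for r := 1; r <= 20000; r++ {
		rows = completeRowAppend(rows, r) // swap in completeRowRebuild to feel the slowdown
	}
	fmt.Println("rows:", len(rows))
}
```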

@ghost commented Mar 30, 2017 via email

xuri added a commit that referenced this issue Mar 31, 2017
…. Relate issue #29. The benchmark report of the current version of this library is shown on the wiki page.
@xuri (Member) commented Mar 31, 2017

@beezir Thanks for your comments. I have made some optimizations and removed redundant XML element-checking logic. The benchmark report of the current version of this library is shown on the wiki page.

@mewben commented Jun 5, 2017

+1 to this. Is there a way to send the result row by row? I experienced around 5-10 minutes creating and downloading a 2MB xlsx file with 1000+ rows.

@Ryouku (Author) commented Jul 4, 2017

Hey @xuri, any update?

@xuri (Member) commented Jul 4, 2017

@Ryouku Yes, I have optimized the memory usage when parsing large files. Please upgrade the library to the latest version.

@Ryouku (Author) commented Jul 4, 2017

Thanks! Will give it a shot.

@pjmuller commented Oct 13, 2017

Hi xuri,

I've gone through the multiple GitHub issues related to performance / memory management. However, I could not figure out whether the following is already possible.

Situation: having to write > 100,000 rows.
Solution A: have an io interface which periodically writes to disk, so that memory consumption does not go through the roof.
Solution B: write to different xlsx files, and then have a way to merge them together (but again, without having to use big amounts of memory).

Does this exist, and could you give a snippet demonstrating it? It would also be nice to have in the standard documentation.
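A sketch of Solution B with excelize's in-memory API might look like this. writeChunk and the inline sample data are hypothetical, and merging the parts back into one workbook would still need a separate pass; Solution A is essentially what the stream writer announced later in this thread provides.

```go
package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

// writeChunk is a hypothetical helper: each chunk of rows becomes its own
// xlsx file, so memory holds only one chunk at a time.
func writeChunk(part int, rows [][]interface{}) error {
	f := excelize.NewFile()
	for i, row := range rows {
		cell, _ := excelize.CoordinatesToCellName(1, i+1)
		if err := f.SetSheetRow("Sheet1", cell, &row); err != nil {
			return err
		}
	}
	return f.SaveAs(fmt.Sprintf("part-%03d.xlsx", part))
}

func main() {
	// Placeholder batches; a real caller would pull these from the data source.
	chunks := [][][]interface{}{
		{{"a", 1}, {"b", 2}},
		{{"c", 3}, {"d", 4}},
	}
	for i, rows := range chunks {
		if err := writeChunk(i+1, rows); err != nil {
			fmt.Println(err)
			return
		}
	}
}
```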

@pjmuller commented

@xuri Can you check my comment above?

@kharism commented Mar 5, 2018

@pjmuller Maybe https://github.com/eaciit/hoboexcel will work for your needs? The lib uses buffered input/output and is custom-tailored to handle large, simple xlsx files.

@cemremengu commented

This is really causing issues... 😢 Memory usage goes through the roof.

Any updates?

@duffiye commented Sep 29, 2018

Any updates?

@xuri (Member) commented Dec 10, 2019

Hi all (@Ryouku, @beezir, @mewben, @pjmuller, @kharism, @cemremengu, @duffiye), sorry for my late reply. I have added a stream writer for generating a new worksheet with huge amounts of data.
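For reference, a minimal usage sketch of the stream writer, assuming the current github.com/xuri/excelize/v2 module path (the import path at the time of this comment was different). NewStreamWriter, SetRow, and Flush are the relevant calls, and rows must be written in ascending order:

```go
package main

import (
	"fmt"

	"github.com/xuri/excelize/v2"
)

func main() {
	f := excelize.NewFile()
	// The stream writer spools worksheet data to a temporary file instead of
	// keeping it in memory, so row count no longer drives RAM usage.
	sw, err := f.NewStreamWriter("Sheet1")
	if err != nil {
		fmt.Println(err)
		return
	}
	for rowID := 1; rowID <= 1000000; rowID++ {
		row := make([]interface{}, 10)
		for colID := range row {
			row[colID] = fmt.Sprintf("r%dc%d", rowID, colID+1)
		}
		cell, _ := excelize.CoordinatesToCellName(1, rowID)
		// Rows must be set in ascending order when streaming.
		if err := sw.SetRow(cell, row); err != nil {
			fmt.Println(err)
			return
		}
	}
	// Flush ends the streaming session before saving the workbook.
	if err := sw.Flush(); err != nil {
		fmt.Println(err)
		return
	}
	if err := f.SaveAs("Book1.xlsx"); err != nil {
		fmt.Println(err)
	}
}
```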
