Improve performance #19

Open
peetersn opened this Issue Feb 25, 2014 · 2 comments

Comments

Projects
None yet
2 participants

Hi,

Quick question regarding this awesome plugin.
I'm using the following approach:

new WebXlsxExporter().with {
    setResponseHeaders(response)
    fillHeader(headers)
    fillRow(["aaa", "bbb", 13, new Date()], 1)
    save(response.outputStream)
}

... but the problem is that the amount of rows is big (20000). So the file generation takes a while (20 seconds or more), the generated file is 2MB+.
Since we're running on Heroku, the connection is reset by the Heroku Router mesh after 30 seconds. I was wondering what I could do to improve performance:

  1. Would using a template help at all?
  2. Is there anyway I can parallelize the implementation?
  3. Can I send a chuncked or streaming response?
  4. Can I zip the file?
  5. Would reverting to Apache POI bring any performance gain?

The alternative is to generate the file in the background using some Groovy async support, but I'm not quite sure what to use on the front end to "poll the status".
Any recommendation what I can do with this is approach is most welcome.

Collaborator

jakubnabrdalik commented Mar 2, 2014

As for idea from question 5: For such a big excels, you could go with SXSSF (pure Apache POI has 3 modes of operation, I'm not using the most efficient in terms of speed or memory in this plugin): http://poi.apache.org/spreadsheet/index.html

"SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk. "

You could also add support for SXSSF to this plugin and I'd be happy to merge. It doesn't look difficult.

As for the rest of questions:

  1. Not much
  2. Sure, you can fire it as a background job (quartz, Spring @async, or GPars, to keep it simple and not deal with threads directly), save the file to disk, and return it after it's created. Just ask from the html page for the file once in a while (AJAX) to bypass Heroku timeout. I don't know what js libraries you use, but it's simple.
  3. Theoretically yes, practically not really. I'd go with other options.
  4. Yes, it's actually already a zip, but it's not being send as zip AFAIK. Take a look at FileManipulationAbility.save(OutputStream outputStream). Last thing I do is close the zip package (XlsxExporter.closeZipPackageIfPossible). You can experiment with moving workbook.write(outputStream) after closeZipPackageIfPossible. Not sure what happens then, though - let me know what you find.

peetersn commented Mar 2, 2014

Thanks for the pointers and the detailed answers. I'll definitely look into SXSSF. If I am reaching any significant improvement, I'll see if I can port some of the code to your plugin, that could be handy.

I have done some work with 2. I'll push it to a Github repo soon and will let you know.
I'll check what happens with 4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment