
generating 100k rows in excel crashes the browser #18

Closed
rivlinehud opened this issue Feb 28, 2017 · 15 comments
@rivlinehud

Hi :-) I love this project!

Is it possible to stream the file, instead of inflating the browser's cache?

I attached some code below.

thanks a lot :-)

```js
function doExport(page, options) {
  return generate(page, options)
    .then(function (blob) {
      if (window.navigator && window.navigator.msSaveOrOpenBlob) {
        // IE
        window.navigator.msSaveOrOpenBlob(blob, page + ".xlsx");
      } else {
        // Not IE
        var url = window.URL.createObjectURL(blob);
        var a = document.createElement("a");
        document.body.appendChild(a);
        a.href = url;
        a.download = page + ".xlsx";
        a.click();
        window.URL.revokeObjectURL(url);
        document.body.removeChild(a);
      }
    })
    .catch(function (err) {
      alert(err.message || err);
      throw err;
    });
}

function generate(page, options) {
  options = options || {};

  options.fileName = options.fileName || page;
  options.headers = BTB.pages[page].exportData.headers;
  options.fields = BTB.pages[page].exportData.fields;
  options.data = BTB.pages[page].exportData.rows;
  options.lang = options.lang || BTB.language.get();

  // Prepare headers
  var translatedHeaders = [];
  options.headers.forEach(function (item) {
    var header = BTB.language.dictionary[item] ? BTB.language.dictionary[item][options.lang] : "";
    translatedHeaders.push(header);
  });
  options.headers = [translatedHeaders];

  // Prepare data structure for Excel
  var newData = [];
  var newDataLine = [];

  options.data.forEach(function (item) {
    newDataLine = [];
    options.fields.forEach(function (field) {
      var newItem = item[field] || "";
      newDataLine.push(newItem);
    });
    newData.push(newDataLine);
  });

  // Duplicate the first row to reach 100k rows
  for (var i = 0; i <= 100000; i++) {
    newData.push(newData[0]);
  }

  options.newData = newData;

  return getWorkbook(options.lang)
    .then(function (workbook) {
      workbook.sheet(0).range("A1:Z1").value(options.headers);
      workbook.sheet(0).range("A2:Z100000").value(options.newData);
      return workbook.outputAsync();
    });
}

function getWorkbook(lang) {
  lang = lang || "En";
  return new Promise(function (resolve, reject) {
    var req = new XMLHttpRequest();
    var url = "/templates/template" + lang + ".xlsx";
    req.open("GET", url, true);
    req.responseType = "arraybuffer";
    req.onreadystatechange = function () {
      if (req.readyState === 4) {
        if (req.status === 200) {
          resolve(XlsxPopulate.fromDataAsync(req.response));
        } else {
          reject("Received a " + req.status + " HTTP code.");
        }
      }
    };
    req.send();
  });
}
```

@dtjohnson
Owner

At which point is it crashing? Which browser(s)?

Could you provide a full working code example as an attachment that I could run?

@rivlinehud
Author

Hi ... here is a repository with the project.
Hope it's helpful.

I'm developing on Chrome, and the browser memory inflates at line 55 of export.js.

thanks :-)

@dtjohnson
Owner

Thanks. The problem is that in line 55 you are setting the value for 2 million cells. Whenever you set the value for a cell, an object gets created for it that looks something like this:

```js
{
    attributes: { r: "A1", s: "1", t: "s" },
    children: [{
        name: "v",
        children: [5]
    }]
}
```

Dumping a heap snapshot, it looks like that object takes around 600 bytes. Multiply that by 2 million and you run into the V8 heap limit. While I'm sure it's possible to find ways to minimize the memory, it's not easy and is unlikely to happen. The good news is that the limit in Chrome/Node.js does seem to be right about 2 million cells. That seems to be a healthy amount for most use cases.
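The arithmetic behind that limit can be sketched quickly. A back-of-envelope estimate, assuming the ~600 bytes/cell heap-snapshot figure above and the 26-column A:Z range from the example code:

```javascript
// Back-of-envelope memory estimate for the export above.
// Assumptions: ~600 bytes per cell object (the heap-snapshot figure
// quoted in this thread) and 26 columns (the A:Z range in the code).
var BYTES_PER_CELL = 600;
var ROWS = 100000;
var COLS = 26;

var cells = ROWS * COLS;                                 // 2.6 million cells
var estimatedMB = (cells * BYTES_PER_CELL) / (1024 * 1024);

console.log(cells + " cells, ~" + Math.round(estimatedMB) + " MB of cell objects");
```

That lands close to the default V8 heap limit before the file-writing steps even start, which matches the crash described here.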

@rivlinehud
Author

I see ...
Is there a way to stream the file in order to free the browser memory?

@dtjohnson
Owner

Nope. The file has to be held in memory in the browser. Do you need more than 2 million cells?

@reviewher

@dtjohnson @rivlinehud there's actually a lower limit to be aware of: V8 strings cannot be larger than 256 MB nodejs/node#3175 (comment)

@rivlinehud
Author

@dtjohnson, yep ... I need the capability to export large stock files.
If I can't do it in the browser I will have to do it on the server, which is less convenient
because it splits the code between client and server.

@dtjohnson
Owner

@reviewher, thanks for that. I wasn't aware of the string limit. That's not the issue in this particular problem, but I'm sure I'll run into it at some point.

@rivlinehud, how many cells do you need? We should be able to get to 4 million cells without too much difficulty...

@rivlinehud
Author

@dtjohnson, 4 million cells would be amazing!!! :-)

@dtjohnson
Owner

OK. So this is going to be harder than I thought. With the current setup you are actually limited to about 650k cells if you want to output a file. While you can certainly get 2M cells in memory, the downstream steps to turn it into a file consume additional memory. I do think 3-4 million cells should be possible, but it will require a lot more effort.

Here is the basic process the xlsx-populate uses:

  1. Opens the file using JSZip. A file with 4 million cells takes only 120kB uncompressed, so I don't think there are any memory concerns at this stage.
  2. Uses sax js to stream the XML text in the files into a corresponding object model. Since sax js already streams and the final object model must be held entirely in memory anyway, I don't think there are any memory savings to find here.
  3. As you manipulate the workbook, xlsx-populate manipulates the object model. Since the point of xlsx-populate is to be minimally invasive and fairly ignorant of the full Open XML spec, there is a limit to what we can do here to cut memory. One obvious big saving is to eliminate child arrays when there is only a single child. That's pretty easy to do and seems to cut memory usage by 40% or so.
  4. To write the file, it must first convert the object model back to XML. We currently use xmlbuilder-js to create an XML document and then convert it to a string. The issue is that the intermediate XML document is itself pretty large and needs to be held in memory at the same time as the object model. To save memory, we'd need to eliminate this step and go straight from the object model to an XML string. It's definitely doable, but non-trivial.
  5. Uses JSZip to zip all of the strings into an XLSX file. Again, the size of the strings/zips is not significant, so I don't think there is a concern here.

So to summarize, I do think it's possible to get to 3-4 million cells, but there's a decent amount of work to get there. You might be better off looking at another library like Sheet JS for the time being.
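For illustration, the step-4 optimization amounts to serializing the node tree straight to a string instead of building an intermediate XML document. A minimal sketch, assuming the `{ name, attributes, children }` node shape shown earlier in this thread; this is not xlsx-populate's actual implementation, and real-world concerns (namespaces, full escaping rules) are omitted:

```javascript
// Minimal sketch: serialize a { name, attributes, children } node tree
// directly to an XML string, with no intermediate XML document.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toXmlString(node) {
  // Text/number leaves are emitted as escaped character data.
  if (typeof node !== "object") return escapeXml(node);

  var xml = "<" + node.name;
  var attributes = node.attributes || {};
  Object.keys(attributes).forEach(function (key) {
    xml += " " + key + '="' + escapeXml(attributes[key]) + '"';
  });

  var children = node.children || [];
  if (!children.length) return xml + "/>"; // self-closing element

  xml += ">";
  children.forEach(function (child) {
    xml += toXmlString(child);
  });
  return xml + "</" + node.name + ">";
}

// Example: the cell node from the earlier comment
var cell = {
  name: "c",
  attributes: { r: "A1", s: "1", t: "s" },
  children: [{ name: "v", children: [5] }]
};
console.log(toXmlString(cell)); // <c r="A1" s="1" t="s"><v>5</v></c>
```

The saving comes from the output being a single growing string: at no point do the object model and a second full document tree coexist in memory.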

@dtjohnson
Owner

OK. So I just published v1.2.0, which is as far as I'm going to take this for now. I shaved the memory of the data structure a bit and rewrote the XML builder from scratch, which saved a huge amount of memory. Right now you can get just over 2 million cells in Chrome. Now CPU is the big bottleneck. Also, performance in IE is terrible in general, and it doesn't fare well here either. I'm hesitant to push performance much further at this stage; I'd rather wait for more completeness of the API first. So hopefully this is good enough for you.

@rivlinehud
Author

Thanks so much!! :-)

I will do some tests soon and let you know my experience with the upgrade.

@dtjohnson
Owner

I just published v1.6.0 that has much better performance. In tests in V8 I was able to generate 9.5 million cells.

@unchewyglees

Amazing! I've tried a few libraries that would all run out of memory and crash the browser before finding this one. This was able to do approximately 3.75 million cells in a file I just generated. Great work!

@arthanariramesh

Is there a way to overcome this memory crash issue? I am trying to load a 60 MB xlsx.
