Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only first sheet is returning when working with Huge data sets #563

Open
gtd57 opened this issue Feb 18, 2017 · 12 comments

Comments

Projects
None yet
4 participants
@gtd57
Copy link

commented Feb 18, 2017

While working or bigger data sets( tested with 2000 cols X 1000 rows) with multiple sheets. In that case, after certain MB, (in my case 40MB), sheet_to_row_object_array is only returning first sheet.

Is it a limitation? Are there any alternatives to get all the sheets regardless of files size?

@reviewher

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2017

sheet_to_row_object_array, as the name suggests, takes a single worksheet as input. If you want to convert the second or third worksheet in a workbook you need to read the workbook first, then get the sheet name you want from the SheetNames array, then pull from the Sheets object. For example in node, if you want to convert the third worksheet:

var XLSX = require('xlsx')
var wb = XLSX.readFile('test.xlsx');
var name_of_third_worksheet = wb.SheetNames[3-1]; // SheetNames is an array of strings
var ws = wb.Sheets[name_of_third_worksheet];
var output = XLSX.utils.sheet_to_row_object_array(ws);

The browser demo iterates across the sheet names in a similar manner:

	workbook.SheetNames.forEach(function(sheetName) {
		var roa = X.utils.sheet_to_row_object_array(workbook.Sheets[sheetName]);
		if(roa.length > 0){
			result[sheetName] = roa;
		}
	});

If you have an example file where you can read one worksheet but not another, then there's probably a parsing issue with that specific sheet. If you could send us a sample, we'd be glad to take a look. Alternatively, if you can enable dev mode (set the option WTF:1 in the read or readFile options object) you may see a concrete error message.

@gtd57

This comment has been minimized.

Copy link
Author

commented Feb 18, 2017

@reviewher Thanks for the quick reply.

continue with my above reply... Actually, the problem is with. let excelData = XLSX.read("<some file>", {type: 'binary'});, then calling excelData.Sheets not returning the object of all sheets, its only returning object with one key which is first sheet name and its data.

Just to mention there are no errors in the sheet. And there is no special data to share with. Just dummy set of columns and rows are fine to test.

One last thing. While doing the testing js-xlsx is not returning any errors.(I will try with WTF:1 option set, and share the details)

@gtd57

This comment has been minimized.

Copy link
Author

commented Feb 22, 2017

Hi @reviewher, any alternative for this? any help is appreciated.

@reviewher

This comment has been minimized.

Copy link
Contributor

commented Feb 22, 2017

@gtd57 what does excelData.SheetNames show?

If you can share a file then we can take a look. If you don't want to upload to github you can email a file to dev at sheetjs dot com and someone can take a look

@gtd57

This comment has been minimized.

Copy link
Author

commented Feb 24, 2017

As per SheetJS/js-xlsx API, excelData.SheetNames must contain sheet name as key for each sheet and it data as value, However in my case it is only showing one key and value of first sheet data only.

I can send you the data, but its is huge file above 50MB. However, that sheet does not contain any special data, You can create data on your side, I just popped some, one word strings in the first column and first row and rest are just numbers and in total I have used 2000 cols in 1200 rows and total of 3 sheets. My requirement is upto 8 sheets

@SheetJSDev

This comment has been minimized.

Copy link
Contributor

commented Feb 24, 2017

Hello @gtd57 ! Just for you I ran a test: https://gist.github.com/SheetJSDev/205d9d9c14c5203e602da70ed57e0a43

It generates an XLSX workbook with 8 sheets. Each sheet happens to be the same but it doesn't affect the nature of the issue. The worksheet is 2000 columns and 1000 rows. You can run it to generate and to verify.

On a side note: the files are massive! without compression the generated file was 475MB. With compression it was a more manageable 98M, both well beyond the 50M limit.

I tried using the node command line tool to read the data. Turns out you hit a memory snag with the default settings, so you need to override:

$ node --max-old-space-size=4096 bin/xlsx.njs sheetjs.xlsx SheetJSA

After waiting a minute and 15 seconds, finally saw the CSV output!

So if it's only showing you one sheet, I'm guessing there's an issue with the other sheets. I'd like to take a look. If you can post the file to a hosting service like Dropbox, you can email me a link and I can download and take a look.

@mtharrison

This comment has been minimized.

Copy link

commented Feb 26, 2017

I'm getting the same or similar issue. I have a workbook with 7 sheets. SheetNames contains all 7 sheetnames but one of the sheets is missing altogether from Sheets. The missing sheet has 382,000 rows. Do you think I'm hitting some kind of limit in xlsx? Do you have any suggested workaround or if it sounds like I bug can I help to debug?

@SheetJSDev

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2017

@mtharrison The limit is dictated by memory, and honestly you would be able to tell in either case when you hit the memory limit (chrome shows "oh snap" and node gives you a long message about GC). Assuming you don't see that, there's something in the file causing trouble.

The simplest thing you could do is pass the option WTF:1 to the reader:

  • If you are using node, and if you installed xlsx globally, you can just run xlsx --dev --perf file_name the --dev enables all errors and the --perf tells the xlsx command not to output anything

  • If you are using XLSX.read, the second argument is the options argument, so just add WTF:1 there. For example, in the browser using the index.html demo in the repo, add that key to the object {type: 'binary'} and {type: 'base64'}

  • If you are using XLSX.readFile, just pass a second argument with {WTF:1} or add that key to the file. For example, the example in the README https://github.com/SheetJS/js-xlsx/#parsing-workbooks would be var workbook = XLSX.readFile('test.xlsx', {WTF:1});

That option will expose any sheet-specific errors. You should get an error message that you can share.

I'd like to see a file if you can share it. Since it probably would exceed the github or email limit, please post somewhere (like dropbox) and email or share a link.

@mtharrison

This comment has been minimized.

Copy link

commented Feb 26, 2017

Thanks @SheetJSDev for the super-quick response. I tried with the debug option and the error I get is:

Error: Uncaught error: "toString()" failed
    at Buffer.toString (buffer.js:495:11)
    at getdatastr (/REDACTED/node_modules/xlsx/xlsx.js:1318:72)
    at getdata (/REDACTED/node_modules/xlsx/xlsx.js:1336:95)
    at getzipdata (/REDACTED/node_modules/xlsx/xlsx.js:1352:19)
    at safe_parse_ws (/REDACTED/node_modules/xlsx/xlsx.js:12165:26)
    at parse_zip (/REDACTED/node_modules/xlsx/xlsx.js:12268:3)
    at read_zip (/REDACTED/node_modules/xlsx/xlsx.js:12408:9)
    at Object.readSync [as read] (/REDACTED/node_modules/xlsx/xlsx.js:12420:21)

Looks like I'm probably hitting V8's max string size. I logged the buffer size out before toString() is called on it and it's 356mb.

@SheetJSDev

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2017

@mtharrison you caught me in a late night session!

If you are running in a recent version of node, can you try calling node with --max-old-space-size=4096? To do this against the global install, run

$ node --max-old-space-size=4096 `which xlsx` filename

If you are running something like node main.js normally, just add the flag:

$ node --max-old-space-size=4096 your_script
@SheetJSDev

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2017

@mtharrison scratch that.

The nodejs string limit is 16 bytes shy of 256MB: nodejs/node#3175 . I thought the previous write test case would cover it, but the underlying xml files are 60M.

I'm going to resolve this in the next version by trying to catch that particular error and give more informative output.

Can you do one last thing: try saving that file as XLSB in Excel and seeing if you hit the same problem? The XLSB parser keeps data as buffers so it theoretically should not trigger a 256MB string conversion

@mtharrison

This comment has been minimized.

Copy link

commented Feb 26, 2017

@SheetJSDev Using an XLSB does indeed sidestep that issue! Thanks a lot for the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.