
Not able to get right Sheets from huge xlsx files #1010

Closed
codingleo opened this issue Feb 28, 2018 · 4 comments


codingleo commented Feb 28, 2018

I'm trying to read the data from a huge file (800k rows) and insert it into a database via an AWS Lambda function.
To do that I'm fetching the xlsx file from S3 as a buffer and reading it:

module.exports.getSalesData = new Promise((resolve, reject) => {
  getFileFromS3(filename)
    .then(function (workbook) {
      console.log(workbook.SheetNames[1]); // 'sales'
      console.log(workbook.SheetNames); // [ 'main', 'sales', 'Sheet1' ]
      console.log(Array.isArray(workbook.SheetNames)); // true
      console.log(typeof workbook.SheetNames); // 'object'
      console.log(Object.keys(workbook.Sheets)); // [ 'main', 'Sheet1' ] -- why is 'sales' not here?

      var sheet_name = workbook.SheetNames[1]; // 'sales' tab
      var json_sheet = XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name], { raw: true });
      resolve(json_sheet);
    })
    .catch(err => {
      console.log('File ' + filename + ' does not exist on S3, or you are not connected to the internet.');
      reject(err); // without this, the promise never settles on failure
    });
});

The issue is that Object.keys(workbook.Sheets) should also include 'sales', right?
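A quick way to confirm which declared sheets failed to parse (a hypothetical helper, not part of the SheetJS API) is to diff SheetNames against the keys of Sheets:

```javascript
// List sheet names declared in SheetNames but absent from workbook.Sheets;
// those are the sheets that did not survive parsing.
function missingSheets(workbook) {
  return workbook.SheetNames.filter(function (name) {
    return !Object.prototype.hasOwnProperty.call(workbook.Sheets, name);
  });
}

// Example with a mock workbook shaped like SheetJS output:
var wb = {
  SheetNames: ['main', 'sales', 'Sheet1'],
  Sheets: { main: {}, Sheet1: {} }
};
console.log(missingSheets(wb)); // [ 'sales' ]
```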

Then I try to get the number of rows (already converted to JSON) like this:

getSalesData.then(function (data) {
    console.log(data.length + ' rows');
    console.log(data[0]);
  }).catch(err => console.error(err));

Where the parameter data is the json_sheet defined in the function above.
So for data.length (number of rows) I get 0 instead of 800k+.
And, of course, I'm unable to get data[0] which is undefined.

PS: the file is 57.3 MB. Not sure if that's the cause.

Thanks in advance for help.

@reviewher
Contributor

You're likely hitting the node / V8 256MB string limit. You can check by renaming to .ZIP, extracting, and checking the size of the constituent sheet#.xml files.

Can you open the file, save as XLSB or XLS, and see if the file is processed?

@codingleo
Author

I've tried with XLSB and got:

--- Last few GCs --->

[43577:0x104001000]   272189 ms: Mark-sweep 1357.1 (1437.5) -> 1357.1 (1438.0) MB, 6563.3 / 0.0 ms  allocation failure GC in old space requested
[43577:0x104001000]   280254 ms: Mark-sweep 1357.1 (1438.0) -> 1357.1 (1407.0) MB, 8064.3 / 0.0 ms  last resort
[43577:0x104001000]   290596 ms: Mark-sweep 1357.1 (1407.0) -> 1357.1 (1407.0) MB, 10342.7 / 0.0 ms  last resort


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2f9f4991cef1 <JSObject>
    1: ws_parse [/Users/leoribeiro/Workspace/GregMorrisLambda/node_modules/xlsx/xlsx.js:~12739] [pc=0x133a0174e81a](this=0x1f80ab809dc1 <JSGlobal Object>,val=0x29ae72e24439 <JSArray[3]>,R_n=0xf58521aeb51 <String[11]: BrtCellIsst>,RT=7)
    2: recordhopper [/Users/leoribeiro/Workspace/GregMorrisLambda/node_modules/xlsx/xlsx.js:~2573] [pc=0x133a01736b28](this=0x1f80ab809dc1 <JSGlobal Object>,data=...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [/usr/local/bin/node]
 2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/usr/local/bin/node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [/usr/local/bin/node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/local/bin/node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 6: v8::internal::HashTable<v8::internal::StringTable, v8::internal::StringTableShape, v8::internal::HashTableKey*>::New(v8::internal::Isolate*, int, v8::internal::MinimumCapacity, v8::internal::PretenureFlag) [/usr/local/bin/node]
 7: v8::internal::HashTable<v8::internal::StringTable, v8::internal::StringTableShape, v8::internal::HashTableKey*>::EnsureCapacity(v8::internal::Handle<v8::internal::StringTable>, int, v8::internal::HashTableKey*, v8::internal::PretenureFlag) [/usr/local/bin/node]
 8: v8::internal::StringTable::LookupKey(v8::internal::Isolate*, v8::internal::HashTableKey*) [/usr/local/bin/node]
 9: v8::internal::StringTable::LookupString(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>) [/usr/local/bin/node]
10: v8::internal::LookupIterator::LookupIterator(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::LookupIterator::Configuration) [/usr/local/bin/node]
11: v8::internal::LookupIterator::PropertyOrElement(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, bool*, v8::internal::LookupIterator::Configuration) [/usr/local/bin/node]
12: v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::LanguageMode) [/usr/local/bin/node]
13: v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
14: 0x133a015840dd
15: 0x133a0174e81a

...Then I tried with XLS and again got a row count of 0.

...
console.log(workbook.SheetNames[0]);
console.log(workbook.SheetNames);
console.log(workbook.Sheets[sheet_name])
...

Console:

sales
[ 'sales' ]
undefined
0 rows
undefined

I'll try converting to a CSV file and splitting it into several smaller files; maybe that will work.

@HonzaMac

I would suggest increasing the maximum memory for Node.js via this environment variable:
NODE_OPTIONS=--max-old-space-size=16384 (16 GB, my RAM size)
This helped me, but what I discovered later is that this library hits the Node.js limit of 256 MB maximum string size.

@codingleo
Author

Just FYI: what I ended up doing was converting the Excel files to CSV and importing them directly with MySQL.

Thanks for the help and sorry for keeping this open for so long.
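For reference, the CSV route sidesteps the string limit because a CSV can be consumed in chunks rather than as one giant string. A minimal row-counting sketch (naive: it treats every newline as a row terminator, so quoted fields containing newlines would be over-counted):

```javascript
// Folds one chunk into a running newline count. Pure, so arbitrarily large
// files can be processed chunk-by-chunk without holding the whole file.
function countNewlines(chunk, runningTotal) {
  var n = runningTotal;
  for (var i = 0; i < chunk.length; i++) {
    if (chunk.charCodeAt(i) === 10) n++; // '\n'
  }
  return n;
}

// Streaming usage (sketch):
// var fs = require('fs');
// var total = 0;
// fs.createReadStream('sales.csv', { encoding: 'utf8' })
//   .on('data', function (c) { total = countNewlines(c, total); })
//   .on('end', function () { console.log(total + ' rows'); });
```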
