
Not able to get right Sheets from huge xlsx files #1010

Closed
codingleo opened this issue Feb 28, 2018 · 4 comments


codingleo commented Feb 28, 2018

I'm trying to read the data from a huge file (800k rows) and insert it into a database via an AWS Lambda function.
To do that I'm fetching the xlsx file from S3 as a buffer and reading it:

module.exports.getSalesData = new Promise((resolve, reject) => {
  getFileFromS3(filename)
    .then(function (workbook) {
      console.log(workbook.SheetNames[1]); // 'sales'
      console.log(workbook.SheetNames); // [ 'main', 'sales', 'Sheet1' ]
      console.log(Array.isArray(workbook.SheetNames)); // true
      console.log(typeof workbook.SheetNames); // 'object'
      console.log(Object.keys(workbook.Sheets)); // [ 'main', 'Sheet1' ] -- why is 'sales' not here?

      var sheet_name = workbook.SheetNames[1]; // 'sales' tab
      var json_sheet = XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name], { raw: true });
      resolve(json_sheet);
    })
    .catch(err => {
      console.log('File ' + filename + ' does not exist on S3, or you are not connected to the internet.');
      reject(err); // without this, the promise never settles on failure
    });
});

The issue is that Object.keys(workbook.Sheets) should also include 'sales', right?
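A quick way to confirm which declared sheets failed to parse (a hypothetical helper, not part of the SheetJS API) is to diff SheetNames against the keys of Sheets:

```javascript
// List sheet names declared in SheetNames but absent from workbook.Sheets;
// those are the sheets that did not survive parsing.
function missingSheets(workbook) {
  return workbook.SheetNames.filter(function (name) {
    return !Object.prototype.hasOwnProperty.call(workbook.Sheets, name);
  });
}

// Example with a mock workbook shaped like SheetJS output:
var wb = {
  SheetNames: ['main', 'sales', 'Sheet1'],
  Sheets: { main: {}, Sheet1: {} }
};
console.log(missingSheets(wb)); // [ 'sales' ]
```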

Then I try to get the number of rows (already converted to JSON) like this:

getSalesData.then(function (data) {
    console.log(data.length + ' rows');
    console.log(data[0]);
  }).catch(err => console.error(err));

Where the parameter data is the json_sheet defined in the function above.
So for data.length (number of rows) I get 0 instead of 800k+.
And, of course, I'm unable to get data[0] which is undefined.

PS: the file is 57.3 MB. Not sure if that's the cause.

Thanks in advance for help.

@reviewher
Contributor

You're likely hitting the node / V8 256MB string limit. You can check by renaming to .ZIP, extracting, and checking the size of the constituent sheet#.xml files.

Can you open the file, save as XLSB or XLS, and see if the file is processed?

@codingleo
Author

I've tried with XLSB and got:

--- Last few GCs --->

[43577:0x104001000]   272189 ms: Mark-sweep 1357.1 (1437.5) -> 1357.1 (1438.0) MB, 6563.3 / 0.0 ms  allocation failure GC in old space requested
[43577:0x104001000]   280254 ms: Mark-sweep 1357.1 (1438.0) -> 1357.1 (1407.0) MB, 8064.3 / 0.0 ms  last resort
[43577:0x104001000]   290596 ms: Mark-sweep 1357.1 (1407.0) -> 1357.1 (1407.0) MB, 10342.7 / 0.0 ms  last resort


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2f9f4991cef1 <JSObject>
    1: ws_parse [/Users/leoribeiro/Workspace/GregMorrisLambda/node_modules/xlsx/xlsx.js:~12739] [pc=0x133a0174e81a](this=0x1f80ab809dc1 <JSGlobal Object>,val=0x29ae72e24439 <JSArray[3]>,R_n=0xf58521aeb51 <String[11]: BrtCellIsst>,RT=7)
    2: recordhopper [/Users/leoribeiro/Workspace/GregMorrisLambda/node_modules/xlsx/xlsx.js:~2573] [pc=0x133a01736b28](this=0x1f80ab809dc1 <JSGlobal Object>,data=...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [/usr/local/bin/node]
 2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/usr/local/bin/node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [/usr/local/bin/node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/local/bin/node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 6: v8::internal::HashTable<v8::internal::StringTable, v8::internal::StringTableShape, v8::internal::HashTableKey*>::New(v8::internal::Isolate*, int, v8::internal::MinimumCapacity, v8::internal::PretenureFlag) [/usr/local/bin/node]
 7: v8::internal::HashTable<v8::internal::StringTable, v8::internal::StringTableShape, v8::internal::HashTableKey*>::EnsureCapacity(v8::internal::Handle<v8::internal::StringTable>, int, v8::internal::HashTableKey*, v8::internal::PretenureFlag) [/usr/local/bin/node]
 8: v8::internal::StringTable::LookupKey(v8::internal::Isolate*, v8::internal::HashTableKey*) [/usr/local/bin/node]
 9: v8::internal::StringTable::LookupString(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>) [/usr/local/bin/node]
10: v8::internal::LookupIterator::LookupIterator(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::LookupIterator::Configuration) [/usr/local/bin/node]
11: v8::internal::LookupIterator::PropertyOrElement(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, bool*, v8::internal::LookupIterator::Configuration) [/usr/local/bin/node]
12: v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::LanguageMode) [/usr/local/bin/node]
13: v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
14: 0x133a015840dd
15: 0x133a0174e81a

...Then I tried with XLS and again got a row count of 0.

...
console.log(workbook.SheetNames[0]);
console.log(workbook.SheetNames);
console.log(workbook.Sheets[sheet_name])
...

Console:

sales
[ 'sales' ]
undefined
0 rows
undefined

I'll try converting to a CSV file and splitting it into several smaller files; maybe that will work.

@HonzaMac

I would suggest increasing the maximum memory for Node.js via this environment variable:
NODE_OPTIONS=--max-old-space-size=16384 (16 GB, my RAM size)
This helped me, but what I discovered later is that this library hits the Node.js limit of 256 MB maximum string size.

@codingleo
Author

Just FYI: what I ended up doing was converting the Excel files to CSV and importing them directly with MySQL.

Thanks for the help and sorry for keeping this open for so long.
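For reference, the CSV route sidesteps the string limit because a CSV can be consumed in chunks rather than as one giant string. A minimal row-counting sketch (naive: it treats every newline as a row terminator, so quoted fields containing newlines would be over-counted):

```javascript
// Folds one chunk into a running newline count. Pure, so arbitrarily large
// files can be processed chunk-by-chunk without holding the whole file.
function countNewlines(chunk, runningTotal) {
  var n = runningTotal;
  for (var i = 0; i < chunk.length; i++) {
    if (chunk.charCodeAt(i) === 10) n++; // '\n'
  }
  return n;
}

// Streaming usage (sketch):
// var fs = require('fs');
// var total = 0;
// fs.createReadStream('sales.csv', { encoding: 'utf8' })
//   .on('data', function (c) { total = countNewlines(c, total); })
//   .on('end', function () { console.log(total + ' rows'); });
```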
