Invalid HTML: could not find <table> #1110

Closed
kaanbursa opened this issue May 22, 2018 · 10 comments

@kaanbursa

kaanbursa commented May 22, 2018

EDIT: If you are seeing this error when trying to read data from an external resource (e.g. XHR, fetch, axios), check the response code of the request. The data might be a 404 page. If it is a 404, make sure the file exists and is available.

When building a project with CRA or other templates, spreadsheets must be placed in the public folder.
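For Create React App specifically, "public" is the literal folder name at the project root, and files placed there are served unchanged. A minimal sketch of building the request URL follows; the helper name `publicAssetUrl` is hypothetical, and it assumes the base path comes from `process.env.PUBLIC_URL`, which Create React App sets at build time.

```javascript
// Hypothetical helper: build the URL of a file placed in the project's public/ folder.
// `base` would typically be process.env.PUBLIC_URL in a Create React App build.
function publicAssetUrl(fileName, base) {
  // Drop trailing slashes so the result never contains "//".
  const prefix = (base || "").replace(/\/+$/, "");
  // Encode the file name so spaces and similar characters survive the request.
  return prefix + "/" + encodeURIComponent(fileName);
}
```

With a file at public/data.xlsx, the result of `publicAssetUrl("data.xlsx", process.env.PUBLIC_URL)` is what would be passed to fetch.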

Your code should defend against network issues by checking the status of the request. With fetch, the status code is available as response.status once the request resolves:

try {
  const response = await fetch(url);
  /* a missing file usually comes back as a small HTML 404 page */
  if(response.status == 404) throw new Error("404 File Not Found");
  const ab = await response.arrayBuffer();
  const workbook = XLSX.read(ab);
  /* DO SOMETHING WITH workbook HERE */
} catch(e) { /* error handling */ }
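A slightly more defensive variant could reject every non-2xx status through `response.ok` and also confirm that the payload does not start with "<" before parsing. This is only a sketch: the names `looksLikeMarkup` and `fetchWorkbook` are hypothetical, it assumes `XLSX` is already loaded, and it assumes the expected file is a binary spreadsheet rather than an HTML or XML one.

```javascript
// Hypothetical helper: true when the first byte after an optional UTF-8 BOM is "<" (0x3C),
// which is how a typical HTML error page begins. This is a heuristic, not a full parser.
function looksLikeMarkup(bytes) {
  let i = 0;
  // Skip a UTF-8 byte order mark if present.
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) i = 3;
  return bytes[i] === 0x3C;
}

// Hypothetical wrapper around fetch + XLSX.read; assumes a global XLSX object.
async function fetchWorkbook(url) {
  const response = await fetch(url);
  // response.ok is false for every status outside 200-299, not just 404.
  if (!response.ok) throw new Error("Request failed with status " + response.status);
  const ab = await response.arrayBuffer();
  if (looksLikeMarkup(new Uint8Array(ab))) {
    throw new Error("Response looks like HTML/XML, not a binary spreadsheet");
  }
  return XLSX.read(ab);
}
```

The content check matters because some servers answer a missing path with status 200 and a fallback HTML page, in which case the status check alone would pass.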

It usually succeeds, but sometimes I get this error and it crashes. Any idea?

if(!mtch) throw new Error("Invalid HTML: could not find <table>");
^

Error: Invalid HTML: could not find <table>
at html_to_sheet (/.../node_modules/xlsx/xlsx.js:17000:19)
at Object.html_to_book [as to_workbook] (/.../node_modules/xlsx/xlsx.js:17056:28)
at parse_xlml_xml (/.../node_modules/xlsx/xlsx.js:13714:26)
at parse_xlml (/.../node_modules/xlsx/xlsx.js:14362:53)
at readSync (/../node_modules/xlsx/xlsx.js:18394:21)
at Object.readFileSync (/.../node_modules/xlsx/xlsx.js:18412:9)
at Request._callback (/.../server/routes/api.js:137:24)
at Request.self.callback (/.../node_modules/request/request.js:186:22)
at emitTwo (events.js:106:13)
at Request.emit (events.js:191:7)
at Request.<anonymous> (/.../node_modules/request/request.js:1163:10)
at emitOne (events.js:101:20)
at Request.emit (events.js:188:7)
at IncomingMessage.<anonymous> (/.../node_modules/request/request.js:1085:12)
at IncomingMessage.g (events.js:291:16)
at emitNone (events.js:91:20)
at IncomingMessage.emit (events.js:185:7)
at endReadableNT (_stream_readable.js:974:12)
at _combinedTickCallback (internal/process/next_tick.js:74:11)
at process._tickCallback (internal/process/next_tick.js:98:9)
[nodemon] app crashed - waiting for file changes before starting...

@SheetJSDev
Contributor

That error is triggered when the file is suspected to be HTML but the actual TABLE tag cannot be found. Can you capture and share an example file?

@kaanbursa
Author

Hey,
It is an actual XLSM file. Sometimes it converts and sometimes it crashes. Sorry, the file is confidential and I cannot share it, but it is a basic .xlsm file.

@SheetJSDev
Contributor

If it's an actual XLSM file, it would start with 0x50 0x4B ("PK") and would never trigger the error. Given that you are using request, it's likely you are seeing bad data (e.g. making an HTTP request and receiving an HTML payload with the 404 error). Try wrapping XLSX.read in a callback and saving the data to a file; it is likely an HTML error page.
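One way to carry out that check in Node is sketched below. The helper name `dumpPayload` and the output file name are hypothetical; the sketch assumes the response body is available as a Buffer or string (with the request library, passing `encoding: null` returns the body as a Buffer).

```javascript
const fs = require("fs");

// Hypothetical helper: write the raw response body to disk and report how it starts.
function dumpPayload(body, file) {
  // Accept either a Buffer or a string body.
  const buf = Buffer.isBuffer(body) ? body : Buffer.from(body);
  fs.writeFileSync(file, buf);
  return {
    file: file,
    // ZIP-based spreadsheets (XLSX, XLSM) begin with 0x50 0x4B ("PK").
    startsWithPK: buf[0] === 0x50 && buf[1] === 0x4B,
    // First 16 bytes as text, enough to recognize something like "<html>".
    head: buf.subarray(0, 16).toString("latin1")
  };
}
```

Calling `dumpPayload(body, "payload.bin")` right before the parse and logging the result should show whether the failing responses are spreadsheets or error pages.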

@MedinaGitHub

The same thing happens to me. Everything looks fine in the console log, but it throws the same error.

@MedinaGitHub

I found the error! In ReactJS, the xlsx file must be in the 'public' folder.

@wodeleeway

I had the same issue when using Node to read an Excel file in a Docker container. It shows Error: Invalid HTML: could not find <table>.

I debugged into the xlsx.js file and found that firstbyte(d, o)[0] returns 0x3C, which is treated as an XLML file; it is supposed to return 0x50. The relevant piece of the module source is below:

switch((n = firstbyte(d, o))[0]) {
  case 0xD0: return read_cfb(CFB.read(d, o), o);
  case 0x09: return parse_xlscfb(d, o);
  case 0x3C: return parse_xlml(d, o);
  case 0x49: if(n[1] === 0x44) return read_wb_ID(d, o); break;
  case 0x54: if(n[1] === 0x41 && n[2] === 0x42 && n[3] === 0x4C) return DIF.to_workbook(d, o); break;
  case 0x50: return (n[1] === 0x4B && n[2] < 0x09 && n[3] < 0x09) ? read_zip(d, o) : read_prn(data, d, o, str);
  case 0xEF: return n[3] === 0x3C ? parse_xlml(d, o) : read_prn(data, d, o, str);
  case 0xFF: if(n[1] === 0xFE) { return read_utf16(d, o); } break;
  case 0x00: if(n[1] === 0x00 && n[2] >= 0x02 && n[3] === 0x00) return WK_.to_workbook(d, o); break;
  case 0x03: case 0x83: case 0x8B: case 0x8C: return DBF.to_workbook(d, o);
  case 0x7B: if(n[1] === 0x5C && n[2] === 0x72 && n[3] === 0x74) return RTF.to_workbook(d, o); break;
  case 0x0A: case 0x0D: case 0x20: return read_plaintext_raw(d, o);
}

@wodeleeway

I found the error! In ReactJS, the xlsx file must be in the 'public' folder.

What do you mean by the 'public' folder? Should the folder be named public, or should its permissions be set to public?

@SheetJSDev
Contributor

@wodeleeway @MedinaGitHub The file type is deduced by looking at the magic number (the first few bytes). That is what you're seeing in that code block. If the file starts with "<", the immediate guess is that it's an HTML or XML-based format.
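To make the idea concrete, here is a much-reduced sketch of magic-byte sniffing. It is not the library's implementation: the function name `sniffFormat` and the returned labels are hypothetical, and it covers only three of the signatures discussed in this thread.

```javascript
// Hypothetical classifier: inspects only the leading bytes of a payload.
function sniffFormat(bytes) {
  // "PK": ZIP container, used by XLSX and XLSM among others.
  if (bytes[0] === 0x50 && bytes[1] === 0x4B) return "zip";
  // 0xD0 0xCF: start of the CFB container used by legacy XLS files.
  if (bytes[0] === 0xD0 && bytes[1] === 0xCF) return "cfb";
  // "<": HTML or an XML-based format, which is also what a 404 page looks like.
  if (bytes[0] === 0x3C) return "markup";
  return "unknown";
}
```

A 404 page beginning with "<!DOCTYPE html>" would be classified as markup here, which mirrors why the library goes looking for a TABLE tag in it.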

If you are using XHR or fetch and requesting a file that isn't available, you'll get back the 404 response. Usually the body is a small HTML page, which is why you see the error in question.

Your code should defend against it by checking the status of the request. With fetch, it is available as res.status in the first callback:

fetch(myRequest).then(function(res) {
  if(res.status == 404) { /* file not found */ }
  /* otherwise continue processing res */
});

How to resolve the 404 errors is beyond the scope of this project.

@samy-n

samy-n commented Mar 29, 2020

I found the error! In ReactJS, the xlsx file must be in the 'public' folder.

This worked for me too

@Alexandria

I found the error! In ReactJS, the xlsx file must be in the 'public' folder.

Working on a React App and putting my xlsx file in the public folder worked for me too.

@SheetJS SheetJS locked and limited conversation to collaborators Feb 8, 2023