
Discovery items type error failure #253

Open
jsetton opened this issue Aug 19, 2019 · 11 comments · Fixed by #256

@jsetton
Collaborator

jsetton commented Aug 19, 2019

Per the skill logs, several users' discovery requests are failing with the error below:

error: discover failed with error:items.forEach is not a function
{
    "stack": "TypeError: items.forEach is not a function\n    at Promise.all.then (/var/task/alexa/v3/discovery.js:45:13)\n    at <anonymous>\n    at process._tickDomainCallback (internal/process/next_tick.js:228:7)"
}

After enabling debug mode, the issue appears to be that the items data value ends up being a string rather than an array: the request call only read/received a partial response, which prevented the expected JSON object from being parsed properly.

The payload in the reference below isn't particularly large, even though it is only partial data, and the Lambda execution time for that request was about 5 seconds.

Lambda request log reference
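
For context, a defensive check along these lines would at least surface the failure more clearly (a sketch only; discoverItems is an illustrative name, not the actual discovery.js code):

// With json: true, the request module falls back to returning the raw body as a
// string when it cannot be parsed as JSON, so a truncated response reaches the
// discovery code as a string and items.forEach throws.
function discoverItems(items) {
  if (!Array.isArray(items)) {
    throw new TypeError(`Expected an items array, got ${typeof items}`);
  }
  items.forEach((item) => {
    // ... map each openHAB item to an Alexa discovery endpoint
  });
}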

@jsetton jsetton added the bug label Aug 19, 2019
@jsetton
Collaborator Author

jsetton commented Aug 19, 2019

@digitaldan I moved this one into its own issue. I am still seeing this error being generated a couple of times every hour.

Anyway, I did some research and found a possible solution: including Connection: keep-alive in the request headers.

I did some testing and the request module seems to default to Connection: close when not specified, which would explain why the network connection could get closed prematurely.

Do you see any reason for not making this change?

Direct Connection

REQUEST onRequestResponse http://localhost:8080/rest/items?fields=editable%2CgroupNames%2CgroupType%2Cname%2Clabel%2Cmetadata%2Cstate%2CstateDescription%2Ctags%2Ctype&metadata=alexa%2Cchannel%2Csynonyms 200 { 
  connection: 'close',
  'content-type': 'application/json',
  server: 'Jetty(9.4.18.v20190429)'
}

Behind Reverse Proxy Server (nginx)

REQUEST onRequestResponse https://myopenhab.org/rest/items?fields=editable%2CgroupNames%2CgroupType%2Cname%2Clabel%2Cmetadata%2Cstate%2CstateDescription%2Ctags%2Ctype&metadata=alexa%2Cchannel%2Csynonyms 200 {
  server: 'nginx',
  date: 'Mon, 19 Aug 2019 16:41:07 GMT',
  'content-type': 'application/json',
  'transfer-encoding': 'chunked',
  connection: 'close'
}

@digitaldan
Collaborator

If that works I don't see a problem with it. I'm not sure why it works :-) AFAIK keep-alive is used so the underlying TCP connection can be reused by multiple requests, saving the cost of creating and tearing down a TCP connection for each one; browsers would be painfully slow without it.

If the request is still in flight and there is a pause in the data stream, I guess it's possible something like an nginx proxy could terminate it if keep-alive is not set, so maybe that's what it is fixing?

@jsetton
Collaborator Author

jsetton commented Aug 20, 2019

If the request is still in flight and there is a pause in the data stream, I guess it's possible something like an nginx proxy could terminate it if keep-alive is not set, so maybe that's what it is fixing?

My understanding is that when sending keep-alive to the server, it will not close the connection after responding to the initial request unless the client requests it. While the connection remains open, it can be reused, and it will time out after a set idle period. On top of this, I believe keep-alive is enabled by default in HTTP/1.1.

I think we should enable it, deploy, and see if we still get that error. Do you see any reason why we shouldn't?

@digitaldan
Collaborator

I am totally good with it.

@jsetton
Collaborator Author

jsetton commented Aug 21, 2019

So the connection keep-alive change I made didn't resolve this issue.

After doing more research, I believe the issue is related to the fact that the myopenhab cloud connector nginx server is buffering every request/response before passing the data on.

That buffer has a limited size and can fill up when there are a lot of simultaneous discovery requests in a short amount of time. My understanding is that when the buffer gets full, some of the data chunks may be dropped in flight, causing a partial response to be sent.

Going over the nginx configuration committed in the cloud service GitHub repository, I can see that the proxy_buffering setting is left at its default of enabled, since it is omitted. Is there a reason why we want to have it enabled? It seems that disabling it could potentially resolve our issue.

proxy_buffering off;

Also, we could enable gzip for the application/json content type to decrease the size of the response sent back to the Lambda function, which I believe would also decrease the response time.

gzip on;
gzip_comp_level 6;
gzip_types application/json;
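
For the gzip change to help, the Lambda side also has to advertise support; with the request package that could look roughly like this (a sketch, with baseUrl as a placeholder):

const request = require('request');
const baseUrl = 'https://myopenhab.org'; // placeholder

request({
  url: `${baseUrl}/rest/items`,
  gzip: true, // sends Accept-Encoding: gzip and transparently decompresses the body
  json: true
}, (error, response, items) => {
  // items is the decompressed, parsed payload
});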

@digitaldan What do you think?

@digitaldan
Collaborator

I'll take a look at our nginx config; I'm not 100% sure whether we set that for a reason or it was copied from some default nginx proxy config example.

@jsetton
Collaborator Author

jsetton commented Aug 29, 2019

Now that the request timeout logic is deployed, the logs include the bearer token for each request that timed out.

Based on my testing of a few of them, the request times out after 60 seconds, when the myopenhab cloud connector nginx server times out with a "504 Gateway Time-out" error. Anyway, I was under the impression the cloud connector could track remote clients that are offline and directly respond to external requests with an offline message.

The reason I am bringing this up is that each of these failed requests wastes 8 seconds of compute time. There are about 35 timed-out requests per hour across the regions so far. Maybe there is something that can easily be done on the cloud connector side.
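
For reference, a request-level timeout with the request package looks roughly like this (a sketch with illustrative values, not the skill's actual timeout implementation):

const request = require('request');
const baseUrl = 'https://myopenhab.org'; // placeholder

request({
  url: `${baseUrl}/rest/items`,
  timeout: 8000, // illustrative value, in milliseconds
  json: true
}, (error, response, items) => {
  if (error && (error.code === 'ETIMEDOUT' || error.code === 'ESOCKETTIMEDOUT')) {
    // fail fast instead of waiting out the nginx 60-second 504
  }
});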

@jsetton
Collaborator Author

jsetton commented Sep 1, 2019

According to this documentation, if Cache-Control: no-cache is specified in the request headers, nginx will not cache responses.

Additionally, after doing some tests with one of the affected users, I noticed that the proper JSON object was returned on subsequent calls. So we could introduce retry logic at that level when the items data isn't valid.
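
A minimal retry sketch along those lines (getItems and the retry count are illustrative assumptions, not the eventual PR):

// Re-issue the items request when the parsed body isn't the expected array.
function getItemsWithRetry(getItems, retriesLeft = 3) {
  return getItems().then((items) => {
    if (Array.isArray(items)) {
      return items;
    }
    if (retriesLeft > 0) {
      return getItemsWithRetry(getItems, retriesLeft - 1);
    }
    return Promise.reject(new TypeError('items response is not an array'));
  });
}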

@jsetton
Collaborator Author

jsetton commented Sep 3, 2019

So the change I included in #270, adding Cache-Control: no-cache to the request headers, isn't resolving this issue. I will submit a PR introducing retry logic as mentioned in my previous comment. Hopefully, this will finally resolve the matter.

@jsetton
Collaborator Author

jsetton commented Sep 4, 2019

The retry logic is helping a little, but there are still some requests that aren't able to get the complete items data after 4 tries. I don't think we can get around tweaking the nginx configuration. In any case, we need to add gzip support.

One thing I noticed while testing with two separate errors: the data always gets truncated at 48k (49,152 bytes), no matter what the source is. This is a clear indication of some sort of buffering/paging issue on the server side.

@jsetton
Collaborator Author

jsetton commented Sep 22, 2019

myopenhab cloud connector nginx server times out with a "504 Gateway Time-out" error

Based on this post, it seems that the timeout error may be caused by the user's cloud connector binding being set up only for notification access mode, with remote access disabled.

Is this something that can be determined at the cloud connector level, so it can return an error directly instead of waiting until the timeout is reached?
