
Questions related to thrift errors #73

Closed
mariecl opened this issue Apr 19, 2017 · 3 comments

Comments

@mariecl
Contributor

mariecl commented Apr 19, 2017

Since we moved to v2, I've been finding a lot of these in our logs:

Error making Thrift HTTP request: Error: connect ETIMEDOUT xx.xxx.xxx.xxx:80
Error in Thrift HTTP response: 302
Error in Thrift HTTP response: 502

They coincide with the following pattern on our servers:

  • spikes of errors coming from Evernote
  • spikes in memory usage
  • crashes

Is there any reason why those errors would come in waves every few days? Given the error codes, I don't think they are related to specific users updating their accounts; they seem more likely to be caused by server issues on your side, maybe?
Is there any way to shield ourselves from them?
Any clues as to why they would tend to fill up memory?

@akhaku
Contributor

akhaku commented Apr 20, 2017

An HTTP 502 in particular is probably a network blip on our side. We've had an increased number of those since our move to Google's cloud - it turns out no system has a 100% uptime SLA :), so their front-end load balancer downtime now stacks on top of whatever blips we have on our end.

The 302 is a little more confusing - do you know what URL it's 302ing to? Certain requests redirect to a maintenance page when a shard is unavailable (e.g. during a restart, or during our weekly service release), but I didn't think thrift requests would return 302s for that reason. Are you seeing them mostly on Wednesdays?

There isn't really any way to hide them from the client directly - the thrift clients we use internally use connection pools and have retry logic baked in to deal with transient errors like that. In terms of filling up memory, make sure you close your connection even on an error, not just on success, and that your objects get garbage collected.
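
If it helps, something along these lines on your end can smooth over the transient cases - a rough sketch only, not part of our SDK; the `callEvernote` wrapper, retry counts, and error matching are placeholders you'd adapt to your setup:

```typescript
// Rough sketch: retry transient Thrift HTTP errors (ETIMEDOUT, 502) with exponential backoff.
// Nothing here is Evernote SDK API; `callEvernote` stands in for whatever call you make.

function isTransient(err: unknown): boolean {
  const msg = String(err instanceof Error ? err.message : err);
  // Matches the log lines above: connect timeouts and 502s from the Thrift HTTP transport.
  return msg.includes('ETIMEDOUT') || msg.includes('response: 502');
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!isTransient(err) || i === attempts - 1) throw err;
      // Back off before the next attempt: 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}

// Usage (hypothetical): const notes = await withRetry(() => callEvernote(accessToken));
```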

@mariecl
Contributor Author

mariecl commented Apr 20, 2017

It looks like those thrift errors are logged by the SDK, so it's hard for me to tell exactly which requests they match up with. I am not sure if or how they are surfaced to us. Internally, we map any Evernote error that's not related to tokens (error codes 8 & 9) or rate limits (error code 19) to a 500. Our approach dates back to the old SDK, so it might need to be overhauled.
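
For context, the mapping is roughly the sketch below. The EDAMErrorCode values come from the Evernote Thrift definitions (INVALID_AUTH = 8, AUTH_EXPIRED = 9, RATE_LIMIT_REACHED = 19); the statuses shown for the token and rate-limit cases are placeholder assumptions, only the 500 fallback is what I described above.

```typescript
// Simplified sketch of the mapping described above, not our production code.
// EDAMErrorCode constants: INVALID_AUTH = 8, AUTH_EXPIRED = 9, RATE_LIMIT_REACHED = 19.

function mapEvernoteError(edamErrorCode?: number): number {
  if (edamErrorCode === 8 || edamErrorCode === 9) return 401; // token problems (assumed status)
  if (edamErrorCode === 19) return 429;                       // rate limited (assumed status)
  return 500;                                                 // everything else, as described above
}
```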

Anyway, when I look at the graph for 500 errors from Evernote, the spikes seem more or less random, though Tuesdays and Thursdays seem to be more affected:

  • Monday March 13th
  • Thursday March 16th
  • Tuesday March 28th
  • Thursday March 30th
  • Tuesday April 4th
  • Thursday April 6th
  • Tuesday April 9th
  • Sunday April 14th.

Our main load is definitely on Mondays, so I don't think it's related to the number of requests we are making.

Regarding memory, I think we actually face the issue described in #71. I find the same stack trace scattered in our logs, surfacing as an uncaught exception.
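
In case it's useful to anyone else, a plain Node.js uncaughtException handler is enough to get that stack trace into the logs before the process dies - standard Node API, nothing SDK-specific:

```typescript
// Log uncaught exceptions (e.g. the MemBuffer stack trace from #71) before the process exits,
// so crashes can be correlated with the Thrift errors above.
process.on('uncaughtException', (err: Error) => {
  console.error('Uncaught exception:', err.stack || err);
  // The process may be in an inconsistent state once this fires, so exit rather than continue.
  process.exit(1);
});
```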

@mariecl
Contributor Author

mariecl commented May 24, 2017

Closing this issue, as the thrift errors are not actually what's causing the crashes; the MemBuffer overrun issues are (follow-up in #71).

mariecl closed this as completed May 24, 2017