
Questions related to thrift errors #73

Closed
mariecl opened this issue Apr 19, 2017 · 3 comments

Comments

@mariecl
Contributor

mariecl commented Apr 19, 2017

Since we moved to v2, I've been finding a lot of these in our logs:

Error making Thrift HTTP request: Error: connect ETIMEDOUT xx.xxx.xxx.xxx:80
Error in Thrift HTTP response: 302
Error in Thrift HTTP response: 502

They coincide with the following pattern on our servers:

  • spikes of errors coming from Evernote
  • spikes in memory usage
  • crashes

Is there any reason why those errors would come in waves every few days? Given the error codes, I don't think they are related to specific users updating their accounts; they seem more likely to be caused by server issues on your side, maybe?
Is there any way to shield ourselves from them?
Any clues as to why they would tend to fill up memory?

@akhaku
Contributor

akhaku commented Apr 20, 2017

An HTTP 502 in particular is probably a network blip on our side. We've had an increased number of those since our move to Google's cloud - it turns out no system has a 100% uptime SLA :), so their front-end load balancer downtime now stacks on top of whatever blips we have on our end.

The 302 is a little more confusing - do you know what URL it's 302ing to? Certain requests redirect to a maintenance page when a shard is unavailable (e.g. during a restart, or during our weekly service release), but I didn't think thrift requests would return 302s for that reason. Are you seeing them mostly on Wednesdays?

There isn't really any way to hide them from the client directly - the thrift clients we use internally use connection pools and have retry logic baked in to deal with transient errors like that. In terms of filling up memory, make sure you close your connection even on an error, not just on success, and that your objects get garbage collected.
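
If it helps, something along these lines on your end can smooth over the transient cases - a rough sketch only, not part of our SDK; the `callEvernote` wrapper, retry counts, and error matching are placeholders you'd adapt to your setup:

```typescript
// Rough sketch: retry transient Thrift HTTP errors (ETIMEDOUT, 502) with exponential backoff.
// Nothing here is Evernote SDK API; `callEvernote` stands in for whatever call you make.

function isTransient(err: unknown): boolean {
  const msg = String(err instanceof Error ? err.message : err);
  // Matches the log lines above: connect timeouts and 502s from the Thrift HTTP transport.
  return msg.includes('ETIMEDOUT') || msg.includes('response: 502');
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 500): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (!isTransient(err) || i === attempts - 1) throw err;
      // Back off before the next attempt: 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}

// Usage (hypothetical): const notes = await withRetry(() => callEvernote(accessToken));
```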

@mariecl
Contributor Author

mariecl commented Apr 20, 2017

It looks like those thrift errors are logged by the SDK, so it's hard for me to tell exactly which requests they match up with. I am not sure if or how they are surfaced to us. Internally, we map any Evernote error that's not related to tokens (error codes 8 & 9) or rate limits (error code 19) to a 500. Our approach dates back to the old SDK, so it might need to be overhauled.
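
For context, the mapping is roughly the sketch below. The EDAMErrorCode values come from the Evernote Thrift definitions (INVALID_AUTH = 8, AUTH_EXPIRED = 9, RATE_LIMIT_REACHED = 19); the statuses shown for the token and rate-limit cases are placeholder assumptions, only the 500 fallback is what I described above.

```typescript
// Simplified sketch of the mapping described above, not our production code.
// EDAMErrorCode constants: INVALID_AUTH = 8, AUTH_EXPIRED = 9, RATE_LIMIT_REACHED = 19.

function mapEvernoteError(edamErrorCode?: number): number {
  if (edamErrorCode === 8 || edamErrorCode === 9) return 401; // token problems (assumed status)
  if (edamErrorCode === 19) return 429;                       // rate limited (assumed status)
  return 500;                                                 // everything else, as described above
}
```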

Anyway, when I look at the graph for 500 errors from Evernote, the spikes seem more or less random, though Tuesdays and Thursdays seem to be more affected:

  • Monday March 13th
  • Thursday March 16th
  • Tuesday March 28th
  • Thursday March 30th
  • Tuesday April 4th
  • Thursday April 6th
  • Tuesday April 9th
  • Sunday April 14th.

Our main load is definitely on Mondays, so I don't think it's related to the number of requests we are making.

Regarding memory, I think we actually face the issue described in #71. I find the same stack trace scattered in our logs, surfacing as an uncaught exception.
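
In case it's useful to anyone else, a plain Node.js uncaughtException handler is enough to get that stack trace into the logs before the process dies - standard Node API, nothing SDK-specific:

```typescript
// Log uncaught exceptions (e.g. the MemBuffer stack trace from #71) before the process exits,
// so crashes can be correlated with the Thrift errors above.
process.on('uncaughtException', (err: Error) => {
  console.error('Uncaught exception:', err.stack || err);
  // The process may be in an inconsistent state once this fires, so exit rather than continue.
  process.exit(1);
});
```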

@mariecl
Contributor Author

mariecl commented May 24, 2017

Closing this issue, as the thrift errors are not actually what's causing the crashes; the MemBuffer overrun issues are (follow-up in #71).

mariecl closed this as completed May 24, 2017