New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Randomly getting AEROSPIKE_ERR_TIMEOUT #144
Comments
What's the load level of the application, in terms of requests/second to the Aerospike server? |
I don't know which part of my application causes it because the stack traces only show the Aerospike related stack. All I know is that it's a background process (not the web server itself) and it's doing lots of different DB updates. Enough to bring 1 of 2 CPU cores on DigitalOcean to >90% usage. It might be possible that I'm issuing 560 requests of one type at once and they all fail. I apologize for being a bit in the dark here. |
The stack traces for errors raised in async operation callbacks are unfortunately not very useful. I'd like to add support for better stack traces at least in debug/development mode similar to how e.g. the redis client does it. But you can't do that in production since it comes with a pretty significant performance penalty. What is clear from the stack traces is that the timeouts happen in both put and get operations. |
I would love to have improved stack traces, even if they come with a heavy performance penalty. |
Was this problem solved ? I'm getting some weird behaviour where my callback for put function is executed twice. Once with a AEROSPIKE_ERR_TIMEOUT error and once with a success. Not sure what to do here. |
Upgraded to 2.4.3 from 1.0.56 and started receiving AEROSPIKE_ERR_TIMEOUT when trying to use the put method. Not sure why would we receive the timeout error randomly. |
@rahulbhanushali I'm having the same issue, recently upgraded from node 0.10 to node 6.9.1, and tried to upgrade aerospike client from version 1.0.58 to version 2.4.2, had to return back to 1.0.58 due to timeouts. |
@rahulbhanushali, @bobiblazeski, @GavinDmello, are you just doing single key read/write operations when you get the timeouts or other operations like scan/query? |
@jhecking It's mostly during single key writes in my case. |
I'm in the process of rewriting everything in Go so I haven't checked it lately, however I suspect it's still existent. In my case it's maybe just high load that's causing it (multiple scans + gets + sets at the same time). |
@blitzprog I had many set operations happening one after the other, where I have no other operations happening. The double callback(success and response callbacks getting called for the same call) issue is where I have many operations(gets, sets and queries) |
@jhecking Single key get & put only, nothing else. |
@jhecking, facing this issue on single key write operations. |
Any update over this? |
I know this comes too late for you, @blitzprog, but I have implemented better stack traces for the just released v2.5.2. More details in PR #190. |
Is there any update on this issue? I am also getting AEROSPIKE_ERR_TIMEOUT errors after upgrading from 1.0.58 to 2.5.2. |
@dohse Try increasing the operation timeout property which is part of the policy object and check if it helps |
I activated long stack traces to find out where these error occurs, but it seems to occur for all aerospike calls in our application. @GavinDmello I am not setting any timeouts. Did the default timeout change between 1.0.58 and 2.5.2? |
I need to correct myself. I am setting a timeout of 1000 ms in the client policies. I increased this to 10000 ms now. |
The default timeout has not changed between v1 and v2. It is 1,000 ms. What has changed is the I/O model and that might impact the frequency of timeouts. In v1, Aerospike DB commands are executed using synchronous I/O. The commands are queued up and are getting executed on libuv's worker threads. By default, there are only 4 worker threads, but this can be configured through the In contrast, v2 uses libuv's async TCP API to execute DB commands. The timeout starts as soon as the DB command is issued. If the Node.js server is under heavy I/O workload (not just from Aerospike DB commands but other I/O as well), then it's possible that you are seeing timeouts even if the server responds within the timeout window. I'm not saying that's necessarily what's happening, but it is a significant difference between v1 and v2 that might be at least part of the cause for increased number of timeouts some people are observing. |
@jhecking Thank you for the detailed reasoning. It could be indeed be the case in our application that operations were waiting very long for a thread from the pool. While they did not timeout in that case they might now timeout when waiting to send or receive the data from aerospike. I resolved the issue by retrying timed out operations up to five times. This should be safe as the operations are idempotent. |
Closing this issue. While not a direct fix for this issue, client release v3 has improved timeout & retry handling. The new BasePolicy class has two separate |
I get about 560 of the first error message at once, sometimes. Here's my config:
DB and client are on the same machine (server 3.9.0.3, node client 2.3.1).
I can provide more information if needed.
Is there anything I can do to fix these timeouts?
The text was updated successfully, but these errors were encountered: