-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-blocking redis calls using redis-async lib #96
Conversation
I've deleted the commits related with Ruby updates that were merged in a different PR. I've also rebased on top of master adapting the code to the changes that were introduced in recent PRs (sentinels passwords, update of redis-rb, etc.). |
@unleashed This is ready for review now that we no longer need to support Ruby 2.2. You already reviewed all the PRs of this integration branch except for this one #92 I spent some time with @slopezz trying this in our Openshift cluster and it's working well. Let me know what you think we need to be able to merge this. I think it's a very good starting point and we can keep improving later, for example, investigating alternative ways to fetch the jobs from the queue. |
116: Refactor: Extract JobFetcher from Worker r=unleashed a=davidor This PR extracts the logic to fetch jobs from the queue from the `Worker` class. The commits in this PR have been extracted from #96 In that PR we have a sync and a sync worker. The logic to fetch jobs from the queue is the same in both cases so it makes sense to extract it into a separate class. I have created this PR to simplify the other one, but still, I think this has value in itself because it makes the code easier to understand and the logic to fetch jobs from the queue more testable as we no longer need to test a private method. Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things to check (read comments)
- Check status and document limitations (best if there's some associated issue):
- Threads
- Fibers
- Lazy enumerators
- Spec/Test set-up
- Ensure the configuration early on defines everything to be either Sync or Async (storage) statically and only loads the needed deps.
|
||
config.around :each do |example| | ||
Async.run do | ||
# TODO: This is needed for the acceptance specs. Not sure why. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any open issue in the async rspec gem we can refer to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even sure this is a bug, and it only happens in the acceptance specs, so it makes me think that the problem might be in rspec_api_documentation
. I investigated a bit and didn't understand why.
# All of this might be simplified a bit in the future using the | ||
# "methods" in async-redis | ||
# https://github.com/socketry/async-redis/tree/master/lib/async/redis/methods | ||
# but there are some commands missing, so for now, that's not an option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates on this? I expressed concern about this back when this was merged to the integration branch, because the maintenance burden we are taking is not negligible, so we should check again before landing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
module Async | ||
module Redis | ||
class Client | ||
def call_pipeline(commands) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be not needed now? or adapted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same case. See comment above.
if CHECK_EQUALS_ONE.include?(command_name) | ||
resp.to_i == 1 | ||
elsif CHECK_GREATER_THAN_0.include?(command_name) | ||
resp.to_i > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this is also pushed down to the async redis client... at some point really soon now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same case. See comment above.
if configuration.redis.async | ||
Backend::StorageAsync::Client | ||
else | ||
Backend::StorageSync |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is minor, but if we mean to have this method always choose the same implementation at run time (which is what I understand), then better extract the check outside the method definition and define the method differently depending on the check.
The rest seems to be equal, so the most effective might be to just check for the implementation to use and then require the class (with the same name) or do Storage = StorageAsync::Client
or Storage = StorageSync
. The remaining constant can either be defined somewhere else or be injected/defined into the chosen class by this "class loader" code.
@@ -39,11 +39,11 @@ def committed_at | |||
require_relative '../test/test_helpers/configuration' | |||
require_relative '../test/test_helpers/storage' | |||
|
|||
TestHelpers::Storage::Mock.mock_storage_client! | |||
TestHelpers::Storage::Mock.mock_storage_clients |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of letting the detail that there are now two storage clients leak into the naming. Also, this is meant to use just one storage per invocation, isn't it?
super | ||
if options[:async] | ||
# Conditional require is done to require async-* libs only when | ||
# needed and avoid possible side-effects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - must ensure the same happens when loading the async storage in tests, or in general in the listeners when choosing just the sync one (I say this because I haven't checked it, but you probably know).
a8ac288
to
b0ae3ea
Compare
@unleashed We have different views on what this PR should be. This is a big change in how apisonator works. It's true that in our tests this increases performance substantially. However, this relies on the async-redis lib. The maintainers have done an amazing job in the lib and the async ecosystem in general. However, It's also true that the lib is not as widely used as redis-rb. Given that it's such an important piece in this project, there's some risk. I think we need to try this with real workloads to see how it behaves. I think it's important to discard problems such as unexpected crashes, memory leaks, etc. The kind of problems that you only realize when running for a long time on a real production scenario. This feature is opt-in. For users absolutely nothing changes unless they explicitly decide that they want to give this as try. That's why I think we should try to merge this and let @slopezz and others experiment with it. My opinion is that, rather than try to solve all the minor problems in the first PR, we should go step by step. Try a first version, see how it behaves, try different configuration params to evaluate performance in different environments, etc. If all that goes well, we can put the effort to, for example, contribute some of the code that we have here to other projects in the async ecosystem. As I said before, I don't think we need to solve all the issues now. For example, contributing a redis-rb-compatible interface for pipelines in async-redis is not a trivial effort. This PR brings a substantial performance improvement to the table. Also, the rules to scale in environments like Openshift become way easier to understand. That's why I'm willing to go more in tech debt that in other cases, and that includes:
Interestingly, you didn't mention anything about running all the tests twice, which I think is the biggest problem that this PR introduces. Again, the ideal scenario would be to have redis-async - redis-rb compatibility tests in any of those projects. But that does no exist right now, and because of how our test suite works and its coupling with Redis, I don't feel confident running the tests just with async or sync and assuming that the other will work just fine. I think that these trade-offs are reasonable, and we can document them so there are no surprises in the future. I just don't think that waiting to merge this is worth it. There are other small issues that you mentioned that I'll address before merging, but I wanted to talk about the important things first. |
@davidor if you see any issues feel free to keep me in the loop. |
Rebased on top of master. I'll try to add a document with the known limitations tomorrow. |
@unleashed I added a document with the limitations, goals, and other info in my latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out comments
docs/async.md
Outdated
enumerator. That class is only required for the Oauth feature which is | ||
deprecated. Also, the lazy enumerator was not giving us any advantage in this | ||
particular scenario. The important thing is that there might be some problems | ||
with some lazy enumerators (not all) when using `async-redis`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's the important thing then it should be made prominent in the listing, rather than focusing on the specific instance that was changed. We should state we don't know / haven't investigated yet why this happens.
docs/async.md
Outdated
- We need this line `RSpec.current_example = example` in `config.around :each` | ||
of `spec_helper.rb` in order to make the acceptance tests pass. | ||
|
||
- There's one test of the suite that fails with the async client. It's in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a symptom of some issue we think might be related to threading. So the limitation would be: potential issues with threading we haven't figured out yet.
clients. | ||
|
||
|
||
## Limitations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no mention of what the async gem does, and it is a potential source of problems. In particular, it monkey-patches core IO modules to use a reactor, which means that interaction with any IO code is to be suspected in case of issues, and C extensions (usually) won't use the reactor for their IO (so we need to watch out for blocking there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn’t do that (monkey patching).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was my understanding from when the first PR was introduced - are you not modifying the behaviour of IO
methods?
I'd then assume async-redis is using async{,-io} on its own and then nothing else would be touching the reactor, right? That is preferable for us as it lowers the exposure to issues in unrelated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was my understanding from when the first PR was introduced - are you not modifying the behaviour of IO methods?
@unleashed we are not modifying IO. We add Async::IO
which provides (almost) identical wrappers which you can inject into other code if needed.
I'd then assume async-redis is using async{,-io} on its own and then nothing else would be touching the reactor, right? That is preferable for us as it lowers the exposure to issues in unrelated code.
At the moment, that's correct. However in the future, we may make it possible for the user to allow other forms of IO to use the reactor: ruby/ruby#1870 - just FYI.
want to be compatible with the two libraries at least for a while. That means | ||
that we need an adapter `lib/3scale/backend/storage_async/client.rb`. We could | ||
simplify that and other things in our codebase by contributing to the | ||
`async-redis` project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are also missing the avoid IO in the middle of a pipeline
limitation.
Edit: maybe fixed with fiber id check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel comfortable merging stuff that breaks core language primitives (threading, fibers/lazy enums) in a way that we don't understand. Not only because of the potential issues in the current state, which we might eventually assess to be ok, but the potential for things further down the road breaking in our code and also in dependencies. There are a few other smaller nits I think could have been addressed since I first mentioned them.
The idea is good and I hope we can fix these issues or have a good compromise. But I don't think this is ok as it is, and further work is needed. My approval here only means "I delegate to you".
bors delegate+ |
✌️ davidor can now approve this pull request. To approve and merge a pull request, simply reply with |
What primitives are broken by async? |
Lazy enumerators are buggy when combined with user fibers. It's a known issue: ruby/ruby#2002 The solution is not to use async in code which can be used from an enumerator, until the bug/issue is resolved within Ruby itself. Regarding threads, if you use thread level locks/mutex within async code, you might have strange behaviour, especially if you mix the reactor between threads. Reactors should be per-thread. If you need thread synchronisation, you need to use a higher level construct e.g. a pipe. Using thread synchronisation with an async reactor is blocking by definition. Blocking operations are not strictly bad in async, but they will increase latency and there is a chance you can deadlock (but that applies even if you don't use async). |
Thanks for the explanation @ioquatix The problem that you mentioned is something that we were not aware of. It's something different from what has been discussed between me and @unleashed in this PR up until now. That issue is the only real problem I see with this PR. We were not aware that enumerators had problems when combined with fibers because of a bug at the Ruby language level. I'll need to think about this. I think that code like: array_with_keys.map { |k| redis.get(k) } is pretty common. Unfortunately, I'll need to put this on hold if things like that can unexpectedly fail. At least until I understand exactly what enumerators are problematic, because the example above works without problems as well as all the others that we use in our test suite. |
@davidor that is not a lazy enumerator. There could be problems with our code or dependencies using fibers (some gems use fibers under the hood), but AIUI non-lazy enumerators should not be affected. The threading issue might be resolved by adapting the offending test, and the remaining problem is whether we want to live with these issues. A closer look to the problems or a link to existing or new issues is what I was requesting in my review. Thanks for providing those, @ioquatix. Now we have more information, and it would be good to add it to the document on limitations before this would be merged. |
I see. I misunderstood some things. My last comment is not correct. |
docs/async.md
Outdated
one. The reason is that our tests are highly coupled with Redis, even the unit | ||
ones. Running the test suite with only one of the clients is risky. | ||
|
||
- Cannot use `Fiber.yield` in an enumerator. See [ruby PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would specifically call out lazy enumerators, because that is where you are most likely to inadvertently hit this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only lazy enumerators that allocate a fiber. Some lazy enumerators can avoid it. The typical situation is when you invoke zip, the argument will use a fiber internally.
@unleashed and I have decided to merge this. |
bors r=@unleashed |
Build succeeded |
This is an integration branch. It contains: #77 , #86 , #92 , #93 .
It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.