Non-blocking redis calls using redis-async lib #96

davidor · 2019-04-24T10:09:03Z

This is an integration branch. It contains: #77, #86, #92, #93.
~~It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.~~

davidor · 2019-08-01T12:15:14Z

I've deleted the commits related to Ruby updates that were merged in a different PR. I've also rebased on top of master, adapting the code to the changes that were introduced in recent PRs (sentinel passwords, update of redis-rb, etc.).

davidor · 2019-08-07T16:05:40Z

@unleashed This is ready for review now that we no longer need to support Ruby 2.2.

You already reviewed all the PRs of this integration branch except for this one: #92.

I spent some time with @slopezz trying this in our Openshift cluster and it's working well.

Let me know what you think we need to be able to merge this. I think it's a very good starting point and we can keep improving later, for example, investigating alternative ways to fetch the jobs from the queue.

117: clean-up: delete unnecessary 'before' in Utilization specs r=unleashed a=davidor Minor clean-up extracted from #96. The other before blocks already set up everything needed regardless of previous contents. Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>

116: Refactor: Extract JobFetcher from Worker r=unleashed a=davidor This PR extracts the logic to fetch jobs from the queue from the `Worker` class. The commits in this PR have been extracted from #96. In that PR we have a sync and an async worker. The logic to fetch jobs from the queue is the same in both cases, so it makes sense to extract it into a separate class. I have created this PR to simplify the other one, but I still think this has value in itself because it makes the code easier to understand and the logic to fetch jobs from the queue more testable, as we no longer need to test a private method. Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>

unleashed

Some things to check (read comments)

Check status and document limitations (best if there's some associated issue):
- Threads
- Fibers
- Lazy enumerators
- Spec/Test set-up
Ensure the configuration early on defines everything to be either Sync or Async (storage) statically and only loads the needed deps.

unleashed · 2019-08-26T14:16:59Z

spec/spec_helper.rb

+
+  config.around :each do |example|
+    Async.run do
+      # TODO: This is needed for the acceptance specs. Not sure why.


any open issue in the async rspec gem we can refer to?

I'm not even sure this is a bug, and it only happens in the acceptance specs, so it makes me think that the problem might be in rspec_api_documentation. I investigated a bit and didn't understand why.

unleashed · 2019-08-26T14:19:58Z

lib/3scale/backend/storage_async/client.rb

+        # All of this might be simplified a bit in the future using the
+        # "methods" in async-redis
+        # https://github.com/socketry/async-redis/tree/master/lib/async/redis/methods
+        # but there are some commands missing, so for now, that's not an option.


Updates on this? I expressed concern about this back when this was merged to the integration branch, because the maintenance burden we are taking is not negligible, so we should check again before landing this.

socketry/protocol-redis#1

unleashed · 2019-08-26T14:27:03Z

lib/3scale/backend/storage_async/async_redis.rb

+module Async
+  module Redis
+    class Client
+      def call_pipeline(commands)


this might be not needed now? or adapted?

Same case. See comment above.

unleashed · 2019-08-26T14:37:32Z

lib/3scale/backend/storage_async/pipeline.rb

+            if CHECK_EQUALS_ONE.include?(command_name)
+              resp.to_i == 1
+            elsif CHECK_GREATER_THAN_0.include?(command_name)
+              resp.to_i > 0


Hopefully this is also pushed down to the async redis client... at some point really soon now?

Same case. See comment above.
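For readers following along, the excerpt above appears to turn raw integer replies into the booleans that callers of redis-rb expect. A minimal sketch of that idea follows. The command lists and the `convert_response` / `convert_pipeline` helpers are hypothetical; the real contents of `CHECK_EQUALS_ONE` and `CHECK_GREATER_THAN_0` are not shown in this thread.

```ruby
# Hypothetical command lists, for illustration only.
CHECK_EQUALS_ONE = %i[exists sismember setnx].freeze
CHECK_GREATER_THAN_0 = %i[sadd srem].freeze

# Convert one raw reply according to the command that produced it.
def convert_response(command_name, resp)
  if CHECK_EQUALS_ONE.include?(command_name)
    resp.to_i == 1
  elsif CHECK_GREATER_THAN_0.include?(command_name)
    resp.to_i > 0
  else
    resp # anything else is passed through untouched
  end
end

# Pair each queued command ([name, *args]) with its reply and convert it.
def convert_pipeline(commands, responses)
  commands.zip(responses).map do |command, resp|
    convert_response(command.first, resp)
  end
end

commands = [[:exists, 'a'], [:sadd, 's', 'x'], [:get, 'a']]
converted = convert_pipeline(commands, [1, 0, 'v'])
```

Keeping this table in the adapter is the maintenance burden discussed above: every command whose reply needs converting has to be listed by hand until the client library does it itself.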

unleashed · 2019-08-26T15:03:16Z

lib/3scale/backend/storage.rb

+          if configuration.redis.async
+            Backend::StorageAsync::Client
+          else
+            Backend::StorageSync


This is minor, but if we mean to have this method always choose the same implementation at run time (which is what I understand), then it would be better to extract the check outside the method definition and define the method differently depending on the check.

The rest seems to be equal, so the most effective approach might be to just check for the implementation to use and then require the class (with the same name), or do Storage = StorageAsync::Client or Storage = StorageSync. The remaining constant can either be defined somewhere else or be injected/defined into the chosen class by this "class loader" code.
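A minimal sketch of the load-time selection suggested here, not the code that was merged. The two classes are empty stand-ins, and `CONFIG` is a hypothetical replacement for the project's `configuration.redis.async` setting.

```ruby
module Backend
  # Stand-in for the existing synchronous storage (hypothetical body).
  class StorageSync; end

  module StorageAsync
    # Stand-in for the async adapter (hypothetical body).
    class Client; end
  end

  # Hypothetical configuration; the real project reads configuration.redis.async.
  CONFIG = { redis: { async: false } }.freeze

  # The check runs exactly once, when this file is loaded. In real code each
  # branch would also `require` only the file (and gems) it needs.
  Storage = CONFIG.dig(:redis, :async) ? StorageAsync::Client : StorageSync
end
```

With this shape, callers only ever see `Backend::Storage`, and the async gems are never loaded in a process configured for the sync client.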

script/test

spec/unit/storage_async/pipeline_spec.rb

unleashed · 2019-08-26T16:07:22Z

spec/spec_helper.rb

@@ -39,11 +39,11 @@ def committed_at
    require_relative '../test/test_helpers/configuration'
    require_relative '../test/test_helpers/storage'

-    TestHelpers::Storage::Mock.mock_storage_client!
+    TestHelpers::Storage::Mock.mock_storage_clients


Not a fan of letting the detail that there are now two storage clients leak into the naming. Also, this is meant to use just one storage per invocation, isn't it?

unleashed · 2019-08-26T16:11:10Z

lib/3scale/backend/worker.rb

-        super
+        if options[:async]
+          # Conditional require is done to require async-* libs only when
+          # needed and avoid possible side-effects.


Looks good - must ensure the same happens when loading the async storage in tests, or in general in the listeners when choosing just the sync one (I say this because I haven't checked it, but you probably know).

lib/3scale/backend/storage_async/client.rb

davidor · 2019-09-04T09:30:40Z

@unleashed We have different views on what this PR should be.

This is a big change in how apisonator works. It's true that in our tests this increases performance substantially. However, this relies on the async-redis lib. The maintainers have done an amazing job on the lib and the async ecosystem in general, but the lib is not as widely used as redis-rb. Given that it's such an important piece in this project, there's some risk. I think we need to try this with real workloads to see how it behaves. It's important to rule out problems such as unexpected crashes, memory leaks, etc.: the kind of problems that you only notice when running for a long time in a real production scenario.

This feature is opt-in. For users, absolutely nothing changes unless they explicitly decide that they want to give this a try. That's why I think we should try to merge this and let @slopezz and others experiment with it.

My opinion is that, rather than trying to solve all the minor problems in the first PR, we should go step by step. Try a first version, see how it behaves, try different configuration params to evaluate performance in different environments, etc. If all that goes well, we can put in the effort to, for example, contribute some of the code that we have here to other projects in the async ecosystem.

As I said before, I don't think we need to solve all the issues now. For example, contributing a redis-rb-compatible interface for pipelines in async-redis is not a trivial effort. This PR brings a substantial performance improvement to the table. Also, the rules to scale in environments like Openshift become way easier to understand. That's why I'm willing to take on more tech debt than in other cases, and that includes:

- I'm willing to add a line to make the acceptance tests pass: Non-blocking redis calls using redis-async lib #96 (comment)
- I think that having an adapter for redis-async so we can use both redis-rb and redis-async at the same time is not a big deal. Sure, ideally that part of the code would be in redis-rb as a driver for redis-async or in redis-async as a DSL with a compatibility layer, but we don't have that right now. Also, having adapters like the one introduced in this PR is a common and widely accepted pattern to solve this kind of problem: Non-blocking redis calls using redis-async lib #96 (comment)
- I think that changing a lazy enumerator in a part of the code that's deprecated and that does not really benefit from that lazy enumerator is not a big issue: Non-blocking redis calls using redis-async lib #96 (comment)
- I'm also OK with running this test only with the sync driver: Non-blocking redis calls using redis-async lib #96 (comment)

Interestingly, you didn't mention anything about running all the tests twice, which I think is the biggest problem that this PR introduces. Again, the ideal scenario would be to have redis-async / redis-rb compatibility tests in one of those projects. But that does not exist right now, and because of how our test suite works and its coupling with Redis, I don't feel confident running the tests with just async or sync and assuming that the other will work just fine.

I think that these trade-offs are reasonable, and we can document them so there are no surprises in the future. I just don't think that waiting to merge this is worth it.

There are other small issues that you mentioned that I'll address before merging, but I wanted to talk about the important things first.

ioquatix · 2019-09-04T21:23:39Z

@davidor if you see any issues feel free to keep me in the loop.

davidor · 2019-09-25T15:00:28Z

Rebased on top of master. I'll try to add a document with the known limitations tomorrow.

davidor · 2019-09-27T13:37:11Z

@unleashed I added a document with the limitations, goals, and other info in my latest commit.

unleashed

Check out comments

unleashed · 2019-09-27T13:53:11Z

docs/async.md

+enumerator. That class is only required for the Oauth feature which is
+deprecated. Also, the lazy enumerator was not giving us any advantage in this
+particular scenario. The important thing is that there might be some problems
+with some lazy enumerators (not all) when using `async-redis`.


If it's the important thing then it should be made prominent in the listing, rather than focusing on the specific instance that was changed. We should state we don't know / haven't investigated yet why this happens.

unleashed · 2019-09-27T14:14:35Z

docs/async.md

+- We need this line `RSpec.current_example = example` in `config.around :each`
+of `spec_helper.rb` in order to make the acceptance tests pass.
+
+- There's one test of the suite that fails with the async client. It's in the


This is a symptom of some issue we think might be related to threading. So the limitation would be: potential issues with threading we haven't figured out yet.

unleashed · 2019-09-27T14:24:05Z

docs/async.md

+clients.
+
+
+## Limitations


There is no mention of what the async gem does, and it is a potential source of problems. In particular, it monkey-patches core IO modules to use a reactor, which means that interaction with any IO code is to be suspected in case of issues, and C extensions (usually) won't use the reactor for their IO (so we need to watch out for blocking there).

It doesn’t do that (monkey patching).

that was my understanding from when the first PR was introduced - are you not modifying the behaviour of IO methods?

I'd then assume async-redis is using async{,-io} on its own and then nothing else would be touching the reactor, right? That is preferable for us as it lowers the exposure to issues in unrelated code.

> that was my understanding from when the first PR was introduced - are you not modifying the behaviour of IO methods?

@unleashed we are not modifying IO. We add Async::IO which provides (almost) identical wrappers which you can inject into other code if needed.

> I'd then assume async-redis is using async{,-io} on its own and then nothing else would be touching the reactor, right? That is preferable for us as it lowers the exposure to issues in unrelated code.

At the moment, that's correct. However in the future, we may make it possible for the user to allow other forms of IO to use the reactor: ruby/ruby#1870 - just FYI.

unleashed · 2019-09-27T14:43:16Z

docs/async.md

+want to be compatible with the two libraries at least for a while. That means
+that we need an adapter `lib/3scale/backend/storage_async/client.rb`. We could
+simplify that and other things in our codebase by contributing to the
+`async-redis` project.


You are also missing the "avoid IO in the middle of a pipeline" limitation.

Edit: maybe fixed with fiber id check?

unleashed

I don't feel comfortable merging stuff that breaks core language primitives (threading, fibers/lazy enums) in a way that we don't understand. Not only because of the potential issues in the current state, which we might eventually assess to be ok, but the potential for things further down the road breaking in our code and also in dependencies. There are a few other smaller nits I think could have been addressed since I first mentioned them.

The idea is good and I hope we can fix these issues or have a good compromise. But I don't think this is ok as it is, and further work is needed. My approval here only means "I delegate to you".

unleashed · 2019-09-27T15:57:03Z

bors delegate+

bors · 2019-09-27T15:57:04Z

✌️ davidor can now approve this pull request. To approve and merge a pull request, simply reply with bors r+. More detailed instructions are available here.

ioquatix · 2019-09-28T00:41:45Z

What primitives are broken by async?

unleashed · 2019-09-29T17:37:08Z

> What primitives are broken by async?

@ioquatix according to @davidor our usage of lazy enumerators and a test using two threads break when loading the async gems. He might be able to provide further details.

ioquatix · 2019-09-30T06:13:10Z

> @ioquatix according to @davidor our usage of lazy enumerators and a test using two threads break when loading the async gems. He might be able to provide further details.

Lazy enumerators are buggy when combined with user fibers. It's a known issue: ruby/ruby#2002

The solution is not to use async in code which can be used from an enumerator, until the bug/issue is resolved within Ruby itself.

Regarding threads, if you use thread level locks/mutex within async code, you might have strange behaviour, especially if you mix the reactor between threads. Reactors should be per-thread. If you need thread synchronisation, you need to use a higher level construct e.g. a pipe. Using thread synchronisation with an async reactor is blocking by definition.

Blocking operations are not strictly bad in async, but they will increase latency and there is a chance you can deadlock (but that applies even if you don't use async).
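As an illustration of the "pipe instead of a lock" suggestion, here is a minimal plain-Ruby sketch of one thread signalling another through `IO.pipe`. It uses only the standard library and no reactor. How the read end would be wrapped for use inside an async reactor is outside this sketch, and the message content is an arbitrary assumption.

```ruby
reader, writer = IO.pipe

worker = Thread.new do
  # ... the worker would do its job here ...
  writer.write("done\n") # signal completion through the pipe
  writer.close
end

# Blocks only until the worker writes; no Mutex or ConditionVariable involved.
message = reader.gets
worker.join
reader.close
```

Because the signal is ordinary IO, waiting for it is something an event loop can in principle schedule around, which is not the case for a thread-level lock.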

davidor · 2019-09-30T09:00:00Z

Thanks for the explanation @ioquatix

The problem that you mentioned is something that we were not aware of. It's something different from what has been discussed between me and @unleashed in this PR up until now.

That issue is the only real problem I see with this PR. We were not aware that enumerators had problems when combined with fibers because of a bug at the Ruby language level.

I'll need to think about this. I think that code like:

array_with_keys.map { |k| redis.get(k) }

is pretty common. Unfortunately, I'll need to put this on hold if things like that can unexpectedly fail, at least until I understand exactly which enumerators are problematic, because the example above works without problems, as do all the others that we use in our test suite.

unleashed · 2019-09-30T09:15:46Z

@davidor that is not a lazy enumerator. There could be problems with our code or dependencies using fibers (some gems use fibers under the hood), but AIUI non-lazy enumerators should not be affected. The threading issue might be resolved by adapting the offending test, and the remaining problem is whether we want to live with these issues.

A closer look at the problems or a link to existing or new issues is what I was requesting in my review. Thanks for providing those, @ioquatix. Now we have more information, and it would be good to add it to the document on limitations before this is merged.

davidor · 2019-09-30T10:20:25Z

I see. I misunderstood some things. My last comment is not correct.
I've updated async.md with the new things we've learned.

unleashed · 2019-09-30T10:50:11Z

docs/async.md

+one. The reason is that our tests are highly coupled with Redis, even the unit
+ones. Running the test suite with only one of the clients is risky.
+
+- Cannot use `Fiber.yield` in an enumerator. See [ruby PR


I would specifically call out lazy enumerators, because that is where you are most likely to inadvertently hit this problem.

It's only lazy enumerators that allocate a fiber. Some lazy enumerators can avoid it. The typical situation is when you invoke zip, the argument will use a fiber internally.
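The distinction can be observed in plain Ruby without any async gems. The sketch below records which fiber an enumerator's block runs in under different kinds of iteration; `block_runs_in_caller_fiber?` is a hypothetical helper written for this illustration.

```ruby
require 'fiber' # provides Fiber.current on older Rubies; harmless on newer ones

# Yields an enumerator to the given block and reports whether the
# enumerator's own block ran in the caller's fiber.
def block_runs_in_caller_fiber?
  caller_fiber = Fiber.current
  seen = nil
  source = Enumerator.new do |y|
    seen = Fiber.current
    y << :a
    y << :b
  end
  yield source
  seen.equal?(caller_fiber)
end

# Internal iteration stays in the caller's fiber.
eager_map = block_runs_in_caller_fiber? { |e| e.map { |x| x } }
lazy_map  = block_runs_in_caller_fiber? { |e| e.lazy.map { |x| x }.to_a }

# External iteration (`next`) runs the block in a separate fiber, and
# lazy zip pulls from a non-array argument with `next`.
external  = block_runs_in_caller_fiber? { |e| e.next }
lazy_zip  = block_runs_in_caller_fiber? { |e| [1, 2].lazy.zip(e).to_a }
```

Here `eager_map` and `lazy_map` come out true while `external` and `lazy_zip` come out false, which matches the point above: code like `array_with_keys.map { |k| redis.get(k) }` does not allocate a fiber, whereas an enumerator passed as an argument to a lazy `zip` does.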

davidor · 2019-09-30T13:05:18Z

@unleashed and I have decided to merge this.
The feature is opt-in and all the limitations are documented in docs/async.md.

davidor · 2019-09-30T13:05:26Z

bors r=@unleashed

96: Non-blocking redis calls using redis-async lib r=unleashed a=davidor This is an integration branch. It contains: #77 , #86 , #92 , #93 . ~It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.~ Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>

bors · 2019-09-30T13:09:52Z

Build succeeded

ci/circleci

davidor force-pushed the async branch from 18b06c7 to eb9a779 Compare August 1, 2019 12:12

davidor force-pushed the async branch from eb9a779 to 2fba350 Compare August 5, 2019 14:54

davidor marked this pull request as ready for review August 7, 2019 14:47

This was referenced Aug 23, 2019

Refactor: Extract JobFetcher from Worker #116

Merged

clean-up: delete unnecessary 'before' in Utilization specs #117

Merged

davidor force-pushed the async branch from 2fba350 to f40b088 Compare August 26, 2019 12:26

unleashed reviewed Aug 26, 2019

davidor force-pushed the async branch 8 times, most recently from a8ac288 to b0ae3ea Compare August 28, 2019 14:07

davidor added 9 commits September 25, 2019 16:51

Gemfiles: add async-redis

8ad4912

test/test_helper: run tests with a reactor

1bf1b51

spec/spec_helper: run tests with a reactor

810a155

configuration: add redis.async param

5ad8c74

Add an async client based on async-redis

064b31e

StorageAsync: implement pipelining

4030d4c

StorageAsync: add .call_pipeline method to the async-redis lib

c3d7e3e

test/helpers/storage: use StorageAsync::Client instead of Storage

62fe424

spec/unit: add specs for Pipeline

56cac28

davidor added 4 commits September 25, 2019 16:51

spec/unit: add specs for WorkerAsync

57f9ec6

spec/integration: add specs for the async worker

29b61fb

job_fetcher: handle connection errors in blpop

e57b92b

spec/integration/worker_metrics: run with worker sync

254b1a7

davidor force-pushed the async branch from b0ae3ea to 254b1a7 Compare September 25, 2019 14:59

unleashed reviewed Sep 27, 2019

davidor force-pushed the async branch from e1f2166 to 6c32bda Compare September 27, 2019 15:02

davidor requested a review from unleashed September 27, 2019 15:03

unleashed approved these changes Sep 27, 2019

davidor force-pushed the async branch from 6c32bda to 52bcd63 Compare September 30, 2019 10:17

unleashed reviewed Sep 30, 2019

docs: document the async redis client feature

bc26858

davidor force-pushed the async branch from 52bcd63 to bc26858 Compare September 30, 2019 12:44

bors bot merged commit bc26858 into master Sep 30, 2019

bors bot deleted the async branch September 30, 2019 13:09
