
Non-blocking Redis calls: preparation (import libs, write async client, adapt tests, etc.) #77

Merged
merged 26 commits on Mar 4, 2019

Conversation

@davidor (Contributor) commented Feb 26, 2019

This PR implements the base for what we need to use the async-redis lib.

The idea is to be able to perform non-blocking Redis calls in a reactor instead of spawning N worker processes, each of which blocks on Redis calls.
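For illustration only, a minimal sketch of that model using async-redis (the endpoint and key names are placeholders; this is not code from this PR):

```ruby
require 'async'
require 'async/redis'

# Several Redis calls issued from a single reactor: each task yields its
# fiber while waiting on I/O instead of blocking a whole worker process.
Async::Reactor.run do |task|
  client = Async::Redis::Client.new(Async::Redis.local_endpoint)

  tasks = 5.times.map do |i|
    task.async { client.call('INCR', "counter:#{i}") }
  end

  tasks.each(&:wait)
  client.close
end
```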

Notice that the base branch of this PR is an integration one. This is because this PR is just a first step.

This PR:

  • Sets the Ruby version to 2.3. Ruby 2.2 has reached EOL, and the minimum version required by async-redis is 2.2.7, which is higher than the 2.2 patch level we test against. For that reason, I think it's better to just update to 2.3.
  • Adapts the test helpers so tests are run in a reactor. All the tests use async-redis.
  • Implements an async storage client. Pipelines are not implemented in the async-redis library, so they are implemented here.
  • Adapts some tests to make them work with the async storage client.

Notice that this PR does not change any models or any of the core functionality. Using the async storage is opt-in; there's a new parameter in the config for that (redis.async).

TODO for coming PRs:

  • Adapt the workers to use the async storage client implemented in this PR.
  • Adapt the listener to use the async storage client implemented in this PR.
  • Perf optimization: ensure that there's only one socket.write for each pipeline.
  • Run all the tests both with the async storage client and the standard one.
  • Document limitations (avoid IO in the middle of a pipeline, avoid .lazy + storage requests, add reminder about other non-redis blocking IO, etc.)

CircleCI tests fail because the CI image tries to install ruby 2.2. We'll need to change it before merging this to master. Tests pass locally.

@andrewdavidmackenzie (Member):

Nice!

@unleashed (Contributor) left a comment:

Ok, nice work and direction, @davidor 👍. That said, there are a few comments and issues; please take a look and let's see how we can fix or work around them.

@@ -48,8 +49,22 @@ def committed_at

config.before :each do
ThreeScale::Backend::Storage.instance(true).flushdb

Contributor:

spurious blank line?

Contributor Author:

👍


config.around :each do |example|
Async.run do
# This is needed for the acceptance specs. Not sure why.
Contributor:

is there any link we can follow to an open issue? or any other resource for that matter?

if there is, let's add it here, and in any case, add a TODO/FIXME here to mark it as a wart needing some investigation.

Contributor Author:

I don't know. I found this experimenting. Not sure if it's a bug or just something that we're doing wrong or something I didn't take into account.

define_method(method) do |*args|
@redis_async.call(method.to_s.upcase, *args)
end
end
Contributor:

so these methods do not exist in @redis_async itself and always need call? If so, any reason for that? Maybe it's still incomplete/WIP? 👀

Contributor Author:

async-redis offers 2 different interfaces for making requests (a short sketch of both follows this list):

  1. Perform redis calls with .call(): client.call('GET', 'k'), client.call('SET', 'k', '1'), etc. This works for any redis operation.
  2. Perform redis calls with methods like client.get('k'), client.set('k', '1'), etc. In the library these are referred to as "dsl" or "methods", and they are defined here: https://github.com/socketry/async-redis/tree/master/lib/async/redis/methods. The problem is that not all commands are supported (for example, there are no methods for sets). Also, those methods don't necessarily provide the same interface as redis-rb, so we might end up doing something similar to what we have now anyway. I'll leave this as it is for now. In the future, I think it might make sense to contribute the missing methods to the library.
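A brief sketch of both interfaces, assuming an Async::Redis::Client instance named client (keys and values are made up):

```ruby
# 1. Generic interface: works for any Redis command. Replies are the raw
#    protocol values (integers for SADD/SISMEMBER, hence coercions like
#    `> 0` in the hunks below, where redis-rb would return booleans).
client.call('SADD', 'some_set', 'some_member')      # => 1
client.call('SISMEMBER', 'some_set', 'some_member') # => 1

# 2. "Methods"/DSL interface: only the commands the gem implements
#    (e.g. strings); there are no set helpers such as client.sadd here.
client.set('some_key', '1')
client.get('some_key') # => "1"
```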

Comment:

We are still figuring out how best to support all Redis methods; we are experimenting with generating method stubs from the Redis documentation.

define_method(method) do |*args|
@redis_async.call(method.to_s.upcase, *args) > 0
end
end
Contributor:

Ok, this is adding to the above problem - I don't think we will have a problem in the short term, but this really should be done in the client's code, because otherwise we will be bitten by any new addition or change. Are you planning to create a PR upstream?

Contributor Author:

Possibly, see comment above :D

module ThreeScale
module Backend
module StorageAsync
describe Pipeline do
Contributor:

this should ideally be done in the upstream client - maybe we could keep these (or similar) specs if we implement a different solution that needs to prove that yielding from a Fiber signals an error when the client is used from a different context... but even that, I think, should be considered to belong in async-redis with pipelining support.

Contributor Author:

Agreed. #77 (comment)

if configuration.redis.async
Backend::StorageAsync::Client
else
Backend::StorageSync
Contributor:

let's use this to decide what to mock in tests? or mock both and run tests on both? 😓

Contributor Author:

Related: #77 (comment)

sleep 0.01 until t.stop?
# This test does not work when using the async storage. The async
# libs run async tasks inside Fibers and creating threads like in
# this test
Contributor:

this comment (also in the commit message) seems to have a typo; the sentence feels unfinished

Contributor:

in any case, a more thorough explanation of why (what limitation?) it does not work is merited, even if it's just a comment here, because there is no fundamental reason preventing multiple threads from being combined with one or more reactors.

Contributor Author:

creating threads from a fiber looked weird to me.

@@ -94,7 +94,7 @@ def all_by_service_and_app(service_id, app_id, user_id = nil)
Token.from_value token, service_id, value, ttl
end
end
.force.tap do
.tap do
Contributor:

No explanation of the actual problem in the commit message. 👎

This is important, can you please explain it? I will comment below on what's wrong with removing this (although I don't mean it cannot or should not be removed; it depends on our usage).

# laziness is maintained until some enumerator forces execution or
# the caller calls 'to_a' or 'force', whichever happens first.
storage.smembers(token_set).lazy
storage.smembers(token_set)
Contributor:

So, the commit message is just wrong.

lazy does not magically make smembers fetch partial contents of a Redis set, process them, and then fetch more. lazy creates an Enumerator::Lazy, which fundamentally changes what happens after the call to lazy.

If storage.smembers(...) above would have been the mentioned (in TODO comments) storage.sscan(...) wrapper providing an Enumerator to have it provide something like 100 elements in each batch, lazy would still fundamentally change the way it behaves.

See, before the call to lazy:

  • SMEMBERS -> Fetch a million keys from set -> Store a million keys in an Array.
  • SSCAN -> Fetch 100 elements from set -> Store 100 elements in an Array.

After the call to lazy, imagine we have tokens_from(a_set).select { |t| t.start_with?('alex') }.take(2) elsewhere:

  • SMEMBERS w/o lazy: A million elements Array -> select iterates over all that million of elements filling in another array -> ie. creates an array with 15 elements -> returns first two matching elements in a new 2-element array.

  • SMEMBERS w/ lazy: A million elements Array -> select iterates just to find the first element matching -> take accumulates this element in an array -> select iterates to find the second element matching -> take accumulates this element in the array, now a 2-element array -> returns first two matching elements in the 2-element array.

  • SSCAN w/o lazy: 100 elements Array -> select iterates over the 100 elements filling another array -> ie. creates an array with 1 element -> returns first element in a 1-element array -> some special code would be needed to call SSCAN again and add to the results array -> another 100 elements Array is provided -> select iterates over these new 100 elements filling yet another array -> ie. creates an array with 3 elements -> take returns first two elements of the 3 -> some special code would be needed to join this 2-element array to the tail of the previous 1-element array, then trimming the array to only contain the first 2 elements.

  • SSCAN w/ lazy: 100 elements Array -> select checks elements one by one until it finds the first matching -> take accumulates the first element, but because it hasn't accumulated two, the process iterates -> select checks now remaining elements from the 100 element array one by one until it finds the second... assume the second is not found, so select iterates on its source lazy enumerator -> the SSCAN enumerator returns the next batch of 100 elements -> select gets to check now for the second matching element again -> finds it and take accumulates it in its array, a 2-element array, so it is done -> take returns the array result.

Ok, so this is to clarify, there are 2 effects at play for minimising both CPU and memory usage, which essentially correspond to external and internal fragmentation:

  1. External fragmentation: we can avoid storing millions of tokens in memory if we implement a generator/enumerator, which is what SSCAN is for (including avoiding collapsing network bandwidth and burning the Redis CPU) and we have yet to wrap it in an Enumerator.
  2. Internal fragmentation: we can avoid processing extra elements if our processing has a termination condition that does not require looking at the whole collection (ie. searching for a specific subset of the collection), and we already have this implemented with an Enumerator::Lazy.

The 1st is valuable in its own right (but we haven't implemented it yet) and is certainly the more impactful of the two. This change, however, removes the 2nd, and we really need a good reason to do so (it might be that we aren't taking advantage of the laziness, though; I just don't know at this point).

The important thing, though, if the justification exists, is that all users need to be checked in how they use the lazy enumerator to see whether they can terminate early (and if so, how big would the estimated impact be, given sets could be huge).
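To make the two points concrete, here is a hypothetical sketch that is not code from this repo: an SSCAN-backed Enumerator for point 1 combined with lazy for point 2, assuming a redis-rb style client and a made-up helper name.

```ruby
# Hypothetical: stream set members in batches instead of one huge SMEMBERS.
def each_token(storage, token_set, batch: 100)
  Enumerator.new do |yielder|
    cursor = '0'
    loop do
      cursor, members = storage.sscan(token_set, cursor, count: batch)
      members.each { |m| yielder << m }
      break if cursor == '0'
    end
  end
end

# With .lazy, select/take stop fetching batches as soon as two matches exist;
# without it, select would walk the entire set before take runs.
each_token(storage, 'some_token_set')
  .lazy
  .select { |t| t.start_with?('alex') }
  .first(2)
```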

Contributor Author:

I understand how .lazy works and I made a conscious decision when I decided to remove it. Let me try to justify that. I admit that I didn't do a good job explaining this in the description of the commit.

First of all, this code is deprecated. It's only used in the endpoints that we offer to manage OAuth tokens, which are themselves deprecated.

I spent some time trying to debug the issue, but to be honest, I was not able to fully understand why .lazy does not play well with the async library. It might be related to something like this: socketry/async#23, but I do not know for sure.

Regarding your points about external/internal fragmentation:

  • External fragmentation. As you explained, .lazy() would be absolutely critical if we changed this to SSCAN as the TODO in the code mentions, but I suspect this is not going to happen. It has not happened in years, so it's not going to happen now that the code is deprecated.

  • Internal fragmentation. As far as I know, the method that contained the call to .lazy is only called from OAuth::Token::Storage.all_by_service_and_app, which needs to return all the tokens returned by the smembers call except for the expired ones. So there's a .select that discards some elements, but there's nothing like a take(a_few).

@unleashed (Contributor), Mar 4, 2019:

@davidor re: SSCAN happening - well, it can still happen, and the fact that this is "deprecated" does not mean SSCAN would not be used elsewhere. The fact that this code implies lazy won't be usable is something that will need to be documented and investigated eventually.

Contributor Author:

I know. I added a note in the TODO list above to document this in the future. We might find other limitations in future PRs so I'll document all of them later.

@davidor force-pushed the async-redis branch 2 times, most recently from 27c5b06 to 1ca21e7 on March 4, 2019 12:01

@davidor (Contributor, Author) commented Mar 4, 2019

@unleashed I addressed all your comments.

I also added a few minor things that I noticed were missing. Those changes are in the last 4 commits.

I've updated the TODO in the first comment of this PR with things that will be addressed in future PRs. In particular, most things that have to do with contributing code to the async-redis project.

@davidor requested a review from @unleashed on March 4, 2019 15:27
Gemfile.base Outdated
@@ -60,3 +61,4 @@ gem 'sinatra', '~> 2.0.3'
gem 'sinatra-contrib', '~> 2.0.3'
# Optional external error logging services
gem 'bugsnag', '~> 6', require: nil
gem 'async-redis', '~> 0.3.3'
Contributor:

since this is going to be opt-in, add require: nil and load only when the config is set to use async? Also, async-rspec above could be loaded on demand if async testing is requested. IIRC loading the async library itself can have side effects, as it monkey-patches stuff, so best to avoid it completely when not wanted.
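A sketch of the suggested change to Gemfile.base (the version pin comes from the diff above):

```ruby
# Do not require at boot; load it explicitly only when redis.async is enabled.
gem 'async-redis', '~> 0.3.3', require: nil
```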

Contributor Author:

👍
I'll leave the async-rspec part for the PR where I'll change things to run the test suite with both clients.


# used to provide a Redis client based on a configuration object
require 'uri'
require '3scale/backend/storage_async'
require '3scale/backend/storage_sync'
Contributor:

Just a note: can we design this so opting out of async means no async code gets loaded? I worry about the async monkeypatching stuff, which, while assumed to be inactive, should still not happen when using sync.
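A hypothetical sketch of that idea (the constant and config accessor are assumptions based on the nearby diff; the real wiring may differ):

```ruby
require 'uri'

# Hypothetical: only load the async storage (and whatever async-redis
# requires or patches) when the feature is switched on.
if ThreeScale::Backend.configuration.redis.async
  require '3scale/backend/storage_async'
else
  require '3scale/backend/storage_sync'
end
```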

Contributor Author:

👍


# When running a nested pipeline, we just need to continue
# accumulating commands.
if @redis_async.is_a? Pipeline
Contributor:

/shrug we know we can do better, easily.


original = @redis_async
pipeline = Pipeline.new
@redis_async = pipeline
Contributor:

didn't you add/change this?
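For context, a simplified sketch of the command-accumulating idea discussed in these hunks (illustrative only; PipelineSketch and its methods are hypothetical, not the PR's actual Pipeline class):

```ruby
# Illustrative only: while a pipeline is active it stands in for the real
# client and queues commands; a nested pipeline just keeps appending to the
# same queue, and everything is flushed when the outermost pipeline ends.
class PipelineSketch
  def initialize
    @commands = []
  end

  # Same signature as the client's generic interface.
  def call(*command)
    @commands << command
    nil # replies only become available after the flush
  end

  def flush(redis_async)
    # The real implementation would ideally issue a single socket.write for
    # the whole batch (see the TODO list in the PR description).
    @commands.map { |cmd| redis_async.call(*cmd) }
  end
end
```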

@unleashed (Contributor):

Ok to merge to integration branch 👍

Also, move the helpers to StorageHelpers to be able to use them to
instantiate AsyncStorage too.

The helpers will need to be cleaned-up because we have
ThreeScale::Backend::StorageHelpers and
ThreeScale::Backend::Storage::Helpers. They should probably be merged.
Most of these tests only apply to StorageSync. StorageAsync does not
support sentinels, for example.
…torage

This test does not work when using the async storage. The async libs run
async tasks inside Fibers, and creating threads like in this test
causes problems with the async-redis library. Also, I do not think it
has any value, because the smembers will need to be run anyway; it does
not really matter when.

@davidor (Contributor, Author) commented Mar 4, 2019

Thanks for the review, @unleashed. I squashed the "fixup" commits.

@davidor (Contributor, Author) commented Mar 4, 2019

CircleCI fails because the CI image contains a Ruby version that is not compatible with this feature. We'll update the image before merging to master. The tests pass locally.

@ioquatix commented Apr 6, 2019

We also recently did this: redis/redis-rb#832

bors bot added a commit that referenced this pull request Sep 30, 2019
96: Non-blocking redis calls using redis-async lib r=unleashed a=davidor

This is an integration branch. It contains: #77 , #86 , #92 , #93 .
~It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.~

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>