
Non-blocking Redis calls: preparation (import libs, write async client, adapt tests, etc.) #77

Merged
merged 26 commits on Mar 4, 2019

Conversation

@davidor (Contributor) commented Feb 26, 2019

This PR implements the base for what we need to use the async-redis lib.

The idea is to be able to perform non-blocking Redis calls in a reactor instead of spawning N worker processes, each of which blocks on Redis calls.
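For illustration only, a minimal sketch of that model using async-redis (the endpoint and key names are placeholders; this is not code from this PR):

```ruby
require 'async'
require 'async/redis'

# Several Redis calls issued from a single reactor: each task yields its
# fiber while waiting on I/O instead of blocking a whole worker process.
Async::Reactor.run do |task|
  client = Async::Redis::Client.new(Async::Redis.local_endpoint)

  tasks = 5.times.map do |i|
    task.async { client.call('INCR', "counter:#{i}") }
  end

  tasks.each(&:wait)
  client.close
end
```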

Notice that the base branch of this PR is an integration one. This is because this PR is just a first step.

This PR:

  • Sets the Ruby version to 2.3. Ruby 2.2 has reached EOL, and the minimum version required by async-redis is 2.2.7, which is higher than the 2.2 patch level we test against. For that reason, I think it's better to just update to 2.3.
  • Adapts the test helpers so tests are run in a reactor. All the tests use async-redis.
  • Implements an async storage client. Pipelines are not implemented in the async-redis library, so they are implemented here.
  • Adapts some tests to make them work with the async storage client.

Notice that this PR does not change any models or any of the core functionality. Using the async storage is opt-in; there's a new parameter in the config for that (redis.async).

TODO for coming PRs:

  • Adapt the workers to use the async storage client implemented in this PR.
  • Adapt the listener to use the async storage client implemented in this PR.
  • Perf optimization: ensure that there's only one socket.write for each pipeline.
  • Run all the tests both with the async storage client and the standard one.
  • Document limitations (avoid IO in the middle of a pipeline, avoid .lazy + storage requests, add reminder about other non-redis blocking IO, etc.)

CircleCI tests fail because the CI image tries to install ruby 2.2. We'll need to change it before merging this to master. Tests pass locally.

@andrewdavidmackenzie (Member):

Nice!

@unleashed (Contributor) left a comment:

Ok, nice work and direction, @davidor 👍. That said, there are a few comments and issues; please take a look and let's see how we can fix or work around them.

@@ -48,8 +49,22 @@ def committed_at

config.before :each do
ThreeScale::Backend::Storage.instance(true).flushdb

Contributor:

spurious blank line?

Contributor Author:

👍


config.around :each do |example|
Async.run do
# This is needed for the acceptance specs. Not sure why.
Contributor:

is there any link we can follow to an open issue? or any other resource for that matter?

if there is, let's add it here, and in any case, add a TODO/FIXME here to mark it as a wart needing some investigation.

Contributor Author:

I don't know. I found this experimenting. Not sure if it's a bug or just something that we're doing wrong or something I didn't take into account.

define_method(method) do |*args|
@redis_async.call(method.to_s.upcase, *args)
end
end
Contributor:

so these methods do not exist in @redis_async itself and always need call? If so, any reason for that? Maybe it's still incomplete/WIP? 👀

Contributor Author:

async-redis offers 2 different interfaces for making requests (a short sketch of both follows this list):

  1. Perform redis calls with .call(): client.call('GET', 'k'), client.call('SET', 'k', '1'), etc. This works for any redis operation.
  2. Perform redis calls with methods like client.get('k'), client.set('k', '1'), etc. In the library these are referred to as "dsl" or "methods", and they are defined here: https://github.com/socketry/async-redis/tree/master/lib/async/redis/methods. The problem is that not all commands are supported (for example, there are no methods for sets). Also, those methods don't necessarily provide the same interface as redis-rb, so we might end up doing something similar to what we have now anyway. I'll leave this as it is for now. In the future, I think it might make sense to contribute the missing methods to the library.
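A brief sketch of both interfaces, assuming an Async::Redis::Client instance named client (keys and values are made up):

```ruby
# 1. Generic interface: works for any Redis command. Replies are the raw
#    protocol values (integers for SADD/SISMEMBER, hence coercions like
#    `> 0` in the hunks below, where redis-rb would return booleans).
client.call('SADD', 'some_set', 'some_member')      # => 1
client.call('SISMEMBER', 'some_set', 'some_member') # => 1

# 2. "Methods"/DSL interface: only the commands the gem implements
#    (e.g. strings); there are no set helpers such as client.sadd here.
client.set('some_key', '1')
client.get('some_key') # => "1"
```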

Comment:

We are still figuring out how best to support all Redis methods; we are experimenting with generating method stubs from the Redis documentation.

define_method(method) do |*args|
@redis_async.call(method.to_s.upcase, *args) > 0
end
end
Contributor:

Ok, this is adding to the above problem - I don't think we will have a problem in the short term, but this really should be done in the client's code, because otherwise we will be bitten by any new addition or change. Are you planning to create a PR upstream?

Contributor Author:

Possibly, see comment above :D

module ThreeScale
module Backend
module StorageAsync
describe Pipeline do
Contributor:

this should ideally be done in the upstream client - maybe we could keep these (or similar) specs if we implement a different solution that needs to prove that yielding from a Fiber signals an error when the client is used from a different context... but even that, I think, should be considered to belong in async-redis with pipelining support.

Contributor Author:

Agreed. #77 (comment)

if configuration.redis.async
Backend::StorageAsync::Client
else
Backend::StorageSync
Contributor:

let's use this to decide what to mock in tests? or mock both and run tests on both? 😓

Contributor Author:

Related: #77 (comment)

sleep 0.01 until t.stop?
# This test does not work when using the async storage. The async
# libs run async tasks inside Fibers and creating threads like in
# this test
Contributor:

this comment (also in the commit message) seems to have a typo; the sentence feels unfinished

Contributor:

in any case, a more thorough explanation of why (what limitation?) it does not work is merited, even if it's just a comment here, because there is no fundamental reason preventing multiple threads from being combined with one or more reactors.

Contributor Author:

creating threads from a fiber looked weird to me.

@@ -94,7 +94,7 @@ def all_by_service_and_app(service_id, app_id, user_id = nil)
Token.from_value token, service_id, value, ttl
end
end
.force.tap do
.tap do
Contributor:

No explanation of the actual problem in the commit message. 👎

This is important, can you please explain it? I will comment below on what's wrong with removing this (although I don't mean it cannot or should not be removed; it depends on our usage).

# laziness is maintained until some enumerator forces execution or
# the caller calls 'to_a' or 'force', whichever happens first.
storage.smembers(token_set).lazy
storage.smembers(token_set)
Contributor:

So, the commit message is just wrong.

lazy does not magically make smembers fetch partial contents of a Redis set, process them, and then fetch more. lazy creates an Enumerator::Lazy, which fundamentally changes what happens after the call to lazy.

If storage.smembers(...) above would have been the mentioned (in TODO comments) storage.sscan(...) wrapper providing an Enumerator to have it provide something like 100 elements in each batch, lazy would still fundamentally change the way it behaves.

See, before the call to lazy:

  • SMEMBERS -> Fetch a million keys from set -> Store a million keys in an Array.
  • SSCAN -> Fetch 100 elements from set -> Store 100 elements in an Array.

After the call to lazy, imagine we have tokens_from(a_set).select { |t| t.start_with?('alex') }.take(2) elsewhere:

  • SMEMBERS w/o lazy: A million elements Array -> select iterates over all that million of elements filling in another array -> ie. creates an array with 15 elements -> returns first two matching elements in a new 2-element array.

  • SMEMBERS w/ lazy: A million elements Array -> select iterates just to find the first element matching -> take accumulates this element in an array -> select iterates to find the second element matching -> take accumulates this element in the array, now a 2-element array -> returns first two matching elements in the 2-element array.

  • SSCAN w/o lazy: 100 elements Array -> select iterates over the 100 elements filling another array -> ie. creates an array with 1 element -> returns first element in a 1-element array -> some special code would be needed to call SSCAN again and add to the results array -> another 100 elements Array is provided -> select iterates over these new 100 elements filling yet another array -> ie. creates an array with 3 elements -> take returns first two elements of the 3 -> some special code would be needed to join this 2-element array to the tail of the previous 1-element array, then trimming the array to only contain the first 2 elements.

  • SSCAN w/ lazy: 100 elements Array -> select checks elements one by one until it finds the first matching -> take accumulates the first element, but because it hasn't accumulated two, the process iterates -> select checks now remaining elements from the 100 element array one by one until it finds the second... assume the second is not found, so select iterates on its source lazy enumerator -> the SSCAN enumerator returns the next batch of 100 elements -> select gets to check now for the second matching element again -> finds it and take accumulates it in its array, a 2-element array, so it is done -> take returns the array result.

Ok, so this is to clarify, there are 2 effects at play for minimising both CPU and memory usage, which essentially correspond to external and internal fragmentation:

  1. External fragmentation: we can avoid storing millions of tokens in memory if we implement a generator/enumerator, which is what SSCAN is for (including avoiding collapsing network bandwidth and burning the Redis CPU) and we have yet to wrap it in an Enumerator.
  2. Internal fragmentation: we can avoid processing extra elements if our processing has a termination condition that does not require looking at the whole collection (ie. searching for a specific subset of the collection), and we already have this implemented with an Enumerator::Lazy.

The 1st is valuable in its own right (but we haven't implemented it yet) and is certainly the more impactful of the two. This change, however, removes the 2nd, and we really need a good reason to do so (it might be that we aren't taking advantage of the laziness, though; I just don't know at this point).

The important thing, though, if the justification exists, is that all users need to be checked in how they use the lazy enumerator to see whether they can terminate early (and if so, how big would the estimated impact be, given sets could be huge).
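To make the two points concrete, here is a hypothetical sketch that is not code from this repo: an SSCAN-backed Enumerator for point 1 combined with lazy for point 2, assuming a redis-rb style client and a made-up helper name.

```ruby
# Hypothetical: stream set members in batches instead of one huge SMEMBERS.
def each_token(storage, token_set, batch: 100)
  Enumerator.new do |yielder|
    cursor = '0'
    loop do
      cursor, members = storage.sscan(token_set, cursor, count: batch)
      members.each { |m| yielder << m }
      break if cursor == '0'
    end
  end
end

# With .lazy, select/take stop fetching batches as soon as two matches exist;
# without it, select would walk the entire set before take runs.
each_token(storage, 'some_token_set')
  .lazy
  .select { |t| t.start_with?('alex') }
  .first(2)
```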

Contributor Author:

I understand how .lazy works and I made a conscious decision when I decided to remove it. Let me try to justify that. I admit that I didn't do a good job explaining this in the description of the commit.

First of all, this code is deprecated. It's only used in the endpoints that we offer to manage OAuth tokens, which are themselves deprecated.

I spent some time trying to debug the issue, but to be honest, I was not able to fully understand why .lazy does not play well with the async library. It might be related to something like this: socketry/async#23, but I do not know for sure.

Regarding your points about external/internal fragmentation:

  • External fragmentation. As you explained, .lazy() would be absolutely critical if we changed this to SSCAN as the TODO in the code mentions, but I suspect this is not going to happen. It has not happened in years, so it's not going to happen now that the code is deprecated.

  • Internal fragmentation. As far as I know, the method that contained the call to .lazy is only called from OAuth::Token::Storage.all_by_service_and_app, which needs to return all the tokens returned by the smembers call except for the expired ones. So there's a .select that discards some elements, but there's nothing like a take(a_few).

@unleashed (Contributor), Mar 4, 2019:

@davidor re: SSCAN happening - well, it can still happen, and the fact that this is "deprecated" does not mean SSCAN would not be used elsewhere. The fact that this code implies lazy won't be usable is something that will need to be documented and investigated eventually.

Contributor Author:

I know. I added a note in the TODO list above to document this in the future. We might find other limitations in future PRs so I'll document all of them later.

@davidor force-pushed the async-redis branch 2 times, most recently from 27c5b06 to 1ca21e7 on March 4, 2019 12:01

@davidor (Contributor, Author) commented Mar 4, 2019

@unleashed I addressed all your comments.

I also added a few minor things that I noticed were missing. Those changes are in the last 4 commits.

I've updated the TODO in the first comment of this PR with things that will be addressed in future PRs. In particular, most things that have to do with contributing code to the async-redis project.

@davidor requested a review from @unleashed on March 4, 2019 15:27
Gemfile.base Outdated
@@ -60,3 +61,4 @@ gem 'sinatra', '~> 2.0.3'
gem 'sinatra-contrib', '~> 2.0.3'
# Optional external error logging services
gem 'bugsnag', '~> 6', require: nil
gem 'async-redis', '~> 0.3.3'
Contributor:

since this is going to be opt-in, add require: nil and load only when the config is set to use async? Also, async-rspec above could be loaded on demand if async testing is requested. IIRC loading the async library itself can have side effects, as it monkey-patches stuff, so best to avoid it completely when not wanted.
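A sketch of the suggested change to Gemfile.base (the version pin comes from the diff above):

```ruby
# Do not require at boot; load it explicitly only when redis.async is enabled.
gem 'async-redis', '~> 0.3.3', require: nil
```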

Contributor Author:

👍
I'll leave the async-rspec part for the PR where I'll change things to run the test suite with both clients.


# used to provide a Redis client based on a configuration object
require 'uri'
require '3scale/backend/storage_async'
require '3scale/backend/storage_sync'
Contributor:

Just a note: can we design this so opting out of async means no async code gets loaded? I worry about the async monkeypatching stuff, which, while assumed to be inactive, should still not happen when using sync.
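A hypothetical sketch of that idea (the constant and config accessor are assumptions based on the nearby diff; the real wiring may differ):

```ruby
require 'uri'

# Hypothetical: only load the async storage (and whatever async-redis
# requires or patches) when the feature is switched on.
if ThreeScale::Backend.configuration.redis.async
  require '3scale/backend/storage_async'
else
  require '3scale/backend/storage_sync'
end
```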

Contributor Author:

👍


# When running a nested pipeline, we just need to continue
# accumulating commands.
if @redis_async.is_a? Pipeline
Contributor:

/shrug we know we can do better, easily.


original = @redis_async
pipeline = Pipeline.new
@redis_async = pipeline
Contributor:

didn't you add/change this?
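For context, a simplified sketch of the command-accumulating idea discussed in these hunks (illustrative only; PipelineSketch and its methods are hypothetical, not the PR's actual Pipeline class):

```ruby
# Illustrative only: while a pipeline is active it stands in for the real
# client and queues commands; a nested pipeline just keeps appending to the
# same queue, and everything is flushed when the outermost pipeline ends.
class PipelineSketch
  def initialize
    @commands = []
  end

  # Same signature as the client's generic interface.
  def call(*command)
    @commands << command
    nil # replies only become available after the flush
  end

  def flush(redis_async)
    # The real implementation would ideally issue a single socket.write for
    # the whole batch (see the TODO list in the PR description).
    @commands.map { |cmd| redis_async.call(*cmd) }
  end
end
```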

@unleashed (Contributor):

Ok to merge to integration branch 👍

Also, move the helpers to StorageHelpers to be able to use them to
instantiate AsyncStorage too.

The helpers will need to be cleaned-up because we have
ThreeScale::Backend::StorageHelpers and
ThreeScale::Backend::Storage::Helpers. They should probably be merged.
Most of these tests only apply to StorageSync. StorageAsync does not
support sentinels, for example.
…torage

This test does not work when using the async storage. The async libs run
async tasks inside Fibers, and creating threads like in this test
causes problems with the async-redis library. Also, I do not think it
has any value, because the smembers will need to be run anyway; it does
not really matter when.

@davidor (Contributor, Author) commented Mar 4, 2019

Thanks for the review, @unleashed. I squashed the "fixup" commits.

@davidor (Contributor, Author) commented Mar 4, 2019

CircleCI fails because the CI image contains a Ruby version that is not compatible with this feature. We'll update the image before merging to master. The tests pass locally.

@ioquatix commented Apr 6, 2019

We also recently did this: redis/redis-rb#832

bors bot added a commit that referenced this pull request Sep 30, 2019
96: Non-blocking redis calls using redis-async lib r=unleashed a=davidor

This is an integration branch. It contains: #77 , #86 , #92 , #93 .
~It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.~

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>