Non-blocking Redis calls: preparation (import libs, write async client, adapt tests, etc.) #77
Conversation
It is redundant because that work is already done in the helpers
Nice!
Ok, nice work and direction, @davidor 👍. That said, there are some (few) comments and issues, please take a look and let's see how we can fix or work around them.
spec/spec_helper.rb (outdated diff):

```diff
@@ -48,8 +49,22 @@ def committed_at
 config.before :each do
   ThreeScale::Backend::Storage.instance(true).flushdb
```
spurious blank line?
👍
spec/spec_helper.rb (outdated diff):

```ruby
config.around :each do |example|
  Async.run do
    # This is needed for the acceptance specs. Not sure why.
```
is there any link we can follow to an open issue? or any other resource for that matter?
if there is, let's add it here, and in any case, add a TODO/FIXME here to mark it as a wart needing some investigation.
I don't know. I found this experimenting. Not sure if it's a bug or just something that we're doing wrong or something I didn't take into account.
```ruby
define_method(method) do |*args|
  @redis_async.call(method.to_s.upcase, *args)
end
end
```
so these methods do not exist in `@redis_async` itself and always need `call`? If so, any reason for that? Maybe it's still incomplete/WIP? 👀
`async-redis` offers two different interfaces for making requests:

- Perform Redis calls with `.call()`: `client.call('GET', 'k')`, `client.call('SET', 'k', '1')`, etc. This works for any Redis operation.
- Perform Redis calls with methods like `client.get('k')`, `client.set('k', '1')`, etc. In the library they refer to these methods as "dsl" or "methods". They are defined here: https://github.com/socketry/async-redis/tree/master/lib/async/redis/methods

The problem is that not all the commands are supported (for example, they do not provide any methods for sets). Also, those methods don't necessarily provide the same interface as `redis-rb`, so we might end up doing something similar to what we have now anyway. I'll leave this as it is now. In the future, I think it might make sense to contribute to the library to implement the methods that are missing.
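The delegation pattern discussed here can be sketched in plain Ruby. The `FakeAsyncClient` below is a hypothetical stand-in for the real async-redis client, which only guarantees the generic `call` entry point; the names are illustrative, not the PR's actual code:

```ruby
# Sketch: redis-rb style methods defined on top of a raw
# `call(command, *args)` interface. FakeAsyncClient is a test stand-in,
# not the real async-redis client.
class FakeAsyncClient
  def initialize
    @data = {}
  end

  # Emulates client.call('SET', 'k', '1') / client.call('GET', 'k')
  def call(command, *args)
    case command
    when 'SET' then @data[args[0]] = args[1]; 'OK'
    when 'GET' then @data[args[0]]
    end
  end
end

class StorageWrapper
  # Commands to expose as methods; a real list would be much longer.
  METHODS = %i[get set].freeze

  def initialize(redis_async)
    @redis_async = redis_async
  end

  # One delegating method per command: upcase the name, forward to call().
  METHODS.each do |method|
    define_method(method) do |*args|
      @redis_async.call(method.to_s.upcase, *args)
    end
  end
end

storage = StorageWrapper.new(FakeAsyncClient.new)
storage.set('k', '1')
puts storage.get('k') # prints "1"
```

This keeps every command available (anything `call` accepts) while still offering the familiar method-per-command surface.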
We are still experimenting with how best to support all Redis methods, e.g. by generating method stubs from the Redis documentation.
```ruby
define_method(method) do |*args|
  @redis_async.call(method.to_s.upcase, *args) > 0
end
end
```
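The `> 0` here mirrors the boolean coercion redis-rb performs for predicate commands, whose raw Redis replies are integers. A standalone illustration with a stubbed client (hypothetical names, not the PR's code):

```ruby
# Raw replies for predicate commands such as SISMEMBER are integers (1/0);
# redis-rb exposes them as true/false, hence the `> 0` in the wrapper.
class StubClient
  # Pretend 'a' is the only member of every set.
  def call(command, *args)
    return 0 unless command == 'SISMEMBER'
    args[1] == 'a' ? 1 : 0
  end
end

class BoolWrapper
  def initialize(redis_async)
    @redis_async = redis_async
  end

  # Commands whose integer reply should be coerced to a boolean.
  %i[sismember].each do |method|
    define_method(method) do |*args|
      @redis_async.call(method.to_s.upcase, *args) > 0
    end
  end
end

wrapper = BoolWrapper.new(StubClient.new)
puts wrapper.sismember('some_set', 'a') # prints "true"
puts wrapper.sismember('some_set', 'b') # prints "false"
```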
Ok, this is adding to the above problem - I don't think we will have a problem in the short term, but this really should be done in the client's code, because otherwise we will be bitten by any new addition or change. Are you planning to create a PR upstream?
Possibly, see comment above :D
```ruby
module ThreeScale
  module Backend
    module StorageAsync
      describe Pipeline do
```
this should ideally be done in the upstream client - maybe we could keep these (or similar) specs if we implement a different solution that needs to prove that yielding from a Fiber signals an error when the client is used from a different context... but even that, I think, should be considered to belong to async-redis with pipelining support.
Agreed. #77 (comment)
```ruby
if configuration.redis.async
  Backend::StorageAsync::Client
else
  Backend::StorageSync
```
let's use this to decide what to mock in tests? or mock both and run tests on both? 😓
Related: #77 (comment)
```ruby
sleep 0.01 until t.stop?
# This test does not work when using the async storage. The async
# libs run async tasks inside Fibers and creating threads like in
# this test
```
this comment (also in the commit message) seems to have a typo; the sentence feels unfinished
in any case, a more thorough explanation of why (what limitation?) it does not work is merited, even if it's just a comment here, because there is no fundamental reason preventing multiple threads from being combined with one or more reactors.
creating threads from a fiber looked weird to me.
```diff
@@ -94,7 +94,7 @@ def all_by_service_and_app(service_id, app_id, user_id = nil)
       Token.from_value token, service_id, value, ttl
     end
   end
-    .force.tap do
+    .tap do
```
No explanation of the actual problem in the commit message. 👎
This is important, can you please explain it? I will comment below on what's wrong with removing this (although I don't mean it cannot or should not be removed; it depends on our usage).
```diff
-    # laziness is maintained until some enumerator forces execution or
-    # the caller calls 'to_a' or 'force', whichever happens first.
-    storage.smembers(token_set).lazy
+    storage.smembers(token_set)
```
So, the commit message is just wrong. `lazy` does not magically make `smembers` fetch partial contents of a Redis set, process them, and then fetch more. `lazy` creates an `Enumerator::Lazy`, which fundamentally changes what happens after the call to `lazy`.

If `storage.smembers(...)` above had been the `storage.sscan(...)` wrapper mentioned in the TODO comments, providing an `Enumerator` that yields something like 100 elements in each batch, `lazy` would still fundamentally change the way it behaves.

See, before the call to `lazy`:

- SMEMBERS -> fetch a million keys from the set -> store a million keys in an Array.
- SSCAN -> fetch 100 elements from the set -> store 100 elements in an Array.

After the call to `lazy`, imagine we have `tokens_from(a_set).select { |t| t.start_with? 'alex' }.take(2)` elsewhere:

- SMEMBERS w/o `lazy`: a million-element Array -> select iterates over that whole million elements filling in another array -> i.e. creates an array with 15 elements -> returns the first two matching elements in a new 2-element array.
- SMEMBERS w/ `lazy`: a million-element Array -> select iterates just far enough to find the first matching element -> take accumulates this element in an array -> select iterates to find the second match -> take accumulates this element in the array, now a 2-element array -> returns the first two matching elements in the 2-element array.
- SSCAN w/o `lazy`: a 100-element Array -> select iterates over the 100 elements filling in another array -> i.e. creates an array with 1 element -> returns the first element in a 1-element array -> some special code would be needed to call `SSCAN` again and add to the results array -> another 100-element Array is provided -> select iterates over these new 100 elements filling in yet another array -> i.e. creates an array with 3 elements -> take returns the first two elements of the 3 -> some special code would be needed to join this 2-element array to the tail of the previous 1-element array, then trim the result to only contain the first 2 elements.
- SSCAN w/ `lazy`: a 100-element Array -> select checks elements one by one until it finds the first match -> take accumulates this first element, but because it hasn't accumulated two, the process iterates -> select checks the remaining elements of the 100-element array one by one looking for the second... assume the second is not found, so select iterates on its source lazy enumerator -> the SSCAN enumerator returns the next batch of 100 elements -> select gets to check for the second matching element again -> finds it, and take accumulates it in its array, now a 2-element array, so it is done -> take returns the array result.

Ok, so this is to clarify that there are 2 effects at play for minimising both CPU and memory usage, which essentially correspond to external and internal fragmentation:

- External fragmentation: we can avoid storing millions of tokens in memory if we implement a generator/enumerator, which is what SSCAN is for (including avoiding collapsing the network bandwidth and burning the Redis CPU), and we have yet to wrap it in an `Enumerator`.
- Internal fragmentation: we can avoid processing extra elements if our processing has a termination condition that does not require looking at the whole collection (i.e. searching for a specific subset of the collection), and we already have this implemented with an `Enumerator::Lazy`.

The 1st is interesting in its own right (but we haven't implemented it yet), and is certainly the most interesting one. But this change removes the 2nd, and we really need to find a good reason to do so (it might be that we aren't taking advantage of the laziness, though, I just don't know at this point).

The important thing, if the justification exists, is that all users need to be checked for how they use the lazy enumerator, to see whether they can terminate early (and if so, how big the estimated impact would be, given the sets could be huge).
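The internal-fragmentation effect described above is easy to reproduce with plain Arrays, no Redis required; the counters show how many elements `select` actually examines (the numbers come from this synthetic example, not from the token sets):

```ruby
# Eager vs lazy evaluation: with .lazy, select/take stop examining
# elements as soon as the result is complete.
tokens = (1..100_000).map { |i| "token#{i}" }

examined_eager = 0
eager = tokens.select { |t| examined_eager += 1; t.end_with?('0') }.take(2)

examined_lazy = 0
lazy = tokens.lazy.select { |t| examined_lazy += 1; t.end_with?('0') }.first(2)

puts eager.inspect   # ["token10", "token20"]
puts lazy.inspect    # ["token10", "token20"]
puts examined_eager  # 100000 - the whole collection was scanned
puts examined_lazy   # 20 - stopped right after the second match
```

Both pipelines produce the same two elements; only the amount of work differs, which is exactly the early-termination benefit being discussed.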
I understand how `.lazy` works and I made a conscious decision when I removed it. Let me try to justify that. I admit that I didn't do a good job explaining this in the description of the commit.

First of all, this code is deprecated. It's only used in the endpoints that we offer to manage OAuth tokens, which are deprecated.

I spent some time trying to debug the issue but, to be honest, I was not able to fully understand why `.lazy` does not play well with the async library. It might be related to something like socketry/async#23, but I do not know for sure.

Regarding your points about external/internal fragmentation:

- External fragmentation. As you explained, `.lazy` would be absolutely critical if we changed this to SSCAN as the TODO in the code mentions, but I suspect this is not going to happen. It has not happened in years, so it's not going to happen now that the code is deprecated.
- Internal fragmentation. As far as I know, the method where the call to `.lazy` was is only called from `OAuth::Token::Storage.all_by_service_and_app`, which needs to return all the tokens returned by the `smembers` call except for the expired ones. So there's a `.select` that discards some elements, but there's nothing like a `take(a_few)`.
@davidor re: SSCAN happening - well, it can still happen, and the fact this is "deprecated" does not mean `SSCAN` would not be used elsewhere. The fact this code implies `lazy` won't be usable is something that will need to be documented and investigated eventually.
I know. I added a note in the TODO list above to document this in the future. We might find other limitations in future PRs so I'll document all of them later.
(branch updated from 27c5b06 to 1ca21e7)
@unleashed I addressed all your comments. I also added a few minor things that I noticed were missing. Those changes are in the last 4 commits. I've updated the TODO in the first comment of this PR with things that will be addressed in future PRs. In particular, most things that have to do with contributing code to the async-redis project.
Gemfile.base (outdated diff):

```diff
@@ -60,3 +61,4 @@ gem 'sinatra', '~> 2.0.3'
 gem 'sinatra-contrib', '~> 2.0.3'
 # Optional external error logging services
 gem 'bugsnag', '~> 6', require: nil
+gem 'async-redis', '~> 0.3.3'
```
since this is going to be opt-in, add `require: nil` and load only when config is set to use async? Also, `async-rspec` above could be loaded on demand too, if async testing was requested - IIRC, loading the `async` library itself can have side effects as it monkey-patches stuff, so best to avoid it completely when not wanted
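One way to honor both points can be sketched as follows. This is a sketch under assumptions - the config shape and the injectable loader are hypothetical, not the PR's code; the point is keeping `require: nil` in the Gemfile and requiring the async stack only when the config asks for it:

```ruby
# Gemfile.base would keep the gem unloaded at boot:
#   gem 'async-redis', '~> 0.3.3', require: nil
#
# Then the storage selection requires it on demand, so the async library's
# monkey-patches never apply in sync mode. The injectable `loader` exists
# only so this sketch is testable without the gem installed.
def storage_backend(async_enabled, loader: -> { require 'async/redis' })
  if async_enabled
    loader.call # the only place the async stack gets pulled in
    :storage_async
  else
    :storage_sync
  end
end

puts storage_backend(false) # prints "storage_sync"; async code never loaded
```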
👍
I'll leave the async-rspec part for the PR where I'll change things to run the test suite with both clients.
```ruby
# used to provide a Redis client based on a configuration object
require 'uri'
require '3scale/backend/storage_async'
require '3scale/backend/storage_sync'
```
Just a note: can we design this so opting out of async means no async code gets loaded? I worry about the async monkeypatching stuff, which, while assumed to be inactive, should still not happen when using sync.
👍
```ruby
# When running a nested pipeline, we just need to continue
# accumulating commands.
if @redis_async.is_a? Pipeline
```
/shrug we know we can do better, easily.
```ruby
original = @redis_async
pipeline = Pipeline.new
@redis_async = pipeline
```
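The swap-and-restore idea shown in these two snippets can be sketched end to end (hypothetical class shapes, not the PR's actual code): while a `Pipeline` is installed as the client, calls are only accumulated, and they are flushed against the original connection when the block finishes.

```ruby
# Minimal sketch of pipelining via client swapping. FakeConnection stands
# in for the real async-redis connection.
class FakeConnection
  def initialize
    @data = {}
  end

  def call(command, *args)
    case command
    when 'SET' then @data[args[0]] = args[1]; 'OK'
    when 'GET' then @data[args[0]]
    end
  end
end

class Pipeline
  attr_reader :commands

  def initialize
    @commands = []
  end

  # Same interface as the connection, but only records the command.
  def call(*command)
    @commands << command
    nil
  end
end

class Client
  def initialize(redis_async)
    @redis_async = redis_async
  end

  def call(*command)
    @redis_async.call(*command)
  end

  def pipelined
    # Nested pipeline: keep accumulating into the current one.
    return yield if @redis_async.is_a?(Pipeline)

    original = @redis_async
    pipeline = Pipeline.new
    @redis_async = pipeline
    begin
      yield
      # Flush: replay every accumulated command on the real connection.
      pipeline.commands.map { |cmd| original.call(*cmd) }
    ensure
      @redis_async = original
    end
  end
end

client = Client.new(FakeConnection.new)
results = client.pipelined do
  client.call('SET', 'a', '1')
  client.call('GET', 'a')
end
puts results.inspect # ["OK", "1"]
```

The `ensure` guarantees the real connection is restored even if the block raises, which is the sharp edge in any swap-based design.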
didn't you add/change this?
Ok to merge to integration branch 👍
Also, move the helpers to StorageHelpers to be able to use them to instantiate AsyncStorage too. The helpers will need to be cleaned up because we have ThreeScale::Backend::StorageHelpers and ThreeScale::Backend::Storage::Helpers. They should probably be merged.
Most of these tests only apply to StorageSync. StorageAsync does not support sentinels, for example.
…torage This test does not work when using the async storage. The async libs run async tasks inside Fibers and creating threads like in this test.
It causes problems with the async-redis library. Also, I do not think it has any value, because the smembers will need to be ran anyway. It does not really matter when.
There are a couple of callers that need to instantiate a client instead of reusing an existing one with #instance.
Thanks for the review, @unleashed. I squashed the "fixup" commits.
CircleCI fails because the CI image contains a ruby version that is not compatible with this feature. We'll update the image later before merging to master. The tests pass locally.
We also recently did this: redis/redis-rb#832
This PR implements the base for what we need to use the async-redis lib.
The idea is to be able to perform non-blocking calls to Redis in a reactor instead of having to spawn N worker processes where each one blocks on redis calls.
Notice that the base branch of this PR is an integration one. This is because this PR is just a first step.
This PR:
Notice that this PR does not change any models or any of the core functionality. Using the async storage is opt-in. There's a new parameter in the config for that (`redis.async`).

TODO for coming PRs:
CircleCI tests fail because the CI image tries to install ruby 2.2. We'll need to change it before merging this to master. Tests pass locally.