Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-blocking redis calls using redis-async lib #96

Merged
merged 31 commits into from
Sep 30, 2019
Merged

Non-blocking redis calls using redis-async lib #96

merged 31 commits into from
Sep 30, 2019

Conversation

davidor
Copy link
Contributor

@davidor davidor commented Apr 24, 2019

This is an integration branch. It contains: #77 , #86 , #92 , #93 .
It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.

@davidor
Copy link
Contributor Author

davidor commented Aug 1, 2019

I've deleted the commits related with Ruby updates that were merged in a different PR. I've also rebased on top of master adapting the code to the changes that were introduced in recent PRs (sentinels passwords, update of redis-rb, etc.).

@davidor davidor marked this pull request as ready for review August 7, 2019 14:47
@davidor
Copy link
Contributor Author

davidor commented Aug 7, 2019

@unleashed This is ready for review now that we no longer need to support Ruby 2.2.

You already reviewed all the PRs of this integration branch except for this one #92

I spent some time with @slopezz trying this in our Openshift cluster and it's working well.

Let me know what you think we need to be able to merge this. I think it's a very good starting point and we can keep improving later, for example, investigating alternative ways to fetch the jobs from the queue.

bors bot added a commit that referenced this pull request Aug 26, 2019
117: clean-up: delete unnecessary 'before' in Utilization specs r=unleashed a=davidor

Minor clean-up extracted from #96.

The other before blocks already set up everything needed without regards of previous contents.

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>
bors bot added a commit that referenced this pull request Aug 26, 2019
116: Refactor: Extract JobFetcher from Worker r=unleashed a=davidor

This PR extracts the logic to fetch jobs from the queue from the `Worker` class.

The commits in this PR have been extracted from #96

In that PR we have a sync and a sync worker. The logic to fetch jobs from the queue is the same in both cases so it makes sense to extract it into a separate class.

I have created this PR to simplify the other one, but still, I think this has value in itself because it makes the code easier to understand and the logic to fetch jobs from the queue more testable as we no longer need to test a private method.

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>
Copy link
Contributor

@unleashed unleashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things to check (read comments)

  • Check status and document limitations (best if there's some associated issue):
    • Threads
    • Fibers
    • Lazy enumerators
    • Spec/Test set-up
  • Ensure the configuration early on defines everything to be either Sync or Async (storage) statically and only loads the needed deps.


config.around :each do |example|
Async.run do
# TODO: This is needed for the acceptance specs. Not sure why.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any open issue in the async rspec gem we can refer to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure this is a bug, and it only happens in the acceptance specs, so it makes me think that the problem might be in rspec_api_documentation. I investigated a bit and didn't understand why.

# All of this might be simplified a bit in the future using the
# "methods" in async-redis
# https://github.com/socketry/async-redis/tree/master/lib/async/redis/methods
# but there are some commands missing, so for now, that's not an option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates on this? I expressed concern about this back when this was merged to the integration branch, because the maintenance burden we are taking is not negligible, so we should check again before landing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

module Async
module Redis
class Client
def call_pipeline(commands)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be not needed now? or adapted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same case. See comment above.

if CHECK_EQUALS_ONE.include?(command_name)
resp.to_i == 1
elsif CHECK_GREATER_THAN_0.include?(command_name)
resp.to_i > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this is also pushed down to the async redis client... at some point really soon now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same case. See comment above.

if configuration.redis.async
Backend::StorageAsync::Client
else
Backend::StorageSync
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is minor, but if we mean to have this method always choose the same implementation at run time (which is what I understand), then better extract the check outside the method definition and define the method differently depending on the check.

The rest seems to be equal, so the most effective might be to just check for the implementation to use and then require the class (with the same name) or do Storage = StorageAsync::Client or Storage = StorageSync. The remaining constant can either be defined somewhere else or be injected/defined into the chosen class by this "class loader" code.

script/test Show resolved Hide resolved
spec/unit/storage_async/pipeline_spec.rb Show resolved Hide resolved
@@ -39,11 +39,11 @@ def committed_at
require_relative '../test/test_helpers/configuration'
require_relative '../test/test_helpers/storage'

TestHelpers::Storage::Mock.mock_storage_client!
TestHelpers::Storage::Mock.mock_storage_clients
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of letting the detail that there are now two storage clients leak into the naming. Also, this is meant to use just one storage per invocation, isn't it?

super
if options[:async]
# Conditional require is done to require async-* libs only when
# needed and avoid possible side-effects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - must ensure the same happens when loading the async storage in tests, or in general in the listeners when choosing just the sync one (I say this because I haven't checked it, but you probably know).

lib/3scale/backend/storage_async/client.rb Show resolved Hide resolved
@davidor davidor force-pushed the async branch 8 times, most recently from a8ac288 to b0ae3ea Compare August 28, 2019 14:07
@davidor
Copy link
Contributor Author

davidor commented Sep 4, 2019

@unleashed We have different views on what this PR should be.

This is a big change in how apisonator works. It's true that in our tests this increases performance substantially. However, this relies on the async-redis lib. The maintainers have done an amazing job in the lib and the async ecosystem in general. However, It's also true that the lib is not as widely used as redis-rb. Given that it's such an important piece in this project, there's some risk. I think we need to try this with real workloads to see how it behaves. I think it's important to discard problems such as unexpected crashes, memory leaks, etc. The kind of problems that you only realize when running for a long time on a real production scenario.

This feature is opt-in. For users absolutely nothing changes unless they explicitly decide that they want to give this as try. That's why I think we should try to merge this and let @slopezz and others experiment with it.

My opinion is that, rather than try to solve all the minor problems in the first PR, we should go step by step. Try a first version, see how it behaves, try different configuration params to evaluate performance in different environments, etc. If all that goes well, we can put the effort to, for example, contribute some of the code that we have here to other projects in the async ecosystem.

As I said before, I don't think we need to solve all the issues now. For example, contributing a redis-rb-compatible interface for pipelines in async-redis is not a trivial effort. This PR brings a substantial performance improvement to the table. Also, the rules to scale in environments like Openshift become way easier to understand. That's why I'm willing to go more in tech debt that in other cases, and that includes:

Interestingly, you didn't mention anything about running all the tests twice, which I think is the biggest problem that this PR introduces. Again, the ideal scenario would be to have redis-async - redis-rb compatibility tests in any of those projects. But that does no exist right now, and because of how our test suite works and its coupling with Redis, I don't feel confident running the tests just with async or sync and assuming that the other will work just fine.

I think that these trade-offs are reasonable, and we can document them so there are no surprises in the future. I just don't think that waiting to merge this is worth it.

There are other small issues that you mentioned that I'll address before merging, but I wanted to talk about the important things first.

@ioquatix
Copy link

ioquatix commented Sep 4, 2019

@davidor if you see any issues feel free to keep me in the loop.

@davidor
Copy link
Contributor Author

davidor commented Sep 25, 2019

Rebased on top of master. I'll try to add a document with the known limitations tomorrow.

@davidor
Copy link
Contributor Author

davidor commented Sep 27, 2019

@unleashed I added a document with the limitations, goals, and other info in my latest commit.

Copy link
Contributor

@unleashed unleashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out comments

docs/async.md Outdated
enumerator. That class is only required for the Oauth feature which is
deprecated. Also, the lazy enumerator was not giving us any advantage in this
particular scenario. The important thing is that there might be some problems
with some lazy enumerators (not all) when using `async-redis`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's the important thing then it should be made prominent in the listing, rather than focusing on the specific instance that was changed. We should state we don't know / haven't investigated yet why this happens.

docs/async.md Outdated
- We need this line `RSpec.current_example = example` in `config.around :each`
of `spec_helper.rb` in order to make the acceptance tests pass.

- There's one test of the suite that fails with the async client. It's in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a symptom of some issue we think might be related to threading. So the limitation would be: potential issues with threading we haven't figured out yet.

clients.


## Limitations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no mention of what the async gem does, and it is a potential source of problems. In particular, it monkey-patches core IO modules to use a reactor, which means that interaction with any IO code is to be suspected in case of issues, and C extensions (usually) won't use the reactor for their IO (so we need to watch out for blocking there).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn’t do that (monkey patching).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was my understanding from when the first PR was introduced - are you not modifying the behaviour of IO methods?

I'd then assume async-redis is using async{,-io} on its own and then nothing else would be touching the reactor, right? That is preferable for us as it lowers the exposure to issues in unrelated code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was my understanding from when the first PR was introduced - are you not modifying the behaviour of IO methods?

@unleashed we are not modifying IO. We add Async::IO which provides (almost) identical wrappers which you can inject into other code if needed.

I'd then assume async-redis is using async{,-io} on its own and then nothing else would be touching the reactor, right? That is preferable for us as it lowers the exposure to issues in unrelated code.

At the moment, that's correct. However in the future, we may make it possible for the user to allow other forms of IO to use the reactor: ruby/ruby#1870 - just FYI.

want to be compatible with the two libraries at least for a while. That means
that we need an adapter `lib/3scale/backend/storage_async/client.rb`. We could
simplify that and other things in our codebase by contributing to the
`async-redis` project.
Copy link
Contributor

@unleashed unleashed Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are also missing the avoid IO in the middle of a pipeline limitation.

Edit: maybe fixed with fiber id check?

Copy link
Contributor

@unleashed unleashed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel comfortable merging stuff that breaks core language primitives (threading, fibers/lazy enums) in a way that we don't understand. Not only because of the potential issues in the current state, which we might eventually assess to be ok, but the potential for things further down the road breaking in our code and also in dependencies. There are a few other smaller nits I think could have been addressed since I first mentioned them.

The idea is good and I hope we can fix these issues or have a good compromise. But I don't think this is ok as it is, and further work is needed. My approval here only means "I delegate to you".

@unleashed
Copy link
Contributor

bors delegate+

@bors
Copy link
Contributor

bors bot commented Sep 27, 2019

✌️ davidor can now approve this pull request. To approve and merge a pull request, simply reply with bors r+. More detailed instructions are available here.

@ioquatix
Copy link

What primitives are broken by async?

@unleashed
Copy link
Contributor

What primitives are broken by async?

@ioquatix according to @davidor our usage of lazy enumerators and a test using two threads break when loading the async gems. He might be able to provide further details.

@ioquatix
Copy link

ioquatix commented Sep 30, 2019

@ioquatix according to @davidor our usage of lazy enumerators and a test using two threads break when loading the async gems. He might be able to provide further details.

Lazy enumerators are buggy when combined with user fibers. It's a known issue: ruby/ruby#2002

The solution is not to use async in code which can be used from an enumerator, until the bug/issue is resolved within Ruby itself.

Regarding threads, if you use thread level locks/mutex within async code, you might have strange behaviour, especially if you mix the reactor between threads. Reactors should be per-thread. If you need thread synchronisation, you need to use a higher level construct e.g. a pipe. Using thread synchronisation with an async reactor is blocking by definition.

Blocking operations are not strictly bad in async, but they will increase latency and there is a chance you can deadlock (but that applies even if you don't use async).

@davidor
Copy link
Contributor Author

davidor commented Sep 30, 2019

Thanks for the explanation @ioquatix

The problem that you mentioned is something that we were not aware of. It's something different from what has been discussed between me and @unleashed in this PR up until now.

That issue is the only real problem I see with this PR. We were not aware that enumerators had problems when combined with fibers because of a bug at the Ruby language level.

I'll need to think about this. I think that code like:

array_with_keys.map { |k| redis.get(k) }

is pretty common. Unfortunately, I'll need to put this on hold if things like that can unexpectedly fail. At least until I understand exactly what enumerators are problematic, because the example above works without problems as well as all the others that we use in our test suite.

@unleashed
Copy link
Contributor

@davidor that is not a lazy enumerator. There could be problems with our code or dependencies using fibers (some gems use fibers under the hood), but AIUI non-lazy enumerators should not be affected. The threading issue might be resolved by adapting the offending test, and the remaining problem is whether we want to live with these issues.

A closer look to the problems or a link to existing or new issues is what I was requesting in my review. Thanks for providing those, @ioquatix. Now we have more information, and it would be good to add it to the document on limitations before this would be merged.

@davidor
Copy link
Contributor Author

davidor commented Sep 30, 2019

I see. I misunderstood some things. My last comment is not correct.
I've updated the async.md with the new things we've learned.

docs/async.md Outdated
one. The reason is that our tests are highly coupled with Redis, even the unit
ones. Running the test suite with only one of the clients is risky.

- Cannot use `Fiber.yield` in an enumerator. See [ruby PR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would specifically call out lazy enumerators, because that is where you are most likely to inadvertently hit this problem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only lazy enumerators that allocate a fiber. Some lazy enumerators can avoid it. The typical situation is when you invoke zip, the argument will use a fiber internally.

@davidor
Copy link
Contributor Author

davidor commented Sep 30, 2019

@unleashed and I have decided to merge this.
The feature is opt-in and all the limitations are documented in docs/async.md.

@davidor
Copy link
Contributor Author

davidor commented Sep 30, 2019

bors r=@unleashed

bors bot added a commit that referenced this pull request Sep 30, 2019
96: Non-blocking redis calls using redis-async lib r=unleashed a=davidor

This is an integration branch. It contains: #77 , #86 , #92 , #93 .
~It cannot be merged yet. We need to drop support for Ruby < 2.2.7 first.~

Co-authored-by: David Ortiz <z.david.ortiz@gmail.com>
@bors
Copy link
Contributor

bors bot commented Sep 30, 2019

Build succeeded

@bors bors bot merged commit bc26858 into master Sep 30, 2019
@bors bors bot deleted the async branch September 30, 2019 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants