Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry and Timeout policies for grpc and http #679

Merged
merged 29 commits into from
Jun 27, 2024

Conversation

elena-kolevska
Copy link
Contributor

@elena-kolevska elena-kolevska commented Feb 27, 2024

Description

This PR adds the possibility for users to configure Retry and Timeout policies for both grpc and http based clients.
The retry is configurable through the DAPR_API_MAX_RETRIES environment variable and is picked up automatically for both the grpc and http client, not requiring any code changes.
I left the possibility for a more fine-grained control over the retry mechanism (the client can receive a RetryPolicy object), but I'm not sure if that's something people would need. Let's discuss it.

Issue reference

#676

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests
  • Extended the documentation

@elena-kolevska elena-kolevska changed the title WIP: Retry with decorators WIP: Retry with a custom function Feb 28, 2024
Copy link

codecov bot commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 89.59538% with 18 lines in your changes missing coverage. Please review.

Project coverage is 86.15%. Comparing base (fc0e9d1) to head (63548df).
Report is 28 commits behind head on main.

Files Patch % Lines
dapr/clients/retry.py 82.50% 14 Missing ⚠️
dapr/conf/__init__.py 50.00% 2 Missing ⚠️
dapr/actor/client/proxy.py 50.00% 1 Missing ⚠️
dapr/aio/clients/grpc/client.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #679      +/-   ##
==========================================
- Coverage   86.37%   86.15%   -0.22%     
==========================================
  Files          79       81       +2     
  Lines        4094     4282     +188     
==========================================
+ Hits         3536     3689     +153     
- Misses        558      593      +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@elena-kolevska
Copy link
Contributor Author

The lint tests are currently failing because of a change introduced in Ruff 0.3 which is incompatible with our flake8 set-up. We should solve this issue in a separate PR.

@elena-kolevska elena-kolevska marked this pull request as ready for review March 3, 2024 03:16
@elena-kolevska elena-kolevska requested review from a team as code owners March 3, 2024 03:16
Signed-off-by: Elena Kolevska <elena@kolevska.com>
@elena-kolevska elena-kolevska changed the title WIP: Retry with a custom function Retry and Timeout policies for grpc and http Apr 24, 2024
elena-kolevska and others added 16 commits April 24, 2024 12:14
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
This reverts commit 8609b7f.

Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
* Adds a deprecation warning for the wait() method

Signed-off-by: Elena Kolevska <elena@kolevska.com>

* Ruff

Signed-off-by: Elena Kolevska <elena@kolevska.com>

---------

Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Bernd Verst <github@bernd.dev>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Bernd Verst <github@bernd.dev>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Elena Kolevska <elena@kolevska.com>
@elena-kolevska elena-kolevska marked this pull request as draft April 24, 2024 11:41
@berndverst
Copy link
Member

berndverst commented May 1, 2024

Is this PR ready @elena-kolevska ?

I kept the ruff version at 0.2.2 for now. Lint and Ruff are passing with that.

@elena-kolevska
Copy link
Contributor Author

elena-kolevska commented May 1, 2024

It's almost ready, I just wanted to confirm a few things with you:

  • Right now the code will retry if the env variable is set, with no additional code changes. I also made it possible for users to tune other retry-related parameters (like initial_backoff, max_backoff, retryable_http_status_codes for ex.). Do you think the need ffor it justifies the added code complexity, and heavier constructors?
  • For now, I only added the retry wrapper to one method, the save_state one. Once we confirm this is the right approach, I'll update all methods in the class.
  • When I first started work on this PR I added a timeout interceptor that would be applied to all calls. Looking at it now, I think it makes more sense to be able to specify a timeout per method. Some things are expected to take longer than others, and control should be more granular. What do you think?

@berndverst
Copy link
Member

It's almost ready, I just wanted to confirm a few things with you:

  • Right now the code will retry if the env variable is set, with no additional code changes. I also made it possible for users to tune other retry-related parameters (like initial_backoff, max_backoff, retryable_http_status_codes for ex.). Do you think the need ffor it justifies the added code complexity, and heavier constructors?
  • For now, I only added the retry wrapper to one method, the save_state one. Once we confirm this is the right approach, I'll update all methods in the class.

I think that's probably fine, but why wouldn't we instead add this as a decorator on the save_state method instead? Would that be possible? Right now you are not actually using a decorator but instead a method that takes another method as the argument (which is very similar to how decorators work). I just find decorators a lot cleaner. Keep in mind that you can always have the decorator intercept arbitrary **kwargs and then receive whatever properties you need to tweak retry behavior, without having to modify the method signature of the function.

  • When I first started work on this PR I added a timeout interceptor that would be applied to all calls. Looking at it now, I think it makes more sense to be able to specify a timeout per method. Some things are expected to take longer than others, and control should be more granular. What do you think?

How do you plan for users to be able to tweak the individual method timeouts? As optional arguments during each of the methods? That would work if you use decorators, because decorators can intercept arguments.

We should not add any new parameters to methods or the constructor - everything should be optional to not cause a breaking change and we should choose either sane defaults or retain the existing behavior, but then document how this can be set.

Let's also make sure using an env variable will not be the only way to control this behavior by the way.

@elena-kolevska
Copy link
Contributor Author

elena-kolevska commented Jun 3, 2024

Yeah, decorators were my first try, and I need to go back and remember why we decided it wasn't the best option. I'll get back to you on it.
In the meantime, on the topic of timeouts: looks like we already accept a timeout parameter for the invoke_method function, which is where users would usually need to tweak per call.
We could still allow users to control their global timeout for all other calls (get state, save state...) because those are going to be dependant on their network.
So I think we can leave the code as it is.

Signed-off-by: Elena Kolevska <elena@kolevska.com>
@elena-kolevska elena-kolevska marked this pull request as ready for review June 4, 2024 00:08
@elena-kolevska elena-kolevska marked this pull request as draft June 4, 2024 00:31
Signed-off-by: Elena Kolevska <elena@kolevska.com>
@elena-kolevska
Copy link
Contributor Author

I remembered why we're not using decorators: when we retry we would have to call again the whole function that prepares the request, instead of just the call, so it's better to keep the code more performant and use a single function.

I also went ahead and updated all the calls and marked the PR as ready to review 🙏

@elena-kolevska elena-kolevska marked this pull request as ready for review June 4, 2024 00:59
elena-kolevska and others added 2 commits June 17, 2024 11:02
Signed-off-by: Elena Kolevska <elena@kolevska.com>
@berndverst berndverst merged commit f43c0aa into dapr:main Jun 27, 2024
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants