Retry delay is set to a maximum of 5 seconds #3630
Comments
We intentionally put this upper limit there because of the way retry is currently implemented. Right now retry is implemented inside the notifier service as "wait and retry" and not as a separate execution status. This means that retry is not safe across notifier service restarts: if you restarted the service while there were actions waiting to be retried, those retries would be lost. The chance of this happening grows with higher retry delays. This first implementation was mostly meant for simple retries on networking errors (connection timeouts, etc.), which are usually intermittent, so retrying after a couple of seconds usually works just fine. In the future we plan to implement this as a separate |
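To illustrate the restart-safety concern described above, here is a minimal sketch of an in-process "wait and retry" loop. It is not StackStorm's actual code; `run_with_retry` and its parameters are hypothetical names chosen for illustration. The point is that the pending retry exists only as local state of the running process, so stopping the process during the wait discards it.

```python
import time


def run_with_retry(action, max_retries, delay_seconds, sleep=time.sleep):
    """Call action(); on failure, wait in-process and try again.

    Hypothetical helper, not part of StackStorm. The retry counter and the
    pending wait live only in this call frame, so a process restart during
    sleep() silently drops the remaining retries.
    """
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            if attempt >= max_retries:
                # Out of retries: surface the last error to the caller.
                raise
            attempt += 1
            # Nothing is persisted here; a longer delay means a longer
            # window in which a restart loses the retry.
            sleep(delay_seconds)
```

A restart-safe design would instead persist something like "retry this execution at time T" and have a scheduler pick it up after a restart, which is roughly what treating retry as a separate execution status implies.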
@dead10ck For now we decided to bump the max limit to 10 minutes by default and also made this upper limit user configurable in st2.conf. As mentioned above, the current implementation has limitations you need to be aware of, and even when we redo the implementation it will be designed for retries of up to 10 minutes. If you want to do longer retries you probably need to re-design your approach and utilize other primitives we offer (e.g. interval trigger). |
With all due respect, if your retry system can't handle waits longer than 10 minutes, perhaps you need to re-design your approach. It's not an unreasonable workflow to have an action that runs once a day, or once a week, where it would be preferable to wait an hour or a day to retry in the event of failure, rather than the full day or week, and where it would not make sense to retry in 10 minutes. |
@dead10ck We already have other primitives to handle those long delays, which were designed specifically for such use cases: timers (https://docs.stackstorm.com/rules.html#timers), which allow you to run an action on a specific date or at time intervals / periods. |
@Kami ok, maybe a concrete example will help you understand my problem. Say I worked for a company whose accounting department makes a financial report every 30 days. Say one of my responsibilities was to run some numbers on these financial reports every month, and I wanted to use StackStorm to automate the analysis. I would set up the action to run on an

    ---
    name: monthly-financial-report-timer
    pack: mycompany
    description: "Run analysis on the monthly report"
    enabled: true
    trigger:
      type: core.st2.IntervalTimer
      parameters:
        unit: days
        delta: 30
    action:
      ref: mycompany.monthly-financial-report

Now suppose this day of the month comes around, and for some reason, accounting gets delayed, and they won't be able to publish this month's report until the next day. So when this timer triggers

How do you propose I use StackStorm's timer primitives to help me in this situation? |
Rather than retry policies, maybe that would be better done using a Mistral workflow? |
@LindsayHill Adding Mistral to my architecture, learning it, and maintaining it is a lot of effort just to be able to delay retries longer than 10 mins. |
So you have a customised ST2 install that does not include Mistral? You don't need to do a separate install of OpenStack Mistral. Sooner or later you'll run into other limitations of workflows if you're only using action chains. |
@LindsayHill Forgive me, I'm new to StackStorm; I wasn't aware that Mistral came packaged with it. In any case, though, is the answer to my problem just "don't use our retry system if you really need longer than 10 mins"? |
At this stage, yes, Mistral is probably a better answer. See https://docs.stackstorm.com/mistral.html Using retry policies is not going to work for > 10 mins. The current implementation of it is not designed for what you're trying to achieve. |
Got it, thanks for your help. |
There are also a couple of other ways you could try to approach this:
The place where the report is generated could be modified to run a script or similar that sends a webhook (an event) to StackStorm when a report is generated, which you then use to trigger your workflow / action. If that is not possible, you could write a sensor which periodically checks whether the report is ready and dispatches an event when it is. Both of those approaches might look a little more complicated than the retry one, but they follow an "event driven" approach, which makes them more powerful and useful - e.g. you could also use those events to trigger other actions, etc.
Another way to approach it would be to write a workflow / action which checks whether the report is ready and, when it is, also generates a report (e.g. `generate_report_if_available`). You would then use an interval timer to run this action every day or similar. Those are both fairly common patterns in StackStorm land :) |
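As a rough sketch of the second pattern above (an action run daily by an interval timer that does nothing until the report exists), the logic could look like the following. Everything here is an assumption made for illustration: the report is taken to be a file on disk, `analyze` is a placeholder for the real analysis, and the state file is a hypothetical way to avoid processing the same report twice. Wiring this into a StackStorm action is not shown.

```python
import json
from pathlib import Path


def analyze(report):
    """Placeholder analysis (hypothetical): count the lines in the report."""
    return len(report.read_text().splitlines())


def generate_report_if_available(report_path, state_path):
    """Analyze the report once it exists; otherwise do nothing.

    Returns "waiting" if there is no report yet, "already_done" if this
    report was handled on an earlier run, and "analyzed" after a new run.
    """
    report = Path(report_path)
    state = Path(state_path)
    if not report.exists():
        return "waiting"
    # Identify the report by its modification time so that a newer report
    # written to the same path is processed again next month.
    fingerprint = str(report.stat().st_mtime_ns)
    if state.exists() and json.loads(state.read_text()).get("last") == fingerprint:
        return "already_done"
    analyze(report)
    state.write_text(json.dumps({"last": fingerprint}))
    return "analyzed"
```

Because each run is idempotent, the timer can fire every day without a retry policy: runs before the report is published return "waiting", and runs after it has been handled return "already_done".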
There is another limitation when the retry policy does not allow more than I agree that max Ideally there would be no limits, along with guarantees of state persistence between service restarts, which will require re-working the implementation. Otherwise we're not helping, but encouraging our users to work around this with hacks while there are established expectations from the |
If you try to make a retry policy with a delay longer than 5 seconds, it will not register. With this policy:
When you try to register it, it complains:
5 seconds is incredibly and arbitrarily short. Honestly, I don't think there should be a maximum at all. If I want to delay my action's retries for 4 hours, or 4 days, or 4 weeks, I should be able to.