Add --interval to launch monitor command #14068

Merged
merged 6 commits into from
Jun 22, 2023


@gamuniz gamuniz commented Jun 1, 2023

SUMMARY

This adds an --interval option to the launch command in awxkit. It takes an integer number of seconds and rate-limits the API calls made while monitoring, which helps with long-running jobs and avoids overwhelming the API.

example:

awx workflow launch 8 --monitor --interval 5

nginx logs:

10.244.0.1 - - [01/Jun/2023:03:12:13 +0000] "GET /api/v2/unified_jobs/?order_by=finished&unified_job_node__workflow_job=22 HTTP/1.1" 200 5981 "-" "python-requests/2.28.1" "-"

10.244.0.1 - - [01/Jun/2023:03:12:18 +0000] "GET /api/v2/workflow_jobs/22/ HTTP/1.1" 200 1740 "-" "python-requests/2.28.1" "-"

10.244.0.1 - - [01/Jun/2023:03:12:24 +0000] "GET /api/v2/unified_jobs?order_by=finished&unified_job_node__workflow_job=22 HTTP/1.1" 301 5 "-" "python-requests/2.28.1" "-"
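
The roughly five-second spacing in the log lines above is what a polling loop with a sleep between status requests would produce. The following is a minimal sketch of that idea, not the awxkit implementation: `poll_until_finished`, `fetch_status`, and the set of final status strings are assumptions made for illustration.

```python
import time
from typing import Callable


def poll_until_finished(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch_status() until it reports a final state.

    Returns the number of status requests made. All names are
    hypothetical; fetch_status stands in for one API request.
    """
    # Assumed final states, chosen for illustration only.
    final_states = {"successful", "failed", "error", "canceled"}
    requests_made = 0
    while True:
        status = fetch_status()
        requests_made += 1
        if status in final_states:
            return requests_made
        # Wait between requests so the server sees one call per interval.
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in normal use the default `time.sleep` applies.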

ISSUE TYPE
  • New or Enhanced Feature
COMPONENT NAME
  • CLI
AWX VERSION
make VERSION 
awx: 22.3.1.dev19+gb0068caf41


hluk commented Jun 1, 2023

I have a slightly different solution in #14069. It additionally defines a default and a minimum value.

I think a larger default interval would help make most clients behave nicely toward our server.

@@ -56,6 +56,7 @@ def add_arguments(self, parser, resource_options_parser, with_pk=True):
parser.choices[self.action].add_argument('--monitor', action='store_true', help='If set, prints stdout of the launched job until it finishes.')
parser.choices[self.action].add_argument('--action-timeout', type=int, help='If set with --monitor or --wait, time out waiting on job completion.')
parser.choices[self.action].add_argument('--wait', action='store_true', help='If set, waits until the launched job finishes.')
parser.choices[self.action].add_argument('--interval', type=int, help='If set with --monitor or --wait, amount of time to wait between api calls.')

@hluk hluk Jun 1, 2023


The unit is not mentioned here. I assume it is seconds, but the default value below is a float while this argument is an int.

awxkit/awxkit/cli/stdout.py (outdated, resolved)
@AlanCoding

In either case, I see that hard-coded 0.25 and strongly support fixing that. Up to @gamuniz how to move forward with it.

@gamuniz gamuniz marked this pull request as ready for review June 2, 2023 00:56

@hluk hluk left a comment


👍 This would fix the problems I'm experiencing.

One minor issue is that a user can set (for whatever reason) a negative value for the interval, which would cause a ValueError in time.sleep(). A simple fix would be to clamp it to values >= 0 (though I would still prefer a higher limit if possible): time.sleep(max(0.0, interval))
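
An alternative to clamping at sleep time is to reject negative values when the argument is parsed. This is a sketch under stated assumptions, not the code in this PR: `non_negative_seconds`, the stand-alone `demo` parser, and the default of 5.0 are all hypothetical, and awxkit builds its parser differently.

```python
import argparse


def non_negative_seconds(text: str) -> float:
    """argparse type: parse a number of seconds and reject negative values."""
    # A ValueError from float() is reported by argparse as an invalid value.
    value = float(text)
    if value < 0:
        raise argparse.ArgumentTypeError(
            f"interval must be >= 0 seconds, got {value}"
        )
    return value


# Hypothetical stand-alone parser for illustration only.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--interval",
    type=non_negative_seconds,
    default=5.0,  # illustrative default, not taken from the PR
    help="Seconds to wait between API calls.",
)
```

With this approach a bad value fails fast with a usage error instead of surfacing later as a ValueError inside the monitoring loop.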

@@ -56,6 +56,9 @@ def add_arguments(self, parser, resource_options_parser, with_pk=True):
parser.choices[self.action].add_argument('--monitor', action='store_true', help='If set, prints stdout of the launched job until it finishes.')
parser.choices[self.action].add_argument('--action-timeout', type=int, help='If set with --monitor or --wait, time out waiting on job completion.')
parser.choices[self.action].add_argument('--wait', action='store_true', help='If set, waits until the launched job finishes.')
parser.choices[self.action].add_argument(
'--interval', type=float, help='If set with --monitor or --wait, amount of time to wait in seconds between api calls.'

Do you really want a float instead of an int here? It should work, just feels weird.


With the new update and the minimum value set to 2.5, I think it would be a good idea to make the argument an int (and the minimum an integer larger than 0). Hopefully nobody needs faster updates: they would make printing output faster and smoother, but could overwhelm the server again.
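
A minimum like the one discussed here could be enforced by raising too-small values and telling the user about it. This is a minimal sketch assuming the 2.5-second minimum mentioned above; `MIN_INTERVAL`, `effective_interval`, and the message wording are hypothetical and not taken from awxkit.

```python
import sys
from typing import Callable, Optional

MIN_INTERVAL = 2.5  # seconds; value from the discussion, name is hypothetical


def effective_interval(
    requested: float,
    minimum: float = MIN_INTERVAL,
    notify: Optional[Callable[[str], None]] = None,
) -> float:
    """Return the interval to use, raising too-small values to the minimum."""
    if notify is None:
        # Default: write the notice to stderr so it does not mix with job output.
        def notify(message: str) -> None:
            print(message, file=sys.stderr)

    if requested < minimum:
        notify(
            f"Requested interval {requested}s is below the minimum; "
            f"using {minimum}s instead."
        )
        return minimum
    return requested
```

For example, `effective_interval(1)` would return 2.5 and print a notice, while `effective_interval(10)` would return 10 silently.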

… inform users the interval has been increased

jangel97 commented Jun 20, 2023

Hello team,

Could this please be reviewed by a maintainer when possible? This feature would be very useful for many teams who use awxkit to spawn jobs from their CI/CD pipelines.

Thank you!

@jay-steurer

Running some tests; I will merge once they are done.

@jay-steurer jay-steurer merged commit 721a200 into ansible:devel Jun 22, 2023
14 checks passed
@gamuniz gamuniz deleted the add_interval_to_monitor branch June 22, 2023 16:39
@rmahroua

Hi, chiming in: we've had some performance issues with large systems, and the best way I've found to alleviate the load is to implement retries with exponential backoff.
Pseudo-implementation:

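import functools
import random
import time


def retry_backoff(retries: int = 15, backoff: float = 1.0, debug: bool = False, exceptions: tuple = (Exception,)):
    # 'exceptions' is an assumption standing in for the API exception
    # placeholder; narrow it to the client's real error type.
    # 'debug' is accepted but unused in this sketch.
    def rb(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            count = 0
            while True:
                try:
                    return f(*args, **kwargs), None
                except exceptions:
                    if count >= retries - 1:
                        # Out of attempts: signal failure to the caller.
                        return False, None
                    # Delay grows as count ** 1.5, plus up to 1 s of random jitter.
                    delay = count ** 1.5 * backoff + random.uniform(0, 1)
                    time.sleep(delay)
                    count += 1
        return wrapper
    return rb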

With that, the AWX API should be able to handle a higher volume of transactions. Let me know if you'd be open to going this route; I'd be happy to send an MR that builds on @gamuniz's work.
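
The delay in the snippet above grows as count ** 1.5, which is polynomial rather than strictly exponential. For comparison, here is a sketch of a schedule that doubles on each attempt, with a cap and jitter. `backoff_delays` and its parameters are hypothetical names for illustration; nothing here is part of awxkit.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one delay per retry: base * 2**attempt, capped, plus jitter."""
    for attempt in range(retries):
        # Doubling per attempt is what makes the schedule exponential;
        # the cap keeps late retries from waiting unboundedly long.
        yield min(cap, base * 2 ** attempt) + jitter()
```

A caller would sleep for each yielded delay between attempts. The jitter spreads out clients that start retrying at the same moment, so they do not hit the server in lockstep.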


9 participants