ENH CLI live log tail #359

aripollak · 2020-02-10T20:42:43Z

Adds civis jobs follow-log and civis jobs follow-run-log
Like tail -f for job run logs, closely mirroring the output of run logs on the web UI

Demo GIFs

* Like `tail -f` for job run logs, closely mirroring the output of run logs on the web UI

* sort log lines by id as a fallback if createdAts are equal

…rm API is different without one

elsander

I have some concerns about the control flow in the workhorse function here-- I see a few points where it looks possible to get a hard error that doesn't make sense, or to get stuck in an infinite loop. Some additional docstring/inline documentation would also help, since it's not obvious to me what all of the variables are and what different elements of the response mean.

civis/cli/_cli_commands.py

elsander · 2020-02-13T19:30:11Z

civis/cli/_cli_commands.py

+    while continue_polling:
+        response = client.jobs.list_runs_logs(id, run_id,
+                                              last_id=seen_max_log_id)
+        if 'civis-max-id' in response.headers:


What does this header mean?

It's the maximum log ID that has been seen on Platform. It's used to decide if we should keep retrieving logs immediately instead of waiting for the next 3 second poll interval, and as a double-check to see if the logs are fully done and we can exit.
Maybe I could rename the variable here to max_log_id_seen_on_platform?

Part of the issue is that I don't know how platform stores log information. Are you saying that each log line has its own id?

Currently the people who maintain the API clients are not people who work on the API itself, so an inline comment mentioning that we can check for new platform logs by looking for new ids would help clarify.

Ah, okay. Yeah, each log message (I'm not calling it a line because I think a single message could technically have multiple newlines) has an ID that should be unique within each run and is in chronological order based on when the message was originally sent.
Here's what one of the log entries from civis jobs list-runs-logs looks like in YAML format, since that's the same endpoint gets used here:

- createdAt: '2019-11-04T15:56:44.492Z' id: 492602116 level: info message: Queued

elsander · 2020-02-13T22:08:54Z

civis/cli/_cli_commands.py

+        response = client.jobs.list_runs_logs(id, run_id,
+                                              last_id=seen_max_log_id)
+        if 'civis-max-id' in response.headers:
+            remote_max_log_id = int(response.headers['civis-max-id'])


This variable is only set if the conditional is met. If the conditional is not met, the user will get a confusing error on line 210. What should happen if the header is not present? Maybe raise an informative error rather than continuing on in the function?

Oops, good point. There won't be a civis-max-id if there are no logs at all yet, so I can just set this to None if it doesn't exist.
In that case, I'll have to change the logic on line 210 to account for this edge case: the first iteration through, if Platform hasn't seen any logs for this run yet, seen_max_log_id = 0 and remote_max_log_id = None. Then line 210 would be false and we'd miss that sleep and keep trying to hit Platform as fast as possible until we get some logs. That case is unlikely but possible.

elsander · 2020-02-13T22:11:08Z

civis/cli/_cli_commands.py

+                                              last_id=seen_max_log_id)
+        if 'civis-max-id' in response.headers:
+            remote_max_log_id = int(response.headers['civis-max-id'])
+        log_finished = response.headers['civis-cache-control'] != 'no-store'


A lot of these items being pulled out of headers are mysterious to me. It would be helpful to have a comment saying what is being pulled out of the header and what the value you're expecting means.

I guess the actual definition of the value here doesn't really matter as much as the fact that the 'no-store' means Platform is no longer expecting to get any more logs for this run.
Is there a variable name that would make it clearer?

The variable name makes sense to me, but I find the names of things in the Platform response confusing (civis-cache-control, no-store). I can take your word for it that this variable holds a bool of whether the log is done or not, but I wondered if there was any context you could add about what Platform is passing. This could provide extra context if, for example, I needed to modify the code in the future for some reason.

Currently it's either max-age=120 or no-store, where the latter means that Platform has decided it might still be getting more logs. The max-age=120 value is currently not used directly by the web UI either. It's in the same format as a regular Cache-Control header to tell the client how long it can cache the results, but right now it's just used to tell the client when to stop polling for results.

elsander · 2020-02-13T22:13:52Z

civis/cli/_cli_commands.py

+            print(' '.join((log['createdAt'], log['message'].rstrip())))
+        # if output is a pipe, write the buffered output immediately:
+        sys.stdout.flush()
+        if seen_max_log_id == remote_max_log_id:


What does it mean if this condition is met vs not? If it's not met, won't you be stuck in an infinite loop?

If it's not met, we'll continue polling.

elsander · 2020-02-13T22:14:22Z

civis/cli/_cli_commands.py

+            remote_max_log_id = int(response.headers['civis-max-id'])
+        log_finished = response.headers['civis-cache-control'] != 'no-store'
+        logs = response.json()
+        if logs:


What if if logs is not met? Will you be stuck in an infinite loop?

That's okay and normal if there are no more log results; we should normally just keep polling.

aripollak · 2020-02-14T21:03:52Z

civis/cli/_cli_commands.py

+            if log_finished:
+                continue_polling = False
+            else:
+                time.sleep(_FOLLOW_POLL_INTERVAL_SEC)


I feel like it should be easier to DRY the sleep logic so we only have to do it in one place, but I wasn't able to come up with anything. Suggestions welcome.

Maybe something like this to both document and DRY? It's more code, but might be clearer?

if remote_max_log_id is None: remote_has_more_logs_to_retrieve_now = False elif local_max_log_id == remote_max_log_id: remote_has_more_logs_to_retrieve_now = False if log_finished: continue_polling = False else: remote_has_more_logs_to_retrieve_now = True if continue_polling and not remote_has_more_logs_to_retrieve_now: time.sleep(_FOLLOW_POLL_INTERVAL_SEC)

Yeah, I think this is clearer.

elsander · 2020-02-14T22:39:53Z

@aripollak I left a couple notes about places where inline comments would be helpful. Otherwise this looks good.

elsander

LGTM! Thanks for submitting this feature!

Ari Pollak added 5 commits February 10, 2020 15:35

ENH Add follow-run-log CLI command

98dddcb

* Like `tail -f` for job run logs, closely mirroring the output of run logs on the web UI

follow-log and follow-run-log

37c9bd9

* sort log lines by id as a fallback if createdAts are equal

Changelog

a8a8590

Add brief CLI shoutout to README; add links to other docs

880d907

Strip trailing whitespace/newlines since print() adds a newline

565bb85

aripollak requested a review from elsander February 11, 2020 16:31

Always pass a last_log_id since the ordering behavior from the Platfo…

120a3b7

…rm API is different without one

elsander suggested changes Feb 13, 2020

View reviewed changes

elsander assigned aripollak Feb 13, 2020

Ari Pollak added 2 commits February 14, 2020 15:49

Fix case where civis-max-id header isn't sent

da8c3bd

Include a little more documentation

d0b166a

aripollak commented Feb 14, 2020

View reviewed changes

seen -> local

ef565c7

aripollak requested a review from elsander February 14, 2020 21:18

Refactor to add some documentation

f077b05

aripollak assigned elsander and unassigned aripollak Feb 15, 2020

elsander approved these changes Feb 17, 2020

View reviewed changes

elsander assigned aripollak and unassigned elsander Feb 17, 2020

aripollak merged commit adc87f1 into master Feb 17, 2020

aripollak deleted the HACK-377-live-log-tail branch February 17, 2020 15:51

jacksonllee added the enhancement label Mar 3, 2020

jacksonllee added this to the Next Version milestone Mar 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH CLI live log tail #359

ENH CLI live log tail #359

aripollak commented Feb 10, 2020 •

edited

elsander left a comment

elsander Feb 13, 2020

aripollak Feb 14, 2020 •

edited

elsander Feb 14, 2020

aripollak Feb 15, 2020

elsander Feb 13, 2020

aripollak Feb 13, 2020

elsander Feb 13, 2020

aripollak Feb 14, 2020 •

edited

elsander Feb 14, 2020

aripollak Feb 15, 2020

elsander Feb 13, 2020

aripollak Feb 14, 2020

elsander Feb 13, 2020

aripollak Feb 14, 2020

aripollak Feb 14, 2020

aripollak Feb 14, 2020 •

edited

elsander Feb 14, 2020

elsander commented Feb 14, 2020

elsander left a comment

ENH CLI live log tail #359

ENH CLI live log tail #359

Conversation

aripollak commented Feb 10, 2020 • edited

elsander left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aripollak Feb 14, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aripollak Feb 14, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aripollak Feb 14, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elsander commented Feb 14, 2020

elsander left a comment

Choose a reason for hiding this comment

aripollak commented Feb 10, 2020 •

edited

aripollak Feb 14, 2020 •

edited

aripollak Feb 14, 2020 •

edited

aripollak Feb 14, 2020 •

edited