Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH CLI live log tail #359

Merged
merged 10 commits into from
Feb 17, 2020
Merged

ENH CLI live log tail #359

merged 10 commits into from
Feb 17, 2020

Conversation

aripollak
Copy link
Contributor

@aripollak aripollak commented Feb 10, 2020

  • Adds civis jobs follow-log and civis jobs follow-run-log
  • Like tail -f for job run logs, closely mirroring the output of run logs on the web UI
Demo GIFs

demo1

demo2

Ari Pollak added 5 commits February 10, 2020 15:35
* Like `tail -f` for job run logs, closely mirroring the output of
run logs on the web UI
* sort log lines by id as a fallback if createdAts are equal
Copy link
Contributor

@elsander elsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about the control flow in the workhorse function here-- I see a few points where it looks possible to get a hard error that doesn't make sense, or to get stuck in an infinite loop. Some additional docstring/inline documentation would also help, since it's not obvious to me what all of the variables are and what different elements of the response mean.

civis/cli/_cli_commands.py Show resolved Hide resolved
civis/cli/_cli_commands.py Show resolved Hide resolved
while continue_polling:
response = client.jobs.list_runs_logs(id, run_id,
last_id=seen_max_log_id)
if 'civis-max-id' in response.headers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this header mean?

Copy link
Contributor Author

@aripollak aripollak Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the maximum log ID that has been seen on Platform. It's used to decide if we should keep retrieving logs immediately instead of waiting for the next 3 second poll interval, and as a double-check to see if the logs are fully done and we can exit.
Maybe I could rename the variable here to max_log_id_seen_on_platform?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the issue is that I don't know how platform stores log information. Are you saying that each log line has its own id?

Currently the people who maintain the API clients are not people who work on the API itself, so an inline comment mentioning that we can check for new platform logs by looking for new ids would help clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. Yeah, each log message (I'm not calling it a line because I think a single message could technically have multiple newlines) has an ID that should be unique within each run and is in chronological order based on when the message was originally sent.
Here's what one of the log entries from civis jobs list-runs-logs looks like in YAML format, since that's the same endpoint gets used here:

- createdAt: '2019-11-04T15:56:44.492Z'
  id: 492602116
  level: info
  message: Queued

response = client.jobs.list_runs_logs(id, run_id,
last_id=seen_max_log_id)
if 'civis-max-id' in response.headers:
remote_max_log_id = int(response.headers['civis-max-id'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is only set if the conditional is met. If the conditional is not met, the user will get a confusing error on line 210. What should happen if the header is not present? Maybe raise an informative error rather than continuing on in the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, good point. There won't be a civis-max-id if there are no logs at all yet, so I can just set this to None if it doesn't exist.
In that case, I'll have to change the logic on line 210 to account for this edge case: the first iteration through, if Platform hasn't seen any logs for this run yet, seen_max_log_id = 0 and remote_max_log_id = None. Then line 210 would be false and we'd miss that sleep and keep trying to hit Platform as fast as possible until we get some logs. That case is unlikely but possible.

last_id=seen_max_log_id)
if 'civis-max-id' in response.headers:
remote_max_log_id = int(response.headers['civis-max-id'])
log_finished = response.headers['civis-cache-control'] != 'no-store'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these items being pulled out of headers are mysterious to me. It would be helpful to have a comment saying what is being pulled out of the header and what the value you're expecting means.

Copy link
Contributor Author

@aripollak aripollak Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the actual definition of the value here doesn't really matter as much as the fact that the 'no-store' means Platform is no longer expecting to get any more logs for this run.
Is there a variable name that would make it clearer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name makes sense to me, but I find the names of things in the Platform response confusing (civis-cache-control, no-store). I can take your word for it that this variable holds a bool of whether the log is done or not, but I wondered if there was any context you could add about what Platform is passing. This could provide extra context if, for example, I needed to modify the code in the future for some reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it's either max-age=120 or no-store, where the latter means that Platform has decided it might still be getting more logs. The max-age=120 value is currently not used directly by the web UI either. It's in the same format as a regular Cache-Control header to tell the client how long it can cache the results, but right now it's just used to tell the client when to stop polling for results.

print(' '.join((log['createdAt'], log['message'].rstrip())))
# if output is a pipe, write the buffered output immediately:
sys.stdout.flush()
if seen_max_log_id == remote_max_log_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean if this condition is met vs not? If it's not met, won't you be stuck in an infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not met, we'll continue polling.

remote_max_log_id = int(response.headers['civis-max-id'])
log_finished = response.headers['civis-cache-control'] != 'no-store'
logs = response.json()
if logs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if if logs is not met? Will you be stuck in an infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's okay and normal if there are no more log results; we should normally just keep polling.

if log_finished:
continue_polling = False
else:
time.sleep(_FOLLOW_POLL_INTERVAL_SEC)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it should be easier to DRY the sleep logic so we only have to do it in one place, but I wasn't able to come up with anything. Suggestions welcome.

Copy link
Contributor Author

@aripollak aripollak Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this to both document and DRY? It's more code, but might be clearer?

        if remote_max_log_id is None:
            remote_has_more_logs_to_retrieve_now = False
        elif local_max_log_id == remote_max_log_id:
            remote_has_more_logs_to_retrieve_now = False
            if log_finished:
                continue_polling = False
        else:
            remote_has_more_logs_to_retrieve_now = True

        if continue_polling and not remote_has_more_logs_to_retrieve_now:
            time.sleep(_FOLLOW_POLL_INTERVAL_SEC)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is clearer.

@elsander
Copy link
Contributor

@aripollak I left a couple notes about places where inline comments would be helpful. Otherwise this looks good.

@aripollak aripollak assigned elsander and unassigned aripollak Feb 15, 2020
Copy link
Contributor

@elsander elsander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for submitting this feature!

@elsander elsander assigned aripollak and unassigned elsander Feb 17, 2020
@aripollak aripollak merged commit adc87f1 into master Feb 17, 2020
@aripollak aripollak deleted the HACK-377-live-log-tail branch February 17, 2020 15:51
@jacksonllee jacksonllee added this to the Next Version milestone Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants