Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Console debugging experience when nothing works is not ideal #385

Closed
2 of 3 tasks
iliana opened this issue Oct 9, 2019 · 19 comments
Closed
2 of 3 tasks

Console debugging experience when nothing works is not ideal #385

iliana opened this issue Oct 9, 2019 · 19 comments

Comments

@iliana
Copy link
Contributor

iliana commented Oct 9, 2019

When trying to view the instance system/console log in the EC2 console:

  • ANSI color sequences don't render, so it's very hard to read
  • systemd truncates long lines, but we'd rather have the full output
  • Services in the boot path should probably have their stdout/stderr make it to the console as well
@tjkirch
Copy link
Contributor

tjkirch commented Oct 9, 2019

Services in the boot path should probably have their stdout/stderr make it to the console as well

For reference, this means StandardOutput=journal+console and StandardError=journal+console in the [Service] section of the unit file. I've seen it work and show up in the EC2 console output. (I've also seen a case where I didn't see output I expected, but it may have been user error.)

We have to decide which services are important enough to get their full output in the console. It shouldn't be many, or the output will become generally incomprehensible due to volume. Each message also slows down boot a bit, because of kernel-level locking on the serial output.

@tjkirch
Copy link
Contributor

tjkirch commented Oct 9, 2019

systemd truncates long lines, but we'd rather have the full output

There's a tip at the end of this Arch wiki section indicating that the answer here is overriding SYSTEMD_LESS because its default value includes the -S option to less, also known as --chop-long-lines, which truncates at screen width.

@iliana
Copy link
Contributor Author

iliana commented Oct 9, 2019

the answer here is overriding SYSTEMD_LESS

That is certainly the answer when using systemctl but I hope there's a config option for PID 1's output, since it's much harder to set its environment :)

@iliana
Copy link
Contributor Author

iliana commented Oct 9, 2019

We have to decide which services are important enough to get their full output in the console. It shouldn't be many, or the output will become generally incomprehensible due to volume. Each message also slows down boot a tiny bit, because of kernel-level locking on the serial output.

We also talked in the aisle about perhaps keying off the success/failure state reported as part of #86's work (cc @jamieand) and dumping a bunch of diagnostic output to the console on a detected failure.

There's a balance we have to strike here — we need the critical messages as part of boot in case that also fails.

@tjkirch
Copy link
Contributor

tjkirch commented Oct 9, 2019

the answer here is overriding SYSTEMD_LESS

That is certainly the answer when using systemctl but I hope there's a config option for PID 1's output, since it's much harder to set its environment :)

Not finding one yet.

Similar for disabling color - I found that SYSTEMD_COLORS=0 was added, but it was intended for systemctl. In general, if it thinks it's talking to a terminal, it makes its own choices.

@iliana
Copy link
Contributor Author

iliana commented Oct 9, 2019

Maybe we just have to tell systemd it's outputting to a vt100.

@bcressey
Copy link
Contributor

We have to decide which services are important enough to get their full output in the console. It shouldn't be many, or the output will become generally incomprehensible due to volume. Each message also slows down boot a tiny bit, because of kernel-level locking on the serial output.

No service is important enough to get its full output in the console. The slow down is not a "tiny bit", it is quite significant.

@davidbarsky
Copy link
Contributor

I can't speak to the rest, but I believe the ansi output is my mistake. You should be able to disable it using this method: https://docs.rs/tracing-subscriber/0.1.5/tracing_subscriber/fmt/struct.Builder.html#method.with_ansi.

@iliana
Copy link
Contributor Author

iliana commented Oct 10, 2019

but I believe the ansi output is my mistake

Most of my pain was systemd coloring [ OK ]. :)

@tjkirch tjkirch added the urgent label Oct 24, 2019
@tjkirch tjkirch added this to To do in Public Preview via automation Oct 24, 2019
@tjkirch tjkirch added this to the Public Preview milestone Oct 24, 2019
@tjkirch
Copy link
Contributor

tjkirch commented Oct 24, 2019

An idea I'm playing with is having first-party (and adjacent) services log stderr to journal+console, and having only error-level messages log to stderr. I reached out to @davidbarsky for some help with tracing, seeing if that's easy to do.

@bottlerocket-os bottlerocket-os deleted a comment from davidbarsky Oct 25, 2019
@tjkirch
Copy link
Contributor

tjkirch commented Oct 25, 2019

@davidbarsky was very helpful in understanding tracing, along with its Layer support for handling branching behavior like we would need to output to different destinations. It's not something that's available easily today, though. At a high level, I think tracing has great goals, but it doesn't have the features we need today, and this isn't a particular area we can invest in right now.

So, I think the best answer for now is to use a logger with this split-destination feature. I prototyped with simplelog and it worked well. I'm going to continue along that path toward a PR that will expose errors in the console, and then revisit the other pain points in this issue.

@davidbarsky
Copy link
Contributor

davidbarsky commented Oct 28, 2019

@tjkirch Depending on the urgency of this fix, I might have a solution for you—I've re-implemented a decent portion of tracing_subscriber::fmt::Subscriber as Layers atop of a shared, centralized span registry. Most notably, it allows the split destination pattern that you wanted:

let stderr = ConsoleLayer::builder()
    .with_interest(|event| event.metadata().level() >= &Level::WARN)
    .with_writer(io::stderr)
    .build();

let stdout = ConsoleLayer::builder()
    .with_interest(|event| event.metadata().level() == &Level::INFO)
    .with_writer(io::stdout)
    .build();

let subscriber = stderr.and_then(stdout).with_subscriber(Registry::default());
tracing::subscriber::set_global_default(subscriber).expect("Could not set global default");

Note that this interface still might change, most notably in ConsoleLayer being renamed to FmtLayer because it can write to anything provided as an Fn() -> std::io::Write.


Of course, given the instability and incompleteness of this solution, this might not be a good fit for Bottlerocket this moment. However, I wanted to alert you to this option, especially since it might begin to address the telemetry issues in an appealing way.

@tjkirch
Copy link
Contributor

tjkirch commented Oct 28, 2019

#476 addresses missing (first-party) errors on the console.

@tjkirch tjkirch removed the urgent label Oct 29, 2019
@tjkirch
Copy link
Contributor

tjkirch commented Oct 29, 2019

With #476 merged I'm removing the urgent tag; I think we should see most of the common errors we had now. I think we should open separate issues to track other software that should expose errors, because we need to handle them carefully so as not to write too much to the console.

I'm leaving this open because I think we should address the coloring/truncating issue, but it seems less urgent than not seeing errors at all.

@davidbarsky
Copy link
Contributor

I've spend some time getting support for arbitrary writers in tracing, you can see the output here:

Screen Shot 2019-10-31 at 10 21 19 AM

There's still some work to be done, but I'd find feedback in terms of what directives and features to prioritize to be helpful. Note that the "Writing to: [type_name]" output is not part of the regular output; it's to demonstrate, in a screenshot, that @tjkirch's desired feature will be soon supported in tracing.

(This week is mostly operational work with decent amounts of downtime. You can see the PR here: tokio-rs/tracing#412)

@tjkirch
Copy link
Contributor

tjkirch commented Dec 2, 2019

Made #570 to output more errors to console on failed boot.

@tjkirch tjkirch removed this from the Public Preview milestone Jan 22, 2020
@tjkirch
Copy link
Contributor

tjkirch commented Mar 9, 2020

Thanks to @bcressey for finding systemd.log_color=0, which might help us remove the ANSI color sequences from the console. Trying that now.

@tjkirch
Copy link
Contributor

tjkirch commented Apr 14, 2020

Does anyone have an example of truncated log output in the console? I'm seeing some pretty long lines on a test instance (650+ characters) with no truncation.

@tjkirch
Copy link
Contributor

tjkirch commented Apr 15, 2020

Going to close this because the only remaining checkbox is line truncation, which I'm not seeing and someone else confirmed they're not seeing. We can reopen if we can reproduce.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Public Preview
  
To do
Development

No branches or pull requests

4 participants