
Too much log output seems to cause processes to freeze #57

Closed
john-rodewald opened this issue Jun 14, 2023 · 21 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments


john-rodewald commented Jun 14, 2023

Hi! We use this tool in our team to orchestrate a web application on our development machines. It's been very pleasant so far.

One of the tools in our stack managed by process-compose is Hasura. Specifically, the process is a shell script that sets environment variables and then executes graphql-engine.

For the longest time, we've observed graphql-engine occasionally freezing up for no apparent reason. A process-compose restart is enough to get it to behave again. Today I discovered something new - the problem is:

  • Reliably reproducible by having graphql-engine output more logs (when running it through process-compose)
  • Impossible to reproduce with a lower graphql-engine verbosity (when running it through process-compose)
  • Impossible to reproduce when running graphql-engine outside of process-compose

One way that I can run graphql-engine through process-compose without any freezing is by redirecting stdout to a log file, i.e. adding 1>graphql-engine.log to the process shell script. Setting a log_location and modifying log_level in process-compose.yaml did not seem to fix freezing.

I can't say whether the sheer amount of logs is what causes process-compose to choke or if something else is going on. If there's an error log for process-compose itself, or any other information that might be useful, let me know and I'll try to provide it.

I'll expand this issue if I discover anything more.

john-rodewald changed the title to Too much console log output seems to cause processes to freeze Jun 14, 2023
john-rodewald changed the title back to Too much log output seems to cause processes to freeze Jun 14, 2023
F1bonacc1 (Owner) commented Jun 14, 2023

Hi @john-rodewald,

Thank you for trying Process Compose and for letting me know about its issues.

A few questions:

  1. Are you running in TUI mode or with the TUI disabled (-t=false)?
  2. What version of process-compose are you using?
  3. What log_length is configured in your process-compose.yaml?
  4. I'm not sure whether your logs contain ANSI colors, but if they do, did you try processes.process_name.disable_ansi_colors: true?

Also, if you can give me some numbers, I will try to reproduce it on my end:

  1. Log lines per minute.
  2. How long does it take before the freezing starts?

P.S.
The log_level setting only controls process-compose's own internal log.
You can run process-compose info to see the process-compose log location.
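
For reference, here is roughly how the settings mentioned above fit into a process-compose.yaml. This is only a sketch: the key names are the ones discussed in this thread, while the values and the script path are illustrative.

log_location: /tmp/process-compose-output.log   # root-level persistence of process output
log_length: 1000                                # log buffer length kept per process (lines)

processes:
  hasura:
    command: "./run-graphql-engine.sh"          # illustrative path to the wrapper script
    disable_ansi_colors: true                   # strip ANSI colors from this process's output
    log_location: /tmp/hasura.log               # per-process log file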

F1bonacc1 added the bug (Something isn't working) and good first issue (Good for newcomers) labels Jun 14, 2023
john-rodewald (Author) commented Jun 15, 2023

  1. Are you running in TUI mode or with the TUI disabled (-t=false)?

This is in TUI mode. I've tried with -t=false with no difference in behaviour.

  2. What version of process-compose are you using?

v0.45.0 at this time. The next thing I'll try is to update process-compose to v0.51.0.

  3. What log_length is configured in your process-compose.yaml?

None is configured at the moment. I now tried setting it to 10 and to 10000 with no difference in behaviour.

  4. I'm not sure whether your logs contain ANSI colors, but if they do, did you try processes.process_name.disable_ansi_colors: true?

Tried this with no difference in behaviour.

  1. Log lines per minute.
  2. How long does it take before the freezing starts?

There's a set of reproduction steps I can follow that reliably causes freezing. It seems that we're in the range of ~100 log lines per second before I see freezing. By the time I observe this peak, the process has already stopped responding.

Reducing verbosity brings me down to a peak of ~20 log lines per second. In this case, no freezing occurs.

I find these numbers to be fairly low, both in absolute and relative terms. 🤔

@F1bonacc1 (Owner)

This is very helpful.
This narrows it down to a completely different place than I suspected at first. And it should be easier to address.
I will try to reconstruct it on my end and optimize the log rotation mechanism.

Another question, this one is probably more specific to your business logic. How do you measure (or experience) the freezes?

@john-rodewald (Author)

I'm glad to hear! graphql-engine is a GraphQL server that talks to our database. This is a web application with an SPA frontend. My user story goes something like this:

  • I use the application in the browser and requests succeed as expected
  • Next, I follow a set of reproduction steps in a certain module
    • In this example, several entities are created and updated in quick succession, likely causing a lot of log output
  • The application now becomes unresponsive (requests don't succeed anymore, content is no longer loaded)
  • There is no additional log output in the TUI and all requests time out (504) until the process is restarted

We only use process-compose to help orchestrate local development environments and only ever see this problem in those local environments, never in live systems running the same GraphQL server setup.


dzmitry-lahoda commented Jun 15, 2023

This seems to be the case for me too.

Also reproduced: dumping a 5MB text file freezes it.

F1bonacc1 added a commit that referenced this issue Jun 15, 2023
F1bonacc1 (Owner) commented Jun 15, 2023

I created a small test environment that prints about 100K log lines/sec.

When it runs outside process-compose, it takes ~1.1s to print 100K lines:
I am long line number 2676253 ====================================== Duration per 100k: 0:00:01.108581

When inside process-compose, it takes ~1.3s to print 100K lines:
[bad_script_1 ] I am long line number 14839177 ====================================== Duration per 100k: 0:00:01.320320

It ran for a few minutes and stayed relatively stable (1.2s-1.4s per 100K lines) - no freezes.
Saving to a file or not (process_name.log_location set or undefined) didn't make any difference.

Would you give this test a try on your system?

P.S.
I tested with TUI disabled: process-compose -t=false
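
For reference, a rough Go sketch of the kind of generator described above (the actual script used for this test was a Python one, mentioned later in the thread; this is illustrative only):

// loadgen.go - illustrative sketch of the load generator described above.
// It prints numbered, padded lines as fast as possible and reports how long
// each batch of 100K lines took.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()

	pad := strings.Repeat("=", 38)
	start := time.Now()
	for i := 1; ; i++ {
		fmt.Fprintf(w, "I am long line number %d %s\n", i, pad)
		if i%100000 == 0 {
			fmt.Fprintf(w, "Duration per 100k: %v\n", time.Since(start))
			w.Flush()
			start = time.Now()
		}
	}
}

Running something like this as a process under process-compose and watching whether the reported duration per 100K lines stays stable is essentially the experiment described above.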


dzmitry-lahoda commented Jun 15, 2023

Not sure if that was a response to me, but did you try one big line of about 5 MB? Not many lines, but one big one?

@dzmitry-lahoda

I guess the TUI may fail on such a line.

@F1bonacc1 (Owner)

Not sure if that was a response to me, but did you try one big line of about 5 MB? Not many lines, but one big one?

I tried to cat a 1GB file: OK.
But when I replaced the \n characters in a 20MB file, turning it into one long line, it didn't go so well.
Do you think a single line that is 23040578 characters long is a realistic scenario?

I'm not sure it's the same issue @john-rodewald is facing.

@dzmitry-lahoda

20 MB, no, but 5 MB, yes. I run Wasm hosts, and they sometimes log base64-encoded Wasm blobs as single lines.

@F1bonacc1 (Owner)

@dzmitry-lahoda Do you mind opening a separate issue for this scenario?

john-rodewald (Author) commented Jun 16, 2023

I created a small test environment that prints about 100K log lines/sec.

Thanks for checking! I tried running this both with v0.45.0 and with v0.51.0 and everything is stable, as you say.

To rule out issues with the actual log content, I captured graphql-engine logs and printed a set of those in your Python script (essentially replacing the I am long line number... with actual graphql-engine output). Again, no issues at the same speed.

This is bizarre. 🤔

@F1bonacc1 (Owner)

This is bizarre indeed...

A few external factors to consider:

  1. Check your terminal's scrollback history size and make sure it's less than 10K lines.
  2. Comment out processes.hasura.log_location and the root-level log_location to make sure it's not a persistence delay issue.
  3. Also related to persistence (if enabled): make sure the log_location directory is on the exclusion list of any active protection software (antivirus and such).
  4. As @dzmitry-lahoda found, process-compose doesn't handle very long lines well. I still need to check what counts as too long and where things start to break, but in the meantime please check the hasura logs for cases of very long lines. We can compare notes after I write a proper repro 😃

@F1bonacc1 (Owner)

Hi,

@john-rodewald, can you please share any additional information you've discovered that could help reproduce the issue?
The scenario described by @dzmitry-lahoda was straightforward to reproduce. The limit is 2^16 bytes per line, the default value defined as MaxScanTokenSize in Go's bufio package. It's possible to reconfigure it externally to any other reasonable value.
I don't think it's related to your case, but I would prefer that you confirm it.

Another question: when you experience those freezes, does the logger stop entirely (until restarted) or just slow down for a while? For how long?
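
For illustration, a minimal standalone Go sketch (not process-compose code) of the limit described above: bufio.Scanner stops with a "token too long" error once a single line exceeds MaxScanTokenSize (64 KiB), unless a larger maximum buffer is set explicitly.

package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	// One "line" slightly larger than the default 64 KiB token limit.
	long := strings.Repeat("x", bufio.MaxScanTokenSize+1) + "\n"

	// Default scanner: scanning stops and Err() reports bufio.ErrTooLong.
	s := bufio.NewScanner(strings.NewReader(long))
	for s.Scan() {
	}
	fmt.Println("default scanner:", s.Err()) // bufio.Scanner: token too long

	// Scanner with an enlarged maximum buffer: the same line is read fine.
	s = bufio.NewScanner(strings.NewReader(long))
	s.Buffer(make([]byte, 1024), 1024*1024)
	for s.Scan() {
	}
	fmt.Println("enlarged scanner:", s.Err()) // <nil>
}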


appaquet commented Jul 6, 2023

I ran into this issue as well. One of my processes logs an excessively long line and then hangs writing to stdout. I had a lot of goroutines in my process stuck in the zap logger trying to write to stdout (under a lock, probably to prevent mangled lines).

I recompiled process-compose with pprof enabled and dumped the goroutines. At a glance, nothing seems to be deadlocked, but I do see a bunch of goroutines scanning lines (see https://gist.github.com/appaquet/d960f5f4bf7aae018971735c5f73b0a4), as they should be.

I then tried increasing the scanner's max token size, and it does fix my issue:

diff --git a/src/app/process.go b/src/app/process.go
index f595e9c..cf070aa 100644
--- a/src/app/process.go
+++ b/src/app/process.go
@@ -4,7 +4,6 @@ import (
 	"bufio"
 	"context"
 	"fmt"
-	"github.com/f1bonacc1/process-compose/src/types"
 	"io"
 	"math/rand"
 	"os"
@@ -14,6 +13,8 @@ import (
 	"syscall"
 	"time"
 
+	"github.com/f1bonacc1/process-compose/src/types"
+
 	"github.com/f1bonacc1/process-compose/src/command"
 	"github.com/f1bonacc1/process-compose/src/health"
 	"github.com/f1bonacc1/process-compose/src/pclog"
@@ -335,6 +336,7 @@ func (p *Process) handleInput(pipe io.WriteCloser) {
 
 func (p *Process) handleOutput(pipe io.ReadCloser, handler func(message string)) {
 	outscanner := bufio.NewScanner(pipe)
+	outscanner.Buffer(make([]byte, 1024), 1024*1024)
 	outscanner.Split(bufio.ScanLines)
 	for outscanner.Scan() {
 		handler(outscanner.Text())

Perhaps making this max line value configurable per process could be a fix.

@F1bonacc1 (Owner)

Hi @appaquet,

You are absolutely right; this is precisely the place that caused the hang for @dzmitry-lahoda. In one of the latest versions, I added an explicit error that is printed to the process-compose log.
I considered adding it as a configuration parameter, but it felt too low-level for configuration.
I can also increase the default value (2^16), but no matter what I do, at some point there will be someone for whom the default isn't high enough or for whom the new configuration parameter is too hidden.

Maybe it's time for a troubleshooting section in the README.

I am not sure, though, that this is the error @john-rodewald is facing.


nitrotm commented Dec 8, 2023

I can confirm that I have the same problem (some log lines are 10MB).

Why not raise it to a large but capped amount of RAM? I personally don't care if the buffer can grow up to 128MB in dev mode.

Alternatively, maybe rely on a fail-safe scanner which would yield a partial line if the buffer is exhausted?
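
A fail-safe scanner along those lines could be built with a custom bufio.SplitFunc that emits an over-long line as capped chunks instead of aborting the scan. A rough sketch (capLines is a hypothetical helper, not an existing process-compose or standard-library function):

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// capLines behaves like bufio.ScanLines, except that a line longer than max
// bytes is emitted as several chunks of at most max bytes instead of making
// the scan fail with ErrTooLong. max must not exceed the Scanner's buffer size.
func capLines(max int) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (int, []byte, error) {
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}
		if i := bytes.IndexByte(data, '\n'); i >= 0 && i <= max {
			// A complete line fits under the cap: return it without the newline.
			return i + 1, bytes.TrimSuffix(data[:i], []byte("\r")), nil
		}
		if len(data) >= max {
			// No newline within max bytes: flush a partial chunk.
			return max, data[:max], nil
		}
		if atEOF {
			// Final line without a trailing newline.
			return len(data), data, nil
		}
		// Ask the Scanner for more data.
		return 0, nil, nil
	}
}

func main() {
	// Demo: a 200-byte "line" with a 64-byte cap comes out as 4 chunks (64+64+64+8).
	input := strings.Repeat("x", 200) + "\nshort line\n"
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(capLines(64))
	for sc.Scan() {
		fmt.Printf("chunk of %d bytes\n", len(sc.Bytes()))
	}
}

With something like this, the log handler would receive an over-long line in pieces rather than the whole output pipe stalling, at the cost of occasionally splitting a line mid-way.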


manveru commented Jan 20, 2024

I just hit the same issue with PostgreSQL while logging all SQL statements and inserting a 2.3MB blob. I thought I was going insane until I noticed that it wasn't my application or PostgreSQL, but process-compose itself that completely froze and had to be killed by force.
I really think that excessively long lines should be truncated; otherwise I might have to wrap each process it handles in a tmux session, at which point I might as well use tmux standalone anyway...

@dzmitry-lahoda

Hm, maybe there is some non-tmux tool for handling logs?
Maybe process-compose should just fail gracefully on a big log line and suggest using that tool. Unix philosophy.

By the way, can tmux do background sessions?

@F1bonacc1 (Owner)

Fixed. Will be part of the next release.

@F1bonacc1 (Owner)

Fixed in v0.81.4.
