Skip to content

fix: properly close millCh if non-nil.#7

Merged
kkourt merged 1 commit intocilium:masterfrom
FedeDP:master
Feb 19, 2026
Merged

fix: properly close millCh if non-nil.#7
kkourt merged 1 commit intocilium:masterfrom
FedeDP:master

Conversation

@FedeDP
Copy link
Copy Markdown

@FedeDP FedeDP commented Feb 10, 2026

Playing with cilium/tetragon#4508 (and specifically writing the test for the exporter), i stumbled upon natefinch#219 (there are many reports: natefinch#205); that's because i am specifically testing that updating the rotationInterval does the trick and to do so, i use synctest.Run() with some sleeps.

The test failure:

=== RUN   TestExporterSetLoggingParams
level=info msg="Rotating JSON logs export" file=test.txt directory=/tmp/TestExporterSetLoggingParams2522563654/001
level=info msg="Updating exporter params" params="{MaxSize:<nil> RotationInterval:2s MaxBackups:<nil>}"
level=info msg="Rotating JSON logs export" file=test.txt directory=/tmp/TestExporterSetLoggingParams2522563654/001
--- FAIL: TestExporterSetLoggingParams (0.00s)
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]

goroutine 67 [running]:
testing.tRunner.func1.2({0x1c86040, 0xc0002e3f50})
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/testing/testing.go:1875 +0x35b
panic({0x1c86040?, 0xc0002e3f50?})
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/runtime/panic.go:783 +0x132
internal/synctest.Run(0xc000bc4620)
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/runtime/synctest.go:251 +0x2de
testing/synctest.Test(0xc0004456c0, 0x1f8da80)
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/testing/synctest/synctest.go:282 +0x90
github.com/cilium/tetragon/pkg/exporter.TestExporterSetLoggingParams(0xc0004456c0?)
	/home/fdipierr/Work/tetragon/pkg/exporter/exporter_test.go:237 +0x1a
testing.tRunner(0xc0004456c0, 0x1f8d970)
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 1
	/home/fdipierr/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/testing/testing.go:1997 +0x465

goroutine 71 [chan receive (durable), synctest bubble 1]:
github.com/cilium/lumberjack/v2.(*Logger).millRun(...)
	/home/fdipierr/Work/tetragon/vendor/github.com/cilium/lumberjack/v2/lumberjack.go:403
created by github.com/cilium/lumberjack/v2.(*Logger).mill.func1 in goroutine 70
	/home/fdipierr/Work/tetragon/vendor/github.com/cilium/lumberjack/v2/lumberjack.go:414 +0x8c

If i manually update lumberjack code to close millCh, the test passes without panic'ing.
Here is the fix.

Copy link
Copy Markdown
Collaborator

@chancez chancez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is quite right either. close is called by rotate() which closes the file and opens a new one, then calls mill() which sends on millCh, which will now be empty. It's not re-created because millCh is only allocated once, using a sync.Once in mill: https://github.com/cilium/lumberjack/blob/master/lumberjack.go#L412-L415

So with this change, I'd expect that a rotation would result in the channel being closed, then a send on a nil channel would occur.

Besides this, this causes millRun to return, and never gets restarted.

@FedeDP
Copy link
Copy Markdown
Author

FedeDP commented Feb 10, 2026

Uh, you are right! Let me move it to Close() instead.
Great catch, thanks for the review.

@FedeDP FedeDP force-pushed the master branch 3 times, most recently from 552d0ee to 17aaef4 Compare February 10, 2026 16:07
@chancez
Copy link
Copy Markdown
Collaborator

chancez commented Feb 10, 2026

I think this looks correct, but would be able to add some tests that would cause the failure you originally saw in Tetragon and after this change, the test should pass?

@FedeDP
Copy link
Copy Markdown
Author

FedeDP commented Feb 11, 2026

@chancez can i bump the go version to make use of synctest?

Otherwise i could manually check number of goroutines in the test, eg: runtime.NumGoroutine() :)

@FedeDP
Copy link
Copy Markdown
Author

FedeDP commented Feb 11, 2026

runtime.NumGoroutine() is more than enough to test the behavior!
PTAL 🙏

=== RUN   TestForcedRotateCleanup
--- PASS: TestForcedRotateCleanup (0.00s)
PASS

Process finished with the exit code 0

@FedeDP FedeDP requested a review from chancez February 11, 2026 10:52
Also, add a millWg to properly wait for `millRun` goroutine
to cleanly exit in `Close()`.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
@kkourt kkourt merged commit 56060e5 into cilium:master Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants