SIGSEGV in 0.12.0 with zstd compression enabled, when producer is shared between multiple goroutines #1163

Closed
0x4500 opened this issue Jan 29, 2024 · 3 comments · Fixed by #1164

0x4500 commented Jan 29, 2024

Expected behavior

In release v0.11.0, sending unbatched messages with zstd compression enabled works fine.

In v0.12.0, it appears to cause a segfault.

Actual behavior

Segfaults of the following form are observed:

SIGSEGV: segmentation violation
PC=0xd41627 m=14 sigcode=1
signal arrived during cgo execution

goroutine 62 [syscall]:
runtime.cgocall(0xcf0dc0, 0xc0001ed318)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0001ed2f0 sp=0xc0001ed2b8 pc=0x409d6b
github.com/DataDog/zstd._Cfunc_ZSTD_compressCCtx(0x7f8b73d5e010, 0xc000fb3140, 0xbc, 0xc00103ad80, 0x7d, 0x9)
	_cgo_gotypes.go:223 +0x4c fp=0xc0001ed318 sp=0xc0001ed2f0 pc=0x82e48c
github.com/DataDog/zstd.(*ctx).CompressLevel.func2(0x18892e8?, 0xc0001ed3e0, 0xc0001ed3f8, 0x9)
	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/!data!dog/zstd@v1.5.5/zstd_ctx.go:84 +0x127 fp=0xc0001ed388 sp=0xc0001ed318 pc=0x82f447
github.com/DataDog/zstd.(*ctx).CompressLevel(0x40fc5a?, {0xc000fb3140, 0xbc, 0xbc}, {0xc00103ad80, 0x7d, 0x7d}, 0x0?)
	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/!data!dog/zstd@v1.5.5/zstd_ctx.go:84 +0xd9 fp=0xc0001ed3d8 sp=0xc0001ed388 pc=0x82f219
github.com/apache/pulsar-client-go/pulsar/internal/compression.(*zstdCGoProvider).Compress(0x0?, {0x0?, 0x413005?, 0x8?}, {0xc00103ad80?, 0x101?, 0xc0001ed4e8?})
	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/internal/compression/zstd_cgo.go:64 +0x33 fp=0xc0001ed450 sp=0xc0001ed3d8 pc=0x83d1b3
github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).updateChunkInfo(0xc0001a1680, 0xc000161b80)
	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/producer_partition.go:1155 +0x71 fp=0xc0001ed4f8 sp=0xc0001ed450 pc=0xb64ff1
github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).internalSendAsync(0xc0001a1680, {0x117c200, 0x188dc80}, 0xc0002285b0, 0xc001015930, 0x0)
	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/producer_partition.go:1251 +0x517 fp=0xc0001ed718 sp=0xc0001ed4f8 pc=0xb65ad7
github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).SendAsync(0xc00026cc80?, {0x117c200?, 0x188dc80?}, 0x413005?, 0xd0?)
	/builds/driftnet.io/driftnet/.go/pkg/mod/github.com/apache/pulsar-client-go@v0.12.0/pulsar/producer_partition.go:1024 +0x25 fp=0xc0001ed758 sp=0xc0001ed718 pc=0xb640a5
github.com/apache/pulsar-client-go/pulsar.(*producer).SendAsync(0xe71300?, {0x117c200, 0x188dc80}, 0x20?, 0x0?)
[...]

Steps to reproduce

I don't have test case code, but the configuration for the crashing producer is:

pulsar.ProducerOptions{
	Topic:              <topic>,
	Name:               <instanceID>,
	CompressionType:    pulsar.ZSTD,
	CompressionLevel:   pulsar.Better,
	DisableBatching:    true,
	SendTimeout:        600*time.Second,
	MaxPendingMessages: 5000,
}

I have other producers in the system which have batching enabled and zstd compression also enabled. These are not crashing in 0.12.0.

System configuration

Pulsar version: 3.1.2
Golang version: 1.21.6

0x4500 changed the title from "SIGSEGV in 0.12.0 with zstd compression enabled and batching disabled" to "SIGSEGV in 0.12.0 with batching disabled and zstd compression enabled" on Jan 29, 2024

RobertIndie (Member) commented

Could you provide code that reproduces the issue? I couldn't reproduce it. Could you also share the OS environment where you are running the Go client?

> I have other producers in the system which have batching enabled and zstd compression also enabled. These are not crashing in 0.12.0.

Do you mean that only this producer would crash?

0x4500 commented Jan 30, 2024

Unfortunately, I am not able to provide a minimal test case. This is a production system, so we rolled back to 0.11.0.

This code is running in a container based on Alpine 3.19.1. However, I can reproduce this issue when running our code from the command line, using Ubuntu 23.10.

Sometimes, this message will be generated instead of a segfault:

FATA[0001] Failed to compress                            error="Src size is incorrect"

0x4500 changed the title from "SIGSEGV in 0.12.0 with batching disabled and zstd compression enabled" to "SIGSEGV in 0.12.0 with zstd compression enabled" on Jan 30, 2024

0x4500 commented Jan 30, 2024

In the zstd code, I see this comment (in zstd_ctx.go, the file that appears in the stack trace above):

//  Note 2 : In multi-threaded environments,
//         use one different context per thread for parallel execution.

Our calling code uses multiple goroutines and will be multi-threaded at the OS level. In the particular code that crashes, we share a pulsar.Producer between multiple goroutines.

In v0.11 of the pulsar-client-go library, compression always occurred in internalSend(), which is only ever called from runEventsLoop() and therefore runs in a single goroutine, so compression calls on a given producer never overlapped.

In v0.12, compression occurs in the internalSendAsync() function, which runs in whatever goroutine the calling code is executing in, and so can run concurrently with other callers on any of the threads the runtime is currently using.
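
To make the difference between the two models concrete, here is a minimal sketch of the event-loop pattern described for v0.11. It is not the library's code: `scratchCompressor`, `eventLoop`, `sendViaLoop`, and `runConcurrentSenders` are hypothetical names, and the "compression" is a toy byte reversal that reuses an internal buffer, standing in for a context that is not safe for concurrent use. Because only the loop goroutine ever touches the compressor, any number of callers can share it safely.

```go
package main

import (
	"fmt"
	"sync"
)

// scratchCompressor is a hypothetical stand-in for a compression context
// that reuses an internal buffer and is NOT safe for concurrent use.
type scratchCompressor struct {
	scratch []byte
}

// Compress is a toy transformation (byte reversal) staged in the shared
// scratch buffer; overlapping calls on one instance would corrupt it.
func (c *scratchCompressor) Compress(src []byte) []byte {
	c.scratch = c.scratch[:0]
	for i := len(src) - 1; i >= 0; i-- {
		c.scratch = append(c.scratch, src[i])
	}
	return append([]byte(nil), c.scratch...)
}

// sendRequest carries one payload to the loop and a channel for the result.
type sendRequest struct {
	payload []byte
	reply   chan []byte
}

// eventLoop owns the compressor. It is the only goroutine that calls
// Compress, so calls are serialized no matter how many senders exist.
func eventLoop(requests <-chan sendRequest) {
	c := &scratchCompressor{}
	for req := range requests {
		req.reply <- c.Compress(req.payload)
	}
}

// sendViaLoop hands a payload to the loop and waits for the result.
func sendViaLoop(requests chan<- sendRequest, payload []byte) []byte {
	reply := make(chan []byte, 1)
	requests <- sendRequest{payload: payload, reply: reply}
	return <-reply
}

// reverse returns s with its bytes in reverse order (ASCII input assumed).
func reverse(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// runConcurrentSenders starts n goroutines that all send through one loop
// and returns how many of them received the expected result.
func runConcurrentSenders(n int) int {
	requests := make(chan sendRequest)
	go eventLoop(requests)

	var wg sync.WaitGroup
	var mu sync.Mutex
	ok := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			payload := fmt.Sprintf("message-%04d", i)
			got := sendViaLoop(requests, []byte(payload))
			if string(got) == reverse(payload) {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	close(requests)
	return ok
}

func main() {
	fmt.Println("correct results:", runConcurrentSenders(64))
}
```

Calling `Compress` directly from each sender goroutine instead, on one shared `scratchCompressor`, would be the analogue of the v0.12 behaviour described above.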

I think the issue is that v0.12 of the pulsar-client-go library is no longer meeting the requirement to use a unique zstd context per thread, if the calling client shares a pulsar.Producer between goroutines.

If this is by design then that's fine, but in that case it would be helpful to update the documentation and explicitly state that a pulsar.Producer must not be shared between goroutines.
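
If sharing a producer between goroutines is meant to be supported, one possible approach is to give every compression call its own context rather than sharing one. The sketch below shows that idea with `sync.Pool`. It is an assumption for illustration only, not a description of what the library does or of any particular fix: `unsafeCtx`, `pooledCompressor`, and `runParallel` are hypothetical names, and the "compression" is a toy run-length encoder that reuses a scratch buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// unsafeCtx is a hypothetical stand-in for a compression context that
// must not be used by two goroutines at the same time.
type unsafeCtx struct {
	scratch []byte
}

// Compress run-length encodes src as (count, value) byte pairs, staging
// the output in the context's scratch buffer.
func (c *unsafeCtx) Compress(src []byte) []byte {
	c.scratch = c.scratch[:0]
	for i := 0; i < len(src); {
		j := i
		for j < len(src) && src[j] == src[i] && j-i < 255 {
			j++
		}
		c.scratch = append(c.scratch, byte(j-i), src[i])
		i = j
	}
	return append([]byte(nil), c.scratch...)
}

// pooledCompressor is safe to share: each call borrows a private context
// from the pool, so no context is ever used by two goroutines at once.
type pooledCompressor struct {
	pool sync.Pool
}

func newPooledCompressor() *pooledCompressor {
	return &pooledCompressor{
		pool: sync.Pool{New: func() any { return &unsafeCtx{} }},
	}
}

// Compress borrows a context, uses it, and returns it to the pool.
func (p *pooledCompressor) Compress(src []byte) []byte {
	ctx := p.pool.Get().(*unsafeCtx)
	defer p.pool.Put(ctx)
	return ctx.Compress(src)
}

// runParallel compresses n single-run payloads from n goroutines and
// returns how many results were the expected (count, value) pair.
func runParallel(p *pooledCompressor, n int) int {
	var wg sync.WaitGroup
	results := make([]bool, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			runLen := i%50 + 1
			value := byte('a' + i%26)
			payload := make([]byte, runLen)
			for k := range payload {
				payload[k] = value
			}
			got := p.Compress(payload)
			results[i] = len(got) == 2 && got[0] == byte(runLen) && got[1] == value
		}(i)
	}
	wg.Wait()

	ok := 0
	for _, r := range results {
		if r {
			ok++
		}
	}
	return ok
}

func main() {
	p := newPooledCompressor()
	fmt.Println("correct results:", runParallel(p, 200))
}
```

A mutex around a single shared context would also serialize access; the pool trades a little memory for letting callers compress in parallel.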

0x4500 changed the title from "SIGSEGV in 0.12.0 with zstd compression enabled" to "SIGSEGV in 0.12.0 with zstd compression enabled, when producer is shared between multiple goroutines" on Jan 30, 2024
RobertIndie self-assigned this on Jan 31, 2024