
[Task][Go SDK][Perf]: Make and use a general []byte buffer allocation pool for the SDK. #25522

@lostluck

What needs to happen?

Replace all small []byte and bytes.Buffer uses with a central "buffer" package backed by sync.Pool, like

https://cs.opensource.google/go/x/exp/+/f062dba9:slog/internal/buffer/buffer.go;bpv=1;bpt=1

or

https://github.com/golang/go/blob/master/src/log/log.go#L165

sync.Pool (https://pkg.go.dev/sync#Pool) acts as a weak-reference pool, meaning that buffers in the pool may be freed during the next GC if appropriate.
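
As a rough illustration of the shape such a package could take, here is a minimal sketch loosely modeled on the slog internal buffer linked above. The names Buffer, New, Free, and maxRetained, and the 1 KiB and 16 KiB sizes, are assumptions for illustration, not an existing Beam API.

```go
package main

import (
	"fmt"
	"sync"
)

// Buffer is a byte slice borrowed from a shared pool (hypothetical type).
type Buffer []byte

// maxRetained is an assumed cap: larger buffers are dropped rather than
// pooled, so one huge element does not pin memory in the pool.
const maxRetained = 16 << 10

var pool = sync.Pool{
	New: func() any {
		b := make(Buffer, 0, 1024) // assumed initial capacity
		return &b
	},
}

// New borrows an empty buffer from the pool.
func New() *Buffer { return pool.Get().(*Buffer) }

// Free resets the buffer and returns it to the pool. Oversized buffers
// are simply left for the garbage collector.
func (b *Buffer) Free() {
	if cap(*b) <= maxRetained {
		*b = (*b)[:0]
		pool.Put(b)
	}
}

// Write appends p, which also makes *Buffer an io.Writer.
func (b *Buffer) Write(p []byte) (int, error) {
	*b = append(*b, p...)
	return len(p), nil
}

// WriteString appends s without an intermediate []byte conversion.
func (b *Buffer) WriteString(s string) (int, error) {
	*b = append(*b, s...)
	return len(s), nil
}

// String copies the contents out, so the result stays valid after Free.
func (b *Buffer) String() string { return string(*b) }

func main() {
	buf := New()
	defer buf.Free()
	buf.WriteString("hello, ")
	buf.Write([]byte("beam"))
	fmt.Println(buf.String())
}
```

Storing a pointer to the slice in the pool, rather than the slice itself, avoids an extra allocation each time the value is boxed into an interface on Put.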

Some unsafe strategies are also used to avoid allocations, through the read and write calls in the ioutilx package. This remains slightly risky, so replacing the uses there is a reasonable priority.

More generally, the candidates are uses of bytes.Buffer, and uses of make([]byte) with arbitrary length but known lifetime.

Some bytes.Buffer replacements would require this new Buffer to support the io.Reader interface as well.

In particular, targeting the uses in the exec and harness packages would likely be the most fruitful.

It would be worth having a moderate-scale benchmark to validate the performance before and after this change, with appropriate heap and CPU profiles.
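
A toy version of such a before/after comparison might look like the following. The 512-byte payload and the encodeFresh and encodePooled helpers are made up for illustration; a real benchmark would exercise the exec and harness code paths instead.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

// payload stands in for an encoded element (assumed size).
var payload = bytes.Repeat([]byte("x"), 512)

// sink keeps the compiler from discarding the benchmarked work.
var sink int

var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// encodeFresh allocates a new backing array on every call.
func encodeFresh() int {
	var buf bytes.Buffer
	buf.Write(payload)
	return buf.Len()
}

// encodePooled re-uses a pooled buffer and its backing array.
func encodePooled() int {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	buf.Write(payload)
	n := buf.Len()
	bufPool.Put(buf)
	return n
}

func main() {
	fresh := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink += encodeFresh()
		}
	})
	pooled := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink += encodePooled()
		}
	})
	fmt.Println("fresh: ", fresh.String(), fresh.MemString())
	fmt.Println("pooled:", pooled.String(), pooled.MemString())
}
```

Written as ordinary Benchmark functions in a _test.go file, the same comparison can be run with go test -bench=. -benchmem -cpuprofile cpu.out -memprofile mem.out to capture the CPU and heap profiles.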


Notionally the benefits would be a reduction in byte buffer allocations to the heap through re-use, while also enabling certain levels of inlining, though this would require more explicit use of the new buffer type instead of using the interfaces.

Note that Go remains a garbage-collected language, so not every buffer taken from the pool needs to be returned to it. Buffers that are never returned can leak out of the pool and be GC'd normally.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
