[improve](sdk) add gzip compression support for stream load#61373
[improve](sdk) add gzip compression support for stream load#61373JNSimba wants to merge 3 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Pull request overview
Adds transparent gzip compression support to the Go Doris stream load SDK for CSV payloads, allowing callers to enable compression via config without pre-compressing data.
Changes:
- Introduces
EnableGzipin loadConfigand validates it is only used with CSV format. - Compresses the HTTP request body with gzip when enabled and adds the
compress_type: gzheader. - Adds a runnable gzip example and documents the new option in the SDK README.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/go-doris-sdk/pkg/load/loader/request_builder.go | Adds in-memory gzip compression wrapper and sets compress_type header when enabled. |
| sdk/go-doris-sdk/pkg/load/config/load_config.go | Adds EnableGzip config flag and validation for CSV-only support. |
| sdk/go-doris-sdk/examples/gzip_example.go | New example demonstrating gzip-compressed CSV stream load. |
| sdk/go-doris-sdk/cmd/examples/main.go | Adds a gzip runnable example option to the examples runner. |
| sdk/go-doris-sdk/README.md | Documents EnableGzip usage and adds gzip to the examples list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Code Review Summary
This PR adds gzip compression support for CSV stream load in the Go SDK. The implementation is clean and well-structured, but there are several issues that should be addressed.
Critical Checkpoints
1. Does the code accomplish its goal? Is there a test that proves it?
The code accomplishes gzip compression for stream load. However, there are no unit tests for any of the new code — not for the config validation (EnableGzip with JSON rejection), not for wrapWithGzip(), and not for the header-setting logic in buildStreamLoadOptions(). The PR's own checklist has no test boxes checked. Unit tests should be added for the new functionality.
2. Is this modification as small, clear, and focused as possible?
Yes, the change is focused on a single feature. The struct field reformat is an unnecessary whitespace change but is minor.
3. Concurrency?
Not applicable — no new concurrency introduced.
4. Special lifecycle management?
Not applicable.
5. Configuration items added?
EnableGzip is a new config field. It does not need to be dynamically changeable since configs are set at client creation time.
6. Incompatible changes?
No — adding a new optional bool field with zero-value false is backward compatible.
7. Functionally parallel code paths?
No parallel code paths affected.
8. Special conditional checks?
The JSON format rejection check has a factual issue — see inline comment. Doris BE actually supports compress_type with JSON format (via NewJsonReader + Decompressor). The validation is unnecessarily restrictive.
9. Test coverage?
No tests at all. This is the primary gap.
10. Observability?
No logging is added when gzip compression is applied. A debug log in CreateStreamLoadRequest when gzip wrapping occurs would help troubleshooting.
11. Transaction/persistence?
Not applicable.
12. FE-BE variable passing?
Not applicable (SDK-only change).
13. Performance issues?
Gzip compression is re-performed on every retry attempt. Since CreateStreamLoadRequest is called per-retry with a fresh uncompressed reader (from getBodyFunc()), the data is re-compressed each time. For large payloads this is wasteful. Consider compressing once and caching the compressed bytes, or doing the compression in the retry layer (doris_load_client.go) instead of per-request.
14. Other issues?
- Example file uses a hardcoded internal IP address (
10.16.10.6:48939). - The
"all"case incmd/examples/main.godoes not include the gzip example. - The
usageconst string does not list thegzipexample. - No conflict detection if user manually sets
compress_typeinOptionsmap while also settingEnableGzip: true.
Issues Found
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | High | load_config.go |
JSON format rejection is factually incorrect — Doris supports compress_type with JSON |
| 2 | High | (missing) | No unit tests for any new functionality |
| 3 | Medium | request_builder.go |
Gzip re-compression on every retry attempt is wasteful for large payloads |
| 4 | Low | gzip_example.go |
Hardcoded internal IP address should use localhost placeholder |
| 5 | Low | cmd/examples/main.go |
"all" case and usage string don't include gzip example |
|
run buildall |
What problem does this PR solve?
Summary
This PR adds gzip compression support to the Doris Go SDK for stream load.
When
EnableGzip: trueis set in the config, the SDK automatically compressesthe request body using gzip and adds the
compress_type: gzHTTP header,without requiring the caller to pre-compress the data.
Both CSV and JSON formats are supported. Note that JSON compression support
depends on the Doris version (Doris3.0.5+)
Usage
Notes
extra compression overhead.
original data internally for retry support, this does not introduce additional
memory copies in practice.
Gzip Compression Benchmark
[CSV format]
[JSON format]
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)