
[improve](sdk) add gzip compression support for stream load#61373

Open
JNSimba wants to merge 3 commits into apache:master from JNSimba:improve_sdk

Conversation

@JNSimba
Member

@JNSimba JNSimba commented Mar 16, 2026

What problem does this PR solve?

Summary

This PR adds gzip compression support to the Doris Go SDK for stream load.

When EnableGzip: true is set in the config, the SDK automatically compresses
the request body using gzip and adds the compress_type: gz HTTP header,
without requiring the caller to pre-compress the data.

Both CSV and JSON formats are supported. Note that JSON compression support
depends on the Doris version (requires Doris 3.0.5+).

Usage

config := &doris.Config{
    Endpoints:  []string{"http://127.0.0.1:8030"},
    User:       "root",
    Password:   "password",
    Database:   "test_db",
    Table:      "users",
    Format:     doris.DefaultCSVFormat(),
    EnableGzip: true,
}

Notes

  • Compression is applied once before the retry loop, so retries do not incur
    extra compression overhead.
  • The compressed data is buffered in memory. Since the SDK already buffers the
    original data internally for retry support, this does not introduce additional
    memory copies in practice.

Gzip Compression Benchmark

[CSV format]

| Rows | Approx Size | Original | After gzip | Compressed By |
|---|---|---|---|---|
| 100 | ~2 KB | 1590 B | 505 B | 68.2% |
| 1,000 | ~17 KB | 16.49 KB | 4.38 KB | 73.4% |
| 10,000 | ~176 KB | 175.67 KB | 47.74 KB | 72.8% |
| 100,000 | ~1935 KB | 1.89 MB | 473.59 KB | 75.5% |
| 1,000,000 | ~21272 KB | 20.78 MB | 4.74 MB | 77.2% |
| 10,000,000 | ~232210 KB | 226.77 MB | 47.51 MB | 79.1% |

[JSON format]

| Rows | Approx Size | Original | After gzip | Compressed By |
|---|---|---|---|---|
| 100 | ~4 KB | 3.70 KB | 629 B | 83.4% |
| 1,000 | ~38 KB | 37.98 KB | 4.92 KB | 87.0% |
| 10,000 | ~391 KB | 390.52 KB | 48.61 KB | 87.6% |
| 100,000 | ~4083 KB | 3.99 MB | 497.33 KB | 87.8% |
| 1,000,000 | ~42756 KB | 41.77 MB | 5.00 MB | 88.0% |
| 10,000,000 | ~447054 KB | 436.58 MB | 49.87 MB | 88.6% |

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba JNSimba requested a review from Copilot March 16, 2026 07:12
@JNSimba
Member Author

JNSimba commented Mar 16, 2026

/review


Copilot AI left a comment


Pull request overview

Adds transparent gzip compression support to the Go Doris stream load SDK for CSV payloads, allowing callers to enable compression via config without pre-compressing data.

Changes:

  • Introduces EnableGzip in load Config and validates it is only used with CSV format.
  • Compresses the HTTP request body with gzip when enabled and adds the compress_type: gz header.
  • Adds a runnable gzip example and documents the new option in the SDK README.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| sdk/go-doris-sdk/pkg/load/loader/request_builder.go | Adds in-memory gzip compression wrapper and sets compress_type header when enabled. |
| sdk/go-doris-sdk/pkg/load/config/load_config.go | Adds EnableGzip config flag and validation for CSV-only support. |
| sdk/go-doris-sdk/examples/gzip_example.go | New example demonstrating gzip-compressed CSV stream load. |
| sdk/go-doris-sdk/cmd/examples/main.go | Adds a gzip runnable example option to the examples runner. |
| sdk/go-doris-sdk/README.md | Documents EnableGzip usage and adds gzip to the examples list. |


Contributor

@github-actions github-actions bot left a comment


Code Review Summary

This PR adds gzip compression support for CSV stream load in the Go SDK. The implementation is clean and well-structured, but there are several issues that should be addressed.

Critical Checkpoints

1. Does the code accomplish its goal? Is there a test that proves it?
The code accomplishes gzip compression for stream load. However, there are no unit tests for any of the new code — not for the config validation (EnableGzip with JSON rejection), not for wrapWithGzip(), and not for the header-setting logic in buildStreamLoadOptions(). The PR's own checklist has no test boxes checked. Unit tests should be added for the new functionality.

2. Is this modification as small, clear, and focused as possible?
Yes, the change is focused on a single feature. The struct field reformat is an unnecessary whitespace change but is minor.

3. Concurrency?
Not applicable — no new concurrency introduced.

4. Special lifecycle management?
Not applicable.

5. Configuration items added?
EnableGzip is a new config field. It does not need to be dynamically changeable since configs are set at client creation time.

6. Incompatible changes?
No — adding a new optional bool field with zero-value false is backward compatible.

7. Functionally parallel code paths?
No parallel code paths affected.

8. Special conditional checks?
The JSON format rejection check has a factual issue — see inline comment. Doris BE actually supports compress_type with JSON format (via NewJsonReader + Decompressor). The validation is unnecessarily restrictive.

9. Test coverage?
No tests at all. This is the primary gap.

10. Observability?
No logging is added when gzip compression is applied. A debug log in CreateStreamLoadRequest when gzip wrapping occurs would help troubleshooting.

11. Transaction/persistence?
Not applicable.

12. FE-BE variable passing?
Not applicable (SDK-only change).

13. Performance issues?
Gzip compression is re-performed on every retry attempt. Since CreateStreamLoadRequest is called per-retry with a fresh uncompressed reader (from getBodyFunc()), the data is re-compressed each time. For large payloads this is wasteful. Consider compressing once and caching the compressed bytes, or doing the compression in the retry layer (doris_load_client.go) instead of per-request.

14. Other issues?

  • Example file uses a hardcoded internal IP address (10.16.10.6:48939).
  • The "all" case in cmd/examples/main.go does not include the gzip example.
  • The usage const string does not list the gzip example.
  • No conflict detection if user manually sets compress_type in Options map while also setting EnableGzip: true.
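
The last point could be handled with a check along these lines. This is only a sketch: checkGzipConflict is a hypothetical function, and it assumes the Options map is a map[string]string, which the review does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// checkGzipConflict rejects a config that enables automatic gzip while also
// carrying a manually set compress_type option. The map type is an assumption.
func checkGzipConflict(enableGzip bool, options map[string]string) error {
	if !enableGzip {
		return nil
	}
	for key, value := range options {
		// HTTP header names are case-insensitive, so compare accordingly.
		if strings.EqualFold(key, "compress_type") {
			return fmt.Errorf("EnableGzip conflicts with manually set %s=%q", key, value)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkGzipConflict(true, map[string]string{"compress_type": "lz4"}))
	fmt.Println(checkGzipConflict(true, map[string]string{"column_separator": ","}))
}
```

Rejecting every manual compress_type, including gz, avoids guessing whether the caller has already compressed the data; a more lenient policy is equally possible.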

Issues Found

| # | Severity | File | Issue |
|---|---|---|---|
| 1 | High | load_config.go | JSON format rejection is factually incorrect: Doris supports compress_type with JSON |
| 2 | High | (missing) | No unit tests for any new functionality |
| 3 | Medium | request_builder.go | Gzip re-compression on every retry attempt is wasteful for large payloads |
| 4 | Low | gzip_example.go | Hardcoded internal IP address should use localhost placeholder |
| 5 | Low | cmd/examples/main.go | "all" case and usage string don't include gzip example |
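
For issue 2, a round-trip check is one plausible starting point for a unit test. The sketch below does not use the PR's wrapWithGzip, whose signature is not shown here; gzipReader is a hypothetical stand-in that takes and returns an io.Reader.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// gzipReader reads src to the end, compresses it in memory, and returns a
// reader over the compressed bytes. Hypothetical stand-in for the SDK wrapper.
func gzipReader(src io.Reader) (io.Reader, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := io.Copy(zw, src); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

// roundTrip compresses input, decompresses it again, and returns the result.
func roundTrip(input string) (string, error) {
	compressed, err := gzipReader(strings.NewReader(input))
	if err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(compressed)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Cover an empty body plus one CSV and one JSON payload.
	for _, payload := range []string{"", "1,alice\n2,bob\n", `{"id":1,"name":"alice"}`} {
		got, err := roundTrip(payload)
		fmt.Println(err == nil && got == payload)
	}
}
```

In a real test file the loop body would become a table-driven test using the testing package.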

@JNSimba
Member Author

JNSimba commented Mar 16, 2026

run buildall

@JNSimba JNSimba changed the title from "[improve](sdk) add gzip compression support for CSV stream load" to "[improve](sdk) add gzip compression support for stream load" Mar 16, 2026