feat: adding victoriametrics remote write #3641

Merged
merged 2 commits into GreptimeTeam:main from feature/vm-remote-write on Apr 7, 2024

Conversation

sunng87
Member

@sunng87 sunng87 commented Apr 5, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This patch extends our remote write interface to support VictoriaMetrics' remote write variant.

The variant differs from standard Prometheus remote write mainly in its compression: it uses zstd encoding instead of snappy. To check compatibility, it introduces a new handshake step.

The client first sends an empty write request with the get_vm_proto_version query parameter:

Hypertext Transfer Protocol
    POST /api/v1/write?get_vm_proto_version=1 HTTP/1.1\r\n
    Host: localhost:8000\r\n
    User-Agent: vmagent\r\n
    Content-Length: 0\r\n
    Content-Encoding: snappy\r\n
    Content-Type: application/x-protobuf\r\n
    X-Prometheus-Remote-Write-Version: 0.1.0\r\n
    Accept-Encoding: gzip\r\n
    \r\n
    [Full request URI: http://localhost:8000/api/v1/write?get_vm_proto_version=1]
    [HTTP request 1/2]
    [Response in frame: 271]
    [Next request in frame: 281]

A compatible server replies with 200 and a body consisting of a single 1:

Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
    Content-Type: text/plain; charset=utf-8\r\n
    Vary: Accept-Encoding\r\n
    X-Server-Hostname: 56c731f5dd72\r\n
    Date: Fri, 05 Apr 2024 03:32:01 GMT\r\n
    Content-Length: 1\r\n
    \r\n
    [HTTP response 1/2]
    [Time since request: 0.000131397 seconds]
    [Request in frame: 269]
    [Next request in frame: 281]
    [Next response in frame: 282]
    [Request URI: http://localhost:8000/api/v1/write?get_vm_proto_version=1]
    File Data: 1 byte
Line-based text data: text/plain (1 lines)
    1
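
The handshake exchange above can be sketched in a few lines. This is a minimal illustration only, not the code from this patch: the `Reply` struct and the functions `is_vm_handshake` and `answer_write` are hypothetical names, and treating any non-empty `get_vm_proto_version` value as a handshake is an assumption based solely on the capture shown here.

```rust
/// Minimal stand-in for an HTTP response: status code plus body (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Reply {
    pub status: u16,
    pub body: &'static str,
}

/// Returns true when the query string carries a non-empty
/// `get_vm_proto_version` parameter (assumed handshake marker).
pub fn is_vm_handshake(query: &str) -> bool {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .any(|(key, value)| key == "get_vm_proto_version" && !value.is_empty())
}

/// Answers a write request: the handshake gets 200 with body "1",
/// a regular write gets 204 with an empty body.
pub fn answer_write(query: &str) -> Reply {
    if is_vm_handshake(query) {
        Reply { status: 200, body: "1" }
    } else {
        Reply { status: 204, body: "" }
    }
}
```

With these definitions, `answer_write("get_vm_proto_version=1")` yields status 200 and body `1`, matching the captured response, while a request without the parameter falls through to the normal write path.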

The client then starts sending remote write protobuf payloads in zstd encoding:

Hypertext Transfer Protocol
    POST /api/v1/write HTTP/1.1\r\n
    Host: localhost:8000\r\n
    User-Agent: vmagent\r\n
    Content-Length: 229\r\n
    Content-Encoding: zstd\r\n
    Content-Type: application/x-protobuf\r\n
    X-Victoriametrics-Remote-Write-Version: 1\r\n
    Accept-Encoding: gzip\r\n
    \r\n
    [Full request URI: http://localhost:8000/api/v1/write]
    [HTTP request 2/2]
    [Prev request in frame: 269]
    [Response in frame: 282]
    Content-encoded entity body (zstd): 229 bytes

The server replies with 204 No Content, the same as for a standard remote write:

Hypertext Transfer Protocol
    HTTP/1.1 204 No Content\r\n
    Content-Type: text/plain; charset=utf-8\r\n
    Vary: Accept-Encoding\r\n
    X-Server-Hostname: 56c731f5dd72\r\n
    Date: Fri, 05 Apr 2024 03:32:13 GMT\r\n
    \r\n
    [HTTP response 2/2]
    [Time since request: 0.000432983 seconds]
    [Prev request in frame: 269]
    [Prev response in frame: 271]
    [Request in frame: 281]
    [Request URI: http://localhost:8000/api/v1/write]
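
On the server side, the two request captures differ in their `Content-Encoding` header, which is enough to pick a decoder. The sketch below shows one way to do that; the `Codec` enum and `codec_from_headers` function are hypothetical names, and defaulting to snappy when the header is absent or unrecognized is an assumption, not behavior confirmed by this patch.

```rust
/// Compression schemes seen in the captures above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Codec {
    Snappy,
    Zstd,
}

/// Picks a codec from `(name, value)` header pairs.
/// Header names and values are compared case-insensitively.
/// Assumption: anything other than `zstd` is treated as snappy.
pub fn codec_from_headers(headers: &[(&str, &str)]) -> Codec {
    let is_zstd = headers.iter().any(|(name, value)| {
        name.eq_ignore_ascii_case("content-encoding")
            && value.trim().eq_ignore_ascii_case("zstd")
    });
    if is_zstd {
        Codec::Zstd
    } else {
        Codec::Snappy
    }
}
```

Fed the headers of the second request, this returns `Codec::Zstd`; fed the headers of the handshake request, it returns `Codec::Snappy`.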

Learn more about VictoriaMetrics remote write: https://victoriametrics.com/blog/victoriametrics-remote-write/

The patch also includes e2e tests for both Prometheus and VictoriaMetrics remote write.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 5, 2024
@sunng87 sunng87 changed the title feat: adding victoria metrics remote write feat: adding victoriametrics remote write Apr 5, 2024

codecov bot commented Apr 5, 2024

Codecov Report

Attention: Patch coverage is 16.66667%, with 25 lines in your changes missing coverage. Please review.

Project coverage is 84.81%. Comparing base (86d377d) to head (356db8c).

❗ Current head 356db8c differs from pull request most recent head 70e41ba. Consider uploading reports for the commit 70e41ba to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3641      +/-   ##
==========================================
- Coverage   85.14%   84.81%   -0.33%     
==========================================
  Files         943      943              
  Lines      157070   157091      +21     
==========================================
- Hits       133731   133233     -498     
- Misses      23339    23858     +519     

@sunng87 sunng87 requested review from waynexia, v0y4g3r and shuiyisong and removed request for waynexia and v0y4g3r April 5, 2024 18:18
@github-actions github-actions bot added docs-required This change requires docs update. and removed docs-not-required This change does not impact docs. labels Apr 5, 2024
Contributor

@v0y4g3r v0y4g3r left a comment

LGTM

@shuiyisong shuiyisong added this pull request to the merge queue Apr 7, 2024
Merged via the queue into GreptimeTeam:main with commit b32e0bb Apr 7, 2024
20 checks passed
@shuiyisong shuiyisong deleted the feature/vm-remote-write branch April 7, 2024 07:29
killme2008 pushed a commit to killme2008/greptimedb that referenced this pull request Apr 8, 2024
* feat: adding victoria metrics remote write

* test: add e2e tests for prom and vm remote writes
github-merge-queue bot pushed a commit that referenced this pull request Apr 8, 2024
* fix: columns table in information_schema misses some columns

* fix: test_information_schema_dot_columns

* fix: fuzz test

* feat: adds srs_id and refactor some columns with constant vector

* fix: test_information_schema_dot_columns

* chore: update comment

Co-authored-by: JeremyHi <jiachun_feng@proton.me>

* build(deps): bump h2 from 0.3.24 to 0.3.26 (#3642)

Bumps [h2](https://github.com/hyperium/h2) from 0.3.24 to 0.3.26.
- [Release notes](https://github.com/hyperium/h2/releases)
- [Changelog](https://github.com/hyperium/h2/blob/v0.3.26/CHANGELOG.md)
- [Commits](hyperium/h2@v0.3.24...v0.3.26)

---
updated-dependencies:
- dependency-name: h2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump whoami from 1.4.1 to 1.5.1 (#3643)

Bumps [whoami](https://github.com/ardaku/whoami) from 1.4.1 to 1.5.1.
- [Changelog](https://github.com/ardaku/whoami/blob/v1/CHANGELOG.md)
- [Commits](ardaku/whoami@v1.4.1...v1.5.1)

---
updated-dependencies:
- dependency-name: whoami
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: adding victoriametrics remote write (#3641)

* feat: adding victoria metrics remote write

* test: add e2e tests for prom and vm remote writes

* fix: construct correct pk list with pre-existing pk (#3614)

* fix: construct correct pk list with pre-existing pk

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update UT

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* test(sqlness): release databases after tests (#3648)

* refactor: rename Greptime_Type to Greptime_type

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: JeremyHi <jiachun_feng@proton.me>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
@jiekun

jiekun commented May 13, 2024

I came across this PR. Just a small heads up: vmagent can still send Snappy-compressed remote-write requests after the VM protocol handshake. This happens when:

  1. vmagent sends data in Snappy.
  2. vmagent buffers data on disk when the remote-write target is down for upgrading or other reasons.
  3. vmagent restarts after the remote-write target is up and performs the handshake.
  4. vmagent sends data in zstd.

In the fourth step, vmagent first has to send the buffered on-disk data, which is still Snappy-compressed, before it sends any new data.
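
The four steps can be modeled with a small queue simulation. This is a toy model under the assumptions stated in this comment (blocks are compressed when they are queued, and the sender announces whatever encoding was negotiated last); `Encoding`, `Agent`, and its methods are hypothetical names and do not mirror vmagent's actual data structures.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Encoding {
    Snappy,
    Zstd,
}

/// Toy agent: each queued block remembers only its bytes' real encoding.
pub struct Agent {
    negotiated: Encoding,
    queue: VecDeque<Encoding>,
}

impl Agent {
    /// Starts in plain Prometheus mode (Snappy), with an empty queue.
    pub fn new() -> Self {
        Agent { negotiated: Encoding::Snappy, queue: VecDeque::new() }
    }

    /// Compresses a block with the currently negotiated encoding and buffers it.
    pub fn enqueue(&mut self) {
        self.queue.push_back(self.negotiated);
    }

    /// Simulates a restart followed by the protocol handshake.
    pub fn handshake(&mut self, server_supports_vm: bool) {
        self.negotiated = if server_supports_vm { Encoding::Zstd } else { Encoding::Snappy };
    }

    /// Sends the oldest block; returns (encoding announced, encoding of the payload).
    pub fn send_next(&mut self) -> Option<(Encoding, Encoding)> {
        self.queue.pop_front().map(|actual| (self.negotiated, actual))
    }
}
```

Enqueueing one block, calling `handshake(true)`, and enqueueing another leaves a queue whose first `send_next()` announces zstd while carrying a Snappy payload, which is the mismatch described above.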

I mentioned this issue in https://jiekun.dev/posts/vmagent-data-structures (Thinking question 2). VictoriaMetrics handles this situation with fallback logic: if zstd decoding fails, it retries with Snappy.

I'm not familiar with Rust, so I haven't read the code and I'm not sure whether this is also implemented in GreptimeDB. If it is not, vmagent could receive an error response and retry. I don't know whether this is a necessary feature for GreptimeDB, since it's hard to reproduce in production :)


Edit: The relevant source code seems to be:

let buf = Bytes::from(if is_zstd {

It could be fixed like this (generated by ChatGPT):

// Try the codec announced by the request first, then fall back to the other one.
// Both branches yield the raw decompressed bytes; the conversion to `Bytes` happens once at the end.
let decoded = if is_zstd {
    match zstd_decompress(&body[..]) {
        Ok(result) => result,
        Err(_) => snappy_decompress(&body[..]).map_err(|_| "Both decompression methods failed")?,
    }
} else {
    match snappy_decompress(&body[..]) {
        Ok(result) => result,
        Err(_) => zstd_decompress(&body[..]).map_err(|_| "Both decompression methods failed")?,
    }
};
let buf = Bytes::from(decoded);
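
The same fallback idea can be written independently of any particular compression library. The sketch below is generic over the two decoders; `decode_with_fallback`, `toy_zstd`, and `toy_snappy` are hypothetical names, and the toy decoders only check a one-byte marker so the example runs with the standard library alone. They are stand-ins, not real zstd or Snappy decoding.

```rust
/// Tries `primary` first and `fallback` second.
/// Returns the first successful result, or both errors if neither decoder accepts the body.
pub fn decode_with_fallback<T, E>(
    body: &[u8],
    primary: impl Fn(&[u8]) -> Result<T, E>,
    fallback: impl Fn(&[u8]) -> Result<T, E>,
) -> Result<T, (E, E)> {
    match primary(body) {
        Ok(decoded) => Ok(decoded),
        Err(first) => fallback(body).map_err(|second| (first, second)),
    }
}

/// Toy stand-in for a zstd decoder: accepts bodies that start with `Z`.
pub fn toy_zstd(body: &[u8]) -> Result<Vec<u8>, &'static str> {
    match body.split_first() {
        Some((b'Z', rest)) => Ok(rest.to_vec()),
        _ => Err("not a toy zstd frame"),
    }
}

/// Toy stand-in for a Snappy decoder: accepts bodies that start with `S`.
pub fn toy_snappy(body: &[u8]) -> Result<Vec<u8>, &'static str> {
    match body.split_first() {
        Some((b'S', rest)) => Ok(rest.to_vec()),
        _ => Err("not a toy snappy block"),
    }
}
```

Calling `decode_with_fallback(body, toy_zstd, toy_snappy)` covers the case where the header announces zstd; swapping the two decoder arguments covers the opposite case. Returning both errors keeps the reason for each failure available to the caller instead of collapsing them into one message.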

@zyy17
Collaborator

zyy17 commented May 13, 2024

@jiekun, thank you for the hint. It seems like a bug in vmagent. Could you open an issue so we can discuss it?

@jiekun

jiekun commented May 13, 2024

> @jiekun, thank you for the hint. It seems like a bug in vmagent. Could you open an issue so we can discuss it?

vmagent should handle this more gracefully, but I doubt it can be fixed, since data is compressed before it is sent to a remote-write target or written to the on-disk queue.

The remote-write client does not know which compression the queued data was written with.
