feat: log ingestion support #4014

Merged 64 commits on Jun 14, 2024

paomian (Contributor) commented May 22, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR adds log ingestion support to GreptimeDB.

src/pipeline/src/etl is a simple implementation of Elastic ingest pipelines. We use pipelines to describe the ETL process.

Feature-wise, this PR includes:

  1. A pipeline management API for creating pipeline models and storing them in GreptimeDB (in the greptime_private schema)
  2. A /v1/events/logs HTTP API for ingesting logs

How to use

1. Create a pipeline model

## Request
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" \
     -H 'Content-Type: application/x-yaml' \
     -d $'processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - fields:
      - id1
      - id2
    type: int32
  - fields:
      - type
      - log
      - logger
    type: string
  - field: time
    type: time
    index: timestamp
'

or

curl -X "POST" "http://localhost:4000/v1/event/pipelines/test" -F "file=@test.yaml" -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying [::1]:4000...
* connect to ::1 port 4000 failed: Connection refused
*   Trying 127.0.0.1:4000...
* Connected to localhost (127.0.0.1) port 4000
> POST /v1/event/pipelines/test HTTP/1.1
> Host: localhost:4000
> User-Agent: curl/8.4.0
> Accept: */*
> Content-Length: 514
> Content-Type: multipart/form-data; boundary=------------------------Tuq0bldZlWywES7f4FfGtx
>
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-type: text/plain; charset=utf-8
< content-length: 2
< date: Thu, 06 Jun 2024 06:13:54 GMT
<
* Connection #0 to host localhost left intact
ok

Run select * from greptime_private.pipelines; to check that the pipeline was created successfully.
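
To make the pipeline definition above concrete, here is a minimal Python sketch of what it describes for a single record: the date processor parses time, and the transform casts id1/id2 to 32-bit integers, keeps type/log/logger as strings, and uses time as the timestamp. This only illustrates the intended semantics and is not how src/pipeline/src/etl is implemented. The helper name apply_pipeline, the output layout, the handling of a missing time field, and the mapping of the %.3f specifier to Python's .%f are all assumptions.

```python
from datetime import datetime

# Assumed Python equivalent of the pipeline's "%Y-%m-%d %H:%M:%S%.3f" format.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
INT32_FIELDS = ("id1", "id2")
STRING_FIELDS = ("type", "log", "logger")


def apply_pipeline(record: dict) -> dict:
    """Hypothetical helper: mimic the processors and transform for one record."""
    row = {}
    for name in INT32_FIELDS:
        value = int(record[name])
        # Reject values that would not fit in a signed 32-bit integer.
        if not -2**31 <= value < 2**31:
            raise ValueError(f"{name} does not fit in int32: {value}")
        row[name] = value
    for name in STRING_FIELDS:
        row[name] = str(record[name])
    # ignore_missing: true is modeled here by leaving a missing "time" as None
    # (an assumption about the behavior, not taken from the implementation).
    raw_time = record.get("time")
    row["time"] = (
        datetime.strptime(raw_time, TIME_FORMAT) if raw_time is not None else None
    )
    return row


sample = {
    "id1": "2436",
    "id2": "2528",
    "logger": "INTERACT.MANAGER",
    "type": "I",
    "time": "2024-05-25 20:16:37.217",
    "log": "ClusterAdapter:enter sendTextDataToCluster\\n",
}
row = apply_pipeline(sample)
print(row)
```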

2. Ingest logs

curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs1&pipeline_name=test" \
     -H 'Content-Type: application/json' \
     -d '[
    {
      "id1": "2436",
      "id2": "2528",
      "logger": "INTERACT.MANAGER",
      "type": "I",
      "time": "2024-05-25 20:16:37.217",
      "log": "ClusterAdapter:enter sendTextDataToCluster\\n"
    }
  ]'

A logs1 table is created and the log is inserted into it. Run select * from logs1; to confirm.
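
The same ingestion request can be issued from a script. The following is a minimal sketch using only the Python standard library; it reuses the URL, query parameters, and payload from the curl command above, while the helper name build_ingest_request is hypothetical. The actual send is left commented out because it needs a running GreptimeDB instance on localhost:4000.

```python
import json
import urllib.parse
import urllib.request


def build_ingest_request(base_url, db, table, pipeline_name, records):
    """Hypothetical helper: build the POST request for the log ingestion API."""
    query = urllib.parse.urlencode(
        {"db": db, "table": table, "pipeline_name": pipeline_name}
    )
    return urllib.request.Request(
        url=f"{base_url}/v1/events/logs?{query}",
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


records = [
    {
        "id1": "2436",
        "id2": "2528",
        "logger": "INTERACT.MANAGER",
        "type": "I",
        "time": "2024-05-25 20:16:37.217",
        "log": "ClusterAdapter:enter sendTextDataToCluster\\n",
    }
]
request = build_ingest_request(
    "http://localhost:4000", "public", "logs1", "test", records
)

# Sending requires a running server, so it is not executed here:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```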

Discussion

  1. Should we add a config option for enabling the log ingestion HTTP API, as we do for the Prometheus and InfluxDB APIs, or leave it enabled by default like the SQL API?
  2. The API path is currently v1/event/logs for ingestion and v1/event/pipelines for adding pipelines. Should we rename them for better semantics?
  3. If one record in a batch request fails to parse as valid JSON, should we reject the whole request, or skip that record and continue?
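
The two options in question 3 can be compared with a small Python sketch. It is only an illustration of the trade-off, not the behavior of this PR; the function parse_batch and its skip_invalid flag are hypothetical names.

```python
import json


def parse_batch(raw_items, skip_invalid):
    """Parse each raw item as JSON.

    skip_invalid=False: the first invalid item aborts the whole batch.
    skip_invalid=True: invalid items are recorded and the rest are kept.
    """
    parsed, rejected = [], []
    for index, raw in enumerate(raw_items):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as err:
            if not skip_invalid:
                raise ValueError(f"item {index} is not valid JSON: {err}") from err
            rejected.append((index, str(err)))
    return parsed, rejected


# The second item is deliberately truncated JSON.
batch = ['{"id1": "2436"}', '{"id1": ', '{"id1": "2437"}']
kept, rejected = parse_batch(batch, skip_invalid=True)
```

Skipping keeps ingestion moving for the valid records but needs a way to report the rejected indexes back to the client; rejecting the whole request is simpler to reason about but lets one bad record block the batch.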

TODO

Some future work will be done in separate PRs, since this one is already large.

  1. Implement delete_pipeline.
  2. Treat inserting a pipeline with an existing name as an update, and fetch the latest version using created_at.
  3. Add support for more Content-Type values in the ingestion API (such as urlencoded).
  4. Fix styling in the etl module, namely mod.rs.
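
Item 2 amounts to append-only versioning: each upload adds a row, and readers pick the newest row for a given name. A minimal Python sketch of that lookup follows, assuming rows shaped like dictionaries with name, created_at, and pipeline keys. Only created_at is named in this PR; the other keys and the helper latest_pipeline are hypothetical.

```python
from datetime import datetime


def latest_pipeline(rows, name):
    """Hypothetical helper: return the newest row for a pipeline name, or None."""
    candidates = [row for row in rows if row["name"] == name]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["created_at"])


rows = [
    {"name": "test", "created_at": datetime(2024, 6, 6, 6, 13, 54), "pipeline": "v1"},
    {"name": "test", "created_at": datetime(2024, 6, 7, 9, 0, 0), "pipeline": "v2"},
    {"name": "other", "created_at": datetime(2024, 6, 8, 0, 0, 0), "pipeline": "x"},
]
current = latest_pipeline(rows, "test")
```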

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label May 22, 2024
@shuiyisong shuiyisong self-requested a review May 28, 2024 01:53
@killme2008
Contributor

There are some code and TOML format issues @paomian

@paomian
Contributor Author

paomian commented Jun 12, 2024

There are some code and TOML format issues @paomian

fixed.

@paomian paomian requested a review from evenyag June 12, 2024 13:30
@waynexia waynexia self-requested a review June 12, 2024 13:38
Contributor

@killme2008 killme2008 left a comment

LGTM. Let's move forward. The other suggestions can be addressed in a follow-up PR.

@killme2008 killme2008 added this pull request to the merge queue Jun 14, 2024
Merged via the queue into GreptimeTeam:main with commit 01e3a24 Jun 14, 2024
49 checks passed
8 participants