feat: support text/plain format for log ingestion #4300
Conversation
Walkthrough

The update introduces a new error variant, `DataFrame`, adds support for the `text/plain` content type in log ingestion, and refactors dataframe handling in the pipeline manager.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTPServer
    participant PipelineManager
    participant DataFrameHandler
    Client->>HTTPServer: Send plain text data ingestion request
    HTTPServer->>PipelineManager: Forward request data
    PipelineManager->>DataFrameHandler: Prepare and execute dataframe
    DataFrameHandler-->>PipelineManager: Return dataframe result
    PipelineManager-->>HTTPServer: Return processed data response
    HTTPServer-->>Client: Send response with ingestion result
```
Force-pushed the branch from e08759b to 9fc3645
Force-pushed the branch from 9fc3645 to f87472a
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
- Cargo.lock is excluded by `!**/*.lock`
Files selected for processing (6)
- src/pipeline/src/manager/error.rs (2 hunks)
- src/pipeline/src/manager/table.rs (5 hunks)
- src/pipeline/src/manager/util.rs (2 hunks)
- src/pipeline/tests/pipeline.rs (2 hunks)
- src/servers/src/http/event.rs (2 hunks)
- tests-integration/tests/http.rs (2 hunks)
Additional comments not posted (11)
src/pipeline/src/manager/util.rs (1)
`37-52`: LGTM! But verify the function usage in the codebase.

The logic for constructing filter expressions and combining them using `Expr::and` is correct and efficient. However, ensure that all invocations of `prepare_dataframe_conditions` match the new signature and expected parameters.

Verification successful
The function usage looks correct, but ensure the parameters are defined and passed correctly.

The function `prepare_dataframe_conditions` is invoked in `src/pipeline/src/manager/table.rs` with the parameters `schema`, `name`, and `version`. Verify these parameters are correctly defined and used.

Verified: All invocations of `prepare_dataframe_conditions` are correct and consistent with the new signature. The parameters `schema`, `name`, and `version` are defined and used appropriately in `src/pipeline/src/manager/table.rs`.

- `schema`: Verified definition and usage.
- `name`: Verified definition and usage.
- `version`: Verified definition and usage.

Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify all function calls to `prepare_dataframe_conditions` match the new signature.
# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'prepare_dataframe_conditions'
```

Length of output: 1920

Script:

```bash
#!/bin/bash
# Description: Verify definitions and usages of `schema`, `name`, and `version` in `table.rs`.

# Search for the definitions and usages of `schema`.
rg --type rust -A 5 $'schema' src/pipeline/src/manager/table.rs

# Search for the definitions and usages of `name`.
rg --type rust -A 5 $'name' src/pipeline/src/manager/table.rs

# Search for the definitions and usages of `version`.
rg --type rust -A 5 $'version' src/pipeline/src/manager/table.rs
```

Length of output: 18412
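To make the pattern under review concrete, here is a minimal sketch of folding per-column equality filters together with `Expr::and`, written directly against DataFusion's `Expr` API; the helper name, column names, and parameter types are illustrative assumptions, not the actual `prepare_dataframe_conditions` implementation.

```rust
use datafusion::logical_expr::Expr;
use datafusion::prelude::{col, lit};

// Hypothetical helper: build a `schema = ? AND name = ? [AND version = ?]`
// predicate by folding individual comparisons together with `Expr::and`.
fn build_conditions(schema: &str, name: &str, version: Option<&str>) -> Expr {
    let mut conditions = vec![col("schema").eq(lit(schema)), col("name").eq(lit(name))];
    if let Some(v) = version {
        conditions.push(col("version").eq(lit(v)));
    }
    conditions
        .into_iter()
        .reduce(Expr::and)
        .expect("at least one condition is always present")
}
```

The verification above is essentially checking that every caller passes the same set of parameters that this kind of helper expects.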
src/pipeline/src/manager/error.rs (2)
`84-89`: Addition of the `DataFrame` error variant looks good.

The new `DataFrame` error variant is correctly defined with appropriate fields.
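For context, a hypothetical sketch of what such a variant can look like in the snafu style used across the codebase; the display message, field names, and source error type below are assumptions rather than the actual definition in `error.rs`.

```rust
use snafu::Snafu;

// Illustrative error enum with a `DataFrame` variant; the fields are assumptions only.
#[derive(Debug, Snafu)]
pub enum Error {
    #[snafu(display("Failed to operate on the dataframe: {source}"))]
    DataFrame {
        source: datafusion::error::DataFusionError,
    },
}
```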
`136-136`: Update in `ErrorExt` implementation looks good.

The new `DataFrame` error variant is correctly handled in the `ErrorExt` implementation.

src/servers/src/http/event.rs (1)
`252-264`: LGTM! But verify the content type handling logic.

The content type handling logic for `text/plain` and `text/plain; charset=utf-8` is correct and efficient. However, ensure that all content type handling scenarios are covered and tested.

Verification successful

The content type handling logic for `text/plain` and `text/plain; charset=utf-8` is correct and efficient. Additionally, the current implementation covers all necessary content types as per the codebase.

Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify all content type handling scenarios are covered and tested.
# Test: Search for the content type handling logic. Expect: Only occurrences of the new content type handling logic.
rg --type rust -A 5 $'ContentType::text'
```

Length of output: 495

Script:

```bash
#!/bin/bash
# Description: Verify that all content type handling scenarios are covered and tested.
# Test: Search for all occurrences of ContentType handling logic in the codebase.
rg --type rust -A 5 $'ContentType::'
```

Length of output: 783
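To make the branch being reviewed concrete, here is a hedged sketch of the dispatch shape using the `mime` crate's constants; the function itself, its signature, and the `message` field name are illustrative assumptions, not the code in `event.rs`.

```rust
use mime::Mime;
use serde_json::{json, Value};

// Illustrative dispatcher: JSON bodies are parsed as-is, plain-text bodies
// become one JSON value per non-empty line, and anything else is rejected.
fn body_to_values(content_type: &Mime, body: &str) -> Result<Vec<Value>, String> {
    if *content_type == mime::APPLICATION_JSON {
        serde_json::from_str(body).map_err(|e| e.to_string())
    } else if *content_type == mime::TEXT_PLAIN || *content_type == mime::TEXT_PLAIN_UTF_8 {
        Ok(body
            .lines()
            .filter(|line| !line.is_empty())
            .map(|line| json!({ "message": line }))
            .collect())
    } else {
        Err(format!("unsupported content type: {content_type}"))
    }
}
```

Because `text/plain` and `text/plain; charset=utf-8` are distinct media types once parameters are considered, both are matched explicitly, mirroring the two cases discussed above.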
src/pipeline/tests/pipeline.rs (2)
`17-18`: Addition of the `TimestampNanosecondValue` variant looks good.

The new `TimestampNanosecondValue` variant is correctly defined and integrated into the `ValueData` enum.
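For reference only, a small sketch of constructing a value with this variant; the `greptime_proto` import path is an assumption about the generated protobuf crate layout and may differ from what the test actually uses.

```rust
// Assumed import path for the generated protobuf types.
use greptime_proto::v1::value::ValueData;

fn main() {
    // Wrap a raw nanosecond timestamp in the new variant.
    let v = ValueData::TimestampNanosecondValue(1_716_668_197_217_000_000);
    assert!(matches!(v, ValueData::TimestampNanosecondValue(_)));
}
```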
`460-510`: Update in `test_simple_data` function looks good.

The function correctly tests the new `TimestampNanosecondValue` variant.

src/pipeline/src/manager/table.rs (4)
`29-29`: LGTM! Import added for dataframe handling.

The `use query::dataframe::DataFrame` import is necessary for the refactored dataframe handling.
`36-52`: LGTM! Import added for refactored function.

The `prepare_dataframe_conditions` function is imported from the `util` module, indicating the usage of a refactored function for preparing dataframe conditions.
`339-350`: LGTM! Refactored dataframe preparation.

The dataframe preparation logic has been refactored to use the `query_engine` to read the table and apply a filter using `prepare_dataframe_conditions`. This improves clarity and modularity.
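As an illustration of that "read the table, then apply the prepared filter" shape only, here is a hedged sketch in plain DataFusion; GreptimeDB's `query_engine` and `DataFrame` wrappers, and the table and column names, are assumptions not taken from `table.rs`.

```rust
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::prelude::{col, lit, SessionContext};

// Read a table through the engine, apply the filter conditions,
// and collect the matching rows.
async fn select_pipeline(ctx: &SessionContext, name: &str) -> Result<Vec<RecordBatch>> {
    let df = ctx
        .table("pipelines") // hypothetical table name
        .await?
        .filter(col("name").eq(lit(name)))?;
    df.collect().await
}
```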
`408-431`: LGTM! Refactored dataframe preparation.

The dataframe preparation logic has been refactored to use the `query_engine` to read the table and apply a filter using `prepare_dataframe_conditions`. This improves clarity and modularity.

tests-integration/tests/http.rs (1)
`1133-1231`: LGTM! Comprehensive test for plain text ingestion.

The new public async function `test_plain_text_ingestion` is well-structured and comprehensive, covering the entire workflow of creating a pipeline, ingesting data, and verifying the results.
Changed the title from "text/plain format of log input" to "text/plain format for log ingestion".
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/servers/src/http/event.rs (2 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/servers/src/http/event.rs
Codecov Report

Attention: Patch coverage is incomplete (some changed lines are not covered by tests).

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #4300      +/-   ##
==========================================
- Coverage   85.18%   84.92%   -0.26%
==========================================
  Files        1060     1061       +1
  Lines      189069   189497     +428
==========================================
- Hits       161057   160929     -128
- Misses      28012    28568     +556
```
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/servers/src/http/event.rs (4 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/servers/src/http/event.rs
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
src/pipeline/src/etl/processor.rs (1)
`96-108`: Unit tests missing for `Value::String` handling in the `exec_array` function.

The search did not find any direct unit tests for the `exec_array` function or the `Processor` trait handling `Value::String`. It is essential to add specific unit tests to cover this new branch.

- Add unit tests for the `exec_array` function to ensure `Value::String` handling is correctly implemented.
- Verify the correctness of this implementation with the new unit tests.
Analysis chain
LGTM! Verify the correctness with unit tests.
The changes to handle `Value::String` in the `exec_array` function are correct. Ensure that unit tests cover this new branch.

Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the correctness of the `Value::String` handling in the `exec_array` function.
# Test: Search for unit tests for the exec_array function. Expect: Tests cover the new branch.
rg --type rust -A 5 'fn exec_array' --glob '*_test.rs'
```

Length of output: 56

Script:

```bash
#!/bin/bash
# Description: Find all test files related to the Processor trait and check for tests handling Value::String in the exec_array function.

# Find all test files related to the Processor trait.
rg --type rust 'Processor' --glob '*_test.rs'

# Check for tests handling Value::String in the exec_array function.
rg --type rust 'Value::String' --glob '*_test.rs'
```

Length of output: 228
src/servers/src/http/event.rs (1)
`265-291`: No unit tests found for the `extract_pipeline_value_by_content_type` function.

There are no unit tests covering the `extract_pipeline_value_by_content_type` function. It's important to add unit tests to ensure all branches of this function are properly tested.
- Add unit tests to cover JSON content type handling.
- Add unit tests to cover text content type handling.
- Add unit tests for unsupported content types.
Analysis chain
LGTM! Verify the correctness with unit tests.
The `extract_pipeline_value_by_content_type` function correctly handles JSON and text content types and returns errors for unsupported content types. Ensure unit tests cover all branches.
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify the correctness of the `extract_pipeline_value_by_content_type` function.
# Test: Search for unit tests for the extract_pipeline_value_by_content_type function. Expect: Tests cover all branches.
rg --type rust -A 5 'fn extract_pipeline_value_by_content_type' --glob '*_test.rs'
```

Length of output: 84

Script:

```bash
#!/bin/bash
# Description: Verify the presence of tests for the `extract_pipeline_value_by_content_type` function.
# Test: Search for any tests related to the `extract_pipeline_value_by_content_type` function across all Rust files.
rg --type rust 'extract_pipeline_value_by_content_type' -A 10
```

Length of output: 1363
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/pipeline/src/etl/processor.rs (1 hunk)
- src/servers/src/http/event.rs (4 hunks)
Additional comments not posted (1)
src/servers/src/http/event.rs (1)
`252-252`: LGTM! Ensure integration tests cover this new functionality.

The changes to the `log_ingester` function to handle the `text/plain` content type are correct.

Verification successful
Integration tests cover the new functionality for the `text/plain` content type.

The `test_plain_text_ingestion` function in `tests-integration/tests/http.rs` indirectly tests the `log_ingester` function for the `text/plain` content type, confirming that the new functionality is covered by integration tests.

Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Ensure integration tests cover the new functionality for text/plain content type.
# Test: Search for integration tests for the log_ingester function. Expect: Tests cover the new functionality.
rg --type rust -A 5 'fn log_ingester' --glob '*_test.rs'
```

Length of output: 58

Script:

```bash
#!/bin/bash
# Description: Ensure integration tests cover the new functionality for text/plain content type.
# Test: Search for any references to log_ingester in all Rust test files.
rg --type rust 'log_ingester' --glob '*_test.rs'
```

Length of output: 48

Script:

```bash
#!/bin/bash
# Description: Ensure integration tests cover the new functionality for text/plain content type.
# Test: Search for any references to 'text/plain' content type in all Rust test files.
rg --type rust 'text/plain' --glob '*_test.rs'
```

Length of output: 46

Script:

```bash
#!/bin/bash
# Description: Ensure integration tests cover the new functionality for text/plain content type.
# Test: Search for any references to 'log_ingester' throughout the entire codebase.
rg --type rust 'log_ingester'

# Test: Search for any references to 'text/plain' throughout the entire codebase.
rg --type rust 'text/plain'
```

Length of output: 365

Script:

```bash
#!/bin/bash
# Description: Inspect the tests-integration/tests/http.rs file for any tests that might indirectly cover the log_ingester functionality.

# Print the contents of tests-integration/tests/http.rs
cat tests-integration/tests/http.rs
```

Length of output: 38662
I'm in the process of refactoring
I agree. We can merge this first and discuss the next steps.
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR:

- supports `text/plain` and `text/plain; charset=utf-8` on log ingestion
- refactors pipeline table queries using the `dataframe` API

Now you can use plain text logs to ingest log data, for example:
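Below is a hedged example of such a request using the `reqwest` crate (blocking feature) in place of curl; the port, endpoint path, query parameters, and log lines are assumptions, and a pipeline with the referenced name is assumed to already exist.

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Newline-separated raw log lines sent as the request body.
    let body = "2024-05-25 20:16:37.217 hello\n2024-05-25 20:16:37.218 greptime";

    let resp = reqwest::blocking::Client::new()
        .post("http://127.0.0.1:4000/v1/events/logs?db=public&table=logs&pipeline_name=test")
        .header("Content-Type", "text/plain") // `text/plain; charset=utf-8` is also accepted
        .body(body)
        .send()?;

    println!("status: {}", resp.status());
    Ok(())
}
```

Each line of the body is handed to the configured pipeline, which parses it before the result is written to the target table.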
Checklist
Summary by CodeRabbit
New Features

- Introduced a new error variant, `DataFrame`, for more detailed error reporting related to data frames.
- Added `TimestampNanosecondValue` support to handle more precise timestamp data in tests.

Refactor

- Refactored dataframe preparation in the `PipelineTable` module.

Bug Fixes