-
Notifications
You must be signed in to change notification settings - Fork 3.7k
enhancement: support csv base64 url encoding #57633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
@BiteTheDDDDt Hi, do you have some free time to look at this PR lately? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds base64 encoding support for AggState data type serialization in CSV exports. The implementation ensures that binary aggregate state data can be safely transmitted in CSV format by encoding it as base64 strings.
- Introduces
DataTypeAggStateSerdeclass to handle AggState serialization with base64 encoding - Adds comprehensive test suites for base64 encoding/decoding and CSV serialization
- Updates
DataTypeAggStateto use the new serialization wrapper
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| be/src/vec/data_types/serde/data_type_agg_state_serde.h | Defines the DataTypeAggStateSerde class for base64 serialization of AggState data |
| be/src/vec/data_types/serde/data_type_agg_state_serde.cpp | Implements base64 encoding logic for CSV and Hive text formats |
| be/src/vec/data_types/data_type_agg_state.h | Updates get_serde() to return the new AggState serde wrapper |
| be/test/vec/runtime/agg_state_csv_test.cpp | Adds unit tests for base64 encoding/decoding functionality |
| be/test/vec/data_types/serde/data_type_agg_state_serde_test.cpp | Adds comprehensive tests for AggState serialization with various data types and edge cases |
| be/test/CMakeLists.txt | Updates build configuration to include new test files |
Comments suppressed due to low confidence (1)
be/test/vec/data_types/serde/data_type_agg_state_serde_test.cpp:1
- The loop condition
i >= len - 2combined with unsigned decrement can cause undefined behavior. Whenlenis less than 2,len - 2underflows to a very large value (due to unsigned arithmetic), causing incorrect behavior. Consider using signed arithmetic or restructuring the loop to avoid this issue.
// Licensed to the Apache Software Foundation (ASF) under one
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
run buildall |
| Status DataTypeAggStateSerde::serialize_one_cell_to_json(const IColumn& column, int64_t row_num, | ||
| BufferWritable& bw, | ||
| FormatOptions& options) const { | ||
| // Check column type: AggState may be stored as ColumnString or ColumnFixedLengthObject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里可能有问题,agg state存储的column是根据内部agg function的get_serialized_type方法来的,不只有ColumnString 和 ColumnFixedLengthObject,比如bitmap的一些相关函数会用ColumnBitmap,一些复杂类型相关的函数会用ColumnArray,ColumnMap,ColumnStruct
|
|
||
| // Get serialized string data | ||
| const auto& serialized_data = tmp_col->get_data_at(0); | ||
| _encode_to_base64(serialized_data.data, serialized_data.size, bw); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果这里直接改了编码解码方式,那旧版本导出的数据怎么在新版本导入呢?是否应该考虑做一个开关来控制这里的行为
| } | ||
|
|
||
| private: | ||
| // Internal method: encode binary data to base64 and write to buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为什么只有编码做了base64,解码不用做base64吗?这样doris自己导出的数据自己还能导入吗,能否在这方面写一些regression test,类似regression-test/suites/nereids_p0/outfile/agg_state/ 目录下的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
您好,我刚看到。上周末出差了, 这周花点时间处理
What problem does this PR solve?
Issue Number: Related to issue-50718
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)