-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](load)Support load json data with default value #20078
Conversation
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
run p0 |
run P0 |
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
if (!(*valid)) { | ||
return Status::OK(); | ||
auto* column_ptr = block.get_by_position(i).column->assume_mutable().get(); | ||
auto field = value->find_field_unordered(slot_desc->col_name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is much slower than iterate through for (auto field : *value)
, this loop will utilize simd instruction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will fix it
} else { | ||
const std::string& v_str = col_value->second; | ||
nullable_column->get_null_map_data().push_back(0); | ||
assert_cast<ColumnString*>(column_ptr)->insert_data(v_str.c_str(), v_str.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if default value is CURRENT_TIMESTAMP
will v_str
be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your reminding, I had tested for default value is CURRENT_TIMESTAMP , I will add some test case.
} | ||
} | ||
} | ||
if (!has_valid_value) { | ||
for (int i = 0; i < block.columns(); ++i) { | ||
auto column = block.get_by_position(i).column->assume_mutable(); | ||
column->pop_back(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment to explain why we need to pop_back here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If has no valid value in json line and we fill with default value, it's not reasonable,so we should remove this line in block. I will add some comment there
@@ -1370,53 +1374,40 @@ Status NewJsonReader::_simdjson_handle_nested_complex_json( | |||
return Status::OK(); | |||
} | |||
|
|||
size_t NewJsonReader::_column_index(const StringRef& name, size_t key_index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function accelerate parsing speed, do not delete it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function accelerate parsing speed, do not delete it
I have added it back
@@ -1730,4 +1726,52 @@ Status NewJsonReader::_simdjson_write_columns_by_jsonpath( | |||
return Status::OK(); | |||
} | |||
|
|||
Status NewJsonReader::_get_column_default_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should _fill_missing_column
also when fill missing slot
// fill missing slot
int ctx_idx = 0;
int nullcount = 0;
for (auto slot_desc : slot_descs) {
if (!slot_desc->is_materialized()) {
continue;
}
int dest_index = ctx_idx++;
auto* column_ptr = block.get_by_position(dest_index).column->assume_mutable().get();
if (column_ptr->size() < cur_row_count + 1) {
// ..._fill_missing_column.. here
++nullcount;
}
DCHECK(column_ptr->size() == cur_row_count + 1);
}
// There is at least one valid value here
DCHECK(nullcount < block.columns());
*valid = true;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have filled all slot before, do I need to fill missing slot again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this logic is useless now? or maybe we could delete it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this logic is useless now? or maybe we could delete it?
Yea, it's useless logically speaking
clang-tidy review says "All clean, LGTM! 👍" |
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by anyone and no changes requested. |
maybe rebase could help to pass p0 regression-test |
…the file cache is enabled. #20560 Use consistent hash to collect BE only when the file cache is enabled. And move the consistent BE assign code to FederationBackendPolicy. Fix explain split number and file size incorrect bug.
In some case of agg function, maybe running as streaming agg firstly, this will call the add function when serialize, so need implement add function also.
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
change check tablet path interval's default value to -1
Fix some bugs of orc lazy materialization(#18615) - Fix issue causing column size to continuously increase after `execute_conjuncts()` by calling `Block::erase_useless_column()`. - Fix partition issues of orc lazy materialization. - Fix lazy materialization will not be used when the predicate column is inconsistent with the orc file.
…ot correct when creating `MergeRangeFileReader` in orc reader. (#20393) Fix the inner reader of MergeRangeFileReader is not correct when creating MergeRangeFileReader in orc reader.
…ult value (#20478) image update FE config max_running_txn_num_per_db default value: old value : 100 new value : 1000
…perations in Legacy (#20354) The precision handling for division with DECIMALV3 is as follows (excluding cases where division increases precision): (p1, s1) / (p2, s2) ----> (p1 + s2, s1) However, due to precision loss in division, it is considered to increase the precision of the left operand: (p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) ----> (p1 + s2, s1) However, the legacy optimizer repeats the analyze and substitute steps for an expression, which can result in the accumulation of precision: (p1, s1) / (p2, s2) =====> (p1 + s2, s1 + s2) / (p2, s2) =====> (p1 + s2 + s2, s1 + s2 + s2) / (p2, s2) To address this, the previous approach was to forcibly convert the left operand of DECIMALV3 calculations. This results in rewriting the expression as: (p1, s1) / (p2, s2) =====> cast((p1, s1) as (p1 + s2, s1 + s2)) / (p2, s2) Then, during the substitution step, a check is performed. If it is a cast expression, the expression modified by the cast is extracted: cast((p1, s1) as (p1 + s2, s1 + s2)) =====> (p1, s1) protected Expr substituteImpl(ExprSubstitutionMap smap, ExprSubstitutionMap disjunctsMap, Analyzer analyzer) { if (isImplicitCast()) { return getChild(0).substituteImpl(smap, disjunctsMap, analyzer); } This way, there won't be repeated analysis, preventing the continuous increase in precision. However, if the left expression is a constant (literal), theoretically, the precision would continue to increase. Unfortunately, the code that was removed in this PR (#19926) obscured this issue. for (Expr child : children) { if (child instanceof DecimalLiteral && child.getType().isDecimalV3()) { ((DecimalLiteral)child).tryToReduceType(); } } An attempt will be made to reduce the precision of literals in the expressions. However, this code snippet can cause such a bug. mysql [test]>select cast(1 as DECIMALV3(16, 2)) / cast(3 as DECIMALV3(16, 2)); +-----------------------------------------------------------+ | CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) | +-----------------------------------------------------------+ | 0.00 | +-----------------------------------------------------------+ 1.00 / 3.00, due to reduced precision, becomes 1 / 3. <--Describe your changes.-->
…ble (#20391) in join node, if it's broadcast_join and shared hash table, some counter/timer about build hash table is useless, so we could add those counter/timer in faker profile, and those will not display in web profile.
…20342) 1. fix hive catalog docs 2. fix dlf properties 3. fix obs impl
…20423) fix config::txn_commit_rpc_timeout_ms is not available for commit phase in 2pc.
truncate table with partition name need case insensitive
Co-authored-by: hugoluo <hugoluo@tencent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
#include "olap/delete_bitmap_calculator.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'olap/delete_bitmap_calculator.h' file not found [clang-diagnostic-error]
#include "olap/delete_bitmap_calculator.h"
^
|
||
#include <gtest/gtest.h> | ||
|
||
#include "olap/memtable.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'olap/memtable.h' file not found [clang-diagnostic-error]
#include "olap/memtable.h"
^
There is something wrong when I rebase my branch, I have created a new PR #20624, can you review it please? Thank you. |
请问你已经解决了这个问题吗?我也遇到了 字段为null时默认值不生效的问题 |
#20624 这个 PR 已经解决了这个问题 |
Proposed changes
Issue Number: close #19741
Problem summary
If a column has been defined as :
key INT NULL DEFALT '100'
, when loading json data which lacks columnkey
, the default value ofkey
is not100
butNULL
.Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...