feat: feature signature codegen #3877

Merged
wyl4pd merged 21 commits into 4paradigm:main on Apr 19, 2024

Conversation

@wyl4pd wyl4pd commented Apr 15, 2024

Implement signature features.
Add GCFormat, CSV, LIBSVM functions.
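
For context, the LIBSVM text format writes one instance per line: a label followed by space-separated `index:value` pairs in ascending index order. The sketch below only illustrates that layout. `FormatLibsvmLine` and its signature are hypothetical and are not the functions added by this PR.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of OpenMLDB: renders one instance as a
// LIBSVM line "<label> <index>:<value> ...". std::map iterates in ascending
// key order, which matches the ordering LIBSVM expects for feature indices.
std::string FormatLibsvmLine(int label,
                             const std::map<int64_t, double>& features) {
    std::ostringstream out;
    out << label;
    for (const auto& kv : features) {
        out << ' ' << kv.first << ':' << kv.second;
    }
    return out.str();
}
```

Calling `FormatLibsvmLine(1, {{3, 0.5}, {1, 2.0}})` emits the pairs sorted by index, regardless of the order in which they were supplied.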

github-actions bot added the execute-engine (hybridse sql engine) and storage-engine (openmldb storage engine. nameserver & tablet) labels Apr 15, 2024

github-actions bot commented Apr 15, 2024

SDK Test Report

102 files  ±0  102 suites  ±0   2m 12s ⏱️ -5s
357 tests ±0  343 ✅ ±0  14 💤 ±0  0 ❌ ±0 
483 runs  ±0  469 ✅ ±0  14 💤 ±0  0 ❌ ±0 

Results for commit a0e5c37. ± Comparison against base commit 6b52ee5.

This pull request removes 42 and adds 21 tests. Note that renamed tests count towards both.
  PARTITION BY db1.t1.col2 ORDER BY db1.t1.col1
  PARTITION BY t1.col2 ORDER BY t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](1)
 ) limit 10;](2)
 ) limit 10;](3)
 FROM db1.t1
 FROM t1
 WINDOW w1 AS (
 last join db2.t2 order by db2.t2.col1
…
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlLastJoinWithMultipleDB[,  SELECT sum(db1.t1.col1) over w1 as sum_t1_col1, db2.t2.str1 as t2_str1
 FROM db1.t1
 last join db2.t2 order by db2.t2.col1
 on db1.t1.col1 = db2.t2.col1 and db1.t1.col2 = db2.t2.col0
 WINDOW w1 AS (
  PARTITION BY db1.t1.col2 ORDER BY db1.t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](2)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlLastJoinWithMultipleDB[db1,  SELECT sum(t1.col1) over w1 as sum_t1_col1, db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0
 WINDOW w1 AS (
  PARTITION BY t1.col2 ORDER BY t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](1)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlLastJoinWithMultipleDB[null,  SELECT sum(db1.t1.col1) over w1 as sum_t1_col1, db2.t2.str1 as t2_str1
 FROM db1.t1
 last join db2.t2 order by db2.t2.col1
 on db1.t1.col1 = db2.t2.col1 and db1.t1.col2 = db2.t2.col0
 WINDOW w1 AS (
  PARTITION BY db1.t1.col2 ORDER BY db1.t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](3)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[, SELECT db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Fail to transform data provider op: table t1 not exists in database []](4)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[db1, SELECT db1.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Column Not found: db1.t2.str1](2)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[db1, SELECT db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Column Not found: .t2.col1](3)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[db1, SELECT t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Column Not found: .t2.str1](1)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlMultipleDBErrorTest[null, SELECT db2.t2.str1 as t2_str1
 FROM t1
 last join db2.t2 order by db2.t2.col1
 on t1.col1 = db2.t2.col1 and t1.col2 = db2.t2.col0;
, SQL parse error: Fail to transform data provider op: table t1 not exists in database []](5)
com._4paradigm.hybridse.sdk.SqlEngineTest ‑ sqlWindowLastJoin[ SELECT sum(t1.col1) over w1 as sum_t1_col1, t2.str1 as t2_str1
 FROM t1
 last join t2 order by t2.col1
 on t1.col1 = t2.col1 and t1.col2 = t2.col0
 WINDOW w1 AS (
  PARTITION BY t1.col2 ORDER BY t1.col1
  ROWS_RANGE BETWEEN 3 PRECEDING AND CURRENT ROW
 ) limit 10;](1)
com._4paradigm.openmldb.jdbc.SQLRouterSmokeTest ‑ testInsertMeta[com._4paradigm.openmldb.sdk.impl.SqlClusterExecutor@103e9972](2)
…

♻️ This comment has been updated with latest results.


github-actions bot commented Apr 15, 2024

HybridSE Linux Test Report

20 362 tests   20 360 ✅  6m 46s ⏱️
   261 suites       2 💤
    68 files         0 ❌

Results for commit a0e5c37.

♻️ This comment has been updated with latest results.


github-actions bot commented Apr 15, 2024

HybridSE Mac Test Report

20 362 tests   20 360 ✅  12m 48s ⏱️
   261 suites       2 💤
    68 files         0 ❌

Results for commit a0e5c37.

♻️ This comment has been updated with latest results.


github-actions bot commented Apr 15, 2024

Linux Test Report

    57 files  +    57     250 suites  +250   1h 42m 7s ⏱️ + 1h 42m 7s
13 505 tests +13 505  13 498 ✅ +13 498  7 💤 +7  0 ❌ ±0 
19 177 runs  +19 177  19 170 ✅ +19 170  7 💤 +7  0 ❌ ±0 

Results for commit a0e5c37. ± Comparison against base commit 6b52ee5.

♻️ This comment has been updated with latest results.

codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 84.73896%, with 76 lines in your changes missing coverage. Please review.

Project coverage is 74.89%. Comparing base (6b52ee5) to head (a0e5c37).
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| hybridse/src/udf/udf_registry.h | 79.67% | 25 Missing ⚠️ |
| hybridse/src/node/sql_node.cc | 0.00% | 14 Missing ⚠️ |
| ...idse/src/udf/default_defs/feature_signature_def.cc | 95.17% | 11 Missing ⚠️ |
| hybridse/include/node/sql_node.h | 52.38% | 10 Missing ⚠️ |
| hybridse/src/node/expr_node.cc | 0.00% | 9 Missing ⚠️ |
| hybridse/src/codegen/udf_ir_builder.cc | 90.32% | 3 Missing ⚠️ |
| hybridse/src/udf/udf_registry.cc | 95.08% | 3 Missing ⚠️ |
| hybridse/src/udf/udf_library.cc | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3877      +/-   ##
============================================
+ Coverage     74.85%   74.89%   +0.03%     
  Complexity      658      658              
============================================
  Files           745      746       +1     
  Lines        134302   134794     +492     
  Branches       1500     1440      -60     
============================================
+ Hits         100534   100951     +417     
- Misses        33464    33539      +75     
  Partials        304      304              


@wyl4pd wyl4pd changed the title Feat/feature signature codegen feat: feature signature codegen Apr 17, 2024
@@ -76,6 +76,7 @@ enum SqlNodeType {
kUdfByCodeGenDef,
kUdafDef,
kLambdaDef,
kVariadicUdfDef,
Collaborator

Append the new enum value to the end.
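
The reviewer does not give a reason, but one common motivation (an assumption here) is that inserting an enumerator in the middle of an unscoped enum renumbers everything after it, which matters if the numeric values are logged, serialized, or used as table indices. A small sketch with hypothetical enumerators, where only `kVariadicUdfDef` comes from the diff:

```cpp
// Hypothetical enumerators; only kVariadicUdfDef appears in the diff above.
namespace before {
enum ToyNodeType { kFirstDef, kSecondDef, kLaterDef };
}
namespace inserted_in_middle {
enum ToyNodeType { kFirstDef, kSecondDef, kVariadicUdfDef, kLaterDef };
}
namespace appended {
enum ToyNodeType { kFirstDef, kSecondDef, kLaterDef, kVariadicUdfDef };
}

// Inserting in the middle renumbers every later enumerator...
static_assert(static_cast<int>(inserted_in_middle::kLaterDef) !=
                  static_cast<int>(before::kLaterDef),
              "kLaterDef shifted from 2 to 3");
// ...while appending keeps all existing values intact.
static_assert(static_cast<int>(appended::kLaterDef) ==
                  static_cast<int>(before::kLaterDef),
              "kLaterDef is still 2");
```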

@@ -385,6 +385,10 @@ class NodeManager {
FnDefNode *merge_func, FnDefNode *output_func);
LambdaNode *MakeLambdaNode(const std::vector<ExprIdNode *> &args,
ExprNode *body);
VariadicUdfDefNode *MakeVariadicUdfDefNode(const std::string &name,
Collaborator

Use the template function MakeNode instead.
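
To illustrate the general idea behind this suggestion: a single variadic factory template can replace one hand-written `MakeXxxNode` method per node type. The sketch below is a toy under stated assumptions. `ToyNodeManager`, `ToyNode`, and `ToyVariadicUdfDefNode` are hypothetical stand-ins, and the real `NodeManager::MakeNode` may differ in signature and ownership model.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base type standing in for the real node hierarchy.
struct ToyNode {
    virtual ~ToyNode() = default;
};

// Hypothetical node type; the real VariadicUdfDefNode has more members.
struct ToyVariadicUdfDefNode : ToyNode {
    explicit ToyVariadicUdfDefNode(std::string name) : name(std::move(name)) {}
    std::string name;
};

class ToyNodeManager {
 public:
    // One generic factory: forwards the arguments to T's constructor,
    // keeps ownership inside the manager, and returns a raw pointer.
    template <typename T, typename... Args>
    T* MakeNode(Args&&... args) {
        auto node = std::make_unique<T>(std::forward<Args>(args)...);
        T* raw = node.get();
        owned_.push_back(std::move(node));
        return raw;
    }

    std::size_t size() const { return owned_.size(); }

 private:
    std::vector<std::unique_ptr<ToyNode>> owned_;
};
```

With this shape, a call such as `manager.MakeNode<ToyVariadicUdfDefNode>("gcformat")` needs no dedicated factory method on the manager.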

using GetTypeF =
typename std::function<void(node::NodeManager*, node::TypeNode**)>;

// TypeAnnotatedFuncPtr can only be built from non-void return type
Collaborator

For void functions, the return type is inferred by get_return_type_func. Is that because DataTypeTrait does not support tuples?

@@ -1225,6 +1265,20 @@ class ExternalFuncRegistryHelper : public UdfRegistryHelper {
return *this;
}

template <typename Ret, typename... Args>
ExternalFuncRegistryHelper& with_return_args(
Collaborator

This will likely break the existing automatic type inference facility. Consider supporting the tuple type directly, or add a clear note.
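
As a rough illustration of what "supporting the tuple type directly" could look like, a type trait can be partially specialized for `std::tuple` so that composite return types are described from their element traits. `ToyTypeTrait` below is hypothetical and is not the real `DataTypeTrait`; it only produces a type name string. The sketch requires C++17 for the fold expression.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical trait; the real DataTypeTrait in hybridse is richer.
template <typename T>
struct ToyTypeTrait;

template <>
struct ToyTypeTrait<bool> {
    static std::string Name() { return "bool"; }
};

template <>
struct ToyTypeTrait<int32_t> {
    static std::string Name() { return "int32"; }
};

template <>
struct ToyTypeTrait<double> {
    static std::string Name() { return "double"; }
};

// Partial specialization: a tuple's description is assembled from the
// descriptions of its element types.
template <typename... Ts>
struct ToyTypeTrait<std::tuple<Ts...>> {
    static std::string Name() {
        std::string out = "tuple<";
        bool first = true;
        // Fold over the comma operator: append each element name in order.
        ((out += (first ? "" : ", ") + ToyTypeTrait<Ts>::Name(), first = false),
         ...);
        return out + ">";
    }
};
```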

.args_in<bool, int16_t, int32_t, int64_t, float, double>();

v1::InstanceFormatHelper<v1::GCFormat>::Register(this, "gcformat", R"(
@brief Return instance in GCFormat format.
Collaborator

Please add more information. Is there a link describing GCFormat?

NativeValue* output) {
CHECK_TRUE(arg_types.size() == args.size(), kCodegenError);
CHECK_TRUE(fn->GetArgSize() == args.size(), kCodegenError);
std::vector<NativeValue> cur_state_values(1 + fn->update_func().size());
Collaborator

Is this a reduce algorithm? If so, the list of NativeValue could ideally be replaced with a single accumulated NativeValue.
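
The question can be sketched in plain C++, with `int` standing in for `NativeValue` and a hypothetical `Update` alias standing in for the update functions. The first function mirrors the `1 + update_func().size()` state list from the diff; the second threads one accumulator through the same steps. This is only a model of the suggestion, not the codegen itself.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in: each update maps the previous state to the next.
using Update = std::function<int(int)>;

// Keeps every intermediate state, like a vector sized 1 + updates.size().
int ReduceWithStateList(int init, const std::vector<Update>& updates) {
    std::vector<int> states(1 + updates.size());
    states[0] = init;
    for (std::size_t i = 0; i < updates.size(); ++i) {
        states[i + 1] = updates[i](states[i]);
    }
    return states.back();
}

// Same result with a single accumulated state (a left fold).
int ReduceWithSingleState(int init, const std::vector<Update>& updates) {
    return std::accumulate(
        updates.begin(), updates.end(), init,
        [](int state, const Update& update) { return update(state); });
}
```

Both functions return the same value whenever each step depends only on the state immediately before it; the list is needed only if earlier states are read again later.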

)")
.args_in<bool, int16_t, int32_t, int64_t, float, double>();

RegisterExternalTemplate<v1::Discrete>("discrete")
Collaborator Author

null, slot_number

Collaborator Author

If bucket_size <= 0, return null.

Collaborator Author

libsvm slot_number + bucket_size
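
One possible reading of the `bucket_size` note above, sketched under assumptions: the value is hashed into a bucket in `[0, bucket_size)`, and a non-positive `bucket_size` yields null. `DiscreteSketch` is hypothetical, and `std::hash` is a stand-in because the hash used by the real `discrete` function is not shown here. The notes on `slot_number` are not modeled.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical sketch, not the v1::Discrete implementation.
// Returns std::nullopt (modeling SQL NULL) when bucket_size <= 0,
// otherwise a bucket index in [0, bucket_size).
std::optional<int64_t> DiscreteSketch(int64_t value, int64_t bucket_size) {
    if (bucket_size <= 0) {
        return std::nullopt;
    }
    // std::hash is an assumption; the actual hash function may differ.
    const uint64_t hashed = std::hash<int64_t>{}(value);
    return static_cast<int64_t>(hashed % static_cast<uint64_t>(bucket_size));
}
```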


std::shared_ptr<UdfRegistry> init_gen;
ArgSignatureTable update_gen;
std::unordered_map<std::string, ArgSignatureTable> output_gen;
Collaborator Author

todo

Collaborator

@tobegit3hub tobegit3hub left a comment

LGTM

@wyl4pd wyl4pd merged commit 99c179e into 4paradigm:main Apr 19, 2024
31 checks passed