Support Kusto Query Language dialect - phase 2 #42510
@yakov-olkhovskiy This PR is clean now. Please review it again. Thanks.
programs/server/users.xml
<!-- How to choose between replicas during distributed query processing.
     random - choose random replica from set of replicas with minimum number of errors
     nearest_hostname - from set of replicas with minimum number of errors, choose replica
      with minimum number of different symbols between replica's hostname and local hostname
      (Hamming distance).
     in_order - first live replica is chosen in specified order.
     first_or_random - if first replica one has higher number of errors, pick a random one from replicas with minimum number of errors.
-->
<load_balancing>random</load_balancing>
why do we need this?
Will remove it. It was probably added by accident during a rebase.
src/Client/ClientBase.cpp
else if (dialect == Dialect::kusto_auto)
{
    res = tryParseQuery(parser, pos, end, message, true, "", allow_multi_statements, max_length, settings.max_parser_depth);
    if (!res)
    {
        pos = begin;
        res = tryParseQuery(kql_parser, pos, end, message, true, "", allow_multi_statements, max_length, settings.max_parser_depth);
    }
}
I'm not sure we want such complications, and after all it's possible we will have other dialects in the future...
@alexey-milovidov do you think we want to have autodetection of dialect? and if we want it I think it's better to generalize it to just auto to incorporate future dialects
yeah, currently we need such an ability to run kql
The auto option shouldn't be added in this PR, as it can be implemented (or not implemented at all) separately, later.
@kashwy could you please remove this functionality for now - we will return to it some time later
@yakov-olkhovskiy , sure, will remove
ParserKeyword s_kql("KQL");

if (ASTPtr select_node; select.parse(pos, select_node, expected))
if (s_kql.ignore(pos, expected))
{
    result_node = std::move(select_node);
    if (!ParserKQLTaleFunction().parse(pos, result_node, expected))
        return false;
}
we don't want to change clickhouse syntax - I think it's possible to implement this as a table function
This is one of our use cases: we need a syntax to embed a KQL statement inside a SQL query, like:
select * from kql(table|column)
Will check whether table functions can be used.
I believe it's possible to implement this as a function, maybe not that simple though - look at the view function (https://github.com/ClickHouse/ClickHouse/blob/master/src/TableFunctions/TableFunctionView.h) as a primary example. Most likely you will need to extend ExpressionListParsers (https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/ExpressionListParsers.cpp) as you need to parse arguments differently.
sure, will try to use table function
{
    result_node = std::move(select_node);
    if (!ParserKQLTaleFunction().parse(pos, result_node, expected))
I believe it's a typo - should be ParserKQLTableFunction
will fix
The PR is updated, could you please review again? Thanks
All issues have been addressed: removed the auto dialect and changed the kql() function to a table function. Can you take some time to review again? Thanks
Hi @yakov-olkhovskiy, it appears that there are no test failures related to KQL now, could you please review it again? Thanks.
@yakov-olkhovskiy, I have merged master in and resolved the conflict. Thanks
Did you get a chance to review again? Thanks
programs/client/Client.cpp
const auto & settings = global_context->getSettingsRef();
const Dialect & dialect = settings.dialect;
String old_dialect;
switch (dialect)
{
    case DB::Dialect::kusto:
        old_dialect = "kusto";
        break;
    case DB::Dialect::clickhouse:
        old_dialect = "clickhouse";
        break;
    case DB::Dialect::prql:
        old_dialect = "prql";
        break;
}

if (auto *q = orig_ast->as<ASTSetQuery>())
{
    auto *setDialect = q->changes.tryGet("dialect");
    if (setDialect)
    {
        old_dialect = setDialect->get<String>();
    }
}

//setting dialect to clickhouse during query fuzzing, restore dialect to original value after fuzzing

SCOPE_EXIT_SAFE({
    global_context->setSetting("dialect", old_dialect);
});

if (dialect != DB::Dialect::clickhouse)
{
    SettingChange new_setting("dialect", "clickhouse");
    global_context->applySettingChange(new_setting);
}
@kashwy I wonder why do we need it at all - this fuzzing functionality is introduced for testing - do you have problems without this addition? it seems somewhat off and I would prefer to remove it if it's not absolutely necessary
We got a fuzzing test failure without this. The reason is that fuzzing tests are generated from the AST, so they are SQL queries, while during a KQL test the dialect has been set to 'kusto', so the fuzzed SQL won't parse.
I guess we can avoid this issue just returning true if it's a kusto dialect - the same as above at line 706
@kashew please look at that piece I pointed out above
I have addressed the piece you pointed out, thanks
programs/client/Client.cpp
-    const auto & settings = global_context->getSettingsRef();
-    const Dialect & dialect = settings.dialect;
-    String old_dialect;
-    switch (dialect)
-    {
-        case DB::Dialect::kusto:
-            old_dialect = "kusto";
-            break;
-        case DB::Dialect::clickhouse:
-            old_dialect = "clickhouse";
-            break;
-        case DB::Dialect::prql:
-            old_dialect = "prql";
-            break;
-    }
-    if (auto *q = orig_ast->as<ASTSetQuery>())
-    {
-        auto *setDialect = q->changes.tryGet("dialect");
-        if (setDialect)
-        {
-            old_dialect = setDialect->get<String>();
-        }
-    }
-    //setting dialect to clickhouse during query fuzzing, restore dialect to original value after fuzzing
-    SCOPE_EXIT_SAFE({
-        global_context->setSetting("dialect", old_dialect);
-    });
-    if (dialect != DB::Dialect::clickhouse)
-    {
-        SettingChange new_setting("dialect", "clickhouse");
-        global_context->applySettingChange(new_setting);
-    }
+    // Kusto is not a subject for fuzzing (yet)
+    if (global_context->getSettingsRef().dialect == DB::Dialect::kusto)
+    {
+        return true;
+    }
+    if (auto *q = orig_ast->as<ASTSetQuery>())
+    {
+        if (auto *setDialect = q->changes.tryGet("dialect"); setDialect && setDialect->safeGet<String>() == "kusto")
+            return true;
+    }
changed
@kashwy apparently we have some issues running tests with thread sanitizer:
@kashwy seems like this:
Hi @yakov-olkhovskiy, I will check it
@kashwy I already opened a PR (see reference above) - please check if it's a correct fix
sure
It's used to generate a unique alias for the same function used more than once in one statement, so thread_local is good enough.
The question is how unique it is - I'm afraid that if it's seeded with the same value, then two values generated in two different threads will be the same.
No problem if the same value appears in different threads: the uniqueness only has to prevent duplicate function aliases within one statement when a function is used more than once, which causes a parsing error. So I think it's fine as long as it's unique within the same thread.
// This particular random generator hits each number exactly once before looping over.
// Because of this, it's sufficient for queries consisting of up to 2^16 (= 65536) distinct function calls.
// Reference: https://www.pcg-random.org/using-pcg-cpp.html#insecure-generators
static pcg32_once_insecure random_generator;
This generator is not thread safe - https://s3.amazonaws.com/clickhouse-test-reports/55418/769ed2e19d46fcb9cb6a678a0da6d6f2fc5d239e/stateless_tests__tsan__[5_5]/stderr.log
yes, we know - already fixed
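The requirement stated in this thread (aliases must be unique within one statement, and therefore within one thread) can be sketched without any shared state. This is only an illustration, not the fix that was merged: `makeUniqueAlias` is a hypothetical helper, and a per-thread counter is swapped in for `pcg32_once_insecure` so the snippet depends only on the standard library.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not the PR's code: returns an alias that is unique
// within the calling thread. Each thread owns its own counter, so there is
// no shared mutable state for ThreadSanitizer to flag.
std::string makeUniqueAlias(const std::string & function_name)
{
    thread_local std::uint64_t counter = 0;
    return function_name + "_" + std::to_string(counter++);
}
```

Two threads may produce the same alias, which matches the argument above that collisions across threads are harmless as long as a single statement never sees a duplicate.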
Is it possible to output the query result in JSON format with Kusto?
It's part of the dynamic type, which is not fully supported yet.
@ywangtht it is possible to incorporate a kql expression into a usual sql query with the kql table function and use
I know I can output it in JSON with: But how to do it through the HTTP interface with curl? I do not see an option to pass format JSON as SETTINGS. There is no option of specifying "format JSON" at the end of a kusto query either. I understand this is an experimental feature, but I am still wondering how everybody else uses kusto to query ClickHouse. Another alternative I can think of is to add a clickhouse-client option to convert kql to ClickHouse SQL and let the end user append the format JSON at the end.
oh, with curl it's pretty simple:
@yakov-olkhovskiy, Thanks, the HTTP header works! Another issue: it seems the dialect=kusto setting is not carried through a distributed table via HTTP. I have a distributed table which has two CH nodes, node1 and node2.
@ywangtht this is an interesting one... you can report it as an issue I think
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This is the second part of Kusto Query Language dialect support.
Phase 1 implementation has been merged.
Implemented KQL Features
sql_dialect
clickhouse
set sql_dialect='clickhouse'
set sql_dialect='kusto'
(hide)
KQL operators:
Customers
Customers | project FirstName,LastName,Occupation
Customers | project FirstName,LastName,Occupation | take 1 | take 3
Customers | order by Age desc , FirstName asc
Customers | where Occupation == 'Skilled Manual'
Customers |summarize max(Age) by Occupation
Customers | distinct *
Customers | distinct Occupation
Customers | distinct Occupation, Education
Customers | where Age <30 | distinct Occupation, Education
Customers | where Age <30 | order by Age| distinct Occupation, Education
T | extend T | extend duration = endTime - startTime
T | project endTime, startTime | extend duration = endTime - startTime
T | make-series PriceAvg = avg(Price) default=0 on Purchase from datetime(2016-09-10) to datetime(2016-09-13) step 1d by Supplier, Fruit
T | mv-expand c
T | mv-expand c, d
T | mv-expand b | mv-expand c
T | mv-expand c to typeof(bool)
T | mv-expand with_itemindex=index b, c, d
T | mv-expand array_concat(c,d)
T | mv-expand x = c, y = d
T | mv-expand xy = array_concat(c, d)
T | mv-expand with_itemindex=index c,d to typeof(bool)
Aggregate Functions:
Customers | summarize t = make_list(FirstName) by FirstName
Customers | summarize t = make_list(FirstName, 10) by FirstName
Customers | summarize t = make_list_if(FirstName, Age > 10) by FirstName
Customers | summarize t = make_list_if(FirstName, Age > 10, 10) by FirstName
Customers | summarize t = make_list_with_nulls(Age) by FirstName
Customers | summarize t = make_set(FirstName) by FirstName
Customers | summarize t = make_set(FirstName, 10) by FirstName
Customers | summarize t = make_set_if(FirstName, Age > 10) by FirstName
Customers | summarize t = make_set_if(FirstName, Age > 10, 10) by FirstName
print res = bin_at(6.5, 2.5, 7)
print res = bin_at(1h, 1d, 12h)
print res = bin_at(datetime(2017-05-15 10:20:00.0), 1d, datetime(1970-01-01 12:00:00.0))
print res = bin_at(datetime(2017-05-17 10:20:00.0), 7d, datetime(2017-06-04 00:00:00.0))
array_index_of supports only basic lookup; start_index, length and occurrence are not supported.
print output = array_index_of(dynamic(['John', 'Denver', 'Bob', 'Marley']), 'Marley')
print output = array_index_of(dynamic([1, 2, 3]), 2)
print output = array_sum(dynamic([2, 5, 3]))
print output = array_sum(dynamic([2.5, 5.5, 3]))
print output = array_length(dynamic(['John', 'Denver', 'Bob', 'Marley']))
print output = array_length(dynamic([1, 2, 3]))
print bin(4.5, 1)
print bin(time(16d), 7d)
print bin(datetime(1970-05-11 13:45:07), 1d)
Customers | summarize t = stdev(Age) by FirstName
Customers | summarize t = stdevif(Age, Age < 10) by FirstName
Customers | summarize t = binary_all_and(Age) by FirstName
Customers | summarize t = binary_all_or(Age) by FirstName
Customers | summarize t = binary_all_xor(Age) by FirstName
Customers | summarize percentiles(Age, 30, 40, 50, 60, 70) by FirstName
DataTable | summarize t = percentilesw(Bucket, Frequency, 50, 75, 99.9)
Customers | summarize t = percentile(Age, 50) by FirstName
DataTable | summarize t = percentilew(Bucket, Frequency, 50)
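The bin and bin_at examples listed above follow a simple rounding rule. A minimal sketch of that rule on plain doubles, assuming timespans are expressed in seconds as the notes on the timespan type describe; `kqlBin` and `kqlBinAt` are hypothetical names, not functions from this PR.

```cpp
#include <cassert>
#include <cmath>

// Rounds value down to a multiple of bin_size (the behaviour of KQL bin).
double kqlBin(double value, double bin_size)
{
    assert(bin_size > 0);
    return std::floor(value / bin_size) * bin_size;
}

// Rounds value down to the nearest boundary of the form
// fixed_point + k * bin_size for an integer k (the behaviour of KQL bin_at).
double kqlBinAt(double value, double bin_size, double fixed_point)
{
    assert(bin_size > 0);
    return fixed_point + std::floor((value - fixed_point) / bin_size) * bin_size;
}
```

With these definitions `kqlBin(4.5, 1)` gives 4 and `kqlBinAt(6.5, 2.5, 7)` gives 4.5, because the boundaries around 6.5 are 4.5 and 7.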
Array functions
Please note that only arrays of the same type are supported in our current implementation. The underlying reasons are explained in the section on the dynamic data type.
array_reverse
print array_reverse(dynamic(["this", "is", "an", "example"])) == dynamic(["example","an","is","this"])
array_rotate_left
print array_rotate_left(dynamic([1,2,3,4,5]), 2) == dynamic([3,4,5,1,2])
print array_rotate_left(dynamic([1,2,3,4,5]), -2) == dynamic([4,5,1,2,3])
array_rotate_right
print array_rotate_right(dynamic([1,2,3,4,5]), -2) == dynamic([3,4,5,1,2])
print array_rotate_right(dynamic([1,2,3,4,5]), 2) == dynamic([4,5,1,2,3])
array_shift_left
print array_shift_left(dynamic([1,2,3,4,5]), 2) == dynamic([3,4,5,null,null])
print array_shift_left(dynamic([1,2,3,4,5]), -2) == dynamic([null,null,1,2,3])
print array_shift_left(dynamic([1,2,3,4,5]), 2, -1) == dynamic([3,4,5,-1,-1])
print array_shift_left(dynamic(['a', 'b', 'c']), 2) == dynamic(['c','',''])
array_shift_right
print array_shift_right(dynamic([1,2,3,4,5]), -2) == dynamic([3,4,5,null,null])
print array_shift_right(dynamic([1,2,3,4,5]), 2) == dynamic([null,null,1,2,3])
print array_shift_right(dynamic([1,2,3,4,5]), -2, -1) == dynamic([3,4,5,-1,-1])
print array_shift_right(dynamic(['a', 'b', 'c']), -2) == dynamic(['c','',''])
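The difference between rotating and shifting in the examples above can be sketched as follows. This is an illustration over integer arrays only, with hypothetical function names; null is modelled as an empty `std::optional`, and the right-hand variants are the same operations with the count negated.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

using Elem = std::optional<int>; // std::nullopt stands in for KQL null

// array_rotate_left: elements that fall off the front re-enter at the back.
// A negative count rotates to the right.
std::vector<int> rotateLeft(std::vector<int> values, long long count)
{
    if (values.empty())
        return values;
    const long long n = static_cast<long long>(values.size());
    const long long k = ((count % n) + n) % n; // normalise into [0, n)
    std::rotate(values.begin(), values.begin() + k, values.end());
    return values;
}

// array_shift_left: elements that fall off are dropped and the vacated
// slots are filled with fill. A negative count shifts to the right.
std::vector<Elem> shiftLeft(const std::vector<int> & values, long long count, Elem fill = std::nullopt)
{
    const long long n = static_cast<long long>(values.size());
    std::vector<Elem> out(values.size(), fill);
    for (long long i = 0; i < n; ++i)
    {
        const long long src = i + count;
        if (src >= 0 && src < n)
            out[i] = values[src];
    }
    return out;
}
```

For string arrays the listed examples show an empty string as the default fill value instead of null; the sketch does not cover that case.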
pack_array
print x = 1, y = x * 2, z = y * 2, pack_array(x,y,z)
Please note that only arrays of elements of the same type may be created at this time. The underlying reasons are explained in the release note section on the dynamic data type.
repeat
print repeat(1, 0) == dynamic([])
print repeat(1, 3) == dynamic([1, 1, 1])
print repeat("asd", 3) == dynamic(['asd', 'asd', 'asd'])
print repeat(timespan(1d), 3) == dynamic([86400, 86400, 86400])
print repeat(true, 3) == dynamic([true, true, true])
zip
print zip(dynamic([1,3,5]), dynamic([2,4,6]))
array_sort_asc
Only constant dynamic arrays are supported.
Returns an array, so every element of the input has to be of the same data type.
print t = array_sort_asc(dynamic([null, 'd', 'a', 'c', 'c']))
print t = array_sort_asc(dynamic([4, 1, 3, 2]))
print t = array_sort_asc(dynamic(['b', 'a', 'c']), dynamic(['q', 'p', 'r']))
print t = array_sort_asc(dynamic(['q', 'p', 'r']), dynamic(['clickhouse','hello', 'world']))
print t = array_sort_asc( dynamic(['d', null, 'a', 'c', 'c']) , false)
print t = array_sort_asc( dynamic(['d', null, 'a', 'c', 'c']) , 1 > 2)
print t = array_sort_asc( dynamic([null, 'd', null, null, 'a', 'c', 'c', null, null, null]) , false)
print t = array_sort_asc( dynamic([null, null, null]) , false)
print t = array_sort_asc(dynamic([2, 1, null,3]), dynamic([20, 10, 40, 30]), 1 > 2)
print t = array_sort_asc(dynamic([2, 1, null,3]), dynamic([20, 10, 40, 30, 50, 3]), 1 > 2)
array_sort_desc (only constant dynamic arrays are supported)
print t = array_sort_desc(dynamic([null, 'd', 'a', 'c', 'c']))
print t = array_sort_desc(dynamic([4, 1, 3, 2]))
print t = array_sort_desc(dynamic(['b', 'a', 'c']), dynamic(['q', 'p', 'r']))
print t = array_sort_desc(dynamic(['q', 'p', 'r']), dynamic(['clickhouse','hello', 'world']))
print t = array_sort_desc( dynamic(['d', null, 'a', 'c', 'c']) , false)
print t = array_sort_desc( dynamic(['d', null, 'a', 'c', 'c']) , 1 > 2)
print t = array_sort_desc( dynamic([null, 'd', null, null, 'a', 'c', 'c', null, null, null]) , false)
print t = array_sort_desc( dynamic([null, null, null]) , false)
print t = array_sort_desc(dynamic([2, 1, null, 3]), dynamic([20, 10, 40, 30]), 1 > 2)
print t = array_sort_desc(dynamic([2, 1, null,3, null]), dynamic([20, 10, 40, 30, 50, 3]), 1 > 2)
array_concat
print array_concat(dynamic([1, 2, 3]), dynamic([4, 5]), dynamic([6, 7, 8, 9])) == dynamic([1, 2, 3, 4, 5, 6, 7, 8, 9])
array_iff / array_iif
print array_iif(dynamic([true, false, true]), dynamic([1, 2, 3]), dynamic([4, 5, 6])) == dynamic([1, 5, 3])
print array_iif(dynamic([true, false, true]), dynamic([1, 2, 3, 4]), dynamic([4, 5, 6])) == dynamic([1, 5, 3])
print array_iif(dynamic([true, false, true, false]), dynamic([1, 2, 3, 4]), dynamic([4, 5, 6])) == dynamic([1, 5, 3, null])
print array_iif(dynamic([1, 0, -1, 44, 0]), dynamic([1, 2, 3, 4]), dynamic([4, 5, 6])) == dynamic([1, 5, 3, 4, null])
array_slice
print array_slice(dynamic([1,2,3]), 1, 2) == dynamic([2, 3])
print array_slice(dynamic([1,2,3,4,5]), 2, -1) == dynamic([3, 4, 5])
print array_slice(dynamic([1,2,3,4,5]), -3, -2) == dynamic([3, 4])
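The array_slice examples above use zero-based, inclusive start and end positions, with negative values counted from the end of the array. A minimal sketch of that indexing, assuming out-of-range positions are clamped (the source does not state this) and using a hypothetical function name:

```cpp
#include <algorithm>
#include <vector>

// Returns values[start..end], both ends inclusive. Negative positions are
// offsets from the end of the array (-1 is the last element).
std::vector<int> arraySlice(const std::vector<int> & values, long long start, long long end)
{
    const long long n = static_cast<long long>(values.size());
    if (start < 0)
        start += n;
    if (end < 0)
        end += n;
    start = std::max(start, 0LL);   // assumption: clamp instead of failing
    end = std::min(end, n - 1);
    if (start > end)
        return {};
    return std::vector<int>(values.begin() + start, values.begin() + end + 1);
}
```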
array_split
print array_split(dynamic([1,2,3,4,5]), 2) == dynamic([[1,2],[3,4,5]])
print array_split(dynamic([1,2,3,4,5]), dynamic([1,3])) == dynamic([[1],[2,3],[4,5]])
Data types
dynamic
print isnull(dynamic(null))
print dynamic(1) == 1
print dynamic(timespan(1d)) == 86400
print dynamic([1, 2, 3])
print dynamic([[1], [2], [3]])
print dynamic(['a', "b", 'c'])
According to the KQL specifications, dynamic is a literal, which means that no function calls are permitted. Expressions producing literals such as datetime and timespan and their aliases (i.e. date and time, respectively) along with nested dynamic literals are allowed.
Please note that our current implementation supports only scalars and arrays made up of elements of the same type.
bool,boolean
print bool(1)
print boolean(0)
datetime
print datetime(2015-12-31 23:59:59.9)
print datetime('2015-12-31 23:59:59.9')
print datetime("2015-12-31")
guid
print guid(74be27de-1e4e-49d9-b579-fe0b331d3642)
print guid('74be27de-1e4e-49d9-b579-fe0b331d3642')
print guid('74be27de1e4e49d9b579fe0b331d3642')
int
print int(1)
long
print long(16)
real
print real(1)
timespan, time
Note that timespan is used for calculating datetime, so the output is in seconds, e.g. time(1h) = 3600
print 1d
print 30m
print time('0.12:34:56.7')
print time(2h)
print timespan(2h)
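Since timespans are output in seconds, a short literal such as 1d or 30m maps to a plain integer. A minimal sketch of that mapping, covering only whole-number literals with the suffixes d, h, m and s; the function name is hypothetical and the dotted form such as '0.12:34:56.7' is deliberately not handled.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Converts a literal like "2h" to seconds. Returns std::nullopt for anything
// other than <digits><d|h|m|s>.
std::optional<long long> timespanLiteralToSeconds(const std::string & literal)
{
    if (literal.size() < 2)
        return std::nullopt;

    long long unit = 0;
    switch (literal.back())
    {
        case 'd': unit = 86400; break;
        case 'h': unit = 3600; break;
        case 'm': unit = 60; break;
        case 's': unit = 1; break;
        default: return std::nullopt;
    }

    long long count = 0;
    for (std::size_t i = 0; i + 1 < literal.size(); ++i)
    {
        const char c = literal[i];
        if (c < '0' || c > '9')
            return std::nullopt;
        count = count * 10 + (c - '0');
    }
    return count * unit;
}
```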
Data Type Conversion
tobool / toboolean
print tobool(true) == true
print toboolean(false) == false
print tobool(0) == false
print toboolean(19819823) == true
print tobool(-2) == true
print isnull(toboolean('a'))
print tobool('true') == true
print toboolean('false') == false
todouble / toreal
print todouble(4) == 4
print toreal(4.2) == 4.2
print isnull(todouble('a'))
print toreal('-0.3') == -0.3
toint
print isnull(toint('a'))
print toint(4) == 4
print toint('4') == 4
print isnull(toint(4.2))
tostring
print tostring(123) == '123'
print tostring('asd') == 'asd'
Set functions
jaccard_index
print jaccard_index(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3, 4, 4, 4])) == 0.75
print jaccard_index(dynamic([1, 2, 3]), dynamic([])) == 0
print jaccard_index(dynamic([]), dynamic([1, 2, 3, 4])) == 0
print isnan(jaccard_index(dynamic([]), dynamic([])))
print jaccard_index(dynamic([1, 2, 3]), dynamic([4, 5, 6, 7])) == 0
print jaccard_index(dynamic(['a', 's', 'd']), dynamic(['f', 'd', 's', 'a'])) == 0.75
print jaccard_index(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])) == 0.25
set_difference
print set_difference(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3])) == dynamic([])
print array_sort_asc(set_difference(dynamic([1, 4, 2, 3, 5, 4, 6]), dynamic([1, 2, 3])))[1] == dynamic([4, 5, 6])
print set_difference(dynamic([4]), dynamic([1, 2, 3])) == dynamic([4])
print array_sort_asc(set_difference(dynamic([1, 2, 3, 4, 5]), dynamic([5]), dynamic([2, 4])))[1] == dynamic([1, 3])
print array_sort_asc(set_difference(dynamic([1, 2, 3]), dynamic([])))[1] == dynamic([1, 2, 3])
print array_sort_asc(set_difference(dynamic(['a', 's', 'd']), dynamic(['a', 'f'])))[1] == dynamic(['d', 's'])
print array_sort_asc(set_difference(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])))[1] == dynamic(['Chewbacca', 'Han Solo'])
set_has_element
print set_has_element(dynamic(["this", "is", "an", "example"]), "example") == true
print set_has_element(dynamic(["this", "is", "an", "example"]), "examplee") == false
print set_has_element(dynamic([1, 2, 3]), 2) == true
print set_has_element(dynamic([1, 2, 3, 4.2]), 4) == false
set_intersect
print array_sort_asc(set_intersect(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3])
print array_sort_asc(set_intersect(dynamic([1, 4, 2, 3, 5, 4, 6]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3])
print set_intersect(dynamic([4]), dynamic([1, 2, 3])) == dynamic([])
print set_intersect(dynamic([1, 2, 3, 4, 5]), dynamic([1, 3, 5]), dynamic([2, 5])) == dynamic([5])
print set_intersect(dynamic([1, 2, 3]), dynamic([])) == dynamic([])
print set_intersect(dynamic(['a', 's', 'd']), dynamic(['a', 'f'])) == dynamic(['a'])
print set_intersect(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])) == dynamic(['Darth Vader'])
set_union
print array_sort_asc(set_union(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3])
print array_sort_asc(set_union(dynamic([1, 4, 2, 3, 5, 4, 6]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3, 4, 5, 6])
print array_sort_asc(set_union(dynamic([4]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3, 4])
print array_sort_asc(set_union(dynamic([1, 3, 4]), dynamic([5]), dynamic([2, 4])))[1] == dynamic([1, 2, 3, 4, 5])
print array_sort_asc(set_union(dynamic([1, 2, 3]), dynamic([])))[1] == dynamic([1, 2, 3])
print array_sort_asc(set_union(dynamic(['a', 's', 'd']), dynamic(['a', 'f'])))[1] == dynamic(['a', 'd', 'f', 's'])
print array_sort_asc(set_union(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])))[1] == dynamic(['Chewbacca', 'Darth Sidious', 'Darth Vader', 'Han Solo'])
Binary functions
print binary_and(15, 3) == 3
print binary_and(1, 2) == 0
print binary_not(1) == -2
print binary_or(3, 8) == 11
print binary_or(1, 2) == 3
print binary_shift_left(1, 1) == 2
print binary_shift_left(1, 64) == 1
print binary_shift_right(1, 1) == 0
print binary_shift_right(1, 64) == 1
print binary_xor(1, 3) == 2
print bitset_count_ones(42) == 3
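The shift examples above show that a shift count of 64 leaves the value unchanged, which suggests the count is taken modulo 64. The sketch below reproduces the listed results on 64-bit integers under that assumption; it is inferred from the examples rather than taken from the implementation, the function names are hypothetical, and the right shift is assumed to be logical.

```cpp
#include <bitset>
#include <cstdint>

// Shifting a 64-bit value by 64 or more bits is undefined behaviour in C++,
// so the count is reduced modulo 64 before shifting.
std::int64_t binaryShiftLeft(std::int64_t value, std::int64_t count)
{
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    return static_cast<std::int64_t>(bits << (static_cast<std::uint64_t>(count) & 63));
}

std::int64_t binaryShiftRight(std::int64_t value, std::int64_t count)
{
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    return static_cast<std::int64_t>(bits >> (static_cast<std::uint64_t>(count) & 63));
}

// Number of set bits in the 64-bit representation.
int bitsetCountOnes(std::int64_t value)
{
    return static_cast<int>(std::bitset<64>(static_cast<std::uint64_t>(value)).count());
}
```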
IP functions
print format_ipv4('192.168.1.255', 24) == '192.168.1.0'
print format_ipv4(3232236031, 24) == '192.168.1.0'
print format_ipv4_mask('192.168.1.255', 24) == '192.168.1.0/24'
print format_ipv4_mask(3232236031, 24) == '192.168.1.0/24'
print ipv4_compare('127.0.0.1', '127.0.0.1') == 0
print ipv4_compare('192.168.1.1', '192.168.1.255') < 0
print ipv4_compare('192.168.1.1/24', '192.168.1.255/24') == 0
print ipv4_compare('192.168.1.1', '192.168.1.255', 24) == 0
print ipv4_is_match('127.0.0.1', '127.0.0.1') == true
print ipv4_is_match('192.168.1.1', '192.168.1.255') == false
print ipv4_is_match('192.168.1.1/24', '192.168.1.255/24') == true
print ipv4_is_match('192.168.1.1', '192.168.1.255', 24) == true
print ipv6_compare('::ffff:7f00:1', '127.0.0.1') == 0
print ipv6_compare('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995') < 0
print ipv6_compare('192.168.1.1/24', '192.168.1.255/24') == 0
print ipv6_compare('fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7995/127') == 0
print ipv6_compare('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995', 127) == 0
print ipv6_is_match('::ffff:7f00:1', '127.0.0.1') == true
print ipv6_is_match('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995') == false
print ipv6_is_match('192.168.1.1/24', '192.168.1.255/24') == true
print ipv6_is_match('fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7995/127') == true
print ipv6_is_match('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995', 127) == true
print parse_ipv4_mask('127.0.0.1', 24) == 2130706432
print parse_ipv4_mask('192.1.168.2', 31) == 3221334018
print parse_ipv4_mask('192.1.168.3', 31) == 3221334018
print parse_ipv4_mask('127.2.3.4', 32) == 2130838276
print parse_ipv6_mask('127.0.0.1', 24) == '0000:0000:0000:0000:0000:ffff:7f00:0000'
print parse_ipv6_mask('fe80::85d:e82c:9446:7994', 120) == 'fe80:0000:0000:0000:085d:e82c:9446:7900'
Customers | project parse_ipv4('127.0.0.1')
Customers | project parse_ipv6('127.0.0.1')
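The IPv4 examples above all reduce to parsing a dotted quad into a 32-bit number and clearing the bits outside the prefix. A minimal sketch of that arithmetic, assuming the prefix is passed separately rather than as a /24 suffix; all function names are hypothetical and the parser is stricter in some ways and looser in others than a production one.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Parses "a.b.c.d" into a host-order 32-bit value.
std::optional<std::uint32_t> parseIPv4(const std::string & text)
{
    std::istringstream in(text);
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
    {
        int octet = -1;
        if (!(in >> octet) || octet < 0 || octet > 255)
            return std::nullopt;
        if (i < 3 && in.get() != '.')
            return std::nullopt;
        result = (result << 8) | static_cast<std::uint32_t>(octet);
    }
    if (in.peek() != std::istringstream::traits_type::eof())
        return std::nullopt; // trailing characters
    return result;
}

// Keeps the leading prefix bits and zeroes the rest.
std::uint32_t applyPrefix(std::uint32_t address, unsigned prefix)
{
    if (prefix == 0)
        return 0;
    if (prefix >= 32)
        return address;
    return address & (~std::uint32_t{0} << (32 - prefix));
}

// Formats a host-order 32-bit value as "a.b.c.d".
std::string formatIPv4(std::uint32_t address)
{
    return std::to_string(address >> 24) + "." + std::to_string((address >> 16) & 255) + "."
        + std::to_string((address >> 8) & 255) + "." + std::to_string(address & 255);
}
```

Under this model two addresses match for a given prefix when `applyPrefix` returns the same value for both, which is how 192.168.1.1 and 192.168.1.255 compare equal with a prefix of 24.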
KQL string operators and functions
contains
Customers |where Education contains 'degree'
!contains
Customers |where Education !contains 'degree'
contains_cs
Customers |where Education contains_cs 'Degree'
!contains_cs
Customers |where Education !contains_cs 'Degree'
endswith
Customers | where FirstName endswith 'RE'
!endswith
Customers | where FirstName !endswith 'RE'
endswith_cs
Customers | where FirstName endswith_cs 're'
!endswith_cs
Customers | where FirstName !endswith_cs 're'
==
Customers | where Occupation == 'Skilled Manual'
!=
Customers | where Occupation != 'Skilled Manual'
has
Customers | where Occupation has 'skilled'
!has
Customers | where Occupation !has 'skilled'
has_cs
Customers | where Occupation has_cs 'Skilled'
!has_cs
Customers | where Occupation !has_cs 'Skilled'
hasprefix
Customers | where Occupation hasprefix 'Ab'
!hasprefix
Customers | where Occupation !hasprefix 'Ab'
hasprefix_cs
Customers | where Occupation hasprefix_cs 'ab'
!hasprefix_cs
Customers | where Occupation !hasprefix_cs 'ab'
hassuffix
Customers | where Occupation hassuffix 'Ent'
!hassuffix
Customers | where Occupation !hassuffix 'Ent'
hassuffix_cs
Customers | where Occupation hassuffix_cs 'ent'
!hassuffix_cs
Customers | where Occupation !hassuffix_cs 'ent'
in
Customers |where Education in ('Bachelors','High School')
!in
Customers | where Education !in ('Bachelors','High School')
matches regex
Customers | where FirstName matches regex 'P.*r'
startswith
Customers | where FirstName startswith 'pet'
!startswith
Customers | where FirstName !startswith 'pet'
startswith_cs
Customers | where FirstName startswith_cs 'pet'
!startswith_cs
Customers | where FirstName !startswith_cs 'pet'
base64_encode_tostring()
Customers | project base64_encode_tostring('Kusto1') | take 1
base64_decode_tostring()
Customers | project base64_decode_tostring('S3VzdG8x') | take 1
isempty()
Customers | where isempty(LastName)
isnotempty()
Customers | where isnotempty(LastName)
isnotnull()
Customers | where isnotnull(FirstName)
isnull()
Customers | where isnull(FirstName)
url_decode()
Customers | project url_decode('https%3A%2F%2Fwww.test.com%2Fhello%20word') | take 1
url_encode()
Customers | project url_encode('https://www.test.com/hello word') | take 1
substring()
Customers | project name_abbr = strcat(substring(FirstName,0,3), ' ', substring(LastName,2))
strcat()
Customers | project name = strcat(FirstName, ' ', LastName)
strlen()
Customers | project FirstName, strlen(FirstName)
strrep()
Customers | project strrep(FirstName,2,'_')
toupper()
Customers | project toupper(FirstName)
tolower()
Customers | project tolower(FirstName)
Subquery support for the in operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/in-cs-operator) (the subquery needs to be wrapped in double parentheses)
Customers | where Age in ((Customers|project Age|where Age < 30))
Note: case-insensitive not supported yet
has_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-all-operator)
Customers|where Occupation has_all ('Skilled','abcd')
note: subquery not supported yet
has_any (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-anyoperator)
Customers|where Occupation has_any ('Skilled','abcd')
note: subquery not supported yet
countof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/countoffunction)
Customers | project countof('The cat sat on the mat', 'at')
Customers | project countof('The cat sat on the mat', 'at', 'normal')
Customers | project countof('The cat sat on the mat', 'at', 'regex')
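For readers unfamiliar with the two countof modes, the Python sketch below approximates what the examples above count. It follows the semantics described in the Kusto documentation (plain-string matches may overlap, regex matches do not) and is not the implementation in this PR; the helper name `countof_sketch` is hypothetical.

```python
import re


def countof_sketch(source: str, search: str, kind: str = "normal") -> int:
    """Approximate KQL countof(); hypothetical helper, not the PR's code."""
    if kind == "regex":
        # Regex matches are counted without overlap.
        return sum(1 for _ in re.finditer(search, source))
    if not search:
        # Assumption: an empty search string counts as zero matches.
        return 0
    # Plain-string matches may overlap, so advance one character at a time.
    count = 0
    start = source.find(search)
    while start != -1:
        count += 1
        start = source.find(search, start + 1)
    return count


print(countof_sketch("The cat sat on the mat", "at"))           # 3
print(countof_sketch("The cat sat on the mat", "at", "regex"))  # 3
print(countof_sketch("aaa", "aa"))                              # 2 (overlapping)
print(countof_sketch("aaa", "aa", "regex"))                     # 1 (non-overlapping)
```

The two modes only differ when matches can overlap, which is why the 'normal' and 'regex' examples above give the same count for 'at'.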
extract (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractfunction)
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 0, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 1, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 3, 'The price of PINEAPPLE ice cream is 20')
Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20', typeof(real))
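As a rough guide to what the capture-group argument selects in the extract examples above, here is a Python sketch using the standard `re` module. It is an approximation under stated assumptions, not this PR's implementation: `extract_sketch` is a hypothetical helper, returning `None` for an undefined capture group is an assumption, and the optional converter stands in for `typeof(real)`.

```python
import re
from typing import Callable, Optional


def extract_sketch(regex: str, capture_group: int, source: str,
                   convert: Optional[Callable[[str], object]] = None):
    """Approximate KQL extract(); hypothetical helper for illustration."""
    match = re.search(regex, source)
    # Assumption: no match, or a group the pattern does not define, gives None.
    if match is None or capture_group > match.re.groups:
        return None
    value = match.group(capture_group)
    if value is None:
        return None
    return convert(value) if convert else value


# The KQL literal '\\b' denotes the regex word boundary \b.
pattern = r"(\b[A-Z]+\b).+(\b\d+)"
text = "The price of PINEAPPLE ice cream is 20"

print(extract_sketch(pattern, 0, text))         # PINEAPPLE ice cream is 20
print(extract_sketch(pattern, 1, text))         # PINEAPPLE
print(extract_sketch(pattern, 2, text))         # 20
print(extract_sketch(pattern, 3, text))         # None
print(extract_sketch(pattern, 2, text, float))  # 20.0
```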
extract_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractallfunction)
Customers | project extract_all('(\\w)(\\w+)(\\w)','The price of PINEAPPLE ice cream is 20')
Note: captureGroups not supported yet
split (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/splitfunction)
Customers | project split('aa_bb', '_')
Customers | project split('aaa_bbb_ccc', '_', 1)
Customers | project split('', '_')
Customers | project split('a__b', '_')
Customers | project split('aabbcc', 'bb')
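The split examples above cover the edge cases (empty source, adjacent delimiters, a requested index). A small Python sketch of the behaviour described in the Kusto documentation follows. `split_sketch` is a hypothetical helper, and returning an empty array for an out-of-range index is an assumption rather than something confirmed for this PR.

```python
from typing import List, Optional


def split_sketch(source: str, delimiter: str,
                 requested_index: Optional[int] = None) -> List[str]:
    """Approximate KQL split(); hypothetical helper for illustration."""
    # Note: Python's str.split rejects an empty delimiter (ValueError).
    parts = source.split(delimiter)
    if requested_index is None:
        return parts
    # With an index, the result is still an array holding that one element.
    if 0 <= requested_index < len(parts):
        return [parts[requested_index]]
    # Assumption: an index that does not exist yields an empty array.
    return []


print(split_sketch("aa_bb", "_"))           # ['aa', 'bb']
print(split_sketch("aaa_bbb_ccc", "_", 1))  # ['bbb']
print(split_sketch("", "_"))                # ['']
print(split_sketch("a__b", "_"))            # ['a', '', 'b']
print(split_sketch("aabbcc", "bb"))         # ['aa', 'cc']
```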
strcat_delim (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcat-delimfunction)
Customers | project strcat_delim('-', '1', '2', 'A')
Customers | project strcat_delim('-', '1', '2', strcat('A','b'))
Note: only string arguments are supported for now.
indexof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/indexoffunction)
Customers | project indexof('abcdefg','cde')
Customers | project indexof('abcdefg','cde',2)
Customers | project indexof('abcdefg','cde',6)
Note: length and occurrence not supported yet
strcmp (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcmpfunction)
print strcmp('abc','ABC')
parse_url (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlfunction)
print Result = parse_url('scheme://username:password@www.google.com:1234/this/is/a/path?k1=v1&k2=v2#fragment')
parse_urlquery (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlqueryfunction)
print Result = parse_urlquery('k1=v1&k2=v2&k3=v3')
print operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/printoperator)
print x=1, s=strcat('Hello', ', ', 'World!')
base64_encode_fromguid
print Quine = base64_encode_fromguid('ae3133f2-6e22-49ae-b06a-16e6a9b212eb')
base64_decode_toarray
print base64_decode_toarray('S3VzdG8=')
base64_decode_toguid
print base64_decode_toguid('YWUzMTMzZjItNmUyMi00OWFlLWIwNmEtMTZlNmE5YjIxMmVi')
replace_regex
print replace_regex('Hello, World!', '.', '\\0\\0')
has_any_index
print idx = has_any_index('this is an example', dynamic(['this', 'example']))
translate
print translate('krasp', 'otsku', 'spark')
trim
print trim('--', '--https://bing.com--')
trim_end
print trim_end('.com', 'bing.com')
trim_start
print trim_start('[^\\w]+', strcat('- ','Te st1','// $'))
reverse
print reverse(123)
print reverse(123.34)
print reverse('clickhouse')
print reverse(3h)
print reverse(datetime(2017-1-1 12:23:34))
parse_command_line
print parse_command_line('echo \"hello world!\" print$?', "Windows")
parse_csv
print result=parse_csv('aa,b,cc')
print result_multi_record=parse_csv('record1,a,b,c\nrecord2,x,y,z')
parse_json
print parse_json( dynamic([1, 2, 3]))
print parse_json('{"a":123.5, "b":"{\\"c\\":456}"}')
extract_json
print extract_json( "$.a" , '{"a":123, "b":"{\\"c\\":456}"}' , typeof(int))
parse_version
print parse_version('1')
print parse_version('1.2.3.40')
DateTimeFunctions
ago
print ago(2h)
endofday
print endofday(datetime(2017-01-01 10:10:17), -1)
print endofday(datetime(2017-01-01 10:10:17), 1)
print endofday(datetime(2017-01-01 10:10:17))
endofmonth
print endofmonth(datetime(2017-01-01 10:10:17), -1)
print endofmonth(datetime(2017-01-01 10:10:17), 1)
print endofmonth(datetime(2017-01-01 10:10:17))
endofweek
print endofweek(datetime(2017-01-01 10:10:17), 1)
print endofweek(datetime(2017-01-01 10:10:17), -1)
print endofweek(datetime(2017-01-01 10:10:17))
endofyear
print endofyear(datetime(2017-01-01 10:10:17), -1)
print endofyear(datetime(2017-01-01 10:10:17), 1)
print endofyear(datetime(2017-01-01 10:10:17))
make_datetime
print make_datetime(2017,10,01)
print make_datetime(2017,10,01,12,10)
print make_datetime(2017,10,01,12,11,0.1234567)
datetime_diff
print datetime_diff('year',datetime(2017-01-01),datetime(2000-12-31))
print datetime_diff('quarter',datetime(2017-07-01),datetime(2017-03-30))
print datetime_diff('minute',datetime(2017-10-30 23:05:01),datetime(2017-10-30 23:00:59))
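The datetime_diff examples above are easiest to read as a count of period boundaries crossed, not as elapsed time divided by the period length. The Python sketch below illustrates that reading. `datetime_diff_sketch` is a hypothetical helper covering only a few parts, and whether this PR's implementation matches it in every case is an assumption.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
FIXED_UNITS = {
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "minute": timedelta(minutes=1),
    "second": timedelta(seconds=1),
}


def datetime_diff_sketch(part: str, dt1: datetime, dt2: datetime) -> int:
    """Count period boundaries between dt2 and dt1 (hypothetical helper)."""
    if part == "year":
        return dt1.year - dt2.year
    if part == "quarter":
        return (dt1.year * 4 + (dt1.month - 1) // 3) - (dt2.year * 4 + (dt2.month - 1) // 3)
    if part == "month":
        return (dt1.year * 12 + dt1.month) - (dt2.year * 12 + dt2.month)
    # Fixed-length units: floor both values to the unit, then subtract.
    unit = FIXED_UNITS[part]
    return (dt1 - EPOCH) // unit - (dt2 - EPOCH) // unit


print(datetime_diff_sketch("year", datetime(2017, 1, 1), datetime(2000, 12, 31)))    # 17
print(datetime_diff_sketch("quarter", datetime(2017, 7, 1), datetime(2017, 3, 30)))  # 2
print(datetime_diff_sketch("minute", datetime(2017, 10, 30, 23, 5, 1),
                           datetime(2017, 10, 30, 23, 0, 59)))                       # 5
```

Under this reading the 'year' example gives 17 even though the two dates are only one day more than 16 years apart.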
unixtime_microseconds_todatetime
print unixtime_microseconds_todatetime(1546300800000000)
unixtime_milliseconds_todatetime
print unixtime_milliseconds_todatetime(1546300800000)
unixtime_nanoseconds_todatetime
print unixtime_nanoseconds_todatetime(1546300800000000000)
datetime_part
print datetime_part('day', datetime(2017-10-30 01:02:03.7654321))
datetime_add
print datetime_add('day',1,datetime(2017-10-30 01:02:03.7654321))
format_timespan
print format_timespan(time(1d), 'd-[hh:mm:ss]')
print format_timespan(time('12:30:55.123'), 'ddddd-[hh:mm:ss.ffff]')
format_datetime
print format_datetime(todatetime('2009-06-15T13:45:30.6175425'), 'yy-M-dd [H:mm:ss.fff]')
print format_datetime(datetime(2015-12-14 02:03:04.12345), 'y-M-d h:m:s tt')
todatetime
print todatetime('2014-05-25T08:20:03.123456Z')
print todatetime('2014-05-25 20:03.123')
totimespan (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/totimespanfunction)
print totimespan('0.01:34:23')
print totimespan(1d)
startofyear
print startofyear(datetime(2017-01-01 10:10:17), -1)
print startofyear(datetime(2017-01-01 10:10:17), 0)
print startofyear(datetime(2017-01-01 10:10:17), 1)
week_of_year
print week_of_year(datetime(2020-12-31))
print week_of_year(datetime(2020-06-15))
print week_of_year(datetime(1970-01-01))
print week_of_year(datetime(2000-01-01))
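The dates in the week_of_year examples above sit at year boundaries, where ISO 8601 week numbering is least intuitive. The Kusto documentation describes week_of_year as ISO 8601 based; assuming this PR follows that, Python's `isocalendar()` gives a quick cross-check. `week_of_year_sketch` is a hypothetical helper.

```python
from datetime import date


def week_of_year_sketch(d: date) -> int:
    """ISO 8601 week number; assumed to match KQL week_of_year()."""
    return d.isocalendar()[1]


print(week_of_year_sketch(date(2020, 12, 31)))  # 53
print(week_of_year_sketch(date(2020, 6, 15)))   # 25
print(week_of_year_sketch(date(1970, 1, 1)))    # 1
print(week_of_year_sketch(date(2000, 1, 1)))    # 52 (belongs to the last week of 1999)
```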
startofweek
print startofweek(datetime(2017-01-01 10:10:17), -1)
print startofweek(datetime(2017-01-01 10:10:17), 0)
print startofweek(datetime(2017-01-01 10:10:17), 1)
startofmonth
print startofmonth(datetime(2017-01-01 10:10:17), -1)
print startofmonth(datetime(2017-01-01 10:10:17), 0)
print startofmonth(datetime(2017-01-01 10:10:17), 1)
startofday
print startofday(datetime(2017-01-01 10:10:17), -1)
print startofday(datetime(2017-01-01 10:10:17), 0)
print startofday(datetime(2017-01-01 10:10:17), 1)
monthofyear
print monthofyear(datetime("2015-12-14"))
hourofday
print hourofday(datetime(2015-12-14 18:54:00))
getyear
print getyear(datetime(2015-10-12))
getmonth
print getmonth(datetime(2015-10-12))
dayofyear
print dayofyear(datetime(2015-12-14))
dayofmonth
print dayofmonth(datetime(2015-12-14))
unixtime_seconds_todatetime
print unixtime_seconds_todatetime(1546300800)
dayofweek
print dayofweek(datetime(2015-12-20))
now
print now()
print now(2d)
print now(-2h)
print now(5microseconds)
print now(5seconds)
print now(6minutes)
print now(-2d)
print now(time(1d))
Miscellaneous functions
print isnan(double(nan)) == true
print isnan(4.2) == false
print isnan(4) == false
print isnan(real(+inf)) == false
The dialect setting can be changed through configuration.
Set the dialect in users.xml. This sets the dialect at server startup, and ClickHouse parses queries for all users with the default profile according to the dialect value. For example:
<profiles>
    <!-- Default settings. -->
    <default>
        <load_balancing>random</load_balancing>
        <dialect>kusto_auto</dialect>
    </default>
</profiles>
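The ClientBase.cpp change under review in this PR handles kusto_auto by trying the ClickHouse SQL parser first and, if that fails, retrying with the KQL parser from the start of the query. The Python sketch below shows only that control flow. The two toy parsers and all function names are hypothetical stand-ins, not the real parsers.

```python
from typing import Callable, Optional, Tuple

Ast = Tuple[str, str]
Parser = Callable[[str], Optional[Ast]]


def toy_sql_parser(query: str) -> Optional[Ast]:
    # Stand-in: accepts only text that starts with SELECT.
    return ("sql", query) if query.lstrip().upper().startswith("SELECT") else None


def toy_kql_parser(query: str) -> Optional[Ast]:
    # Stand-in: accepts pipelines and print statements.
    text = query.lstrip()
    return ("kql", query) if "|" in text or text.startswith("print ") else None


def parse_kusto_auto(query: str, sql_parser: Parser, kql_parser: Parser) -> Optional[Ast]:
    """Try SQL first; on failure, restart from the beginning with KQL."""
    ast = sql_parser(query)
    if ast is None:
        ast = kql_parser(query)
    return ast


print(parse_kusto_auto("SELECT 1", toy_sql_parser, toy_kql_parser))
print(parse_kusto_auto("Customers | take 1", toy_sql_parser, toy_kql_parser))
```

One consequence of this ordering is that any text valid in both dialects is treated as SQL.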
Once the dialect is set in users.xml, queries can be executed with an HTTP client as below:
echo "KQL query" | curl -sS "http://localhost:8123/?" --data-binary @-
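For scripted use, the same request can be made from Python's standard library. This sketch assumes the dialect can also be passed per request as a URL parameter, as other ClickHouse settings can over the HTTP interface; that has not been verified against this PR. The function names are hypothetical, and the host and port are the ones from the curl example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_kql_request(query: str, host: str = "localhost", port: int = 8123,
                      dialect: str = "kusto_auto") -> Request:
    """Build a POST request carrying the query text in the body."""
    # Assumption: the dialect setting is accepted as a URL parameter.
    params = urlencode({"dialect": dialect})
    return Request(f"http://{host}:{port}/?{params}",
                   data=query.encode("utf-8"), method="POST")


def run_kql(query: str, **kwargs) -> str:
    """Send the query to a running server and return the raw response text."""
    with urlopen(build_kql_request(query, **kwargs)) as response:
        return response.read().decode("utf-8")


request = build_kql_request("print x=1")
print(request.full_url)  # http://localhost:8123/?dialect=kusto_auto
```

Calling `run_kql("print x=1")` requires a reachable server; building the request does not.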
To execute the query using clickhouse-client, update clickhouse-client.xml as below and connect clickhouse-client with the --config-file option (clickhouse-client --config-file=<config-file path>):
<config>
    <dialect>kusto_auto</dialect>
</config>
Or pass the dialect setting with '--'. For example:
clickhouse-client --dialect='kusto_auto' -q "KQL query"
double quote support
print res = strcat("double ","quote")