Support Kusto Query Language dialect - phase 2 #42510

Merged
merged 260 commits into from Oct 11, 2023

@larryluogit (Contributor) commented Oct 19, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This is the second part of Kusto Query Language dialect support.
Phase 1 implementation has been merged.

Implemented KQL Features

sql_dialect

  • default is clickhouse
    set sql_dialect='clickhouse'
  • only process kql
    set sql_dialect='kusto'

KQL() function

  • create table
    CREATE TABLE kql_table4 ENGINE = Memory AS select *, now() as new_column From kql(Customers | project LastName,Age);
    verify the content of kql_table4
    select * from kql_table4

  • insert into table
    create a tmp table:

    CREATE TABLE temp
    (    
        FirstName Nullable(String),
        LastName String, 
        Age Nullable(UInt8)
    ) ENGINE = Memory;
    

    INSERT INTO temp select * from kql(Customers|project FirstName,LastName,Age);
    verify the content of temp
    select * from temp

  • Select from kql()
    Select * from kql(Customers|project FirstName)

KQL operators:

  • Tabular expression statements
    Customers
  • Select Column
    Customers | project FirstName,LastName,Occupation
  • Limit returned results
    Customers | project FirstName,LastName,Occupation | take 1 | take 3
  • sort, order
    Customers | order by Age desc , FirstName asc
  • Filter
    Customers | where Occupation == 'Skilled Manual'
  • summarize
    Customers |summarize max(Age) by Occupation
  • distinct
    Customers | distinct *
    Customers | distinct Occupation
    Customers | distinct Occupation, Education
    Customers | where Age <30 | distinct Occupation, Education
    Customers | where Age <30 | order by Age| distinct Occupation, Education
  • extend
    T | extend duration = endTime - startTime
    T | project endTime, startTime | extend duration = endTime - startTime
  • make-series
    T | make-series PriceAvg = avg(Price) default=0 on Purchase from datetime(2016-09-10) to datetime(2016-09-13) step 1d by Supplier, Fruit
  • mv-expand
    T | mv-expand c
    T | mv-expand c, d
    T | mv-expand b | mv-expand c
    T | mv-expand c to typeof(bool)
    T | mv-expand with_itemindex=index b, c, d
    T | mv-expand array_concat(c,d)
    T | mv-expand x = c, y = d
    T | mv-expand xy = array_concat(c, d)
    T | mv-expand with_itemindex=index c,d to typeof(bool)

Aggregate Functions:

  • arg_max()
  • arg_min()
  • avg()
  • avgif()
  • count()
  • countif()
  • max()
  • maxif()
  • min()
  • minif()
  • sum()
  • sumif()
  • dcount()
  • dcountif()
  • make_list()
    Customers | summarize t = make_list(FirstName) by FirstName
    Customers | summarize t = make_list(FirstName, 10) by FirstName
  • make_list_if()
    Customers | summarize t = make_list_if(FirstName, Age > 10) by FirstName
    Customers | summarize t = make_list_if(FirstName, Age > 10, 10) by FirstName
  • make_list_with_nulls()
    Customers | summarize t = make_list_with_nulls(Age) by FirstName
  • make_set()
    Customers | summarize t = make_set(FirstName) by FirstName
    Customers | summarize t = make_set(FirstName, 10) by FirstName
  • make_set_if()
    Customers | summarize t = make_set_if(FirstName, Age > 10) by FirstName
    Customers | summarize t = make_set_if(FirstName, Age > 10, 10) by FirstName
  • bin_at
    print res = bin_at(6.5, 2.5, 7)
    print res = bin_at(1h, 1d, 12h)
    print res = bin_at(datetime(2017-05-15 10:20:00.0), 1d, datetime(1970-01-01 12:00:00.0))
    print res = bin_at(datetime(2017-05-17 10:20:00.0), 7d, datetime(2017-06-04 00:00:00.0))
  • array_index_of
    Supports only basic lookup; start_index, length and occurrence are not supported.
    print output = array_index_of(dynamic(['John', 'Denver', 'Bob', 'Marley']), 'Marley')
    print output = array_index_of(dynamic([1, 2, 3]), 2)
  • array_sum
    print output = array_sum(dynamic([2, 5, 3]))
    print output = array_sum(dynamic([2.5, 5.5, 3]))
  • array_length
    print output = array_length(dynamic(['John', 'Denver', 'Bob', 'Marley']))
    print output = array_length(dynamic([1, 2, 3]))
  • bin
    print bin(4.5, 1)
    print bin(time(16d), 7d)
    print bin(datetime(1970-05-11 13:45:07), 1d)
  • stdev
    Customers | summarize t = stdev(Age) by FirstName
  • stdevif
    Customers | summarize t = stdevif(Age, Age < 10) by FirstName
  • binary_all_and
    Customers | summarize t = binary_all_and(Age) by FirstName
  • binary_all_or
    Customers | summarize t = binary_all_or(Age) by FirstName
  • binary_all_xor
    Customers | summarize t = binary_all_xor(Age) by FirstName
  • percentiles
    Customers | summarize percentiles(Age, 30, 40, 50, 60, 70) by FirstName
  • percentilesw
    DataTable | summarize t = percentilesw(Bucket, Frequency, 50, 75, 99.9)
  • percentile
    Customers | summarize t = percentile(Age, 50) by FirstName
  • percentilew
    DataTable | summarize t = percentilew(Bucket, Frequency, 50)
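
The bin and bin_at examples above round a value down to a bin boundary. The following Python sketch reproduces that arithmetic for numbers, timespans and datetimes; it illustrates the expected results only and is not the ClickHouse implementation. The names `bin_at` and `bin_` are local helpers, and `bin_` is assumed to align bins to zero.

```python
import math
from datetime import datetime, timedelta


def bin_at(value, bin_size, fixed_point):
    """Round value down to a bin boundary aligned to fixed_point."""
    # Whole bins between the fixed point and the value (negative when value is below it).
    steps = math.floor((value - fixed_point) / bin_size)
    return fixed_point + steps * bin_size


def bin_(value, bin_size):
    """Round a number or timedelta down to a multiple of bin_size (bins aligned to zero)."""
    return math.floor(value / bin_size) * bin_size


print(bin_at(6.5, 2.5, 7))
print(bin_at(timedelta(hours=1), timedelta(days=1), timedelta(hours=12)))
print(bin_at(datetime(2017, 5, 15, 10, 20), timedelta(days=1), datetime(1970, 1, 1, 12)))
```

Because Python's `timedelta` and `datetime` support subtraction and division, one formula covers all three argument kinds used in the examples.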

Array functions

Please note that only arrays of the same type are supported in our current implementation. The underlying reasons are explained under the section of the dynamic data type.

  • array_reverse
    print array_reverse(dynamic(["this", "is", "an", "example"])) == dynamic(["example","an","is","this"])

  • array_rotate_left
    print array_rotate_left(dynamic([1,2,3,4,5]), 2) == dynamic([3,4,5,1,2])
    print array_rotate_left(dynamic([1,2,3,4,5]), -2) == dynamic([4,5,1,2,3])

  • array_rotate_right
    print array_rotate_right(dynamic([1,2,3,4,5]), -2) == dynamic([3,4,5,1,2])
    print array_rotate_right(dynamic([1,2,3,4,5]), 2) == dynamic([4,5,1,2,3])

  • array_shift_left
    print array_shift_left(dynamic([1,2,3,4,5]), 2) == dynamic([3,4,5,null,null])
    print array_shift_left(dynamic([1,2,3,4,5]), -2) == dynamic([null,null,1,2,3])
    print array_shift_left(dynamic([1,2,3,4,5]), 2, -1) == dynamic([3,4,5,-1,-1])
    print array_shift_left(dynamic(['a', 'b', 'c']), 2) == dynamic(['c','',''])

  • array_shift_right
    print array_shift_right(dynamic([1,2,3,4,5]), -2) == dynamic([3,4,5,null,null])
    print array_shift_right(dynamic([1,2,3,4,5]), 2) == dynamic([null,null,1,2,3])
    print array_shift_right(dynamic([1,2,3,4,5]), -2, -1) == dynamic([3,4,5,-1,-1])
    print array_shift_right(dynamic(['a', 'b', 'c']), -2) == dynamic(['c','',''])

  • pack_array
    print x = 1, y = x * 2, z = y * 2, pack_array(x,y,z)

    Please note that only arrays of elements of the same type may be created at this time. The underlying reasons are explained under the release note section of the dynamic data type.

  • repeat
    print repeat(1, 0) == dynamic([])
    print repeat(1, 3) == dynamic([1, 1, 1])
    print repeat("asd", 3) == dynamic(['asd', 'asd', 'asd'])
    print repeat(timespan(1d), 3) == dynamic([86400, 86400, 86400])
    print repeat(true, 3) == dynamic([true, true, true])

  • zip
    print zip(dynamic([1,3,5]), dynamic([2,4,6]))

  • array_sort_asc
    Only constant dynamic arrays are supported.
    The result is an array, so all elements of the input must be of the same data type.
    print t = array_sort_asc(dynamic([null, 'd', 'a', 'c', 'c']))
    print t = array_sort_asc(dynamic([4, 1, 3, 2]))
    print t = array_sort_asc(dynamic(['b', 'a', 'c']), dynamic(['q', 'p', 'r']))
    print t = array_sort_asc(dynamic(['q', 'p', 'r']), dynamic(['clickhouse','hello', 'world']))
    print t = array_sort_asc( dynamic(['d', null, 'a', 'c', 'c']) , false)
    print t = array_sort_asc( dynamic(['d', null, 'a', 'c', 'c']) , 1 > 2)
    print t = array_sort_asc( dynamic([null, 'd', null, null, 'a', 'c', 'c', null, null, null]) , false)
    print t = array_sort_asc( dynamic([null, null, null]) , false)
    print t = array_sort_asc(dynamic([2, 1, null,3]), dynamic([20, 10, 40, 30]), 1 > 2)
    print t = array_sort_asc(dynamic([2, 1, null,3]), dynamic([20, 10, 40, 30, 50, 3]), 1 > 2)

  • array_sort_desc (only constant dynamic arrays are supported)

    print t = array_sort_desc(dynamic([null, 'd', 'a', 'c', 'c']))
    print t = array_sort_desc(dynamic([4, 1, 3, 2]))
    print t = array_sort_desc(dynamic(['b', 'a', 'c']), dynamic(['q', 'p', 'r']))
    print t = array_sort_desc(dynamic(['q', 'p', 'r']), dynamic(['clickhouse','hello', 'world']))
    print t = array_sort_desc( dynamic(['d', null, 'a', 'c', 'c']) , false)
    print t = array_sort_desc( dynamic(['d', null, 'a', 'c', 'c']) , 1 > 2)
    print t = array_sort_desc( dynamic([null, 'd', null, null, 'a', 'c', 'c', null, null, null]) , false)
    print t = array_sort_desc( dynamic([null, null, null]) , false)
    print t = array_sort_desc(dynamic([2, 1, null, 3]), dynamic([20, 10, 40, 30]), 1 > 2)
    print t = array_sort_desc(dynamic([2, 1, null,3, null]), dynamic([20, 10, 40, 30, 50, 3]), 1 > 2)

  • array_concat
    print array_concat(dynamic([1, 2, 3]), dynamic([4, 5]), dynamic([6, 7, 8, 9])) == dynamic([1, 2, 3, 4, 5, 6, 7, 8, 9])

  • array_iff / array_iif
    print array_iif(dynamic([true, false, true]), dynamic([1, 2, 3]), dynamic([4, 5, 6])) == dynamic([1, 5, 3])
    print array_iif(dynamic([true, false, true]), dynamic([1, 2, 3, 4]), dynamic([4, 5, 6])) == dynamic([1, 5, 3])
    print array_iif(dynamic([true, false, true, false]), dynamic([1, 2, 3, 4]), dynamic([4, 5, 6])) == dynamic([1, 5, 3, null])
    print array_iif(dynamic([1, 0, -1, 44, 0]), dynamic([1, 2, 3, 4]), dynamic([4, 5, 6])) == dynamic([1, 5, 3, 4, null])

  • array_slice
    print array_slice(dynamic([1,2,3]), 1, 2) == dynamic([2, 3])
    print array_slice(dynamic([1,2,3,4,5]), 2, -1) == dynamic([3, 4, 5])
    print array_slice(dynamic([1,2,3,4,5]), -3, -2) == dynamic([3, 4])

  • array_split
    print array_split(dynamic([1,2,3,4,5]), 2) == dynamic([[1,2],[3,4,5]])
    print array_split(dynamic([1,2,3,4,5]), dynamic([1,3])) == dynamic([[1],[2,3],[4,5]])
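
The rotate, shift and slice examples above can be cross-checked with a small Python model. This is a sketch of the semantics implied by the listed results, not the code from this PR. It uses `None` as the default fill value, whereas the string examples above show `''` as the default fill for string arrays.

```python
def array_rotate_left(arr, n):
    """Rotate left by n positions; a negative n rotates right."""
    if not arr:
        return []
    k = n % len(arr)
    return arr[k:] + arr[:k]


def array_rotate_right(arr, n):
    return array_rotate_left(arr, -n)


def array_shift_left(arr, n, fill=None):
    """Shift left by n positions, padding the vacated slots with fill."""
    size = len(arr)
    if n >= 0:
        kept = arr[n:]
        return kept + [fill] * (size - len(kept))
    # Negative n shifts right: drop elements from the end, pad at the front.
    kept = arr[:n] if -n < size else []
    return [fill] * (size - len(kept)) + kept


def array_shift_right(arr, n, fill=None):
    return array_shift_left(arr, -n, fill)


def array_slice(arr, start, end):
    """Slice with inclusive start and end; negative indexes count from the end."""
    size = len(arr)
    if start < 0:
        start = max(size + start, 0)
    if end < 0:
        end = max(size + end, -1)
    return arr[start:end + 1]


print(array_rotate_left([1, 2, 3, 4, 5], 2))
print(array_shift_left([1, 2, 3, 4, 5], 2, -1))
print(array_slice([1, 2, 3, 4, 5], -3, -2))
```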

Data types

  • dynamic
    print isnull(dynamic(null))
    print dynamic(1) == 1
    print dynamic(timespan(1d)) == 86400
    print dynamic([1, 2, 3])
    print dynamic([[1], [2], [3]])
    print dynamic(['a', "b", 'c'])

    According to the KQL specifications dynamic is a literal, which means that no function calls are permitted. Expressions producing literals such as datetime and timespan and their aliases (i.e. date and time, respectively) along with nested dynamic literals are allowed.
    Please note that our current implementation supports only scalars and arrays made up of elements of the same type.

  • bool,boolean
    print bool(1)
    print boolean(0)

  • datetime
    print datetime(2015-12-31 23:59:59.9)
    print datetime('2015-12-31 23:59:59.9')
    print datetime("2015-12-31")

  • guid
    print guid(74be27de-1e4e-49d9-b579-fe0b331d3642)
    print guid('74be27de-1e4e-49d9-b579-fe0b331d3642')
    print guid('74be27de1e4e49d9b579fe0b331d3642')

  • int
    print int(1)

  • long
    print long(16)

  • real
    print real(1)

  • timespan ,time
    Note that timespan values are used in datetime arithmetic, so the output is in seconds, e.g. time(1h) = 3600
    print 1d
    print 30m
    print time('0.12:34:56.7')
    print time(2h)
    print timespan(2h)
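
Since timespans are reported in seconds, the expected output of the literals above can be computed by hand. The Python sketch below does this for the short form (`1d`, `30m`, `2h`) and the `d.hh:mm:ss.f` form; `timespan_seconds` is a hypothetical helper written for this illustration and handles only these two forms.

```python
import re
from datetime import timedelta

_UNIT_ARGS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
_SHORT = re.compile(r"(\d+)([dhms])")
_LONG = re.compile(r"(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2}(?:\.\d+)?)")


def timespan_seconds(literal):
    """Convert a timespan literal such as '2h' or '0.12:34:56.7' to seconds."""
    short = _SHORT.fullmatch(literal)
    if short:
        amount, unit = short.groups()
        return timedelta(**{_UNIT_ARGS[unit]: int(amount)}).total_seconds()
    long_form = _LONG.fullmatch(literal)
    if long_form:
        days, hours, minutes, seconds = long_form.groups()
        return timedelta(
            days=int(days or 0),
            hours=int(hours),
            minutes=int(minutes),
            seconds=float(seconds),
        ).total_seconds()
    raise ValueError(f"unsupported timespan literal: {literal!r}")


print(timespan_seconds("1d"))
print(timespan_seconds("0.12:34:56.7"))
```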

Data Type Conversion

  • tobool / toboolean
    print tobool(true) == true
    print toboolean(false) == false
    print tobool(0) == false
    print toboolean(19819823) == true
    print tobool(-2) == true
    print isnull(toboolean('a'))
    print tobool('true') == true
    print toboolean('false') == false

  • todouble / toreal
    print todouble(4) == 4
    print toreal(4.2) == 4.2
    print isnull(todouble('a'))
    print toreal('-0.3') == -0.3

  • toint
    print isnull(toint('a'))
    print toint(4) == 4
    print toint('4') == 4
    print isnull(toint(4.2))

  • tostring
    print tostring(123) == '123'
    print tostring('asd') == 'asd'
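
The conversion examples above return null when a value cannot be converted. A minimal Python model of `tobool` and `toint` that matches the listed cases is sketched below, with `None` standing in for null. Behaviour for inputs not shown in the examples (for instance the string `'1'` passed to `tobool`) is an assumption of this sketch.

```python
def tobool(value):
    """Model of tobool(): non-zero numbers are true, unparsable strings give None."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    # Assumption: anything else is treated as not convertible.
    return None


def toint(value):
    """Model of toint(): fractional numbers and unparsable strings give None."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value) if value.is_integer() else None
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return None
    return None


print(tobool(-2), tobool("a"))
print(toint("4"), toint(4.2))
```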

Set functions

  • jaccard_index
    print jaccard_index(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3, 4, 4, 4])) == 0.75
    print jaccard_index(dynamic([1, 2, 3]), dynamic([])) == 0
    print jaccard_index(dynamic([]), dynamic([1, 2, 3, 4])) == 0
    print isnan(jaccard_index(dynamic([]), dynamic([])))
    print jaccard_index(dynamic([1, 2, 3]), dynamic([4, 5, 6, 7])) == 0
    print jaccard_index(dynamic(['a', 's', 'd']), dynamic(['f', 'd', 's', 'a'])) == 0.75
    print jaccard_index(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])) == 0.25

  • set_difference
    print set_difference(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3])) == dynamic([])
    print array_sort_asc(set_difference(dynamic([1, 4, 2, 3, 5, 4, 6]), dynamic([1, 2, 3])))[1] == dynamic([4, 5, 6])
    print set_difference(dynamic([4]), dynamic([1, 2, 3])) == dynamic([4])
    print array_sort_asc(set_difference(dynamic([1, 2, 3, 4, 5]), dynamic([5]), dynamic([2, 4])))[1] == dynamic([1, 3])
    print array_sort_asc(set_difference(dynamic([1, 2, 3]), dynamic([])))[1] == dynamic([1, 2, 3])
    print array_sort_asc(set_difference(dynamic(['a', 's', 'd']), dynamic(['a', 'f'])))[1] == dynamic(['d', 's'])
    print array_sort_asc(set_difference(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])))[1] == dynamic(['Chewbacca', 'Han Solo'])

  • set_has_element
    print set_has_element(dynamic(["this", "is", "an", "example"]), "example") == true
    print set_has_element(dynamic(["this", "is", "an", "example"]), "examplee") == false
    print set_has_element(dynamic([1, 2, 3]), 2) == true
    print set_has_element(dynamic([1, 2, 3, 4.2]), 4) == false

  • set_intersect
    print array_sort_asc(set_intersect(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3])
    print array_sort_asc(set_intersect(dynamic([1, 4, 2, 3, 5, 4, 6]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3])
    print set_intersect(dynamic([4]), dynamic([1, 2, 3])) == dynamic([])
    print set_intersect(dynamic([1, 2, 3, 4, 5]), dynamic([1, 3, 5]), dynamic([2, 5])) == dynamic([5])
    print set_intersect(dynamic([1, 2, 3]), dynamic([])) == dynamic([])
    print set_intersect(dynamic(['a', 's', 'd']), dynamic(['a', 'f'])) == dynamic(['a'])
    print set_intersect(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])) == dynamic(['Darth Vader'])

  • set_union
    print array_sort_asc(set_union(dynamic([1, 1, 2, 2, 3, 3]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3])
    print array_sort_asc(set_union(dynamic([1, 4, 2, 3, 5, 4, 6]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3, 4, 5, 6])
    print array_sort_asc(set_union(dynamic([4]), dynamic([1, 2, 3])))[1] == dynamic([1, 2, 3, 4])
    print array_sort_asc(set_union(dynamic([1, 3, 4]), dynamic([5]), dynamic([2, 4])))[1] == dynamic([1, 2, 3, 4, 5])
    print array_sort_asc(set_union(dynamic([1, 2, 3]), dynamic([])))[1] == dynamic([1, 2, 3])
    print array_sort_asc(set_union(dynamic(['a', 's', 'd']), dynamic(['a', 'f'])))[1] == dynamic(['a', 'd', 'f', 's'])
    print array_sort_asc(set_union(dynamic(['Chewbacca', 'Darth Vader', 'Han Solo']), dynamic(['Darth Sidious', 'Darth Vader'])))[1] == dynamic(['Chewbacca', 'Darth Sidious', 'Darth Vader', 'Han Solo'])
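
The set function results above follow from ordinary set arithmetic once duplicates are removed. The Python sketch below reproduces `jaccard_index` and `set_difference`; it models the expected values and says nothing about how the functions are implemented in this PR. The results are sorted here only to make comparison deterministic.

```python
import math


def jaccard_index(left, right):
    """Size of the intersection divided by size of the union; NaN when both are empty."""
    a, b = set(left), set(right)
    union = a | b
    if not union:
        return math.nan
    return len(a & b) / len(union)


def set_difference(first, *others):
    """Distinct elements of first that appear in none of the other arrays."""
    remaining = set(first)
    for other in others:
        remaining -= set(other)
    return sorted(remaining)


print(jaccard_index([1, 1, 2, 2, 3, 3], [1, 2, 3, 4, 4, 4]))
print(set_difference([1, 2, 3, 4, 5], [5], [2, 4]))
```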

Binary functions

IP functions

  • format_ipv4
    print format_ipv4('192.168.1.255', 24) == '192.168.1.0'
    print format_ipv4(3232236031, 24) == '192.168.1.0'
  • format_ipv4_mask
    print format_ipv4_mask('192.168.1.255', 24) == '192.168.1.0/24'
    print format_ipv4_mask(3232236031, 24) == '192.168.1.0/24'
  • ipv4_compare
    print ipv4_compare('127.0.0.1', '127.0.0.1') == 0
    print ipv4_compare('192.168.1.1', '192.168.1.255') < 0
    print ipv4_compare('192.168.1.1/24', '192.168.1.255/24') == 0
    print ipv4_compare('192.168.1.1', '192.168.1.255', 24) == 0
  • ipv4_is_match
    print ipv4_is_match('127.0.0.1', '127.0.0.1') == true
    print ipv4_is_match('192.168.1.1', '192.168.1.255') == false
    print ipv4_is_match('192.168.1.1/24', '192.168.1.255/24') == true
    print ipv4_is_match('192.168.1.1', '192.168.1.255', 24) == true
  • ipv6_compare
    print ipv6_compare('::ffff:7f00:1', '127.0.0.1') == 0
    print ipv6_compare('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995') < 0
    print ipv6_compare('192.168.1.1/24', '192.168.1.255/24') == 0
    print ipv6_compare('fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7995/127') == 0
    print ipv6_compare('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995', 127) == 0
  • ipv6_is_match
    print ipv6_is_match('::ffff:7f00:1', '127.0.0.1') == true
    print ipv6_is_match('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995') == false
    print ipv6_is_match('192.168.1.1/24', '192.168.1.255/24') == true
    print ipv6_is_match('fe80::85d:e82c:9446:7994/127', 'fe80::85d:e82c:9446:7995/127') == true
    print ipv6_is_match('fe80::85d:e82c:9446:7994', 'fe80::85d:e82c:9446:7995', 127) == true
  • parse_ipv4_mask
    print parse_ipv4_mask('127.0.0.1', 24) == 2130706432
    print parse_ipv4_mask('192.1.168.2', 31) == 3221334018
    print parse_ipv4_mask('192.1.168.3', 31) == 3221334018
    print parse_ipv4_mask('127.2.3.4', 32) == 2130838276
  • parse_ipv6_mask
    print parse_ipv6_mask('127.0.0.1', 24) == '0000:0000:0000:0000:0000:ffff:7f00:0000'
    print parse_ipv6_mask('fe80::85d:e82c:9446:7994', 120) == 'fe80:0000:0000:0000:085d:e82c:9446:7900'
  • parse_ipv4
    Customers | project parse_ipv4('127.0.0.1')
  • parse_ipv6
    Customers | project parse_ipv6('127.0.0.1')
  • ipv4_is_private
  • ipv4_is_in_range
  • ipv4_netmask_suffix
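
The IPv4 examples above all reduce to applying a prefix mask to a 32-bit integer. The following Python sketch shows that arithmetic with the standard `ipaddress` module. The helper names `_split` and `_mask` are local to this sketch, and taking the smallest of the available prefix lengths in `ipv4_compare` is an assumption that happens to match the listed examples.

```python
import ipaddress


def _split(value):
    """Return (address as int, prefix length) for an int, '1.2.3.4' or '1.2.3.4/24'."""
    if isinstance(value, int):
        return value, 32
    text, _, prefix = value.partition("/")
    return int(ipaddress.IPv4Address(text)), (int(prefix) if prefix else 32)


def _mask(prefix):
    """32-bit netmask with the top `prefix` bits set."""
    return (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF


def parse_ipv4_mask(ip, prefix):
    address, _ = _split(ip)
    return address & _mask(prefix)


def format_ipv4(ip, prefix=32):
    return str(ipaddress.IPv4Address(parse_ipv4_mask(ip, prefix)))


def format_ipv4_mask(ip, prefix=32):
    return f"{format_ipv4(ip, prefix)}/{prefix}"


def ipv4_compare(left, right, prefix=32):
    """Return -1, 0 or 1 after masking both addresses with the smallest prefix."""
    l_addr, l_prefix = _split(left)
    r_addr, r_prefix = _split(right)
    mask = _mask(min(l_prefix, r_prefix, prefix))
    l_masked, r_masked = l_addr & mask, r_addr & mask
    return (l_masked > r_masked) - (l_masked < r_masked)


print(format_ipv4_mask("192.168.1.255", 24))
print(parse_ipv4_mask("192.1.168.3", 31))
```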

KQL string operators and functions

  • contains
    Customers |where Education contains 'degree'

  • !contains
    Customers |where Education !contains 'degree'

  • contains_cs
    Customers |where Education contains_cs 'Degree'

  • !contains_cs
    Customers |where Education !contains_cs 'Degree'

  • endswith
    Customers | where FirstName endswith 'RE'

  • !endswith
    Customers | where FirstName !endswith 'RE'

  • endswith_cs
    Customers | where FirstName endswith_cs 're'

  • !endswith_cs
    Customers | where FirstName !endswith_cs 're'

  • ==
    Customers | where Occupation == 'Skilled Manual'

  • !=
    Customers | where Occupation != 'Skilled Manual'

  • has
    Customers | where Occupation has 'skilled'

  • !has
    Customers | where Occupation !has 'skilled'

  • has_cs
    Customers | where Occupation has_cs 'Skilled'

  • !has_cs
    Customers | where Occupation !has_cs 'Skilled'

  • hasprefix
    Customers | where Occupation hasprefix 'Ab'

  • !hasprefix
    Customers | where Occupation !hasprefix 'Ab'

  • hasprefix_cs
    Customers | where Occupation hasprefix_cs 'ab'

  • !hasprefix_cs
    Customers | where Occupation !hasprefix_cs 'ab'

  • hassuffix
    Customers | where Occupation hassuffix 'Ent'

  • !hassuffix
    Customers | where Occupation !hassuffix 'Ent'

  • hassuffix_cs
    Customers | where Occupation hassuffix_cs 'ent'

  • !hassuffix_cs
    Customers | where Occupation !hassuffix_cs 'ent'

  • in
    Customers |where Education in ('Bachelors','High School')

  • !in
    Customers | where Education !in ('Bachelors','High School')

  • matches regex
    Customers | where FirstName matches regex 'P.*r'

  • startswith
    Customers | where FirstName startswith 'pet'

  • !startswith
    Customers | where FirstName !startswith 'pet'

  • startswith_cs
    Customers | where FirstName startswith_cs 'pet'

  • !startswith_cs
    Customers | where FirstName !startswith_cs 'pet'

  • base64_encode_tostring()
    Customers | project base64_encode_tostring('Kusto1') | take 1

  • base64_decode_tostring()
    Customers | project base64_decode_tostring('S3VzdG8x') | take 1

  • isempty()
    Customers | where isempty(LastName)

  • isnotempty()
    Customers | where isnotempty(LastName)

  • isnotnull()
    Customers | where isnotnull(FirstName)

  • isnull()
    Customers | where isnull(FirstName)

  • url_decode()
    Customers | project url_decode('https%3A%2F%2Fwww.test.com%2Fhello%20word') | take 1

  • url_encode()
    Customers | project url_encode('https://www.test.com/hello word') | take 1

  • substring()
    Customers | project name_abbr = strcat(substring(FirstName,0,3), ' ', substring(LastName,2))

  • strcat()
    Customers | project name = strcat(FirstName, ' ', LastName)

  • strlen()
    Customers | project FirstName, strlen(FirstName)

  • strrep()
    Customers | project strrep(FirstName,2,'_')

  • toupper()
    Customers | project toupper(FirstName)

  • tolower()
    Customers | project tolower(FirstName)

  • subquery support for the in operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/in-cs-operator)
    (the subquery needs to be wrapped in an extra pair of brackets)

    Customers | where Age in ((Customers|project Age|where Age < 30))
    Note: the case-insensitive variant is not supported yet

  • has_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-all-operator)
    Customers|where Occupation has_all ('Skilled','abcd')
    note: subquery not supported yet

  • has_any (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/has-anyoperator)
    Customers|where Occupation has_any ('Skilled','abcd')
    note: subquery not supported yet

  • countof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/countoffunction)
    Customers | project countof('The cat sat on the mat', 'at')
    Customers | project countof('The cat sat on the mat', 'at', 'normal')
    Customers | project countof('The cat sat on the mat', 'at', 'regex')

  • extract (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractfunction)
    Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 0, 'The price of PINEAPPLE ice cream is 20')
    Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 1, 'The price of PINEAPPLE ice cream is 20')
    Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20')
    Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 3, 'The price of PINEAPPLE ice cream is 20')
    Customers | project extract('(\\b[A-Z]+\\b).+(\\b\\d+)', 2, 'The price of PINEAPPLE ice cream is 20', typeof(real))

  • extract_all (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/extractallfunction)
    Customers | project extract_all('(\\w)(\\w+)(\\w)','The price of PINEAPPLE ice cream is 20')
    note: captureGroups not supported yet

  • split (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/splitfunction)
    Customers | project split('aa_bb', '_')
    Customers | project split('aaa_bbb_ccc', '_', 1)
    Customers | project split('', '_')
    Customers | project split('a__b', '_')
    Customers | project split('aabbcc', 'bb')

  • strcat_delim (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcat-delimfunction)
    Customers | project strcat_delim('-', '1', '2', 'A', 1s)
    Customers | project strcat_delim('-', '1', '2', strcat('A','b'))
    note: only string arguments are supported for now.

  • indexof (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/indexoffunction)
    Customers | project indexof('abcdefg','cde')
    Customers | project indexof('abcdefg','cde',2)
    Customers | project indexof('abcdefg','cde',6)
    note: length and occurrence not supported yet

  • strcmp (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/strcmpfunction)
    print strcmp('abc','ABC')

  • parse_url (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlfunction)
    print Result = parse_url('scheme://username:password@www.google.com:1234/this/is/a/path?k1=v1&k2=v2#fragment')

  • parse_urlquery (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parseurlqueryfunction)
    print Result = parse_urlquery('k1=v1&k2=v2&k3=v3')

  • print operator (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/printoperator)
    print x=1, s=strcat('Hello', ', ', 'World!')

  • base64_encode_fromguid
    print Quine = base64_encode_fromguid('ae3133f2-6e22-49ae-b06a-16e6a9b212eb')

  • base64_decode_toarray
    print base64_decode_toarray('S3VzdG8=')

  • base64_decode_toguid
    print base64_decode_toguid('YWUzMTMzZjItNmUyMi00OWFlLWIwNmEtMTZlNmE5YjIxMmVi')

  • replace_regex
    print replace_regex('Hello, World!', '.', '\\0\\0')

  • has_any_index
    print idx = has_any_index('this is an example', dynamic(['this', 'example']))

  • translate
    print translate('krasp', 'otsku', 'spark')

  • trim
    print trim('--', '--https://bing.com--')

  • trim_end
    print trim_end('.com', 'bing.com')

  • trim_start
    print trim_start('[^\\w]+', strcat('- ','Te st1','// $'))

  • reverse
    print reverse(123)
    print reverse(123.34)
    print reverse('clickhouse')
    print reverse(3h)
    print reverse(datetime(2017-1-1 12:23:34))

  • parse_command_line
    print parse_command_line('echo "hello world!" print$?', "Windows")

  • parse_csv
    print result=parse_csv('aa,b,cc')
    print result_multi_record=parse_csv('record1,a,b,c\nrecord2,x,y,z')

  • parse_json
    print parse_json( dynamic([1, 2, 3]))
    print parse_json('{"a":123.5, "b":"{\\"c\\":456}"}')

  • extract_json
    print extract_json( "$.a" , '{"a":123, "b":"{\\"c\\":456}"}' , typeof(int))

  • parse_version
    print parse_version('1')
    print parse_version('1.2.3.40')
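
Many of the string operators above come in pairs: the plain form and a `_cs` form. In the KQL documentation the plain form is case-insensitive and the `_cs` form is case-sensitive. The Python sketch below models that distinction for `contains`, `startswith` and `endswith`; the function names simply mirror the operators and the `negate` helper is introduced here to model the `!` variants.

```python
def contains(text, needle):
    """Case-insensitive substring test (KQL: contains)."""
    return needle.lower() in text.lower()


def contains_cs(text, needle):
    """Case-sensitive substring test (KQL: contains_cs)."""
    return needle in text


def startswith(text, prefix):
    return text.lower().startswith(prefix.lower())


def startswith_cs(text, prefix):
    return text.startswith(prefix)


def endswith(text, suffix):
    return text.lower().endswith(suffix.lower())


def endswith_cs(text, suffix):
    return text.endswith(suffix)


def negate(operator):
    """Build the '!' variant of an operator, e.g. !contains."""
    return lambda text, pattern: not operator(text, pattern)


not_contains = negate(contains)

print(contains("Graduate Degree", "degree"), contains_cs("Graduate Degree", "degree"))
print(endswith("Pierre", "RE"), endswith_cs("Pierre", "RE"))
```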

DateTimeFunctions

  • ago
    print ago(2h)

  • endofday
    print endofday(datetime(2017-01-01 10:10:17), -1)
    print endofday(datetime(2017-01-01 10:10:17), 1)
    print endofday(datetime(2017-01-01 10:10:17))

  • endofmonth
    print endofmonth(datetime(2017-01-01 10:10:17), -1)
    print endofmonth(datetime(2017-01-01 10:10:17), 1)
    print endofmonth(datetime(2017-01-01 10:10:17))

  • endofweek
    print endofweek(datetime(2017-01-01 10:10:17), 1)
    print endofweek(datetime(2017-01-01 10:10:17), -1)
    print endofweek(datetime(2017-01-01 10:10:17))

  • endofyear
    print endofyear(datetime(2017-01-01 10:10:17), -1)
    print endofyear(datetime(2017-01-01 10:10:17), 1)
    print endofyear(datetime(2017-01-01 10:10:17))

  • make_datetime
    print make_datetime(2017,10,01)
    print make_datetime(2017,10,01,12,10)
    print make_datetime(2017,10,01,12,11,0.1234567)

  • datetime_diff
    print datetime_diff('year',datetime(2017-01-01),datetime(2000-12-31))
    print datetime_diff('quarter',datetime(2017-07-01),datetime(2017-03-30))
    print datetime_diff('minute',datetime(2017-10-30 23:05:01),datetime(2017-10-30 23:00:59))

  • unixtime_microseconds_todatetime
    print unixtime_microseconds_todatetime(1546300800000000)

  • unixtime_milliseconds_todatetime
    print unixtime_milliseconds_todatetime(1546300800000)

  • unixtime_nanoseconds_todatetime
    print unixtime_nanoseconds_todatetime(1546300800000000000)

  • datetime_part
    print datetime_part('day', datetime(2017-10-30 01:02:03.7654321))

  • datetime_add
    print datetime_add('day',1,datetime(2017-10-30 01:02:03.7654321))

  • format_timespan
    print format_timespan(time(1d), 'd-[hh:mm:ss]')
    print format_timespan(time('12:30:55.123'), 'ddddd-[hh:mm:ss.ffff]')

  • format_datetime
    print format_datetime(todatetime('2009-06-15T13:45:30.6175425'), 'yy-M-dd [H:mm:ss.fff]')
    print format_datetime(datetime(2015-12-14 02:03:04.12345), 'y-M-d h:m:s tt')

  • todatetime
    print todatetime('2014-05-25T08:20:03.123456Z')
    print todatetime('2014-05-25 20:03.123')

  • totimespan (https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/totimespanfunction)
    print totimespan('0.01:34:23')
    print totimespan(1d)

  • startofyear
    print startofyear(datetime(2017-01-01 10:10:17), -1)
    print startofyear(datetime(2017-01-01 10:10:17), 0)
    print startofyear(datetime(2017-01-01 10:10:17), 1)

  • week_of_year
    print week_of_year(datetime(2020-12-31))
    print week_of_year(datetime(2020-06-15))
    print week_of_year(datetime(1970-01-01))
    print week_of_year(datetime(2000-01-01))

  • startofweek
    print startofweek(datetime(2017-01-01 10:10:17), -1)
    print startofweek(datetime(2017-01-01 10:10:17), 0)
    print startofweek(datetime(2017-01-01 10:10:17), 1)

  • startofmonth
    print startofmonth(datetime(2017-01-01 10:10:17), -1)
    print startofmonth(datetime(2017-01-01 10:10:17), 0)
    print startofmonth(datetime(2017-01-01 10:10:17), 1)

  • startofday
    print startofday(datetime(2017-01-01 10:10:17), -1)
    print startofday(datetime(2017-01-01 10:10:17), 0)
    print startofday(datetime(2017-01-01 10:10:17), 1)

  • monthofyear
    print monthofyear(datetime("2015-12-14"))

  • hourofday
    print hourofday(datetime(2015-12-14 18:54:00))

  • getyear
    print getyear(datetime(2015-10-12))

  • getmonth
    print getmonth(datetime(2015-10-12))

  • dayofyear
    print dayofyear(datetime(2015-12-14))

  • dayofmonth
    print dayofmonth(datetime(2015-12-14))

  • unixtime_seconds_todatetime
    print unixtime_seconds_todatetime(1546300800)

  • dayofweek
    print dayofweek(datetime(2015-12-20))

  • now
    print now()
    print now(2d)
    print now(-2h)
    print now(5microseconds)
    print now(5seconds)
    print now(6minutes)
    print now(-2d)
    print now(time(1d))
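
Several of the date functions above take an optional offset that moves the result by whole days or weeks. The Python sketch below models `startofday`, `endofday`, `startofweek` and `unixtime_seconds_todatetime` to show what the examples are expected to return. It assumes weeks start on Sunday, as in the KQL documentation, and it works at microsecond precision, whereas KQL datetimes have 100-nanosecond ticks.

```python
from datetime import datetime, timedelta, timezone


def startofday(value, offset=0):
    """Midnight of the given day, moved by `offset` days."""
    midnight = value.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=offset)


def endofday(value, offset=0):
    """Last representable instant of the day (microsecond precision in this sketch)."""
    return startofday(value, offset + 1) - timedelta(microseconds=1)


def startofweek(value, offset=0):
    """Midnight of the Sunday that starts the week, moved by `offset` weeks."""
    days_since_sunday = (value.weekday() + 1) % 7  # Python: Monday is 0, Sunday is 6
    return startofday(value) - timedelta(days=days_since_sunday) + timedelta(weeks=offset)


def unixtime_seconds_todatetime(seconds):
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


sample = datetime(2017, 1, 1, 10, 10, 17)
print(startofday(sample, -1))
print(endofday(sample))
print(startofweek(sample, 1))
print(unixtime_seconds_todatetime(1546300800))
```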

Miscellaneous functions

  • isnan
    print isnan(double(nan)) == true
    print isnan(4.2) == false
    print isnan(4) == false
    print isnan(real(+inf)) == false

Configuration options for changing the dialect setting:

  • Set the dialect in the server configuration XML at user level (users.xml). This sets the dialect at server startup, and ClickHouse parses queries for all users with the default profile according to the dialect value.

For example:
<profiles>
    <!-- Default settings. -->
    <default>
        <load_balancing>random</load_balancing>
        <dialect>kusto_auto</dialect>
    </default>
</profiles>

  • Once the dialect is set in users.xml, queries can be executed with an HTTP client as follows:
    echo "KQL query" | curl -sS "http://localhost:8123/?" --data-binary @-

  • To execute queries with clickhouse-client, update clickhouse-client.xml as below and start clickhouse-client with the --config-file option (clickhouse-client --config-file=<config-file path>):

    <config> <dialect>kusto_auto</dialect> </config>

OR
pass the dialect setting on the command line with '--'. For example:
clickhouse-client --dialect='kusto_auto' -q "KQL query"
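
For scripted access, the curl call shown earlier can be mirrored from Python. The sketch below only builds the HTTP request and does not send it. Passing `dialect` as a URL query parameter is an assumption of this sketch (the description above sets it in users.xml instead), and `build_kql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_kql_request(query, host="http://localhost:8123", dialect="kusto_auto"):
    """Build a POST request carrying a KQL query in the body.

    Assumption: the server accepts the dialect setting as a URL parameter.
    """
    url = f"{host}/?{urlencode({'dialect': dialect})}"
    return Request(url, data=query.encode("utf-8"), method="POST")


request = build_kql_request("Customers | project FirstName | take 1")
print(request.full_url)
print(request.get_method())
# To send it against a running server: urllib.request.urlopen(request).read()
```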

double quote support
print res = strcat("double ","quote")

@robot-ch-test-poll added the pr-feature and submodule changed labels Oct 19, 2022
@alexey-milovidov added the can be tested label Dec 4, 2022
@yakov-olkhovskiy self-assigned this Dec 7, 2022
@larryluogit force-pushed the Kusto-phase2-oss-pr branch 2 times, most recently from c5c9c3d to 2ea4e7c on January 17, 2023 04:03
@larryluogit
Contributor Author

@yakov-olkhovskiy This PR is clean now. Please review it again. Thanks.

Comment on lines 8 to 16
<!-- How to choose between replicas during distributed query processing.
random - choose random replica from set of replicas with minimum number of errors
nearest_hostname - from set of replicas with minimum number of errors, choose replica
with minimum number of different symbols between replica's hostname and local hostname
(Hamming distance).
in_order - first live replica is chosen in specified order.
first_or_random - if first replica one has higher number of errors, pick a random one from replicas with minimum number of errors.
-->
<load_balancing>random</load_balancing>
Member

why do we need this?

Contributor

Will remove it. It was probably added by accident during a rebase.

Comment on lines 340 to 348
else if (dialect == Dialect::kusto_auto)
{
    res = tryParseQuery(parser, pos, end, message, true, "", allow_multi_statements, max_length, settings.max_parser_depth);
    if (!res)
    {
        pos = begin;
        res = tryParseQuery(kql_parser, pos, end, message, true, "", allow_multi_statements, max_length, settings.max_parser_depth);
    }
}
Member

I'm not sure we want such complications, and after all it's possible we will have other dialects in the future...
@alexey-milovidov do you think we want to have autodetection of dialect? and if we want it I think it's better to generalize it to just auto to incorporate future dialects
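
For illustration only, the fallback in the snippet under review and the suggested generalization to a single auto value could be sketched as trying a list of parsers in order, each from the start of the query. All names below (`ParsedQuery`, `TryParse`, `parseWithFallback`, and the two toy parsers) are hypothetical and are not ClickHouse's parser classes; the toy acceptance rules are assumptions chosen only to make the sketch compile and behave deterministically.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not a ClickHouse AST.
struct ParsedQuery
{
    std::string dialect;
    std::string text;
};

// A parser attempt: returns a result on success, std::nullopt on failure.
using TryParse = std::function<std::optional<ParsedQuery>(const std::string &)>;

// Tries each parser in order, always from the start of the query,
// and returns the first successful result.
std::optional<ParsedQuery> parseWithFallback(const std::string & query, const std::vector<TryParse> & parsers)
{
    for (const auto & parse : parsers)
        if (auto result = parse(query))
            return result;
    return std::nullopt;
}

// Toy "SQL" parser (assumption): accepts queries that start with SELECT.
std::optional<ParsedQuery> toySqlParser(const std::string & query)
{
    if (query.rfind("SELECT", 0) == 0)
        return ParsedQuery{"clickhouse", query};
    return std::nullopt;
}

// Toy "KQL" parser (assumption): accepts queries that contain a pipe.
std::optional<ParsedQuery> toyKqlParser(const std::string & query)
{
    if (query.find('|') != std::string::npos)
        return ParsedQuery{"kusto", query};
    return std::nullopt;
}
```

Adding a future dialect would then mean appending one more entry to the parser list rather than adding another `*_auto` enum value.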

Contributor

yeah, currently we need such an ability to run kql

Member

The auto option shouldn't be added in this PR, as it can be implemented (or not implemented at all) separately, later.

Member

@kashwy could you please remove this functionality for now - we will return to it sometime later

Contributor

@yakov-olkhovskiy , sure, will remove

Comment on lines 109 to 117
ParserKeyword s_kql("KQL");

if (ASTPtr select_node; select.parse(pos, select_node, expected))
if (s_kql.ignore(pos, expected))
{
result_node = std::move(select_node);
if (!ParserKQLTaleFunction().parse(pos, result_node, expected))
return false;
}
Member

we don't want to change clickhouse syntax - I think it's possible to implement this as a table function

Contributor

@kashwy kashwy Mar 7, 2023

This is one of our use cases: we need a syntax to embed a KQL statement inside a SQL query, like:
select * from kql(table|column)

will check to use table functions

Member

I believe it's possible to implement this as a function, though maybe not that simple - look at the view table function as a primary example (https://github.com/ClickHouse/ClickHouse/blob/master/src/TableFunctions/TableFunctionView.h). Most likely you will need to extend ExpressionListParsers (https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/ExpressionListParsers.cpp), as you need to parse the arguments differently.

Contributor

sure, will try to use table function

{
result_node = std::move(select_node);
if (!ParserKQLTaleFunction().parse(pos, result_node, expected))
Member

I believe it's a typo - should be ParserKQLTableFunction

Contributor

will fix

@kashwy
Contributor

kashwy commented Apr 10, 2023

Hi, @yakov-olkhovskiy

The PR is updated, could you please review again?

Thanks

@kashwy
Contributor

kashwy commented May 3, 2023

Hi, @yakov-olkhovskiy

All issues have been addressed: removed the auto dialect and changed the kql() function to a table function.

Can you take some time to review again?

thanks

@robot-ch-test-poll
Contributor

robot-ch-test-poll commented May 3, 2023

This is an automated comment for commit 16f992a with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Successful checks
Check name | Description | Status
AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help | ✅ success
CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ✅ success
ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process | ✅ success
Compatibility check | Checks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success
Docker image for servers | The check to build and optionally push the mentioned image to docker hub | ✅ success
Docs Check | Builds and tests the documentation | ✅ success
Fast test | Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success
Flaky tests | Checks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integrational tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc | ✅ success
Install packages | Checks that the built packages are installable in a clear environment | ✅ success
Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ✅ success
Mergeable Check | Checks if all other necessary checks are successful | ✅ success
Performance Comparison | Measure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success
Push to Dockerhub | The check for building and pushing the CI related docker images to docker hub | ✅ success
SQLTest | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
SQLancer | Fuzzing tests that detect logical bugs with SQLancer tool | ✅ success
Sqllogic | Run clickhouse on the sqllogic test set against sqlite and checks that all statements are passed | ✅ success
Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ✅ success
Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success
Style Check | Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report | ✅ success
Unit tests | Runs the unit tests for different release types | ✅ success
Upgrade check | Runs stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts | ✅ success

Check name | Description | Status
Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure

@kashwy kashwy force-pushed the Kusto-phase2-oss-pr branch 2 times, most recently from 3ef384a to fa9f799 Compare May 16, 2023 04:31
@CLAassistant

CLAassistant commented May 23, 2023

CLA assistant check
All committers have signed the CLA.

@larryluogit larryluogit force-pushed the Kusto-phase2-oss-pr branch 2 times, most recently from 93e0efb to a5742e4 Compare May 23, 2023 03:12
@kashwy kashwy force-pushed the Kusto-phase2-oss-pr branch 5 times, most recently from cfa7346 to 5f77c20 Compare June 1, 2023 03:19
@kashwy
Contributor

kashwy commented Jun 4, 2023

Hi @yakov-olkhovskiy ,

It appears that there are no test failures related to KQL now, could you please review it again?

Thanks.

@kashwy
Contributor

kashwy commented Sep 28, 2023

@yakov-olkhovskiy , I have merged master in and resolved conflict.

Thanks

@kashwy
Contributor

kashwy commented Oct 3, 2023

Hi @yakov-olkhovskiy,

Did you get a chance to review again?

Thanks

Comment on lines 709 to 745
const auto & settings = global_context->getSettingsRef();
const Dialect & dialect = settings.dialect;
String old_dialect;
switch (dialect)
{
    case DB::Dialect::kusto:
        old_dialect = "kusto";
        break;
    case DB::Dialect::clickhouse:
        old_dialect = "clickhouse";
        break;
    case DB::Dialect::prql:
        old_dialect = "prql";
        break;
}

if (auto *q = orig_ast->as<ASTSetQuery>())
{
    auto *setDialect = q->changes.tryGet("dialect");
    if (setDialect)
    {
        old_dialect = setDialect->get<String>();
    }
}

//setting dialect to clickhouse during query fuzzing, restore dialect to original value after fuzzing

SCOPE_EXIT_SAFE({
    global_context->setSetting("dialect", old_dialect);
});

if (dialect != DB::Dialect::clickhouse)
{
    SettingChange new_setting("dialect", "clickhouse");
    global_context->applySettingChange(new_setting);
}

Member

@kashwy I wonder why do we need it at all - this fuzzing functionality is introduced for testing - do you have problems without this addition? it seems somewhat off and I would prefer to remove it if it's not absolutely necessary

Contributor

@kashwy kashwy Oct 10, 2023

We got fuzzing test failures without this. The reason is that the fuzzed queries are generated from the AST and are SQL, while during the KQL tests the dialect has been set to 'kusto', so the fuzzed SQL won't work.
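
As a rough sketch of the save, override, and restore pattern that the snippet implements with SCOPE_EXIT_SAFE, the same idea can be written as a small RAII guard. `ToySettings` and `DialectOverride` are hypothetical names used only for illustration and are not part of ClickHouse.

```cpp
#include <string>
#include <utility>

// Hypothetical, minimal stand-in for a settings holder; not ClickHouse's Context.
struct ToySettings
{
    std::string dialect = "clickhouse";
};

// Switches `dialect` to a temporary value for the lifetime of the guard
// and puts the previous value back when the guard goes out of scope.
class DialectOverride
{
public:
    DialectOverride(ToySettings & settings_, std::string temporary)
        : settings(settings_), saved(std::move(temporary))
    {
        // After the swap, `settings.dialect` holds the temporary value
        // and `saved` holds the original one.
        settings.dialect.swap(saved);
    }

    // swap() does not throw, so restoring is safe inside a destructor.
    ~DialectOverride() { settings.dialect.swap(saved); }

    DialectOverride(const DialectOverride &) = delete;
    DialectOverride & operator=(const DialectOverride &) = delete;

private:
    ToySettings & settings;
    std::string saved;
};
```

The guard restores the original value on every exit path, including early returns and exceptions.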

Member

I guess we can avoid this issue by just returning true if it's the kusto dialect - the same as above at line 706

@yakov-olkhovskiy
Member

@kashwy please look at the piece I pointed out above
everything else looks fine, let's finish it this week

@kashwy
Contributor

kashwy commented Oct 10, 2023

@kashwy please look at the piece I pointed out above. Everything else looks fine, let's finish it this week.

I have addressed the piece you pointed out.

thanks

Comment on lines 709 to 745

Member

Suggested change
- const auto & settings = global_context->getSettingsRef();
- const Dialect & dialect = settings.dialect;
- String old_dialect;
- switch (dialect)
- {
-     case DB::Dialect::kusto:
-         old_dialect = "kusto";
-         break;
-     case DB::Dialect::clickhouse:
-         old_dialect = "clickhouse";
-         break;
-     case DB::Dialect::prql:
-         old_dialect = "prql";
-         break;
- }
- if (auto *q = orig_ast->as<ASTSetQuery>())
- {
-     auto *setDialect = q->changes.tryGet("dialect");
-     if (setDialect)
-     {
-         old_dialect = setDialect->get<String>();
-     }
- }
- //setting dialect to clickhouse during query fuzzing, restore dialect to original value after fuzzing
- SCOPE_EXIT_SAFE({
-     global_context->setSetting("dialect", old_dialect);
- });
- if (dialect != DB::Dialect::clickhouse)
- {
-     SettingChange new_setting("dialect", "clickhouse");
-     global_context->applySettingChange(new_setting);
- }
+ // Kusto is not a subject for fuzzing (yet)
+ if (global_context->getSettingsRef().dialect == DB::Dialect::kusto)
+ {
+     return true;
+ }
+ if (auto *q = orig_ast->as<ASTSetQuery>())
+ {
+     if (auto *setDialect = q->changes.tryGet("dialect"); setDialect && setDialect->safeGet<String>() == "kusto")
+         return true;
+ }

Contributor

changed
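
The decision made by the suggested change above can be read as a small predicate: skip fuzzing when the current dialect is kusto, or when the statement being processed is a SET that switches the dialect to kusto. The sketch below is a self-contained restatement under that reading. The `Dialect` enum is a toy copy of the three values seen in the snippet, and `shouldSkipFuzzing` is a hypothetical helper name, not a function in the ClickHouse code base.

```cpp
#include <optional>
#include <string>

// Toy copy of the dialect values that appear in the reviewed snippet.
enum class Dialect
{
    clickhouse,
    kusto,
    prql
};

// Hypothetical helper. `dialect_being_set` holds the new value when the
// current statement is `SET dialect = ...`, and is empty otherwise.
bool shouldSkipFuzzing(Dialect current, const std::optional<std::string> & dialect_being_set)
{
    // Already in kusto mode: the SQL-based fuzzer cannot parse its own output.
    if (current == Dialect::kusto)
        return true;

    // About to switch to kusto: skip this statement as well.
    return dialect_being_set.has_value() && *dialect_being_set == "kusto";
}
```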

@yakov-olkhovskiy yakov-olkhovskiy merged commit 0738984 into ClickHouse:master Oct 11, 2023
283 of 284 checks passed
@yakov-olkhovskiy
Member

@kashwy apparently we have some issues running tests with thread sanitizer:
https://s3.amazonaws.com/clickhouse-test-reports/0/299845b422ebcffaa2af4dd435785abd5c9cebb6/stateless_tests__tsan__[5_5]/run.log
look for ThreadSanitizer report
seems some functionality of pcg-random you are using is not thread safe
I'm working to resolve this issue, but any help/insight would be greatly appreciated :)

@yakov-olkhovskiy
Member

yakov-olkhovskiy commented Oct 13, 2023

@kashwy seems like this:
https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/Kusto/KustoFunctions/IParserKQLFunction.cpp#L113
this static should be at least thread_local - do you think? though, I'm not sure how unique it will be and should be then...

@kashwy
Contributor

kashwy commented Oct 13, 2023

Hi @yakov-olkhovskiy , I will check it

@yakov-olkhovskiy
Member

@kashwy I already opened a PR (see reference above) - please check if it's a correct fix

@kashwy
Contributor

kashwy commented Oct 13, 2023

sure

@kashwy
Contributor

kashwy commented Oct 13, 2023

@kashwy seems like this: https://github.com/ClickHouse/ClickHouse/blob/master/src/Parsers/Kusto/KustoFunctions/IParserKQLFunction.cpp#L113 this static should be at least thread_local - do you think? though, I'm not sure how unique it will be and should be then...

It's used to generate unique aliases for the same function used more than once in one statement, so thread_local is good enough.

@yakov-olkhovskiy
Member

the question is how unique it is - I'm afraid if it's seeded by the same value then two generated values in two different threads will be the same

@kashwy
Contributor

kashwy commented Oct 13, 2023

It's no problem if different threads get the same value. The uniqueness only needs to prevent identical function aliases within one statement when a function is used more than once, which would cause a parsing error. So I think it's fine as long as it's unique within the same thread.

// This particular random generator hits each number exactly once before looping over.
// Because of this, it's sufficient for queries consisting of up to 2^16 (= 65536) distinct function calls.
// Reference: https://www.pcg-random.org/using-pcg-cpp.html#insecure-generators
static pcg32_once_insecure random_generator;
Collaborator

Collaborator

Member

yes, we know - already fixed

@ywangtht

ywangtht commented Nov 1, 2023

Is it possible to output the query result in JSON format with Kusto?

@kashwy
Contributor

kashwy commented Nov 1, 2023

Is it possible to output the query result in JSON format with Kusto?

it's part of dynamic type, which is not fully supported yet

@yakov-olkhovskiy
Member

@ywangtht it is possible to embed a KQL expression in a regular SQL query with the kql table function and use format JSON:
select * from kql('Customers') format JSON

@ywangtht

ywangtht commented Nov 2, 2023

I know I can output it in JSON with:
clickhouse-client -q "events |take 1" --dialect='kusto' --format JSON

But how do I do it through the HTTP interface with curl? I do not see an option to pass format JSON as SETTINGS. There is no option to specify "format JSON" at the end of a Kusto query either.

I understand this is an experimental feature, but still I am wondering how everybody else uses kusto to query ClickHouse.

Another alternative I can think of is to add a clickhouse-client option that converts KQL to ClickHouse SQL and lets the end user append format JSON at the end.

@yakov-olkhovskiy
Member

yakov-olkhovskiy commented Nov 2, 2023

oh, with curl it's pretty simple:
curl -H "X-ClickHouse-Format: JSON" "http://localhost:8123/?dialect=kusto&query=events|take+1"
or
curl -H "X-ClickHouse-Format: JSON" "http://localhost:8123/?dialect=kusto" -d "events|take 1"
-H sets HTTP header
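
When the KQL text goes into the query parameter of the URL, characters such as the pipe and the space generally need percent-encoding. The following is a minimal sketch of building such a URL programmatically. `percentEncode` and `buildKustoUrl` are hypothetical helper names; only the `dialect` and `query` parameters and the localhost:8123 address come from the commands above.

```cpp
#include <string>

// Percent-encodes everything except the RFC 3986 unreserved characters.
std::string percentEncode(const std::string & text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text)
    {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: builds a URL that selects the kusto dialect and
// carries the KQL text in the `query` parameter.
std::string buildKustoUrl(const std::string & base, const std::string & kql)
{
    return base + "/?dialect=kusto&query=" + percentEncode(kql);
}
```

For example, `buildKustoUrl("http://localhost:8123", "events|take 1")` yields `http://localhost:8123/?dialect=kusto&query=events%7Ctake%201`. Sending the query in the request body with `-d`, as in the second command, avoids the encoding step.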

@ywangtht

ywangtht commented Nov 2, 2023

@yakov-olkhovskiy, Thanks, the HTTP header works!

Another issue: it seems the dialect=kusto setting is not carried through a distributed table via HTTP.

I have a distributed table which has two CH nodes, node1 and node2.

curl 'http://node1:8123/?dialect=kusto' -d "events_all | take 1"
Code: 62. DB::Exception: Received from node2:9000. DB::Exception: Syntax error: failed at position 1583 (end of query): . Expected one of: KQL Statement, KQL with output, KQL query, possibly with UNION, KQL query, KQL Table, SET query, SET. (SYNTAX_ERROR) (version 23.10.1.1976 (official build))

@yakov-olkhovskiy
Member

@ywangtht this is an interesting one... you can report it as an issue I think

@ywangtht

ywangtht commented Nov 3, 2023

@ywangtht this is an interesting one... you can report it as an issue I think

Reported #56289.

Labels
can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements submodule changed At least one submodule changed in this PR.