-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement digits function #9077
Comments
It looks like a very special case to get |
This may seem like it is specific to my problem but considering that other databases also implemented something like this, it shows that in some cases it helps a lot. Use case 1Some systems place different information in different digits. For example, in Turkey, old social security number was something like this: 100000601 Last digits are territory codes, 06 is the city Sometimes, it is needed to be extracted in query. Use case 2Sometimes there are checksum digits in numbers, which are calculated based on previous digits. This is the case where Turkey SSN uses. This is the query I wrote to validate the checksum, as you can see there are lots of WITH
[1,3,5,7,9] as odd,
[2,4,6,8] as even,
[1,2,3,4,5,6,7,8,9,10] as alls
select
TCKN,
arrayMap(x -> toInt16(substr(toString(TCKN), x, 1)), odd) odds,
arrayMap(x -> toInt16(substr(toString(TCKN), x, 1)), even) evens,
arrayMap(x -> toInt16(substr(toString(TCKN), x, 1)), alls) as all_digits,
arraySum(odds) as odd_sum,
arraySum(evens) as even_sum,
toInt64(substr(toString(TCKN), 10, 1)) as digit10,
toInt64(substr(toString(TCKN), 11, 1)) as digit11,
(100 + odd_sum * 7 - even_sum) % 10 as digit10_calculated,
arraySum(all_digits) % 10 as digit11_calculated,
digit10 == digit10_calculated as digit10_matches,
digit11 == digit11_calculated as digit11_matches,
digit10_matches + digit11_matches == 2 as valid
from TABLE where valid = 0;; |
That's reasonable. Actual implementation should use divide + remainder of division by power of ten. |
We have
substr
function to get a chunk of a string but there is no equivalent for numbers. We can of course use type convertions but it often gets ugly. IBM DB2 database hasdigits
function for this purpose(but it produces char, not number). ClickHouse deserves better.For Strings:
Column Value:
Performance
as StringChars between 3,7 as String -->>
substr(col, 3, 4)
:rfor
For Numbers:
Column Value:
1234567890
as Int64.Digits between 3,2 as number -->
toInt32(substr(toString(col),3,2))
:34
Compare
toInt32(substr(toString(col),3,4))
againstdigits(col, 3, 2)
, which one is prettier?Solution
Have a
digits
function to do this.Usage
digits(column, offset, length)
column: A numeric column
offset: Staring digit(starts from left)
length: Number of digits
Example
Additional
DigitsReversed
also can be implemented to make it easier to read digits from right to left.The text was updated successfully, but these errors were encountered: