Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement digits function #9077

Open
ramazanpolat opened this issue Feb 11, 2020 · 3 comments
Open

Implement digits function #9077

ramazanpolat opened this issue Feb 11, 2020 · 3 comments
Labels

Comments

@ramazanpolat
Copy link
Contributor

We have substr function to get a chunk of a string but there is no equivalent for numbers. We can of course use type convertions but it often gets ugly. IBM DB2 database has digits function for this purpose(but it produces char, not number). ClickHouse deserves better.

For Strings:
Column Value: Performance as String
Chars between 3,7 as String -->> substr(col, 3, 4): rfor

For Numbers:
Column Value: 1234567890 as Int64.
Digits between 3,2 as number --> toInt32(substr(toString(col),3,2)):34

Compare toInt32(substr(toString(col),3,4)) against digits(col, 3, 2), which one is prettier?

Solution

Have a digits function to do this.

Usage

digits(column, offset, length)

column: A numeric column
offset: Staring digit(starts from left)
length: Number of digits

Example

SELECT digits(1234567890, 3, 2) == 34 as E

┌─E─┐
│ 1 │
└───┘

Additional

DigitsReversed also can be implemented to make it easier to read digits from right to left.

@4ertus2
Copy link
Contributor

4ertus2 commented Feb 11, 2020

It looks like a very special case to get a chunk of numbers. Could you describe your task where you need it?

@4ertus2 4ertus2 added the st-need-info We need extra data to continue (waiting for response) label Feb 11, 2020
@ramazanpolat
Copy link
Contributor Author

It looks like a very special case to get a chunk of numbers. Could you describe your task where you need it?

This may seem like it is specific to my problem but considering that other databases also implemented something like this, it shows that in some cases it helps a lot.

Use case 1

Some systems place different information in different digits. For example, in Turkey, old social security number was something like this: 100000601

Last digits are territory codes, 06 is the city Ankara and 01 is the town Cankaya.

Sometimes, it is needed to be extracted in query.

Use case 2

Sometimes there are checksum digits in numbers, which are calculated based on previous digits. This is the case where Turkey SSN uses. This is the query I wrote to validate the checksum, as you can see there are lots of toInt16(substr(toString...)) notations. It was even longer before because I got rid of repeating them by using arrayMap.

WITH
	[1,3,5,7,9] as odd,
	[2,4,6,8] as even,
	[1,2,3,4,5,6,7,8,9,10] as alls
select
	TCKN,
	arrayMap(x -> toInt16(substr(toString(TCKN), x, 1)), odd) odds,
	arrayMap(x -> toInt16(substr(toString(TCKN), x, 1)), even) evens,
	arrayMap(x -> toInt16(substr(toString(TCKN), x, 1)), alls) as all_digits,
	arraySum(odds) as odd_sum,
	arraySum(evens) as even_sum,
	toInt64(substr(toString(TCKN), 10, 1)) as digit10,
	toInt64(substr(toString(TCKN), 11, 1)) as digit11,
	(100 + odd_sum * 7 - even_sum) % 10 as digit10_calculated,
	arraySum(all_digits) % 10 as digit11_calculated,
	digit10 == digit10_calculated as digit10_matches,
	digit11 == digit11_calculated as digit11_matches,
	digit10_matches + digit11_matches == 2 as valid
from TABLE where valid = 0;;

@4ertus2 4ertus2 removed the st-need-info We need extra data to continue (waiting for response) label Feb 11, 2020
@alexey-milovidov
Copy link
Member

That's reasonable.

Actual implementation should use divide + remainder of division by power of ten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants