Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a SQL function to format number into human readable format #10584

Closed
FrankChen021 opened this issue Nov 14, 2020 · 1 comment
Closed

Add a SQL function to format number into human readable format #10584

FrankChen021 opened this issue Nov 14, 2020 · 1 comment

Comments

@FrankChen021
Copy link
Member

Description

We're planning to add a format function that turns a number into human readable format.

Motivation

One of our use cases is to use Druid as part of application monitoring system. Most of data stored in Druid are related to byte, such as memory size, disk size, network IO byte. In most cases, theses values are very large especially after aggregation, so we have to convert these data into human readable format to make it easy to recognize when querying.

Druid does not provide such a format function, so now we have to do it in the following ways:

  1. a middleware service does the conversion
  2. front-end projects do the conversion

The downside of above solutions is obvious -- we have to write extra code to do the conversion along side the Druid SQL itself, or we have to tell our frontend engineers that they have to do the conversion on some fields.

So if this format ability is shipped with Druid itself, I think it could simplify the whole development and deployment of our system.

Requirement

These requirements are based on what we have implemented in our project.

  1. Support to format number in IEC format(binary format, such as KiB, MiB, GiB)
  2. Support to format number in SI format(decimal format, such as KB, MB, GB)
  3. Support to format value in simplified IEC or SI format (such as K, M, G)
  4. Support customer precision from 0 to 3
  5. flexible format
    When displaying data in a table, it's more readable to format the value as '5.00 GB', a space between the number and the unit, so that it is easy to recognize the unit. when displaying data in a card(a kind of frontend component, usually displays text in a large size), it's better to show it as '5.17G' in abbreviated unit format, no space between the number and unit

Proposed Change

MySQL has provided a similar function named format_bytes, which satisfies the requirement 1 above.

We're planning to add a SQL function named as size_format(same style as string_format) to serve the needs above, and some alternatives we come up with are listed below

Alternative 1

Prototype
size_format(expression, [format], [unit system], [precision])
  • unit system: format the input in which unit system. Optional value is bin(standing for binary, IEC format) or dec(decimal, SI format). Default is bin
  • precision: precision of value, ranging from 0 to 3. Default is 2
  • format: indicate if there's space between number and value, and if there's a B(indicating byte) suffix
Examples
size_format(1024) # outputs 1.00Ki
size_format(1024, 'B') #outputs 1.00KiB
size_format(1024, ' B') #outputs 1.00 KiB
size_format(1024, ' B',  'bin', 0) #outputs 1 KiB
size_format(1024, ' B',  'bin', 1) #outputs 1.0 KiB
size_format(1024, ' B',  'dec', 1) #outputs 1.02 KB
size_format(1024, ' B',  'dec', 1) #outputs 1.02 KB

Alternative 2

From the view of implementation, the format argument of alternative 1 is not easy to control. And What if user does not provide B as suffix instead of other strings ? So there are rules imposed on this argument that might make it a little bit complex both for users and developers.

Prototype
size_format(expression, [unit], [precision], [has_space])
  • unit: unlike the definition in alternative 1, the unit here divides into 4 categories:
    binary, binary_byte, decimal, decimal_byte. The default is binary_byte.
  • precision: the same as alternative 1, ranging from 0 to 3, default is 2.
  • has_space: whether or not there's a space between the number and the unit. Default is true
Examples
size_format(1024) # outputs 1.00 KiB
size_format(1024, 'binary') #outputs 1.00 Ki
size_format(1024, 'binary_byte') #outputs 1.00 KiB
size_format(1024, 'binary_byte', 1) #outputs 1.0 KiB
size_format(1024, 'binary_byte', 1, false) #outputs 1.0KiB
size_format(1024, 'decimal') #outputs 1.02 K
size_format(1024, 'decimal_byte') #outputs 1.02 KB
size_format(1024, 'decimal_byte', 3, false) #outputs 1.023 KB

Note: The use cases of binary in practice I think are rare, we don't plan to implement it.

Currently, we tend to do it in the way of alternative 2. Want to hear what you think @gianm @jihoonson @asdf2014

@FrankChen021
Copy link
Member Author

Closed via #10635

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant