- Type: Standard Library API proposal
- Author: Abduqodiri Qurbonzoda
- Status: Implemented in Kotlin 1.9.0
- Prototype: Implemented
- Target issue: KT-57762
- Discussion: TBD
Convenient API for formatting binary data into hexadecimal string form and parsing back.
Our research has shown that hexadecimal is the most widely used numeric representation after decimal. There are some fundamental reasons for its popularity:
- Hexadecimal representation is more human-readable and understandable when it comes to bits. Each digit in the hex system represents exactly four bits of data, making the mapping of a hex digit to its corresponding nibble straightforward.
- Hex representation is more compact than the decimal format and consumes a predictable number of characters.
- The implementation of a hex encoder/decoder is relatively simple and fast.
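The digit-to-nibble mapping from the first point can be sketched by hand. The helper `byteToHex` and the `HEX_DIGITS` table below are hypothetical names used only for illustration; they are not part of the proposed API.

```kotlin
// Hypothetical lookup table: the index is the nibble value, the char is its hex digit.
val HEX_DIGITS = "0123456789abcdef"

// Hypothetical helper: formats one byte as exactly two hex digits.
fun byteToHex(value: Byte): String {
    val bits = value.toInt() and 0xff      // treat the byte as 8 raw bits
    val high = HEX_DIGITS[bits ushr 4]     // upper four bits give the first digit
    val low = HEX_DIGITS[bits and 0x0f]    // lower four bits give the second digit
    return "$high$low"
}

fun main() {
    println(byteToHex(0x5d.toByte()))  // 5d
    println(byteToHex((-1).toByte()))  // ff
}
```

Every byte maps to exactly two characters, which is why the output length is predictable.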
By providing a convenient API for common use cases described below, we aim to make coding in Kotlin easier and more enjoyable.
The readability of the format makes it very appealing for logging and debugging. The value that is converted to hex for logging is usually less informative itself than its binary representation, e.g., when the value has some particular bit pattern. Another popular use case is printing bytes in some hex dump format, split into lines and groups.
Sometimes binary data needs to be embedded into text-only formats such as URL, XML, or JSON.
Our research indicates that in this use case, hex encoding is among the most frequently used encodings, especially when encoding primitive values such as Int and Long.
The following popular protocols require hex format:
- When generating or parsing HTML code, one might need to work with the hex representation of RGB color codes, e.g., <div style="background-color:#ff6347;">...</div>
- To express Unicode code points in HTML or XML, e.g., <message>It's &#x1F327; outside, be sure to grab &#x2602;</message>
- The framework used in your project might require specifying IP or MAC addresses in a certain hex format, e.g., "00:1b:63:84:45:e6" or "001B.6384.45E6"
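As a rough illustration of the RGB use case, the sketch below parses and formats a CSS-style color with the functions proposed later in this document. The helpers `parseCssColor` and `toCssColor` are hypothetical, and the code assumes Kotlin 1.9 or later, where the hex API requires opting in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: splits "#rrggbb" into its red, green, and blue components.
@OptIn(ExperimentalStdlibApi::class)
fun parseCssColor(color: String): Triple<Int, Int, Int> {
    // The "#" prefix is declared in the format, so it must be present in the input.
    val rgb = color.hexToInt(HexFormat { number.prefix = "#" })
    return Triple((rgb shr 16) and 0xff, (rgb shr 8) and 0xff, rgb and 0xff)
}

// Hypothetical helper: formats three components as "#rrggbb".
// A ByteArray is used so that every component always takes two digits.
@OptIn(ExperimentalStdlibApi::class)
fun toCssColor(red: Int, green: Int, blue: Int): String =
    "#" + byteArrayOf(red.toByte(), green.toByte(), blue.toByte()).toHexString()

fun main() {
    println(parseCssColor("#ff6347"))  // (255, 99, 71)
    println(toCssColor(255, 99, 71))   // #ff6347
}
```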
- Java HexFormat class.
- Python binascii module. Also, hex and fromhex functions on bytes objects.
Considering the use cases mentioned above, the following format options are proposed.
For formatting a numeric value:
- Whether upper case or lower case hexadecimal digits should be used
- The prefix of the hex representation
- The suffix of the hex representation
- Whether to remove leading zeros in the hex representation
For formatting ByteArray:
- Whether upper case or lower case hexadecimal digits should be used
- The number of bytes per line
- The number of bytes per group
- The string used to separate groups in a line
- The string used to separate bytes in a group
- The prefix of a byte hex representation
- The suffix of a byte hex representation
It is proposed to introduce an immutable HexFormat class that holds the options. Builder is used to configure a format. Each option in the builder has a default value that can be customized.
All related types are nested inside HexFormat to reduce the top-level surface area of the API:
public class HexFormat internal constructor(
    val upperCase: Boolean,
    val bytes: BytesHexFormat,
    val number: NumberHexFormat
) {
    public class Builder internal constructor() {
        var upperCase: Boolean = false
        val bytes: BytesHexFormat.Builder = BytesHexFormat.Builder()
        val number: NumberHexFormat.Builder = NumberHexFormat.Builder()

        inline fun bytes(builderAction: BytesHexFormat.Builder.() -> Unit)
        inline fun number(builderAction: NumberHexFormat.Builder.() -> Unit)
    }

    public class BytesHexFormat internal constructor(
        val bytesPerLine: Int,
        val bytesPerGroup: Int,
        val groupSeparator: String,
        val byteSeparator: String,
        val bytePrefix: String,
        val byteSuffix: String
    ) {
        public class Builder internal constructor() {
            var bytesPerLine: Int = Int.MAX_VALUE
            var bytesPerGroup: Int = Int.MAX_VALUE
            var groupSeparator: String = "  "
            var byteSeparator: String = ""
            var bytePrefix: String = ""
            var byteSuffix: String = ""
        }
    }

    public class NumberHexFormat internal constructor(
        val prefix: String,
        val suffix: String,
        val removeLeadingZeros: Boolean
    ) {
        public class Builder internal constructor() {
            var prefix: String = ""
            var suffix: String = ""
            var removeLeadingZeros: Boolean = false
        }
    }
}
The BytesHexFormat and NumberHexFormat classes hold format options for ByteArray and numeric values, respectively. The upperCase option, which is common to both ByteArray and numeric values, is stored in HexFormat.
It's not possible to instantiate a HexFormat or its builder directly. The following function is provided instead:
public inline fun HexFormat(builderAction: HexFormat.Builder.() -> Unit): HexFormat
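A minimal sketch of how the builder function might be used to configure a format and read the resulting options back. The function name `describeDumpFormat` is hypothetical, and the code assumes Kotlin 1.9 or later with an opt-in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: builds a format and reports some of its options.
@OptIn(ExperimentalStdlibApi::class)
fun describeDumpFormat(): String {
    val format = HexFormat {
        upperCase = true            // shared by bytes and numbers
        bytes {                     // nested builder for ByteArray options
            bytesPerLine = 16
            byteSeparator = " "
        }
        number.prefix = "0x"        // numeric options can also be set directly
    }
    // The resulting HexFormat is immutable; options are read-only properties.
    return "upperCase=${format.upperCase}, " +
        "bytesPerLine=${format.bytes.bytesPerLine}, " +
        "prefix=${format.number.prefix}"
}

fun main() {
    println(describeDumpFormat())  // upperCase=true, bytesPerLine=16, prefix=0x
}
```

Options that are not assigned in the builder keep their default values.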
For formatting, the following extension functions are proposed:
// Formats the byte array using HexFormat.upperCase and HexFormat.bytes
public fun ByteArray.toHexString(format: HexFormat = HexFormat.Default): String
public fun ByteArray.toHexString(
    startIndex: Int = 0,
    endIndex: Int = size,
    format: HexFormat = HexFormat.Default
): String
// Formats the numeric value using HexFormat.upperCase and HexFormat.number
// N is Byte, Short, Int, Long, and their unsigned counterparts
public fun N.toHexString(format: HexFormat = HexFormat.Default): String
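The formatting functions above could be exercised as in the following sketch. The helper `formattingSamples` is a hypothetical name, and the code assumes Kotlin 1.9 or later with an opt-in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: collects the results of a few formatting calls.
@OptIn(ExperimentalStdlibApi::class)
fun formattingSamples(): List<String> {
    val compact = HexFormat {
        upperCase = true
        number.prefix = "0x"
        number.removeLeadingZeros = true
    }
    val payload = byteArrayOf(0x0a, 0x1b, 0x2c, 0x3d)
    return listOf(
        93.toHexString(),             // "0000005d": all 8 digits of an Int
        (-1).toHexString(),           // "ffffffff": bits are formatted, no minus sign
        255.toUByte().toHexString(),  // "ff"
        58.toHexString(compact),      // "0x3A"
        payload.toHexString(),        // "0a1b2c3d"
        payload.toHexString(1, 3),    // "1b2c": bytes at indices 1 and 2 only
    )
}

fun main() {
    formattingSamples().forEach(::println)
}
```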
It is critical to be able to parse the results of the formatting functions above. For parsing, the following extension functions are proposed:
// Parses a byte array
public fun String.hexToByteArray(format: HexFormat = HexFormat.Default): ByteArray
// Parses a numeric value
// N is Byte, Short, Int, Long, and their unsigned counterparts
public fun String.hexToN(format: HexFormat = HexFormat.Default): N
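The sketch below illustrates the parsing side and checks that a formatted value can be parsed back with the same format. The helpers `roundTrip` and `parsingSamples` are hypothetical, and the code assumes Kotlin 1.9 or later with an opt-in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: formats a value and parses the result with the same format.
@OptIn(ExperimentalStdlibApi::class)
fun roundTrip(value: Int): Int {
    val format = HexFormat { upperCase = true; number.prefix = "0x" }
    return value.toHexString(format).hexToInt(format)
}

// Hypothetical helper: collects the results of a few parsing calls as strings.
@OptIn(ExperimentalStdlibApi::class)
fun parsingSamples(): List<String> = listOf(
    "3A".hexToInt().toString(),                  // "58"
    "ff".hexToByte().toString(),                 // "-1": the bit pattern fits into a Byte
    "ff".hexToUByte().toString(),                // "255"
    "0a1b".hexToByteArray().toList().toString(), // "[10, 27]"
)

fun main() {
    println(roundTrip(-93))     // -93
    println(parsingSamples())   // [58, -1, 255, [10, 27]]
}
```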
- When formatting a ByteArray, the LF character is used to separate lines.
- When parsing a ByteArray, any of the char sequences CRLF ("\r\n"), LF ("\n") and CR ("\r") are considered a valid line separator.
- Parsing is performed in a case-insensitive manner.
- NumberHexFormat.removeLeadingZeros is ignored when parsing.
- Assigning a non-positive value to BytesHexFormat.Builder.bytesPerLine/bytesPerGroup is prohibited. In this case IllegalArgumentException is thrown.
- Assigning a string containing LF or CR character to BytesHexFormat.Builder.byteSeparator/bytePrefix/byteSuffix and NumberHexFormat.Builder.prefix/suffix is prohibited. In this case IllegalArgumentException is thrown.
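The rules above can be checked with a short sketch. The helper `contractChecks` is a hypothetical name; each entry in the returned list is expected to be true if the behavior matches the rules as stated. The code assumes Kotlin 1.9 or later with an opt-in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: one Boolean per rule listed above.
@OptIn(ExperimentalStdlibApi::class)
fun contractChecks(): List<Boolean> {
    val twoPerLine = HexFormat { bytes.bytesPerLine = 2 }
    val data = byteArrayOf(1, 2, 3, 4)
    return listOf(
        // Formatting separates lines with LF.
        data.toHexString(twoPerLine) == "0102\n0304",
        // Parsing also accepts CRLF and CR as line separators.
        "0102\r\n0304".hexToByteArray(twoPerLine).contentEquals(data),
        "0102\r0304".hexToByteArray(twoPerLine).contentEquals(data),
        // Parsing ignores the case of hex digits.
        "3a".hexToInt() == "3A".hexToInt(),
        // removeLeadingZeros has no effect on parsing.
        "0000003a".hexToInt(HexFormat { number.removeLeadingZeros = true }) == 58,
        // Invalid builder values are rejected with IllegalArgumentException.
        runCatching { HexFormat { bytes.bytesPerLine = 0 } }
            .exceptionOrNull() is IllegalArgumentException,
        runCatching { HexFormat { bytes.byteSeparator = "\n" } }
            .exceptionOrNull() is IllegalArgumentException,
    )
}

fun main() {
    println(contractChecks().all { it })
}
```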
// Parsing an Int
"3A".hexToInt() // 58
// Formatting an Int
93.toHexString() // "0000005d"
// Parsing a ByteArray
val macAddress = "001b638445e6".hexToByteArray()
// Formatting a ByteArray
macAddress.toHexString(HexFormat { bytes.byteSeparator = ":" }) // "00:1b:63:84:45:e6"
// Defining a format and assigning it to a variable
val threeGroupFormat = HexFormat { upperCase = true; bytes.bytesPerGroup = 2; bytes.groupSeparator = "." }
// Formatting a ByteArray using a previously defined format
macAddress.toHexString(threeGroupFormat) // "001B.6384.45E6"
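The hex dump use case, with bytes split into lines and groups, might look like the sketch below. The helper `hexDump` and the chosen line and group sizes are illustrative assumptions; the code assumes Kotlin 1.9 or later with an opt-in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: formats bytes as 8 per line, in groups of 4.
@OptIn(ExperimentalStdlibApi::class)
fun hexDump(data: ByteArray): String {
    val dumpFormat = HexFormat {
        bytes {
            bytesPerLine = 8
            bytesPerGroup = 4
            groupSeparator = "  "   // set explicitly: two spaces between groups
            byteSeparator = " "     // one space between bytes inside a group
        }
    }
    return data.toHexString(dumpFormat)
}

fun main() {
    println(hexDump(ByteArray(12) { it.toByte() }))
    // 00 01 02 03  04 05 06 07
    // 08 09 0a 0b
}
```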
The Kotlin standard library provides Primitive.toString(radix = 16) for converting primitive values to their hex representation. However, this function focuses on converting the values, not bits. As a result:
- Negative values are formatted with a minus sign. One needs to convert values of signed types to corresponding unsigned types before converting to hex representation.
- Leading zero nibbles are ignored. To get the full length one must additionally padStart the result with '0'.
  - Related complaint: KT-60782
There is also String.toPrimitive(radix = 16) for parsing back a primitive value. But this function throws if the primitive type can't have the resulting value, even if the bits fit, e.g., "FF".toByte(radix = 16) fails. To prevent this, the string must first be parsed as the corresponding unsigned type and then converted to the signed one.
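The limitations of the radix-based functions can be reproduced with the existing stable API. The helper `radixSamples` is a hypothetical name used only to collect the results.

```kotlin
// Hypothetical helper: shows what the radix-based functions return today.
fun radixSamples(): List<String> = listOf(
    (-1).toString(radix = 16),                        // "-1": the value, not the bits
    (-1).toUInt().toString(radix = 16),               // "ffffffff": unsigned workaround
    93.toString(radix = 16),                          // "5d": leading zeros are dropped
    93.toString(radix = 16).padStart(8, '0'),         // "0000005d": manual padding
    // 0xFF is out of the Byte range, so parsing it as a Byte throws.
    runCatching { "FF".toByte(radix = 16) }.isFailure.toString(),  // "true"
    // Parsing as UByte first and converting afterwards keeps the bits.
    "FF".toUByte(radix = 16).toByte().toString(),     // "-1"
)

fun main() {
    radixSamples().forEach(::println)
}
```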
ByteArray.joinToString(separator) { byte -> byte.toString(radix = 16) } can be used to format a ByteArray. Downsides are:
- Not possible to separate bytes into groups and lines
- Challenges with formatting Byte to hex described above
There is currently no API for parsing a ByteArray.
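To make these downsides concrete, the sketch below compares the naive joinToString approach with a hand-written workaround and a hand-written parser. All three helpers (`naiveHex`, `paddedHex`, `parseHex`) are hypothetical and use only the existing stable API.

```kotlin
// Hypothetical helper: the straightforward approach described above.
fun naiveHex(bytes: ByteArray): String =
    bytes.joinToString("") { byte -> byte.toString(radix = 16) }

// Hypothetical helper: masks the sign and pads each byte to two digits.
fun paddedHex(bytes: ByteArray): String =
    bytes.joinToString("") { byte ->
        (byte.toInt() and 0xff).toString(radix = 16).padStart(2, '0')
    }

// Hypothetical helper: a hand-written parser for an unseparated hex string.
fun parseHex(text: String): ByteArray {
    require(text.length % 2 == 0) { "Expected an even number of hex digits" }
    return text.chunked(2).map { it.toInt(radix = 16).toByte() }.toByteArray()
}

fun main() {
    val data = byteArrayOf(0x0a, -1)
    println(naiveHex(data))                  // a-1
    println(paddedHex(data))                 // 0aff
    println(parseHex("0aff").toList())       // [10, -1]
}
```

The naive output loses the leading zero and prints a minus sign, so it cannot be parsed back reliably.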
For ByteArray:
- contentToString
- encodeToByteArray/decodeToString
- joinToString
For primitive types:
- toString(radix)
- Char.digitToInt()
- Int.digitToChar()
As listed above, existing functions with similar purpose use the toString suffix when converting to String, and toType when converting from String to another type. Thus, options with similar naming schemes were considered:
- Proposed: toHexString and hexToType for formatting and parsing, respectively
  - "hex" used as an adjective
- hexToString or hexifyToString for formatting
  - "hex" used as a verb
  - A similar verb is needed to describe the parsing of a hex-formatted string
- Use format and parse verbs, e.g., formatToHexString and parseHexToByteArray
  - To already indicates that the function converts the receiver
- Proposed: Provide formatting and parsing functions as extensions on the type to be converted
  - Pro: Discoverable
    - Users already know and use the toString family of extension functions. When typing "toString", code completion displays the hex conversion functions as well. This can also prompt users to wonder how toString(radix = 16) differs from toHexString(), and help to choose the proper one.
    - Typing ".hex" is enough for code completion to display the hex conversion function for the receiver. No need to remember the exact function name.
  - Pro: Allows chaining with other calls
  - Con: May pollute code completion for String receiver
- Provide all formatting and parsing functions on HexFormat, similar to Java HexFormat and Kotlin Base64
  - Pro: Gathers all related functions under a single type
  - Con: Less discoverable than the proposed approach. Users need to remember that there is a HexFormat class.
  - Con: Requires let or run scope function for chaining with other calls
- Have BytesHexFormat and NumberHexFormat as top-level classes, each with its own upperCase property. No need for the HexFormat class. Functions for formatting/parsing ByteArray take BytesHexFormat, while functions for numeric types take NumberHexFormat, e.g., byteArray.toHexString(BytesHexFormat { byteSeparator = " "; bytesPerLine = 16 })
  - Pro: Eliminates possible confusion about what options affect formatting
  - Con: Two variables are needed to store preferred format options
- Builder overrides a provided format, e.g., HexFormat(MY_HEX_FORMAT) { bytes.byteSeparator = ":" }
  - Not so many use cases for altering an existing format
  - Can be added as an overload of fun HexFormat()
- Pass options to formatting and parsing functions directly, without introducing HexFormat
  - Not convenient in cases when a format is defined once and used on multiple occasions
  - Adding new options in the future is problematic
  - There is no way in Kotlin to require calling a function with named arguments. Passing multiple arguments without specifying names damages code readability, e.g., bitMask.toHexString(true, "0x", false)
Only a subset of Kotlin Standard Library available on all supported platforms is required.
- Standard Library kotlin.text package
- HexFormat class: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexFormat.kt
- Extensions for formatting and parsing: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexExtensions.kt
- Test cases for formatting and parsing ByteArray: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/BytesHexFormatTest.kt
- Test cases for formatting and parsing numeric values: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/NumberHexFormatTest.kt
- Adding the ability to limit the number of hex digits when formatting numeric values
  - NumberHexFormat.maxLength could be introduced
    - When formatting an Int, the combination of maxLength = 6 and removeLeadingZeros = false results in exactly 6 least significant hex digits
    - The combination of maxLength = 6 and removeLeadingZeros = true returns at most 6 least significant hex digits without leading zeros
  - Related request: KT-60787
- Overloads for parsing a substring: KT-58277
- Overloads for appending format result to an Appendable
  - toHexString might need to be renamed to hexToString/Appendable or hexifyToString/Appendable, because it isn't obvious from Int.toHexString(stringBuilder) that the result is appended to the provided StringBuilder
- Formatting and parsing I/O streams in Kotlin/JVM
- Formatting and parsing a Char
  - Although Char is not a numeric type, it has a Char.code associated with it. With the proposed API formatting a Char won't be an easy task: Char.code.toShort().toHexString()
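The Char workaround mentioned in the last item can be sketched as follows. The helpers `charToHex` and `hexToChar` are hypothetical and are not part of the proposal; the code assumes Kotlin 1.9 or later with an opt-in to `ExperimentalStdlibApi`.

```kotlin
// Hypothetical helper: formats the 16-bit code of a Char as four hex digits.
@OptIn(ExperimentalStdlibApi::class)
fun charToHex(char: Char): String = char.code.toShort().toHexString()

// Hypothetical helper: parses four hex digits back into a Char.
// The mask removes the sign extension for codes above 0x7fff.
@OptIn(ExperimentalStdlibApi::class)
fun hexToChar(hex: String): Char = (hex.hexToShort().toInt() and 0xffff).toChar()

fun main() {
    println(charToHex('A'))                // 0041
    println(charToHex('\u2602'))           // 2602
    println(hexToChar("2602") == '\u2602') // true
}
```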