Add a benchmark on individual Unicode codepoints reading/writing #320

Merged
merged 1 commit into from
Jun 12, 2024
65 changes: 65 additions & 0 deletions benchmarks/src/commonMain/kotlin/BufferOps.kt
@@ -8,6 +8,7 @@ package kotlinx.io.benchmarks
import kotlinx.benchmark.*
import kotlinx.io.*
import kotlinx.io.bytestring.ByteString
import kotlin.random.Random

@State(Scope.Benchmark)
abstract class BufferRWBenchmarkBase {
@@ -415,3 +416,67 @@ open class IndexOfByteString {
    @Benchmark
    fun benchmark() = buffer.indexOf(byteString)
}

@State(Scope.Benchmark)
open class Utf8CodePointsBenchmark : BufferRWBenchmarkBase() {
    private val codePointsCount = 128

    // Encoding names follow naming from Utf8StringBenchmark
    @Param("ascii", "utf8", "sparse", "2bytes", "3bytes", "4bytes", "bad")
    var encoding: String = "ascii"
Contributor

Maybe it's better to use an enum in such cases?

Collaborator Author

I decided to use the same values as the old read/writeString benchmark. I could replace the parameter with an enum, but then the old values (especially 2|3|4bytes) would need to be replaced, and that would complicate comparing results against older releases.
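
One way the two positions could be reconciled, sketched under assumptions: keep the string values in `@Param` so that result names stay comparable with older releases, and resolve them to an enum inside the benchmark. The `Encoding` enum, its `label` property, and `fromLabel` are hypothetical names for illustration; they are not part of this PR or of kotlinx-io.

```kotlin
// Hypothetical enum; the labels mirror the @Param values used in the diff.
enum class Encoding(val label: String) {
    ASCII("ascii"),
    UTF8("utf8"),
    SPARSE("sparse"),
    TWO_BYTES("2bytes"),
    THREE_BYTES("3bytes"),
    FOUR_BYTES("4bytes"),
    BAD("bad");

    companion object {
        // Resolves a legacy string label, failing loudly on unknown values.
        fun fromLabel(label: String): Encoding =
            values().firstOrNull { it.label == label }
                ?: throw IllegalArgumentException("Unknown encoding: $label")
    }
}

fun main() {
    println(Encoding.fromLabel("2bytes")) // prints TWO_BYTES
}
```

The digit-leading labels cannot be enum constant names, which is why the sketch stores them in a separate property instead of relying on `valueOf`.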


    override fun padding(): ByteArray {
        return ByteArray(minGap) { '.'.code.toByte() }
    }

    private val codePoints = IntArray(codePointsCount)
    private var codePointIdx = 0

    @Setup
    fun fillCodePointsArray() {
        fun IntArray.fill(generator: () -> Int) {
            for (idx in this.indices) {
                this[idx] = generator()
            }
        }

        when (encoding) {
            "ascii" -> codePoints.fill { Random.nextInt(' '.code, '~'.code) }
Contributor

Why are codes less than 0x20 ignored?)

Collaborator Author

Just a personal preference :)
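
A related detail about the range: `Random.nextInt(from, until)` in `kotlin.random` treats `until` as exclusive, so the "ascii" generator as written draws from U+0020 through U+007D and never produces `'~'` itself. The snippet below is a minimal sketch that samples the same call and reports the observed bounds; `sampleBounds` is a hypothetical helper, not part of the PR.

```kotlin
import kotlin.random.Random

// Draws many samples and returns the smallest and largest value seen.
fun sampleBounds(from: Int, until: Int, samples: Int = 100_000): Pair<Int, Int> {
    var min = Int.MAX_VALUE
    var max = Int.MIN_VALUE
    repeat(samples) {
        val v = Random.nextInt(from, until) // until is exclusive
        if (v < min) min = v
        if (v > max) max = v
    }
    return min to max
}

fun main() {
    val (min, max) = sampleBounds(' '.code, '~'.code)
    println("min=$min max=$max, '~'.code=${'~'.code}")
}
```

The same half-open behavior applies to the other generators in the diff, for example `Random.nextInt(0x10000, 0x10ffff)` stops one short of U+10FFFF.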

            "utf8" -> codePoints.fill {
                var cp: Int
                do {
                    cp = Random.nextInt(0, 0x10ffff)
                } while (cp in 0xd800..0xdfff)
                cp
            }
            "sparse" -> {
                codePoints.fill { Random.nextInt(' '.code, '~'.code) }
                codePoints[42] = '⌛'.code
            }
            "2bytes" -> codePoints.fill { Random.nextInt(0x80, 0x800) }
            "3bytes" -> codePoints.fill {
                var cp: Int
                do {
                    cp = Random.nextInt(0x800, 0x10000)
                } while (cp in 0xd800..0xdfff)
                cp
            }
            "4bytes" -> codePoints.fill { Random.nextInt(0x10000, 0x10ffff) }
            "bad" -> codePoints.fill { Random.nextInt(0xd800, 0xdfff) }
        }
    }

    private fun nextCodePoint(): Int {
        val idx = codePointIdx
        val cp = codePoints[idx]
        codePointIdx = (idx + 1) % codePointsCount
        return cp
    }

    @Benchmark
    fun benchmark(): Int {
        buffer.writeCodePointValue(nextCodePoint())
        return buffer.readCodePointValue()
    }
}
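
For reference, the parameter names map onto UTF-8 encoded lengths. The helper below is a minimal sketch of the standard UTF-8 length rules; `utf8Length` is a hypothetical name, not a kotlinx-io function, and returning -1 for surrogates and out-of-range values is a choice made for this illustration only. It says nothing about how `writeCodePointValue` actually handles the "bad" inputs.

```kotlin
// Number of bytes a code point occupies in UTF-8, or -1 if it is not
// a Unicode scalar value (a surrogate or a value outside 0..0x10FFFF).
fun utf8Length(codePoint: Int): Int = when (codePoint) {
    in 0x0000..0x007F -> 1       // "ascii"
    in 0x0080..0x07FF -> 2       // "2bytes"
    in 0xD800..0xDFFF -> -1      // "bad": surrogates, checked before the 3-byte range
    in 0x0800..0xFFFF -> 3       // "3bytes"
    in 0x10000..0x10FFFF -> 4    // "4bytes"
    else -> -1
}

fun main() {
    println(utf8Length('a'.code))   // prints 1
    println(utf8Length('⌛'.code))  // prints 3 (U+231B, the "sparse" outlier)
    println(utf8Length(0x1F600))    // prints 4
    println(utf8Length(0xD800))     // prints -1
}
```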