Kotlin/Native support #32

Merged

merged 13 commits into master from lepilkina/native_support on Aug 6, 2021
Conversation

LepilkinaElena (Author)

No description provided.

Outdated review threads marked resolved on build.gradle, plugin/build.gradle, and runtime/build.gradle.
@qwwdfsad self-requested a review February 15, 2021 15:04
@@ -166,6 +162,8 @@ Available configuration options:
* `iterationTimeUnit` – time unit for `iterationTime` (default is seconds)
* `outputTimeUnit` – time unit for results output
* `mode` – "thrpt" for measuring operations per time, or "avgt" for measuring time per operation
* `iterationMode` – "external" for iterating in gradle in order to get correct Kotlin/Native runtime input in measurement,
"internal" can be used if it's known that measured code have no calls in K/N runtime that can influence on measurement unrepeatedly.
Member

From this description it's very hard to tell when one should use this configuration option. Perhaps, if there's a sensible default, we could turn it into a boolean option that would allow overriding that default, and explain why users would want to do it.

Author

"External" is already such default. It's similar to other settings that also have. default values, but about description I agree. If you have ideas what to write. in README, I'll be grateful

Author

* `iterationMode` – the way iterations are run for K/N benchmarks. The default value is "external": each iteration is executed as a separate run, which gives more precise results in most cases. With "internal", all iterations of one benchmark run within a single binary execution (note that some code, e.g. singleton initialization, can't be measured properly in this mode because of K/N runtime implementation details)

Is this better?
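For illustration, a minimal sketch of how this option might be set in a benchmark configuration block, assuming it keeps the name and string values proposed in this thread (the final DSL and option name may differ):

benchmark {
    configurations {
        main {
            iterationMode = "external" // default: every iteration is a separate run of the benchmark binary
            // iterationMode = "internal" // all iterations of a benchmark run within one binary execution
        }
    }
}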

Member

It feels conceptually like the 'fork' option in JMH (see the example). Perhaps we could find a similar name for this option.

Member

singleton initialization

I always thought that microbenchmarks try to measure settled code execution performance, i.e. when all fixed initialization costs are done and no longer affect the performance of the code being measured. That's why they do warm-up iterations.
So for me it looks like a fork-per-benchmark-method mode should be the preferred default.

Author

I thought about fork during implementation, but it works differently in JMH. Firstly, fork is a number, and it can be set by the user. Also, as I understood it, JMH creates the requested number of forks, and warmups and iterations are run inside each fork; the logic of the "external" mode is different. Forks are actually closer to the "internal" mode, where there is one fork per benchmark. I really think that using fork could confuse users, because they would expect the same behaviour for Native.
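For comparison, a rough sketch of the fork semantics described above using plain JMH annotations (JVM only, not this PR's API; the benchmark body is an arbitrary placeholder):

import org.openjdk.jmh.annotations.*

@State(Scope.Benchmark)
open class ForkExample {
    @Fork(2)                       // JMH starts 2 separate JVM processes for this benchmark
    @Warmup(iterations = 5)        // warmup iterations run inside each fork
    @Measurement(iterations = 10)  // measurement iterations also run inside each fork
    @Benchmark
    fun measured(): Long = System.nanoTime() // placeholder measured code
}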

Author

when all fixed initialization costs are done and no longer affect the performance of the code being measured.

That isn't true for the native runtime. Initialization that has already happened may still influence subsequent code, and we don't know what exactly the user wants to measure.

That's why they do warm-up iterations.

Warm-up for native applications is needed to warm up processor caches, etc.

So for me it looks like a fork-per-benchmark-method mode should be the preferred default.

Any code can influence thresholds inside the runtime, and that can affect performance. If the user doesn't know the runtime implementation details and wants accurate results, they should use "external". We can change the default, but I'm not sure that's right for the majority of users. Maybe we should discuss this; it was already discussed with @qurbonzoda


// Execute benchmark
if (config.iterationMode == "internal") {
val jsonFile = createTempFile("bench", ".json").toFile()
Member

Just a suggestion: most kotlin.io.* functions have analogs in kotlin.io.path, so you can avoid converting the result of createTempFile to File and just use the returned Path.

Author

The problem is that kotlin.io.createTempFile is deprecated

@Deprecated(
    "Avoid creating temporary files in the default temp location with this function " +
    "due to too wide permissions on the newly created file. " +
    "Use kotlin.io.path.createTempFile instead or resort to java.io.File.createTempFile."
)

I used one of the suggested alternatives.

Member

Sure, I mean you can use the result of kotlin.io.path.createTempFile without converting it with toFile() first. Though I don't insist.

Author

Maybe I didn't get it. What was your suggestion? To use the deprecated function in order to eliminate toFile()? Or do you mean using Path rather than File?
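For reference, a minimal Path-only sketch of the suggestion (kotlin.io.path extensions, Kotlin 1.5+; produceJsonReport() is a hypothetical stand-in for whatever produces the report text):

import kotlin.io.path.createTempFile
import kotlin.io.path.readText
import kotlin.io.path.writeText

val jsonFile = createTempFile("bench", ".json")  // returns java.nio.file.Path; no toFile() needed
jsonFile.writeText(produceJsonReport())          // hypothetical producer of the JSON report
val report = jsonFile.readText()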

Outdated review thread marked resolved on runtime/nativeMain/src/kotlinx/benchmark/Utils.kt.
@whyoleg mentioned this pull request Feb 20, 2021
actual fun consume(obj: Any?) {
// hashCode now is implemented as taking address of object, so it's suitable now.
// If implementation is changed `Blackhole` should be reimplemented.
consumer += obj.hashCode()
Member

That's a virtual call that may also be way too heavy for a user-supplied class. It's just a time bomb waiting to explode, so I'd suggest using a Blackhole approach (non-ordered volatile write, LLVM directive, w/e).

Also, see https://bugs.openjdk.java.net/browse/JDK-8252505 for the potential pitfalls

consumer += obj.hashCode()
}
actual fun consume(bool: Boolean) {
consumer += bool.hashCode()
Member

Here and below, this is still a write; moreover, it is a write to a @ThreadLocal object field, so it's unlikely to compile down to an add instruction to a known address.

Let's please avoid any writes in performance-sensitive code.
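For reference, a rough Kotlin sketch of the JMH-style consume trick alluded to above: the hot path performs no write at all, only a comparison against two fields that can never both equal the argument, so the optimizer cannot prove the argument is dead. Whether the Kotlin/Native compiler preserves this pattern is a separate question; this is not the implementation used in this PR:

class Blackhole {
    private var i1: Int = 1
    private var i2: Int = 2  // invariant: i1 != i2, so the branch below is never taken

    fun consume(i: Int) {
        // No write on the common path; the unreachable write exists only so the
        // compiler cannot eliminate the computation of `i`.
        if (i == i1 && i == i2) {
            i1 = i
        }
    }
}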

val startTime = getTimeNanos()
while (counter-- > 0) {
@Suppress("UNUSED_VARIABLE")
val result = instance.executeFunction() // ignore result for now, but might need to consume it somehow
}
GC.collect()
Member

Unconditional GC between each measurement iteration implies multiple things:

  • Small micro/nano benchmarks may be dominated by this runtime call, with undesired side-effects
  • The amortized cost of allocations (especially TLAB-like) is simply not measured; e.g. it would be impossible to compare two JSON parsers that differ mostly in their allocation patterns
  • We potentially mess up CPU caches on each iteration. That's not really helpful for correct measurements and comparison, especially when comparing regular and cache-oblivious algorithms

I strongly suggest making GC after each iteration optional and disabled by default.

Author

I get your concerns; I can add several modes. We could even use a fully turned-off GC as the second mode. I really don't like the idea of running benchmarks with only the automatic GC calls made by the runtime, because then measurements can differ a lot between different runs of the same benchmark. Moreover, some users expect the result of running the same benchmark as a separate application to be close to the results of benchmarking with the library, which is impossible with automatically triggered GC.

What do you think about making 2 modes: one that calls GC after each iteration (behaviour similar to a separate application) and one without GC calls at all?

Member

IMO it's hard to actually have non-micro benchmarks without GC at all, but in my (JVM-ish) experience I mostly measure GC as an amortized part of the execution. And in really rare cases (when I want precision, a minimal amount of noise, or do not care about memory pressure) I use -prof gc, which collects garbage after each iteration.

I think it's ok to have three modes for GC: none, auto and iteration, with auto as the default.
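A minimal sketch of what the three modes could look like at the end of a measurement iteration, using the mode names from this comment (the helper function and its parameter are hypothetical; GC is the same runtime object the PR already calls):

import kotlin.native.internal.GC  // package may differ across K/N versions

fun maybeCollectAfterIteration(gcCollectMode: String) {
    when (gcCollectMode) {
        "iteration" -> GC.collect()  // force a collection after each measurement iteration
        "auto" -> { /* let the runtime collect whenever it decides to (proposed default) */ }
        "none" -> { /* no explicit collections at all */ }
    }
}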

@@ -114,28 +116,133 @@ fun Project.createNativeBenchmarkExecTask(
onlyIf { linkTask.enabled }

val reportsDir = benchmarkReportsDir(config, target)
val reportFile = reportsDir.resolve("${target.name}.${config.reportFileExt()}")
reportFile = reportsDir.resolve("${target.name}.json")
Collaborator

Preserving ${config.reportFileExt()} is enough to support the specified report format.

// Get full list of running benchmarks
execute(listOf(configFile.absolutePath, "--list", benchsDescriptionDir.absolutePath))
val detailedConfigFiles = project.fileTree(benchsDescriptionDir).files.sortedBy { it.absolutePath }
val jsonReportParts = mutableListOf<File>()
Collaborator

It seems this val is redundant.

@@ -1,13 +1,178 @@
package kotlinx.benchmark

import kotlinx.cinterop.pin
import kotlin.experimental.xor
Collaborator

Please comment out these imports as well

@@ -94,10 +94,12 @@ benchmark {
fast { // --> jvmFastBenchmark
include("Common")
exclude("long")
iterations = 1
iterations = 10
Collaborator

The purpose of the fast configuration is to take very little time. It is usually run to check that the project still builds and runs benchmarks, without emphasis on correctness. Maybe 5 iterations are enough?

@@ -9,6 +9,8 @@ open class BenchmarkConfiguration(val extension: BenchmarksExtension, val name:
var iterationTime: Long? = null
var iterationTimeUnit: String? = null
var mode: String? = null
var nativeIterationMode: String? = null // TODO: where should warning about K/N specific of this parameter be shown?
Collaborator

Can these advanced platform-specific options be defined using advanced(name, value)?

): Double {
val executeFunction = benchmark.function
var counter = cycles
GC.collect()
Collaborator

Is GC.collect() before each iteration intentional even if nativeGCCollectMode is auto?

Collaborator

warmup also does GC.collect()

@@ -9,6 +9,7 @@ open class BenchmarkConfiguration(val extension: BenchmarksExtension, val name:
var iterationTime: Long? = null
var iterationTimeUnit: String? = null
var mode: String? = null
var nativeGCCollectMode: String? = null
Collaborator

I would rather make nativeGCCollectMode be declared via advanced(...) as well, similar to nativeIterationMode.
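A sketch of what both native options might look like if declared via advanced(...), as suggested here (the option names follow this thread; the names that eventually ship may differ):

benchmark {
    configurations {
        main {
            advanced("nativeIterationMode", "internal")
            advanced("nativeGCCollectMode", "iteration")
        }
    }
}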

@LepilkinaElena merged commit 43884af into master Aug 6, 2021
@LepilkinaElena deleted the lepilkina/native_support branch August 6, 2021 08:33
@ilya-g (Member) commented Aug 6, 2021

Several discussions were left unresolved here, and I think the merge was too early.
We should resolve the outstanding questions with follow-up commits/PRs to master.

@LepilkinaElena (Author)

It was discussed with @qurbonzoda in direct messages and it was decided to merge it; I got his approval to merge.

@ilya-g, could you please write down all your concerns.
