Skip to content

Comments

perf: optimize Groovy-based LAL DSL with static compilation#7263

Merged
kezhenxu94 merged 1 commit intomasterfrom
perf/groovy
Jul 9, 2021
Merged

perf: optimize Groovy-based LAL DSL with static compilation#7263
kezhenxu94 merged 1 commit intomasterfrom
perf/groovy

Conversation

@kezhenxu94
Copy link
Member

Groovy naturally supports many dynamic features that we don't benefit for now but cost performance loss, in this patch we compile our Groovy-based DSL scripts statically to optimize performance.

Improve the performance of LAL

class MyBenchmark {

    ModuleManager manager;
    DSL dsl;
    LogAnalyzerModuleConfig config;
    FilterSpec filter;


    @Setup
    public void setup() throws ModuleStartException {
        manager = new ModuleManager();
        config = new LogAnalyzerModuleConfig();
        filter = new FilterSpec(manager, config);
        dsl = DSL.of(manager, new LogAnalyzerModuleConfig(), "filter {\n" +
                                                             "  json {\n" +
                                                             "  }\n" +
                                                             "  // only collect abnormal logs (http status code >= 300, or commonProperties?.responseFlags is not empty)\n" +
                                                             "  if (parsed?.response?.responseCode as Integer < 400 && !parsed?.commonProperties?.responseFlags) {\n" +
                                                             "    abort {}\n" +
                                                             "  }\n" +
                                                             "  extractor {\n" +
                                                             "    if (parsed?.response?.responseCode) {\n" +
                                                             "      tag 'status.code': parsed?.response?.responseCode as int\n" +
                                                             "    }\n" +
                                                             "    tag 'response.flag': parsed?.commonProperties?.responseFlags\n" +
                                                             "  }\n" +
                                                             "  sink {\n" +
                                                             "    sampler {\n" +
                                                             "      if (parsed?.commonProperties?.responseFlags) {\n" +
                                                             "        // use service:errorCode as sampler id so that each service:errorCode has its own sampler,\n" +
                                                             "        // e.g. checkoutservice:[upstreamConnectionFailure], checkoutservice:[upstreamRetryLimitExceeded]\n" +
                                                             "        rateLimit(\"${log.service}:${parsed?.commonProperties?.responseFlags}\") {\n" +
                                                             "          qps 100\n" +
                                                             "        }\n" +
                                                             "      } else {\n" +
                                                             "        // use service:responseCode as sampler id so that each service:responseCode has its own sampler,\n" +
                                                             "        // e.g. checkoutservice:500, checkoutservice:404.\n" +
                                                             "        rateLimit(\"${log.service}:${parsed?.response?.responseCode}\") {\n" +
                                                             "          qps 100\n" +
                                                             "        }\n" +
                                                             "      }\n" +
                                                             "    }\n" +
                                                             "  }\n" +
                                                             "}"
        );
        dsl.bind(new Binding().log(LogData.newBuilder().build()));
    }

    @Benchmark
    public void testDSL(Blackhole bh) throws ModuleStartException {
        dsl.evaluate();
    }

    public static void main(String[] args) throws RunnerException {
        LogManager.shutdown();
        Options opt = new OptionsBuilder()
                .warmupForks(3)
                .forks(5)
                .include(MyBenchmark.class.getSimpleName())
                .addProfiler(GCProfiler.class)
                .build();

        new Runner(opt).run();
    }
}
  • The benchmark result.

    • Before
    Benchmark                                              Mode  Cnt       Score      Error   Units
    MyBenchmark.testDSL                                   thrpt   25  246670.893 ± 6418.101   ops/s
    MyBenchmark.testDSL:·gc.alloc.rate                    thrpt   25     602.013 ±   15.646  MB/sec
    MyBenchmark.testDSL:·gc.alloc.rate.norm               thrpt   25    2688.000 ±    0.001    B/op
    MyBenchmark.testDSL:·gc.churn.PS_Eden_Space           thrpt   25     602.247 ±   15.530  MB/sec
    MyBenchmark.testDSL:·gc.churn.PS_Eden_Space.norm      thrpt   25    2689.112 ±   11.894    B/op
    MyBenchmark.testDSL:·gc.churn.PS_Survivor_Space       thrpt   25       0.058 ±    0.008  MB/sec
    MyBenchmark.testDSL:·gc.churn.PS_Survivor_Space.norm  thrpt   25       0.257 ±    0.035    B/op
    MyBenchmark.testDSL:·gc.count                         thrpt   25    1498.000             counts
    MyBenchmark.testDSL:·gc.time                          thrpt   25    1636.000                 ms
    
    • After
    Benchmark                                              Mode  Cnt       Score      Error   Units
    MyBenchmark.testDSL                                   thrpt   25  385547.228 ± 9829.731   ops/s
    MyBenchmark.testDSL:·gc.alloc.rate                    thrpt   25     627.257 ±   16.030  MB/sec
    MyBenchmark.testDSL:·gc.alloc.rate.norm               thrpt   25    1792.000 ±    0.001    B/op
    MyBenchmark.testDSL:·gc.churn.PS_Eden_Space           thrpt   25     628.196 ±   18.549  MB/sec
    MyBenchmark.testDSL:·gc.churn.PS_Eden_Space.norm      thrpt   25    1794.442 ±   15.527    B/op
    MyBenchmark.testDSL:·gc.churn.PS_Survivor_Space       thrpt   25       0.066 ±    0.009  MB/sec
    MyBenchmark.testDSL:·gc.churn.PS_Survivor_Space.norm  thrpt   25       0.190 ±    0.027    B/op
    MyBenchmark.testDSL:·gc.count                         thrpt   25    1513.000             counts
    MyBenchmark.testDSL:·gc.time                          thrpt   25    1798.000                 ms
    
  • Links/URLs to the theory proof or discussion articles/blogs. <links/URLs here>

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #. Not close, related to Improve LAL performance  #7180

  • Update the CHANGES log.

@kezhenxu94 kezhenxu94 added backend OAP backend related. enhancement Enhancement on performance or codes labels Jul 8, 2021
@kezhenxu94 kezhenxu94 added this to the 8.7.0 milestone Jul 8, 2021
@kezhenxu94 kezhenxu94 requested a review from wu-sheng July 8, 2021 04:56
@codecov
Copy link

codecov bot commented Jul 8, 2021

Codecov Report

Merging #7263 (b2952fa) into master (58dce44) will decrease coverage by 2.79%.
The diff coverage is 65.75%.

❗ Current head b2952fa differs from pull request most recent head f9879d7. Consider uploading reports for the commit f9879d7 to get more accurate results
Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7263      +/-   ##
============================================
- Coverage     53.81%   51.01%   -2.80%     
+ Complexity     4349     4050     -299     
============================================
  Files          1899     1904       +5     
  Lines         40950    41139     +189     
  Branches       4595     4583      -12     
============================================
- Hits          22038    20988    -1050     
- Misses        17822    19022    +1200     
- Partials       1090     1129      +39     
Impacted Files Coverage Δ
...ng/oap/log/analyzer/dsl/spec/sink/SamplerSpec.java 64.70% <ø> (+41.17%) ⬆️
...lking/oap/log/analyzer/dsl/spec/sink/SinkSpec.java 62.50% <ø> (+43.75%) ⬆️
...oap/log/analyzer/dsl/spec/LALDelegatingScript.java 23.52% <23.52%> (ø)
...p/log/analyzer/dsl/spec/parser/TextParserSpec.java 66.66% <50.00%> (+2.38%) ⬆️
.../oap/log/analyzer/dsl/LALPrecompiledExtension.java 71.42% <71.42%> (ø)
...log/analyzer/dsl/spec/extractor/ExtractorSpec.java 50.00% <76.92%> (+25.78%) ⬆️
...rg/apache/skywalking/oap/log/analyzer/dsl/DSL.java 100.00% <100.00%> (ø)
...g/oap/log/analyzer/dsl/spec/filter/FilterSpec.java 73.43% <100.00%> (+28.82%) ⬆️
...g/analyzer/dsl/spec/parser/AbstractParserSpec.java 100.00% <100.00%> (ø)
...p/log/analyzer/dsl/spec/parser/JsonParserSpec.java 100.00% <100.00%> (+20.00%) ⬆️
... and 137 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58dce44...f9879d7. Read the comment docs.

@wu-sheng
Copy link
Member

wu-sheng commented Jul 8, 2021

Please fix CI. And serialization has more performance impact for groovy, right?

Copy link
Member

@wu-sheng wu-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this static compilation available for runtime script update or new script as input?
I am thinking to share the same core for #7168 requirement

@wu-sheng
Copy link
Member

wu-sheng commented Jul 9, 2021

Comparing 385k to 246k, we have a 50% performance increment, is this running in one Mac Intel core? Could you attach the hardware standard here?

@kezhenxu94
Copy link
Member Author

Is this static compilation available for runtime script update or new script as input?
I am thinking to share the same core for #7168 requirement

Yes. Whenever a new LAL DSL script is created, we can compile it to Java-style static byte codes once, after that its execution performance is close to Java.

Comparing 385k to 246k, we have a 50% performance increment, is this running in one Mac Intel core? Could you attach the hardware standard here?

Yes, they were benchmarked in a single thread on my Mac.

MacBook Pro (16-inch, 2019)
Processor 2.4 GHz 8-Core Intel Core i9
Memory 32 GB 2667 MHz DDR4
# JMH version: 1.32
# VM version: JDK 1.8.0_292, OpenJDK 64-Bit Server VM, 25.292-b10
# VM invoker: /Users/kezhenxu94/.asdf/installs/java/adoptopenjdk-8.0.292+10/jre/bin/java
# VM options: <none>
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.apache.skywalking.MyBenchmark.testDSL

@wu-sheng
Copy link
Member

wu-sheng commented Jul 9, 2021

Is this static compilation available for runtime script update or new script as input?

I am thinking to share the same core for #7168 requirement

Yes. Whenever a new LAL DSL script is created, we can compile it to Java-style static byte codes once, after that its execution performance is close to Java.

Then my next question would be, if we use this as debug tool? Whether could cause metaspace oom? Or we need to set up previous dymanic engine for debug tool.

@kezhenxu94
Copy link
Member Author

kezhenxu94 commented Jul 9, 2021

Is this static compilation available for runtime script update or new script as input?

I am thinking to share the same core for #7168 requirement

Yes. Whenever a new LAL DSL script is created, we can compile it to Java-style static byte codes once, after that its execution performance is close to Java.

Then my next question would be, if we use this as debug tool? Whether could cause metaspace oom? Or we need to set up previous dymanic engine for debug tool.

In terms of dynamic update LAL / MAL and debugging tool for LAL, we can discuss in their corresponding issues because metaspace OOM may also exist before this PR, it's due to Groovy Shell's compilation and references to some class loaders, etc., when we want to dynamically update LAL / MAL and debug them in UI, we should consider this then.

Groovy naturally supports many dynamic features that we don't benefit for now but cost performance loss, in this patch we compile our Groovy-based DSL scripts statically to optimize performance.
@wu-sheng
Copy link
Member

wu-sheng commented Jul 9, 2021

Is this static compilation available for runtime script update or new script as input?

I am thinking to share the same core for #7168 requirement

Yes. Whenever a new LAL DSL script is created, we can compile it to Java-style static byte codes once, after that its execution performance is close to Java.

Then my next question would be, if we use this as debug tool? Whether could cause metaspace oom? Or we need to set up previous dymanic engine for debug tool.

In terms of dynamic update LAL / MAL and debugging tool for LAL, we can discuss in their corresponding issues because metaspace OOM may also exist before this PR, it's due to Groovy Shell's compilation and references to some class loaders, etc., when we want to dynamically update LAL / MAL and debug them in UI, we should consider this then.

FYI @dmsolr This change should have a very clear impact for your update feature. Please recheck at your sides, especially whether update cause memory leak(because of new classes generated).

Copy link
Member

@wu-sheng wu-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Once all tests passed, I am good.

@kezhenxu94 kezhenxu94 merged commit fd92f46 into master Jul 9, 2021
@kezhenxu94 kezhenxu94 deleted the perf/groovy branch July 9, 2021 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend OAP backend related. enhancement Enhancement on performance or codes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants