Benchmark(local) result of SQLParserEngine(MySQL) #8208

wu-sheng · 2020-11-18T09:38:27Z

Hi @tristaZero , as promised, I run some tests about the performance of SQL Parser. I choose the MySQL grammar randomly.

Here is the tested SQLs, with the codes I run in the tests.

public class SQLParseBenchmark {
    @Test
    @Benchmark
    public void parseSimpleSQLPerf() {
        String sqlStatement = "select * from table_1 where col = ?";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }

    @Test
    @Benchmark
    public void parseComplexSQLPerf() {
        String sqlStatement = "SELECT \n" +
            "    productname, \n" +
            "    buyprice\n" +
            "FROM\n" +
            "    products p1\n" +
            "WHERE\n" +
            "    buyprice > (SELECT \n" +
            "            AVG(buyprice)\n" +
            "        FROM\n" +
            "            products\n" +
            "        WHERE\n" +
            "            productline = p1.productline)";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }

    @Test
    @Benchmark
    public void parseComplexSQL2Perf() {
        String sqlStatement = "SELECT \n" +
            "    customerNumber, \n" +
            "    customerName\n" +
            "FROM\n" +
            "    customers\n" +
            "WHERE\n" +
            "    EXISTS( SELECT \n" +
            "            orderNumber, SUM(priceEach * quantityOrdered)\n" +
            "        FROM\n" +
            "            orderdetails\n" +
            "                INNER JOIN\n" +
            "            orders USING (orderNumber)\n" +
            "        WHERE\n" +
            "            customerNumber = customers.customerNumber\n" +
            "        GROUP BY orderNumber\n" +
            "        HAVING SUM(priceEach * quantityOrdered) > 60000);";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }
}

Single Thread Case W/O Cache

Benchmark                                Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf    thrpt    5  200801.158 ± 28973.845  ops/s
SQLParseBenchmark.parseComplexSQLPerf   thrpt    5   56151.571 ±  3432.966  ops/s
SQLParseBenchmark.parseComplexSQL2Perf  thrpt    5   34157.511 ±  3750.865  ops/s

The length of SQL affects the performance pretty much, drop from 200k to 34k, with the complexity of SQLs. And have 10%+- fluctuation.

Active Cache With 1k/100k possible SQLs

I simply added a variable I to provide 1k different SQL samples to warm up the cache like this.

public class SQLParseBenchmark {
    public static int POSSIBILITY = 1000;
    public static int I = 1;

    @Test
    @Benchmark
    public void parseSimpleSQLPerf() {
        String sqlStatement = "select * from table_1 where col = " + I;
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    @Test
    @Benchmark
    public void parseComplexSQLPerf() {
        String sqlStatement = "SELECT \n" +
            "    productname, \n" +
            "    buyprice\n" +
            "FROM\n" +
            "    products p1" + I +
            " WHERE\n" +
            "    buyprice > (SELECT \n" +
            "            AVG(buyprice)\n" +
            "        FROM\n" +
            "            products\n" +
            "        WHERE\n" +
            "            productline = p1.productline)";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    @Test
    @Benchmark
    public void parseComplexSQL2Perf() {
        String sqlStatement = "SELECT \n" +
            "    customerNumber, \n" +
            "    customerName\n" +
            "FROM\n" +
            "    customers" + I +
            " WHERE\n" +
            "    EXISTS( SELECT \n" +
            "            orderNumber, SUM(priceEach * quantityOrdered)\n" +
            "        FROM\n" +
            "            orderdetails\n" +
            "                INNER JOIN\n" +
            "            orders USING (orderNumber)\n" +
            "        WHERE\n" +
            "            customerNumber = customers.customerNumber\n" +
            "        GROUP BY orderNumber\n" +
            "        HAVING SUM(priceEach * quantityOrdered) > 60000);";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(SQLParseBenchmark.class.getSimpleName())
                                          .forks(1)
                                          .warmupIterations(3)
                                          .threads(1)
                                          .syncIterations(false)
                                          .measurementIterations(5)
                                          .build();

        new Runner(opt).run();
    }
}

Benchmark                                Mode  Cnt        Score        Error  Units
SQLParseBenchmark.parseSimpleSQLPerf    thrpt    5  5055988.471 ± 900336.628  ops/s
SQLParseBenchmark.parseComplexSQLPerf   thrpt    5  2760262.638 ± 395728.318  ops/s
SQLParseBenchmark.parseComplexSQL2Perf  thrpt    5  1686125.536 ± 476530.762  ops/s

The performance increases clearly, for the most complex SQL case, it increases from 34k to 1.6m, nearly 47x.

Then I change the possibility of SQLs to 100k, to make LRU cache not very effective, interesting things happened.

Benchmark                                Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf    thrpt    5  126939.502 ± 31545.825  ops/s
SQLParseBenchmark.parseComplexSQLPerf   thrpt    5   28216.660 ± 11769.703  ops/s
SQLParseBenchmark.parseComplexSQL2Perf  thrpt    5   18382.000 ±  6581.555  ops/s

If the LRU cache doesn't work efficiently, it is worse than no cache. 34k down to 18k, nearly -50%

Concurrency Tests

I switch to the concurrency cases, first, LRU cache works(1k SQL samples), and activate 6 threads

SQLParseBenchmark.parseSimpleSQLPerf    thrpt    5  10535380.660 ±  417418.673  ops/s
SQLParseBenchmark.parseComplexSQLPerf   thrpt    5   7126066.718 ± 1634052.358  ops/s
SQLParseBenchmark.parseComplexSQL2Perf  thrpt    5   6251270.156 ± 1361498.623  ops/s

168k -> 625k, 4x faster in 6 threads(My Local Laptop is 4Core Intel Core i7, MacBook Pro (15-inch, 2017)).

Then make LRU fails again(100k SQL samples), and still activate 6 threads.

SQLParseBenchmark.parseSimpleSQLPerf    thrpt    5  159424.392 ± 66291.903  ops/s
SQLParseBenchmark.parseComplexSQLPerf   thrpt    5   42207.768 ±  9511.269  ops/s
SQLParseBenchmark.parseComplexSQL2Perf  thrpt    5   27640.844 ± 11868.306  ops/s

Only 18k -> 27k.

One More

I am just curious about the performance of no LRU in the concurrency case, then I keep the 6 threads and close the LRU

Benchmark                                Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf    thrpt    5  572351.472 ± 27852.061  ops/s
SQLParseBenchmark.parseComplexSQLPerf   thrpt    5  164481.782 ± 24937.811  ops/s
SQLParseBenchmark.parseComplexSQL2Perf  thrpt    5   88742.504 ± 81617.211  ops/s

The most interesting thing happened, it could provide 88k, comparing to 27k with LRU ON but with large data set.

Conclusion

The concurrent performance makes me a little concerned (not a block) to use this in the trace analysis core. Because, the performance of the OAP backend(one node) could be 10k+ segments/s very easily, in each segment, there could be 5-10 SQL statements normally. And the execution of the analysis would be concurrently for sure.
So, the current performance seems an impact on the current status.

Could you have a deeper discussion about what we could do to improve this?

The text was updated successfully, but these errors were encountered:

terrymanu · 2020-11-18T15:17:42Z

Thank you for the great performance test report.
The performance of parse engine can spilt for using cache and without cache.
It is true for low performance if missing cache hit, but the most popular scenario is using same SQL pattern with parameter marker which is most suitable for using cache.
So we can talk about improve the performance for cache hit. We are using a static guava cache for whole SQL parse result. firstly, I will try to improve the performance for it.

wu-sheng · 2020-11-18T15:24:15Z

but the most popular scenario is using same SQL pattern with parameter marker which is most suitable for using cache

Yes, I just tested for the worse case. SkyWalking as an APM facing countless strange performance issues. We are trying to avoid to be the next one.

jingshanglu · 2020-12-03T02:53:21Z

@wu-sheng Now,cache in parser engine is not static, and create a new API to permit user customize the initial size of it. So, creating multiple parser engines will lead to a linear improvement in performance.

terrymanu · 2022-06-24T04:15:47Z

I think this issue has already expired, the performance kept enhancing in each release.

wu-sheng added type: question type: performance labels Nov 18, 2020

wu-sheng assigned tristaZero Nov 18, 2020

wu-sheng mentioned this issue Nov 18, 2020

Leverage ShardingSphere's SQL parser to get table's read/write level metrics apache/skywalking#5838

Closed

4 tasks

terrymanu mentioned this issue Nov 19, 2020

Adjust parser cache structure for performance and customization reason #8224

Closed

4 tasks

terrymanu closed this as completed Jun 24, 2022

wu-sheng mentioned this issue Sep 22, 2022

[Feature] A lightweight and APM-oriented SQL parser module apache/skywalking#9661

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark(local) result of SQLParserEngine(MySQL) #8208

Benchmark(local) result of SQLParserEngine(MySQL) #8208

wu-sheng commented Nov 18, 2020

terrymanu commented Nov 18, 2020

wu-sheng commented Nov 18, 2020

jingshanglu commented Dec 3, 2020

terrymanu commented Jun 24, 2022

Benchmark(local) result of SQLParserEngine(MySQL) #8208

Benchmark(local) result of SQLParserEngine(MySQL) #8208

Comments

wu-sheng commented Nov 18, 2020

Single Thread Case W/O Cache

Active Cache With 1k/100k possible SQLs

Concurrency Tests

One More

Conclusion

terrymanu commented Nov 18, 2020

wu-sheng commented Nov 18, 2020

jingshanglu commented Dec 3, 2020

terrymanu commented Jun 24, 2022