Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance [WIP] #928

Open
3 of 13 tasks
zhicwu opened this issue May 10, 2022 · 0 comments
Open
3 of 13 tasks

Performance [WIP] #928

zhicwu opened this issue May 10, 2022 · 0 comments

Comments

@zhicwu
Copy link
Contributor

zhicwu commented May 10, 2022

This is an umbrella issue to track performance issues, collect ideas, and document improvements as well as latest update. It can also be used for other purposes like evaluating ClickHouse client, understanding how slow Java lib is and why.

Issues

# Module Issue Workaround
1 clickhouse-client CPU was not fully utilized in query Change bufferingMode to PERFORMANCE to spawn one more thread for compression/decompression
2 clickhouse-jdbc Slow SQL parsing Use Java client or avoid passing large SQL to JDBC driver
3 clickhouse-jdbc Slow batch insert and high memory usage when dealing large data volume Use Java client or multi-threading
4 clickhouse-jdbc Query for initializing JDBC connection set server_time_zone and server_version in connection string or properties to skip that

TODOs

  • Benchmark
    • Compare different buffering modes
    • Compare secure and insecure connections
    • Collect server metrics
    • Docker image
  • Execution
    • Adaptive query mode - Adaptive query mode? #1105
    • Enable async execution in JDBC driver
    • GraalVM for less startup time (mainly for testing)
  • Serialization
    • Add ByteUtils along with unsafe implementation
    • Introduce ByteBudy / asm to generate compact code without unnecessary checks / method calls
    • New ByteUtils implementation on top of Vector API for JDK 17+
  • Streaming
    • Direct write instead of piped stream
    • DirectBuffer if possible
    • Zero-copy for direct file-based streaming(e.g. ClickHouseClient.dump() and ClickHouseClient.load()), if possible

Test Environment

Two KVMs on same host, one as ClickHouse server and the other as test client.

VM vCPU MEM O/S Software
ch-server 4 16GB fedora 6.1.9-100.fc36.x86_64 clickhouse-server 22.8.13.20
test-client 4 16GB fedora 6.1.9-100.fc36.x86_64 clickhouse-client 22.8.13.20 & Open JDK 17.0.5

Note: the host is a PC server(O/S: Fedora 36 server, CPU: 24x 12th Gen Intel(R) Core(TM) i9-12900T, MEM: 2 x 32 GB DDR4, SSD). iperf3 shows ~75.6 Gbits/sec bitrate between the two VMs.

Query Time

Measure elapsed time and resource usage at client side. To understand the gaps:

  • Protocol Discrepancy(http vs. tcp/native) - curl vs. clickhouse-client
  • Streaming Efficiency - curl vs. Java(dump/load)
  • Deserialization Cost - Java(dump) vs. Java(custom)
  • Serialization Cost - Java(load) vs. Java(write)
  • Encapsulation Overhead & Poor Implementation - Java(custom) vs. Java(default) and Java(default/write) vs. JDBC
About the test...
  • Steps

    1. restart ClickHouse server and sleep 10 seconds for each case
    2. execute same case consecutively 5 times with 5-second interval
    3. pick the median elapsed time as result
  • Queries

    Query SQL
    Int8 select (number % 256)::Int8 v from numbers(500000000)
    UInt64 select * from numbers(500000000)
    String select toString(number) from numbers(500000000)
    Mixed select (number % 256)::Int8 a, number b, toString(number) c from numbers(300000000)
  • Clients

    • clickhouse-client - ClickHouse native command-line client
      # read
      clickhouse-client --compression=$compression --host=$db_host --port=$db_tcp_port \
          --database="$db_name" --query="$query" --format=$format > $outfile
      # write
      cat $outfile | clickhouse-client --compression=$compression --host=$db_host --port=$db_tcp_port \
       --database=$db_name --query="insert into test_load format $format"
    • curl - A command-line tool for transferring data specified with URL syntax
      # read - when compression is 1, compress_header_opt will be "-H 'Accept-Encoding: lz4'"
      curl -s -u "$db_user:$db_passwd" --get --data-urlencode "compress=$compression" \
          --data-urlencode "query=$query" $compress_header_opt \
          -H "X-ClickHouse-Format: $format" "http://$db_host:$db_http_port" > $outfile
      # write - don't use -d @file or it will end up with OOM error
      curl -s -u "$db_user:$db_passwd" -X POST -H "X-ClickHouse-Database: $db_name" -H "Transfer-Encoding: chunked" -T $outfile \
          "http://$db_host:$db_http_port/?query=insert%20into%20test_load%20format%20$format&decompress=$compression"
    • Java(dump) - Java client to dump query result without decompression and deserialization
      try (FileOutputStream out = new FileOutputStream(outfile);
          ClickHouseResponse response = request.query(query).format(format.defaultInputFormat())
              .output(out).executeAndWait()) {
          return (int) response.getSummary().getWrittenRows();
      }
    • Java(load) - Java client to load data from file without compression and serialization
      try (FileOutputStream out = new FileOutputStream(outfile);
          ClickHouseResponse response = request.query(query).format(format.defaultInputFormat())
              .output(out).executeAndWait()) {
          return (int) response.getSummary().getWrittenRows();
      }
    • Java(custom) - Java client with decompression and custom implementation for deserializing RowBinary response
      try (ClickHouseResponse response = request.query(query).executeAndWait();
          ClickHouseInputStream input = response.getInputStream()) {
            int count = 0;
            try (FileOutputStream out = new FileOutputStream(outfile)) {
                byte b = (byte) 0;
                long l = 0L;
                String s = "";
                input.setCopyToTarget(out);
                while (input.available() > 0) {
                    b = input.readByte(); // read Int8
                    l = input.readBuffer(8).asLong(); // read UInt64
                    s = input.readUnicodeString(); // read String
                    count++;
                }
                if (count > (0xFF & b) && count > l && count > s.length()) {
                    out.flush();
                }
           }
      }
    • Java(default) - Java client with decompression and ClickHouseDataProcessor & ClickHouseRecord for deserialization
      try (ClickHouseResponse response = request.query(query).executeAndWait();
          ClickHouseInputStream input = response.getInputStream()) {
            int count = 0;
            byte b = (byte) 0;
            long l = 0L;
            String s = "";
            input.setCopyToTarget(out);
            for (ClickHouseRecord r : response.records()) {
                b = r.getValue(0).asByte();
                l = r.getValue(1).asLong();
                s = r.getValue(2).asString();
                count++;
            }
            if (count > (0xFF & b) && count > l && count > s.length()) {
                out.flush();
            }
      }
    • Java(write) - Java client to insert generated data using ClickHouseDataProcess for serialization and ClickHouseWriter for streaming
      try (ClickHouseResponse response = request.write().format(format.defaultInputFormat()).table("test_load").data(o -> {
          ClickHouseDataProcessor processor = ClickHouseDataStreamFactory.getInstance()
              .getProcessor(request.getConfig(), null, o, null, null);
          int len = columns.length;
          ClickHouseValue[] values = new ClickHouseValue[len];
          for (int i = 0; i < len; i++) {
              values[i] = columns[i].newValue(request.getConfig());
          }
      
          int range = len == 1 ? 500000000 : 300000000;
          for (int i = 0; i < range; i++) {
              for (int j = 0; j < len; j++) {
                  processor.write(values[j].update(i), columns[j]);
              }
          }
          o.flush();
      }).executeAndWait()) {
          return (int) response.getSummary().getWrittenRows();
      }
    • JDBC - JDBC driver with standard API
      // read
      try (Connection conn = driver.connect(url, props); Statement stmt = conn.createStatement();
          ResultSet rs = stmt.executeQuery(query)) {
          int count = 0;
          byte b = (byte) 0;
          long l = 0L;
          String s = "";
          rs.unwrap(ClickHouseResponse.class).getInputStream().setCopyToTarget(out);
          while (rs.next()) {
              b = rs.getByte(1);
              l = rs.getLong(2);
              s = rs.getString(3);
              count++;
          }
          if (count > (0xFF & b) && count > l && count > s.length()) {
              out.flush();
          }
      }
      
      // write - batch insert
      String insertQuery = "insert into test_load values(?,?,?)"; // or insert into test_load values(?) for other cases
      int range = 300000000; // 500000000 for Int8, UInt64 and String
      try (Connection conn = driver.connect(url, props); PreparedStatement ps = conn.prepareStatement(insertQuery)) {
          int batchSize = 100000;
          int count = 0;
          for (int i = 0; i < range; i++) {
              ps.setByte(1, (byte) (i % 256));
              ps.setLong(2, i);
              ps.setString(3, Integer.toString(i));
              ps.addBatch();
              if ((count = i % batchSize) == 0) {
                  ps.executeBatch();
              }
          }
          if (count > 0) {
              ps.executeBatch();
          }
      }

Read & Write

        lz4     none    
Mode Client Format Case ElapsedTime CPU% MEM(MB) ElapsedTime CPU% MEM(MB)
Read clickhouse-client Native Int8 0.39 52% 107.39 0.44 53% 107.66
      Mixed 8.46 84% 112.74 4.79 91% 112.51
      String 9.23 75% 110.92 5.44 96% 110.02
      UInt64 5.41 85% 109.1 1.59 99% 109.06
    RowBinary Int8 1.25 387% 109.55 1.3 386% 109.5
      Mixed 8.69 151% 135.3 4.38 252% 134.85
      String 9.34 108% 118.56 4.13 209% 118.12
      UInt64 5.11 181% 111.93 2.86 200% 112.25
  curl Native Int8 0.35 2% 9.01 0.71 49% 8.78
      Mixed 10.1 9% 8.96 6.15 41% 8.92
      String 11.41 7% 8.97 6.57 32% 8.82
      UInt64 5.33 16% 8.87 2.09 91% 8.8
    RowBinary Int8 1.27 0% 8.83 1.42 20% 8.86
      Mixed 5.75 13% 8.86 4.8 58% 8.85
      String 8.26 11% 8.87 5.26 49% 8.95
      UInt64 6.1 17% 8.89 3.02 68% 8.93
  Java Client(custom) RowBinaryWithNamesAndTypes Int8 1.94 68% 66.88 1.69 79% 69.18
      Mixed 9.99 102% 265.53 8.43 102% 360.99
      String 13.53 86% 328.16 10.42 92% 405.59
      UInt64 6.84 104% 211.25 4.48 105% 102.79
  Java Client(deser) RowBinaryWithNamesAndTypes Int8 3.48 95% 67.42 3.41 102% 66.71
      Mixed 14.63 96% 214.33 13.21 99% 259.7
      String 14.45 102% 266.73 13.48 95% 260.09
      UInt64 9.07 104% 212.3 7.13 102% 99.25
  Java Client(dump) Native Int8 0.46 51% 55.39 0.92 161% 112.65
      Mixed 10.55 18% 71.19 6.18 74% 101.36
      String 11.68 14% 71.49 8.27 49% 108.95
      UInt64 5.64 30% 69.58 3.95 75% 109.5
    RowBinaryWithNamesAndTypes Int8 1.4 17% 55.83 1.72 46% 62.3
      Mixed 5.72 35% 96.98 5.64 94% 138
      String 8.72 23% 78.37 6.15 68% 82.48
      UInt64 5.8 34% 99.35 4.46 77% 117.42
  Java Client(read) RowBinaryWithNamesAndTypes Int8 2.8 95% 67.6 2.85 104% 69.49
      Mixed 18.05 94% 215.16 13.22 99% 206.48
      String 14.09 102% 268.16 13.31 91% 261.98
      UInt64 8.3 103% 213.43 5.8 104% 104.67
  JDBC Driver RowBinaryWithNamesAndTypes Int8 4.95 97% 72.24 5.2 102% 69
      Mixed 16.69 95% 212.74 15.05 100% 209.39
      String 16.3 101% 266.23 15.72 92% 261.5
      UInt64 11.68 102% 212.85 9.21 102% 107.32
Write clickhouse-client Native Int8 0.18 57% 1.94 0.65 19% 1.95
      Mixed 18.9 7% 1.95 16.66 10% 1.94
      String 2.69 37% 1.94 5.2 18% 1.94
      UInt64 1.45 62% 1.94 3.25 31% 1.94
    RowBinary Int8 7.52 2% 1.94 7.56 3% 1.94
      Mixed 16.56 5% 1.94 17.73 7% 1.94
      String 19.34 7% 1.94 18.5 7% 1.94
      UInt64 13.66 7% 1.94 11.83 9% 1.95
  curl Native Int8 0.47 0% 8.92 0.58 20% 8.85
      Mixed 16.49 3% 8.82 16.94 6% 8.82
      String 6.4 6% 8.92 8.1 8% 8.91
      UInt64 5.51 8% 8.78 2.77 38% 8.87
    RowBinary Int8 7.55 0% 8.9 7.75 1% 8.84
      Mixed 23.48 2% 8.85 23.51 5% 8.89
      String 22.76 1% 8.91 23.51 3% 8.89
      UInt64 16.86 3% 8.94 15.93 6% 8.88
  Java Client(load) Native Int8 0.59 37% 54.9 0.67 76% 59.87
      Mixed 16.88 9% 62.66 15.95 18% 58.67
      String 6.75 18% 60.12 5.81 48% 56.61
      UInt64 7.61 15% 60 5.29 49% 56.42
    RowBinaryWithNamesAndTypes Int8 7.74 3% 55.86 7.84 7% 60.35
      Mixed 23.39 5% 60.46 23.67 13% 57.86
      String 24.56 5% 60.24 21.8 13% 59.41
      UInt64 17.56 7% 59.42 14.75 15% 56.61
  Java Client(write) RowBinaryWithNamesAndTypes Int8 7.71 36% 214.87 7.87 27% 213.43
      Mixed 27.65 98% 347.98 25.43 77% 265.56
      String 26.46 114% 350.92 24.63 77% 276.4
      UInt64 21.25 86% 350.57 16.23 47% 210.88
  JDBC Driver(batch) RowBinaryWithNamesAndTypes Int8 14.52 49% 213.21 14.34 50% 198.91
      Mixed 44.58 67% 462.6 42.03 52% 366.8
      String 42.61 73% 399.83 39.25 54% 351.37
      UInt64 29.62 71% 405.58 25.06 48% 295.75
  JDBC Driver(load) Native Int8 0.63 48% 60.55 0.74 87% 65.6
      Mixed 19.97 8% 64.8 16.93 22% 65.15
      String 8.42 16% 66.14 5.96 48% 64.26
      UInt64 5.71 24% 64.43 5.83 44% 64.71
    RowBinaryWithNamesAndTypes Int8 7.74 4% 60.39 7.88 7% 62.55
      Mixed 25.52 5% 63.84 24.04 12% 65.52
      String 25.2 6% 64.14 23.77 12% 64.41
      UInt64 18.86 7% 64.67 15.55 16% 65.05

Throughput (ops/s)

TODO

Pitfalls

  • Protocol matters the most
  • Large SQL statement
  • Batch insert

Improvements

References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant