From f282331d12975687391a7648aacde19a58774936 Mon Sep 17 00:00:00 2001 From: Gengliang Wang Date: Wed, 7 Nov 2018 21:03:05 +0800 Subject: [PATCH 1/3] fix --- .../DataSourceReadBenchmark-results.txt | 378 +++++++++--------- .../benchmark/DataSourceReadBenchmark.scala | 4 +- .../benchmarks/OrcReadBenchmark-results.txt | 204 +++++----- .../spark/sql/hive/orc/OrcReadBenchmark.scala | 11 +- 4 files changed, 301 insertions(+), 296 deletions(-) diff --git a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt index 2d3bae442cc50..72e44e75492e4 100644 --- a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt +++ b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt @@ -2,268 +2,268 @@ SQL Single Numeric Column Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 21508 / 22112 0.7 1367.5 1.0X -SQL Json 8705 / 8825 1.8 553.4 2.5X -SQL Parquet Vectorized 157 / 186 100.0 10.0 136.7X -SQL Parquet MR 1789 / 1794 8.8 113.8 12.0X -SQL ORC Vectorized 156 / 166 100.9 9.9 138.0X -SQL ORC Vectorized with copy 218 / 225 72.1 13.9 98.6X -SQL ORC MR 1448 / 1492 10.9 92.0 14.9X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 15974 / 16222 1.0 1015.6 1.0X +SQL Json 5917 / 6174 2.7 376.2 2.7X +SQL Parquet Vectorized 115 / 128 136.8 7.3 138.9X +SQL Parquet MR 1459 / 1571 10.8 92.8 10.9X +SQL ORC Vectorized 164 / 194 95.8 10.4 97.3X +SQL ORC Vectorized with copy 204 / 303 77.2 12.9 78.4X 
+SQL ORC MR 1095 / 1143 14.4 69.6 14.6X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 202 / 211 77.7 12.9 1.0X -ParquetReader Vectorized -> Row 118 / 120 133.5 7.5 1.7X +ParquetReader Vectorized 139 / 156 113.1 8.8 1.0X +ParquetReader Vectorized -> Row 83 / 89 188.7 5.3 1.7X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 23282 / 23312 0.7 1480.2 1.0X -SQL Json 9187 / 9189 1.7 584.1 2.5X -SQL Parquet Vectorized 204 / 218 77.0 13.0 114.0X -SQL Parquet MR 1941 / 1953 8.1 123.4 12.0X -SQL ORC Vectorized 217 / 225 72.6 13.8 107.5X -SQL ORC Vectorized with copy 279 / 289 56.3 17.8 83.4X -SQL ORC MR 1541 / 1549 10.2 98.0 15.1X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 16394 / 16643 1.0 1042.3 1.0X +SQL Json 6014 / 6020 2.6 382.4 2.7X +SQL Parquet Vectorized 147 / 155 106.9 9.4 111.4X +SQL Parquet MR 1575 / 1581 10.0 100.1 10.4X +SQL ORC Vectorized 168 / 173 93.9 10.7 97.9X +SQL ORC Vectorized with copy 219 / 227 71.8 13.9 74.8X +SQL ORC MR 1185 / 1187 13.3 75.3 13.8X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative 
------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 288 / 297 54.6 18.3 1.0X -ParquetReader Vectorized -> Row 255 / 257 61.7 16.2 1.1X +ParquetReader Vectorized 193 / 216 81.4 12.3 1.0X +ParquetReader Vectorized -> Row 160 / 175 98.3 10.2 1.2X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 24990 / 25012 0.6 1588.8 1.0X -SQL Json 9837 / 9865 1.6 625.4 2.5X -SQL Parquet Vectorized 170 / 180 92.3 10.8 146.6X -SQL Parquet MR 2319 / 2328 6.8 147.4 10.8X -SQL ORC Vectorized 293 / 301 53.7 18.6 85.3X -SQL ORC Vectorized with copy 297 / 309 52.9 18.9 84.0X -SQL ORC MR 1667 / 1674 9.4 106.0 15.0X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 17168 / 17306 0.9 1091.5 1.0X +SQL Json 6167 / 6180 2.6 392.1 2.8X +SQL Parquet Vectorized 134 / 142 117.5 8.5 128.2X +SQL Parquet MR 1659 / 1740 9.5 105.5 10.3X +SQL ORC Vectorized 225 / 229 69.9 14.3 76.3X +SQL ORC Vectorized with copy 231 / 235 68.2 14.7 74.4X +SQL ORC MR 1287 / 1388 12.2 81.8 13.3X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 257 / 274 61.3 16.3 1.0X -ParquetReader Vectorized -> Row 259 / 264 60.8 16.4 1.0X +ParquetReader Vectorized 178 / 187 88.2 11.3 1.0X +ParquetReader Vectorized -> Row 174 / 184 90.3 11.1 1.0X -OpenJDK 64-Bit 
Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 32537 / 32554 0.5 2068.7 1.0X -SQL Json 12610 / 12668 1.2 801.7 2.6X -SQL Parquet Vectorized 258 / 276 61.0 16.4 126.2X -SQL Parquet MR 2422 / 2435 6.5 154.0 13.4X -SQL ORC Vectorized 378 / 385 41.6 24.0 86.2X -SQL ORC Vectorized with copy 381 / 389 41.3 24.2 85.4X -SQL ORC MR 1797 / 1819 8.8 114.3 18.1X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 21253 / 21542 0.7 1351.2 1.0X +SQL Json 8208 / 8209 1.9 521.9 2.6X +SQL Parquet Vectorized 180 / 241 87.3 11.5 117.9X +SQL Parquet MR 1769 / 1801 8.9 112.5 12.0X +SQL ORC Vectorized 3271 / 3277 4.8 207.9 6.5X +SQL ORC Vectorized with copy 290 / 297 54.3 18.4 73.4X +SQL ORC MR 1468 / 1514 10.7 93.4 14.5X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 352 / 368 44.7 22.4 1.0X -ParquetReader Vectorized -> Row 351 / 359 44.8 22.3 1.0X +ParquetReader Vectorized 290 / 304 54.2 18.5 1.0X +ParquetReader Vectorized -> Row 246 / 270 64.1 15.6 1.2X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative 
------------------------------------------------------------------------------------------------ -SQL CSV 27179 / 27184 0.6 1728.0 1.0X -SQL Json 12578 / 12585 1.3 799.7 2.2X -SQL Parquet Vectorized 161 / 171 97.5 10.3 168.5X -SQL Parquet MR 2361 / 2395 6.7 150.1 11.5X -SQL ORC Vectorized 473 / 480 33.3 30.0 57.5X -SQL ORC Vectorized with copy 478 / 483 32.9 30.4 56.8X -SQL ORC MR 1858 / 1859 8.5 118.2 14.6X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 18835 / 19052 0.8 1197.5 1.0X +SQL Json 8774 / 8780 1.8 557.8 2.1X +SQL Parquet Vectorized 150 / 162 104.8 9.5 125.5X +SQL Parquet MR 1727 / 1779 9.1 109.8 10.9X +SQL ORC Vectorized 311 / 327 50.6 19.8 60.5X +SQL ORC Vectorized with copy 314 / 320 50.2 19.9 60.1X +SQL ORC MR 1384 / 1386 11.4 88.0 13.6X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 251 / 255 62.7 15.9 1.0X -ParquetReader Vectorized -> Row 255 / 259 61.8 16.2 1.0X +ParquetReader Vectorized 187 / 208 84.0 11.9 1.0X +ParquetReader Vectorized -> Row 184 / 194 85.3 11.7 1.0X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 34797 / 34830 0.5 2212.3 1.0X -SQL Json 17806 / 17828 0.9 1132.1 2.0X -SQL Parquet Vectorized 260 / 269 60.6 16.5 134.0X -SQL Parquet MR 2512 / 2534 6.3 159.7 13.9X -SQL ORC Vectorized 582 / 593 27.0 37.0 59.8X -SQL ORC 
Vectorized with copy 576 / 584 27.3 36.6 60.4X -SQL ORC MR 2309 / 2313 6.8 146.8 15.1X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 22506 / 22602 0.7 1430.9 1.0X +SQL Json 11925 / 11942 1.3 758.2 1.9X +SQL Parquet Vectorized 217 / 229 72.6 13.8 103.8X +SQL Parquet MR 1795 / 1856 8.8 114.1 12.5X +SQL ORC Vectorized 392 / 396 40.1 24.9 57.4X +SQL ORC Vectorized with copy 394 / 397 40.0 25.0 57.2X +SQL ORC MR 1524 / 1535 10.3 96.9 14.8X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 350 / 363 44.9 22.3 1.0X -ParquetReader Vectorized -> Row 350 / 366 44.9 22.3 1.0X +ParquetReader Vectorized 305 / 349 51.5 19.4 1.0X +ParquetReader Vectorized -> Row 263 / 290 59.8 16.7 1.2X ================================================================================================ Int and String Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 22486 / 22590 0.5 2144.5 1.0X -SQL Json 14124 / 14195 0.7 1347.0 1.6X -SQL Parquet Vectorized 2342 / 2347 4.5 223.4 9.6X -SQL Parquet MR 4660 / 4664 2.2 444.4 4.8X -SQL ORC Vectorized 2378 / 2379 4.4 226.8 9.5X -SQL ORC Vectorized with copy 2548 / 2571 4.1 243.0 8.8X -SQL ORC MR 4206 / 4211 2.5 401.1 5.3X +SQL CSV 16802 / 16833 0.6 
1602.3 1.0X +SQL Json 8958 / 8972 1.2 854.3 1.9X +SQL Parquet Vectorized 1891 / 1960 5.5 180.3 8.9X +SQL Parquet MR 3796 / 3808 2.8 362.0 4.4X +SQL ORC Vectorized 2020 / 2064 5.2 192.7 8.3X +SQL ORC Vectorized with copy 2160 / 2161 4.9 206.0 7.8X +SQL ORC MR 3800 / 3807 2.8 362.4 4.4X ================================================================================================ Repeated String Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 12150 / 12178 0.9 1158.7 1.0X -SQL Json 7012 / 7014 1.5 668.7 1.7X -SQL Parquet Vectorized 792 / 796 13.2 75.5 15.3X -SQL Parquet MR 1961 / 1975 5.3 187.0 6.2X -SQL ORC Vectorized 482 / 485 21.8 46.0 25.2X -SQL ORC Vectorized with copy 710 / 715 14.8 67.7 17.1X -SQL ORC MR 2081 / 2083 5.0 198.5 5.8X +SQL CSV 10020 / 10068 1.0 955.6 1.0X +SQL Json 4944 / 4947 2.1 471.5 2.0X +SQL Parquet Vectorized 620 / 639 16.9 59.1 16.2X +SQL Parquet MR 1389 / 1389 7.6 132.4 7.2X +SQL ORC Vectorized 399 / 413 26.3 38.0 25.1X +SQL ORC Vectorized with copy 573 / 585 18.3 54.6 17.5X +SQL ORC MR 1479 / 1483 7.1 141.0 6.8X ================================================================================================ Partitioned Table Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Partitioned Table: Best/Avg Time(ms) 
Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Data column - CSV 31789 / 31791 0.5 2021.1 1.0X -Data column - Json 12873 / 12918 1.2 818.4 2.5X -Data column - Parquet Vectorized 267 / 280 58.9 17.0 119.1X -Data column - Parquet MR 3387 / 3402 4.6 215.3 9.4X -Data column - ORC Vectorized 391 / 453 40.2 24.9 81.2X -Data column - ORC Vectorized with copy 392 / 398 40.2 24.9 81.2X -Data column - ORC MR 2508 / 2512 6.3 159.4 12.7X -Partition column - CSV 6965 / 6977 2.3 442.8 4.6X -Partition column - Json 5563 / 5576 2.8 353.7 5.7X -Partition column - Parquet Vectorized 65 / 78 241.1 4.1 487.2X -Partition column - Parquet MR 1811 / 1811 8.7 115.1 17.6X -Partition column - ORC Vectorized 66 / 73 239.0 4.2 483.0X -Partition column - ORC Vectorized with copy 65 / 70 241.1 4.1 487.3X -Partition column - ORC MR 1775 / 1778 8.9 112.8 17.9X -Both columns - CSV 30032 / 30113 0.5 1909.4 1.1X -Both columns - Json 13941 / 13959 1.1 886.3 2.3X -Both columns - Parquet Vectorized 312 / 330 50.3 19.9 101.7X -Both columns - Parquet MR 3858 / 3862 4.1 245.3 8.2X -Both columns - ORC Vectorized 431 / 437 36.5 27.4 73.8X -Both column - ORC Vectorized with copy 523 / 529 30.1 33.3 60.7X -Both columns - ORC MR 2712 / 2805 5.8 172.4 11.7X +Data column - CSV 22274 / 22397 0.7 1416.1 1.0X +Data column - Json 8705 / 8706 1.8 553.5 2.6X +Data column - Parquet Vectorized 201 / 213 78.3 12.8 110.9X +Data column - Parquet MR 2270 / 2286 6.9 144.3 9.8X +Data column - ORC Vectorized 289 / 295 54.4 18.4 77.0X +Data column - ORC Vectorized with copy 290 / 296 54.3 18.4 76.8X +Data column - ORC MR 1675 / 1678 9.4 106.5 13.3X +Partition column - CSV 4527 / 4554 3.5 287.8 4.9X +Partition column - Json 3374 / 3376 4.7 214.5 6.6X +Partition column - Parquet Vectorized 69 / 77 227.5 4.4 322.2X +Partition column - Parquet MR 1132 / 1143 13.9 72.0 19.7X +Partition column - ORC Vectorized 69 / 76 227.6 4.4 322.3X +Partition column 
- ORC Vectorized with copy 74 / 78 213.5 4.7 302.3X +Partition column - ORC MR 1141 / 1144 13.8 72.5 19.5X +Both columns - CSV 20791 / 20800 0.8 1321.9 1.1X +Both columns - Json 8798 / 8815 1.8 559.4 2.5X +Both columns - Parquet Vectorized 237 / 248 66.2 15.1 93.8X +Both columns - Parquet MR 2523 / 2528 6.2 160.4 8.8X +Both columns - ORC Vectorized 323 / 331 48.6 20.6 68.9X +Both column - ORC Vectorized with copy 367 / 376 42.8 23.4 60.6X +Both columns - ORC MR 1813 / 1842 8.7 115.3 12.3X ================================================================================================ String with Nulls Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +String with Nulls Scan (0.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 13525 / 13823 0.8 1289.9 1.0X -SQL Json 9913 / 9921 1.1 945.3 1.4X -SQL Parquet Vectorized 1517 / 1517 6.9 144.7 8.9X -SQL Parquet MR 3996 / 4008 2.6 381.1 3.4X -ParquetReader Vectorized 1120 / 1128 9.4 106.8 12.1X -SQL ORC Vectorized 1203 / 1224 8.7 114.7 11.2X -SQL ORC Vectorized with copy 1639 / 1646 6.4 156.3 8.3X -SQL ORC MR 3720 / 3780 2.8 354.7 3.6X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative +SQL CSV 11268 / 11281 0.9 1074.6 1.0X +SQL Json 6597 / 6698 1.6 629.1 1.7X +SQL Parquet Vectorized 1071 / 1080 9.8 102.1 10.5X +SQL Parquet MR 2861 / 2873 3.7 272.9 3.9X +ParquetReader Vectorized 754 / 760 13.9 71.9 14.9X +SQL ORC Vectorized 
840 / 847 12.5 80.1 13.4X +SQL ORC Vectorized with copy 1190 / 1212 8.8 113.5 9.5X +SQL ORC MR 2762 / 2868 3.8 263.4 4.1X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +String with Nulls Scan (50.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 15860 / 15877 0.7 1512.5 1.0X -SQL Json 7676 / 7688 1.4 732.0 2.1X -SQL Parquet Vectorized 1072 / 1084 9.8 102.2 14.8X -SQL Parquet MR 2890 / 2897 3.6 275.6 5.5X -ParquetReader Vectorized 1052 / 1053 10.0 100.4 15.1X -SQL ORC Vectorized 1248 / 1248 8.4 119.0 12.7X -SQL ORC Vectorized with copy 1627 / 1637 6.4 155.2 9.7X -SQL ORC MR 3365 / 3369 3.1 320.9 4.7X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -String with Nulls Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative +SQL CSV 10541 / 10591 1.0 1005.3 1.0X +SQL Json 5121 / 5139 2.0 488.4 2.1X +SQL Parquet Vectorized 782 / 799 13.4 74.5 13.5X +SQL Parquet MR 2043 / 2062 5.1 194.8 5.2X +ParquetReader Vectorized 718 / 732 14.6 68.4 14.7X +SQL ORC Vectorized 947 / 970 11.1 90.4 11.1X +SQL ORC Vectorized with copy 1220 / 1226 8.6 116.3 8.6X +SQL ORC MR 2466 / 2470 4.3 235.2 4.3X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +String with Nulls Scan (95.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 13401 / 13561 0.8 1278.1 1.0X -SQL Json 5253 / 5303 2.0 500.9 2.6X -SQL Parquet Vectorized 233 / 242 45.0 22.2 57.6X -SQL Parquet MR 1791 / 1796 5.9 170.8 7.5X -ParquetReader Vectorized 236 / 238 44.4 22.5 56.7X -SQL ORC Vectorized 453 / 473 23.2 43.2 29.6X -SQL ORC Vectorized with copy 573 / 577 18.3 54.7 23.4X -SQL ORC MR 1846 / 1850 5.7 176.0 7.3X 
+SQL CSV 8786 / 8820 1.2 837.9 1.0X +SQL Json 2964 / 2991 3.5 282.7 3.0X +SQL Parquet Vectorized 181 / 186 58.0 17.2 48.6X +SQL Parquet MR 1275 / 1277 8.2 121.6 6.9X +ParquetReader Vectorized 163 / 165 64.5 15.5 54.0X +SQL ORC Vectorized 324 / 330 32.3 30.9 27.1X +SQL ORC Vectorized with copy 407 / 411 25.8 38.8 21.6X +SQL ORC MR 1154 / 1167 9.1 110.0 7.6X ================================================================================================ Single Column Scan From Wide Columns ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 10 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 3147 / 3148 0.3 3001.1 1.0X -SQL Json 2666 / 2693 0.4 2542.9 1.2X -SQL Parquet Vectorized 54 / 58 19.5 51.3 58.5X -SQL Parquet MR 220 / 353 4.8 209.9 14.3X -SQL ORC Vectorized 63 / 77 16.8 59.7 50.3X -SQL ORC Vectorized with copy 63 / 66 16.7 59.8 50.2X -SQL ORC MR 317 / 321 3.3 302.2 9.9X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 2026 / 2029 0.5 1932.0 1.0X +SQL Json 1662 / 1669 0.6 1585.4 1.2X +SQL Parquet Vectorized 48 / 52 22.0 45.5 42.5X +SQL Parquet MR 172 / 177 6.1 163.7 11.8X +SQL ORC Vectorized 53 / 59 19.6 51.0 37.9X +SQL ORC Vectorized with copy 56 / 59 18.9 53.0 36.5X +SQL ORC MR 226 / 231 4.6 216.0 8.9X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 50 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative 
------------------------------------------------------------------------------------------------ -SQL CSV 7902 / 7921 0.1 7536.2 1.0X -SQL Json 9467 / 9491 0.1 9028.6 0.8X -SQL Parquet Vectorized 73 / 79 14.3 69.8 108.0X -SQL Parquet MR 239 / 247 4.4 228.0 33.1X -SQL ORC Vectorized 78 / 84 13.4 74.6 101.0X -SQL ORC Vectorized with copy 78 / 88 13.4 74.4 101.3X -SQL ORC MR 910 / 918 1.2 867.6 8.7X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +SQL CSV 4762 / 4787 0.2 4541.7 1.0X +SQL Json 5733 / 5740 0.2 5467.3 0.8X +SQL Parquet Vectorized 104 / 109 10.1 98.9 45.9X +SQL Parquet MR 212 / 218 4.9 202.4 22.4X +SQL ORC Vectorized 105 / 118 10.0 99.7 45.6X +SQL ORC Vectorized with copy 115 / 123 9.1 109.5 41.5X +SQL ORC MR 701 / 707 1.5 668.6 6.8X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 13539 / 13543 0.1 12912.0 1.0X -SQL Json 17420 / 17446 0.1 16613.1 0.8X -SQL Parquet Vectorized 103 / 120 10.2 98.1 131.6X -SQL Parquet MR 250 / 258 4.2 238.9 54.1X -SQL ORC Vectorized 99 / 104 10.6 94.6 136.5X -SQL ORC Vectorized with copy 100 / 106 10.5 95.6 135.1X -SQL ORC MR 1653 / 1659 0.6 1576.3 8.2X +SQL CSV 7931 / 7972 0.1 7563.6 1.0X +SQL Json 10984 / 11037 0.1 10475.1 0.7X +SQL Parquet Vectorized 178 / 185 5.9 169.9 44.5X +SQL Parquet MR 260 / 267 4.0 247.6 30.5X +SQL ORC Vectorized 169 / 184 6.2 161.0 47.0X +SQL ORC Vectorized with copy 182 / 189 5.8 173.4 43.6X +SQL ORC MR 1287 / 1288 0.8 1227.0 6.2X diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala index a1f51f8e54805..ecd9ead0ae39a 100644 
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
@@ -447,7 +447,9 @@ object DataSourceReadBenchmark extends BenchmarkBase with SQLHelper {
   }
 
   def stringWithNullsScanBenchmark(values: Int, fractionOfNulls: Double): Unit = {
-    val benchmark = new Benchmark("String with Nulls Scan", values, output = output)
+    val percentageOfNulls = fractionOfNulls * 100
+    val benchmark =
+      new Benchmark(s"String with Nulls Scan ($percentageOfNulls%)", values, output = output)
 
     withTempPath { dir =>
       withTempTable("t1", "csvTable", "jsonTable", "parquetTable", "orcTable") {
diff --git a/sql/hive/benchmarks/OrcReadBenchmark-results.txt b/sql/hive/benchmarks/OrcReadBenchmark-results.txt
index c77f966723d71..7aa96efd73b58 100644
--- a/sql/hive/benchmarks/OrcReadBenchmark-results.txt
+++ b/sql/hive/benchmarks/OrcReadBenchmark-results.txt
@@ -2,172 +2,172 @@
 SQL Single Numeric Column Scan
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
+Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
 SQL Single TINYINT Column Scan:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR                                 1630 / 1639          9.7         103.6       1.0X
-Native ORC Vectorized                           253 / 288         62.2          16.1       6.4X
-Native ORC Vectorized with copy                 227 / 244         69.2          14.5       7.2X
-Hive built-in ORC                             1980 / 1991          7.9         125.9       0.8X
+Native ORC MR                                 1135 / 1192         13.9          72.2       1.0X
+Native ORC Vectorized                           159 / 191         99.1          10.1       7.2X
+Native ORC Vectorized with copy                 132 / 140        119.4           8.4       8.6X
+Hive built-in ORC                             1344 / 1348         11.7          85.4       0.8X
 
-OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1587 / 1589 9.9 100.9 1.0X -Native ORC Vectorized 227 / 242 69.2 14.5 7.0X -Native ORC Vectorized with copy 228 / 238 69.0 14.5 7.0X -Hive built-in ORC 2323 / 2332 6.8 147.7 0.7X +Native ORC MR 1149 / 1208 13.7 73.0 1.0X +Native ORC Vectorized 191 / 203 82.5 12.1 6.0X +Native ORC Vectorized with copy 190 / 206 83.0 12.1 6.1X +Hive built-in ORC 1572 / 1615 10.0 100.0 0.7X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1726 / 1771 9.1 109.7 1.0X -Native ORC Vectorized 309 / 333 50.9 19.7 5.6X -Native ORC Vectorized with copy 313 / 321 50.2 19.9 5.5X -Hive built-in ORC 2668 / 2672 5.9 169.6 0.6X +Native ORC MR 1397 / 1416 11.3 88.8 1.0X +Native ORC Vectorized 238 / 245 66.0 15.2 5.9X +Native ORC Vectorized with copy 241 / 254 65.3 15.3 5.8X +Hive built-in ORC 1843 / 1915 8.5 117.2 0.8X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1722 / 1747 9.1 109.5 1.0X -Native ORC Vectorized 395 / 403 
39.8 25.1 4.4X -Native ORC Vectorized with copy 399 / 405 39.4 25.4 4.3X -Hive built-in ORC 2767 / 2777 5.7 175.9 0.6X +Native ORC MR 1350 / 1383 11.7 85.8 1.0X +Native ORC Vectorized 300 / 305 52.4 19.1 4.5X +Native ORC Vectorized with copy 318 / 334 49.5 20.2 4.2X +Hive built-in ORC 1887 / 1916 8.3 120.0 0.7X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1797 / 1824 8.8 114.2 1.0X -Native ORC Vectorized 434 / 441 36.2 27.6 4.1X -Native ORC Vectorized with copy 437 / 447 36.0 27.8 4.1X -Hive built-in ORC 2701 / 2710 5.8 171.7 0.7X +Native ORC MR 1382 / 1419 11.4 87.9 1.0X +Native ORC Vectorized 351 / 366 44.8 22.3 3.9X +Native ORC Vectorized with copy 361 / 368 43.6 22.9 3.8X +Hive built-in ORC 1898 / 1950 8.3 120.7 0.7X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1931 / 2028 8.1 122.8 1.0X -Native ORC Vectorized 542 / 557 29.0 34.5 3.6X -Native ORC Vectorized with copy 550 / 564 28.6 35.0 3.5X -Hive built-in ORC 2816 / 3206 5.6 179.1 0.7X +Native ORC MR 1510 / 1804 10.4 96.0 1.0X +Native ORC Vectorized 467 / 484 33.7 29.7 3.2X +Native ORC Vectorized with copy 465 / 490 33.8 29.6 3.2X +Hive built-in ORC 2075 / 2111 7.6 131.9 0.7X ================================================================================================ 
Int and String Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 4012 / 4068 2.6 382.6 1.0X -Native ORC Vectorized 2337 / 2339 4.5 222.9 1.7X -Native ORC Vectorized with copy 2520 / 2540 4.2 240.3 1.6X -Hive built-in ORC 5503 / 5575 1.9 524.8 0.7X +Native ORC MR 3596 / 3680 2.9 343.0 1.0X +Native ORC Vectorized 2136 / 2397 4.9 203.7 1.7X +Native ORC Vectorized with copy 2388 / 2422 4.4 227.8 1.5X +Hive built-in ORC 4304 / 4336 2.4 410.5 0.8X ================================================================================================ Partitioned Table Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Partitioned Table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Data column - Native ORC MR 2020 / 2025 7.8 128.4 1.0X -Data column - Native ORC Vectorized 398 / 409 39.5 25.3 5.1X -Data column - Native ORC Vectorized with copy 406 / 411 38.8 25.8 5.0X -Data column - Hive built-in ORC 2967 / 2969 5.3 188.6 0.7X -Partition column - Native ORC MR 1494 / 1505 10.5 95.0 1.4X -Partition column - Native ORC Vectorized 73 / 82 216.3 4.6 27.8X -Partition column - Native ORC Vectorized with copy 71 / 80 221.4 4.5 28.4X -Partition column - Hive 
built-in ORC 1932 / 1937 8.1 122.8 1.0X -Both columns - Native ORC MR 2057 / 2071 7.6 130.8 1.0X -Both columns - Native ORC Vectorized 445 / 448 35.4 28.3 4.5X -Both column - Native ORC Vectorized with copy 534 / 539 29.4 34.0 3.8X -Both columns - Hive built-in ORC 2994 / 2994 5.3 190.3 0.7X +Data column - Native ORC MR 1473 / 1483 10.7 93.6 1.0X +Data column - Native ORC Vectorized 309 / 325 50.9 19.6 4.8X +Data column - Native ORC Vectorized with copy 322 / 368 48.9 20.5 4.6X +Data column - Hive built-in ORC 2061 / 2084 7.6 131.0 0.7X +Partition column - Native ORC MR 1000 / 1018 15.7 63.6 1.5X +Partition column - Native ORC Vectorized 81 / 88 193.2 5.2 18.1X +Partition column - Native ORC Vectorized with copy 80 / 86 196.5 5.1 18.4X +Partition column - Hive built-in ORC 1212 / 1230 13.0 77.1 1.2X +Both columns - Native ORC MR 1496 / 1528 10.5 95.1 1.0X +Both columns - Native ORC Vectorized 359 / 378 43.8 22.8 4.1X +Both column - Native ORC Vectorized with copy 412 / 442 38.2 26.2 3.6X +Both columns - Hive built-in ORC 2220 / 2224 7.1 141.1 0.7X ================================================================================================ Repeated String Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1771 / 1785 5.9 168.9 1.0X -Native ORC Vectorized 372 / 375 28.2 35.5 4.8X -Native ORC Vectorized with copy 543 / 576 19.3 51.8 3.3X -Hive built-in ORC 2671 / 2671 3.9 254.7 0.7X +Native ORC MR 1311 / 1319 8.0 125.0 1.0X +Native ORC Vectorized 286 / 375 36.7 27.3 4.6X +Native ORC Vectorized with copy 445 / 456 
23.6 42.4 2.9X +Hive built-in ORC 1935 / 1968 5.4 184.6 0.7X ================================================================================================ String with Nulls Scan ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz String with Nulls Scan (0.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 3276 / 3302 3.2 312.5 1.0X -Native ORC Vectorized 1057 / 1080 9.9 100.8 3.1X -Native ORC Vectorized with copy 1420 / 1431 7.4 135.4 2.3X -Hive built-in ORC 5377 / 5407 2.0 512.8 0.6X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -String with Nulls Scan (0.5%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative +Native ORC MR 2599 / 2701 4.0 247.8 1.0X +Native ORC Vectorized 818 / 846 12.8 78.0 3.2X +Native ORC Vectorized with copy 1084 / 1149 9.7 103.4 2.4X +Hive built-in ORC 3807 / 3885 2.8 363.1 0.7X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +String with Nulls Scan (50.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 3147 / 3147 3.3 300.1 1.0X -Native ORC Vectorized 1305 / 1319 8.0 124.4 2.4X -Native ORC Vectorized with copy 1685 / 1686 6.2 160.7 1.9X -Hive built-in ORC 4077 / 4085 2.6 388.8 0.8X - -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz -String with Nulls Scan (0.95%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative +Native ORC MR 2430 / 2808 
4.3 231.7 1.0X +Native ORC Vectorized 2016 / 2508 5.2 192.3 1.2X +Native ORC Vectorized with copy 1268 / 1272 8.3 121.0 1.9X +Hive built-in ORC 3016 / 3030 3.5 287.7 0.8X + +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +String with Nulls Scan (95.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1739 / 1744 6.0 165.8 1.0X -Native ORC Vectorized 500 / 501 21.0 47.7 3.5X -Native ORC Vectorized with copy 618 / 631 17.0 58.9 2.8X -Hive built-in ORC 2411 / 2427 4.3 229.9 0.7X +Native ORC MR 1216 / 1228 8.6 116.0 1.0X +Native ORC Vectorized 361 / 368 29.1 34.4 3.4X +Native ORC Vectorized with copy 445 / 459 23.6 42.4 2.7X +Hive built-in ORC 1554 / 1574 6.7 148.2 0.8X ================================================================================================ Single Column Scan From Wide Columns ================================================================================================ -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 1348 / 1366 0.8 1285.3 1.0X -Native ORC Vectorized 119 / 134 8.8 113.5 11.3X -Native ORC Vectorized with copy 119 / 148 8.8 113.9 11.3X -Hive built-in ORC 487 / 507 2.2 464.8 2.8X +Native ORC MR 1098 / 1100 1.0 1047.5 1.0X +Native ORC Vectorized 197 / 212 5.3 187.6 5.6X +Native ORC Vectorized with copy 188 / 200 5.6 178.9 5.9X +Hive built-in ORC 409 / 417 2.6 390.4 2.7X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 
2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 200 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 2667 / 2837 0.4 2543.6 1.0X -Native ORC Vectorized 203 / 222 5.2 193.4 13.2X -Native ORC Vectorized with copy 217 / 255 4.8 207.0 12.3X -Hive built-in ORC 737 / 741 1.4 702.4 3.6X +Native ORC MR 2251 / 2722 0.5 2147.0 1.0X +Native ORC Vectorized 343 / 351 3.1 326.7 6.6X +Native ORC Vectorized with copy 350 / 369 3.0 334.2 6.4X +Hive built-in ORC 632 / 714 1.7 602.7 3.6X -OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 3.10.0-862.3.2.el7.x86_64 -Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 300 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Native ORC MR 3954 / 3956 0.3 3770.4 1.0X -Native ORC Vectorized 348 / 360 3.0 331.7 11.4X -Native ORC Vectorized with copy 349 / 359 3.0 333.2 11.3X -Hive built-in ORC 1057 / 1067 1.0 1008.0 3.7X +Native ORC MR 3643 / 3936 0.3 3474.4 1.0X +Native ORC Vectorized 550 / 572 1.9 524.1 6.6X +Native ORC Vectorized with copy 536 / 547 2.0 511.6 6.8X +Hive built-in ORC 950 / 1003 1.1 906.5 3.8X diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala index ec13288f759a6..eb3cde8472dac 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala @@ -32,9 +32,11 @@ import org.apache.spark.sql.types._ * Benchmark to measure ORC read performance. 
  * {{{
  * To run this benchmark:
- * 1. without sbt: bin/spark-submit --class
- * 2. build/sbt "sql/test:runMain "
- * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ * 1. without sbt: bin/spark-submit --class
+ * --jars ,,,,
+ *
+ * 2. build/sbt "hive/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain "
  * Results will be written to "benchmarks/OrcReadBenchmark-results.txt".
  * }}}
  *
@@ -266,8 +268,9 @@ object OrcReadBenchmark extends BenchmarkBase with SQLHelper {
             s"SELECT IF(RAND(1) < $fractionOfNulls, NULL, CAST(id as STRING)) AS c1, " +
             s"IF(RAND(2) < $fractionOfNulls, NULL, CAST(id as STRING)) AS c2 FROM t1"))
 
+        val percentageOfNulls = fractionOfNulls * 100
         val benchmark =
-          new Benchmark(s"String with Nulls Scan ($fractionOfNulls%)", values, output = output)
+          new Benchmark(s"String with Nulls Scan ($percentageOfNulls%)", values, output = output)
 
         benchmark.addCase("Native ORC MR") { _ =>
           withSQLConf(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> "false") {

From 3067a6d1f63c93b4295425d90e5894d27c840995 Mon Sep 17 00:00:00 2001
From: Gengliang Wang
Date: Thu, 8 Nov 2018 13:12:01 +0800
Subject: [PATCH 2/3] re run benchmark

---
 .../DataSourceReadBenchmark-results.txt | 268 +++++++++---------
 1 file changed, 134 insertions(+), 134 deletions(-)

diff --git a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
index 72e44e75492e4..050345bf1d0b4 100644
--- a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
+++ b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt
@@ -6,115 +6,115 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
 SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 15974 / 16222 1.0 1015.6 1.0X
-SQL Json
5917 / 6174 2.7 376.2 2.7X -SQL Parquet Vectorized 115 / 128 136.8 7.3 138.9X -SQL Parquet MR 1459 / 1571 10.8 92.8 10.9X -SQL ORC Vectorized 164 / 194 95.8 10.4 97.3X -SQL ORC Vectorized with copy 204 / 303 77.2 12.9 78.4X -SQL ORC MR 1095 / 1143 14.4 69.6 14.6X +SQL CSV 14108 / 14263 1.1 896.9 1.0X +SQL Json 5477 / 5509 2.9 348.2 2.6X +SQL Parquet Vectorized 115 / 125 137.1 7.3 122.9X +SQL Parquet MR 1318 / 1332 11.9 83.8 10.7X +SQL ORC Vectorized 150 / 159 104.9 9.5 94.1X +SQL ORC Vectorized with copy 206 / 208 76.4 13.1 68.5X +SQL ORC MR 1072 / 1075 14.7 68.1 13.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 139 / 156 113.1 8.8 1.0X -ParquetReader Vectorized -> Row 83 / 89 188.7 5.3 1.7X +ParquetReader Vectorized 138 / 152 114.0 8.8 1.0X +ParquetReader Vectorized -> Row 80 / 87 197.2 5.1 1.7X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 16394 / 16643 1.0 1042.3 1.0X -SQL Json 6014 / 6020 2.6 382.4 2.7X -SQL Parquet Vectorized 147 / 155 106.9 9.4 111.4X -SQL Parquet MR 1575 / 1581 10.0 100.1 10.4X -SQL ORC Vectorized 168 / 173 93.9 10.7 97.9X -SQL ORC Vectorized with copy 219 / 227 71.8 13.9 74.8X -SQL ORC MR 1185 / 1187 13.3 75.3 13.8X +SQL CSV 14495 / 14507 1.1 921.6 1.0X +SQL Json 5615 / 5668 2.8 357.0 2.6X +SQL Parquet Vectorized 147 / 154 107.4 9.3 98.9X +SQL Parquet MR 1431 / 1454 11.0 91.0 10.1X +SQL ORC Vectorized 170 / 175 92.4 10.8 85.1X +SQL ORC Vectorized with copy 223 / 228 70.6 14.2 65.1X +SQL ORC MR 1187 / 1197 13.2 75.5 
12.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 193 / 216 81.4 12.3 1.0X -ParquetReader Vectorized -> Row 160 / 175 98.3 10.2 1.2X +ParquetReader Vectorized 190 / 219 82.8 12.1 1.0X +ParquetReader Vectorized -> Row 165 / 169 95.2 10.5 1.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 17168 / 17306 0.9 1091.5 1.0X -SQL Json 6167 / 6180 2.6 392.1 2.8X -SQL Parquet Vectorized 134 / 142 117.5 8.5 128.2X -SQL Parquet MR 1659 / 1740 9.5 105.5 10.3X -SQL ORC Vectorized 225 / 229 69.9 14.3 76.3X -SQL ORC Vectorized with copy 231 / 235 68.2 14.7 74.4X -SQL ORC MR 1287 / 1388 12.2 81.8 13.3X +SQL CSV 16105 / 16214 1.0 1023.9 1.0X +SQL Json 6289 / 6291 2.5 399.8 2.6X +SQL Parquet Vectorized 142 / 148 111.0 9.0 113.6X +SQL Parquet MR 1797 / 1801 8.8 114.2 9.0X +SQL ORC Vectorized 232 / 238 67.9 14.7 69.5X +SQL ORC Vectorized with copy 237 / 242 66.5 15.0 68.1X +SQL ORC MR 1309 / 1409 12.0 83.2 12.3X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 178 / 187 88.2 11.3 1.0X -ParquetReader Vectorized -> Row 174 / 184 90.3 11.1 1.0X +ParquetReader Vectorized 181 / 225 87.0 11.5 1.0X +ParquetReader Vectorized -> Row 180 / 186 87.4 11.4 1.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on 
Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 21253 / 21542 0.7 1351.2 1.0X -SQL Json 8208 / 8209 1.9 521.9 2.6X -SQL Parquet Vectorized 180 / 241 87.3 11.5 117.9X -SQL Parquet MR 1769 / 1801 8.9 112.5 12.0X -SQL ORC Vectorized 3271 / 3277 4.8 207.9 6.5X -SQL ORC Vectorized with copy 290 / 297 54.3 18.4 73.4X -SQL ORC MR 1468 / 1514 10.7 93.4 14.5X +SQL CSV 20128 / 20682 0.8 1279.7 1.0X +SQL Json 8277 / 8279 1.9 526.3 2.4X +SQL Parquet Vectorized 198 / 211 79.3 12.6 101.5X +SQL Parquet MR 1788 / 1816 8.8 113.7 11.3X +SQL ORC Vectorized 273 / 290 57.6 17.4 73.7X +SQL ORC Vectorized with copy 292 / 305 53.8 18.6 68.9X +SQL ORC MR 1431 / 1435 11.0 91.0 14.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 290 / 304 54.2 18.5 1.0X -ParquetReader Vectorized -> Row 246 / 270 64.1 15.6 1.2X +ParquetReader Vectorized 250 / 291 63.0 15.9 1.0X +ParquetReader Vectorized -> Row 261 / 282 60.2 16.6 1.0X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 18835 / 19052 0.8 1197.5 1.0X -SQL Json 8774 / 8780 1.8 557.8 2.1X -SQL Parquet Vectorized 150 / 162 104.8 9.5 125.5X -SQL Parquet MR 1727 / 1779 9.1 109.8 10.9X -SQL ORC Vectorized 311 / 327 50.6 19.8 60.5X -SQL ORC Vectorized with copy 314 / 320 50.2 19.9 60.1X -SQL ORC MR 1384 / 1386 11.4 88.0 13.6X +SQL CSV 16456 / 
16576 1.0 1046.3 1.0X +SQL Json 9041 / 9348 1.7 574.8 1.8X +SQL Parquet Vectorized 143 / 150 110.3 9.1 115.4X +SQL Parquet MR 1623 / 1628 9.7 103.2 10.1X +SQL ORC Vectorized 305 / 309 51.6 19.4 54.0X +SQL ORC Vectorized with copy 301 / 311 52.3 19.1 54.7X +SQL ORC MR 1352 / 1362 11.6 86.0 12.2X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 187 / 208 84.0 11.9 1.0X -ParquetReader Vectorized -> Row 184 / 194 85.3 11.7 1.0X +ParquetReader Vectorized 180 / 205 87.2 11.5 1.0X +ParquetReader Vectorized -> Row 171 / 184 92.1 10.9 1.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 22506 / 22602 0.7 1430.9 1.0X -SQL Json 11925 / 11942 1.3 758.2 1.9X -SQL Parquet Vectorized 217 / 229 72.6 13.8 103.8X -SQL Parquet MR 1795 / 1856 8.8 114.1 12.5X -SQL ORC Vectorized 392 / 396 40.1 24.9 57.4X -SQL ORC Vectorized with copy 394 / 397 40.0 25.0 57.2X -SQL ORC MR 1524 / 1535 10.3 96.9 14.8X +SQL CSV 20249 / 20324 0.8 1287.4 1.0X +SQL Json 11745 / 11750 1.3 746.7 1.7X +SQL Parquet Vectorized 202 / 232 77.7 12.9 100.0X +SQL Parquet MR 1783 / 1803 8.8 113.4 11.4X +SQL ORC Vectorized 376 / 379 41.8 23.9 53.8X +SQL ORC Vectorized with copy 379 / 388 41.5 24.1 53.4X +SQL ORC MR 1492 / 1493 10.5 94.9 13.6X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Parquet Reader Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative 
------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 305 / 349 51.5 19.4 1.0X -ParquetReader Vectorized -> Row 263 / 290 59.8 16.7 1.2X +ParquetReader Vectorized 251 / 304 62.8 15.9 1.0X +ParquetReader Vectorized -> Row 238 / 266 65.9 15.2 1.1X ================================================================================================ @@ -125,13 +125,13 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 16802 / 16833 0.6 1602.3 1.0X -SQL Json 8958 / 8972 1.2 854.3 1.9X -SQL Parquet Vectorized 1891 / 1960 5.5 180.3 8.9X -SQL Parquet MR 3796 / 3808 2.8 362.0 4.4X -SQL ORC Vectorized 2020 / 2064 5.2 192.7 8.3X -SQL ORC Vectorized with copy 2160 / 2161 4.9 206.0 7.8X -SQL ORC MR 3800 / 3807 2.8 362.4 4.4X +SQL CSV 14097 / 14208 0.7 1344.4 1.0X +SQL Json 8323 / 8365 1.3 793.8 1.7X +SQL Parquet Vectorized 1808 / 1852 5.8 172.4 7.8X +SQL Parquet MR 3401 / 3421 3.1 324.4 4.1X +SQL ORC Vectorized 1872 / 1964 5.6 178.5 7.5X +SQL ORC Vectorized with copy 1973 / 2022 5.3 188.2 7.1X +SQL ORC MR 3066 / 3206 3.4 292.4 4.6X ================================================================================================ @@ -142,13 +142,13 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 10020 / 10068 1.0 955.6 1.0X -SQL Json 4944 / 4947 2.1 471.5 2.0X -SQL Parquet Vectorized 620 / 639 16.9 59.1 16.2X -SQL Parquet MR 1389 / 1389 7.6 132.4 7.2X -SQL ORC Vectorized 399 / 413 26.3 38.0 25.1X -SQL ORC Vectorized with copy 573 / 585 18.3 54.6 17.5X 
-SQL ORC MR 1479 / 1483 7.1 141.0 6.8X +SQL CSV 8060 / 8354 1.3 768.6 1.0X +SQL Json 4647 / 4664 2.3 443.2 1.7X +SQL Parquet Vectorized 654 / 675 16.0 62.4 12.3X +SQL Parquet MR 1324 / 1337 7.9 126.3 6.1X +SQL ORC Vectorized 397 / 404 26.4 37.9 20.3X +SQL ORC Vectorized with copy 560 / 572 18.7 53.4 14.4X +SQL ORC MR 1437 / 1455 7.3 137.1 5.6X ================================================================================================ @@ -159,27 +159,27 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Partitioned Table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Data column - CSV 22274 / 22397 0.7 1416.1 1.0X -Data column - Json 8705 / 8706 1.8 553.5 2.6X -Data column - Parquet Vectorized 201 / 213 78.3 12.8 110.9X -Data column - Parquet MR 2270 / 2286 6.9 144.3 9.8X -Data column - ORC Vectorized 289 / 295 54.4 18.4 77.0X -Data column - ORC Vectorized with copy 290 / 296 54.3 18.4 76.8X -Data column - ORC MR 1675 / 1678 9.4 106.5 13.3X -Partition column - CSV 4527 / 4554 3.5 287.8 4.9X -Partition column - Json 3374 / 3376 4.7 214.5 6.6X -Partition column - Parquet Vectorized 69 / 77 227.5 4.4 322.2X -Partition column - Parquet MR 1132 / 1143 13.9 72.0 19.7X -Partition column - ORC Vectorized 69 / 76 227.6 4.4 322.3X -Partition column - ORC Vectorized with copy 74 / 78 213.5 4.7 302.3X -Partition column - ORC MR 1141 / 1144 13.8 72.5 19.5X -Both columns - CSV 20791 / 20800 0.8 1321.9 1.1X -Both columns - Json 8798 / 8815 1.8 559.4 2.5X -Both columns - Parquet Vectorized 237 / 248 66.2 15.1 93.8X -Both columns - Parquet MR 2523 / 2528 6.2 160.4 8.8X -Both columns - ORC Vectorized 323 / 331 48.6 20.6 68.9X -Both column - ORC Vectorized with copy 367 / 376 42.8 23.4 60.6X -Both columns - ORC MR 1813 / 1842 8.7 115.3 12.3X +Data column - CSV 18310 / 18506 0.9 1164.1 1.0X +Data column - Json 8752 / 
8762 1.8 556.4 2.1X +Data column - Parquet Vectorized 208 / 218 75.6 13.2 88.0X +Data column - Parquet MR 2384 / 2396 6.6 151.5 7.7X +Data column - ORC Vectorized 282 / 294 55.8 17.9 65.0X +Data column - ORC Vectorized with copy 291 / 300 54.0 18.5 62.9X +Data column - ORC MR 1681 / 1692 9.4 106.9 10.9X +Partition column - CSV 4502 / 4542 3.5 286.2 4.1X +Partition column - Json 3404 / 3415 4.6 216.4 5.4X +Partition column - Parquet Vectorized 70 / 76 225.7 4.4 262.7X +Partition column - Parquet MR 1206 / 1211 13.0 76.7 15.2X +Partition column - ORC Vectorized 70 / 77 225.7 4.4 262.7X +Partition column - ORC Vectorized with copy 68 / 78 230.1 4.3 267.9X +Partition column - ORC MR 1193 / 1195 13.2 75.8 15.4X +Both columns - CSV 17883 / 18028 0.9 1137.0 1.0X +Both columns - Json 9089 / 9100 1.7 577.8 2.0X +Both columns - Parquet Vectorized 235 / 242 67.0 14.9 78.0X +Both columns - Parquet MR 2583 / 2618 6.1 164.2 7.1X +Both columns - ORC Vectorized 327 / 340 48.1 20.8 56.0X +Both column - ORC Vectorized with copy 360 / 374 43.7 22.9 50.9X +Both columns - ORC MR 1875 / 1899 8.4 119.2 9.8X ================================================================================================ @@ -190,40 +190,40 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz String with Nulls Scan (0.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 11268 / 11281 0.9 1074.6 1.0X -SQL Json 6597 / 6698 1.6 629.1 1.7X -SQL Parquet Vectorized 1071 / 1080 9.8 102.1 10.5X -SQL Parquet MR 2861 / 2873 3.7 272.9 3.9X -ParquetReader Vectorized 754 / 760 13.9 71.9 14.9X -SQL ORC Vectorized 840 / 847 12.5 80.1 13.4X -SQL ORC Vectorized with copy 1190 / 1212 8.8 113.5 9.5X -SQL ORC MR 2762 / 2868 3.8 263.4 4.1X +SQL CSV 9293 / 9393 1.1 886.2 1.0X +SQL Json 6686 / 6702 1.6 637.7 1.4X +SQL Parquet Vectorized 1102 / 1104 9.5 105.1 8.4X 
+SQL Parquet MR 2954 / 3000 3.5 281.7 3.1X +ParquetReader Vectorized 830 / 839 12.6 79.2 11.2X +SQL ORC Vectorized 861 / 869 12.2 82.2 10.8X +SQL ORC Vectorized with copy 1243 / 1283 8.4 118.5 7.5X +SQL ORC MR 2752 / 2783 3.8 262.4 3.4X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz String with Nulls Scan (50.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 10541 / 10591 1.0 1005.3 1.0X -SQL Json 5121 / 5139 2.0 488.4 2.1X -SQL Parquet Vectorized 782 / 799 13.4 74.5 13.5X -SQL Parquet MR 2043 / 2062 5.1 194.8 5.2X -ParquetReader Vectorized 718 / 732 14.6 68.4 14.7X -SQL ORC Vectorized 947 / 970 11.1 90.4 11.1X -SQL ORC Vectorized with copy 1220 / 1226 8.6 116.3 8.6X -SQL ORC MR 2466 / 2470 4.3 235.2 4.3X +SQL CSV 10460 / 10566 1.0 997.5 1.0X +SQL Json 5042 / 5082 2.1 480.8 2.1X +SQL Parquet Vectorized 809 / 820 13.0 77.2 12.9X +SQL Parquet MR 2083 / 2093 5.0 198.7 5.0X +ParquetReader Vectorized 723 / 745 14.5 68.9 14.5X +SQL ORC Vectorized 1018 / 1037 10.3 97.1 10.3X +SQL ORC Vectorized with copy 1301 / 1308 8.1 124.0 8.0X +SQL ORC MR 2542 / 2579 4.1 242.5 4.1X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz String with Nulls Scan (95.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 8786 / 8820 1.2 837.9 1.0X -SQL Json 2964 / 2991 3.5 282.7 3.0X -SQL Parquet Vectorized 181 / 186 58.0 17.2 48.6X -SQL Parquet MR 1275 / 1277 8.2 121.6 6.9X -ParquetReader Vectorized 163 / 165 64.5 15.5 54.0X -SQL ORC Vectorized 324 / 330 32.3 30.9 27.1X -SQL ORC Vectorized with copy 407 / 411 25.8 38.8 21.6X -SQL ORC MR 1154 / 1167 9.1 110.0 7.6X +SQL CSV 8574 / 8631 1.2 817.6 1.0X +SQL Json 3098 / 3120 3.4 295.5 2.8X +SQL Parquet Vectorized 185 
/ 190 56.7 17.6 46.3X +SQL Parquet MR 1263 / 1286 8.3 120.4 6.8X +ParquetReader Vectorized 167 / 173 62.8 15.9 51.3X +SQL ORC Vectorized 333 / 336 31.5 31.7 25.8X +SQL ORC Vectorized with copy 410 / 416 25.6 39.1 20.9X +SQL ORC MR 1215 / 1222 8.6 115.9 7.1X ================================================================================================ @@ -234,36 +234,36 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 10 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 2026 / 2029 0.5 1932.0 1.0X -SQL Json 1662 / 1669 0.6 1585.4 1.2X -SQL Parquet Vectorized 48 / 52 22.0 45.5 42.5X -SQL Parquet MR 172 / 177 6.1 163.7 11.8X -SQL ORC Vectorized 53 / 59 19.6 51.0 37.9X -SQL ORC Vectorized with copy 56 / 59 18.9 53.0 36.5X -SQL ORC MR 226 / 231 4.6 216.0 8.9X +SQL CSV 2031 / 2053 0.5 1936.5 1.0X +SQL Json 1737 / 1740 0.6 1656.4 1.2X +SQL Parquet Vectorized 45 / 54 23.4 42.7 45.3X +SQL Parquet MR 166 / 174 6.3 158.1 12.2X +SQL ORC Vectorized 56 / 59 18.7 53.6 36.1X +SQL ORC Vectorized with copy 54 / 68 19.4 51.7 37.5X +SQL ORC MR 239 / 252 4.4 228.3 8.5X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 50 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 4762 / 4787 0.2 4541.7 1.0X -SQL Json 5733 / 5740 0.2 5467.3 0.8X -SQL Parquet Vectorized 104 / 109 10.1 98.9 45.9X -SQL Parquet MR 212 / 218 4.9 202.4 22.4X -SQL ORC Vectorized 105 / 118 10.0 99.7 45.6X -SQL ORC Vectorized with copy 115 / 123 9.1 109.5 41.5X -SQL ORC MR 701 / 707 1.5 668.6 6.8X +SQL CSV 5169 / 5238 0.2 4929.3 1.0X +SQL Json 6279 / 6494 0.2 5987.8 0.8X +SQL Parquet Vectorized 102 / 110 10.2 97.6 50.5X +SQL 
Parquet MR 202 / 218 5.2 192.7 25.6X +SQL ORC Vectorized 113 / 123 9.3 107.3 45.9X +SQL ORC Vectorized with copy 119 / 127 8.8 113.1 43.6X +SQL ORC MR 750 / 772 1.4 715.5 6.9X Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 7931 / 7972 0.1 7563.6 1.0X -SQL Json 10984 / 11037 0.1 10475.1 0.7X -SQL Parquet Vectorized 178 / 185 5.9 169.9 44.5X -SQL Parquet MR 260 / 267 4.0 247.6 30.5X -SQL ORC Vectorized 169 / 184 6.2 161.0 47.0X -SQL ORC Vectorized with copy 182 / 189 5.8 173.4 43.6X -SQL ORC MR 1287 / 1288 0.8 1227.0 6.2X +SQL CSV 8212 / 8480 0.1 7831.4 1.0X +SQL Json 11582 / 11690 0.1 11045.3 0.7X +SQL Parquet Vectorized 171 / 182 6.1 163.0 48.0X +SQL Parquet MR 258 / 265 4.1 245.8 31.9X +SQL ORC Vectorized 178 / 191 5.9 169.8 46.1X +SQL ORC Vectorized with copy 181 / 187 5.8 172.6 45.4X +SQL ORC MR 1342 / 1350 0.8 1279.9 6.1X From 5590304e00b4d90f3ce019d449c55721cc2bacf3 Mon Sep 17 00:00:00 2001 From: Dongjoon Hyun Date: Thu, 8 Nov 2018 09:04:29 +0000 Subject: [PATCH 3/3] update test results --- .../DataSourceReadBenchmark-results.txt | 372 +++++++++--------- .../benchmarks/OrcReadBenchmark-results.txt | 196 ++++----- 2 files changed, 284 insertions(+), 284 deletions(-) diff --git a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt index 050345bf1d0b4..b07e8b1197ff0 100644 --- a/sql/core/benchmarks/DataSourceReadBenchmark-results.txt +++ b/sql/core/benchmarks/DataSourceReadBenchmark-results.txt @@ -2,268 +2,268 @@ SQL Single Numeric Column Scan ================================================================================================ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz 
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 14108 / 14263 1.1 896.9 1.0X -SQL Json 5477 / 5509 2.9 348.2 2.6X -SQL Parquet Vectorized 115 / 125 137.1 7.3 122.9X -SQL Parquet MR 1318 / 1332 11.9 83.8 10.7X -SQL ORC Vectorized 150 / 159 104.9 9.5 94.1X -SQL ORC Vectorized with copy 206 / 208 76.4 13.1 68.5X -SQL ORC MR 1072 / 1075 14.7 68.1 13.2X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +SQL CSV 26366 / 26562 0.6 1676.3 1.0X +SQL Json 8709 / 8724 1.8 553.7 3.0X +SQL Parquet Vectorized 166 / 187 94.8 10.5 159.0X +SQL Parquet MR 1706 / 1720 9.2 108.4 15.5X +SQL ORC Vectorized 167 / 174 94.2 10.6 157.9X +SQL ORC Vectorized with copy 226 / 231 69.6 14.4 116.7X +SQL ORC MR 1433 / 1465 11.0 91.1 18.4X + +OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 138 / 152 114.0 8.8 1.0X -ParquetReader Vectorized -> Row 80 / 87 197.2 5.1 1.7X +ParquetReader Vectorized 200 / 207 78.7 12.7 1.0X +ParquetReader Vectorized -> Row 117 / 119 134.7 7.4 1.7X -Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 14495 / 14507 1.1 
921.6 1.0X -SQL Json 5615 / 5668 2.8 357.0 2.6X -SQL Parquet Vectorized 147 / 154 107.4 9.3 98.9X -SQL Parquet MR 1431 / 1454 11.0 91.0 10.1X -SQL ORC Vectorized 170 / 175 92.4 10.8 85.1X -SQL ORC Vectorized with copy 223 / 228 70.6 14.2 65.1X -SQL ORC MR 1187 / 1197 13.2 75.5 12.2X - -Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +SQL CSV 26489 / 26547 0.6 1684.1 1.0X +SQL Json 8990 / 8998 1.7 571.5 2.9X +SQL Parquet Vectorized 209 / 221 75.1 13.3 126.5X +SQL Parquet MR 1949 / 1949 8.1 123.9 13.6X +SQL ORC Vectorized 221 / 228 71.3 14.0 120.1X +SQL ORC Vectorized with copy 315 / 319 49.9 20.1 84.0X +SQL ORC MR 1527 / 1549 10.3 97.1 17.3X + +OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz Parquet Reader Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -ParquetReader Vectorized 190 / 219 82.8 12.1 1.0X -ParquetReader Vectorized -> Row 165 / 169 95.2 10.5 1.1X +ParquetReader Vectorized 286 / 296 54.9 18.2 1.0X +ParquetReader Vectorized -> Row 249 / 253 63.1 15.8 1.1X -Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6 -Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz +OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64 +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -SQL CSV 16105 / 16214 1.0 1023.9 1.0X -SQL Json 6289 / 6291 2.5 399.8 2.6X -SQL Parquet Vectorized 142 / 148 111.0 9.0 113.6X -SQL Parquet MR 1797 / 1801 8.8 114.2 9.0X -SQL ORC Vectorized 232 / 238 67.9 14.7 69.5X -SQL ORC Vectorized with copy 237 / 242 66.5 15.0 68.1X -SQL ORC MR 1309 / 1409 12.0 83.2 12.3X - -Java HotSpot(TM) 64-Bit Server VM 
1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 27701 / 27744 0.6 1761.2 1.0X
+SQL Json 9703 / 9733 1.6 616.9 2.9X
+SQL Parquet Vectorized 176 / 182 89.2 11.2 157.0X
+SQL Parquet MR 2164 / 2173 7.3 137.6 12.8X
+SQL ORC Vectorized 307 / 314 51.2 19.5 90.2X
+SQL ORC Vectorized with copy 312 / 319 50.4 19.8 88.7X
+SQL ORC MR 1690 / 1700 9.3 107.4 16.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Parquet Reader Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 181 / 225 87.0 11.5 1.0X
-ParquetReader Vectorized -> Row 180 / 186 87.4 11.4 1.0X
+ParquetReader Vectorized 259 / 277 60.7 16.5 1.0X
+ParquetReader Vectorized -> Row 261 / 265 60.3 16.6 1.0X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 20128 / 20682 0.8 1279.7 1.0X
-SQL Json 8277 / 8279 1.9 526.3 2.4X
-SQL Parquet Vectorized 198 / 211 79.3 12.6 101.5X
-SQL Parquet MR 1788 / 1816 8.8 113.7 11.3X
-SQL ORC Vectorized 273 / 290 57.6 17.4 73.7X
-SQL ORC Vectorized with copy 292 / 305 53.8 18.6 68.9X
-SQL ORC MR 1431 / 1435 11.0 91.0 14.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 34813 / 34900 0.5 2213.3 1.0X
+SQL Json 12570 / 12617 1.3 799.2 2.8X
+SQL Parquet Vectorized 270 / 308 58.2 17.2 128.9X
+SQL Parquet MR 2427 / 2431 6.5 154.3 14.3X
+SQL ORC Vectorized 388 / 398 40.6 24.6 89.8X
+SQL ORC Vectorized with copy 395 / 402 39.9 25.1 88.2X
+SQL ORC MR 1819 / 1851 8.6 115.7 19.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Parquet Reader Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 250 / 291 63.0 15.9 1.0X
-ParquetReader Vectorized -> Row 261 / 282 60.2 16.6 1.0X
+ParquetReader Vectorized 372 / 379 42.3 23.7 1.0X
+ParquetReader Vectorized -> Row 357 / 368 44.1 22.7 1.0X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 16456 / 16576 1.0 1046.3 1.0X
-SQL Json 9041 / 9348 1.7 574.8 1.8X
-SQL Parquet Vectorized 143 / 150 110.3 9.1 115.4X
-SQL Parquet MR 1623 / 1628 9.7 103.2 10.1X
-SQL ORC Vectorized 305 / 309 51.6 19.4 54.0X
-SQL ORC Vectorized with copy 301 / 311 52.3 19.1 54.7X
-SQL ORC MR 1352 / 1362 11.6 86.0 12.2X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 28753 / 28781 0.5 1828.0 1.0X
+SQL Json 12039 / 12215 1.3 765.4 2.4X
+SQL Parquet Vectorized 170 / 177 92.4 10.8 169.0X
+SQL Parquet MR 2184 / 2196 7.2 138.9 13.2X
+SQL ORC Vectorized 432 / 440 36.4 27.5 66.5X
+SQL ORC Vectorized with copy 439 / 442 35.9 27.9 65.6X
+SQL ORC MR 1812 / 1833 8.7 115.2 15.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Parquet Reader Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 180 / 205 87.2 11.5 1.0X
-ParquetReader Vectorized -> Row 171 / 184 92.1 10.9 1.1X
+ParquetReader Vectorized 253 / 260 62.2 16.1 1.0X
+ParquetReader Vectorized -> Row 256 / 257 61.6 16.2 1.0X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 20249 / 20324 0.8 1287.4 1.0X
-SQL Json 11745 / 11750 1.3 746.7 1.7X
-SQL Parquet Vectorized 202 / 232 77.7 12.9 100.0X
-SQL Parquet MR 1783 / 1803 8.8 113.4 11.4X
-SQL ORC Vectorized 376 / 379 41.8 23.9 53.8X
-SQL ORC Vectorized with copy 379 / 388 41.5 24.1 53.4X
-SQL ORC MR 1492 / 1493 10.5 94.9 13.6X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 36177 / 36188 0.4 2300.1 1.0X
+SQL Json 18895 / 18898 0.8 1201.3 1.9X
+SQL Parquet Vectorized 267 / 276 58.9 17.0 135.6X
+SQL Parquet MR 2355 / 2363 6.7 149.7 15.4X
+SQL ORC Vectorized 543 / 546 29.0 34.5 66.6X
+SQL ORC Vectorized with copy 548 / 557 28.7 34.8 66.0X
+SQL ORC MR 2246 / 2258 7.0 142.8 16.1X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Parquet Reader Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-ParquetReader Vectorized 251 / 304 62.8 15.9 1.0X
-ParquetReader Vectorized -> Row 238 / 266 65.9 15.2 1.1X
+ParquetReader Vectorized 353 / 367 44.6 22.4 1.0X
+ParquetReader Vectorized -> Row 351 / 357 44.7 22.3 1.0X

 ================================================================================================
 Int and String Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 14097 / 14208 0.7 1344.4 1.0X
-SQL Json 8323 / 8365 1.3 793.8 1.7X
-SQL Parquet Vectorized 1808 / 1852 5.8 172.4 7.8X
-SQL Parquet MR 3401 / 3421 3.1 324.4 4.1X
-SQL ORC Vectorized 1872 / 1964 5.6 178.5 7.5X
-SQL ORC Vectorized with copy 1973 / 2022 5.3 188.2 7.1X
-SQL ORC MR 3066 / 3206 3.4 292.4 4.6X
+SQL CSV 21130 / 21246 0.5 2015.1 1.0X
+SQL Json 12145 / 12174 0.9 1158.2 1.7X
+SQL Parquet Vectorized 2363 / 2377 4.4 225.3 8.9X
+SQL Parquet MR 4555 / 4557 2.3 434.4 4.6X
+SQL ORC Vectorized 2361 / 2388 4.4 225.1 9.0X
+SQL ORC Vectorized with copy 2540 / 2557 4.1 242.2 8.3X
+SQL ORC MR 4186 / 4209 2.5 399.2 5.0X

 ================================================================================================
 Repeated String Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 8060 / 8354 1.3 768.6 1.0X
-SQL Json 4647 / 4664 2.3 443.2 1.7X
-SQL Parquet Vectorized 654 / 675 16.0 62.4 12.3X
-SQL Parquet MR 1324 / 1337 7.9 126.3 6.1X
-SQL ORC Vectorized 397 / 404 26.4 37.9 20.3X
-SQL ORC Vectorized with copy 560 / 572 18.7 53.4 14.4X
-SQL ORC MR 1437 / 1455 7.3 137.1 5.6X
+SQL CSV 11693 / 11729 0.9 1115.1 1.0X
+SQL Json 7025 / 7025 1.5 669.9 1.7X
+SQL Parquet Vectorized 803 / 821 13.1 76.6 14.6X
+SQL Parquet MR 1776 / 1790 5.9 169.4 6.6X
+SQL ORC Vectorized 491 / 494 21.4 46.8 23.8X
+SQL ORC Vectorized with copy 723 / 725 14.5 68.9 16.2X
+SQL ORC MR 2050 / 2063 5.1 195.5 5.7X

 ================================================================================================
 Partitioned Table Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Partitioned Table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Data column - CSV 18310 / 18506 0.9 1164.1 1.0X
-Data column - Json 8752 / 8762 1.8 556.4 2.1X
-Data column - Parquet Vectorized 208 / 218 75.6 13.2 88.0X
-Data column - Parquet MR 2384 / 2396 6.6 151.5 7.7X
-Data column - ORC Vectorized 282 / 294 55.8 17.9 65.0X
-Data column - ORC Vectorized with copy 291 / 300 54.0 18.5 62.9X
-Data column - ORC MR 1681 / 1692 9.4 106.9 10.9X
-Partition column - CSV 4502 / 4542 3.5 286.2 4.1X
-Partition column - Json 3404 / 3415 4.6 216.4 5.4X
-Partition column - Parquet Vectorized 70 / 76 225.7 4.4 262.7X
-Partition column - Parquet MR 1206 / 1211 13.0 76.7 15.2X
-Partition column - ORC Vectorized 70 / 77 225.7 4.4 262.7X
-Partition column - ORC Vectorized with copy 68 / 78 230.1 4.3 267.9X
-Partition column - ORC MR 1193 / 1195 13.2 75.8 15.4X
-Both columns - CSV 17883 / 18028 0.9 1137.0 1.0X
-Both columns - Json 9089 / 9100 1.7 577.8 2.0X
-Both columns - Parquet Vectorized 235 / 242 67.0 14.9 78.0X
-Both columns - Parquet MR 2583 / 2618 6.1 164.2 7.1X
-Both columns - ORC Vectorized 327 / 340 48.1 20.8 56.0X
-Both column - ORC Vectorized with copy 360 / 374 43.7 22.9 50.9X
-Both columns - ORC MR 1875 / 1899 8.4 119.2 9.8X
+Data column - CSV 30965 / 31041 0.5 1968.7 1.0X
+Data column - Json 12876 / 12882 1.2 818.6 2.4X
+Data column - Parquet Vectorized 277 / 282 56.7 17.6 111.6X
+Data column - Parquet MR 3398 / 3402 4.6 216.0 9.1X
+Data column - ORC Vectorized 399 / 407 39.4 25.4 77.5X
+Data column - ORC Vectorized with copy 407 / 447 38.6 25.9 76.0X
+Data column - ORC MR 2583 / 2589 6.1 164.2 12.0X
+Partition column - CSV 7403 / 7427 2.1 470.7 4.2X
+Partition column - Json 5587 / 5625 2.8 355.2 5.5X
+Partition column - Parquet Vectorized 71 / 78 222.6 4.5 438.3X
+Partition column - Parquet MR 1798 / 1808 8.7 114.3 17.2X
+Partition column - ORC Vectorized 72 / 75 219.0 4.6 431.2X
+Partition column - ORC Vectorized with copy 71 / 77 221.1 4.5 435.4X
+Partition column - ORC MR 1772 / 1778 8.9 112.6 17.5X
+Both columns - CSV 30211 / 30212 0.5 1920.7 1.0X
+Both columns - Json 13382 / 13391 1.2 850.8 2.3X
+Both columns - Parquet Vectorized 321 / 333 49.0 20.4 96.4X
+Both columns - Parquet MR 3656 / 3661 4.3 232.4 8.5X
+Both columns - ORC Vectorized 443 / 448 35.5 28.2 69.9X
+Both column - ORC Vectorized with copy 527 / 533 29.9 33.5 58.8X
+Both columns - ORC MR 2626 / 2633 6.0 167.0 11.8X

 ================================================================================================
 String with Nulls Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 String with Nulls Scan (0.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 9293 / 9393 1.1 886.2 1.0X
-SQL Json 6686 / 6702 1.6 637.7 1.4X
-SQL Parquet Vectorized 1102 / 1104 9.5 105.1 8.4X
-SQL Parquet MR 2954 / 3000 3.5 281.7 3.1X
-ParquetReader Vectorized 830 / 839 12.6 79.2 11.2X
-SQL ORC Vectorized 861 / 869 12.2 82.2 10.8X
-SQL ORC Vectorized with copy 1243 / 1283 8.4 118.5 7.5X
-SQL ORC MR 2752 / 2783 3.8 262.4 3.4X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 13918 / 13979 0.8 1327.3 1.0X
+SQL Json 10068 / 10068 1.0 960.1 1.4X
+SQL Parquet Vectorized 1563 / 1564 6.7 149.0 8.9X
+SQL Parquet MR 3835 / 3836 2.7 365.8 3.6X
+ParquetReader Vectorized 1115 / 1118 9.4 106.4 12.5X
+SQL ORC Vectorized 1172 / 1208 8.9 111.8 11.9X
+SQL ORC Vectorized with copy 1630 / 1644 6.4 155.5 8.5X
+SQL ORC MR 3708 / 3711 2.8 353.6 3.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 String with Nulls Scan (50.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 10460 / 10566 1.0 997.5 1.0X
-SQL Json 5042 / 5082 2.1 480.8 2.1X
-SQL Parquet Vectorized 809 / 820 13.0 77.2 12.9X
-SQL Parquet MR 2083 / 2093 5.0 198.7 5.0X
-ParquetReader Vectorized 723 / 745 14.5 68.9 14.5X
-SQL ORC Vectorized 1018 / 1037 10.3 97.1 10.3X
-SQL ORC Vectorized with copy 1301 / 1308 8.1 124.0 8.0X
-SQL ORC MR 2542 / 2579 4.1 242.5 4.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 13972 / 14043 0.8 1332.5 1.0X
+SQL Json 7436 / 7469 1.4 709.1 1.9X
+SQL Parquet Vectorized 1103 / 1112 9.5 105.2 12.7X
+SQL Parquet MR 2841 / 2847 3.7 271.0 4.9X
+ParquetReader Vectorized 992 / 1012 10.6 94.6 14.1X
+SQL ORC Vectorized 1275 / 1349 8.2 121.6 11.0X
+SQL ORC Vectorized with copy 1631 / 1644 6.4 155.5 8.6X
+SQL ORC MR 3244 / 3259 3.2 309.3 4.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 String with Nulls Scan (95.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 8574 / 8631 1.2 817.6 1.0X
-SQL Json 3098 / 3120 3.4 295.5 2.8X
-SQL Parquet Vectorized 185 / 190 56.7 17.6 46.3X
-SQL Parquet MR 1263 / 1286 8.3 120.4 6.8X
-ParquetReader Vectorized 167 / 173 62.8 15.9 51.3X
-SQL ORC Vectorized 333 / 336 31.5 31.7 25.8X
-SQL ORC Vectorized with copy 410 / 416 25.6 39.1 20.9X
-SQL ORC MR 1215 / 1222 8.6 115.9 7.1X
+SQL CSV 11228 / 11244 0.9 1070.8 1.0X
+SQL Json 5200 / 5247 2.0 495.9 2.2X
+SQL Parquet Vectorized 238 / 242 44.1 22.7 47.2X
+SQL Parquet MR 1730 / 1734 6.1 165.0 6.5X
+ParquetReader Vectorized 237 / 238 44.3 22.6 47.4X
+SQL ORC Vectorized 459 / 462 22.8 43.8 24.4X
+SQL ORC Vectorized with copy 581 / 583 18.1 55.4 19.3X
+SQL ORC MR 1767 / 1783 5.9 168.5 6.4X

 ================================================================================================
 Single Column Scan From Wide Columns
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Single Column Scan from 10 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 2031 / 2053 0.5 1936.5 1.0X
-SQL Json 1737 / 1740 0.6 1656.4 1.2X
-SQL Parquet Vectorized 45 / 54 23.4 42.7 45.3X
-SQL Parquet MR 166 / 174 6.3 158.1 12.2X
-SQL ORC Vectorized 56 / 59 18.7 53.6 36.1X
-SQL ORC Vectorized with copy 54 / 68 19.4 51.7 37.5X
-SQL ORC MR 239 / 252 4.4 228.3 8.5X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 3322 / 3356 0.3 3167.9 1.0X
+SQL Json 2808 / 2843 0.4 2678.2 1.2X
+SQL Parquet Vectorized 56 / 63 18.9 52.9 59.8X
+SQL Parquet MR 215 / 219 4.9 205.4 15.4X
+SQL ORC Vectorized 64 / 76 16.4 60.9 52.0X
+SQL ORC Vectorized with copy 64 / 67 16.3 61.3 51.7X
+SQL ORC MR 314 / 316 3.3 299.6 10.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Single Column Scan from 50 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 5169 / 5238 0.2 4929.3 1.0X
-SQL Json 6279 / 6494 0.2 5987.8 0.8X
-SQL Parquet Vectorized 102 / 110 10.2 97.6 50.5X
-SQL Parquet MR 202 / 218 5.2 192.7 25.6X
-SQL ORC Vectorized 113 / 123 9.3 107.3 45.9X
-SQL ORC Vectorized with copy 119 / 127 8.8 113.1 43.6X
-SQL ORC MR 750 / 772 1.4 715.5 6.9X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+SQL CSV 7978 / 7989 0.1 7608.5 1.0X
+SQL Json 10294 / 10325 0.1 9816.9 0.8X
+SQL Parquet Vectorized 72 / 85 14.5 69.0 110.3X
+SQL Parquet MR 237 / 241 4.4 226.4 33.6X
+SQL ORC Vectorized 82 / 92 12.7 78.5 97.0X
+SQL ORC Vectorized with copy 82 / 88 12.7 78.5 97.0X
+SQL ORC MR 900 / 909 1.2 858.5 8.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-SQL CSV 8212 / 8480 0.1 7831.4 1.0X
-SQL Json 11582 / 11690 0.1 11045.3 0.7X
-SQL Parquet Vectorized 171 / 182 6.1 163.0 48.0X
-SQL Parquet MR 258 / 265 4.1 245.8 31.9X
-SQL ORC Vectorized 178 / 191 5.9 169.8 46.1X
-SQL ORC Vectorized with copy 181 / 187 5.8 172.6 45.4X
-SQL ORC MR 1342 / 1350 0.8 1279.9 6.1X
+SQL CSV 13489 / 13508 0.1 12864.3 1.0X
+SQL Json 18813 / 18827 0.1 17941.4 0.7X
+SQL Parquet Vectorized 107 / 111 9.8 101.8 126.3X
+SQL Parquet MR 275 / 286 3.8 262.3 49.0X
+SQL ORC Vectorized 107 / 115 9.8 101.7 126.4X
+SQL ORC Vectorized with copy 107 / 115 9.8 102.3 125.8X
+SQL ORC MR 1659 / 1664 0.6 1582.3 8.1X
diff --git a/sql/hive/benchmarks/OrcReadBenchmark-results.txt b/sql/hive/benchmarks/OrcReadBenchmark-results.txt
index 7aa96efd73b58..80c2f5e93405a 100644
--- a/sql/hive/benchmarks/OrcReadBenchmark-results.txt
+++ b/sql/hive/benchmarks/OrcReadBenchmark-results.txt
@@ -2,172 +2,172 @@
 SQL Single Numeric Column Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single TINYINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1135 / 1192 13.9 72.2 1.0X
-Native ORC Vectorized 159 / 191 99.1 10.1 7.2X
-Native ORC Vectorized with copy 132 / 140 119.4 8.4 8.6X
-Hive built-in ORC 1344 / 1348 11.7 85.4 0.8X
+Native ORC MR 1725 / 1759 9.1 109.7 1.0X
+Native ORC Vectorized 272 / 316 57.8 17.3 6.3X
+Native ORC Vectorized with copy 239 / 254 65.7 15.2 7.2X
+Hive built-in ORC 1970 / 1987 8.0 125.3 0.9X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single SMALLINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1149 / 1208 13.7 73.0 1.0X
-Native ORC Vectorized 191 / 203 82.5 12.1 6.0X
-Native ORC Vectorized with copy 190 / 206 83.0 12.1 6.1X
-Hive built-in ORC 1572 / 1615 10.0 100.0 0.7X
+Native ORC MR 1633 / 1672 9.6 103.8 1.0X
+Native ORC Vectorized 238 / 255 66.0 15.1 6.9X
+Native ORC Vectorized with copy 235 / 253 66.8 15.0 6.9X
+Hive built-in ORC 2293 / 2305 6.9 145.8 0.7X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single INT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1397 / 1416 11.3 88.8 1.0X
-Native ORC Vectorized 238 / 245 66.0 15.2 5.9X
-Native ORC Vectorized with copy 241 / 254 65.3 15.3 5.8X
-Hive built-in ORC 1843 / 1915 8.5 117.2 0.8X
+Native ORC MR 1677 / 1699 9.4 106.6 1.0X
+Native ORC Vectorized 325 / 342 48.3 20.7 5.2X
+Native ORC Vectorized with copy 328 / 341 47.9 20.9 5.1X
+Hive built-in ORC 2561 / 2569 6.1 162.8 0.7X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single BIGINT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1350 / 1383 11.7 85.8 1.0X
-Native ORC Vectorized 300 / 305 52.4 19.1 4.5X
-Native ORC Vectorized with copy 318 / 334 49.5 20.2 4.2X
-Hive built-in ORC 1887 / 1916 8.3 120.0 0.7X
+Native ORC MR 1791 / 1795 8.8 113.9 1.0X
+Native ORC Vectorized 400 / 408 39.3 25.4 4.5X
+Native ORC Vectorized with copy 410 / 417 38.4 26.1 4.4X
+Hive built-in ORC 2713 / 2720 5.8 172.5 0.7X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single FLOAT Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1382 / 1419 11.4 87.9 1.0X
-Native ORC Vectorized 351 / 366 44.8 22.3 3.9X
-Native ORC Vectorized with copy 361 / 368 43.6 22.9 3.8X
-Hive built-in ORC 1898 / 1950 8.3 120.7 0.7X
+Native ORC MR 1791 / 1805 8.8 113.8 1.0X
+Native ORC Vectorized 433 / 438 36.3 27.5 4.1X
+Native ORC Vectorized with copy 441 / 447 35.7 28.0 4.1X
+Hive built-in ORC 2690 / 2803 5.8 171.0 0.7X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 SQL Single DOUBLE Column Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1510 / 1804 10.4 96.0 1.0X
-Native ORC Vectorized 467 / 484 33.7 29.7 3.2X
-Native ORC Vectorized with copy 465 / 490 33.8 29.6 3.2X
-Hive built-in ORC 2075 / 2111 7.6 131.9 0.7X
+Native ORC MR 1911 / 1930 8.2 121.5 1.0X
+Native ORC Vectorized 543 / 552 29.0 34.5 3.5X
+Native ORC Vectorized with copy 547 / 555 28.8 34.8 3.5X
+Hive built-in ORC 2967 / 3065 5.3 188.6 0.6X

 ================================================================================================
 Int and String Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Int and String Scan: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 3596 / 3680 2.9 343.0 1.0X
-Native ORC Vectorized 2136 / 2397 4.9 203.7 1.7X
-Native ORC Vectorized with copy 2388 / 2422 4.4 227.8 1.5X
-Hive built-in ORC 4304 / 4336 2.4 410.5 0.8X
+Native ORC MR 4160 / 4188 2.5 396.7 1.0X
+Native ORC Vectorized 2405 / 2406 4.4 229.4 1.7X
+Native ORC Vectorized with copy 2588 / 2592 4.1 246.8 1.6X
+Hive built-in ORC 5514 / 5562 1.9 525.9 0.8X

 ================================================================================================
 Partitioned Table Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Partitioned Table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Data column - Native ORC MR 1473 / 1483 10.7 93.6 1.0X
-Data column - Native ORC Vectorized 309 / 325 50.9 19.6 4.8X
-Data column - Native ORC Vectorized with copy 322 / 368 48.9 20.5 4.6X
-Data column - Hive built-in ORC 2061 / 2084 7.6 131.0 0.7X
-Partition column - Native ORC MR 1000 / 1018 15.7 63.6 1.5X
-Partition column - Native ORC Vectorized 81 / 88 193.2 5.2 18.1X
-Partition column - Native ORC Vectorized with copy 80 / 86 196.5 5.1 18.4X
-Partition column - Hive built-in ORC 1212 / 1230 13.0 77.1 1.2X
-Both columns - Native ORC MR 1496 / 1528 10.5 95.1 1.0X
-Both columns - Native ORC Vectorized 359 / 378 43.8 22.8 4.1X
-Both column - Native ORC Vectorized with copy 412 / 442 38.2 26.2 3.6X
-Both columns - Hive built-in ORC 2220 / 2224 7.1 141.1 0.7X
+Data column - Native ORC MR 1863 / 1867 8.4 118.4 1.0X
+Data column - Native ORC Vectorized 411 / 418 38.2 26.2 4.5X
+Data column - Native ORC Vectorized with copy 417 / 422 37.8 26.5 4.5X
+Data column - Hive built-in ORC 3297 / 3308 4.8 209.6 0.6X
+Partition column - Native ORC MR 1505 / 1506 10.4 95.7 1.2X
+Partition column - Native ORC Vectorized 80 / 93 195.6 5.1 23.2X
+Partition column - Native ORC Vectorized with copy 78 / 86 201.4 5.0 23.9X
+Partition column - Hive built-in ORC 1960 / 1979 8.0 124.6 1.0X
+Both columns - Native ORC MR 2076 / 2090 7.6 132.0 0.9X
+Both columns - Native ORC Vectorized 450 / 463 34.9 28.6 4.1X
+Both column - Native ORC Vectorized with copy 532 / 538 29.6 33.8 3.5X
+Both columns - Hive built-in ORC 3528 / 3548 4.5 224.3 0.5X

 ================================================================================================
 Repeated String Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Repeated String: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1311 / 1319 8.0 125.0 1.0X
-Native ORC Vectorized 286 / 375 36.7 27.3 4.6X
-Native ORC Vectorized with copy 445 / 456 23.6 42.4 2.9X
-Hive built-in ORC 1935 / 1968 5.4 184.6 0.7X
+Native ORC MR 1727 / 1733 6.1 164.7 1.0X
+Native ORC Vectorized 375 / 379 28.0 35.7 4.6X
+Native ORC Vectorized with copy 552 / 556 19.0 52.6 3.1X
+Hive built-in ORC 2665 / 2666 3.9 254.2 0.6X

 ================================================================================================
 String with Nulls Scan
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 String with Nulls Scan (0.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 2599 / 2701 4.0 247.8 1.0X
-Native ORC Vectorized 818 / 846 12.8 78.0 3.2X
-Native ORC Vectorized with copy 1084 / 1149 9.7 103.4 2.4X
-Hive built-in ORC 3807 / 3885 2.8 363.1 0.7X
+Native ORC MR 3324 / 3325 3.2 317.0 1.0X
+Native ORC Vectorized 1085 / 1106 9.7 103.4 3.1X
+Native ORC Vectorized with copy 1463 / 1471 7.2 139.5 2.3X
+Hive built-in ORC 5272 / 5299 2.0 502.8 0.6X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 String with Nulls Scan (50.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 2430 / 2808 4.3 231.7 1.0X
-Native ORC Vectorized 2016 / 2508 5.2 192.3 1.2X
-Native ORC Vectorized with copy 1268 / 1272 8.3 121.0 1.9X
-Hive built-in ORC 3016 / 3030 3.5 287.7 0.8X
+Native ORC MR 3045 / 3046 3.4 290.4 1.0X
+Native ORC Vectorized 1248 / 1260 8.4 119.0 2.4X
+Native ORC Vectorized with copy 1609 / 1624 6.5 153.5 1.9X
+Hive built-in ORC 3989 / 3999 2.6 380.4 0.8X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 String with Nulls Scan (95.0%): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1216 / 1228 8.6 116.0 1.0X
-Native ORC Vectorized 361 / 368 29.1 34.4 3.4X
-Native ORC Vectorized with copy 445 / 459 23.6 42.4 2.7X
-Hive built-in ORC 1554 / 1574 6.7 148.2 0.8X
+Native ORC MR 1692 / 1694 6.2 161.3 1.0X
+Native ORC Vectorized 471 / 493 22.3 44.9 3.6X
+Native ORC Vectorized with copy 588 / 590 17.8 56.1 2.9X
+Hive built-in ORC 2398 / 2411 4.4 228.7 0.7X

 ================================================================================================
 Single Column Scan From Wide Columns
 ================================================================================================

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Single Column Scan from 100 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 1098 / 1100 1.0 1047.5 1.0X
-Native ORC Vectorized 197 / 212 5.3 187.6 5.6X
-Native ORC Vectorized with copy 188 / 200 5.6 178.9 5.9X
-Hive built-in ORC 409 / 417 2.6 390.4 2.7X
+Native ORC MR 1371 / 1379 0.8 1307.5 1.0X
+Native ORC Vectorized 121 / 135 8.6 115.8 11.3X
+Native ORC Vectorized with copy 122 / 138 8.6 116.2 11.3X
+Hive built-in ORC 521 / 561 2.0 497.1 2.6X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Single Column Scan from 200 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 2251 / 2722 0.5 2147.0 1.0X
-Native ORC Vectorized 343 / 351 3.1 326.7 6.6X
-Native ORC Vectorized with copy 350 / 369 3.0 334.2 6.4X
-Hive built-in ORC 632 / 714 1.7 602.7 3.6X
+Native ORC MR 2711 / 2767 0.4 2585.5 1.0X
+Native ORC Vectorized 210 / 232 5.0 200.5 12.9X
+Native ORC Vectorized with copy 208 / 219 5.0 198.4 13.0X
+Hive built-in ORC 764 / 775 1.4 728.3 3.5X

-Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Mac OS X 10.13.6
-Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Single Column Scan from 300 columns: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------
-Native ORC MR 3643 / 3936 0.3 3474.4 1.0X
-Native ORC Vectorized 550 / 572 1.9 524.1 6.6X
-Native ORC Vectorized with copy 536 / 547 2.0 511.6 6.8X
-Hive built-in ORC 950 / 1003 1.1 906.5 3.8X
+Native ORC MR 3979 / 3988 0.3 3794.4 1.0X
+Native ORC Vectorized 357 / 366 2.9 340.2 11.2X
+Native ORC Vectorized with copy 361 / 371 2.9 344.5 11.0X
+Hive built-in ORC 1091 / 1095 1.0 1040.5 3.6X