ARROW-6549: [C++] Switch to jemalloc 5.2.x #5365

Closed
wants to merge 1 commit into from

pitrou
Member

@pitrou pitrou commented Sep 12, 2019

Revert "ARROW-6478: [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues"

This reverts commit 53c5af0.

In addition, configure jemalloc to fix the performance regression.

@pitrou
Member Author

pitrou commented Sep 12, 2019

@ursabot benchmark

@pitrou
Member Author

pitrou commented Sep 12, 2019

Here is a macro-benchmark of reading a 186 MB CSV file using Python (pa.csv.read_csv):

  • on git master: 185 MB/s single-thread, ~1.2 GB/s multi-thread
  • on this PR: 185 MB/s single-thread, ~1.5 GB/s multi-thread

So it seems that jemalloc 5.2 may bring a significant improvement in multi-threaded allocation performance. @xhochy @wesm
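
For anyone who wants to repeat this kind of measurement, here is a rough sketch of how such throughput numbers could be obtained. It is not the script used for the figures above: the helper names `throughput_mb_per_s` and `bench_read_csv` are made up for illustration, 1 MB is assumed to mean 10^6 bytes, and the best of a few runs is reported. The only pyarrow calls used are `pyarrow.csv.read_csv` and `pyarrow.csv.ReadOptions(use_threads=...)`.

```python
import os
import time


def throughput_mb_per_s(read_fn, n_bytes, repeats=3):
    """Return the best-of-`repeats` throughput of read_fn() in MB/s (1 MB = 1e6 bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        best = min(best, time.perf_counter() - start)
    return n_bytes / best / 1e6


def bench_read_csv(path):
    """Time pyarrow's CSV reader on `path`, single-threaded and multi-threaded."""
    # Imported here so the timing helper above stays usable without pyarrow.
    import pyarrow.csv as pa_csv

    size = os.path.getsize(path)
    for use_threads in (False, True):
        opts = pa_csv.ReadOptions(use_threads=use_threads)
        rate = throughput_mb_per_s(
            lambda: pa_csv.read_csv(path, read_options=opts), size
        )
        label = "multi-thread" if use_threads else "single-thread"
        print(f"{label}: {rate:.0f} MB/s")
```

Calling `bench_read_csv` on a local CSV file with a build from git master and then with a build from this PR should give comparable pairs of numbers, provided the file is already in the OS page cache for both runs.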

@pitrou
Member Author

pitrou commented Sep 12, 2019

Results for arrow-io-memory-benchmark:

  • on git master:
BufferOutputStreamTinyWrites/real_time    31270771 ns     31251671 ns           90 bytes_per_second=1023.32M/s
BufferOutputStreamSmallWrites/real_time    5746235 ns      5740631 ns          483 bytes_per_second=5.43824G/s
BufferOutputStreamLargeWrites/real_time    2440732 ns      2439994 ns         1158 bytes_per_second=12.592G/s
  • on this PR:
BufferOutputStreamTinyWrites/real_time    32343651 ns     32322905 ns           87 bytes_per_second=989.375M/s
BufferOutputStreamSmallWrites/real_time    5669671 ns      5664962 ns          443 bytes_per_second=5.51168G/s
BufferOutputStreamLargeWrites/real_time    2727681 ns      2725712 ns         1034 bytes_per_second=11.2673G/s

(on a Ryzen 7 1700, Ubuntu 18.04, gcc 7.4.0)

@pitrou
Member Author

pitrou commented Sep 12, 2019

Results for arrow-builder-benchmark:

  • on git master:
BufferBuilderTinyWrites/real_time    282421340 ns    282260043 ns           10 bytes_per_second=906.447M/s
BufferBuilderSmallWrites/real_time    49293257 ns     49251628 ns           57 bytes_per_second=5.07168G/s
BufferBuilderLargeWrites/real_time    23227831 ns     23207431 ns          120 bytes_per_second=10.7054G/s
BuildBooleanArrayNoNulls              41054882 ns     41012295 ns           68 bytes_per_second=6.09573G/s
BuildIntArrayNoNulls                  39323008 ns     39289765 ns           71 bytes_per_second=6.36298G/s
BuildAdaptiveIntNoNulls               22392261 ns     22373033 ns          125 bytes_per_second=11.1742G/s
BuildAdaptiveIntNoNullsScalarAppend  248611808 ns    248532755 ns           11 bytes_per_second=1030.05M/s
BuildBinaryArray                     450032496 ns    449743091 ns            6 bytes_per_second=569.214M/s
BuildChunkedBinaryArray              479797052 ns    479506171 ns            6 bytes_per_second=533.883M/s
BuildFixedSizeBinaryArray            359771413 ns    359588868 ns            8 bytes_per_second=711.924M/s
BuildDecimalArray                    595007714 ns    594692229 ns            5 bytes_per_second=860.95M/s
BuildInt64DictionaryArrayRandom      548333764 ns    548164626 ns            5 bytes_per_second=467.013M/s
BuildInt64DictionaryArraySequential  535873799 ns    535700857 ns            5 bytes_per_second=477.879M/s
BuildInt64DictionaryArraySimilar     783811124 ns    783587918 ns            4 bytes_per_second=326.702M/s
BuildStringDictionaryArray          1502769540 ns   1502217577 ns            2 bytes_per_second=227.354M/s
  • on this PR:
BufferBuilderTinyWrites/real_time    284150694 ns    283987717 ns           10 bytes_per_second=900.93M/s
BufferBuilderSmallWrites/real_time    69486922 ns     69436686 ns           40 bytes_per_second=3.5978G/s
BufferBuilderLargeWrites/real_time    34238647 ns     34216762 ns           67 bytes_per_second=7.26264G/s
BuildBooleanArrayNoNulls              36903464 ns     36868775 ns           79 bytes_per_second=6.78081G/s
BuildIntArrayNoNulls                  33506687 ns     33472413 ns          100 bytes_per_second=7.46884G/s
BuildAdaptiveIntNoNulls               23188601 ns     23137237 ns          124 bytes_per_second=10.8051G/s
BuildAdaptiveIntNoNullsScalarAppend  246078319 ns    246003241 ns           11 bytes_per_second=1040.64M/s
BuildBinaryArray                     451653119 ns    450531798 ns            7 bytes_per_second=568.217M/s
BuildChunkedBinaryArray              469191614 ns    468881331 ns            5 bytes_per_second=545.98M/s
BuildFixedSizeBinaryArray            356145125 ns    355958260 ns            6 bytes_per_second=719.185M/s
BuildDecimalArray                    571190581 ns    570872225 ns            4 bytes_per_second=896.873M/s
BuildInt64DictionaryArrayRandom      551707149 ns    551543161 ns            5 bytes_per_second=464.152M/s
BuildInt64DictionaryArraySequential  537975942 ns    537820247 ns            5 bytes_per_second=475.995M/s
BuildInt64DictionaryArraySimilar     796073660 ns    795415522 ns            4 bytes_per_second=321.844M/s
BuildStringDictionaryArray          1512956424 ns   1512414952 ns            2 bytes_per_second=225.821M/s

@ursabot

ursabot commented Sep 12, 2019

AMD64 Ubuntu 18.04 C++ Benchmark (#61467) builder has been succeeded.

Revision: e845e32

  ======================================================  ================  ================  ============
  benchmark                                                       baseline         contender        change
  ======================================================  ================  ================  ============
  VisitBits/8192                                               1.07781e+08       1.07379e+08  -0.00372867
  BitmapWriter/8192                                            7.15675e+07       7.14366e+07  -0.00182912
  FirstTimeBitmapWriter/8192                                   1.03104e+08       1.0292e+08   -0.00178664
  CopyBitmapWithOffset/8192                                    5.98019e+08       5.96996e+08  -0.00171053
  BitmapReader/8192                                            1.10054e+08       1.09855e+08  -0.00181152
  GenerateBits/8192                                            9.29735e+07       9.26924e+07  -0.00302338
  GenerateBitsUnrolled/8192                                    1.39046e+08       1.414e+08     0.0169262
  CopyBitmapWithoutOffset/8192                                 6.6857e+10        6.67039e+10  -0.00229046
  VisitBitsUnrolled/8192                                       2.99562e+08       2.99039e+08  -0.00174737
  TypeEqualsWithMetadata                                       4.21107e+07       4.25519e+07   0.0104772
  SchemaEqualsWithMetadata                                     3.43963e+07       3.43837e+07  -0.000367477
  SchemaEquals                                                 3.88801e+07       3.88826e+07   6.51288e-05
  TypeEqualsComplex                                            4.8829e+07        4.87996e+07  -0.000601411
  TypeEqualsSimple                                             7.30582e+07       7.29571e+07  -0.00138386
  ParallelMemoryCopy/threads:32/real_time                      2.37693e+10       2.3736e+10   -0.0014002
  ParallelMemoryCopy/threads:1/real_time                       7.54748e+09       7.52592e+09  -0.0028566
  ParallelMemoryCopy/threads:40/real_time                      2.33125e+10       2.3135e+10   -0.00761385
  BufferOutputStreamSmallWrites/real_time                      1.17461e+10       1.33316e+10   0.13498
  ParallelMemoryCopy/threads:8/real_time                       2.50697e+10       2.50092e+10  -0.00241529
  BufferOutputStreamTinyWrites/real_time                       4.22158e+08       4.21612e+08  -0.00129325
- ParallelMemoryCopy/threads:4/real_time                       2.4153e+10        2.20447e+10  -0.0872892
  BufferOutputStreamLargeWrites/real_time                      1.32615e+10       1.31972e+10  -0.00484125
  ParallelMemoryCopy/threads:2/real_time                       1.04534e+10       1.08204e+10   0.0351113
  ParallelMemoryCopy/threads:16/real_time                      2.42289e+10       2.43053e+10   0.00315518
  UniqueInt64WithNulls/4194304/10240                           1.30154e+09       1.3505e+09    0.0376156
  BuildStringDictionary                                        5.22615e+07       5.35353e+07   0.0243742
  UniqueInt64WithNulls/4194304/1024                            2.02566e+09       2.06992e+09   0.0218493
  UniqueString100bytes/4194304/1024                            2.10611e+09       2.11604e+09   0.0047149
  UniqueString10bytes/4194304/1024                             5.30424e+08       5.31164e+08   0.0013947
  UniqueUInt8NoNulls/4194304/200                               1.23275e+09       1.23554e+09   0.00226235
  UniqueInt64NoNulls/4194304/10240                             1.68838e+09       1.75714e+09   0.0407267
  UniqueUInt8WithNulls/4194304/200                             4.73741e+08       4.74357e+08   0.00130097
  BuildDictionary                                              8.67283e+08       8.68186e+08   0.00104068
  UniqueInt64NoNulls/4194304/1024                              3.04604e+09       2.93618e+09  -0.0360649
  UniqueString100bytes/4194304/10240                           1.19991e+09       1.24525e+09   0.0377842
  UniqueString10bytes/4194304/10240                            2.99307e+08       3.07488e+08   0.0273317
  BufferedOutputStreamLargeWritesToPipe/real_time              2.37666e+09       2.38009e+09   0.00144467
  BufferedOutputStreamSmallWritesToNull/real_time              1.13139e+09       1.13283e+09   0.00127495
  FileOutputStreamSmallWritesToNull/real_time                  6.33922e+07       6.32965e+07  -0.00151039
  BufferedOutputStreamSmallWritesToPipe/real_time              7.52551e+08       7.42311e+08  -0.013607
  FileOutputStreamSmallWritesToPipe/real_time                  3.72528e+07       3.73065e+07   0.00144254
  FileOutputStreamLargeWritesToPipe/real_time                  2.39946e+09       2.34043e+09  -0.0245978
  TakeInt64/1048576/1/min_time:1.000                           4.74425e+08       4.90142e+08   0.0331295
  TakeInt64/32768/1/min_time:1.000                             6.04803e+08       6.06433e+08   0.00269395
  TakeInt64VsFilter/32768/1/min_time:1.000                     2.2238e+09        2.22454e+09   0.000332469
  TakeString/32768/0/min_time:1.000                            1.88563e+09       1.92809e+09   0.022514
  TakeInt64VsFilter/1048576/1/min_time:1.000                   2.36772e+09       2.37462e+09   0.0029175
  TakeString/8388608/1/min_time:1.000                          1.14393e+09       1.17467e+09   0.0268774
  TakeFixedSizeList1Int64/32768/0/min_time:1.000               1.72796e+08       1.73326e+08   0.00306883
  TakeFixedSizeList1Int64/32768/1/min_time:1.000               1.65906e+08       1.6658e+08    0.00406374
  TakeString/32768/10/min_time:1.000                           1.72581e+09       1.75257e+09   0.0155097
  TakeInt64/32768/50/min_time:1.000                            3.76121e+08       3.6981e+08   -0.0167782
  TakeInt64VsFilter/32768/10/min_time:1.000                    1.4778e+09        1.48427e+09   0.00437996
  TakeInt64VsFilter/8388608/1/min_time:1.000                   2.36672e+09       2.37386e+09   0.00301372
  TakeString/1048576/1/min_time:1.000                          1.34072e+09       1.39017e+09   0.0368845
  TakeFixedSizeList1Int64/32768/10/min_time:1.000              1.63546e+08       1.64102e+08   0.00340082
  TakeInt64/8388608/1/min_time:1.000                           4.18828e+08       4.32568e+08   0.0328068
  TakeInt64VsFilter/32768/0/min_time:1.000                     2.43805e+09       2.44576e+09   0.00315919
  TakeFixedSizeList1Int64/1048576/1/min_time:1.000             1.26298e+08       1.28979e+08   0.0212284
  TakeInt64/32768/10/min_time:1.000                            5.19794e+08       5.20704e+08   0.00175072
  TakeString/32768/50/min_time:1.000                           1.12374e+09       1.14455e+09   0.018524
  TakeString/32768/1/min_time:1.000                            1.62253e+09       1.63965e+09   0.0105544
  TakeFixedSizeList1Int64/32768/50/min_time:1.000              1.48295e+08       1.48739e+08   0.00299176
  TakeInt64VsFilter/32768/50/min_time:1.000                    8.16256e+08       8.1189e+08   -0.00534934
  TakeFixedSizeList1Int64/8388608/1/min_time:1.000             1.17624e+08       1.20053e+08   0.0206492
  TakeInt64/32768/0/min_time:1.000                             6.29186e+08       6.30629e+08   0.0022947
  TimestampParsing<TimeUnit::SECOND>                           5.57023e+07       5.57807e+07   0.00140837
  FloatParsing<FloatType>                                      9.0668e+06        9.07858e+06   0.00129912
  IntegerParsing<UInt16Type>                                   3.04247e+08       3.05397e+08   0.00378227
  TimestampParsing<TimeUnit::NANO>                             5.37585e+07       5.38465e+07   0.00163671
  IntegerParsing<Int16Type>                                    2.36003e+08       2.36221e+08   0.000924136
  IntegerParsing<UInt32Type>                                   3.19756e+08       3.2081e+08    0.00329493
  IntegerParsing<Int8Type>                                     2.46409e+08       2.46764e+08   0.00144044
  IntegerParsing<Int32Type>                                    1.83964e+08       1.84267e+08   0.00164543
  IntegerParsing<UInt8Type>                                    4.46707e+08       4.47722e+08   0.00227308
  TimestampParsing<TimeUnit::MILLI>                            5.42119e+07       5.42678e+07   0.00102983
  FloatParsing<DoubleType>                                     2.00043e+07       2.00435e+07   0.00195585
  IntegerParsing<UInt64Type>                                   2.34148e+08       2.34361e+08   0.000911207
  IntegerParsing<Int64Type>                                    1.4559e+08        1.4606e+08    0.00322494
  TimestampParsing<TimeUnit::MICRO>                            5.34906e+07       5.35633e+07   0.0013585
  TrieLookupFound                                              1.14618e+08       1.16171e+08   0.0135445
- TrieLookupNotFound                                           2.83641e+08       2.28253e+08  -0.195275
  HashIntegers                                                 8.81392e+09       8.81386e+09  -7.27231e-06
  HashLargeStrings                                             1.20018e+10       1.18807e+10  -0.0100904
  HashSmallStrings                                             2.48089e+09       2.47653e+09  -0.00175449
  HashMediumStrings                                            6.35033e+09       6.34297e+09  -0.00115959
  ValidateLargeAscii                                           1.8748e+10        1.87727e+10   0.00132052
  ValidateSmallAlmostAscii                                     3.15405e+09       3.15462e+09   0.000179343
  ValidateLargeNonAscii                                        1.5895e+09        1.58973e+09   0.000145659
- ValidateSmallNonAscii                                        1.49769e+09       1.3436e+09   -0.102882
  ValidateTinyNonAscii                                         1.32174e+09       1.32166e+09  -6.04663e-05
  ValidateSmallAscii                                           8.20921e+09       8.20961e+09   4.89301e-05
  ValidateLargeAlmostAscii                                     3.1981e+09        3.19835e+09   8.09469e-05
  ValidateTinyAscii                                            3.85215e+09       3.84377e+09  -0.0021753
  CompareArrayArrayKernel/32768/10                             2.05349e+10       2.0379e+10   -0.00758768
- CompareArrayScalarKernel/32768/10                            1.26337e+10       1.18194e+10  -0.0644504
  CompareArrayScalarKernel/32768/50                            1.05602e+10       1.1589e+10    0.0974232
  CompareArrayArrayKernel/32768/1                              2.04325e+10       2.05987e+10   0.00813217
  CompareArrayArrayKernel/32768/50                             2.05181e+10       2.05285e+10   0.000506319
  CompareArrayScalarKernel/32768/0                             1.18268e+10       1.28373e+10   0.0854427
  CompareArrayScalarKernel/32768/1                             1.132e+10         1.21216e+10   0.0708079
  CompareArrayArrayKernel/32768/0                              2.20585e+10       2.20663e+10   0.000354719
  FloatConversion                                              2.48589e+07       2.48902e+07   0.00125976
  StringConversion                                             6.57084e+07       6.5637e+07   -0.00108536
  Decimal128Conversion                                         1.14456e+07       1.1444e+07   -0.000136588
  Int64Conversion                                              5.71114e+07       5.71772e+07   0.00115285
  SumKernel/32768/0                                            1.84054e+10       1.8265e+10   -0.00762826
  SumKernel/32768/10                                           1.32769e+10       1.32482e+10  -0.00215986
  SumKernel/32768/1                                            1.53767e+10       1.52817e+10  -0.00617827
  SumKernel/32768/50                                           1.15035e+10       1.14894e+10  -0.00122603
  BinaryBitOp                                                  1.05461e+08       1.04935e+08  -0.00498816
  BinaryMathOpAggregate                                        1.0577e+07        1.06907e+07   0.0107505
  Constants                                                    4.07316e+07       4.05163e+07  -0.00528619
  FromString                                                   1.36385e+07       1.36482e+07   0.000713888
  BinaryCompareOp                                              7.18793e+07       7.2261e+07    0.00531018
  UnaryOp                                                      8.81945e+07       8.80089e+07  -0.00210446
  BinaryMathOp                                                 2.78412e+07       2.77963e+07  -0.00161293
  BinaryCompareOpConstant                                      6.49211e+07       6.21469e+07  -0.0427325
  SortToIndicesInt64/1048576/1/min_time:1.000                  6.87089e+07       6.85906e+07  -0.00172265
  SortToIndicesInt64/32768/50/min_time:1.000                   1.61129e+08       1.61192e+08   0.000392657
  SortToIndicesInt64/32768/10/min_time:1.000                   9.3528e+07        9.34592e+07  -0.00073606
  SortToIndicesInt64/8388608/1/min_time:1.000                  5.99098e+07       5.98487e+07  -0.00101989
  SortToIndicesInt64/32768/1/min_time:1.000                    8.66826e+07       8.71045e+07   0.00486761
  SortToIndicesInt64/32768/0/min_time:1.000                    8.72577e+07       8.72488e+07  -0.000102011
  DetectUIntWidthNoNulls                                       2.35736e+10       2.35112e+10  -0.00264441
  DetectIntWidthNoNulls                                        2.04091e+10       2.03832e+10  -0.00126539
  DetectIntWidthNulls                                          1.07373e+10       1.07388e+10   0.000140204
  DetectUIntWidthNulls                                         1.28387e+10       1.28363e+10  -0.000181795
  FilterString/32768/0/min_time:1.000                          5.07058e+09       4.91931e+09  -0.0298319
  FilterInt64/32768/50/min_time:1.000                          4.11019e+08       4.10888e+08  -0.000318984
  FilterFixedSizeList1Int64/8388608/1/min_time:1.000           3.80219e+08       3.79409e+08  -0.00213096
  FilterFixedSizeList1Int64/32768/10/min_time:1.000            3.37975e+08       3.37712e+08  -0.000777484
  FilterFixedSizeList1Int64/32768/50/min_time:1.000            1.85149e+08       1.83802e+08  -0.0072791
  FilterString/8388608/1/min_time:1.000                        3.90376e+09       3.81026e+09  -0.023952
  FilterString/1048576/1/min_time:1.000                        3.66454e+09       3.69179e+09   0.00743729
  FilterFixedSizeList1Int64/1048576/1/min_time:1.000           3.80925e+08       3.82714e+08   0.00469731
  FilterInt64/32768/10/min_time:1.000                          7.78435e+08       7.74795e+08  -0.0046764
  FilterInt64/32768/1/min_time:1.000                           8.47656e+08       8.5532e+08    0.00904099
  FilterInt64/8388608/1/min_time:1.000                         6.6595e+08        6.67596e+08   0.00247147
  FilterInt64/32768/0/min_time:1.000                           8.40875e+08       8.36243e+08  -0.00550858
  FilterFixedSizeList1Int64/32768/0/min_time:1.000             4.70932e+08       4.72803e+08   0.00397245
  FilterFixedSizeList1Int64/32768/1/min_time:1.000             4.05408e+08       4.02649e+08  -0.00680562
  FilterString/32768/1/min_time:1.000                          4.95204e+09       4.96272e+09   0.00215661
  FilterString/32768/10/min_time:1.000                         4.52546e+09       4.44782e+09  -0.0171571
  FilterString/32768/50/min_time:1.000                         2.22963e+09       2.19289e+09  -0.0164786
  FilterInt64/1048576/1/min_time:1.000                         6.6937e+08        6.69143e+08  -0.000338985
  BuildInt64DictionaryArraySequential                          3.57793e+08       3.55883e+08  -0.00533902
  BuildFixedSizeBinaryArray                                    3.89211e+08       3.95862e+08   0.0170869
  BufferBuilderLargeWrites/real_time                           2.40252e+09       2.33018e+09  -0.0301095
  BuildBooleanArrayNoNulls                                     5.56989e+09       5.50649e+09  -0.0113822
  ArrayDataConstructDestruct                              100866            100512            -0.00350968
  BuildAdaptiveIntNoNullsScalarAppend                          1.43918e+09       1.43704e+09  -0.00149109
  BuildIntArrayNoNulls                                         3.07021e+09       3.02253e+09  -0.0155277
  BuildBinaryArray                                             3.24545e+08       3.32854e+08   0.0256022
  BuildAdaptiveIntNoNulls                                      1.07476e+10       1.06152e+10  -0.012321
  BufferBuilderTinyWrites/real_time                            4.78079e+08       4.77451e+08  -0.00131337
  BuildInt64DictionaryArraySimilar                             2.72678e+08       2.70806e+08  -0.00686518
  BuildChunkedBinaryArray                                      2.72078e+08       2.71715e+08  -0.00133257
  BufferBuilderSmallWrites/real_time                           3.61001e+09       3.51775e+09  -0.0255559
  BuildInt64DictionaryArrayRandom                              3.21505e+08       3.52824e+08   0.0974141
  BuildDecimalArray                                            5.91154e+08       5.91009e+08  -0.000245778
  BuildStringDictionaryArray                                   2.45288e+08       2.46456e+08   0.00476031
  ReadJSONBlockWithSchemaMultiThread/real_time                 1.89042e+08       1.87233e+08  -0.00956962
  ChunkJSONLineDelimited                                     109.917           109.879        -0.00034497
- ChunkJSONPrettyPrinted                                       8.40633e+07       7.68993e+07  -0.0852212
  ParseJSONBlockWithSchema                                     4.18418e+07       4.08108e+07  -0.02464
  ReadJSONBlockWithSchemaSingleThread                          3.78651e+07       3.92098e+07   0.0355126
  WriteRecordBatch/64/real_time                                1.06445e+10       1.02481e+10  -0.0372385
  WriteRecordBatch/16/real_time                                1.26426e+10       1.2134e+10   -0.0402267
  WriteRecordBatch/8192/real_time                              3.27965e+08       3.2295e+08   -0.0152935
  ReadRecordBatch/4/real_time                                  6.29799e+11       6.24777e+11  -0.00797472
  ReadRecordBatch/1/real_time                                  1.18357e+12       1.16548e+12  -0.0152862
  WriteRecordBatch/1/real_time                                 1.32558e+10       1.27121e+10  -0.0410175
  WriteRecordBatch/1024/real_time                              2.51259e+09       2.4623e+09   -0.0200146
  ReadRecordBatch/16/real_time                                 2.10718e+11       2.09605e+11  -0.00527934
  ReadRecordBatch/8192/real_time                               2.54e+08          2.50965e+08  -0.0119485
  WriteRecordBatch/4/real_time                                 1.32514e+10       1.26804e+10  -0.0430911
  ReadRecordBatch/1024/real_time                               2.66116e+09       2.59249e+09  -0.0258042
  ReadRecordBatch/64/real_time                                 5.59284e+10       5.58811e+10  -0.000846173
  ReadRecordBatch/4096/real_time                               4.92831e+08       4.87308e+08  -0.0112069
  WriteRecordBatch/4096/real_time                              6.61193e+08       6.49704e+08  -0.0173763
  WriteRecordBatch/256/real_time                               6.64758e+09       6.47117e+09  -0.0265377
  ReadRecordBatch/256/real_time                                1.12464e+10       1.07343e+10  -0.0455329
  ThreadPoolSpawn/threads:4/task_cost:10000/real_time     436570            433540            -0.0069404
  ThreadedTaskGroup/threads:8/task_cost:10000/real_time   196185            194601            -0.0080736
  ThreadedTaskGroup/threads:1/task_cost:10000/real_time   126288            126160            -0.00101618
  ThreadedTaskGroup/threads:1/task_cost:100000/real_time   12910.4           12910            -2.9045e-05
  ThreadPoolSpawn/threads:4/task_cost:100000/real_time     37147.6           39486.5           0.0629644
  ThreadedTaskGroup/threads:8/task_cost:100000/real_time   49990.4           49991.5           2.23484e-05
  ThreadedTaskGroup/threads:4/task_cost:100000/real_time   47222.7           47074            -0.00314901
  ThreadPoolSpawn/threads:1/task_cost:100000/real_time     12590.1           12603.8           0.00108821
  ThreadedTaskGroup/threads:2/task_cost:100000/real_time   25416.2           25417             3.07305e-05
- ThreadedTaskGroup/threads:8/task_cost:1000/real_time    211488            187692            -0.112517
  ThreadedTaskGroup/threads:1/task_cost:1000/real_time    857807            840825            -0.0197967
  ThreadedTaskGroup/threads:4/task_cost:10000/real_time   450754            445731            -0.0111433
  ThreadPoolSpawn/threads:1/task_cost:1000/real_time      921863            905253            -0.0180179
  ThreadedTaskGroup/threads:2/task_cost:1000/real_time    231811            255898             0.103906
  ThreadPoolSpawn/threads:8/task_cost:1000/real_time      208669            216882             0.0393587
  ThreadPoolSpawn/threads:1/task_cost:10000/real_time     124893            124703            -0.0015147
  ThreadPoolSpawn/threads:2/task_cost:1000/real_time      385242            394750             0.0246785
  ThreadPoolSpawn/threads:4/task_cost:1000/real_time      207837            208191             0.00170517
  ThreadedTaskGroup/threads:4/task_cost:1000/real_time    194233            195368             0.00584108
  ThreadedTaskGroup/threads:2/task_cost:10000/real_time   235410            237657             0.0095457
  SerialTaskGroup/task_cost:100000/real_time               13064             13064            -2.49535e-06
  ThreadPoolSpawn/threads:2/task_cost:10000/real_time     210322            220553             0.0486437
  ThreadPoolSpawn/threads:8/task_cost:100000/real_time     51622.4           51607.3          -0.000290675
  SerialTaskGroup/task_cost:10000/real_time               130228            130226            -1.44551e-05
  ThreadPoolSpawn/threads:2/task_cost:100000/real_time     22110.6           22099.7          -0.000491798
  ThreadPoolSpawn/threads:8/task_cost:10000/real_time     273366            275064             0.00620986
  SerialTaskGroup/task_cost:1000/real_time                     1.25775e+06       1.25776e+06   3.98338e-06
  ChunkCSVEscapedBlock                                         9.35103e+08       9.3539e+08    0.000307353
  ChunkCSVNoNewlinesBlock                                     11.2806           11.3988        0.0104752
  ChunkCSVQuotedBlock                                          8.48658e+08       8.48592e+08  -7.77781e-05
  ParseCSVQuotedBlock                                          3.30078e+08       3.29997e+08  -0.000245724
  ParseCSVEscapedBlock                                         2.86795e+08       2.8676e+08   -0.000122009
  ======================================================  ================  ================  ============

@pitrou
Member Author

pitrou commented Sep 12, 2019

The ursabot benchmark numbers above are slightly noisy. I don't think the trie or UTF-8 benchmarks can be impacted by memory allocator changes. It seems there are no actual regressions, at least on that machine.
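
As a reading aid for the ursabot table: the `change` column appears to be the relative difference `(contender - baseline) / baseline`, and the rows flagged with a leading `-` seem to be those where the contender value dropped by more than roughly 5%. The sketch below re-derives three rows from the rounded values printed in the table; the helper name `relative_change` is illustrative and not part of ursabot.

```python
def relative_change(baseline, contender):
    """Relative difference of contender versus baseline (negative means lower)."""
    return (contender - baseline) / baseline


# (baseline, contender) pairs copied from the ursabot table.
rows = {
    "BufferOutputStreamSmallWrites/real_time": (1.17461e10, 1.33316e10),
    "TrieLookupNotFound": (2.83641e08, 2.28253e08),
    "ValidateSmallNonAscii": (1.49769e09, 1.3436e09),
}

for name, (baseline, contender) in rows.items():
    print(f"{name}: {relative_change(baseline, contender):+.4f}")
```

The results agree with the table to about four decimal places; the small differences come from the table printing only six significant digits for each value.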

@pitrou
Member Author

pitrou commented Sep 12, 2019

@wesm You may want to run some benchmarks on your machine.

@wesm
Member

wesm commented Sep 14, 2019

Thanks @pitrou for investigating this. I will run a benchmark comparison on my machine (pretty newish i9-9960X) once I address https://issues.apache.org/jira/browse/ARROW-6559 which I just opened.

@wesm
Member

wesm commented Sep 14, 2019

I also just rebased, so hopefully we get a passing build now that the CI failure from ARROW-6509 is unblocked.

@wesm
Member

wesm commented Sep 15, 2019

Here are the benchmark results on my machine (I needed ARROW-6559 to run them):

https://gist.github.com/wesm/7501f688ee221ea826a56092dc02c471

I didn't see anything significant, so I'm merging this.

@wesm
Member

wesm commented Sep 15, 2019

It looks like we have a UBSAN failure
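
The overflow in the report that follows can be confirmed with plain integer arithmetic. This is a minimal sketch, assuming `long` is a 64-bit signed integer on that build (as on x86-64 Linux); the variable names are illustrative and not taken from `parquet/types.h`.

```python
INT64_MAX = 2**63 - 1

# Operands copied from the UBSAN message.
left_operand = 2776655897
NANOS_PER_DAY = 86_400_000_000_000  # 24 * 60 * 60 * 10**9

# Python integers are arbitrary precision, so the product is exact here.
product = left_operand * NANOS_PER_DAY
print("product exceeds int64 range:", product > INT64_MAX)

# Largest multiplier that still fits when scaled by nanoseconds per day.
max_safe_multiplier = INT64_MAX // NANOS_PER_DAY
print("largest safe multiplier:", max_safe_multiplier)
```

Since 86400000000000 is the number of nanoseconds in a day, the left operand is presumably a day count, and any count above 106751 days (about 292 years) cannot be converted to 64-bit nanoseconds without overflowing.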

[ RUN      ] TestArrowReadWrite.UseDeprecatedInt96
/arrow/cpp/src/parquet/types.h:558:27: runtime error: signed integer overflow: 2776655897 * 86400000000000 cannot be represented in type 'long'
    #0 0x7f2af04e0582 in parquet::Int96GetNanoSeconds(parquet::Int96 const&) /arrow/cpp/src/parquet/types.h:558:27
    #1 0x7f2af04dfccb in parquet::arrow::TransferInt96(parquet::internal::RecordReader*, arrow::MemoryPool*, std::shared_ptr<arrow::DataType> const&, arrow::compute::Datum*) /arrow/cpp/src/parquet/arrow/reader_internal.cc:741:19
    #2 0x7f2af04e7e66 in parquet::arrow::TransferColumnData(parquet::internal::RecordReader*, std::shared_ptr<arrow::DataType>, parquet::ColumnDescriptor const*, arrow::MemoryPool*, std::shared_ptr<arrow::ChunkedArray>*) /arrow/cpp/src/parquet/arrow/reader_internal.cc:1179:13
    #3 0x7f2af04664d4 in parquet::arrow::LeafReader::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) /arrow/cpp/src/parquet/arrow/reader.cc:443:5
    #4 0x7f2af0493b83 in parquet::arrow::FileReaderImpl::ReadSchemaField(int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Field>*, std::shared_ptr<arrow::ChunkedArray>*) /arrow/cpp/src/parquet/arrow/reader.cc:182:20
    #5 0x7f2af044919a in parquet::arrow::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*)::$_0::operator()(int) const /arrow/cpp/src/parquet/arrow/reader.cc:806:12
    #6 0x7f2af04480eb in parquet::arrow::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) /arrow/cpp/src/parquet/arrow/reader.cc:826:7
    #7 0x7f2af04620ca in parquet::arrow::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) /arrow/cpp/src/parquet/arrow/reader.cc:151:12
    #8 0x7f2af0461c76 in parquet::arrow::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) /arrow/cpp/src/parquet/arrow/reader.cc:227:12
    #9 0x81f76c in parquet::arrow::DoSimpleRoundtrip(std::shared_ptr<arrow::Table> const&, bool, long, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*, std::shared_ptr<parquet::ArrowWriterProperties> const&) /arrow/cpp/src/parquet/arrow/arrow_reader_writer_test.cc:436:5
    #10 0x83a1ea in parquet::arrow::TestArrowReadWrite_UseDeprecatedInt96_Test::TestBody() /arrow/cpp/src/parquet/arrow/arrow_reader_writer_test.cc:1398:3
    #11 0x7f2af1c1c88d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2443:10
    #12 0x7f2af1c058ba in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2479:14
    #13 0x7f2af1be6cf5 in testing::Test::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2517:5
    #14 0x7f2af1be7a5a in testing::TestInfo::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2693:11
    #15 0x7f2af1be811e in testing::TestCase::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2811:28
    #16 0x7f2af1bf4109 in testing::internal::UnitTestImpl::RunAllTests() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5177:43
    #17 0x7f2af1c2028d in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2443:10
    #18 0x7f2af1c07f6a in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2479:14
    #19 0x7f2af1bf3df5 in testing::UnitTest::Run() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:4786:10
    #20 0x7f2af1e4a9e0 in RUN_ALL_TESTS() /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2341:46
    #21 0x7f2af1e4a9bf in main /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:36:10
    #22 0x7f2ae73f9b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #23 0x737679 in _start (/build/cpp/debug/parquet-arrow-test+0x737679)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /arrow/cpp/src/parquet/types.h:558:27 in
/build/cpp/src/parquet

@emkornfield any clue why this would only show up here and not on master (this failure isn't occurring there)?

@pitrou
Member Author

pitrou commented Sep 16, 2019

@wesm This has to do with Parquet reading the data for null values. This data seems to be uninitialized, so it may take different values depending on the memory allocator.

IMO this may point to a problem in Parquet: it shouldn't serialize uninitialized data (that data may hold user secrets). Note that Arrow itself is not at fault: null data is zero-initialized in TimestampBuilder (unless there's a weird bug there).
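To illustrate the point about null slots, here is a minimal pure-Python sketch. It is not Arrow or Parquet code: `decode_int32_column` and its `zero_nulls` flag are hypothetical names, and the junk-filled `bytearray` only simulates what an allocator with junk filling would hand back. It shows that a decoder which writes only the valid slots leaves allocator-dependent bytes in the null slots, while explicitly zeroing those slots makes the output deterministic.

```python
JUNK = 0xA5  # fill byte simulating an allocator that junk-fills new memory


def decode_int32_column(values, validity, zero_nulls):
    """Simulate decoding non-null int32 values into a fresh buffer.

    `values` holds only the non-null entries; `validity` has one flag
    per output slot. Hypothetical helper, for illustration only.
    """
    # A "freshly allocated" buffer: every byte starts as junk.
    out = bytearray([JUNK]) * (4 * len(validity))
    it = iter(values)
    for i, valid in enumerate(validity):
        if valid:
            out[4 * i:4 * i + 4] = next(it).to_bytes(4, "little")
        elif zero_nulls:
            # Explicitly initialize the slot behind a null entry.
            out[4 * i:4 * i + 4] = bytes(4)
    return [
        int.from_bytes(out[4 * i:4 * i + 4], "little")
        for i in range(len(validity))
    ]


leaky = decode_int32_column([1, 2], [True, False, True], zero_nulls=False)
fixed = decode_int32_column([1, 2], [True, False, True], zero_nulls=True)
print([hex(v) for v in leaky])
print([hex(v) for v in fixed])
```

With a different allocator the null slot in `leaky` would simply hold whatever bytes happened to be there, which is why the failure can appear or disappear when the allocator changes.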

@pitrou
Member Author

pitrou commented Sep 16, 2019

By the way, 2776655897 is supposed to be a number of days since Unix epoch. If we add kJulianToUnixEpochDays we get the following:

>>> hex(2776655897 + 2440588)
'0xa5a5a5a5'

This points exactly to uninitialized memory, flagged thanks to jemalloc's "junk" option.

opt.junk (const char *) r- [--enable-fill]

Junk filling. If set to “alloc”, each byte of uninitialized allocated memory will be initialized to 0xa5. If set to “free”, all deallocated memory will be initialized to 0x5a. If set to “true”, both allocated and deallocated memory will be initialized, and if set to “false”, junk filling will be disabled entirely. This is intended for debugging and will impact performance negatively. This option is “false” by default unless --enable-debug is specified during configuration, in which case it is “true” by default.
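The arithmetic above can be reproduced end to end with a small sketch. It assumes the usual Int96 timestamp layout (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte Julian day number); the helper names `split_int96` and `looks_like_junk` are made up for this illustration and are not part of Arrow or Parquet.

```python
import struct

# Offset between Julian day numbers and days since the Unix epoch,
# i.e. the value of kJulianToUnixEpochDays used in the comment above.
JULIAN_TO_UNIX_EPOCH_DAYS = 2440588


def split_int96(raw: bytes):
    """Split a 12-byte Int96 slot into (nanos_of_day, days_since_epoch).

    Assumed layout: 8 bytes of nanoseconds, then a 4-byte Julian day,
    both little-endian.
    """
    if len(raw) != 12:
        raise ValueError("an Int96 value is exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<QI", raw)
    return nanos_of_day, julian_day - JULIAN_TO_UNIX_EPOCH_DAYS


def looks_like_junk(raw: bytes, fill: int = 0xA5) -> bool:
    """Return True if every byte equals the given junk fill byte."""
    return len(raw) > 0 and all(b == fill for b in raw)


# A slot that was allocated under junk filling but never written to.
junk_slot = bytes([0xA5]) * 12
nanos, days = split_int96(junk_slot)
print(days, hex(days + JULIAN_TO_UNIX_EPOCH_DAYS), looks_like_junk(junk_slot))
```

Checking a suspicious value against the 0xa5 (allocated) and 0x5a (freed) fill patterns is a quick way to tell uninitialized reads apart from use-after-free when junk filling is enabled.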

@pitrou
Member Author

pitrou commented Sep 16, 2019

Ok, the uninitialized memory issue is actually on the decoding side. I will push a fix and let you double-check it, @wesm.

@pitrou
Member Author

pitrou commented Sep 16, 2019

Passing Travis-CI build at https://travis-ci.org/pitrou/arrow/builds/585577226

@emkornfield
Contributor

I'm not exactly sure. I think we must have been getting lucky somehow (or maybe @pitrou's data-poisoning PR made this more obvious, if that has been merged). I believe this is the bug that #4607 is meant to address as well.

Revert "ARROW-6478: [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues"

This reverts commit 53c5af0.

In addition, configure jemalloc to fix the performance regression.
@codecov-io

Codecov Report

Merging #5365 into master will increase coverage by 0.55%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5365      +/-   ##
==========================================
+ Coverage   88.58%   89.14%   +0.55%     
==========================================
  Files         950      758     -192     
  Lines      126213   110756   -15457     
  Branches     1495        0    -1495     
==========================================
- Hits       111808    98732   -13076     
+ Misses      14040    12024    -2016     
+ Partials      365        0     -365
Impacted Files Coverage Δ
cpp/src/arrow/memory_pool.cc 79.31% <ø> (ø) ⬆️
cpp/src/arrow/csv/column_builder.cc 95.54% <0%> (-1.49%) ⬇️
cpp/src/arrow/util/thread_pool_test.cc 97.66% <0%> (-0.94%) ⬇️
go/arrow/ipc/writer.go
go/arrow/math/uint64_amd64.go
go/arrow/memory/memory_avx2_amd64.go
go/arrow/ipc/file_reader.go
js/src/builder/index.ts
js/src/enum.ts
go/arrow/array/builder.go
... and 186 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bf4d80...71dd00f. Read the comment docs.

Member

@wesm wesm left a comment

+1

5 participants