Skip to content

Conversation

@AndrewTsao
Copy link
Contributor

@AndrewTsao AndrewTsao commented Sep 3, 2020

By exiting early out of the parsing routine when the input is exhausted, we can save a little bit a processing time.

@github-actions
Copy link

github-actions bot commented Sep 3, 2020

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@pitrou
Copy link
Member

pitrou commented Sep 3, 2020

Hi and thank you for posting a PR. I recommend you read a bit about the contribution guidelines here:
https://arrow.apache.org/docs/developers/contributing.html#report-bugs-and-propose-features

Also, can you explain in which situation this marks an improvement? Any benchmark numbers perhaps?

@AndrewTsao
Copy link
Contributor Author

On my windows machine, test result. BM_ParseUnsignedBreak is use break statment to break unroll loop. (/O2 /Oi)

2020-09-07T13:28:06+08:00
Running release\Release\test_strtoll.exe
Run on (16 X 2095 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1024 KiB (x8)
  L3 Unified 11264 KiB (x1)
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_ParseUnsignedBreak        292 ns          292 ns      2243575
BM_ParseUnsigned             262 ns          260 ns      2639500

On my linux machines, (/O3)

Running ./centos8/test_strtoll
Run on (4 X 2200 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 30720 KiB (x4)
Load Average: 0.61, 0.16, 0.05
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_ParseUnsignedBreak        236 ns          236 ns      2955433
BM_ParseUnsigned             237 ns          236 ns      2963977

Bellow is my benchmark code.

#include <stdlib.h>
#include <time.h>

#include <iostream>
#include <string>
#include <vector>

#include "benchmark/benchmark.h"

#if defined(__GNUC__)
#define ARROW_PREDICT_FALSE(x) (__builtin_expect(!!(x), 0))
#define ARROW_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#define ARROW_NORETURN __attribute__((noreturn))
#define ARROW_NOINLINE __attribute__((noinline))
#define ARROW_PREFETCH(addr) __builtin_prefetch(addr)
#elif defined(_MSC_VER)
#define ARROW_NORETURN __declspec(noreturn)
#define ARROW_NOINLINE __declspec(noinline)
#define ARROW_PREDICT_FALSE(x) (x)
#define ARROW_PREDICT_TRUE(x) (x)
#define ARROW_PREFETCH(addr)
#else
#define ARROW_NORETURN
#define ARROW_PREDICT_FALSE(x) (x)
#define ARROW_PREDICT_TRUE(x) (x)
#define ARROW_PREFETCH(addr)
#endif

#define PARSE_UNSIGNED_ITERATION_BREAK(C_TYPE)          \
  if (length > 0) {                               \
    uint8_t digit = ParseDecimalDigit(*s++);      \
    result = static_cast<C_TYPE>(result * 10U);   \
    length--;                                     \
    if (ARROW_PREDICT_FALSE(digit > 9U)) {        \
      /* Non-digit */                             \
      return false;                               \
    }                                             \
    result = static_cast<C_TYPE>(result + digit); \
  } else {                                        \
    break;                                        \
  }

#define PARSE_UNSIGNED_ITERATION(C_TYPE)         \
  if (length > 0) {                               \
    uint8_t digit = ParseDecimalDigit(*s++);      \
    result = static_cast<C_TYPE>(result * 10U);   \
    length--;                                     \
    if (ARROW_PREDICT_FALSE(digit > 9U)) {        \
      /* Non-digit */                             \
      return false;                               \
    }                                             \
    result = static_cast<C_TYPE>(result + digit); \
  }

#define PARSE_UNSIGNED_ITERATION_LAST(C_TYPE)                            \
  if (length > 0) {                                                      \
    if (ARROW_PREDICT_FALSE(result >                                     \
                            std::numeric_limits<C_TYPE>::max() / 10U)) { \
      /* Overflow */                                                     \
      return false;                                                      \
    }                                                                    \
    uint8_t digit = ParseDecimalDigit(*s++);                             \
    result = static_cast<C_TYPE>(result * 10U);                          \
    C_TYPE new_result = static_cast<C_TYPE>(result + digit);             \
    if (ARROW_PREDICT_FALSE(--length > 0)) {                             \
      /* Too many digits */                                              \
      return false;                                                      \
    }                                                                    \
    if (ARROW_PREDICT_FALSE(digit > 9U)) {                               \
      /* Non-digit */                                                    \
      return false;                                                      \
    }                                                                    \
    if (ARROW_PREDICT_FALSE(new_result < result)) {                      \
      /* Overflow */                                                     \
      return false;                                                      \
    }                                                                    \
    result = new_result;                                                 \
  }

inline uint8_t ParseDecimalDigit(char c) {
  return static_cast<uint8_t>(c - '0');
}

inline bool ParseUnsignedBreak(const char* s, size_t length, uint64_t* out) {
  uint64_t result = 0;
  do {
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);

    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);

    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);

    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint64_t);
    PARSE_UNSIGNED_ITERATION_LAST(uint64_t);
  } while (false);
  *out = result;
  return true;
}

inline bool ParseUnsigned(const char* s, size_t length, uint64_t* out) {
  uint64_t result = 0;
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);

  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);

  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);

  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION(uint64_t);
  PARSE_UNSIGNED_ITERATION_LAST(uint64_t);
  *out = result;
  return true;
}

inline bool ParseUnsignedBreak(const char* s, size_t length, uint32_t* out) {
  uint32_t result = 0;
  do {
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);

    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_BREAK(uint32_t);
    PARSE_UNSIGNED_ITERATION_LAST(uint32_t);
  } while (false);

  *out = result;
  return true;
}

static uint64_t x = 0;

static void BM_ParseUnsigned(benchmark::State& state) {
  // Perform setup here
  std::vector<std::string> samples;
  for (size_t i = 0; i < 22; i++) {
    samples.emplace_back(std::string(i, '1'));
  }

  for (auto _ : state) {
    // This code gets timed
    for (auto& sample : samples) {
      uint64_t val;
      ParseUnsigned(sample.c_str(), sample.size(), &val);
      x += val;
    }
  }
}

static void BM_ParseUnsignedBreak(benchmark::State& state) {
  // Perform setup here
  std::vector<std::string> samples;
  for (size_t i = 0; i < 22; i++) {
    samples.emplace_back(std::string(i, '1'));
  }

  for (auto _ : state) {
    // This code gets timed
    for (auto& sample : samples) {
      uint64_t val;
      ParseUnsignedBreak(sample.c_str(), sample.size(), &val);
      x += val;
    }
  }
  // std::cout << x << std::endl;
}

// Register the function as a benchmark
BENCHMARK(BM_ParseUnsignedBreak);
BENCHMARK(BM_ParseUnsigned);
// Run the benchmark
BENCHMARK_MAIN();

AndrewTsao and others added 2 commits September 7, 2020 12:18
Hi, I find parse unsigned digits is slow. Expanded loop not exit when length equals zero.
@pitrou pitrou changed the title fast exit loop if length equals zero ARROW-9928: [C++] Speed up integer parsing slightly Sep 7, 2020
@github-actions
Copy link

github-actions bot commented Sep 7, 2020

@pitrou
Copy link
Member

pitrou commented Sep 7, 2020

Thank you. The speed up is relatively minor, but I can confirm it on Ubuntu 20.04 with clang 10.

@pitrou
Copy link
Member

pitrou commented Sep 7, 2020

Integer parsing micro-benchmarks:

  • before:
IntegerParsing<Int8Type>            2553 ns         2553 ns       826045 items_per_second=391.727M/s
IntegerParsing<Int16Type>           4728 ns         4727 ns       444020 items_per_second=211.536M/s
IntegerParsing<Int32Type>           9060 ns         9059 ns       231401 items_per_second=110.387M/s
IntegerParsing<Int64Type>          12903 ns        12901 ns       162856 items_per_second=77.5155M/s
IntegerParsing<UInt8Type>           3534 ns         3534 ns       587574 items_per_second=282.989M/s
IntegerParsing<UInt16Type>          2135 ns         2135 ns       974843 items_per_second=468.364M/s
IntegerParsing<UInt32Type>          5658 ns         5657 ns       371323 items_per_second=176.765M/s
IntegerParsing<UInt64Type>          7650 ns         7649 ns       274310 items_per_second=130.737M/s
  • after:
IntegerParsing<Int8Type>            2407 ns         2406 ns       879327 items_per_second=415.55M/s
IntegerParsing<Int16Type>           4433 ns         4432 ns       473344 items_per_second=225.632M/s
IntegerParsing<Int32Type>           8648 ns         8647 ns       243097 items_per_second=115.65M/s
IntegerParsing<Int64Type>          12066 ns        12064 ns       174143 items_per_second=82.8881M/s
IntegerParsing<UInt8Type>           3303 ns         3303 ns       637974 items_per_second=302.789M/s
IntegerParsing<UInt16Type>          1996 ns         1996 ns      1052083 items_per_second=501.06M/s
IntegerParsing<UInt32Type>          5301 ns         5300 ns       396027 items_per_second=188.676M/s
IntegerParsing<UInt64Type>          7160 ns         7159 ns       293150 items_per_second=139.681M/s

@pitrou
Copy link
Member

pitrou commented Sep 7, 2020

CSV conversion micro-benchmark:

  • before:
Int64Conversion     100800 ns       100782 ns        21238 items_per_second=82.6831M/s
  • after:
Int64Conversion      94717 ns        94701 ns        22247 items_per_second=87.9932M/s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants