Using "ryu" library to format floats in text form. #8542

Merged
merged 17 commits into from Jan 7, 2020

alexey-milovidov
Member

@alexey-milovidov alexey-milovidov commented Jan 6, 2020

Changelog category (leave one):

  • Performance Improvement

Changelog entry (up to few sentences, required except for Non-significant/Documentation categories):
Improved performance of formatting floating point numbers by up to 6 times.

Detailed description (optional):

This PR was opened only to be closed because, in contrast to double-conversion, the ryu library doesn't have a mode that prefers simple (fixed) format over exponential:

SELECT toString(0.123)

┌─toString(0.123)─┐
│ 1.23E-1         │
└─────────────────┘

SELECT toString(123.456)

┌─toString(123.456)─┐
│ 1.23456E2         │
└───────────────────┘

That's not satisfactory. We can probably add support for this formatting mode if the performance benefits make it a viable tradeoff.
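
For illustration, the missing mode could be sketched roughly as follows: take the shortest decimal digits and the decimal exponent (the kind of mantissa/exponent pair that the d2d step in ryu produces, as far as I understand it) and lay them out in fixed notation. This is only a minimal sketch under assumptions. `formatFixed` and its parameters are hypothetical names, not part of ryu or ClickHouse, and this is not the code that was eventually integrated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: render the value mantissa * 10^exponent in fixed notation.
// Assumes the caller already has the shortest decimal digits in `mantissa`.
std::string formatFixed(uint64_t mantissa, int32_t exponent)
{
    // Sanity bound (assumption): decimal exponents of doubles stay well inside this range.
    assert(exponent > -400 && exponent < 400);

    if (mantissa == 0)
        return "0";

    const std::string digits = std::to_string(mantissa);

    // Non-negative exponent: the value is an integer, so pad with zeros on the right.
    if (exponent >= 0)
        return digits + std::string(static_cast<size_t>(exponent), '0');

    // Negative exponent: `frac` digits belong after the decimal point.
    const size_t frac = static_cast<size_t>(-exponent);

    // All digits are fractional: emit "0." plus any leading zeros.
    if (frac >= digits.size())
        return "0." + std::string(frac - digits.size(), '0') + digits;

    // Otherwise split the digit string at the decimal point.
    const size_t int_len = digits.size() - frac;
    return digits.substr(0, int_len) + "." + digits.substr(int_len);
}
```

With this layout, `formatFixed(123, -3)` gives `0.123` and `formatFixed(123456, -3)` gives `123.456`. A real implementation would still need to fall back to exponential form for very large or very small exponents.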

Update 1: it still has a chance.
Update 2: after some work, it's ready to be integrated.

@alexey-milovidov alexey-milovidov requested a review from a team January 6, 2020 05:26
@ghost ghost requested review from tavplubix and removed request for a team January 6, 2020 05:26
@alexey-milovidov alexey-milovidov removed the request for review from tavplubix January 6, 2020 05:26
@alexey-milovidov
Member Author

The performance test is OK and shows a 2% improvement across all the test cases:
https://clickhouse-test-reports.s3.yandex.net/8542/828e43cf9432e3624e6cdfb5513bc5d851f91209/performance_test/comparison_max_rps_gcc_9.html

One query was slowed down, need further investigation:

SELECT count() FROM system.numbers WHERE NOT ignore(toString(toFloat64(number % 10)))

@alexey-milovidov
Member Author

alexey-milovidov commented Jan 7, 2020

> One query was slowed down, need further investigation:

Yes, the Ryu library is slower than double-conversion on small integers.
This is because of the order of branches in this function:

static inline uint32_t decimalLength17(const uint64_t v) {
  // This is slightly faster than a loop.
  // The average output length is 16.38 digits, so we check high-to-low.
  // Function precondition: v is not an 18, 19, or 20-digit number.
  // (17 digits are sufficient for round-tripping.)
  assert(v < 100000000000000000L);
  if (v >= 10000000000000000L) { return 17; }
  if (v >= 1000000000000000L) { return 16; }
  if (v >= 100000000000000L) { return 15; }
  if (v >= 10000000000000L) { return 14; }
  if (v >= 1000000000000L) { return 13; }
  if (v >= 100000000000L) { return 12; }
  if (v >= 10000000000L) { return 11; }
  if (v >= 1000000000L) { return 10; }
  if (v >= 100000000L) { return 9; }
  if (v >= 10000000L) { return 8; }
  if (v >= 1000000L) { return 7; }
  if (v >= 100000L) { return 6; }
  if (v >= 10000L) { return 5; }
  if (v >= 1000L) { return 4; }
  if (v >= 100L) { return 3; }
  if (v >= 10L) { return 2; }
  return 1;
}
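
To illustrate the branch-order point: for a small input such as 7, the function above falls through all sixteen comparisons before reaching `return 1`. A hypothetical low-to-high variant (the name `decimalLengthLowToHigh` is made up here and is not part of ryu) would return after the first comparison for such inputs, at the cost of being slower for the long outputs that ryu expects on average. This is a sketch for comparison only, not the change made in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of decimalLength17 that checks short lengths first.
// Same precondition as the original: v has at most 17 decimal digits.
static inline uint32_t decimalLengthLowToHigh(const uint64_t v)
{
    assert(v < 100000000000000000ULL);
    if (v < 10ULL) { return 1; }
    if (v < 100ULL) { return 2; }
    if (v < 1000ULL) { return 3; }
    if (v < 10000ULL) { return 4; }
    if (v < 100000ULL) { return 5; }
    if (v < 1000000ULL) { return 6; }
    if (v < 10000000ULL) { return 7; }
    if (v < 100000000ULL) { return 8; }
    if (v < 1000000000ULL) { return 9; }
    if (v < 10000000000ULL) { return 10; }
    if (v < 100000000000ULL) { return 11; }
    if (v < 1000000000000ULL) { return 12; }
    if (v < 10000000000000ULL) { return 13; }
    if (v < 100000000000000ULL) { return 14; }
    if (v < 1000000000000000ULL) { return 15; }
    if (v < 10000000000000000ULL) { return 16; }
    return 17;
}
```

Which ordering wins depends on the data: high-to-low suits arbitrary doubles with many significant digits, low-to-high suits workloads dominated by short values like the query above.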

Update.

Actually, it's not only because of this function.
The main difference is inside the d2d function.
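
One way to sidestep the general conversion for inputs like `toFloat64(number % 10)` would be a fast path that detects doubles holding a small integer and prints them as integers. The sketch below is an assumption about how such a path could look, not the change actually made in this PR. `formatSmallInteger` is a hypothetical name, and returning an empty string to mean "not applicable, use the general path" is a simplification.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical fast path: if x is an integer with magnitude below 2^53,
// return its decimal digits; otherwise return an empty string so the
// caller can fall back to the general shortest-representation routine.
std::string formatSmallInteger(double x)
{
    constexpr double limit = 9007199254740992.0; // 2^53, integers below it are exact in a double

    // Written as a negated comparison so that NaN and infinities are rejected too.
    if (!(std::fabs(x) < limit))
        return {};

    // Reject values with a fractional part.
    if (std::trunc(x) != x)
        return {};

    // Leave negative zero to the general path, since the sign would be lost here.
    if (x == 0.0 && std::signbit(x))
        return {};

    const std::string result = std::to_string(static_cast<int64_t>(x));
    assert(!result.empty());
    return result;
}
```

For example, `formatSmallInteger(7.0)` yields `7`, while `formatSmallInteger(0.5)` yields an empty string and would go through the regular conversion.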

@alexey-milovidov
Member Author

Now we have a 4% overall performance increase:
https://clickhouse-test-reports.s3.yandex.net/8542/c1ccb427d5b1de0a13ee456458b9a8fbbade92c8/performance_test/comparison_max_rps_gcc_9.html

And there are no slower queries.

Labels
pr-performance Pull request with some performance improvements
2 participants