Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve ada::can_parse #674

Merged
merged 8 commits into from
Jun 18, 2024

Conversation

CarlosEduR
Copy link
Member

Relates to: #377

@CarlosEduR
Copy link
Member Author

CarlosEduR commented May 23, 2024

I created a copy of the parse function and moved to the checkers file, I've been profiling it and commenting the code I don't think we need for this scenario.

It's still like a POC, there might be a better place to put it, also even a better function name, I am still poking around.

@CarlosEduR
Copy link
Member Author

Benchmarks

main branch:

-------------------------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------
BasicBench_AdaURL_CanParse   19704356 ns     19705721 ns           36 GHz=4.29878 cycle/byte=9.59934 cycles/url=833.791 instructions/byte=23.596 instructions/cycle=2.45809 instructions/ns=10.5668 instructions/url=2.04953k ns/url=193.96 speed=440.892M/s time/byte=2.26813ns time/url=197.008ns url/s=5.07594M/s

current branch:

-------------------------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------
BasicBench_AdaURL_CanParse   14030751 ns     14029900 ns           49 GHz=4.274 cycle/byte=6.86874 cycles/url=596.613 instructions/byte=17.2687 instructions/cycle=2.5141 instructions/ns=10.7453 instructions/url=1.49995k ns/url=139.591 speed=619.255M/s time/byte=1.61484ns time/url=140.264ns url/s=7.12942M/s

@CarlosEduR
Copy link
Member Author

instead of creating a copy of it, we could also just add a parameter to the parse_url function, like store_values=true, and for the can_parse we set it to false.

@CarlosEduR
Copy link
Member Author

CarlosEduR commented May 23, 2024

image

one example from main branch, url.update_base_search is called a few times, and it calls ada::unicode::percent_encode.

@CarlosEduR
Copy link
Member Author

I'd love to know what you guys think about it @anonrig @lemire

@lemire
Copy link
Member

lemire commented May 23, 2024

I think we may want to keep...

template <typename result_type = ada::url_aggregator>
result_type parse_url(std::string_view user_input,
const result_type* base_url = nullptr);

as you have done, at least initially, so we don't mess up our signatures and it is less invasive. But we could add...

template <typename result_type, bool store_values>
result_type parse_url_impl(std::string_view user_input,
const result_type* base_url = nullptr);

And then the implementation of parse_url would just be...

template <typename result_type>
result_type parse_url(std::string_view user_input,
const result_type* base_url) { return parse_url_impl<result_type,true>(user_input); }

and can_parse would just call parse_url_impl<result_type,false>.

This should be nice because store_values will be a compile-time constant, so you can just branch on it, without any runtime cost.

Thoughts?

@CarlosEduR
Copy link
Member Author

nice!! it looks good to me

@anonrig
Copy link
Member

anonrig commented May 29, 2024

@lemire's suggestions looks really good, and this is amazing work @CarlosEduR. Thank you for doing this! I'm so excited!

@CarlosEduR
Copy link
Member Author

I appreciate the support, guys... I am working on it. 👨🏻‍💻

@CarlosEduR
Copy link
Member Author

with these small changes we go from this:

BasicBench_AdaURL_CanParse   19646497 ns     19645213 ns           36 GHz=4.29206 cycle/byte=9.6616 cycles/url=839.199 instructions/byte=23.3987 instructions/cycle=2.42183 instructions/ns=10.3946 instructions/url=2.03239k ns/url=195.523 speed=442.25M/s time/byte=2.26117ns time/url=196.403ns url/s=5.09157M/s

to this:

BasicBench_AdaURL_CanParse   13263878 ns     13263297 ns           53 GHz=4.27177 cycle/byte=6.5079 cycles/url=565.271 instructions/byte=16.6601 instructions/cycle=2.55998 instructions/ns=10.9357 instructions/url=1.44709k ns/url=132.327 speed=655.048M/s time/byte=1.52661ns time/url=132.6ns url/s=7.54149M/s

@CarlosEduR
Copy link
Member Author

CarlosEduR commented Jun 3, 2024

I've noticed the update_base_hostname also cost a good amount for some cases, it's called by the parse_host. I was wondering if there we could avoid calling this update_base_hostname without affecting the can_parse/parse_host functionality. I'd like to hear what you guys think about it.

For instance right here, by adding the if(store_values) around this code, we go from the last value url/s=7.54149M/s to url/s=11.1449M/s. Tests keep passing, but I am worried we are not covering other possible cases.

ada_log("HOST parsing ", host_view, " href=", url.get_href());
// Let host be the result of host parsing host_view with url is not
// special.
if (host_view.empty()) {
  url.update_base_hostname("");
} else if (!url.parse_host(host_view)) {
  return url;
}
ada_log("HOST parsing results in ", url.get_hostname(),
        " href=", url.get_href());

// Set url's host to host, and state to path start state.
state = ada::state::PATH_START;

@CarlosEduR
Copy link
Member Author

current perf report:

image

@lemire
Copy link
Member

lemire commented Jun 3, 2024

've noticed the update_base_hostname also cost a good amount for some cases, it's called by the parse_host. I was wondering if there we could avoid calling this update_base_hostname without affecting the can_parse/parse_host functionality. I'd like to hear what you guys think about it.

In your example, it is called when host_view.empty() returns true. Then we call url.update_base_hostname(""). Is this really a hot function?

I am somewhat surprised that we are frequently setting the hostname to the empty string. But if it is common, then we can certainly try to optimize it.

@CarlosEduR
Copy link
Member Author

In your example, it is called when host_view.empty() returns true. Then we call url.update_base_hostname(""). Is this really a hot function?

I am somewhat surprised that we are frequently setting the hostname to the empty string. But if it is common, then we can certainly try to optimize it.

I updated the code a little bit more and ran a benchmark to check, seems most of it is coming from the parse_host.
Code updated, it's only avoiding the parse_host function:

if (host_view.empty()) {
  url.update_base_hostname("");
} else {
  if (store_values) {
    if (!url.parse_host(host_view)) {
      return url;
    }
  }
}

Benchmark result:

BasicBench_AdaURL_CanParse    9412687 ns      9412250 ns           75 GHz=4.25503 cycle/byte=4.56398 cycles/url=396.423 instructions/byte=11.8192 instructions/cycle=2.58966 instructions/ns=11.0191 instructions/url=1026.6 ns/url=93.1659 speed=923.062M/s time/byte=1083.35ps time/url=94.099ns url/s=10.6271M/s

Copy link
Member

@anonrig anonrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not use constexpr?

src/parser.cpp Outdated
} else {
url.consume_prepared_path(view);
ADA_ASSERT_TRUE(url.validate());
if (store_values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (store_values) {
if constexpr (store_values) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, I'll update it.

src/parser.cpp Outdated
ada_log("QUERY update_base_search completed ");
if (fragment.has_value()) {
url.update_unencoded_base_hash(*fragment);
if (store_values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (store_values) {
if constexpr (store_values) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed and understood this comment, nice! I noticed you mentioned the update_unencoded_base_hash, and I'd like to confirm some other cases. For instance, update_base_pathname and update_base_search would be methods we want to avoid for the can_parse as well, right?

if constexpr (result_type_is_ada_url) {
  url.path = base_url->path;
  url.query = base_url->query;
} else {
  url.update_base_pathname(base_url->get_pathname());
  url.update_base_search(base_url->get_search());
}
url.update_unencoded_base_hash(*fragment);
return url;

@anonrig
Copy link
Member

anonrig commented Jun 6, 2024

@CarlosEduR Can you also share before/after when you share your benchmark result? It's hard to understand the impact :-)

@CarlosEduR
Copy link
Member Author

@CarlosEduR Can you also share before/after when you share your benchmark result? It's hard to understand the impact :-)

sure... currently:

before

BasicBench_AdaURL_CanParse   19646497 ns     19645213 ns           36 GHz=4.29206 cycle/byte=9.6616 cycles/url=839.199 instructions/byte=23.3987 instructions/cycle=2.42183 instructions/ns=10.3946 instructions/url=2.03239k ns/url=195.523 speed=442.25M/s time/byte=2.26117ns time/url=196.403ns url/s=5.09157M/s

after

BasicBench_AdaURL_CanParse   13263878 ns     13263297 ns           53 GHz=4.27177 cycle/byte=6.5079 cycles/url=565.271 instructions/byte=16.6601 instructions/cycle=2.55998 instructions/ns=10.9357 instructions/url=1.44709k ns/url=132.327 speed=655.048M/s time/byte=1.52661ns time/url=132.6ns url/s=7.54149M/s

src/parser.cpp Show resolved Hide resolved
@anonrig
Copy link
Member

anonrig commented Jun 8, 2024

Is this ready to land? @CarlosEduR @lemire?

@CarlosEduR
Copy link
Member Author

Is this ready to land? @CarlosEduR @lemire?

Yeah, I can say it is. There might be more places to avoid allocation still, but I don't think they'll change the perf that much.

@CarlosEduR CarlosEduR marked this pull request as ready for review June 9, 2024 23:42
@anonrig anonrig requested a review from lemire June 10, 2024 21:06
src/parser.cpp Outdated
url.update_base_search(helpers::substring(url_data, input_position),
query_percent_encode_set);
ada_log("QUERY update_base_search completed ");
if constexpr (store_values) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this if statement seems like a duplicate. there is already an if statement that checks for store values couple of lines ahove

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, updated!

@CarlosEduR
Copy link
Member Author

Last benchmark

before

BasicBench_AdaURL_CanParse   19773135 ns     19771058 ns           36 GHz=4.268 cycle/byte=9.65297 cycles/url=838.45 instructions/byte=23.3984 instructions/cycle=2.42396 instructions/ns=10.3455 instructions/url=2.03237k ns/url=196.45 speed=439.435M/s time/byte=2.27565ns time/url=197.661ns url/s=5.05916M/s

after

BasicBench_AdaURL_CanParse   13270228 ns     13269593 ns           53 GHz=4.29853 cycle/byte=6.50488 cycles/url=565.009 instructions/byte=16.6601 instructions/cycle=2.56117 instructions/ns=11.0093 instructions/url=1.44709k ns/url=131.442 speed=654.737M/s time/byte=1.52733ns time/url=132.663ns url/s=7.53791M/s

@anonrig anonrig merged commit 46ece6b into ada-url:main Jun 18, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants