Introduce SIMD optimized scanning for identifiers. #3122
Force-pushed from 2d575de to 6ce4cb1.
This is neat!
For UTF-8 handling, we're going to need to deal with things like Unicode whitespace characters, so we can't get away with classifying everything non-ASCII as continuing an identifier, and then (presumably) doing some later checking -- we'll need to actually decode the characters during tokenization.
As a result, I wonder if maybe the best thing to do would be to fall out of the fast vectorized identifier loop whenever we see any non-ASCII character, and have the slower scalar loop also deal with UTF-8 characters (and ideally jump back into vectorized code if an identifier contains a mixture of ASCII and non-ASCII characters, but that's a rare enough case that not optimizing for it is probably fine). I think by doing that we could simplify the vectorized loop slightly, and maybe remove one of the _mm_movemask calls.
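The loop structure being proposed can be sketched in scalar form. This is an illustrative stand-in, not Carbon's actual lexer: the names (`ScanIdentifier`, `IsAsciiIdentifierByte`) are hypothetical, the fast loop is the one the real code vectorizes over 16-byte chunks, and the UTF-8 slow path is elided.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical sketch of the suggested structure: a fast loop that handles
// runs of ASCII identifier characters (the real code would run this over
// 16-byte SSE chunks), which bails out to a scalar path as soon as it sees
// any byte >= 0x80, where full UTF-8 decoding and Unicode classification
// would live.

static bool IsAsciiIdentifierByte(uint8_t b) {
  return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
         (b >= '0' && b <= '9') || b == '_';
}

// Returns the length of the identifier prefix of `text`.
static size_t ScanIdentifier(std::string_view text) {
  size_t i = 0;
  // Fast path: stays here while bytes are plain ASCII identifier characters.
  while (i < text.size()) {
    uint8_t b = static_cast<uint8_t>(text[i]);
    if (b >= 0x80) {
      // Non-ASCII: fall out to the slow path, which would decode UTF-8 and
      // consult Unicode identifier-continue tables. As a placeholder, this
      // sketch simply ends the identifier here.
      break;
    }
    if (!IsAsciiIdentifierByte(b)) {
      break;
    }
    ++i;
  }
  return i;
}
```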
I'm happy to let zygoloid handle review here since it seems he understands the instructions better, but I did have one thing I'd noticed.
toolchain/lexer/tokenized_buffer.cpp
Outdated
/*0x0:*/ 0b0000'0000,
/*0x1:*/ 0b0000'0000,
/*0x2:*/ 0b0000'0000,
/*0x3:*/ 0b0000'0010,
/*0x4:*/ 0b0000'0100,
/*0x5:*/ 0b0000'1001,
/*0x6:*/ 0b0001'0000,
/*0x7:*/ 0b0010'0000,
/*0x8:*/ 0b1000'0000,
/*0x9:*/ 0b1000'0000,
/*0xA:*/ 0b1000'0000,
/*0xB:*/ 0b1000'0000,
/*0xC:*/ 0b1000'0000,
/*0xD:*/ 0b1000'0000,
/*0xE:*/ 0b1000'0000,
/*0xF:*/ 0b1000'0000);
const auto low_lut = _mm_setr_epi8(
/*0x0:*/ 0b1010'1010,
/*0x1:*/ 0b1011'1110,
/*0x2:*/ 0b1011'1110,
/*0x3:*/ 0b1011'1110,
/*0x4:*/ 0b1011'1110,
/*0x5:*/ 0b1011'1110,
/*0x6:*/ 0b1011'1110,
/*0x7:*/ 0b1011'1110,
/*0x8:*/ 0b1011'1110,
/*0x9:*/ 0b1011'1110,
/*0xA:*/ 0b1011'1100,
/*0xB:*/ 0b1001'0100,
/*0xC:*/ 0b1001'0100,
/*0xD:*/ 0b1001'0100,
/*0xE:*/ 0b1001'0100,
/*0xF:*/ 0b1001'0101);
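For readers puzzling over these constants: this is the classic nibble-based in-register lookup. `_mm_shuffle_epi8` treats one vector as a 16-entry table and the low 4 bits of each byte of another vector as indices, so one lookup indexed by each byte's high nibble (the first table above) and one indexed by its low nibble (`low_lut`) yield two masks whose intersection is nonzero exactly when the byte is in the class. Here is a scalar emulation using the exact tables from this snippet (the array names are mine):

```cpp
#include <cstdint>

// Scalar emulation of the two-table nibble lookup. The tables are copied
// verbatim from the snippet above: kHighLut is indexed by a byte's high
// nibble, kLowLut by its low nibble, and a byte is classified as an
// identifier character iff the looked-up masks share a bit.

static const uint8_t kHighLut[16] = {
    0b0000'0000, 0b0000'0000, 0b0000'0000, 0b0000'0010,
    0b0000'0100, 0b0000'1001, 0b0001'0000, 0b0010'0000,
    0b1000'0000, 0b1000'0000, 0b1000'0000, 0b1000'0000,
    0b1000'0000, 0b1000'0000, 0b1000'0000, 0b1000'0000};

static const uint8_t kLowLut[16] = {
    0b1010'1010, 0b1011'1110, 0b1011'1110, 0b1011'1110,
    0b1011'1110, 0b1011'1110, 0b1011'1110, 0b1011'1110,
    0b1011'1110, 0b1011'1110, 0b1011'1100, 0b1001'0100,
    0b1001'0100, 0b1001'0100, 0b1001'0100, 0b1001'0101};

static bool IsIdentifierByte(uint8_t b) {
  return (kHighLut[b >> 4] & kLowLut[b & 0xF]) != 0;
}
```

Note that every byte >= 0x80 maps to 0b1000'0000 in the high table and every low-table entry has bit 7 set, which is how this version classifies all non-ASCII bytes as continuing an identifier. In the vectorized loop the same test runs on 16 bytes at once: two `_mm_shuffle_epi8` lookups (the high-nibble one after a shift), an `_mm_and_si128`, and a compare plus `_mm_movemask_epi8` to locate the first byte outside the class.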
What do you think of switching to line comments for style? We'd been going that way in C++ because Carbon wouldn't allow /* */ comments like this.
These are already long and hard to read, and I think line comments would make that even worse.
What I'd prefer is something more like a designated initializer. I think we'll have good options for this in Carbon, but in C++ it didn't seem worth trying to find one, and instead to just use a comment.
Maybe instead put this in that kind of form, then? It looks like parameters are named b0 through b15:
_mm_setr_epi8(char __b0, char __b1, char __b2, char __b3, char __b4, char __b5,
char __b6, char __b7, char __b8, char __b9, char __b10,
char __b11, char __b12, char __b13, char __b14, char __b15) {
So wouldn't the style be /*__b0=*/, etc?
Note, if you don't want to do this, it's probably worth thinking about what it means for Carbon if a function in a library that we don't own has parameter names you don't find useful.
I'll try this and see if I can get the parameter name approach to work well. Even though they're not the most readable, this would have actually caught a bug I wrote the first time (I used _mm_set_epi8 instead).
That said, not sure it makes sense to generalize too much to Carbon here -- I feel like there are a bunch of overlapping challenges from C++ here and we might better off just working with Carbon code that ends up with bit tables and similar constructs in its API design, and see what the best tools are or what changes would give better results there.
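For reference, the bug class here comes from the two intrinsics' argument orders: per the Intel intrinsics documentation, `_mm_setr_epi8` takes element 0 first while `_mm_set_epi8` takes element 15 first, so swapping one for the other silently reverses a lookup table. A portable sketch with scalar stand-ins (the `SetrEpi8`/`SetEpi8` helpers are mine and model only the argument order, nothing else):

```cpp
#include <array>
#include <cstdint>

// Scalar stand-ins modeling the argument-order difference between
// _mm_setr_epi8 (memory order: first argument becomes element 0) and
// _mm_set_epi8 (reversed: first argument becomes element 15).

using Bytes16 = std::array<uint8_t, 16>;

template <typename... Ts>
static Bytes16 SetrEpi8(Ts... bytes) {  // element 0 first, like _mm_setr_epi8
  return Bytes16{static_cast<uint8_t>(bytes)...};
}

template <typename... Ts>
static Bytes16 SetEpi8(Ts... bytes) {  // element 15 first, like _mm_set_epi8
  Bytes16 tmp{static_cast<uint8_t>(bytes)...};
  Bytes16 out{};
  for (int i = 0; i < 16; ++i) {
    out[i] = tmp[15 - i];  // same arguments land in reversed positions
  }
  return out;
}
```

This is also why the /*__b0=*/ argument-comment style pays off: tools such as clang-tidy's bugprone-argument-comment check can then flag a call whose comments don't match the callee's parameter names.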
PTAL!
Currently, for long identifiers, a huge (>30%) fraction of time is spent finding the end of the identifier. We can speed this up with a fun application of SIMD and in-register lookup tables.

With this, the BM_ValidIdentifiers/12/64 benchmark goes from around 4 million tokens/second to about 6 million, roughly a 1.5x improvement. However, there was a decent amount of noise in the measurement and I didn't study it too closely as I was very happy with the overall result. The profile shifted from >30% of the time in this loop to <10% of the time, so the scan itself is 3x or more faster with this.

One concern with optimizing the lexer right now is that we don't have full Unicode support from the design. I've taken some steps to at least try to avoid this pitfall -- the new routine has a built-in system to classify UTF-8 code units, hopefully ensuring that adding fuller support doesn't require a completely new technique.

Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Force-pushed from 99b2531 to ebd7671.
Feels like we might be able to do ... something ... better now we're only using 4 bits of mask, but I don't see it. (We could fold the two lookup tables into one, which would save a register but add a shift and mask into the loop, which I suspect isn't a win.)
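A scalar sketch of the fold being weighed here, assuming each table now only needs the low 4 bits of mask: pack one table into the high nibble and the other into the low nibble of a single 16-entry table, at the cost of an extra shift and mask in the loop. The 4-bit tables below are invented for illustration; the real revised tables aren't shown in this thread.

```cpp
#include <cstdint>

// Hypothetical 4-bit classification tables (illustrative only).
static const uint8_t kHi4[16] = {0, 0, 0, 1, 2, 4,  8,  8,
                                 15, 15, 15, 15, 15, 15, 15, 15};
static const uint8_t kLo4[16] = {5, 15, 15, 15, 15, 15, 15, 15,
                                 15, 15, 13, 4,  4,  4,  4,  5};

// Baseline: two separate 16-entry tables, one lookup each.
static bool ClassifyTwoTables(uint8_t b) {
  return (kHi4[b >> 4] & kLo4[b & 0xF]) != 0;
}

// Folded: one 16-entry table holding the high-nibble mask in bits 7..4 and
// the low-nibble mask in bits 3..0. Saves a table register, but adds a
// shift and a mask to extract the two halves.
static bool ClassifyFolded(uint8_t b) {
  // In real code this table would be built once, outside the hot loop.
  uint8_t folded[16];
  for (int i = 0; i < 16; ++i) {
    folded[i] = static_cast<uint8_t>((kHi4[i] << 4) | kLo4[i]);
  }
  return (((folded[b >> 4] >> 4) & (folded[b & 0xF] & 0xF)) != 0);
}
```

Because each original table fits in 4 bits, the folded form is exactly equivalent to the two-table form for every input byte; whether dropping a register is worth the extra shift-and-mask per chunk is the trade-off the comment above suspects isn't a win.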
Not merging: the initial comment is probably not quite the commit message we want. (At least the reference to #3121 should be removed.) But this looks good to merge to me.
Currently, for long identifiers, a huge (>30%) fraction of time is spent
finding the end of the identifier. We can speed this up with a fun
application of SIMD and in-register lookup tables.
With this, the BM_ValidIdentifiers/12/64 benchmark goes from around 4 million
tokens/second to around 6 million, roughly a 1.5x improvement. However,
there was a decent amount of noise in the measurement and I didn't study
it too closely as I was very happy with the overall result. The profile
shifted from >30% of the time in this loop to <10% of the time, so the
scan itself is 3x or more faster with this.
One concern with optimizing the lexer right now is that we don't have
full Unicode support from the design. This PR takes some steps to at least
try to avoid this pitfall -- the new routine classifies UTF-8 code units
and falls back to a slower path for them, where the logic needed for full
Unicode support can grow.