Pull request overview
This PR rewrites the URL parsing/validation stack in pure Rust, removing the C++/FFI backend and adding a fast-path can_parse validator plus new internal modules (Unicode/IDNA, helpers, serializers, scheme hashing, optional portable SIMD).
Changes:
- Replace C++ FFI URLSearchParams + IDNA bindings with pure-Rust implementations.
- Add a zero-allocation URL validator used by `Url::can_parse` (plus a fast-path in the parser).
- Introduce new internal modules for scheme classification, Unicode utilities, IDNA, serializers, helpers, and optional `std::simd` acceleration.
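To make "zero-allocation" concrete: a validator of this kind can walk the input as a borrowed byte slice and return only a `bool` or an index, without ever building a `String`. The sketch below checks only the scheme grammar from the WHATWG URL Standard (an ASCII letter, then letters, digits, `+`, `-`, or `.`, terminated by `:`). It is not the code in src/validator.rs; `scheme_len` and `has_valid_scheme` are hypothetical names used for illustration.

```rust
/// Hypothetical helper (not from the PR): returns the length of a
/// syntactically valid scheme, excluding the trailing ':', or `None`.
/// Works on borrowed bytes only, so it performs no heap allocation.
fn scheme_len(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    // The first byte must be an ASCII letter; empty input yields None.
    if !bytes.first()?.is_ascii_alphabetic() {
        return None;
    }
    for (i, &b) in bytes.iter().enumerate().skip(1) {
        match b {
            // ':' ends the scheme; `i` is the number of scheme bytes.
            b':' => return Some(i),
            // Remaining scheme bytes: letters, digits, '+', '-', '.'.
            b if b.is_ascii_alphanumeric() || matches!(b, b'+' | b'-' | b'.') => {}
            // Any other byte before ':' means this is not a scheme.
            _ => return None,
        }
    }
    // Ran out of input without seeing ':'.
    None
}

/// Hypothetical wrapper: true when the input starts with a valid scheme.
fn has_valid_scheme(input: &str) -> bool {
    scheme_len(input).is_some()
}
```

A full `can_parse` has to go on to validate the authority, host, and path, and to handle relative input against a base; this fragment covers only the first step.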
Reviewed changes
Copilot reviewed 19 out of 23 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/validator.rs | New zero-allocation can_parse validator (scheme/authority/host/IDNA checks). |
| src/url_search_params.rs | Pure-Rust URLSearchParams implementation (parse/serialize/sort/iterators). |
| src/unicode.rs | Unicode helpers (percent-encode/decode, classification, domain ASCII conversion). |
| src/serializers.rs | IPv4/IPv6 serialization helpers. |
| src/scheme.rs | Perfect-hash special-scheme classification. |
| src/portable_simd_impl.rs | Nightly-only std::simd accelerated scanning paths. |
| src/parser.rs | Parser fast-paths and integration with the new validator. |
| src/idna_impl.rs | Pure-Rust IDNA (UTS #46) + punycode implementation. |
| src/idna.rs | Public IDNA wrapper updated to use pure-Rust implementation. |
| src/helpers.rs | New helper utilities used by parser. |
| src/checkers.rs | New validation/check helpers (IPv4, path signature, etc.). |
| src/character_sets.rs | Percent-encode character sets and lookup tables. |
| src/ffi.rs | Removed FFI bindings to Ada C++ backend. |
| build.rs | Removed C++ build integration. |
| Cargo.toml | Removed build-deps; added nightly-simd feature. |
| Cargo.lock | Updated dependency graph after removing build dependencies. |
| rust-toolchain.toml | Toolchain bump. |
| justfile | Exclude SIMD features from feature-powerset commands. |
| .github/workflows/ci.yml | Exclude nightly-simd from feature-powerset CI runs. |
| bench/parse.rs | Benchmark import cleanup. |
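
As an illustration of what "perfect-hash special-scheme classification" can look like: the six special schemes in the WHATWG URL Standard (`http`, `https`, `ws`, `wss`, `ftp`, `file`) can be placed in an 8-slot table so that one arithmetic hash plus one string comparison decides membership. The hash below, `(2 * len + first_byte) & 7`, happens to be collision-free over those six names. Whether src/scheme.rs uses this particular function is an assumption; `SPECIAL`, `slot`, and `is_special` are hypothetical names.

```rust
/// Special schemes, stored so that each name sits at index `slot(name)`.
/// Slots 1 and 7 are unused and hold the empty string.
const SPECIAL: [&str; 8] = ["http", "", "https", "ws", "ftp", "wss", "file", ""];

/// Hypothetical hash: (2 * length + first byte) masked to 3 bits.
/// Returns `None` for an empty scheme, which has no first byte.
fn slot(scheme: &str) -> Option<usize> {
    let first = *scheme.as_bytes().first()?;
    Some((2 * scheme.len() + first as usize) & 7)
}

/// True when `scheme` (already lowercased, without ':') is a special scheme.
/// A non-special scheme either lands on an empty slot or fails the comparison.
pub fn is_special(scheme: &str) -> bool {
    match slot(scheme) {
        Some(i) => SPECIAL[i] == scheme,
        None => false,
    }
}
```

The comparison after the hash is what keeps the lookup correct: the hash only selects a candidate, so unrelated schemes that collide with a special one are still rejected.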
@Boshen I've added more benchmarks. We will see if this holds true!
Merging this PR will degrade performance by 23.91%
@lemire it seems we can go a lot faster in C++ :-))
Not anymore @Boshen ada-url/ada#1106
Very good.
Benchmark results
Measured on the same 11-URL corpus as the README (808 bytes total).
[Benchmark tables not recovered. Surviving labels: `parse`, `can_parse`, `nightly-simd` (`std::simd`, nightly only), `url` crate.]

- `parse` is 25% faster than the C++ backend (stable), 28% faster with nightly SIMD.
- `can_parse` is 2.7× faster than the C++ backend, with zero heap allocations for any input.
- Versus the `url` crate: `parse` by 2.4×, `can_parse` by a wide margin.