rewrite in rust #89

Draft
anonrig wants to merge 4 commits into main from yagiz/rewrite-rust

@anonrig (Member) commented Mar 28, 2026

Benchmark results

Measured on the same 11-URL corpus as the README (808 bytes total).

| Implementation | parse | can_parse |
| --- | --- | --- |
| C++ backend (README baseline) | 2.016 µs (382 MiB/s) | 1.212 µs (636 MiB/s) |
| Pure Rust (stable, no feature flags) | 1.503 µs (513 MiB/s) | 0.454 µs (1697 MiB/s) |
| Pure Rust + nightly-simd (std::simd, nightly only) | 1.457 µs (529 MiB/s) | 0.511 µs (1508 MiB/s) |
| servo url crate | 3.562 µs (216 MiB/s) | |

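  • parse takes 25% less time than the C++ backend on stable (1.503 µs vs 2.016 µs), and 28% less with nightly SIMD (1.457 µs)
  • can_parse is 2.7× faster than the C++ backend (0.454 µs vs 1.212 µs on stable), with zero heap allocations for any input
  • Both beat the servo url crate: parse by 2.4× (1.503 µs vs 3.562 µs), can_parse by a wide margin (the table gives no servo figure for can_parse)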
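
As a cross-check of the figures above, here is a minimal sketch of the arithmetic that relates per-corpus time to throughput and speedup. It assumes the MiB/s values are the 808-byte corpus size divided by the measured time, which matches the reported numbers; the function names are hypothetical and not part of the crate.

```rust
/// Bytes in one mebibyte (MiB).
const MIB: f64 = 1024.0 * 1024.0;

/// Throughput in MiB/s when `bytes` are processed in `micros` microseconds.
fn mib_per_sec(bytes: f64, micros: f64) -> f64 {
    bytes / (micros * 1e-6) / MIB
}

/// How many times faster the new timing is than the baseline.
fn speedup(base_micros: f64, new_micros: f64) -> f64 {
    base_micros / new_micros
}

/// Fraction of time saved relative to the baseline (0.25 means 25% less time).
fn time_saved(base_micros: f64, new_micros: f64) -> f64 {
    1.0 - new_micros / base_micros
}
```

For example, `mib_per_sec(808.0, 1.503)` rounds to the 513 MiB/s reported for stable Rust parse, and `speedup(1.212, 0.454)` is roughly 2.7.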

@anonrig force-pushed the yagiz/rewrite-rust branch 4 times, most recently from dbaf67d to 53f804d (March 29, 2026 12:30)

Copilot AI left a comment

Pull request overview

This PR rewrites the URL parsing/validation stack in pure Rust, removing the C++/FFI backend and adding a fast-path can_parse validator plus new internal modules (Unicode/IDNA, helpers, serializers, scheme hashing, optional portable SIMD).

Changes:

  • Replace C++ FFI URLSearchParams + IDNA bindings with pure-Rust implementations.
  • Add a zero-allocation URL validator used by Url::can_parse (plus a fast-path in the parser).
  • Introduce new internal modules for scheme classification, Unicode utilities, IDNA, serializers, helpers, and optional std::simd acceleration.
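
To illustrate what "zero-allocation" validation can look like, here is a minimal sketch that checks only the scheme prefix by scanning borrowed bytes. It is not the PR's validator: the function name is hypothetical, it covers only the scheme grammar (an ASCII letter followed by letters, digits, `+`, `-`, or `.`, terminated by `:`), and it skips the whitespace stripping and every later stage a real URL parser performs.

```rust
/// Returns true if `input` begins with a syntactically valid scheme
/// followed by ':'. Works on the borrowed bytes only, so it never
/// allocates. (Hypothetical helper, not taken from the PR.)
fn has_valid_scheme(input: &str) -> bool {
    let bytes = input.as_bytes();

    // A scheme must start with an ASCII letter.
    match bytes.first() {
        Some(b) if b.is_ascii_alphabetic() => {}
        _ => return false,
    }

    // Remaining scheme characters: ASCII alphanumerics, '+', '-', '.'.
    for &b in &bytes[1..] {
        match b {
            b':' => return true, // end of scheme reached
            b'+' | b'-' | b'.' => {}
            _ if b.is_ascii_alphanumeric() => {}
            _ => return false, // any other byte cannot appear in a scheme
        }
    }

    // Ran out of input without seeing ':'.
    false
}
```

A check like this rejects inputs such as `//example.com` or `1http://x` before any parsing state is built, which is the general idea behind a validation fast path.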

Reviewed changes

Copilot reviewed 19 out of 23 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/validator.rs | New zero-allocation can_parse validator (scheme/authority/host/IDNA checks). |
| src/url_search_params.rs | Pure-Rust URLSearchParams implementation (parse/serialize/sort/iterators). |
| src/unicode.rs | Unicode helpers (percent-encode/decode, classification, domain ASCII conversion). |
| src/serializers.rs | IPv4/IPv6 serialization helpers. |
| src/scheme.rs | Perfect-hash special-scheme classification. |
| src/portable_simd_impl.rs | Nightly-only std::simd accelerated scanning paths. |
| src/parser.rs | Parser fast-paths and integration with the new validator. |
| src/idna_impl.rs | Pure-Rust IDNA (UTS #46) + punycode implementation. |
| src/idna.rs | Public IDNA wrapper updated to use pure-Rust implementation. |
| src/helpers.rs | New helper utilities used by parser. |
| src/checkers.rs | New validation/check helpers (IPv4, path signature, etc.). |
| src/character_sets.rs | Percent-encode character sets and lookup tables. |
| src/ffi.rs | Removed FFI bindings to Ada C++ backend. |
| build.rs | Removed C++ build integration. |
| Cargo.toml | Remove build-deps; add nightly-simd feature. |
| Cargo.lock | Updated dependency graph after removing build dependencies. |
| rust-toolchain.toml | Toolchain bump. |
| justfile | Exclude SIMD features from feature-powerset commands. |
| .github/workflows/ci.yml | Exclude nightly-simd from feature-powerset CI runs. |
| bench/parse.rs | Benchmark import cleanup. |
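
The "perfect-hash special-scheme classification" listed for src/scheme.rs can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the hash function, table layout, and names are hypothetical, chosen so that the six special schemes of the WHATWG URL Standard (http, https, ws, wss, ftp, file) each land in a distinct slot of an 8-entry table. The input is assumed to be already lowercased.

```rust
/// Eight-slot table; empty strings mark unused slots.
/// (Hypothetical layout chosen to match the hash below.)
const SPECIAL_SCHEMES: [&str; 8] = ["http", "", "https", "ws", "ftp", "wss", "file", ""];

/// Maps a scheme to a table slot from its length and first byte.
/// Returns None for the empty string.
fn slot(scheme: &str) -> Option<usize> {
    let first = *scheme.as_bytes().first()?;
    Some((2 * scheme.len() + first as usize) & 7)
}

/// True if `scheme` (assumed lowercase) is one of the special schemes.
/// One hash plus one string comparison, with no loop over candidates.
fn is_special(scheme: &str) -> bool {
    match slot(scheme) {
        Some(i) => SPECIAL_SCHEMES[i] == scheme,
        None => false,
    }
}
```

Because the hash is collision-free over the six known schemes, a single comparison against the selected slot is enough; any non-special scheme either lands on an empty slot or fails the comparison.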

@Boshen (Collaborator) left a comment

Faster than C++ how?

@anonrig (Member, Author) commented Mar 29, 2026

> Faster than C++ how?

@Boshen I've added more benchmarks. We will see if this holds true!

@anonrig force-pushed the yagiz/rewrite-rust branch from 53f804d to 1dec7f6 (March 29, 2026 16:58)
@anonrig force-pushed the yagiz/rewrite-rust branch from 1dec7f6 to cded9e7 (March 29, 2026 16:58)
codspeed-hq bot commented Mar 29, 2026

Merging this PR will degrade performance by 23.91%

⚡ 7 improved benchmarks
❌ 1 regressed benchmark
✅ 8 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | ada_url | 5.6 ms | 3.8 ms | +47.82% |
| Simulation | ada_url | 16.7 µs | 13.2 µs | +26.64% |
| Simulation | ada_url | 1.3 ms | 1.7 ms | -23.91% |
| Simulation | ada_url | 11.9 µs | 3.3 µs | ×3.6 |
| Simulation | ada_url_parse | 196 ms | 165.3 ms | +18.6% |
| Simulation | ada_url | 204.8 µs | 142.1 µs | +44.16% |
| Simulation | ada_url | 18.1 µs | 6.8 µs | ×2.7 |
| Simulation | ada_url | 1,336.4 µs | 884.2 µs | +51.14% |

Comparing yagiz/rewrite-rust (d562027) with main (4df14c4)
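
The Efficiency figures in the report appear consistent with BASE / HEAD − 1 for percentages and BASE / HEAD for the "×" entries. That formula is inferred from the numbers shown here, not taken from CodSpeed's documentation, and small differences are expected because BASE and HEAD are displayed rounded. A minimal sketch with hypothetical function names:

```rust
/// Relative change as a fraction: positive means HEAD is faster than BASE.
/// (Inferred formula; an assumption, not CodSpeed's documented definition.)
fn relative_change(base: f64, head: f64) -> f64 {
    base / head - 1.0
}

/// Plain speed ratio, as used for the "×" entries.
fn speed_ratio(base: f64, head: f64) -> f64 {
    base / head
}
```

With the last row, `relative_change(1336.4, 884.2)` is about 0.5114, matching the reported +51.14%. The regressed row gives `relative_change(1.3, 1.7)` of about −0.235, close to the reported −23.91% given the rounding of the displayed timings.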

@anonrig (Member, Author) commented Mar 29, 2026

@lemire it seems we can go a lot faster in C++ :-))

@anonrig force-pushed the yagiz/rewrite-rust branch 6 times, most recently from d5fdd9a to 899c866 (March 29, 2026 18:12)
@anonrig (Member, Author) commented Mar 29, 2026

> Faster than C++ how?

Not anymore @Boshen ada-url/ada#1106

@anonrig force-pushed the yagiz/rewrite-rust branch from 899c866 to d562027 (March 29, 2026 18:54)
@Boshen (Collaborator) commented Mar 30, 2026

> > Faster than C++ how?
>
> Not anymore @Boshen ada-url/ada#1106

very good
