Commit 32c3ddc

docs: design rationale page
1 parent 2c6cdc6 commit 32c3ddc

4 files changed: 81 additions and 3 deletions

doc/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -23,5 +23,6 @@
 ** xref:examples/file-router.adoc[]
 ** xref:examples/router.adoc[]
 ** xref:examples/sanitize.adoc[]
+* xref:design.adoc[]
 * xref:reference.adoc[Reference]
 * xref:HelpCard.adoc[]

doc/modules/ROOT/pages/design.adoc

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+//
+// Copyright (c) 2023 Alan de Freitas (alandefreitas@gmail.com)
+//
+// Distributed under the Boost Software License, Version 1.0. (See accompanying
+// file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt)
+//
+// Official repository: https://github.com/boostorg/url
+//
+
+= Design Rationale
+:navtitle: Design Rationale
+
+This section documents the rationale behind design decisions in Boost.URL that are not obvious from the API alone.
+For a general overview of the library's goals and features, see the xref:index.adoc[introduction].
+
+== Character Type
+
+Boost.URL uses `char` as its character type.
+The library does not provide class templates parameterized on character type (e.g. `basic_url_view<CharT>`).
+
+URLs are sequences of ASCII octets as defined by https://tools.ietf.org/html/rfc3986[RFC 3986,window=blank_].
+In practice, URLs are always handled as `char` strings: in HTTP headers, in JSON, in configuration files, and in every major programming language's URL library.
+Wide character types (`wchar_t`, `char16_t`, `char32_t`) are not used for URLs in any real-world context, so supporting them would add complexity with no practical benefit.
+
+This also means the library does not provide a `char8_t` (C++20) instantiation.
+While `char8_t` is portably correct for ASCII/UTF-8 text, its adoption in the C++ ecosystem remains limited: the standard library does not fully support it for I/O or formatting, and no major framework has adopted it in public APIs.
+Using `char` means Boost.URL interoperates directly with `std::string`, `std::string_view`, string literals, and the rest of the ecosystem without conversion.
+
+=== EBCDIC
+
+The C++ standard does not require that `char` use an ASCII-compatible encoding.
+On EBCDIC platforms (primarily IBM z/OS), the character literal `'/'` does not have the value `0x2F`, so a URL parser that compares `char` values against ASCII constants would malfunction.
+
+In practice, this is not a concern for Boost.URL:
+
+* z/OS is the only remaining platform where EBCDIC is relevant for C++ compilation.
+* The z/OS C++ compilers support an ASCII compilation mode (`-qascii` or `-fzos-le-char-mode=ascii`) that makes `char` literals use ASCII values. This mode exists specifically for open-source software that assumes ASCII.
+* Real-world C++ libraries that handle URLs and HTTP on z/OS (such as cpp-httplib and DuckDB) use this ASCII mode rather than adding EBCDIC transcoding.
+* The z/OS REST and web services ecosystem is almost entirely Java-based. No evidence exists of C++ code parsing RFC 3986 URIs in EBCDIC `char` encoding.
+* WG21 is moving in this direction as well: P3688 (ASCII character utilities) proposes `char`-based functions that treat input as ASCII regardless of literal encoding.
+
+On EBCDIC platforms where ASCII mode is not used, `char8_t` provides a portably correct alternative since it is guaranteed to use UTF-8 (an ASCII superset).
+A future extension to support `char8_t` constructor overloads on the concrete `char`-based types could address this without requiring templates, since both `char` and `char8_t` are single-byte types and the conversion between them is trivial for ASCII content.
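The distinction drawn in the EBCDIC section, between comparing against a character literal and comparing against an ASCII code point, can be illustrated with a small standalone sketch. The helper names `is_ascii_slash` and `count_slashes` are hypothetical and are not part of Boost.URL; the sketch only assumes that the input buffer holds ASCII octets, as a URL received over the network would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.URL.
// Compares against the numeric ASCII code point 0x2F instead of the
// literal '/', so the result does not depend on the literal encoding
// chosen by the compiler.
constexpr bool is_ascii_slash(char c) noexcept
{
    return static_cast<unsigned char>(c) == 0x2F;
}

// Counts ASCII slashes in a buffer that is assumed to hold ASCII octets.
std::size_t count_slashes(const std::string& ascii_bytes)
{
    std::size_t n = 0;
    for (char c : ascii_bytes)
        if (is_ascii_slash(c))
            ++n;
    return n;
}
```

With an ASCII compilation mode, `c == '/'` and `is_ascii_slash(c)` agree; the numeric form is only one possible way to remain correct when they do not.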
+
+== No Dynamic Allocation by Default
+
+The library is designed so that most operations do not require dynamic memory allocation.
+
+cpp:url_view[] does not retain ownership of the underlying string buffer and does not allocate memory.
+Like a cpp:string_view[], it references the original string directly.
+As long as the contents of the original string are unmodified, constructed URL views always contain a valid URL in its correctly serialized form.
+
+Accessor functions return views referring to substrings and sub-ranges of the underlying URL.
+By referencing the relevant portion of the URL string internally, components can represent percent-decoded strings and be converted to other types without allocation.
+cpp:decode_view[] and its decoding functions perform no memory allocations unless the result needs to be stored in another container.
+Objects can be recycled to reuse their memory, deferring allocations until the application actually needs them.
+
+This makes the library suitable for performance-sensitive network programs and embedded devices.
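The idea of working with percent-decoded text without allocating can be sketched independently of the library. The functions `hex_value` and `decoded_equals` below are hypothetical and do not reflect how cpp:decode_view[] is implemented; they only show that an encoded string can be compared with a plain string by decoding one character at a time, so no decoded copy is ever stored. The sketch assumes C++17 for `std::string_view`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration; these are not Boost.URL functions.

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hex_value(char c) noexcept
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Compares a percent-encoded string with a plain string by decoding on
// the fly. No buffer is allocated for the decoded text. A malformed
// escape such as "%4" or "%zz" makes the comparison fail.
inline bool decoded_equals(std::string_view encoded,
                           std::string_view plain) noexcept
{
    std::size_t i = 0; // position in the encoded input
    std::size_t j = 0; // position in the plain text
    while (i < encoded.size())
    {
        char c = encoded[i];
        if (c == '%')
        {
            if (encoded.size() - i < 3)
                return false; // truncated escape
            int hi = hex_value(encoded[i + 1]);
            int lo = hex_value(encoded[i + 2]);
            if (hi < 0 || lo < 0)
                return false; // not two hex digits
            c = static_cast<char>(hi * 16 + lo);
            i += 3;
        }
        else
        {
            ++i;
        }
        if (j >= plain.size() || plain[j] != c)
            return false;
        ++j;
    }
    return j == plain.size();
}
```

A caller that only needs a comparison never pays for a decoded copy; an allocation becomes necessary only when the decoded text has to outlive the original buffer.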
+
+== Error Handling
+
+The library uses error codes rather than exceptions as its primary error reporting mechanism.
+If input does not match the URL grammar, an error code is reported through cpp:result[] rather than throwing.
+This allows the library to be used in environments that disable exceptions (`-fno-exceptions`), which is detected automatically.
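The error-code style described in the Error Handling section can be shown with a minimal standard-library sketch. The function `parse_port` is hypothetical and is not the Boost.URL API, which reports errors through cpp:result[]; the sketch uses a plain `std::error_code` out-parameter to show the same principle of reporting a grammar mismatch without throwing.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical function, not the Boost.URL API.
// Parses a non-empty decimal port number in the range 0..65535.
// On failure it sets ec and returns 0; it never throws.
unsigned parse_port(const std::string& s, std::error_code& ec) noexcept
{
    ec.clear();
    if (s.empty())
    {
        ec = std::make_error_code(std::errc::invalid_argument);
        return 0;
    }
    unsigned value = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
        {
            // Input does not match the grammar.
            ec = std::make_error_code(std::errc::invalid_argument);
            return 0;
        }
        value = value * 10 + static_cast<unsigned>(c - '0');
        if (value > 65535)
        {
            // Matches the grammar but exceeds the port range.
            ec = std::make_error_code(std::errc::result_out_of_range);
            return 0;
        }
    }
    return value;
}
```

Because the function is `noexcept` and reports every failure through `ec`, it remains usable when the program is built with exceptions disabled.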
+
+== URL Validity Invariant
+
+All modifications to a cpp:url[] leave it in a valid state.
+It is not possible for a cpp:url[] to hold syntactically illegal text.
+All modifying functions perform validation on their input: attempting to set the scheme or port to an invalid string results in an exception, while other components are automatically percent-encoded as needed.
+All non-const operations offer the strong exception safety guarantee.
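The combination of validation and the strong exception safety guarantee can be sketched with a small standalone class. `scheme_holder` is hypothetical and is not a Boost.URL type; it only assumes the scheme grammar from RFC 3986 (a letter followed by letters, digits, `+`, `-`, or `.`) and shows a setter that validates first and commits last, so a failed call leaves the object unchanged.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical class for illustration; it is not a Boost.URL type.
class scheme_holder
{
    std::string scheme_ = "http";

    static bool is_alpha(char c) noexcept
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static bool is_digit(char c) noexcept
    {
        return c >= '0' && c <= '9';
    }

    // scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    static bool is_valid(const std::string& s) noexcept
    {
        if (s.empty() || !is_alpha(s[0]))
            return false;
        for (char c : s)
            if (!is_alpha(c) && !is_digit(c) &&
                c != '+' && c != '-' && c != '.')
                return false;
        return true;
    }

public:
    const std::string& scheme() const noexcept { return scheme_; }

    // Validates before modifying. If validation or the copy throws,
    // scheme_ still holds its previous, valid value.
    void set_scheme(const std::string& s)
    {
        if (!is_valid(s))
            throw std::invalid_argument("invalid scheme");
        std::string tmp(s); // may throw; scheme_ is untouched so far
        scheme_.swap(tmp);  // commit step, does not throw
    }
};
```

The object can therefore never be observed holding text that violates the grammar, which is the invariant the section describes for cpp:url[].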
+
+== No IRIs
+
+The library does not handle https://www.rfc-editor.org/rfc/rfc3987.html[Internationalized Resource Identifiers,window=blank_] (IRIs).
+IRIs are different from URLs: they come from Unicode strings instead of low-ASCII strings and are covered by a separate specification.

doc/modules/ROOT/pages/index.adoc

Lines changed: 2 additions & 1 deletion
@@ -28,6 +28,7 @@ While the library is general purpose, special care has been taken to ensure that
 Interfaces are provided for using error codes instead of exceptions as needed, and most algorithms have the means to opt out of dynamic memory allocation.
 Another feature of the library is that all modifications leave the URL in a valid state.
 Code which uses this library is easy to read, flexible, and performant.
+See the xref:design.adoc[design rationale] for more on these design principles.
 
 Boost.URL offers these features:
 
@@ -42,7 +43,7 @@ Boost.URL offers these features:
 
 [NOTE]
 ====
-Currently the library does not handle
+The library does not handle
 https://www.rfc-editor.org/rfc/rfc3987.html[Internationalized Resource Identifiers,window=blank_] (IRIs).
 These are different from URLs, come from Unicode strings instead of low-ASCII strings, and are covered by a separate specification.
 ====

doc/modules/ROOT/pages/quicklook.adoc

Lines changed: 2 additions & 2 deletions
@@ -234,8 +234,8 @@ id=42&name=John Doe Jingleheimer-Schmidt
 --
 ====
 
-cpp:decode_view[] and its decoding functions are designed to perform no memory allocations unless the algorithm where it's being used needs the result to be in another container.
-The design also permits recycling objects to reuse their memory, and at least minimize the number of allocations by deferring them until the result is in fact needed by the application.
+cpp:decode_view[] and its decoding functions perform no memory allocations unless the result needs to be stored in another container.
+Objects can be recycled to reuse their memory, deferring allocations until the application actually needs them.
 
 In the example above, the memory owned by `str` can be reused to store other results.
 This is also useful when manipulating URLs:
