All notable changes to this project will be documented in this file.
- Exclude more files from final package to significantly reduce package size.
- Hide usage of assert2 in doc examples to make them slightly clearer for users not familiar with it.
- Upgrade indirect dependency rustix to fix a security vulnerability in directory iterators. This does not affect htmlize, since rustix is only used by development dependencies.
- Enabled feature marks on docs.rs to make it clearer what features are required by what functions.
- Clarified ownership and licensing of entities.json data file.
- Fix building with the `unescape` feature but not `unescape_fast`. Added tests for a few common feature flags (in addition to `--all-features`) to the CI check to avoid this sort of problem in the future.
- Fix docs.rs build to enable the `unescape` and `entities` features.
- Hid `unescape()` behind the `unescape` feature. This allows users to avoid the dependency on phf and the build dependency on serde_json, which cuts build times on my machine by more than 90% (from 6.2 seconds to 0.5 seconds).
- Hid `ENTITIES` behind the `entities` feature for the same reason I added the `unescape` feature. Note that the `unescape` feature automatically enables the `entities` feature, but `unescape_faster` does not.
- Switched both escape and unescape functions to use `Cow<'a, str>` for input and output. This allows for significant performance improvements when the input can be returned unchanged.
- Updated minimum supported Rust version (MSRV) to 1.60.
- Significantly optimized both escape and unescape functions. Many of the improvements to the escape functions are similar to the ones outlined in Lise Henry’s excellent post on optimizing HTML entity escaping (see also: its Reddit discussion), though most notably I’m using memchr directly rather than regex.
- Added `unescape_faster` feature for even faster unescaping at the cost of longer build times (about 30 seconds longer on my machine).
- Added `unescape_attribute()` to handle the special rules for dealing with entities in the value of an HTML attribute. Also added `unescape_in()`, which takes a context parameter that can be either `Context::Attribute` or `Context::General` (for everything else).
- Added `unescape_bytes_in()` to work on `[u8]` rather than `str`.
- Added `escape_..._bytes()` functions to work on `[u8]` rather than `str`.
- Switched to the phf_codegen crate instead of using the `phf_map!` macro. On my machine, this cuts build time by about 25% (~2 seconds).
- Clarified documentation of `ENTITIES` to indicate that it’s a `Map`, not just a collection of tuples.
- `unescape()` incorrectly outputted the replacement character (U+FFFD “�”) for certain numeric entities:
  - Noncharacters
  - Control characters
  - `0x0D` (carriage return)

  A close reading of the spec and some browser testing shows that behavior to be incorrect. Those characters are now outputted as themselves.
- `unescape()` incorrectly outputted long numeric entities as the literal text of the entity. A close reading of the spec and some browser testing shows that behavior to be incorrect. Those long entities are now outputted as the replacement character (U+FFFD “�”).
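The switch to `Cow<'a, str>` pays off because a function can hand its input straight back, with no allocation, when nothing needs escaping. The following is a minimal sketch of that pattern, assuming only `&`, `<`, and `>` need escaping in text. `escape_text_sketch` is a hypothetical name, not htmlize's API, and it uses a plain byte scan where the crate itself uses memchr.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not htmlize's API): escape `&`, `<`, and `>` in text,
/// borrowing the input unchanged when there is nothing to escape.
fn escape_text_sketch<'a, S: Into<Cow<'a, str>>>(input: S) -> Cow<'a, str> {
    let input: Cow<'a, str> = input.into();

    // Byte offset of the first character that needs escaping, if any.
    let first = input.bytes().position(|b| matches!(b, b'&' | b'<' | b'>'));
    let first = match first {
        Some(i) => i,
        // Nothing to escape: return the input as-is, without allocating.
        None => return input,
    };

    // Copy the clean prefix in one go, then escape the remainder.
    let mut out = String::with_capacity(input.len() + 8);
    out.push_str(&input[..first]);
    for c in input[first..].chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    Cow::Owned(out)
}
```

Calling `escape_text_sketch("plain text")` yields a `Cow::Borrowed` pointing at the original string, while `escape_text_sketch("a < b")` allocates and yields `Cow::Owned("a &lt; b")`.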
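`unescape_attribute()` exists because HTML treats a named entity that lacks its trailing semicolon differently inside an attribute value. There, if the next character is `=` or an ASCII letter or digit, the text is left alone. That way a URL like `?x=1&copy=2` survives intact. The following is a minimal sketch of just that rule. It assumes a hypothetical two-entry table of legacy names (the real table is much longer, and the real parser picks the longest matching name), and `LEGACY` and `decode_legacy` are illustrative names, not htmlize's API.

```rust
/// Hypothetical subset of the legacy entity names that browsers accept
/// even without a trailing `;`.
const LEGACY: &[(&str, char)] = &[("amp", '&'), ("copy", '©')];

/// Try to decode a named entity at the start of `input` (the text just after `&`).
/// Returns the decoded char and the number of bytes consumed, or `None` if the
/// text should be left as-is.
fn decode_legacy(input: &str, in_attribute: bool) -> Option<(char, usize)> {
    let &(name, ch) = LEGACY.iter().find(|(name, _)| input.starts_with(*name))?;
    let after = &input[name.len()..];

    // With a semicolon, the entity is decoded in every context.
    if after.starts_with(';') {
        return Some((ch, name.len() + 1));
    }

    // Without one, an attribute value keeps the text literal when the next
    // character is `=` or an ASCII alphanumeric.
    if in_attribute {
        if let Some(c) = after.chars().next() {
            if c == '=' || c.is_ascii_alphanumeric() {
                return None;
            }
        }
    }
    Some((ch, name.len()))
}
```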
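The two numeric-entity fixes follow the WHATWG HTML rules for turning a numeric character reference into a character. The following is a rough sketch of those rules, assuming the digits have already been split out of the reference. `decode_numeric_sketch` and `windows_1252_c1` are hypothetical names, not htmlize functions, and the sketch does not report the parse errors that the spec records for noncharacters and control characters. It also includes the spec's remapping of most C1 control codes to their windows-1252 characters, which the entries above do not mention.

```rust
/// Hypothetical helper: decode the digits of a numeric character reference
/// (the text between `&#` or `&#x` and `;`) following the WHATWG HTML rules.
/// Returns `None` if `digits` is not a valid number in `radix` (10 or 16).
fn decode_numeric_sketch(digits: &str, radix: u32) -> Option<char> {
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    // Saturate instead of overflowing, so an arbitrarily long reference such as
    // `&#99999999999999999999;` ends up out of range rather than failing to parse.
    let code = digits.chars().fold(0u32, |acc, c| {
        acc.saturating_mul(radix)
            .saturating_add(c.to_digit(radix).unwrap())
    });
    Some(match code {
        // NUL, surrogates, and values above U+10FFFF become U+FFFD.
        0 | 0xD800..=0xDFFF | 0x11_0000..=u32::MAX => '\u{FFFD}',
        // C1 controls with a windows-1252 equivalent are remapped; the rest stay as-is.
        0x80..=0x9F => windows_1252_c1(code).unwrap_or_else(|| char::from_u32(code).unwrap()),
        // Everything else (noncharacters, other control characters, 0x0D) is output
        // as itself; every value left here is a valid Unicode scalar value.
        _ => char::from_u32(code).unwrap(),
    })
}

/// The spec's table mapping C1 control code points to windows-1252 characters.
fn windows_1252_c1(code: u32) -> Option<char> {
    Some(match code {
        0x80 => '\u{20AC}', 0x82 => '\u{201A}', 0x83 => '\u{0192}', 0x84 => '\u{201E}',
        0x85 => '\u{2026}', 0x86 => '\u{2020}', 0x87 => '\u{2021}', 0x88 => '\u{02C6}',
        0x89 => '\u{2030}', 0x8A => '\u{0160}', 0x8B => '\u{2039}', 0x8C => '\u{0152}',
        0x8E => '\u{017D}', 0x91 => '\u{2018}', 0x92 => '\u{2019}', 0x93 => '\u{201C}',
        0x94 => '\u{201D}', 0x95 => '\u{2022}', 0x96 => '\u{2013}', 0x97 => '\u{2014}',
        0x98 => '\u{02DC}', 0x99 => '\u{2122}', 0x9A => '\u{0161}', 0x9B => '\u{203A}',
        0x9C => '\u{0153}', 0x9E => '\u{017E}', 0x9F => '\u{0178}',
        _ => return None,
    })
}
```

For example, the digits of `&#xD;` decode to a carriage return and those of `&#xFFFF;` to U+FFFF itself, while an overlong reference decodes to U+FFFD.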