Simplified handling of UTF-8 symbols in YAML
We actually don't need to store the encoding, we can assume
everything is UTF-8. `Symbol#to_sym` will take care of coercing
to ASCII when applicable.

Any other encoding is not supported, as the cache should not be invoked
if Psych could generate any other encoding.
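
The coercion the message relies on can be checked in plain Ruby. This is a minimal sketch, not Bootsnap's code: `restore_symbol` is a hypothetical helper that mirrors the shape of the new `unpack`, and the binary strings stand in for payloads read back from the cache.

```ruby
# Hypothetical helper (not part of Bootsnap): treat the raw bytes as UTF-8
# and let String#to_sym pick the final symbol encoding.
def restore_symbol(payload)
  payload.dup.force_encoding(Encoding::UTF_8).to_sym
end

# "...".b returns a BINARY (ASCII-8BIT) copy, standing in for cached bytes.
ascii = restore_symbol("foo".b)
utf8  = restore_symbol("café".b)

# ASCII-only symbols are coerced to US-ASCII even though the string was UTF-8.
puts ascii.encoding
# Non-ASCII symbols keep the UTF-8 encoding.
puts utf8.encoding

# Without force_encoding, the non-ASCII symbol keeps the BINARY encoding
# and is a different symbol from the UTF-8 one.
binary = "café".b.to_sym
puts binary.encoding
puts binary == utf8
```

Because `to_sym` normalises ASCII-only names by itself, no marker byte is needed to record which encoding the symbol had when it was packed.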
byroot committed Jan 31, 2022
1 parent 487d46c commit 85f1242
Showing 4 changed files with 6 additions and 29 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -1,6 +1,6 @@
 # Unreleased

-* Improve the YAML compile cache to support `UTF-8` symbols. (#398)
+* Improve the YAML compile cache to support `UTF-8` symbols. (#398, #399)
   [The default `MessagePack` symbol serializer assumes all symbols are ASCII](https://github.com/msgpack/msgpack-ruby/pull/211),
   because of this, non-ASCII compatible symbol would be restored with `ASCII_8BIT` encoding (AKA `BINARY`).
   Bootsnap now properly cache them in `UTF-8`.
4 changes: 2 additions & 2 deletions ext/bootsnap/bootsnap.c
@@ -2,7 +2,7 @@
  * Suggested reading order:
  * 1. Skim Init_bootsnap
  * 2. Skim bs_fetch
- * 3. The rest of everything
+ * 3. The rest of everyrything
  *
  * Init_bootsnap sets up the ruby objects and binds bs_fetch to
  * Bootsnap::CompileCache::Native.fetch.
@@ -75,7 +75,7 @@ struct bs_cache_key {
 STATIC_ASSERT(sizeof(struct bs_cache_key) == KEY_SIZE);

 /* Effectively a schema version. Bumping invalidates all previous caches */
-static const uint32_t current_version = 5;
+static const uint32_t current_version = 4;

 /* hash of e.g. "x86_64-darwin17", invalidating when ruby is recompiled on a
  * new OS ABI, etc. */
27 changes: 2 additions & 25 deletions lib/bootsnap/compile_cache/yaml.rb
@@ -49,31 +49,8 @@ def supported_internal_encoding?
 module EncodingAwareSymbols
   extend self

-  if Symbol.method_defined?(:name)
-    def pack(symbol)
-      if symbol.encoding == Encoding::UTF_8
-        1.chr << symbol.name
-      else
-        0.chr << symbol.name
-      end
-    end
-  else
-    def pack(symbol)
-      if symbol.encoding == Encoding::UTF_8
-        1.chr << symbol.to_s
-      else
-        0.chr << symbol.to_s
-      end
-    end
-  end
-
   def unpack(payload)
-    payload.freeze
-    string = payload.byteslice(1..-1)
-    if payload.ord == 1 # Encoding::UTF_8
-      string.force_encoding(Encoding::UTF_8)
-    end
-    string.to_sym
+    payload.force_encoding(Encoding::UTF_8).to_sym
   end
 end

@@ -94,7 +71,7 @@ def init!
 factory.register_type(
   0x00,
   Symbol,
-  packer: EncodingAwareSymbols.method(:pack).to_proc,
+  packer: :to_msgpack_ext,
   unpacker: EncodingAwareSymbols.method(:unpack).to_proc,
 )

2 changes: 1 addition & 1 deletion test/compile_cache_key_format_test.rb
@@ -21,7 +21,7 @@ class CompileCacheKeyFormatTest < Minitest::Test

   def test_key_version
     key = cache_key_for_file(FILE)
-    exp = [5].pack("L")
+    exp = [4].pack("L")
     assert_equal(exp, key[R[:version]])
   end
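
The version assertion compares raw bytes, so it may help to see what `[4].pack("L")` produces. The sketch below is illustrative only: `fake_cache_key`, the 64-byte `KEY_SIZE`, and the `0...4` offset of the version field are assumptions made for the example, not values taken from this diff.

```ruby
# Assumed values for illustration; the diff names KEY_SIZE but not its value.
KEY_SIZE = 64
VERSION_RANGE = 0...4 # assumed position of the 32-bit version field

# Hypothetical stand-in for a cache key: a packed version followed by padding.
def fake_cache_key(version)
  header = [version].pack("L") # native-endian unsigned 32-bit integer
  header + ("\0".b * (KEY_SIZE - header.bytesize))
end

key = fake_cache_key(4)

# Same comparison the test performs: slice the version bytes and compare them
# with the packed expected value.
version_matches = key[VERSION_RANGE] == [4].pack("L")

# A key written under schema version 4 no longer matches once the expected
# version is 5, which is how bumping the constant invalidates old caches.
stale_under_v5 = key[VERSION_RANGE] != [5].pack("L")

puts version_matches
puts stale_under_v5
puts key[VERSION_RANGE].unpack1("L")
```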
