
[0019] Optimize string-ref/cursor performance by eliminating temporary bytevector allocations #785

Merged: da-liii merged 3 commits into main from da/0019/string-ref_perf on May 11, 2026


@da-liii da-liii commented May 11, 2026

Summary

Adds a utf8->codepoint-at function that eliminates the temporary bytevector allocated on every character access in string-ref/cursor and in the cursor-based string operations.
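
To illustrate the idea of decoding in place, here is a minimal sketch in portable R7RS Scheme. The name decode-utf8-at, its argument order, and its error messages are assumptions made for this example; the actual utf8->codepoint-at in goldfish/liii/unicode.scm may differ. The sketch reads bytes with bytevector-u8-ref at the given offset instead of copying a slice first. It does not reject overlong encodings or surrogate code points.

```scheme
(import (scheme base))

;; Hypothetical helper, not the PR's code: decode the UTF-8 sequence that
;; starts at byte offset i of bytevector bv and return its code point.
(define (decode-utf8-at bv i)
  (define len (bytevector-length bv))
  ;; Read the k-th byte after the lead byte and return its low 6 bits.
  (define (cont k)
    (let ((j (+ i k)))
      (if (>= j len)
          (error "decode-utf8-at: truncated sequence" i)
          (let ((b (bytevector-u8-ref bv j)))
            (if (= (quotient b 64) 2)            ; must look like 10xxxxxx
                (- b 128)
                (error "decode-utf8-at: bad continuation byte" j))))))
  (if (or (< i 0) (>= i len))
      (error "decode-utf8-at: offset out of range" i)
      (let ((b0 (bytevector-u8-ref bv i)))
        (cond ((< b0 #x80) b0)                   ; 0xxxxxxx: 1 byte
              ((< b0 #xC0)
               (error "decode-utf8-at: offset is not at a lead byte" i))
              ((< b0 #xE0)                       ; 110xxxxx: 2 bytes
               (+ (* (- b0 #xC0) 64) (cont 1)))
              ((< b0 #xF0)                       ; 1110xxxx: 3 bytes
               (+ (* (- b0 #xE0) 4096) (* (cont 1) 64) (cont 2)))
              ((< b0 #xF8)                       ; 11110xxx: 4 bytes
               (+ (* (- b0 #xF0) 262144)
                  (* (cont 1) 4096)
                  (* (cont 2) 64)
                  (cont 3)))
              (else (error "decode-utf8-at: invalid lead byte" b0))))))

;; UTF-8 bytes of U+0061, U+00E9, U+4E2D, U+1F600, written out explicitly.
(define sample
  (bytevector #x61 #xC3 #xA9 #xE4 #xB8 #xAD #xF0 #x9F #x98 #x80))

(decode-utf8-at sample 0)   ; code point of the 1-byte sequence
(decode-utf8-at sample 1)   ; code point of the 2-byte sequence
(decode-utf8-at sample 3)   ; code point of the 3-byte sequence
(decode-utf8-at sample 6)   ; code point of the 4-byte sequence
```

Each call only reads from the existing bytevector, so no intermediate bytevector is created per character.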

Changes

  1. Add utf8->codepoint-at (goldfish/liii/unicode.scm)

    • Decodes a UTF-8 character directly from a given offset in a bytevector
    • No need to call bytevector-copy first and then decode
  2. Eliminate temporary allocations (goldfish/liii/string-cursor.scm)

    • string-ref/cursor
    • string-prefix-length
    • string-suffix-length
    • the internal helper of string-prefix?
  3. Matching optimization (goldfish/scheme/char.scm)

    • the character decoding logic of utf8-string-map
  4. New tests (tests/liii/unicode/utf8-to-codepoint-at-test.scm)

    • Cover 1- to 4-byte UTF-8 encodings, decoding at an offset, and error handling
  5. Performance benchmark (bench/string-cursor.scm)
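
As a rough illustration of how a cursor-based operation such as string-prefix-length can avoid per-character allocations, the sketch below walks two UTF-8 bytevectors with an integer byte offset. It is not the code in goldfish/liii/string-cursor.scm: the names utf8-width and utf8-prefix-length are hypothetical, and it compares encoded bytes rather than decoded code points, which gives the same answer only when both inputs are well-formed UTF-8.

```scheme
(import (scheme base))

;; Byte width of the UTF-8 sequence whose lead byte is b.
;; Assumes b really is a lead byte of well-formed UTF-8.
(define (utf8-width b)
  (cond ((< b #x80) 1)
        ((< b #xE0) 2)
        ((< b #xF0) 3)
        (else 4)))

;; Hypothetical helper: number of leading characters shared by the
;; UTF-8 bytevectors a and b, using only an integer byte cursor.
(define (utf8-prefix-length a b)
  (let ((la (bytevector-length a))
        (lb (bytevector-length b)))
    ;; True when the w bytes starting at offset i are identical in a and b.
    (define (same-char? i w)
      (let loop ((k 0))
        (cond ((= k w) #t)
              ((= (bytevector-u8-ref a (+ i k))
                  (bytevector-u8-ref b (+ i k)))
               (loop (+ k 1)))
              (else #f))))
    ;; While the prefixes match, both inputs share the same byte offset i.
    (let loop ((i 0) (count 0))
      (if (or (>= i la) (>= i lb))
          count
          (let ((w (utf8-width (bytevector-u8-ref a i))))
            (if (and (<= (+ i w) la)
                     (<= (+ i w) lb)
                     (same-char? i w))
                (loop (+ i w) (+ count 1))
                count))))))

;; U+4E2D U+6587 U+0061 versus U+4E2D U+5348: one shared character.
(utf8-prefix-length
 (bytevector #xE4 #xB8 #xAD #xE6 #x96 #x87 #x61)
 (bytevector #xE4 #xB8 #xAD #xE5 #x8D #x88))
```

Characters that share a lead byte but differ in a later byte are counted as different, because a character only counts once all of its bytes match.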

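Performance comparison

| Scenario | Before (s) | After (s) | Improvement |
|---|---|---|---|
| ASCII short string (50 characters, 10000 iterations) | 2.94 | 2.41 | 18% |
| ASCII long string (500 characters, 1000 iterations) | 2.62 | 2.37 | 10% |
| UTF-8 short string (20 Chinese characters, 10000 iterations) | 1.73 | 1.58 | 9% |
| UTF-8 long string (200 Chinese characters, 1000 iterations) | 1.58 | 1.37 | 13% |
| Emoji short string (10 emoji, 10000 iterations) | 0.99 | 0.88 | 11% |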
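
The improvement figures are consistent with the relative reduction in run time, (before - after) / before, rounded to a whole percent. A small sketch of that arithmetic follows; improvement-percent is a hypothetical helper, and the timings are passed as exact hundredths of a second so the result stays an exact integer.

```scheme
(import (scheme base))

;; Relative reduction in run time as a whole percentage.
;; before and after are exact integers in hundredths of a second.
(define (improvement-percent before after)
  (round (* 100 (/ (- before after) before))))

(improvement-percent 294 241)   ; ASCII short string row: 2.94 s to 2.41 s
(improvement-percent 158 137)   ; UTF-8 long string row: 1.58 s to 1.37 s
```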

Tests

  • bin/gf tests/liii/unicode/utf8-to-codepoint-at-test.scm: all 26 tests pass
  • bin/gf test tests/liii/string-cursor/: all 59 tests pass

Da Shen and others added 3 commits May 11, 2026 14:57
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@da-liii da-liii merged commit c7db565 into main May 11, 2026
4 checks passed
@da-liii da-liii deleted the da/0019/string-ref_perf branch May 11, 2026 07:58