Merged
100 changes: 100 additions & 0 deletions devel/0044.md
@@ -0,0 +1,100 @@
# [0044] Fix `gf fix` mishandling a standalone `|` identifier

## Code files related to this task
- `tools/fix/liii/goldfix-tokenize.scm`: tokenizer implementation
- `tools/fix/tests/liii/goldfix-tokenize/tokenize-lines-test.scm`: tokenize tests
- `tools/fix/tests/liii/goldfix-repair/fix-string-test.scm`: fix-string tests

## How to test
```bash
xmake b goldfish
bin/gf tests/goldfish/liii/goldfix-tokenize/tokenize-lines-test.scm
bin/gf tests/goldfish/liii/goldfix-repair/fix-string-test.scm
```

## 2026-05-17 Problem analysis and fix plan

### What
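When the `gf fix` tokenizer encounters a `|` character, it unconditionally treats it as the opening delimiter of a bar symbol (such as `|symbol with spaces|`) and calls `find-bar-symbol-end` to locate the next `|` as the closing delimiter. As a result, when code uses a standalone `|` as an ordinary identifier (for example, as the alternation operator in `define-regexp-grammar` syntax), everything between the first `|` and the next `|`, including many parentheses, is skipped as a single token, and the parenthesis repair result ends up completely wrong.

**Minimal reproduction:**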
```scheme
(define-regexp-grammar (:menu-item (:or ---
|
;; |
(group :%1)
(text :%1)
) ;:or
) ;:menu-item
) ;define-regexp-grammar
```

Running `gf fix` on it produces the following incorrect output:
```scheme
(define-regexp-grammar (:menu-item (:or ---))
|
;; |
(group :%1)
(text :%1)
;:or
;:menu-item
) ;define-regexp-grammar
```

### Why
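In the `tokenize` function of `goldfix-tokenize.scm`, the `((char=? c #\|)` branch does not check whether `|` really starts a bar symbol. In Scheme, a standalone `|` is just an ordinary identifier; a `|` can be the opening delimiter of a bar symbol only when it is immediately followed by a non-whitespace character.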
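The swallowing effect can be reproduced with a small sketch. `naive-bar-end` below is a hypothetical stand-in for an unconditional scan to the next `|`; it is not the actual `find-bar-symbol-end` from `goldfix-tokenize.scm`, whose details may differ.

```scheme
;; Hypothetical stand-in for an unconditional bar-symbol scan;
;; the real find-bar-symbol-end may behave differently.
;; Returns the index just past the next `|` after start, or the
;; string length when no closing `|` exists.
(define (naive-bar-end source start)
  (let ((len (string-length source)))
    (let loop ((i (+ start 1)))
      (cond ((>= i len) len)
            ((char=? (string-ref source i) #\|) (+ i 1))
            (else (loop (+ i 1)))))))

(define sample "(:or | (a) | (b))")

;; The `|` at index 5 is a standalone identifier, yet the scan runs
;; to the next `|` and swallows "(a)" together with its parentheses.
(display (substring sample 5 (naive-bar-end sample 5))) ; prints | (a) |
(newline)
```

Because the parentheses inside the swallowed span never reach the repair step as separate tokens, the balance computed afterwards no longer matches the source.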

### How
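**TDD fix plan:**

1. **Write tests first**: add a tokenize test for a standalone `|` to `tokenize-lines-test.scm`, and a repair test containing a standalone `|` to `fix-string-test.scm`.
2. **Modify the tokenizer**: add a condition to the `((char=? c #\|)` branch of the `tokenize` function so that the bar symbol logic runs only when the character immediately after `|` is not whitespace.
3. **Run the tests**: make sure the new tests pass and the existing tests do not regress.

**Exact change location**: the `cond` branch of the `tokenize` function in `tools/fix/liii/goldfix-tokenize.scm`: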

```scheme
;; Original code
((char=? c #\|)
(add-token! 'other ...))

;; Changed to
((and (char=? c #\|)
(not (whitespace-char? next-c)))
(add-token! 'other ...))

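This way, when `|` is immediately followed by whitespace (for example, a `|` alone on its own line), the `else` branch handles it as an ordinary `other` token of length 1, and it no longer swallows the content that follows.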
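The guard can also be read as a standalone predicate. The sketch below is illustrative only: `bar-symbol-start?` is a hypothetical helper that does not exist in `goldfix-tokenize.scm`, it uses the standard `char-whitespace?` instead of the tokenizer's own `whitespace-char?`, and treating a `|` at the very end of the input as standalone is an assumption made here, not something the change states.

```scheme
;; Hypothetical helper, not part of goldfix-tokenize.scm.
;; Returns #t when the `|` at index i should open a bar symbol,
;; that is, when a non-whitespace character follows it.
;; Assumption: a `|` at the end of the input counts as standalone.
(define (bar-symbol-start? source i)
  (let ((next (+ i 1)))
    (and (char=? (string-ref source i) #\|)
         (< next (string-length source))
         (not (char-whitespace? (string-ref source next))))))

(display (bar-symbol-start? "|a b| x" 0))  ; #t: opens a bar symbol
(newline)
(display (bar-symbol-start? "(or | a)" 4)) ; #f: standalone identifier
(newline)
(display (bar-symbol-start? "(or |" 4))    ; #f: `|` at end of input
(newline)
```

The predicate is only meant for positions where a new token begins; the closing `|` of a bar symbol is consumed by the bar symbol scan and never reaches this check.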

## 2026-05-17 Fix verification

### What
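The fix was completed following the TDD workflow:

1. **Write tests first (red)**:
- Added a tokenize test for a standalone `|` to `tokenize-lines-test.scm`: it verifies that `(or | a)` is split into 5 tokens and that `|` is a separate `other` token
- Added a `define-regexp-grammar` fragment test containing a standalone `|` to `fix-string-test.scm`
- Ran the tests and confirmed that they failed (the tokenizer treated `|` as a bar symbol and swallowed the following content)

2. **Modify the code (green)**:
- Changed line 351 of `tools/fix/liii/goldfix-tokenize.scm`: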
```scheme
((and (char=? c #\|)
(not (whitespace-char? next-c))
) ;and
(add-token! 'other ...)
) ;
```

3. **Run the tests to verify**:
- All new tests pass
- All 6 existing goldfix test modules pass with no regressions:
- `goldfix-tokenize/tokenize-lines-test.scm`: 27 correct, 0 failed
- `goldfix-repair/fix-string-test.scm`: 13 correct, 0 failed
- `goldfix-repair/repair-parentheses-test.scm`: 13 correct, 0 failed
- `goldfix-repair/parentheses-balanced-p-test.scm`: 6 correct, 0 failed
- `goldfix-record/make-fix-token-test.scm`: 35 correct, 0 failed
- `goldfix-edit/apply-edits-test.scm`: 4 correct, 0 failed

4. **Verify the original file**:
- The output of `bin/gf fix --dry-run /home/da/git/mogan2/TeXmacs/progs/kernel/gui/menu-widget.scm` matches the original file; the `define-regexp-grammar` region is no longer wrongly modified
2 changes: 1 addition & 1 deletion tools/fix/liii/goldfix-tokenize.scm
@@ -348,7 +348,7 @@
) ;add-token!
(loop)
) ;
- ((char=? c #\|)
+ ((and (char=? c #\|) (not (whitespace-char? next-c)))
(add-token! 'other
start
(find-bar-symbol-end source start)
28 changes: 28 additions & 0 deletions tools/fix/tests/liii/goldfix-repair/fix-string-test.scm
@@ -120,4 +120,32 @@
) ;&-
) ;check

;; A standalone | identifier must not be treated as the opening delimiter of a bar symbol.

(check (fix-string (&- #""
(define-regexp-grammar
(:menu-item (:or ---
|
(group :%1)
(text :%1)
) ;:or
) ;:menu-item
) ;define-regexp-grammar
""
) ;&-
) ;fix-string
=>
(&- #""
(define-regexp-grammar
(:menu-item (:or ---
|
(group :%1)
(text :%1)
) ;:or
) ;:menu-item
) ;define-regexp-grammar
""
) ;&-
) ;check

(check-report)
10 changes: 10 additions & 0 deletions tools/fix/tests/liii/goldfix-tokenize/tokenize-lines-test.scm
@@ -59,6 +59,16 @@
(check (fix-token-text (list-ref tokens 2)) => "#\"\"(not-code)\"\"")
) ;let

;; A standalone | identifier must not be treated as the opening delimiter of a bar symbol.

(let ((tokens (tokenize "(or | a)")))
(check (length tokens) => 5)
(check (fix-token-type (list-ref tokens 2)) => 'other)
(check (fix-token-text (list-ref tokens 2)) => "|")
(check (fix-token-type (list-ref tokens 3)) => 'other)
(check (fix-token-text (list-ref tokens 3)) => "a")
) ;let

;; tokenize-lines
;; Organize tokens by physical line and record the first code token.
