@neoragex2002 (Contributor) commented Nov 23, 2025

feat(markdown)!: strict math delimiters, robust inline code parsing; drop plain parentheses as math

  • Add MathOptions.strictDelimiters (inline: $...$, \(...\); block: $$...$$, \[...\])
  • Strict mode: disable heuristics and mid-state (unclosed) math tokens
  • Always accept $...$ as inline math (unless content contains backticks)
  • Remove plain parentheses as inline math delimiters to prevent false positives (use \(...\) or $...$ instead)
  • Fix inline code parsing:
    • Avoid emitting partial code spans when closing backtick is missing
    • Merge remaining fragments and re-parse atomically
    • Add raw-based fallback that rebuilds inline_code and strong across backticks, preserving CJK text
  • Fix nested list parsing: stop skipping *-bullet nested lists; parse uniformly
  • Tune math heuristic (non-strict path): recognise simple chemistry forms (e.g. H_2O, CH_3CH_2OH) and single-letter tokens; $...$ path no longer depends on heuristics
  • Add scripts/debug-parse.mjs for token/node inspection in local dev

BREAKING CHANGE: plain ( ... ) is no longer treated as inline math. Use \(...\) or $...$ for inline formulas.
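
The delimiter rules listed above can be illustrated with a small standalone scanner. This is only a sketch of the strict-mode behaviour described in this PR, not the parser's actual code: `findStrictInlineMath`, `MathSpan`, and `INLINE_DELIMITERS` are hypothetical names, and block math (`$$...$$`) is simply skipped rather than tokenized.

```typescript
// Hypothetical illustration of strict inline-math matching; not the library's code.
interface MathSpan {
  start: number   // index of the opening delimiter
  end: number     // index just past the closing delimiter
  content: string // text between the delimiters
}

// Only these pairs count as inline math. Plain "(" ... ")" is deliberately absent.
const INLINE_DELIMITERS: Array<[string, string]> = [
  ['\\(', '\\)'],
  ['$', '$'],
]

function findStrictInlineMath(src: string): MathSpan[] {
  const spans: MathSpan[] = []
  let i = 0
  while (i < src.length) {
    // Block math is out of scope here: skip over it when it is closed.
    if (src.startsWith('$$', i)) {
      const blockEnd = src.indexOf('$$', i + 2)
      i = blockEnd === -1 ? i + 2 : blockEnd + 2
      continue
    }
    const pair = INLINE_DELIMITERS.find(([open]) => src.startsWith(open, i))
    if (!pair) {
      i++
      continue
    }
    const [open, close] = pair
    const contentStart = i + open.length
    const closeAt = src.indexOf(close, contentStart)
    // Strict mode: an unclosed delimiter yields no token at all.
    if (closeAt === -1) {
      i = contentStart
      continue
    }
    const content = src.slice(contentStart, closeAt)
    // Empty content, or content containing backticks, stays plain text.
    if (content.length === 0 || content.includes('`')) {
      i = contentStart
      continue
    }
    spans.push({ start: i, end: closeAt + close.length, content })
    i = closeAt + close.length
  }
  return spans
}
```

With this sketch, `findStrictInlineMath('area (a+b) is $H_2O$ and \\(x^2\\)')` yields two spans (`H_2O` and `x^2`) and leaves `(a+b)` as plain text, while an unclosed `$` produces no span.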

Test cases:

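3.  **Multi-dimensional volatility estimation (`longTermDeviation()`)**
    *   This is the key to computing the "width" of the threshold. It combines three different views of volatility:
        *   **`primaryDeviation.getDeviation()`**: measures the overall volatility of the raw RCF score itself (EWMA standard deviation).
        *   **`secondaryDeviation.getDeviation()`**: measures the volatility of the raw RCF score's **instantaneous rate of change** (that is, `current_score - last_score`). This is very effective at capturing abrupt changes.
        *   **`thresholdDeviation.getDeviation()`**: specifically captures the volatility of the gap between the score and its mean when the score is below the mean. This further reinforces the handling of **score asymmetry**, because it focuses on the lower half of the "normal" score distribution.
    *   **Fusion mechanism**: the `longTermDeviation()` method computes a **weighted fusion** of these volatility estimates, based on `shingleSize` and the type of `transformMethod`, together with the `scoreDifferencing` parameter (a weight factor in `[0, 1]`). For example, `scoreDifferencing` favors the volatility of the score itself, while `1 - scoreDifferencing` favors the volatility of the score's rate of change.
    *   **Design rationale**: anomalous score behavior can take several forms: a drift in the overall level, a sudden spike, or a sustained slight increase. By combining several volatility measures, the model can estimate the variability of "normal" scores more comprehensively and robustly, so that the threshold responds to different kinds of anomaly signals.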


---
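#### **Q2: Which stages does the model go through between starting to process data and producing reliable results? When does it count as "fully operational"?**

**A:** The model passes through several stages before it is fully stable. The point at which it stabilizes is determined jointly by `shingle_size (S)`, `tree_size (T)`, `time_decay (D)`, and `calibrator_window_size (W_C)`.

1.  **Stage 1: Shingle buffer fill (`t < S - 1`)**: the model is not yet running.
2.  **Stage 2: RRCF tree fill (`S - 1 <= t < (S - 1) + T`)**: the model starts learning, but scores are unstable.
3.  **Stage 3: Full exponential-decay operation (`t >= (S - 1) + D`)**: the model's core scoring is stable.
4.  **Stage 4: Fully operational (`t >= (S - 1) + max(T, D, W_C)`)**: the whole system (model + calibrator) is stable, and the reported anomaly probabilities are reliable.

**Conclusion**: the system can be considered fully operational only after it has processed at least `(S - 1) + max(T, D, W_C)` raw data points. All output before that point should be treated as part of model warm-up and learning.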
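
The warm-up rule above can be written out as a short helper. This is a minimal sketch, assuming `t` is a zero-based sample index; `WarmupParams`, `firstFullyOperationalIndex`, and `stageAt` are hypothetical names, and the sketch folds everything between tree fill and full operation into a single `warming-up` stage rather than modelling Stage 3 separately.

```typescript
// Hypothetical helper types and names, used only to illustrate the formula.
interface WarmupParams {
  shingleSize: number          // S
  treeSize: number             // T
  timeDecay: number            // D
  calibratorWindowSize: number // W_C
}

type Stage = 'shingle-fill' | 'tree-fill' | 'warming-up' | 'fully-operational'

// First index t at which t >= (S - 1) + max(T, D, W_C) holds.
function firstFullyOperationalIndex(p: WarmupParams): number {
  return (p.shingleSize - 1) + Math.max(p.treeSize, p.timeDecay, p.calibratorWindowSize)
}

function stageAt(t: number, p: WarmupParams): Stage {
  const offset = p.shingleSize - 1
  if (t < offset) return 'shingle-fill'
  if (t >= firstFullyOperationalIndex(p)) return 'fully-operational'
  if (t < offset + p.treeSize) return 'tree-fill'
  // Tree is full, but decay or the calibrator window has not yet stabilized.
  return 'warming-up'
}
```

For example, with `S = 4`, `T = 256`, `D = 1000`, and `W_C = 500`, the threshold is `3 + max(256, 1000, 500) = 1003`, so output at `t = 1002` would still count as warm-up.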


---
5.  **Numerical stability and overflow**
    *   When `t` becomes too large (especially when `time_decay` is set too high), can `e^(alpha * t)` in `ExponentialWeighting.weight(t)` overflow?


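**Summary of the questions we explored while discussing the `ED-RRCF` algorithm**

1.  **Implementation differences and behavioral consistency between `pysad_rrcf.py` and `rrcf_edr.py`**
    *   In the original `pysad_rrcf.py` implementation, the scoring (`score_partial`) and update (`fit_partial`) logic does not match the "update and score" pattern of `rrcf_edr.py`.
    *   In the `_SingleTree.update` method, the use of the `L` variable is inconsistent with `rrcf_edr.py` and is partly redundant.
    *   In the `i >= time_decay` stage, `wi` (the weight of the new point) is computed differently from `rrcf_edr.py` (`weight(time_decay)` vs `weight(time_decay - 1)`).

2.  **`PySAD` `BaseModel` API compatibility**
    *   Can `fit_score_partial` be implemented directly without implementing `fit_partial` and `score_partial`? (Conclusion: this does not conform to the `BaseModel` interface specification.)

3.  **Model run stages and stability**
    *   Which time stages does the model go through from receiving the first sample to producing fully stable, reliable output?
    *   When does it count as "operating normally"? (This depends on `shingle_size`, `tree_size`, `time_decay`, and `calibrator_window_size` taken together.)
    *   How can these stages and the flow be shown clearly in a Mermaid diagram?

4.  **Deeper semantics and tuning of the parameters**
    *   **Necessity of the `time_decay` parameter**: can it be replaced by `e` and `alpha`? How does it align with the notion of a "history window"?
    *   **Relationship between `time_decay` and `tree_size`**: is it harmless for `time_decay` to be smaller than `tree_size`? How does that affect the algorithm's behavior?
    *   **Aligning `e` and `alpha` with the "history window"**: how can these two parameters be used to control the model's "memory length"?
    *   **Choosing `P`**: in the formula `alpha = -ln(P) / H`, how should `P` (the weight-decay ratio) be chosen?

5.  **Numerical stability and overflow**
    *   When `t` becomes too large (especially when `time_decay` is set too high), can `e^(alpha * t)` in `ExponentialWeighting.weight(t)` overflow?
    *   How does such an overflow affect reservoir-sampling behavior?
    *   How can overflow be avoided with a truncation mechanism? What effect does the truncation itself have on sampling?

6.  **Limitations of traditional `RRCF` on streaming data**
    *   Why does the unbiased uniform sampling of traditional `RRCF` become its core weakness in streaming environments, where concept drift is common?
    *   How can the mismatch between the theoretical premises of traditional `RRCF` and the needs of streaming applications be explained more deeply and intuitively?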
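
The formula `alpha = -ln(P) / H` and the overflow question in item 5 can be made concrete with a few lines of arithmetic. This is a minimal sketch under stated assumptions: `decayRate`, `rawWeight`, `clampedWeight`, and `MAX_SAFE_EXPONENT` are hypothetical names, the weight is taken to be exactly `e^(alpha * t)` in IEEE 754 double precision, and clamping the exponent is only one possible truncation scheme, not necessarily the one used in `ExponentialWeighting`.

```typescript
// Hypothetical helpers illustrating alpha = -ln(P) / H and exponent clamping.

// P: fraction of weight a point keeps after H steps (0 < P < 1); H: history window length.
function decayRate(P: number, H: number): number {
  if (!(P > 0 && P < 1)) throw new RangeError('P must be in (0, 1)')
  if (!(H > 0)) throw new RangeError('H must be positive')
  return -Math.log(P) / H
}

// Math.exp overflows to Infinity for arguments above about 709.78 in double precision,
// so 709 is a conservative cap.
const MAX_SAFE_EXPONENT = 709

// Unprotected weight: becomes Infinity once alpha * t exceeds the double range.
function rawWeight(alpha: number, t: number): number {
  return Math.exp(alpha * t)
}

// Truncated weight: always finite, but every t past the cap gets the same weight.
function clampedWeight(alpha: number, t: number): number {
  return Math.exp(Math.min(alpha * t, MAX_SAFE_EXPONENT))
}
```

With `P = 0.01` and `H = 1000`, `alpha` is about `0.0046`, so the unprotected weight overflows once `t` passes roughly `154,000`. The clamped version stays finite, at the cost that all later points share one weight and are no longer favored over each other.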

---
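1.  **Stage 1: Shingle buffer fill (0 <= t < S-1)**
2.  **Stage 2: Tree fill / initial learning (S-1 <= t < S-1 + T)**
3.  **Stage 3: Full-tree operation and decay warm-up (S-1 + T <= t < S-1 + D)**
4.  **Stage 4: Full-tree operation with full exponential decay (t >= S-1 + D)**
5.  **Stage 5: Threshold calibrator fill (t < S-1 + W_C)**
6.  **Stage 6: Fully operational (t >= S-1 + W_C)**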

---
- $H$, $O$, $C$
- $H_2O$, $CO_2$
- $CH_3CH_2OH$, $CH_3COOH$

netlify bot commented Nov 23, 2025

Deploy Preview for vue-markdown-renderer canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 8890aab |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vue-markdown-renderer/deploys/692363ecf3488b0008cba7e6 |

netlify bot commented Nov 23, 2025

Deploy Preview for vue-markdown-renderer-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 8890aab |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vue-markdown-renderer-docs/deploys/692363ec76017e0008253943 |
| 😎 Deploy Preview | https://deploy-preview-148--vue-markdown-renderer-docs.netlify.app |

@Simon-He95 Simon-He95 merged commit 4c883ca into Simon-He95:main Nov 24, 2025
12 checks passed
@Simon-He95 (Owner) commented:

LGTM

@Simon-He95 (Owner) commented:

Sorry, some of his tests failed; I'll revert them now.
