Skip to content

Conversation

@Hoshino-Yumetsuki
Copy link
Contributor

@Hoshino-Yumetsuki Hoshino-Yumetsuki commented Dec 28, 2024

  • 添加使用三种字符串相似度检测算法以准确的检测记忆是否重复

    • 余弦相似度:考虑词频信息
    • Levenshtein距离:考虑字符级别的编辑操作
    • Jaccard相似度:考虑词集合的重叠程度

    在实际测试中比单纯的向量相似度检测更为准确且好用

  • 修复 Milvus 缺少记忆 ID 问题

Summary by CodeRabbit

  • New Features

    • Introduced a new long-term memory plugin with configurable similarity checks.
    • Added advanced text similarity calculation functionality.
    • Enhanced memory management with configurable thresholds.
  • Configuration Changes

    • Removed legacy long-term memory configuration options.
    • Added new enableSimilarityCheck configuration option.
    • Simplified configuration schema for memory management.
  • Documentation

    • Updated README for the long-memory plugin.
    • Added badges and description for the new plugin.
  • Refactor

    • Streamlined memory handling across multiple components.
    • Removed deprecated long-memory related code.

…y calculation

- Refactor `filterSimilarMemory` function to use a cosine similarity threshold for filtering memories.
- Increase the number of similar memories retrieved from the vector store from 10 to 20.
- Introduce a new `cosineSimilarity` function to calculate the similarity score between memory strings.
- Update logging to warn when a memory is too similar based on the cosine score.
- Decrease the number of similar memories retrieved from the vector store from 20 to 10 in the `filterSimilarMemory` function.
- This change aims to refine the memory filtering process for improved performance and relevance.
…larity utility

- Updated the `filterSimilarMemory` function to utilize the new `calculateSimilarity` utility for improved memory similarity scoring.
- Enhanced logging to provide detailed similarity results, including score and additional details.
- Removed the previous `cosineSimilarity` function to streamline the codebase.
…ering logic

- Introduced `enableSimilarityCheck` option in the configuration to control memory similarity checks.
- Updated the `filterSimilarMemory` function to utilize the new configuration, allowing for more flexible memory filtering based on similarity thresholds.
- Adjusted the condition for filtering similar memories to improve performance and relevance.
…tation

- Added `enableSimilarityCheck` option to the configuration schema to control memory similarity checks.
- Updated English and Chinese locale files to include descriptions for the new `enableSimilarityCheck` and refined `longMemoryAddSimilarity` settings.
- Improved clarity in documentation regarding the behavior of memory addition based on similarity thresholds.
… logic

- Simplified the configuration schema by consolidating the `enableSimilarityCheck` option.
- Enhanced the `filterSimilarMemory` function by removing unnecessary line breaks for better readability.
- Improved logging format for similarity checks to ensure clarity in output.
…pply function

- Enhanced the document metadata structure by adding a new field `langchain_primaryid` alongside the existing `raw_id`.
- This change improves the identification and tracking of documents within the vector store service.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 28, 2024

Warning

Rate limit exceeded

@dingyi222666 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between be14cff and 3285bae.

📒 Files selected for processing (4)
  • packages/long-memory/src/index.ts (1 hunks)
  • packages/long-memory/src/locales/en-US.schema.yml (1 hunks)
  • packages/long-memory/src/locales/zh-CN.schema.yml (1 hunks)
  • packages/long-memory/src/plugins/memory.ts (6 hunks)

Walkthrough

The pull request introduces a comprehensive refactoring of the long-term memory functionality in the ChatLuna plugin. The changes involve removing existing long memory configurations from core components, creating a new dedicated long-memory package, and implementing a sophisticated similarity-based memory management system. The modifications streamline the configuration options, introduce more flexible memory handling, and provide a more modular approach to managing conversational memory across different components.

Changes

File Change Summary
packages/core/src/config.ts Removed long memory properties, added enableSimilarityCheck
packages/core/src/locales/en-US.schema.yml Updated configuration schema, added enableSimilarityCheck, removed proxy and quota settings
packages/core/src/locales/zh-CN.schema.yml Removed long memory, proxy, and quota configuration options
packages/core/src/index.ts Removed longMemory function call from initialization
packages/core/src/llm-core/chat/app.ts Removed longMemory property from ChatInterfaceInput
packages/core/src/services/chat.ts Removed longMemory parameter from ChatInterface instantiation
packages/long-memory/* New package with comprehensive long-term memory implementation, including schemas, logging, and similarity calculations

Sequence Diagram

sequenceDiagram
    participant User
    participant ChatInterface
    participant LongMemoryPlugin
    participant VectorStore
    participant SimilarityCalculator

    User->>ChatInterface: Start Conversation
    ChatInterface->>LongMemoryPlugin: Initialize Memory
    LongMemoryPlugin->>VectorStore: Create Retriever
    LongMemoryPlugin->>SimilarityCalculator: Configure Similarity Threshold
    User->>ChatInterface: Send Message
    ChatInterface->>LongMemoryPlugin: Check Memory Similarity
    LongMemoryPlugin->>SimilarityCalculator: Calculate Similarity
    SimilarityCalculator-->>LongMemoryPlugin: Similarity Score
    LongMemoryPlugin->>VectorStore: Store/Retrieve Memory
Loading

Poem

🐰 Memories dance, algorithms sing,
Long thoughts flutter on a digital wing
Similarity checks, a rabbit's delight
Filtering wisdom with algorithmic might
ChatLuna leaps, memory refined! 🧠✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
packages/core/src/llm-core/memory/history/index.ts (2)

125-130: Consider clarifying the threshold naming.
Currently, longMemoryAddSimilarity and enableSimilarityCheck collaboratively control memory filtering. To improve readability, consider renaming longMemoryAddSimilarity to something like longMemoryAddSimilarityThreshold.

- if (config.longMemoryAddSimilarity < 1) {
+ if (config.longMemoryAddSimilarityThreshold < 1) {

205-205: Minor naming improvement.
Consider renaming similarityMemorys to similarMemories or searchResults for clarity.

- const similarityMemorys = await vectorStore.similaritySearchWithScore(memory, 10)
+ const similarMemories = await vectorStore.similaritySearchWithScore(memory, 10)
packages/vector-store-service/src/vectorstore/milvus.ts (1)

166-167: Additional metadata property.
Adding langchain_primaryid mirrors raw_id, which may be helpful for referencing or searching. If there’s no functional difference between raw_id and langchain_primaryid, consider unifying the fields.

document.metadata = {
  source: 'unknown',
  ...document.metadata,
  raw_id: id,
- langchain_primaryid: id
+ // Optionally unify to a single ID field if not needed separately
}
packages/core/src/locales/en-US.schema.yml (1)

39-39: Enhance the description to better reflect the implementation.

The current description could be clearer about:

  1. The three similarity algorithms being used (Cosine, Levenshtein, Jaccard)
  2. The default threshold value (0.8)
  3. The distinction between this and the longMemorySimilarity option

Consider updating the description to:

-      longMemoryAddSimilarity: Set similarity threshold for long-term memory retrieval before adding (0.0 to 1.0, where higher values require closer matches). If set to 0, the most strict check will be performed. If the retrieved memory exceeds this threshold, it will not be added to long-term memory.
+      longMemoryAddSimilarity: Set similarity threshold (0.0 to 1.0, default: 0.8) for detecting duplicate memories using Cosine, Levenshtein, and Jaccard similarity algorithms. Higher values require closer matches. New memories exceeding this threshold will be considered duplicates and won't be added. Set to 0 for strictest deduplication.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e2e4a1 and 8b81f35.

📒 Files selected for processing (6)
  • packages/core/src/config.ts (2 hunks)
  • packages/core/src/llm-core/memory/history/index.ts (3 hunks)
  • packages/core/src/llm-core/memory/history/similarity.ts (1 hunks)
  • packages/core/src/locales/en-US.schema.yml (1 hunks)
  • packages/core/src/locales/zh-CN.schema.yml (1 hunks)
  • packages/vector-store-service/src/vectorstore/milvus.ts (1 hunks)
🔇 Additional comments (9)
packages/core/src/llm-core/memory/history/index.ts (4)

12-12: Import usage looks good.
The new calculateSimilarity function import is well-placed and aligns with the usage in filterSimilarMemory.


195-196: Function signature is succinct and clear.
Declaring similarityThreshold and enableCheck as distinct parameters makes the code self-explanatory.


198-200: Short-circuit condition looks correct.
Early returning if similarityThreshold >= 1 or checks are disabled ensures no unintended filtering.


215-230: Loop logic is correct.
The for-loop correctly breaks on the first detection of overly similar memory, preventing unnecessary checks. The logging is also clear and informative.

packages/core/src/llm-core/memory/history/similarity.ts (2)

1-8: Well-defined interface.
Using a dedicated interface for the similarity result helps maintain clarity and strongly typed structures.


10-108: Robust multi-metric similarity calculation.
Levenshtein, Jaccard, and Cosine cover distinct dimensions of string similarity. The weighted approach seems balanced. Watch out for performance on very large strings, but for typical cases, this should be fine.

packages/core/src/config.ts (1)

52-53: New configuration properties and defaults.
Introducing enableSimilarityCheck and adjusting the default of longMemoryAddSimilarity to 0.8 make the memory-check behavior more flexible. If 0.8 is too strict for some deployments, consider mentioning that in documentation or user guides.

Also applies to: 113-114

packages/core/src/locales/zh-CN.schema.yml (1)

38-39: Schema documentation is consistent.
The new enableSimilarityCheck description and clarified longMemoryAddSimilarity doc align well with the logic in the code.

packages/core/src/locales/en-US.schema.yml (1)

38-38: LGTM! Clear and well-documented configuration option.

The enableSimilarityCheck option is well-described and aligns perfectly with the PR's objective of implementing memory duplication detection.

@dingyi222666 dingyi222666 self-requested a review December 28, 2024 06:01
…data in apply function

- Eliminated the `langchain_primaryid` field from the document metadata structure in the `apply` function.
- This change simplifies the metadata and focuses on the essential `raw_id` for document identification.
- Renamed `longMemoryAddSimilarity` to `longMemoryDuplicateThreshold` for clarity in configuration.
- Introduced `longMemoryEnableDuplicateCheck` boolean option to control duplicate memory checks.
- Updated the `apply` function to utilize the new configuration options for filtering similar memories.
- Improved readability by renaming parameters in the `clear-chat-history` event handler.
- Added new dependencies: `@node-rs/jieba` and `tiny-segmenter` to enhance text processing capabilities.
- Refactored the import path for the `calculateSimilarity` function to improve code organization.
- Removed the obsolete `similarity.ts` file, consolidating similarity calculation logic for better maintainability.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
packages/core/src/llm-core/utils/similarity.ts (2)

22-45: Consider using standalone functions instead of a static-only class.

This class only contains static members, which may introduce unnecessary complexity. Converting these methods to standalone utility functions can improve readability and reduce boilerplate.

🧰 Tools
🪛 Biome (1.9.4)

[error] 22-45: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


[error] 33-33: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 36-36: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


77-77: Adhere to Prettier formatting guidelines.

Wrapping the arrow function parameter in parentheses improves code readability and style consistency.

- const intersection = new Set([...words1].filter(x => words2.has(x)))
+ const intersection = new Set([...words1].filter((x) => words2.has(x)))
🧰 Tools
🪛 eslint

[error] 77-77: Replace x with (x)

(prettier/prettier)

🪛 GitHub Check: CodeFactor

[warning] 77-77: packages/core/src/llm-core/utils/similarity.ts#L77
Replace x with (x) (prettier/prettier)

packages/core/src/config.ts (3)

115-116: Prettier formatting suggestion.

Deleting the extra line break here could help maintain consistent formatting.

- longMemoryEnableDuplicateCheck: Schema.boolean()
-     .default(true),
+ longMemoryEnableDuplicateCheck: Schema.boolean().default(true),
🧰 Tools
🪛 eslint

[error] 115-116: Delete ⏎············

(prettier/prettier)

🪛 GitHub Check: CodeFactor

[warning] 115-115: packages/core/src/config.ts#L115
Delete ⏎············ (prettier/prettier)


117-120: Inline the chained schema calls.

Combining .default(3) and .min(1) on the same line makes the code more concise and consistent with typical Prettier styles.

- longMemoryInterval: Schema.number()
-     .default(3)
-     .min(1)
-     .max(10),
+ longMemoryInterval: Schema.number().default(3).min(1).max(10),
🧰 Tools
🪛 eslint

[error] 117-120: Replace ⏎············.default(3)⏎············.min(1)⏎············ with .default(3).min(1)

(prettier/prettier)

🪛 GitHub Check: CodeFactor

[warning] 117-117: packages/core/src/config.ts#L117
Replace ⏎············.default(3)⏎············.min(1)⏎············ with .default(3).min(1) (prettier/prettier)


121-122: Prettier formatting suggestion.

Removing the extra line break ensures consistent code style.

- longMemoryExtractModel: Schema.dynamic('model')
-     .default('无')
+ longMemoryExtractModel: Schema.dynamic('model').default('无')
🧰 Tools
🪛 eslint

[error] 121-122: Delete ⏎············

(prettier/prettier)

🪛 GitHub Check: CodeFactor

[warning] 121-121: packages/core/src/config.ts#L121
Delete ⏎············ (prettier/prettier)

packages/core/package.json (1)

222-222: Check for potential segmentation pitfalls.

tiny-segmenter works well for Japanese text, but ensure you handle text that mixes multiple scripts (e.g., Kana with Latin letters). Consider adding tests for mixed language scenarios.

packages/core/src/llm-core/memory/history/index.ts (2)

125-130: Conditional threshold check.

You've introduced a threshold check for duplicate detection. Be cautious if you plan to handle edge cases where no threshold or a threshold of 1.0 is used; it might simply skip detection. Consider logging or clarifying the user about the fallback logic.


215-230: Consider performance constraints.

The current loop obtains up to 10 most-similar memories, then individually calculates textual similarity again. This double similarity check may be necessary, but it can also be costly if memory arrays grow large. Evaluate whether you need both vector-based search and text-based similarity or if you can unify the checks.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b81f35 and 347cddc.

📒 Files selected for processing (4)
  • packages/core/package.json (2 hunks)
  • packages/core/src/config.ts (2 hunks)
  • packages/core/src/llm-core/memory/history/index.ts (4 hunks)
  • packages/core/src/llm-core/utils/similarity.ts (1 hunks)
🧰 Additional context used
🪛 eslint
packages/core/src/config.ts

[error] 115-116: Delete ⏎············

(prettier/prettier)


[error] 117-120: Replace ⏎············.default(3)⏎············.min(1)⏎············ with .default(3).min(1)

(prettier/prettier)


[error] 121-122: Delete ⏎············

(prettier/prettier)

packages/core/src/llm-core/utils/similarity.ts

[error] 77-77: Replace x with (x)

(prettier/prettier)

🪛 GitHub Check: CodeFactor
packages/core/src/config.ts

[warning] 121-121: packages/core/src/config.ts#L121
Delete ⏎············ (prettier/prettier)


[warning] 115-115: packages/core/src/config.ts#L115
Delete ⏎············ (prettier/prettier)


[warning] 117-117: packages/core/src/config.ts#L117
Replace ⏎············.default(3)⏎············.min(1)⏎············ with .default(3).min(1) (prettier/prettier)

packages/core/src/llm-core/utils/similarity.ts

[warning] 77-77: packages/core/src/llm-core/utils/similarity.ts#L77
Replace x with (x) (prettier/prettier)

🪛 Biome (1.9.4)
packages/core/src/llm-core/utils/similarity.ts

[error] 22-45: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


[error] 33-33: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 36-36: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 47-134: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


[error] 120-120: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 121-121: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 122-122: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)

🔇 Additional comments (7)
packages/core/src/config.ts (3)

49-50: Consistent naming convention.

The newly added properties longMemoryDuplicateThreshold and longMemoryEnableDuplicateCheck follow the longMemory prefix. This aligns well with past naming requests and keeps the config consistent.


54-54: Double-check default behavior.

enableSimilarityCheck is introduced without explicit usage in the snippet. Ensure that this property is properly integrated into the similarity control flow elsewhere in the codebase.


110-114: Validate range of threshold.

Ensure that the chosen default threshold of 0.8 suits your typical use case for duplicate detection. This is somewhat high; consider testing different values to find an optimal default.

packages/core/package.json (1)

210-210: Verify version range for @node-rs/jieba.

Ensure ^2.0.1 is compatible with your current Node.js environment and other existing dependencies. If any issues arise, consider pinning the exact version or verifying with additional tests.

packages/core/src/llm-core/memory/history/index.ts (3)

12-12: Ensure shared usage is known.

calculateSimilarity is imported here; confirm any upstream or downstream usage of this method to avoid unexpected behavior if the configuration toggles similarity off.


195-196: Parameter naming.

similarityThreshold and enableCheck are clear. Continue to maintain consistency across the codebase so that developers easily understand these parameters' roles.


198-201: Honor threshold=1 logic.

When similarityThreshold >= 1 or !enableCheck, the function returns immediately. This is correct for ignoring similarity checks, but be sure you truly want to skip all memory filtering in these cases.

- Introduced `stopwords-iso` dependency to improve text processing by filtering out common stopwords in multiple languages.
- Updated similarity weights in the `SimilarityCalculator` to include a new `bm25` metric, adjusting existing weights for cosine, levenshtein, and jaccard.
- Refactored `TextTokenizer` to detect languages and remove stopwords from tokenized text, enhancing the accuracy of similarity calculations.
- Added `calculateBM25Similarity` method to compute similarity scores using the BM25 algorithm, providing a more robust similarity assessment.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/core/src/llm-core/utils/similarity.ts (2)

26-100: Consider converting TextTokenizer to a namespace.

Since TextTokenizer contains only static members, consider converting it to a TypeScript namespace or a module with exported functions. This would better align with functional programming practices and eliminate static context issues.

-class TextTokenizer {
+namespace TextTokenizer {
     private static stopwords = new Set([
         ...stopwords.zh,
         ...stopwords.en,
         ...stopwords.ja
     ])
     // ... rest of the code
-}
+}
🧰 Tools
🪛 Biome (1.9.4)

[error] 42-42: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 43-43: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 44-44: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 50-50: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 55-55: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 76-76: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 92-92: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 93-93: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 97-97: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)

🪛 eslint

[error] 88-88: Replace token with (token)

(prettier/prettier)


[error] 91-91: Insert ⏎················

(prettier/prettier)


[error] 93-93: Insert ⏎············

(prettier/prettier)

🪛 GitHub Check: CodeFactor

[warning] 77-77: packages/core/src/llm-core/utils/similarity.ts#L77
Replace x with (x) (prettier/prettier)


[warning] 91-91: packages/core/src/llm-core/utils/similarity.ts#L91
Insert ⏎················ (prettier/prettier)


[warning] 88-88: packages/core/src/llm-core/utils/similarity.ts#L88
Replace token with (token) (prettier/prettier)


[warning] 93-93: packages/core/src/llm-core/utils/similarity.ts#L93
Insert ⏎············ (prettier/prettier)


196-199: Enhance error handling with specific error types.

Consider creating custom error types for better error handling and add more specific validation messages.

+class SimilarityError extends Error {
+    constructor(message: string) {
+        super(message);
+        this.name = 'SimilarityError';
+    }
+}

 public static calculate(str1: string, str2: string): SimilarityResult {
     if (!str1 || !str2) {
-        throw new Error('Input strings cannot be empty')
+        throw new SimilarityError('Input strings cannot be empty')
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 347cddc and b62469c.

📒 Files selected for processing (2)
  • packages/core/package.json (2 hunks)
  • packages/core/src/llm-core/utils/similarity.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/core/package.json
🧰 Additional context used
🪛 Biome (1.9.4)
packages/core/src/llm-core/utils/similarity.ts

[error] 25-102: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


[error] 42-42: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 43-43: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 44-44: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 50-50: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 55-55: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 76-76: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 92-92: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 93-93: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 97-97: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 102-222: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


[error] 204-204: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 205-205: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 206-206: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 207-207: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)

🪛 eslint
packages/core/src/llm-core/utils/similarity.ts

[error] 88-88: Replace token with (token)

(prettier/prettier)


[error] 91-91: Insert ⏎················

(prettier/prettier)


[error] 93-93: Insert ⏎············

(prettier/prettier)


[error] 132-132: Replace x with (x)

(prettier/prettier)


[error] 177-177: Replace token with (token)

(prettier/prettier)


[error] 183-183: Replace t with (t)

(prettier/prettier)


[error] 186-186: Replace 1·+·Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/·(termFreqInDoc2·+·0.5) with ⏎················1·+⏎····················Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/⏎························(termFreqInDoc2·+·0.5)⏎············

(prettier/prettier)


[error] 188-188: Insert ⏎···············

(prettier/prettier)

🪛 GitHub Check: CodeFactor
packages/core/src/llm-core/utils/similarity.ts

[warning] 77-77: packages/core/src/llm-core/utils/similarity.ts#L77
Replace x with (x) (prettier/prettier)


[warning] 91-91: packages/core/src/llm-core/utils/similarity.ts#L91
Insert ⏎················ (prettier/prettier)


[warning] 186-186: packages/core/src/llm-core/utils/similarity.ts#L186
Replace 1·+·Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/·(termFreqInDoc2·+·0.5) with ⏎················1·+⏎····················Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/⏎························(termFreqInDoc2·+·0.5)⏎············ (prettier/prettier)


[warning] 183-183: packages/core/src/llm-core/utils/similarity.ts#L183
Replace t with (t) (prettier/prettier)


[warning] 88-88: packages/core/src/llm-core/utils/similarity.ts#L88
Replace token with (token) (prettier/prettier)


[warning] 177-177: packages/core/src/llm-core/utils/similarity.ts#L177
Replace token with (token) (prettier/prettier)


[warning] 188-188: packages/core/src/llm-core/utils/similarity.ts#L188
Insert ⏎··············· (prettier/prettier)


[warning] 93-93: packages/core/src/llm-core/utils/similarity.ts#L93
Insert ⏎············ (prettier/prettier)

🔇 Additional comments (4)
packages/core/src/llm-core/utils/similarity.ts (4)

1-14: LGTM! Well-balanced similarity weights.

The weights for different similarity metrics are well-distributed, summing to 1.0, which provides a balanced approach to similarity calculation. The choice of libraries for text segmentation is appropriate for multi-language support.


16-24: LGTM! Clear and comprehensive interface definition.

The SimilarityResult interface effectively captures both the final score and individual metric details, enabling transparency in similarity calculations.


222-227: LGTM! Clean API design.

The exported function provides a simple and clean interface for calculating similarity, effectively hiding the implementation details.


167-194: Verify BM25 implementation and add documentation.

The BM25 implementation could benefit from:

  1. Documentation explaining the choice of k1 and b parameters
  2. Validation of the IDF calculation approach
  3. Handling edge cases when document lengths are very different
✅ Verification successful

BM25 implementation is correct but needs documentation

The BM25 implementation follows standard practices with appropriate parameter values (k1=1.5, b=0.75), but lacks documentation explaining:

  • The choice of these standard parameters
  • The IDF calculation approach used
  • The normalization by tokens1.length at the end

The implementation correctly handles:

  • Term frequency calculations
  • Document length normalization
  • Edge cases (zero frequencies)
  • Multi-language support through TextTokenizer

Suggested documentation to add above the method:

/**
 * Calculates BM25 similarity between two strings.
 * Uses standard parameters: k1=1.5 (term frequency saturation) and b=0.75 (length normalization).
 * The score is normalized by query length for better comparison across different inputs.
 * @param s1 Query string (treated as query terms)
 * @param s2 Document string
 * @returns Normalized BM25 similarity score
 */
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other BM25 implementations in the codebase for consistency
ast-grep --pattern 'function $name($params) {
  $$$
  const k1 = $value
  const b = $value
  $$$
}'

Length of output: 101


Script:

#!/bin/bash
# Let's try a different approach to search for BM25-related code and documentation
rg -i "bm25|okapi" -B 2 -A 2

# Also search for any documentation about similarity calculations
fd "README|docs" --type f --exec cat {} \; | rg -i "similarity|bm25"

# Look for test files related to similarity calculations
fd "similarity.*test|test.*similarity" --type f --exec cat {} \;

Length of output: 2307


Script:

#!/bin/bash
# Let's look for test cases specifically for similarity calculations
rg -l "similarity" | grep -i "test" | xargs cat

# Also check for any comments or documentation in the similarity.ts file
cat packages/core/src/llm-core/utils/similarity.ts

Length of output: 7121

🧰 Tools
🪛 eslint

[error] 177-177: Replace token with (token)

(prettier/prettier)


[error] 183-183: Replace t with (t)

(prettier/prettier)


[error] 186-186: Replace 1·+·Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/·(termFreqInDoc2·+·0.5) with ⏎················1·+⏎····················Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/⏎························(termFreqInDoc2·+·0.5)⏎············

(prettier/prettier)


[error] 188-188: Insert ⏎···············

(prettier/prettier)

🪛 GitHub Check: CodeFactor

[warning] 186-186: packages/core/src/llm-core/utils/similarity.ts#L186
Replace 1·+·Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/·(termFreqInDoc2·+·0.5) with ⏎················1·+⏎····················Math.abs(tokens1.length·-·termFreqInDoc2·+·0.5)·/⏎························(termFreqInDoc2·+·0.5)⏎············ (prettier/prettier)


[warning] 183-183: packages/core/src/llm-core/utils/similarity.ts#L183
Replace t with (t) (prettier/prettier)


[warning] 177-177: packages/core/src/llm-core/utils/similarity.ts#L177
Replace token with (token) (prettier/prettier)


[warning] 188-188: packages/core/src/llm-core/utils/similarity.ts#L188
Insert ⏎··············· (prettier/prettier)

…ityCalculator

- Updated method calls in the TextTokenizer class to use static references for language detection and stopword filtering.
- Refactored SimilarityCalculator to call its similarity methods statically, improving code clarity and consistency.
- These changes enhance maintainability and readability of the similarity calculation logic.
- Introduced a new function `validateAndAdjustWeights` to ensure that the total similarity weights sum to 1, adjusting them if necessary.
- Updated the `SimilarityCalculator` to use the validated weights in the similarity score calculation, enhancing the reliability of similarity assessments.
- This change improves the robustness of the similarity calculation logic by ensuring consistent weight application.
@Hoshino-Yumetsuki Hoshino-Yumetsuki changed the title 完善长期记忆重复性检测,修复 Milvus 无法添加记忆 完善长期记忆重复性检测 Dec 28, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
packages/long-memory/src/similarity.ts (5)

47-123: Consider replacing the static-only class with utility functions.

The TextTokenizer class contains only static methods and no instance-level state. This can be simplified by converting the methods into standalone utility functions. This aligns with the static analysis suggestion to avoid classes that contain only static members, and it may improve code clarity.

🧰 Tools
🪛 Biome (1.9.4)

[error] 76-76: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 97-97: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


74-77: Avoid using this in a static context.

The keyword this refers to the class, which can be confusing in static methods. Prefer referencing the class name or factoring out the logic into standalone functions.

- return this.removeStopwords(tokens)
+ return TextTokenizer.removeStopwords(tokens)
🧰 Tools
🪛 Biome (1.9.4)

[error] 76-76: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


96-97: Avoid using this in a static context.

Same issue as above. Use the class name or a standalone function call instead.

- return this.removeStopwords(tokens)
+ return TextTokenizer.removeStopwords(tokens)
🧰 Tools
🪛 Biome (1.9.4)

[error] 97-97: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


125-251: Consider flattening the static class into pure functions.

Like TextTokenizer, SimilarityCalculator only provides static methods and no instance-level members. Converting these methods into function exports can remove unnecessary complexity.

🧰 Tools
🪛 Biome (1.9.4)

[error] 125-253: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


253-258: Add unit tests for calculateSimilarity.

The calculateSimilarity function is crucial. Ensure it has thorough test coverage across various languages and edge cases, such as empty strings, extremely long strings, or highly similar inputs.

Would you like help creating a dedicated test suite for this function?

packages/long-memory/src/plugins/memory.ts (1)

206-231: Consider potential performance impact with repeated multistep similarity checks.

For each memory, calling similaritySearchWithScore and then iterating again with calculateSimilarity can be expensive if your dataset is large. Evaluate potential optimizations or caching if you expect high volume.

packages/long-memory/package.json (1)

66-79: Consolidate version resolution configurations

The same version resolutions are repeated in three different sections: resolutions, overrides, and pnpm.overrides. This redundancy could lead to maintenance issues.

Consider consolidating these into a single section, preferably using only pnpm.overrides since this is a pnpm workspace:

-    "resolutions": {
-        "@langchain/core": "0.3.18",
-        "js-tiktoken": "npm:@dingyi222666/js-tiktoken@^1.0.15"
-    },
-    "overrides": {
-        "@langchain/core": "0.3.18",
-        "js-tiktoken": "npm:@dingyi222666/js-tiktoken@^1.0.15"
-    },
     "pnpm": {
         "overrides": {
             "@langchain/core": "0.3.18",
             "js-tiktoken": "npm:@dingyi222666/js-tiktoken@^1.0.15"
         }
     }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b62469c and 908f791.

📒 Files selected for processing (17)
  • packages/core/src/config.ts (1 hunks)
  • packages/core/src/index.ts (0 hunks)
  • packages/core/src/llm-core/chat/app.ts (1 hunks)
  • packages/core/src/locales/en-US.schema.yml (0 hunks)
  • packages/core/src/locales/zh-CN.schema.yml (0 hunks)
  • packages/core/src/services/chat.ts (0 hunks)
  • packages/long-memory/README.md (1 hunks)
  • packages/long-memory/package.json (1 hunks)
  • packages/long-memory/src/index.ts (1 hunks)
  • packages/long-memory/src/locales/en-US.schema.yml (1 hunks)
  • packages/long-memory/src/locales/zh-CN.schema.yml (1 hunks)
  • packages/long-memory/src/plugin.ts (1 hunks)
  • packages/long-memory/src/plugins/config.ts (1 hunks)
  • packages/long-memory/src/plugins/memory.ts (6 hunks)
  • packages/long-memory/src/similarity.ts (1 hunks)
  • packages/long-memory/tsconfig.json (1 hunks)
  • processDynamicImport.js (1 hunks)
💤 Files with no reviewable changes (4)
  • packages/core/src/index.ts
  • packages/core/src/locales/en-US.schema.yml
  • packages/core/src/locales/zh-CN.schema.yml
  • packages/core/src/services/chat.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/long-memory/README.md
  • packages/long-memory/tsconfig.json
🧰 Additional context used
🪛 Biome (1.9.4)
packages/long-memory/src/plugin.ts

[error] 3-3: Expected a semicolon or an implicit semicolon after a statement, but found none

An explicit or implicit semicolon is expected here...

...Which is required to end this statement

(parse)


[error] 4-4: Expected a semicolon or an implicit semicolon after a statement, but found none

An explicit or implicit semicolon is expected here...

...Which is required to end this statement

(parse)


[error] 3-3: Shouldn't redeclare 'config'. Consider to delete it or rename it.

'config' is defined here:

(lint/suspicious/noRedeclare)


[error] 4-4: Shouldn't redeclare 'memory'. Consider to delete it or rename it.

'memory' is defined here:

(lint/suspicious/noRedeclare)

packages/long-memory/src/similarity.ts

[error] 46-125: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)


[error] 76-76: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 97-97: Using this in a static context can be confusing.

this refers to the class.
Unsafe fix: Use the class name instead.

(lint/complexity/noThisInStatic)


[error] 125-253: Avoid classes that contain only static members.

Prefer using simple functions instead of classes with only static members.

(lint/complexity/noStaticOnlyClass)

🔇 Additional comments (16)
packages/long-memory/src/plugins/memory.ts (3)

10-10: Ensure consistent import usage.

Great to see you are utilizing the calculateSimilarity function. Confirm that this import is necessary if your business logic truly requires similarity calculation in this file.


126-131: Check threshold logic consistency.

When longMemoryDuplicateThreshold is below 1, the system filters out highly similar memories. Verify that this threshold is balanced with user expectations. If it’s too low, you might block relevant or partially relevant memories.


196-201: Early return condition might skip intended checks.

If similarityThreshold >= 1 or checks are disabled, the code immediately returns the entire memory array. Ensure this is intentional. If you wanted partial filtering, consider adjusting your threshold logic.

packages/long-memory/src/plugin.ts (1)

8-21: Plugin pattern looks good.

The approach of batching and awaiting middlewares is clear. Ensure any future middlewares are appended to this array in a well-managed order, especially if certain tasks must run before others.

packages/long-memory/src/plugins/config.ts (2)

6-16: Dynamic event-driven schema updates are well-structured.

Listening for model-added and model-removed events fosters real-time updates. This is a robust approach for adapting to model changes on the fly.


18-28: Good fallback for when no models are available.

Adding '无' ensures the app won't crash if no models exist. This is a thoughtful default.

packages/long-memory/src/index.ts (3)

12-26: Initialization flow is properly structured.

Registering the plugin and deferring additional setup until the ready event is a clean approach. It ensures that the plugin is fully initialized before calling plugins(ctx, config).


29-35: Check alignment with new core config changes.

You’ve introduced longMemory and additional properties here. Confirm these are consistent with any new or removed fields in your upstream configuration (e.g., longMemory was previously mentioned as removed in the core config).


37-57: Schema definitions are clear and localized.

Ensuring numeric bounds, default values, and i18n support is a solid practice. Verify that localized schema files match these properties for consistent user configuration experiences.

processDynamicImport.js (1)

29-32: Add path coverage for the new long-memory plugin.

The added path for packages/long-memory/src/plugin.ts and packages/long-memory/src/plugins integrates dynamic import handling for the long-memory plugin. This change ensures that the script processes additional modules for the memory feature. The adjustment looks correct and maintains consistency with similar paths entries.

packages/core/src/llm-core/chat/app.ts (1)

330-332: Verify logic for fallback to emptyEmbeddings.

The chain of conditions checks for null embeddings, empty arrays, or '无'. This effectively handles missing or invalid embedding settings by falling back to emptyEmbeddings. Ensure that other modules or user configurations don't accidentally pass alternate representations (e.g., empty strings).

packages/long-memory/src/locales/zh-CN.schema.yml (1)

1-9: Comprehensive long-term memory configuration additions.

These new schema entries elegantly capture the key settings for long-term memory, such as thresholds, intervals, and model selection. They are well-structured and clear. Verify that the described fields align with the actual usage in the code so that the UI or config loader can handle them without issues.

packages/long-memory/src/locales/en-US.schema.yml (1)

1-8: English schema for long-term memory.

The new configuration fields are consistent with the Chinese schema, providing parallel coverage in English. The thresholds and descriptive texts are clear, ensuring clarity for international users.

packages/long-memory/package.json (2)

50-55: Dependencies align well with similarity detection requirements

The chosen dependencies support multilingual text processing needed for similarity detection:

  • @node-rs/jieba for Chinese text segmentation
  • stopwords-iso for stopwords removal
  • tiny-segmenter for Japanese text segmentation

This combination will enable accurate similarity detection across different languages.


62-65: Verify peer dependency version compatibility

The package requires specific versions of its peer dependencies:

  • koishi: ^4.18.1
  • koishi-plugin-chatluna: ^1.0.0-rc.2

Let's verify these versions are compatible across the workspace.

✅ Verification successful

Let me check the chatluna plugin version requirements across the workspace.


✓ Peer dependencies are consistent across workspace

The verification shows that all packages in the workspace maintain consistent version requirements:

  • koishi: ^4.18.1
  • koishi-plugin-chatluna: ^1.0.0-rc.2

These versions align perfectly with the peer dependencies specified in the package.json file under review.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for version consistency across workspace
fd -e json package.json | xargs cat | jq -r 'select(.dependencies?.koishi or .peerDependencies?.koishi) | "\(.name): koishi@\(.dependencies?.koishi // .peerDependencies?.koishi)"'

Length of output: 1321


Script:

#!/bin/bash
# Check for chatluna dependency versions across workspace
fd -e json package.json | xargs cat | jq -r 'select(.dependencies?."koishi-plugin-chatluna" or .peerDependencies?."koishi-plugin-chatluna") | "\(.name): chatluna@\(.dependencies?."koishi-plugin-chatluna" // .peerDependencies?."koishi-plugin-chatluna")"'

Length of output: 1476

packages/core/src/config.ts (1)

47-47: Consider following the established naming convention

Based on previous feedback, properties related to long memory functionality should use the longMemory prefix.

-    enableSimilarityCheck: boolean
+    longMemorySimilarityCheck: boolean

Comment on lines 3 to 4
import { apply as memory } from './plugins/memory'import { apply as config } from './plugins/config'
import { apply as memory } from './plugins/memory'import { Config } from '.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Resolve parse errors and redeclarations in import statements.

There seems to be a syntax issue and object redeclares:

import { apply as memory } from './plugins/memory'import { apply as config } from './plugins/config'

Use semicolons or new lines to separate these import statements, and avoid redeclaring config and memory.

- import { apply as memory } from './plugins/memory'import { apply as config } from './plugins/config'
+ import { apply as memory } from './plugins/memory'
+ import { apply as config } from './plugins/config'

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Biome (1.9.4)

[error] 3-3: Expected a semicolon or an implicit semicolon after a statement, but found none

An explicit or implicit semicolon is expected here...

...Which is required to end this statement

(parse)


[error] 4-4: Expected a semicolon or an implicit semicolon after a statement, but found none

An explicit or implicit semicolon is expected here...

...Which is required to end this statement

(parse)


[error] 3-3: Shouldn't redeclare 'config'. Consider to delete it or rename it.

'config' is defined here:

(lint/suspicious/noRedeclare)


[error] 4-4: Shouldn't redeclare 'memory'. Consider to delete it or rename it.

'memory' is defined here:

(lint/suspicious/noRedeclare)

longMemoryAddSimilarity: number
longMemoryInterval: number
longMemoryExtractModel: string
enableSimilarityCheck: boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Breaking changes detected in long-memory configuration

The verification reveals significant usage of the removed configurations across the codebase, particularly in the long-memory plugin. Here are the affected locations that need attention:

  • packages/long-memory/src/plugins/memory.ts: Uses removed configurations for core functionality including memory retrieval and filtering
  • packages/long-memory/src/index.ts: Defines the removed configurations as part of the plugin's Config interface and Schema
  • packages/core/src/llm-core/prompt/type.ts and chain/prompt.ts: References long memory prompts in preset configurations

This change represents a breaking change as it removes configurations that are actively used in the long-memory plugin's implementation. The following actions are needed:

  1. Update the long-memory plugin to work with the new enableSimilarityCheck configuration
  2. Provide migration documentation for users to transition from the old configurations
  3. Consider maintaining backward compatibility or incrementing the major version number
🔗 Analysis chain

Verify the impact of configuration changes

The removal of several long memory configurations and addition of enableSimilarityCheck might affect existing installations. Let's verify the usage of removed configurations across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of removed configurations
echo "Searching for references to removed configurations..."
rg -l "longMemory|longMemorySimilarity|longMemoryAddSimilarity|longMemoryInterval|longMemoryExtractModel" --type ts

# Search for migration guides or breaking changes documentation
echo "Checking for migration documentation..."
fd -e md -e mdx | xargs rg -l "breaking.*changes|migration|upgrade"

Length of output: 555


Script:

#!/bin/bash
# Let's examine the actual usage of these configurations in the found files
echo "Checking usage in packages/long-memory/src/plugins/memory.ts:"
rg "longMemory|longMemorySimilarity|longMemoryAddSimilarity|longMemoryInterval|longMemoryExtractModel" -C 2 packages/long-memory/src/plugins/memory.ts

echo -e "\nChecking usage in packages/long-memory/src/index.ts:"
rg "longMemory|longMemorySimilarity|longMemoryAddSimilarity|longMemoryInterval|longMemoryExtractModel" -C 2 packages/long-memory/src/index.ts

echo -e "\nChecking usage in packages/core/src/llm-core/prompt/type.ts:"
rg "longMemory|longMemorySimilarity|longMemoryAddSimilarity|longMemoryInterval|longMemoryExtractModel" -C 2 packages/core/src/llm-core/prompt/type.ts

echo -e "\nChecking usage in packages/core/src/llm-core/chain/prompt.ts:"
rg "longMemory|longMemorySimilarity|longMemoryAddSimilarity|longMemoryInterval|longMemoryExtractModel" -C 2 packages/core/src/llm-core/chain/prompt.ts

# Let's also check if there are any tests that might need updating
echo -e "\nChecking for related test files:"
fd -e test.ts -e spec.ts | xargs rg -l "longMemory"

Length of output: 6224

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/long-memory/src/plugin.ts (2)

1-6: Consider reorganizing imports for better maintainability.

While the imports are functionally correct, consider grouping them by external packages first, followed by internal imports:

 import { Context } from 'koishi'
 import { Config } from 'koishi-plugin-chatluna-long-memory'
-// import start
+
 import { apply as config } from './plugins/config'
 import { apply as memory } from './plugins/memory'
-// import end

11-14: Consider adding middleware registration method.

Instead of hardcoding the middleware array, consider implementing a registration method for better modularity and testability:

-    const middlewares: Plugin[] =
-        // middleware start
-        [config, memory] // middleware end
+    const middlewares: Plugin[] = []
+    
+    // Register core middleware
+    middlewares.push(config)
+    middlewares.push(memory)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 908f791 and be14cff.

📒 Files selected for processing (2)
  • packages/long-memory/src/plugin.ts (1 hunks)
  • packages/long-memory/src/plugins/memory.ts (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/long-memory/src/plugins/memory.ts
🔇 Additional comments (2)
packages/long-memory/src/plugin.ts (2)

8-10: Verify Config interface includes similarity detection settings.

Given that this PR implements similarity detection algorithms (Cosine, Levenshtein, Jaccard), ensure that the Config interface from 'koishi-plugin-chatluna-long-memory' includes the necessary configuration options for these algorithms.

✅ Verification successful

Let me gather more information about the configuration and similarity settings.


Config interface correctly includes similarity detection settings

Based on the verification results, the Config interface in packages/core/src/config.ts already includes the enableSimilarityCheck setting (line 47). Additionally, the codebase has comprehensive similarity detection implementation:

  • The schema file (en-US.schema.yml) defines multiple similarity-related configurations:
    • longMemorySimilarity: Threshold for memory matches (0.0 to 1.0)
    • enableSimilarityCheck: Toggle for similarity checking
    • longMemoryAddSimilarity: Threshold for memory addition

The implementation includes multiple similarity algorithms:

  • Cosine similarity
  • Levenshtein distance
  • Jaccard similarity
  • BM25 scoring
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similarity-related configuration in the Config interface
ast-grep --pattern 'interface Config {
  $$$
  enableSimilarityCheck?: $_
  $$$
}'

# Check for any similarity-related configuration files
fd -e yml -e yaml -e json . | xargs rg -l "similarity|cosine|levenshtein|jaccard"

Length of output: 218


Script:

#!/bin/bash
# Check the content of the schema file to understand similarity configurations
cat packages/long-memory/src/locales/en-US.schema.yml

# Search for Config interface definition
ast-grep --pattern 'interface Config {
  $$$
}'

# Look for any similarity-related settings in the codebase
rg "similarity|cosine|levenshtein|jaccard" -A 3 -B 3

Length of output: 30421


15-17: 🛠️ Refactor suggestion

Add error handling for middleware execution.

The current implementation doesn't handle potential errors from middleware execution. Consider adding try-catch blocks and proper error handling:

     for (const middleware of middlewares) {
-        await middleware(ctx, parent)
+        try {
+            await middleware(ctx, parent)
+        } catch (error) {
+            ctx.logger.error(`Middleware execution failed: ${error.message}`)
+            throw error  // or handle appropriately
+        }
     }

Additionally, verify that the memory middleware properly handles Milvus integration and memory IDs as mentioned in the PR objectives.

✅ Verification successful

Add error handling for middleware execution and verify memory implementation.

The review comment's suggestions are valid and can be improved based on the codebase analysis:

  1. The middleware error handling suggestion is appropriate as:

    • Other parts of the codebase (e.g., vectorstore.ts) implement similar error handling patterns
    • The current implementation in plugin.ts lacks error handling which could lead to silent failures
  2. Regarding the Milvus integration and memory IDs verification:

    • The memory.ts implementation properly handles memory IDs through the resolveLongMemoryId function
    • Vector store integration is handled flexibly through the createVectorStoreRetriever function
    • The implementation includes proper error handling for memory extraction and duplicate checking
    • The code has appropriate logging for debugging and error cases

Here's the suggested implementation:

    for (const middleware of middlewares) {
-        await middleware(ctx, parent)
+        try {
+            await middleware(ctx, parent)
+        } catch (error) {
+            ctx.logger?.error(`Middleware execution failed: ${error}`)
+            throw error
+        }
    }
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Milvus-related code in memory middleware
rg -A 5 "Milvus|memory.*ID" ./packages/long-memory/src/plugins/memory.ts

Length of output: 74


Script:

#!/bin/bash
# Let's first check if the file exists and its content
fd "memory.ts" packages/long-memory/src/plugins/

# If the file is not found, let's check the entire codebase for memory-related files
fd "memory" packages/long-memory/src/

# Let's also search for middleware-related error handling patterns in the codebase
ast-grep --pattern 'try {
  await $$$
} catch (error) {
  $$$
}'

Length of output: 5229


Script:

#!/bin/bash
# Let's check the content of memory.ts to understand its implementation
cat packages/long-memory/src/plugins/memory.ts

# Let's also check for any error handling patterns in plugin.ts
cat packages/long-memory/src/plugin.ts

Length of output: 12114

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants