Automatic Pattern Library Generation #149
Create PatternLibraryGenerator class that:
- Extracts patterns from codebase using PatternExtractor
- Categorizes patterns using AI classification
- Generates Python modules matching manual library format
- Supports multiple programming languages

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add generate action to pattern_commands.py CLI that creates pattern library modules from codebase analysis. Integrates with PatternLibraryGenerator to extract and categorize patterns by language. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement generate-all command that generates pattern libraries for multiple languages at once. Supports filtering by languages and custom output directory. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented load_language_patterns() function to dynamically load language-specific code patterns. Supports go, php, ruby, rust with case-insensitive lookup. Returns None for unsupported languages. Subtask: subtask-3-1
…very

Update pattern discovery to use auto-generated libraries:
- Add _load_library_patterns() helper to detect project languages and load corresponding pattern libraries
- Integrate library patterns into discover_with_memory() function
- Update PatternDiscoverer class to include library patterns in discover_patterns() method
- Handle nested pattern structures (e.g., frameworks.gin.handler)
- Add include_library_patterns parameter for optional control

Subtask: subtask-3-2
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Created test suite for PatternLibraryGenerator with 38 tests
- Tests cover initialization, file discovery, pattern extraction, categorization, and module generation
- Fixed bug in pattern_library_generator.py by using categorize_pattern_sync instead of the async version
- All tests passing

- Created comprehensive test suite for pattern_commands.py
- Added 28 tests covering generate and generate-all commands
- Tests include: single language, batch generation, error handling, CLI workflow
- Fixed bug in pattern_commands.py: Icons.BUILD → Icons.GEAR (BUILD doesn't exist)
- All tests passing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
📝 Walkthrough
The changes introduce pattern library generation capabilities to the system, including new CLI commands to generate language-specific pattern modules, a PatternLibraryGenerator class for extracting and categorizing patterns, extension of pattern discovery to incorporate library patterns, updated pattern loaders, and comprehensive test coverage. Several standalone test modules were removed and test imports updated for clarity.
Sequence Diagram
sequenceDiagram
participant User as User/CLI
participant Handler as handle_patterns_command
participant Gen as PatternLibraryGenerator
participant Extractor as PatternExtractor
participant Categorizer as categorize_pattern_sync
participant FileIO as File System
User->>Handler: execute 'generate' action<br/>(language, output, options)
Handler->>Gen: initialize with project_dir
Handler->>Gen: generate_library_file(output_path,<br/>language, options)
Gen->>FileIO: scan source_dir for<br/>language extensions
FileIO-->>Gen: source files
Gen->>Extractor: extract_patterns(file)<br/>for each file
Extractor-->>Gen: extracted patterns
Gen->>Categorizer: categorize_pattern_sync(pattern)<br/>for each pattern
Categorizer-->>Gen: category + pattern_type
Gen->>Gen: group by category,<br/>apply truncation
Gen->>FileIO: write Python module<br/>(escaped snippets)
FileIO-->>Handler: output_path
Handler-->>User: success/failure status
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
import sys
import tempfile
from pathlib import Path
from unittest.mock import MagicMock, patch

Check notice: Code scanning / CodeQL
Unused import (Note, test)

Copilot Autofix (AI, about 1 month ago)
To fix the problem, remove the unused names MagicMock and patch imported from unittest.mock. This eliminates unnecessary dependencies and satisfies CodeQL’s unused import checks.
Concretely, in tests/test_pattern_cli_generation.py, delete line 18:
from unittest.mock import MagicMock, patch
No additional methods, imports, or definitions are required, and this change does not alter existing functionality because these names are not referenced anywhere in the shown code and are reported as unused.
@@ -15,7 +15,6 @@
 import sys
 import tempfile
 from pathlib import Path
-from unittest.mock import MagicMock, patch

 import pytest
- Remove unused imports: asyncio, PATTERN_CATEGORIES (pattern_library_generator.py)
- Remove unused imports: MagicMock, patch (test_pattern_cli_generation.py)
- Remove unused import: MagicMock (test_pattern_library_generator.py)
- Fix unused variable: captured → _captured (test_pattern_cli_generation.py)
- Replace deprecated typing.Dict with builtin dict (patterns/__init__.py)
- Apply ruff formatting to all PR files (line length, trailing commas)
- Merge develop to bring branch up to date
- Add Automatic Pattern Library Generation docs to INTELLIGENT-PATTERN-RECOGNITION.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
)

# Should show warning in output
_captured = capsys.readouterr() # noqa: F841

Check notice: Code scanning / CodeQL
Unused local variable (Note, test)

Copilot Autofix (AI, about 1 month ago)
In general, to fix an unused local variable you either (a) remove the variable and, if safe, its assignment, or (b) rename it to a conventionally “unused” name when the right-hand side has needed side effects. Here, the only effect we need is calling capsys.readouterr() (to consume stdout/stderr), and we don’t use the returned object.
The best minimal fix without changing functionality is to drop the unused variable and call capsys.readouterr() as a standalone statement. This keeps the side effect (draining captured output) while removing the unused local. Concretely, in tests/test_pattern_cli_generation.py at the line _captured = capsys.readouterr() # noqa: F841, we should replace it with capsys.readouterr() (keeping the surrounding comments intact). No imports or other definitions are needed.
@@ -456,7 +456,7 @@
 )

 # Should show warning in output
-_captured = capsys.readouterr() # noqa: F841
+capsys.readouterr()
 # May contain warning about unsupported language
 # (implementation-dependent)
import tempfile
from pathlib import Path
from unittest.mock import MagicMock, patch

Check notice: Code scanning / CodeQL
Unused import (Note, test)

Copilot Autofix (AI, about 1 month ago)
To fix the problem, remove the unused MagicMock name from the unittest.mock import while preserving any still-used imports (such as patch, if it’s used elsewhere in the file). This eliminates the unused dependency without changing runtime behavior.
Concretely, in tests/test_pattern_library_generator.py, locate the line:
from unittest.mock import MagicMock, patch
and modify it to import only patch:
from unittest.mock import patch
No additional methods, imports, or definitions are needed. This keeps patch available for any tests that use it and removes the unused MagicMock symbol that CodeQL reported.
@@ -8,7 +8,7 @@

 import tempfile
 from pathlib import Path
-from unittest.mock import MagicMock, patch
+from unittest.mock import patch

 import pytest
 from integrations.graphiti.pattern_library_generator import (
The backward-compat shim apps/backend/test_discovery.py was removed by this PR, but tests/test_discovery.py still imported from it — causing a circular import (the test file imported itself by name). Update to import directly from analysis.test_discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PatternLibraryGenerator.project_dir uses Path.resolve(), which resolves symlinks (/var → /private/var on macOS) and short names (RUNNER~1 on Windows). Test now compares against the resolved path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_find_source_files needs the resolved path (which the generator stores) rather than the raw temp dir path. On macOS /var → /private/var and on Windows RUNNER~1 → runneradmin cause rglob to find no files when using the unresolved path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
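The two path-related fixes above both come down to comparing a raw temporary directory against a path that has already gone through `Path.resolve()`. A minimal sketch of the idea, assuming a hypothetical `FakeGenerator` stand-in that stores the resolved project directory (as the commit message says `PatternLibraryGenerator.project_dir` does); it is not the project's actual class:

```python
import tempfile
from pathlib import Path


class FakeGenerator:
    """Hypothetical stand-in: keeps the resolved project directory."""

    def __init__(self, project_dir: Path) -> None:
        # resolve() makes the path absolute and follows symlinks,
        # e.g. /var -> /private/var on macOS.
        self.project_dir = Path(project_dir).resolve()


def same_location(a: Path, b: Path) -> bool:
    """Compare two paths only after resolving both of them."""
    return Path(a).resolve() == Path(b).resolve()


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp)
    gen = FakeGenerator(raw)
    # `gen.project_dir == raw` can be False on macOS or Windows runners;
    # comparing against the resolved path is stable on every platform.
    resolved_matches = gen.project_dir == raw.resolve()
    locations_match = same_location(gen.project_dir, raw)
```

In a test, this suggests asserting against `tmp_path.resolve()` (or passing `generator.project_dir` onward) rather than the unresolved fixture path.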
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/backend/cli/pattern_commands.py`:
- Around line 587-589: The code derives project_dir by climbing three parents
from spec_dir (project_dir = spec_dir.parent.parent.parent), which is brittle;
update the function that sets project_dir to accept an explicit project_dir
parameter or add a --project-dir CLI option (and fall back to the existing
heuristic), and add validation on the computed path (e.g., check for expected
marker files/directories) before using it; change references to the project_dir
variable and the function that computes it so callers can pass an explicit path,
and ensure a clear error is raised if validation fails.
- Around line 484-506: The PatternLibraryGenerator is being instantiated inside
the per-language loop even though it only depends on project_dir; move the
creation of the PatternLibraryGenerator(project_dir) out of the loop so a single
generator instance is reused for all languages, then call
generator.generate_library_file(output_path, language, options) for each
iteration; preserve per-iteration options construction
(max_patterns_per_category, include_line_numbers, optional source_dir) and only
re-instantiate the generator inside the loop if PatternLibraryGenerator has
internal per-run state that requires reset (otherwise reuse the single instance
to avoid unnecessary overhead).
- Around line 421-533: The function generate_all_patterns is over the cognitive
complexity threshold; extract the per-language processing inside the for loop
into a new helper (e.g., generate_for_language or _process_language) that
accepts the PatternLibraryGenerator inputs (project_dir, output_path, language,
options) and returns a tuple/result indicating success bool and optional error
string; move the try/except block that constructs PatternLibraryGenerator, calls
generator.generate_library_file, prints the per-language messages, and appends
to failed_languages into that helper, then simplify generate_all_patterns to
call the helper, increment success_count when it returns success, and append
failures when it returns an error to keep logic identical while reducing
complexity.
In `@apps/backend/context/pattern_discovery.py`:
- Around line 20-98: The function _load_library_patterns has high cognitive
complexity due to the nested recursive helper _extract_patterns; extract that
helper into a module-level function (e.g., rename to extract_library_patterns)
that accepts parameters (lang, category, patterns_dict, library_patterns,
prefix="") and performs the same recursive traversal and
pattern_key/pattern_text construction, then update _load_library_patterns to
call extract_library_patterns(language, category_name, category_patterns,
library_patterns) for each category; ensure you preserve the same pattern_key
naming convention and logging/capture behavior and update any type hints/imports
accordingly so library_patterns remains the consolidated dict returned by
_load_library_patterns.
- Around line 250-254: The library patterns are being loaded twice: once in
discover_with_memory (lines around discover_with_memory) and again inside
PatternDiscoverer.discover_patterns via include_library_patterns defaulting to
True; to fix, stop the duplicate load by calling
discoverer.discover_patterns(..., include_library_patterns=False) from
discover_with_memory (or alternatively remove the initial _load_library_patterns
call and let PatternDiscoverer handle it), and keep the unique symbols:
discover_with_memory, PatternDiscoverer.discover_patterns,
include_library_patterns, and _load_library_patterns when making the change.
In `@apps/backend/integrations/graphiti/pattern_library_generator.py`:
- Around line 183-219: The _categorize_patterns method is calling
categorize_pattern_sync sequentially which will be slow for many patterns;
change this to run classifications concurrently or in batches (e.g., create an
async batch helper or use a thread/process pool to call categorize_pattern_sync
in parallel or replace with an async categorize_pattern_batch) so patterns list
is processed in parallel with a configurable batch size and add an optional
progress callback; also extend the fallback type_to_category mapping in
_categorize_patterns to include missing types such as "test" -> "testing",
"config" -> "configuration", and "logging" -> "observability" (and any other
domain-specific mappings) so those patterns don’t fall back to "uncategorized".
- Around line 254-262: The current manual escaping of pattern["code_snippet"]
(variables code and code_escaped) and string assembly into module_code can fail
for edge cases like snippets ending with a backslash or containing sequences
like \"\"\"; replace the manual escaping logic by serializing the snippet with a
safe string literal generator (e.g., use json.dumps(code) or built-in
repr(code)) and insert that serialized value directly into module_code (use the
resulting quoted literal instead of assembling triple-quoted strings), updating
the code path that builds entries for key so all snippet edge cases are handled
reliably.
In `@apps/backend/patterns/__init__.py`:
- Around line 11-43: load_language_patterns currently only imports GO_PATTERNS,
PHP_PATTERNS, RUBY_PATTERNS, and RUST_PATTERNS and thus misses languages listed
in the generator; update load_language_patterns to cover the remaining entries
from the generator's LANGUAGE_EXTENSIONS (e.g., python, javascript, typescript,
java, csharp, cpp) by attempting to import corresponding modules (e.g.,
.python_patterns -> PYTHON_PATTERNS, .javascript_patterns ->
JAVASCRIPT_PATTERNS, .typescript_patterns -> TYPESCRIPT_PATTERNS, .java_patterns
-> JAVA_PATTERNS, .csharp_patterns -> CSHARP_PATTERNS, .cpp_patterns ->
CPP_PATTERNS) and returning their pattern dicts when present, and, to handle
generator-supported languages without a module, ensure the function returns None
gracefully while optionally logging or providing a clear fallback to indicate no
pre-built patterns exist.
- Around line 42-43: The except ImportError block in patterns.__init__ that
currently just returns None should log the ImportError with context instead of
silently failing; update the except handler around the dynamic import (the
ImportError catch in the module-loading function in patterns.__init__) to call
your module logger (e.g., logger.debug or processLogger.debug) and include the
exception message and the target module name so missing/misconfigured pattern
libraries are visible during debugging while preserving the existing None return
behavior.
In `@tests/test_pattern_cli_generation.py`:
- Around line 24-26: The current fragile platform-specific sys.path hack using
the any("apps/backend" in p or "apps\\backend" in p for p in sys.path) check and
sys.path.insert(0, "apps/backend") should be removed or replaced with a
cross-platform check: either drop this fallback entirely (since conftest.py
handles pytest runs) or replace the condition with a normalized Path-based check
that converts each sys.path entry to a pathlib.Path (e.g., compare
Path(p).as_posix() or Path(p).resolve() against Path("apps/backend").resolve())
before calling sys.path.insert(0, "apps/backend"), ensuring insertion happens
only when truly missing and works on all OSes.
- Around line 504-518: Replace the meaningless `assert True` in
test_generator_with_options with a real verification that generate_library_file
completed: after calling
PatternLibraryGenerator.generate_library_file(output_path, "python", options)
assert that output_path.exists() and its size/content is non-empty (e.g.,
output_path.stat().st_size > 0) or assert the file contains expected markers
like "def " or a known pattern string; use the test's generator variable and
output_path to locate the produced file and check its existence and basic
content instead of the constant boolean.
In `@tests/test_pattern_library_generator.py`:
- Around line 173-201: The test test_extract_patterns_from_files currently
patches generator.extractor.extract_patterns to always return the same
mock_patterns for every file; change the patch to use side_effect (either a list
of different pattern dicts matching each file in python_files or a callable that
returns a pattern based on the input file path) so _extract_all_patterns is
exercised across multiple files; then update assertions to verify aggregation
(e.g., total patterns equals expected sum and that each returned pattern
contains the correct "file" metadata matching entries from python_files) and
keep the use of include_line_numbers to assert "line_number" is present.
- Around line 408-411: The test's assertion is misleading because it checks for
"'''" which never occurs since generator._generate_module_code produces
double-quoted strings; change the assertion to only verify escaped triple
double-quotes are present in module_code (e.g., assert r'\"\"\"' in module_code)
or otherwise assert that any triple-quote sequences are properly escaped in the
output of _generate_module_code, removing the impossible "'''" branch.
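The snippet-escaping finding for pattern_library_generator.py in the list above can be illustrated with a small round-trip check. This is only a sketch under assumptions: `render_entry` and the `PATTERNS` layout are hypothetical and do not reproduce the generator's real `_generate_module_code`. It shows that a literal produced by `json.dumps` survives quotes, triple quotes, and a trailing backslash.

```python
import json


def render_entry(key: str, snippet: str) -> str:
    """Render one `"key": "snippet",` line for a generated Python dict."""
    # json.dumps emits a double-quoted literal with quotes, backslashes
    # and newlines escaped. ensure_ascii=False keeps non-ASCII text as-is;
    # the default would write astral characters as surrogate-pair escapes,
    # which Python does not decode back to the same character.
    return f"    {json.dumps(key)}: {json.dumps(snippet, ensure_ascii=False)},"


# A snippet with quotes, a triple-quoted string and a trailing backslash.
tricky = 'say("hi")\n"""docstring"""\nends with backslash \\'
module_code = "PATTERNS = {\n" + render_entry("edge_case", tricky) + "\n}\n"

# Round-trip: execute the generated source and read the snippet back.
namespace: dict = {}
exec(module_code, namespace)
round_tripped = namespace["PATTERNS"]["edge_case"]
```

`repr(snippet)`, the other option the review mentions, would also yield a valid Python literal; the choice between the two is mostly about output style.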
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b12cccfc-c988-4172-a8fd-689cb4c1136f
📒 Files selected for processing (16)
- apps/backend/cli/pattern_commands.py
- apps/backend/context/pattern_discovery.py
- apps/backend/integrations/graphiti/pattern_library_generator.py
- apps/backend/patterns/__init__.py
- apps/backend/test_discovery.py
- apps/backend/test_env_validation.py
- apps/backend/test_llm_mcp_integration.py
- apps/backend/test_model_fallback_simulation.py
- apps/backend/test_pattern_workflow.py
- apps/backend/test_recovery_e2e.py
- apps/backend/test_recovery_loop.py
- apps/backend/test_sso_integration.py
- guides/INTELLIGENT-PATTERN-RECOGNITION.md
- tests/test_discovery.py
- tests/test_pattern_cli_generation.py
- tests/test_pattern_library_generator.py
💤 Files with no reviewable changes (8)
- apps/backend/test_discovery.py
- apps/backend/test_pattern_workflow.py
- apps/backend/test_recovery_loop.py
- apps/backend/test_env_validation.py
- apps/backend/test_model_fallback_simulation.py
- apps/backend/test_sso_integration.py
- apps/backend/test_llm_mcp_integration.py
- apps/backend/test_recovery_e2e.py
def test_extract_patterns_from_files(self, temp_project_dir):
    """Test extracting patterns from source files."""
    generator = PatternLibraryGenerator(temp_project_dir)
    python_files = generator._find_source_files(generator.project_dir, "python")

    # Mock the extractor to return sample patterns
    mock_patterns = [
        {
            "type": "error",
            "pattern": "try-except with logging",
            "code_snippet": "try:\n ...\nexcept ValueError as e:\n logger.error(f'Error: {e}')",
            "line_number": 5,
        }
    ]

    with patch.object(
        generator.extractor, "extract_patterns", return_value=mock_patterns
    ):
        patterns = generator._extract_all_patterns(
            python_files, pattern_types=None, include_line_numbers=True
        )

    # Should extract patterns from both files
    assert len(patterns) > 0
    # Should add file metadata
    assert all("file" in p for p in patterns)
    # Should include line numbers when requested
    assert all("line_number" in p for p in patterns)
🧹 Nitpick | 🔵 Trivial
Test relies on mock returning same patterns for all files.
In test_extract_patterns_from_files, the mock returns the same patterns for every file. The assertion len(patterns) > 0 passes, but this doesn't verify that patterns are extracted from multiple files correctly. Consider using side_effect to return different patterns per file to better test the aggregation logic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pattern_library_generator.py` around lines 173 - 201, The test
test_extract_patterns_from_files currently patches
generator.extractor.extract_patterns to always return the same mock_patterns for
every file; change the patch to use side_effect (either a list of different
pattern dicts matching each file in python_files or a callable that returns a
pattern based on the input file path) so _extract_all_patterns is exercised
across multiple files; then update assertions to verify aggregation (e.g., total
patterns equals expected sum and that each returned pattern contains the correct
"file" metadata matching entries from python_files) and keep the use of
include_line_numbers to assert "line_number" is present.
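The `side_effect` suggestion in this comment can be sketched without the real generator. The example below is an assumption-laden illustration: `extract_all` is a hypothetical aggregator standing in for `_extract_all_patterns`, and `fake_extract` is a made-up per-file stub. It shows how a callable `side_effect` makes each file return a distinct pattern, so the aggregation becomes observable.

```python
from unittest.mock import MagicMock


def extract_all(extractor, files):
    """Hypothetical aggregator: tag each extracted pattern with its file."""
    collected = []
    for path in files:
        for pattern in extractor.extract_patterns(path):
            collected.append({**pattern, "file": path})
    return collected


def fake_extract(path):
    """Return a different pattern per file so results can be told apart."""
    return [{"type": "error", "pattern": f"pattern from {path}", "line_number": 1}]


extractor = MagicMock()
# A callable side_effect receives the call arguments and its return
# value becomes the mock's return value for that call.
extractor.extract_patterns.side_effect = fake_extract

files = ["a.py", "b.py"]
patterns = extract_all(extractor, files)
```

With per-file results, a test can assert the exact total and that each pattern's "file" entry matches the file it came from, instead of only checking `len(patterns) > 0`.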
pattern_commands.py:
- Extract _resolve_project_dir() with marker-file validation replacing brittle spec_dir.parent.parent.parent heuristic
- Add --project-dir CLI option as explicit override
- Move PatternLibraryGenerator out of per-language loop (reuse single instance)
- Extract _generate_for_language() helper to reduce generate_all_patterns complexity

pattern_discovery.py:
- Extract nested _extract_patterns() to module-level _extract_library_patterns() reducing cognitive complexity of _load_library_patterns
- Fix duplicate library load: discover_with_memory now passes include_library_patterns=False to PatternDiscoverer.discover_patterns

pattern_library_generator.py:
- Use ThreadPoolExecutor for concurrent pattern categorization
- Extend type_to_category fallback with test, config, logging, database, ui, deployment
- Replace manual string escaping with json.dumps for safe code snippet serialization

patterns/__init__.py:
- Add all LANGUAGE_EXTENSIONS languages (python, javascript, typescript, java, csharp, cpp)
- Use dynamic importlib.import_module with registry dict instead of if/elif chain
- Log ImportError with module name and message instead of silently returning None

tests:
- Replace fragile sys.path string check with Path.resolve() comparison
- Replace `assert True` with actual file existence/size verification
- Use side_effect for per-file mock patterns in test_extract_patterns_from_files
- Fix triple-quote assertion to match json.dumps output format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
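The registry-based loading that the commit above describes for patterns/__init__.py might look roughly like the sketch below. The registry contents, the `demo_*` module names, and the log wording are assumptions made so the example runs on its own; the real package presumably uses relative imports of its own pattern modules.

```python
import importlib
import logging
import sys
import types
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical registry: language -> (module name, attribute name).
_REGISTRY = {
    "go": ("demo_go_patterns", "GO_PATTERNS"),
    "rust": ("demo_rust_patterns", "RUST_PATTERNS"),
}


def load_language_patterns(language: str) -> Optional[dict]:
    """Return the pattern dict for `language`, or None if unavailable."""
    entry = _REGISTRY.get(language.lower())  # case-insensitive lookup
    if entry is None:
        return None
    module_name, attr = entry
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Log instead of failing silently, but keep the None return.
        logger.debug("Pattern library %s not importable: %s", module_name, exc)
        return None
    return getattr(module, attr, None)


# Register a stand-in module so the sketch runs without the real package.
_demo = types.ModuleType("demo_go_patterns")
_demo.GO_PATTERNS = {"error_handling": {"wrap": "fmt.Errorf(...)"}}
sys.modules["demo_go_patterns"] = _demo

go_patterns = load_language_patterns("Go")
rust_patterns = load_language_patterns("rust")    # module missing, so None
cobol_patterns = load_language_patterns("cobol")  # not registered, so None
```

Keeping the mapping in one dict means adding a language is a one-line change, and the two "no patterns" cases (unregistered language, missing module) stay distinguishable in the debug log.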
Enhance feature-71 with automatic pattern extraction