Skip to content

Conversation

@O1ahmad
Copy link
Owner

@O1ahmad O1ahmad commented Jan 20, 2026

Note

Introduces a robust, AI-assisted CLI parsing and binary discovery toolchain with Docker/host support, plus tests and developer tooling.

  • New modules: src/binary_finder.py (find executables in containers/host with caching and timeouts) and src/command_executor.py (help/version execution with fallbacks)
  • CLI tools: src/discover_binaries.py to auto-discover/verify binaries in images and update configs; src/process_cli_tools_v2.py to batch-parse tools (sequential/parallel) using the unified src/parser.py
  • Parser overhaul: Replaces parser_v2.py with enhanced src/parser.py (AI-driven help/version parsing, recursive subcommand extraction); updates imports to use it
  • Tests: Adds tests/test_parser.py with mocked subprocess/OpenAI to validate AWS CLI parsing
  • Tooling: Adds .pre-commit-config.yaml; Makefile clean stops deleting result.json; minor setup.py formatting tweaks

Written by Cursor Bugbot for commit 72e8228. This will update automatically on new commits. Configure here.

Summary by CodeRabbit

  • New Features

    • Binary discovery and execution tools for host/container environments, plus a CLI to discover and validate binaries inside Docker images with ranking and safe config updates.
    • AI-driven CLI help/version parsing to generate structured command documentation via a new parsing CLI.
    • Pre-commit hooks configuration added.
  • Bug Fixes

    • Safer handling when image tags are missing to prevent failures.
  • Chores

    • Simplified clean target to stop removing result.json.
  • Removed

    • Legacy parser implementation replaced/removed.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 20, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds three new discovery modules and a CLI to locate/verify executables on host and in Docker images; replaces parser_v2 with an AI-driven parser in src/parser.py; modifies processing logic, tests, pre-commit config, setup formatting, and simplifies the Makefile clean target.

Changes

Cohort / File(s) Summary
Build config
Makefile, setup.py
Simplified clean target (removed result.json). Minor formatting/quote style changes in setup.py.
Pre-commit
.pre-commit-config.yaml
New pre-commit configuration added with standard hooks (trailing-whitespace, check-yaml, debug-statements, etc.).
Binary discovery core
src/binary_finder.py, src/command_executor.py
New BinaryFinder for host/container discovery (quick lookups, full filesystem scan with caching/timeouts, candidate generation/matching/scoring, optional help verification). New CommandExecutor to run commands on host or in-container with help/version variations, timeouts, and validators.
Discovery orchestrator / CLI
src/discover_binaries.py
New BinaryDiscoveryTool CLI: loads/saves config (dry-run/backup), checks/pulls images, orchestrates discovery and validation, applies thresholds, selects primary/alternates, and updates config.
Parser migration
src/parser.py, src/parser_v2.py
src/parser_v2.py removed. src/parser.py rewritten to an AI-driven parser using BinaryFinder/CommandExecutor with analyze_with_ai, parse_command, parse_binary, and CLI entry.
Tools processing tweaks
src/process_cli_tools_v2.py
Defensive handling for empty image_tags (defaults to ["latest"]), category detection tweaks, and stylistic/quoting normalization.
Tests
tests/test_parser.py
Tests updated to use parse_binary, broadened subprocess/OpenAI mocks, and adjusted expectations for AI-driven parse outputs.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Discovery as BinaryDiscoveryTool
    participant Finder as BinaryFinder
    participant Executor as CommandExecutor
    participant Docker
    participant Host

    User->>Discovery: process_tools(config, filters)
    Discovery->>Docker: check_docker_available()
    Docker-->>Discovery: available

    Discovery->>Docker: pull_image(image)
    Docker-->>Discovery: image_ready

    Discovery->>Finder: discover_binaries_for_tool(image, tool_name)
    Finder->>Docker: which / command -v (quick lookup)
    Docker-->>Finder: path_or_none

    rect rgba(100, 150, 200, 0.5)
    Note over Finder,Docker: Full filesystem scan if quick lookup misses
    Finder->>Docker: docker run -- find / -executable (with timeout)
    Docker-->>Finder: executables_list
    end

    Finder->>Finder: generate_candidates(tool_name)
    Finder->>Finder: match_executables_to_candidates(...)

    rect rgba(150, 100, 200, 0.5)
    Note over Discovery,Executor: Validate top candidates
    Discovery->>Executor: test_help_variations(candidate_path, image)
    Executor->>Docker: docker run candidate --help (or Host execution)
    Docker-->>Executor: stdout/stderr/exit_code
    Executor-->>Discovery: validation_result
    end

    Discovery->>Discovery: select_primary_and_alternates()
    Discovery->>Discovery: save_config(updated_tool) (dry-run/backup handled)
    Discovery-->>User: print_summary()
Loading

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hop through images and folders wide,

sniffing binaries where secrets hide.
Help texts hum and tests sing true,
I choose the best paths and leave crumbs for you.
A tiny rabbit cheers: discovery, woohoo!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title 'fix: add missing cli v2 parser src' is vague and misleading. The PR does not simply fix missing code; it comprehensively refactors the CLI parsing system by adding new modules (binary_finder, command_executor, discover_binaries, parser), removing parser_v2, and reorganizing the architecture. Revise the title to accurately reflect the scope: 'refactor: consolidate CLI parser with binary discovery and AI-assisted parsing' or 'feat: implement unified CLI parser with binary finder and command executor'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 95.35% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

cursor[bot]

This comment was marked as outdated.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/binary_finder.py`:
- Around line 188-224: The current loop in verify_executable_responds_to_help
calls `docker run` without overriding the image entrypoint, so it invokes the
image's default entrypoint instead of the target binary; change the command to
use Docker's --entrypoint to run the target executable (use the existing
binary_path or binary_name) and pass the help_arg as the container argument
(e.g. replace ['docker','run','--rm', docker_image, help_arg] with
['docker','run','--rm','--entrypoint', binary_path, docker_image, help_arg]),
keeping the timeout, capture_output and returncode checks the same.
- Around line 92-104: The current subprocess.run builds a shell command with
f'command -v {binary_name}', which allows shell injection; update the
subprocess.run invocation to pass the binary name as a positional parameter to
sh instead of interpolating it: replace the f-string with a constant command
like 'command -v "$1"' (or similar) and add the binary_name as a separate
argument after the image (e.g., using '--' then binary_name) so the container's
sh receives the binary name as $1; keep the same subprocess.run options
(capture_output, text, timeout) and preserve the existing return check
(result.returncode == 0 and result.stdout.strip()) and the returned tag
"command_v".

In `@src/command_executor.py`:
- Around line 62-73: In the Docker branch, remove the unused import os and the
unused binary_name computation, and build the docker run command to explicitly
set the container entrypoint to the target binary using --entrypoint; i.e., when
docker_image is set, construct cmd to include '--entrypoint' followed by
binary_path (then docker_image and args) so the intended binary is executed
inside the container instead of relying on the image entrypoint (update
references to docker_image, binary_path, cmd, and args accordingly).

In `@src/discover_binaries.py`:
- Around line 179-183: The code assumes image_tags is non-empty; change the
lookup to provide a default when the key is present but empty by using a
fallback (e.g., set image_tags = tool.get('image_tags') or ['latest']) so both
missing and empty lists yield ['latest']; then keep the existing selection logic
for image_tag ('latest' if present else image_tags[0]) and the construction of
docker_image; apply the same change where image_tags is read in
src/process_cli_tools_v2.py to avoid IndexError.
🧹 Nitpick comments (3)
src/binary_finder.py (2)

21-22: Annotate the class-level cache for typing clarity.

This cache is shared across instances, so marking it as ClassVar improves type checking and intent.

♻️ Proposed typing tweak
-from typing import Optional, List, Tuple, Set
+from typing import Optional, List, Tuple, Set, Dict, ClassVar
@@
-    _container_cache = {}
+    _container_cache: ClassVar[Dict[str, List[str]]] = {}

251-273: Consider case-insensitive candidate matching.

Candidate words are lowercased, but executable names aren’t, which can miss matches for mixed-case binaries.

🔧 Case-insensitive matching
-            exe_name = os.path.basename(exe_path)
+            exe_name = os.path.basename(exe_path)
+            exe_name_lower = exe_name.lower()
@@
-            for word in candidates:
-                if word in exe_name:
+            for word in candidates:
+                if word in exe_name_lower:
                     # Simple confidence based on match quality
-                    if exe_name == word:
+                    if exe_name_lower == word:
                         confidence = 1.0
                         match_type = 'exact'
-                    elif exe_name.startswith(word):
+                    elif exe_name_lower.startswith(word):
                         confidence = 0.9
                         match_type = 'starts_with'
src/command_executor.py (1)

21-38: Annotate class-level command lists with ClassVar.

These are class constants (not per-instance), so adding ClassVar improves typing clarity and intent.

♻️ Proposed typing tweaks
-from typing import Optional, Dict, List, Tuple
+from typing import Optional, Dict, List, Tuple, ClassVar
@@
-    HELP_VARIATIONS = [
+    HELP_VARIATIONS: ClassVar[List[List[str]]] = [
@@
-    VERSION_VARIATIONS = [
+    VERSION_VARIATIONS: ClassVar[List[List[str]]] = [

Comment on lines 92 to 104
# Try command -v (works in more minimal containers)
try:
result = subprocess.run(
['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
'-c', f'command -v {binary_name}'],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip(), "command_v"
except subprocess.TimeoutExpired:
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the binary_finder.py file
find . -name "binary_finder.py" -type f

Repository: O1ahmad/cli-parser

Length of output: 84


🏁 Script executed:

# Check the content around lines 92-104 to see the exact code
if [ -f "src/binary_finder.py" ]; then
  cat -n src/binary_finder.py | sed -n '85,110p'
fi

Repository: O1ahmad/cli-parser

Length of output: 1180


🏁 Script executed:

# Look for where binary_name comes from and if there's any validation
if [ -f "src/binary_finder.py" ]; then
  rg -n "binary_name" src/binary_finder.py -B 2 -A 2
fi

Repository: O1ahmad/cli-parser

Length of output: 1769


🏁 Script executed:

# Check function signature and parameter origin to understand if binary_name is user-controlled
if [ -f "src/binary_finder.py" ]; then
  cat -n src/binary_finder.py | sed -n '1,95p'
fi

Repository: O1ahmad/cli-parser

Length of output: 3773


🏁 Script executed:

# Find where find_in_container is called
rg -n "find_in_container" src/binary_finder.py -B 2 -A 2

Repository: O1ahmad/cli-parser

Length of output: 604


🏁 Script executed:

# Check for other uses of sh -c in the file
rg -n "sh.*-c" src/binary_finder.py -B 2 -A 2

Repository: O1ahmad/cli-parser

Length of output: 44


🏁 Script executed:

# Look for any validation/sanitization of binary_name
rg -n "binary_name" src/binary_finder.py | head -20

Repository: O1ahmad/cli-parser

Length of output: 638


🏁 Script executed:

# Check for callers of find_in_container outside this file
rg -n "find_in_container" --type py

Repository: O1ahmad/cli-parser

Length of output: 402


🏁 Script executed:

# Check if binary_name is validated anywhere before being passed
if [ -f "src/binary_finder.py" ]; then
  cat -n src/binary_finder.py | sed -n '100,210p'
fi

Repository: O1ahmad/cli-parser

Length of output: 5025


🏁 Script executed:

# Check parser_v2.py to understand the other call site
rg -n "find_in_container" src/parser_v2.py -B 5 -A 5

Repository: O1ahmad/cli-parser

Length of output: 503


🏁 Script executed:

# Verify the syntax of the proposed fix is valid
python3 << 'EOF'
import subprocess

# Test that the proposed syntax is valid (without executing)
# When using a list with subprocess.run, shell=False (the default)
# The arguments are passed directly without shell interpretation
proposed_command = [
    'echo', 'test',  # Simulated docker run command
    '-c', 'command -v -- "$1"', 'sh', 'test-binary'
]

# This demonstrates the list structure is valid
print("Proposed command structure (as list):")
print(proposed_command)
print("\nWhen passed to subprocess.run with list (shell=False):")
print("- Arguments 0-2: echo test -c")
print("- Argument 3: 'command -v -- \"$1\"' (literal string, no shell interpretation)")
print("- Argument 4: 'sh'")
print("- Argument 5: 'test-binary' (passed as $1 to sh, no interpolation)")
print("\nConclusion: The proposed syntax is correct for safe parameter passing")
EOF

Repository: O1ahmad/cli-parser

Length of output: 148


Fix shell injection vulnerability in container command check.

The f'command -v {binary_name}' string passed to sh -c allows command injection. If binary_name contains shell metacharacters (e.g., test; rm -rf / or $(malicious)), arbitrary commands execute in the container. Pass the name as a positional parameter instead.

🔒 Safer parameter passing
-            result = subprocess.run(
-                ['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image, 
-                 '-c', f'command -v {binary_name}'],
+            result = subprocess.run(
+                ['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
+                 '-c', 'command -v -- "$1"', 'sh', binary_name],
                 capture_output=True,
                 text=True,
                 timeout=30
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Try command -v (works in more minimal containers)
try:
result = subprocess.run(
['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
'-c', f'command -v {binary_name}'],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip(), "command_v"
except subprocess.TimeoutExpired:
pass
# Try command -v (works in more minimal containers)
try:
result = subprocess.run(
['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
'-c', 'command -v -- "$1"', 'sh', binary_name],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip(), "command_v"
except subprocess.TimeoutExpired:
pass
🧰 Tools
🪛 Ruff (0.14.13)

94-94: subprocess call: check for execution of untrusted input

(S603)


95-96: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
In `@src/binary_finder.py` around lines 92 - 104, The current subprocess.run
builds a shell command with f'command -v {binary_name}', which allows shell
injection; update the subprocess.run invocation to pass the binary name as a
positional parameter to sh instead of interpolating it: replace the f-string
with a constant command like 'command -v "$1"' (or similar) and add the
binary_name as a separate argument after the image (e.g., using '--' then
binary_name) so the container's sh receives the binary name as $1; keep the same
subprocess.run options (capture_output, text, timeout) and preserve the existing
return check (result.returncode == 0 and result.stdout.strip()) and the returned
tag "command_v".

cursor[bot]

This comment was marked as outdated.

@O1ahmad O1ahmad force-pushed the ahmad/add_missing_src branch from 4865555 to a2811bd Compare January 20, 2026 21:35
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/parser.py`:
- Around line 192-196: The preview printing of subcommands assumes every entry
in parsed['subcommands'] has a 'name' key and will KeyError on malformed AI
output; update the join expression in the block that prints the first
subcommands (using parsed, parsed['subcommands'], num_subcmds) to safely extract
names (e.g., use s.get('name', '<unknown>') after verifying s is a dict, coerce
to str, and optionally filter None/empty) so the print never raises and still
shows a sensible placeholder for missing names.
- Around line 293-307: When docker_image is truthy,
BinaryFinder.find_in_container(docker_image, binary_name) may return binary_path
as None; add a guard right after that call to check if binary_path is falsy and
if so log/print the same "not found" message and return None to avoid downstream
failures in parse_command (which calls os.path.basename on binary_path). Update
the block handling docker_image to mirror the host-path error path and return
early when binary_path is missing so parse_command and any subsequent use of
binary_path don't crash.

In `@tests/test_parser.py`:
- Around line 45-47: The test test_parse_aws depends on analyze_with_ai which
early-returns if OPENAI_API_KEY is missing; modify the test_parse_aws to set the
env var using the test fixture monkeypatch (e.g.,
monkeypatch.setenv("OPENAI_API_KEY", "test-key")) before calling analyze_with_ai
so the function proceeds and your mocked requests.post is exercised; ensure the
monkeypatch call happens at the start of the test to guarantee reliability
across CI environments.
♻️ Duplicate comments (1)
src/binary_finder.py (1)

92-99: Avoid shell injection in command -v invocation.

This still interpolates binary_name into sh -c, allowing injection. Use positional parameters instead.

🔒 Safer parameter passing
-            result = subprocess.run(
-                ['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
-                 '-c', f'command -v {binary_name}'],
+            result = subprocess.run(
+                ['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
+                 '-c', 'command -v -- "$1"', 'sh', binary_name],
                 capture_output=True,
                 text=True,
                 timeout=30
             )
🧹 Nitpick comments (3)
src/discover_binaries.py (3)

28-33: Prefer package-relative imports over sys.path mutation.
Line 28–33 modifies sys.path, which can shadow installed modules and makes packaging/testing harder. Consider switching to package-relative imports and running via python -m ... instead.

♻️ Suggested refactor
-# Add src to path
-sys.path.insert(0, os.path.dirname(__file__))
-
-from binary_finder import BinaryFinder
-from command_executor import CommandExecutor
+from .binary_finder import BinaryFinder
+from .command_executor import CommandExecutor

123-135: Surface image-inspect failures instead of silently swallowing them.
Line 124–134 uses a blanket except: pass, which hides daemon/permission issues and makes troubleshooting harder. Consider logging the error (at least WARN/DEBUG) and narrowing exception types.

🔧 Suggested change
         try:
             result = subprocess.run(
                 ['docker', 'image', 'inspect', image],
                 capture_output=True,
                 timeout=10
             )
             if result.returncode == 0:
-                self.log(f"  → Image available locally")
+                self.log("  → Image available locally")
                 return True
-        except Exception:
-            pass
+        except FileNotFoundError as e:
+            self.log(f"  → Docker not found: {e}", "WARN")
+        except subprocess.TimeoutExpired as e:
+            self.log(f"  → Timeout checking image: {e}", "WARN")
+        except Exception as e:
+            self.log(f"  → Error inspecting image: {e}", "WARN")

469-480: Make --update and --dry-run mutually exclusive.
Allowing both flags can confuse users; a mutually exclusive group avoids ambiguous intent and simplifies dry-run logic.

♻️ Suggested refactor
-    parser.add_argument('--update', action='store_true', help='Update config file with discoveries')
-    parser.add_argument('--dry-run', action='store_true', help='Dry run (no config updates)')
+    mode = parser.add_mutually_exclusive_group()
+    mode.add_argument('--update', action='store_true', help='Update config file with discoveries')
+    mode.add_argument('--dry-run', action='store_true', help='Dry run (no config updates)')
@@
-    dry_run = args.dry_run or not args.update
+    dry_run = not args.update

src/parser.py Outdated
Comment on lines 192 to 196
# Show first few subcommands and options
if num_subcmds > 0:
print(
f" → First subcommands: {', '.join([s['name'] for s in parsed['subcommands'][:5]])}{' ...' if num_subcmds > 5 else ''}"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard against malformed AI output in subcommand preview.

If the model returns any subcommand missing name, this line will raise KeyError and abort parsing. Add a safe extraction path to keep parsing resilient.

💡 Proposed fix
-    if num_subcmds > 0:
-        print(
-            f"  → First subcommands: {', '.join([s['name'] for s in parsed['subcommands'][:5]])}{' ...' if num_subcmds > 5 else ''}"
-        )
+    if num_subcmds > 0:
+        preview_names = [
+            s.get("name")
+            for s in parsed["subcommands"][:5]
+            if isinstance(s, dict) and s.get("name")
+        ]
+        if preview_names:
+            print(
+                f"  → First subcommands: {', '.join(preview_names)}{' ...' if num_subcmds > 5 else ''}"
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Show first few subcommands and options
if num_subcmds > 0:
print(
f" → First subcommands: {', '.join([s['name'] for s in parsed['subcommands'][:5]])}{' ...' if num_subcmds > 5 else ''}"
)
# Show first few subcommands and options
if num_subcmds > 0:
preview_names = [
s.get("name")
for s in parsed["subcommands"][:5]
if isinstance(s, dict) and s.get("name")
]
if preview_names:
print(
f" → First subcommands: {', '.join(preview_names)}{' ...' if num_subcmds > 5 else ''}"
)
🤖 Prompt for AI Agents
In `@src/parser.py` around lines 192 - 196, The preview printing of subcommands
assumes every entry in parsed['subcommands'] has a 'name' key and will KeyError
on malformed AI output; update the join expression in the block that prints the
first subcommands (using parsed, parsed['subcommands'], num_subcmds) to safely
extract names (e.g., use s.get('name', '<unknown>') after verifying s is a dict,
coerce to str, and optionally filter None/empty) so the print never raises and
still shows a sensible placeholder for missing names.

Comment on lines 45 to 47
def test_parse_aws(mock_subprocess_run):
"""Test parsing AWS CLI with mocked subprocess and OpenAI API calls"""
# Mock outputs for aws commands
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

git ls-files | head -30

Repository: O1ahmad/cli-parser

Length of output: 1067


🏁 Script executed:

# Find the main code and test files
fd -t f -e py | grep -E "(test_|analyze|parser)" | head -20

Repository: O1ahmad/cli-parser

Length of output: 96


🏁 Script executed:

# Look for analyze_with_ai function
rg "def analyze_with_ai" -A 10

Repository: O1ahmad/cli-parser

Length of output: 520


🏁 Script executed:

# Check the test file around lines 45-47
cat -n tests/test_parser.py | sed -n '30,60p'

Repository: O1ahmad/cli-parser

Length of output: 1629


🏁 Script executed:

# Search for OPENAI_API_KEY usage
rg "OPENAI_API_KEY" -B 2 -A 2

Repository: O1ahmad/cli-parser

Length of output: 470


🏁 Script executed:

# Get more of the test_parse_aws function
cat -n tests/test_parser.py | sed -n '45,150p'

Repository: O1ahmad/cli-parser

Length of output: 4679


🏁 Script executed:

# Check for conftest.py and any global fixtures
find . -name "conftest.py" -exec cat -n {} \;

Repository: O1ahmad/cli-parser

Length of output: 44


🏁 Script executed:

# Check for mock_subprocess_run fixture definition
rg "def mock_subprocess_run" -B 2 -A 10

Repository: O1ahmad/cli-parser

Length of output: 653


🏁 Script executed:

# Check if analyze_with_ai is called within the parse function
rg "analyze_with_ai" -B 3 -A 3

Repository: O1ahmad/cli-parser

Length of output: 1032


🏁 Script executed:

# Check the CI configuration for env vars
cat -n .github/workflows/CI.yml

Repository: O1ahmad/cli-parser

Length of output: 1009


🏁 Script executed:

# Get the rest of test_parse_aws to see what it actually does
cat -n tests/test_parser.py | sed -n '150,200p'

Repository: O1ahmad/cli-parser

Length of output: 2330


🏁 Script executed:

# Check if there's any mocking of OpenAI API or requests
rg "mock|patch|Mock" tests/test_parser.py -B 1 -A 1

Repository: O1ahmad/cli-parser

Length of output: 859


🏁 Script executed:

# Check the actual parse function to understand the flow
cat -n src/parser.py | sed -n '1,100p'

Repository: O1ahmad/cli-parser

Length of output: 5168


Set OPENAI_API_KEY in the test using monkeypatch to ensure reliability.

analyze_with_ai checks for the env var at the start and returns None if missing, before any API calls are made. Even though requests.post is mocked, the test will fail if OPENAI_API_KEY is not set, creating a fragile dependency on CI secrets configuration.

💡 Proposed fix
-def test_parse_aws(mock_subprocess_run):
+def test_parse_aws(mock_subprocess_run, monkeypatch):
+    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
     """Test parsing AWS CLI with mocked subprocess and OpenAI API calls"""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_parse_aws(mock_subprocess_run):
"""Test parsing AWS CLI with mocked subprocess and OpenAI API calls"""
# Mock outputs for aws commands
def test_parse_aws(mock_subprocess_run, monkeypatch):
"""Test parsing AWS CLI with mocked subprocess and OpenAI API calls"""
monkeypatch.setenv("OPENAI_API_KEY", "test-key")
# Mock outputs for aws commands
🤖 Prompt for AI Agents
In `@tests/test_parser.py` around lines 45 - 47, The test test_parse_aws depends
on analyze_with_ai which early-returns if OPENAI_API_KEY is missing; modify the
test_parse_aws to set the env var using the test fixture monkeypatch (e.g.,
monkeypatch.setenv("OPENAI_API_KEY", "test-key")) before calling analyze_with_ai
so the function proceeds and your mocked requests.post is exercised; ensure the
monkeypatch call happens at the start of the test to guarantee reliability
across CI environments.

@O1ahmad O1ahmad force-pushed the ahmad/add_missing_src branch from a2811bd to 72e8228 Compare January 20, 2026 21:48
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/command_executor.py`:
- Around line 196-197: The return type annotation for test_help_variations is
incorrect: change Dict[str, any] to Dict[str, Any] and add Any to the typing
imports (alongside Dict/Optional) so the annotation uses the proper typing.Any
type; update the import statement (e.g., add "Any") and replace the annotation
on the static method test_help_variations accordingly.
- Around line 100-113: The help_indicators list in command_executor.py contains
a duplicate "usage:" entry; edit the help_indicators definition (variable
help_indicators) to remove the redundant "usage:" string (the third occurrence)
so each indicator is unique (keep "usage:" and "usage :" as you prefer),
ensuring the list no longer contains duplicate entries.
♻️ Duplicate comments (3)
tests/test_parser.py (1)

45-46: Set OPENAI_API_KEY to ensure test reliability.

The test depends on analyze_with_ai which early-returns None if OPENAI_API_KEY is not set (before any API calls are made). Even though requests.post is mocked, the test will fail in environments without this env var set, creating CI fragility.

🐛 Fix: Add monkeypatch fixture
-def test_parse_aws(mock_subprocess_run):
+def test_parse_aws(mock_subprocess_run, monkeypatch):
+    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
     """Test parsing AWS CLI with mocked subprocess and OpenAI API calls"""
src/binary_finder.py (1)

92-104: Shell injection vulnerability in container command check.

The f'command -v {binary_name}' string passed to sh -c allows command injection. If binary_name contains shell metacharacters (e.g., test; rm -rf /), arbitrary commands execute in the container.

🔒 Safer parameter passing
         try:
             result = subprocess.run(
                 ['docker', 'run', '--rm', '--entrypoint', 'sh', docker_image,
-                 '-c', f'command -v {binary_name}'],
+                 '-c', 'command -v -- "$1"', 'sh', binary_name],
                 capture_output=True,
                 text=True,
                 timeout=30
             )

This passes binary_name as a positional parameter $1 to the shell, avoiding interpolation.

src/parser.py (1)

188-192: Guard against malformed AI output in subcommand/option preview.

If the AI returns a subcommand without a name key, line 189 will raise KeyError and abort parsing. Use safe access to maintain resilience.

🐛 Safer extraction
     if num_subcmds > 0:
-        print(f"  → First subcommands: {', '.join([s['name'] for s in parsed['subcommands'][:5]])}{' ...' if num_subcmds > 5 else ''}")
+        preview_names = [s.get('name', '<unnamed>') for s in parsed['subcommands'][:5] if isinstance(s, dict)]
+        if preview_names:
+            print(f"  → First subcommands: {', '.join(preview_names)}{' ...' if num_subcmds > 5 else ''}")
     if num_opts > 0:
-        print(f"  → First options: {', '.join([o.get('option', o.get('shortcut', '?')) for o in parsed['options'][:5]])}{' ...' if num_opts > 5 else ''}")
+        preview_opts = [o.get('option', o.get('shortcut', '?')) for o in parsed['options'][:5] if isinstance(o, dict)]
+        if preview_opts:
+            print(f"  → First options: {', '.join(preview_opts)}{' ...' if num_opts > 5 else ''}")
🧹 Nitpick comments (4)
src/process_cli_tools_v2.py (1)

322-335: Variable parser shadows the imported parser module.

The argparse.ArgumentParser instance at line 322 shadows the parser module imported at line 24. While this doesn't cause a runtime issue in main() since it doesn't call parser.parse_binary after this point, it's confusing and error-prone.

♻️ Suggested fix: rename the argument parser
 def main():
     """CLI entry point"""
-    parser = argparse.ArgumentParser(
+    arg_parser = argparse.ArgumentParser(
         description="Process multiple CLI tools from configuration",
         formatter_class=argparse.RawDescriptionHelpFormatter,
     )

-    parser.add_argument("config", help="Path to config JSON file")
-    parser.add_argument("--output-dir", default="data/results", help="Base output directory")
-    parser.add_argument("--max-depth", type=int, default=20, help="Maximum recursion depth (default: 20)")
-    parser.add_argument("--only", nargs="+", help="Only process these tools (by name)")
-    parser.add_argument("--parallel", action="store_true", help="Process tools in parallel")
-    parser.add_argument("--workers", type=int, default=4, help="Number of parallel workers (default: 4)")
-    parser.add_argument("--no-skip", action="store_true", help="Reprocess even if output exists")
+    arg_parser.add_argument("config", help="Path to config JSON file")
+    arg_parser.add_argument("--output-dir", default="data/results", help="Base output directory")
+    arg_parser.add_argument("--max-depth", type=int, default=20, help="Maximum recursion depth (default: 20)")
+    arg_parser.add_argument("--only", nargs="+", help="Only process these tools (by name)")
+    arg_parser.add_argument("--parallel", action="store_true", help="Process tools in parallel")
+    arg_parser.add_argument("--workers", type=int, default=4, help="Number of parallel workers (default: 4)")
+    arg_parser.add_argument("--no-skip", action="store_true", help="Reprocess even if output exists")

-    args = parser.parse_args()
+    args = arg_parser.parse_args()
src/command_executor.py (1)

186-193: Move import re to module level and simplify condition.

The import re inside the function (line 189) is executed on every call, adding unnecessary overhead. Also, len(output) > 0 is redundant when output is already truthy.

♻️ Suggested fix

Move import to top of file:

import re
import subprocess
from typing import Dict, List, Optional, Tuple

Then simplify the condition:

-            if output and (exit_code in [0, 1]) and len(output) > 0:
+            if output and exit_code in [0, 1]:
                 # Check if it looks like version output (has numbers)
-                import re
-
                 if re.search(r"\d+\.\d+", output):
                     return output
src/parser.py (1)

87-96: Consider lowering temperature for more consistent parsing.

A temperature of 0.7 can produce varied outputs. For structured JSON parsing tasks, a lower temperature (0.0–0.3) typically yields more deterministic and consistent results.

         'response_format': {'type': "json_object"},
-        'temperature': 0.7,
+        'temperature': 0.1,
     }
src/binary_finder.py (1)

204-222: LGTM on --entrypoint usage; consider logging silenced exceptions.

The Docker command now correctly uses --entrypoint to test the specific binary. However, the bare except at line 220 silently swallows all exceptions - consider logging these for debugging.

-            except (subprocess.TimeoutExpired, Exception):
+            except subprocess.TimeoutExpired:
+                continue
+            except Exception as e:
+                # Log for debugging but continue trying other variations
+                print(f"  → Help variation failed: {e}")
                 continue

Comment on lines +100 to +113
help_indicators = [
"usage:",
"usage :",
"usage:",
"options:",
"flags:",
"commands:",
"subcommands:",
"help",
"examples:",
"arguments:",
"synopsis",
"description:",
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Duplicate entry in help_indicators list.

Lines 101 and 103 both contain "usage:". Remove the duplicate.

🐛 Fix
         help_indicators = [
             "usage:",
-            "usage :",
             "usage:",
+            "usage :",
             "options:",

Wait, looking again - line 101 is "usage:", line 102 is "usage :" (with space), and line 103 is "usage:" again. Remove line 103:

         help_indicators = [
             "usage:",
             "usage :",
-            "usage:",
             "options:",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
help_indicators = [
"usage:",
"usage :",
"usage:",
"options:",
"flags:",
"commands:",
"subcommands:",
"help",
"examples:",
"arguments:",
"synopsis",
"description:",
]
help_indicators = [
"usage:",
"usage :",
"options:",
"flags:",
"commands:",
"subcommands:",
"help",
"examples:",
"arguments:",
"synopsis",
"description:",
]
🤖 Prompt for AI Agents
In `@src/command_executor.py` around lines 100 - 113, The help_indicators list in
command_executor.py contains a duplicate "usage:" entry; edit the
help_indicators definition (variable help_indicators) to remove the redundant
"usage:" string (the third occurrence) so each indicator is unique (keep
"usage:" and "usage :" as you prefer), ensuring the list no longer contains
duplicate entries.

Comment on lines +196 to +197
@staticmethod
def test_help_variations(binary_path: str, docker_image: Optional[str] = None) -> Dict[str, any]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix type annotation: use Any instead of any.

any (lowercase) is a built-in function, not a type. The return type annotation should use Any from the typing module.

🐛 Fix

Update the import at line 12:

-from typing import Dict, List, Optional, Tuple
+from typing import Any, Dict, List, Optional, Tuple

Fix the annotation:

-    def test_help_variations(binary_path: str, docker_image: Optional[str] = None) -> Dict[str, any]:
+    def test_help_variations(binary_path: str, docker_image: Optional[str] = None) -> Dict[str, Any]:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@staticmethod
def test_help_variations(binary_path: str, docker_image: Optional[str] = None) -> Dict[str, any]:
from typing import Any, Dict, List, Optional, Tuple
Suggested change
@staticmethod
def test_help_variations(binary_path: str, docker_image: Optional[str] = None) -> Dict[str, any]:
`@staticmethod`
def test_help_variations(binary_path: str, docker_image: Optional[str] = None) -> Dict[str, Any]:
🤖 Prompt for AI Agents
In `@src/command_executor.py` around lines 196 - 197, The return type annotation
for test_help_variations is incorrect: change Dict[str, any] to Dict[str, Any]
and add Any to the typing imports (alongside Dict/Optional) so the annotation
uses the proper typing.Any type; update the import statement (e.g., add "Any")
and replace the annotation on the static method test_help_variations
accordingly.

@O1ahmad O1ahmad merged commit 206f3bb into main Jan 20, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants