
[cli,mcp,dashboard,docs,skills] feat: add non-blocking service workflow and UI #66

Merged
Wangmerlyn merged 7 commits into main from feat/cli-service-dashboard
Mar 4, 2026
Conversation

@Wangmerlyn (Owner) commented Mar 4, 2026

Summary

  • add non-blocking keep-gpu service commands (start, status, stop, list-gpus, serve) while preserving default blocking behavior for backward compatibility
  • extend the KeepGPU local server with REST endpoints and static asset hosting, then add a React/Vite "GPU Keepalive Command Deck" dashboard at /
  • update tests, docs, and skill guidance to cover service-mode workflows and dashboard usage

Validation

  • pytest tests/mcp/test_server.py tests/mcp/test_http_api.py tests/test_cli_thresholds.py tests/test_cli_service_commands.py
  • pre-commit run --all-files
  • mkdocs build

Summary by CodeRabbit

  • New Features

    • Non-blocking service workflow with CLI subcommands (serve, start, status, stop, service-stop, list-gpus), auto-start and dashboard URL hints
    • Interactive web dashboard for session control and telemetry
    • Dual-mode service API exposing JSON-RPC and REST endpoints
  • Bug Fixes

    • More reliable shutdown/release behavior, timeout-aware stop handling, and clearer user-facing errors
  • Documentation

    • Expanded getting-started, CLI reference, MCP/API guides, quick-start and examples
  • Tests

    • New HTTP API and CLI service command test coverage

coderabbitai bot (Contributor) commented Mar 4, 2026

Warning

Rate limit exceeded

@Wangmerlyn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 35 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e2bdb43-1820-423d-a72d-866d182eccfa

📥 Commits

Reviewing files that changed from the base of the PR and between b8d7f42 and 641a56b.

📒 Files selected for processing (12)
  • docs/guides/cli.md
  • docs/plans/cli-service-dashboard/findings.md
  • docs/plans/cli-service-dashboard/progress.md
  • docs/reference/cli.md
  • skills/gpu-keepalive-with-keepgpu/SKILL.md
  • src/keep_gpu/cli.py
  • src/keep_gpu/mcp/server.py
  • src/keep_gpu/single_gpu_controller/cuda_gpu_controller.py
  • src/keep_gpu/single_gpu_controller/rocm_gpu_controller.py
  • tests/mcp/test_http_api.py
  • tests/mcp/test_server.py
  • tests/test_cli_service_commands.py
📝 Walkthrough

Walkthrough

Introduces a comprehensive service-oriented architecture for KeepGPU, replacing the blocking-only CLI with a multi-command interface (serve, start, status, stop, list-gpus, service-stop) supporting non-blocking sessions. Expands the MCP server to expose HTTP/REST endpoints alongside JSON-RPC and adds a React-based web dashboard UI.

Changes

Cohort / File(s) Summary
Documentation – User guides and index
README.md, docs/getting-started.md, docs/guides/cli.md, docs/guides/mcp.md, docs/index.md, docs/reference/cli.md, mkdocs.yml
Updated documentation to reflect the new service architecture, non-blocking CLI subcommands (start, status, stop, serve, list-gpus, service-stop), REST/JSON-RPC API endpoints, dashboard URL, and blocking vs. non-blocking workflow guidance.
Planning documents
docs/plans/cli-service-dashboard/task_plan.md, docs/plans/cli-service-dashboard/findings.md, docs/plans/cli-service-dashboard/progress.md
Introduces comprehensive planning and progress tracking for the CLI service dashboard initiative, capturing design direction, implementation milestones, validation results, and known findings.
Skills documentation
skills/gpu-keepalive-with-keepgpu/SKILL.md, skills/gpu-keepalive-with-keepgpu/agents/openai.yaml, docs/skills/gpu-keepalive-with-keepgpu/skillcheck-free-report.md
Updates skill documentation to cover dual execution modes (blocking and non-blocking), CLI workflow, dashboard access, and agent-friendly command sequences.
CLI service layer
src/keep_gpu/cli.py
Complete CLI overhaul adding multi-command subcommands (serve, start, status, stop, list-gpus, service-stop), service lifecycle management (auto-start, graceful shutdown), HTTP/RPC client primitives, persistent runtime tracking (PID/log files), input validation, and backward-compatibility for legacy flags.
MCP server HTTP expansion
src/keep_gpu/mcp/server.py
Extends MCP server from JSON-RPC stdio-only to dual-mode HTTP service with REST endpoints (/api/gpus, /api/sessions, /api/sessions/{job_id}, /health), static asset serving with SPA fallback, JSON-RPC HTTP compatibility, and graceful shutdown integration.
Static dashboard assets
src/keep_gpu/mcp/static/index.html, src/keep_gpu/mcp/static/assets/index.css
Adds HTML entry point and CSS styling for the KeepGPU Control Deck dashboard UI with theming, typography, responsive layout, and component styling.
Frontend application
web/dashboard/src/App.jsx, web/dashboard/src/main.jsx, web/dashboard/src/styles.css, web/dashboard/vite.config.js, web/dashboard/package.json, web/dashboard/index.html, web/dashboard/.gitignore
Complete React/Vite dashboard frontend with API client, state management, GPU/session telemetry display, session lifecycle controls (start/stop), real-time refresh, error handling, and comprehensive UI component styling.
HTTP API tests
tests/mcp/test_http_api.py
Validates KeepGPUServer HTTP API endpoints, health checks, static index serving, and session lifecycle operations (POST start, GET sessions, DELETE stop).
CLI service command tests
tests/test_cli_service_commands.py
Comprehensive test coverage for new CLI subcommands (start, stop, service-stop, list-gpus, status), service auto-start behavior, RPC interaction, error handling, and lazy-import verification.
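
The dual-mode service summarized above accepts JSON-RPC bodies over HTTP alongside the REST routes. As a rough illustration (not code from this PR), a JSON-RPC 2.0 request envelope for the service could be built like this; the `start_keep` method name and `gpu_ids`/`vram` parameter names are taken from the sequence diagrams and walkthrough, while the helper itself and the `"2GiB"` value are hypothetical:

```python
import itertools
import json

_ids = itertools.count(1)  # monotonically increasing request ids


def jsonrpc_request(method: str, params: dict) -> bytes:
    """Build a JSON-RPC 2.0 request body, as POSTed to the service's RPC endpoint."""
    envelope = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    }
    return json.dumps(envelope).encode("utf-8")


# Hypothetical example mirroring the CLI flow in the first sequence diagram.
body = jsonrpc_request("start_keep", {"gpu_ids": [0], "vram": "2GiB"})
```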

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Agent
    participant CLI as CLI Interface
    participant Service as Local Service
    participant Controller as GPU Controller
    
    User->>CLI: keep-gpu start --gpu-ids 0
    CLI->>CLI: Check if service running
    CLI->>Service: Auto-start service (spawn process)
    Service->>Service: Initialize HTTP/RPC server
    Service-->>CLI: Service ready
    CLI->>Service: RPC call: start_keep(gpu_ids, vram, ...)
    Service->>Controller: Acquire GPU resource
    Controller-->>Service: Session created (job_id)
    Service-->>CLI: job_id + metadata
    CLI-->>User: Print job_id & dashboard URL
    
    Note over User,CLI: Non-blocking: User continues
    
    User->>CLI: keep-gpu status
    CLI->>Service: RPC/HTTP: status(job_id?)
    Service->>Controller: Query session state
    Controller-->>Service: Session metrics
    Service-->>CLI: Status JSON
    CLI-->>User: Display status + utilization
    
    User->>CLI: keep-gpu stop --job-id <id>
    CLI->>Service: RPC/HTTP: stop_keep(job_id)
    Service->>Controller: Release GPU session
    Controller-->>Service: Released
    Service-->>CLI: Stopped confirmation
    CLI-->>User: Session stopped
sequenceDiagram
    participant Browser as Browser/Dashboard UI
    participant Dashboard as React App
    participant API as HTTP REST Server
    participant Service as Keep Service
    
    Browser->>Dashboard: Load /index.html
    Dashboard->>API: GET /api/gpus
    API->>Service: Query GPU telemetry
    Service-->>API: GPU list + utilization
    API-->>Dashboard: JSON response
    
    Dashboard->>API: GET /api/sessions
    API->>Service: Query active sessions
    Service-->>API: Sessions list
    API-->>Dashboard: Sessions JSON
    
    Dashboard-->>Browser: Render GPU grid + session list
    
    Browser->>Dashboard: Click "Start Session" form
    Dashboard->>API: POST /api/sessions (gpu_ids, vram, ...)
    API->>Service: start_keep RPC
    Service-->>API: job_id + session metadata
    API-->>Dashboard: Session created
    Dashboard-->>Browser: Update sessions list, show status
    
    Note over Browser,Service: Auto-refresh every 3s
    Dashboard->>API: GET /api/gpus
    API-->>Dashboard: Updated metrics
    Dashboard-->>Browser: Refresh telemetry display

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PR #59 – Introduced the foundational MCP JSON-RPC server that this PR significantly extends with HTTP/REST endpoints, static dashboard asset serving, and richer session encapsulation.
  • PR #60 – Added HTTP/JSON-RPC transport and expanded MCP server endpoints; this PR generalizes and completes that work with REST-style operations and a full dashboard.
  • PR #63 – Involves CUDA controller workload changes (_run_relu_batch) referenced in documentation and implementation that overlap with this feature's GPU management scope.

Poem

🐰 A rabbit hops through the dashboard dreams,
Non-blocking flows and API streams,
With service mode and REST so clean,
The finest keepalive ever seen!
Sessions bloom, GPUs sing, keep-alive takes wing! 🚀✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 14.71%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and concisely summarizes the main changes: adding a non-blocking service workflow and UI across multiple components (CLI, MCP, dashboard, docs, skills).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@gemini-code-assist bot (Contributor) commented:

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the keep-gpu utility by introducing a non-blocking service mode and a user-friendly web dashboard. These additions improve the flexibility and usability of GPU management, particularly for automated agent workflows, by letting users initiate and monitor GPU keep-alive sessions without tying up their terminal and by providing a visual overview of GPU status and active sessions.

Highlights

  • Non-blocking Service Workflow: Introduced new keep-gpu service commands (start, status, stop, list-gpus, serve) to enable non-blocking GPU keep-alive sessions, suitable for agent-based workflows, while maintaining backward compatibility for the default blocking behavior.
  • GPU Keepalive Command Deck Dashboard: Developed and integrated a new React/Vite-based web dashboard, accessible via http://127.0.0.1:8765/ when the service is running, providing a visual interface for live GPU telemetry, session management (start, stop, status), and configuration.
  • Extended Local Server with REST API: The KeepGPU local server (keep-gpu-mcp-server) was enhanced to include REST endpoints (/api/gpus, /api/sessions, /health) alongside existing JSON-RPC capabilities, and now serves the static assets for the new dashboard.
  • Comprehensive Documentation Updates: All relevant documentation, including the README.md, getting-started.md, cli.md, mcp.md, and SKILL.md files, has been updated to reflect the new service mode, dashboard usage, and CLI commands.


Changelog
  • README.md
    • Updated CudaGPUController reference from _run_mat_batch to _run_relu_batch.
    • Added quick start examples for non-blocking service mode and dashboard access.
    • Expanded the 'Flags that matter' section to differentiate between blocking mode knobs and new service mode commands.
    • Updated the 'MCP endpoint' section to 'MCP and service API', detailing REST examples and dashboard access.
  • docs/getting-started.md
    • Added a new section 'Non-blocking workflow for agents' with examples for keep-gpu start, status, and serve commands, including dashboard access.
  • docs/guides/cli.md
    • Restructured the document to clearly separate 'Blocking mode' and 'Non-blocking service mode' workflows.
    • Added detailed descriptions and examples for new service commands (start, status, stop, list-gpus, serve).
    • Introduced a new 'Dashboard UI' section with instructions on accessing the web interface.
    • Updated 'Command knobs' and 'Troubleshooting' sections to include service mode specifics.
  • docs/guides/mcp.md
    • Renamed the document from 'MCP Server' to 'MCP and Service API'.
    • Updated the introduction to reflect the new JSON-RPC, REST API, and Dashboard interfaces.
    • Provided new examples for REST API calls (GET, POST, DELETE) for sessions and GPU info.
    • Added a dedicated 'Dashboard' section with access instructions and 'Remote and security notes'.
  • docs/index.md
    • Updated the description of the CLI to include 'blocking and non-blocking session workflows'.
    • Added new entries for 'Dashboard + API' and 'MCP and Service API' in the 'What’s inside' section.
  • docs/plans/cli-service-dashboard/findings.md
    • Added a new document outlining repository observations, design direction (retro-futuristic control room aesthetic), implemented architecture details, and validation findings for the non-blocking CLI and dashboard features.
  • docs/plans/cli-service-dashboard/progress.md
    • Added a new document logging the progress of the non-blocking CLI and dashboard feature implementation, detailing completed tasks, in-progress items, next steps, and validation results.
  • docs/plans/cli-service-dashboard/task_plan.md
    • Added a new document outlining the task plan for implementing the non-blocking CLI, local service, and dashboard, including background, goals, proposed solution, phases, and risks.
  • docs/reference/cli.md
    • Restructured the document to provide a 'Command summary' and separate sections for 'Blocking mode options' and 'Service mode' commands.
    • Detailed options for serve, start, status, stop, and list-gpus subcommands.
    • Added a new 'Service HTTP endpoints' table listing all available REST and JSON-RPC endpoints.
    • Updated environment variables section to clarify their application to blocking mode.
  • docs/skills/gpu-keepalive-with-keepgpu/skillcheck-free-report.md
    • Updated the validation method description to include coverage of blocking and non-blocking KeepGPU workflows, including dashboard usage.
  • mkdocs.yml
    • Updated the navigation entry for 'MCP Server' to 'MCP and Service API'.
  • skills/gpu-keepalive-with-keepgpu/SKILL.md
    • Updated the skill description to include non-blocking service workflows and dashboard usage.
    • Added new 'Command model' sections for blocking and non-blocking modes with examples.
    • Revised 'Agent workflow' steps to incorporate mode selection and service commands.
    • Updated 'Command templates' to include agent-friendly non-blocking sequences and dashboard access.
    • Modified suggested response shape for user requests to use keep-gpu start and dashboard verification.
  • skills/gpu-keepalive-with-keepgpu/agents/openai.yaml
    • Updated short_description and default_prompt to reflect support for blocking and non-blocking workflows with dashboard integration.
  • src/keep_gpu/cli.py
    • Refactored the main CLI entry point to use app.callback for the default blocking behavior.
    • Added new subcommands: serve (to run the local service), start (to initiate non-blocking sessions), status (to check session status), stop (to terminate sessions), and list-gpus (to fetch GPU telemetry).
    • Implemented logic for auto-starting the local service when start is invoked and the service is not running.
    • Added utility functions for HTTP JSON requests and RPC calls to interact with the local service.
  • src/keep_gpu/mcp/server.py
    • Extended the KeepGPUServer to handle HTTP GET and DELETE requests for REST endpoints (/health, /api/gpus, /api/sessions, /api/sessions/{job_id}).
    • Implemented static file serving for the new dashboard UI from the src/keep_gpu/mcp/static/ directory, including SPA fallback.
    • Modified _JSONRPCHandler to process POST requests for both JSON-RPC and new REST session creation.
    • Added STATIC_DIR constant to point to the dashboard's static assets.
  • src/keep_gpu/mcp/static/assets/index.css
    • Added a new CSS file containing styling for the GPU Keepalive Command Deck dashboard, featuring a retro-futuristic aesthetic.
  • src/keep_gpu/mcp/static/index.html
    • Added a new HTML entry point for the compiled React dashboard, linking to its JavaScript and CSS assets.
  • tests/mcp/test_http_api.py
    • Added a new test file to verify the functionality of the HTTP API endpoints, including /health, static file serving, and the session lifecycle (start, status, stop) via REST calls.
  • tests/test_cli_service_commands.py
    • Added a new test file to validate the behavior of the new CLI service commands, ensuring start correctly uses RPC calls and stop enforces required arguments, and confirming blocking mode remains the default.
  • web/dashboard/.gitignore
    • Added a new .gitignore file for the dashboard's node_modules/ directory.
  • web/dashboard/index.html
    • Added the development HTML entry point for the React dashboard.
  • web/dashboard/package-lock.json
    • Added the package-lock.json file, locking dependencies for the React/Vite dashboard.
  • web/dashboard/package.json
    • Added the package.json file, defining dependencies (React, React-DOM) and dev dependencies (Vite, @vitejs/plugin-react) for the dashboard.
  • web/dashboard/src/App.jsx
    • Added the main React component for the dashboard, implementing state management for GPU telemetry, active sessions, and form inputs.
    • Includes functions for API interaction (start, stop, refresh sessions), data formatting, and UI rendering for the control deck layout.
  • web/dashboard/src/main.jsx
    • Added the main JavaScript entry point for the React dashboard application, rendering the App component into the DOM.
  • web/dashboard/src/styles.css
    • Added the primary CSS file for the React dashboard, defining global styles, layout, and component-specific aesthetics with a retro-futuristic theme.
  • web/dashboard/vite.config.js
    • Added the Vite configuration file for building the React dashboard, specifying output directory, asset naming conventions, and React plugin usage.
Activity
  • The author created an implementation plan in docs/plans/cli-service-dashboard/task_plan.md.
  • Initial repository findings were captured in docs/plans/cli-service-dashboard/findings.md.
  • CLI subcommands and service auto-start were implemented in src/keep_gpu/cli.py.
  • Server HTTP endpoints and static dashboard routing were extended in src/keep_gpu/mcp/server.py.
  • A React/Vite dashboard source was added under web/dashboard/ and static assets were built.
  • New tests were added in tests/mcp/test_http_api.py and tests/test_cli_service_commands.py.
  • README and documentation were updated to reflect the new service and dashboard workflow.
  • Validation was performed, with pytest passing 13 tests, pre-commit hooks passing, and mkdocs build succeeding.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review: performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary: provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist: responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help: displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Wangmerlyn (Owner, Author) commented:

/gemini review
@coderabbitai review

coderabbitai bot (Contributor) commented Mar 4, 2026

@Wangmerlyn Sure! I'll start a review of the PR now.


🧠 Learnings used
Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: Applies to src/keep_gpu/cli.py : CLI interface at `keep-gpu` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes

Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: Applies to src/keep_gpu/mcp/server.py : MCP server interface `keep-gpu-mcp-server` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 135a4423e5


Comment on lines +31 to +32
.map((part) => Number(part.trim()))
.filter((value) => Number.isFinite(value))


P1: Reject empty GPU ID tokens in dashboard parsing

The dashboard parser coerces each comma-separated token with Number(...), and Number("") becomes 0, so inputs with a trailing/extra comma (for example "1," or "0,,2") silently add GPU 0 to the request. This can arm sessions on the wrong device and unexpectedly occupy GPU 0, which is especially risky on shared machines; the parser should treat empty/invalid tokens as an input error instead of coercing/filtering them.

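A strict parser along the lines Codex suggests, sketched in Python rather than the dashboard's JavaScript (the function name and error message are illustrative, not the PR's code):

```python
def parse_gpu_ids(text: str) -> list[int]:
    """Parse a comma-separated GPU ID list, treating empty or
    non-numeric tokens as input errors instead of coercing them."""
    ids = []
    for token in text.split(","):
        token = token.strip()
        # "".isdigit() is False, so a trailing/extra comma raises here
        # instead of being coerced to GPU 0.
        if not token.isdigit():
            raise ValueError(f"invalid GPU ID token: {token!r}")
        ids.append(int(token))
    return ids
```

With this shape, inputs like "1," or "0,,2" fail loudly rather than silently arming a session on GPU 0.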

Comment on lines +303 to +305
        if path.startswith("/api/sessions/"):
            job_id = unquote(path.rsplit("/", 1)[-1])
            self._json_response(200, server_ref.stop_keep(job_id=job_id))


P1: Prevent trailing-slash deletes from stopping all sessions

The DELETE route extracts the job id with path.rsplit("/", 1)[-1], which yields an empty string for /api/sessions/<job_id>/; passing that empty value into stop_keep triggers the "stop all" branch because falsy job_id means no specific target. A single trailing slash can therefore terminate every active session instead of one session, so this path should reject empty IDs (or normalize path handling before dispatch).

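One way to harden the route, sketched as a standalone helper (the helper name and error messages are hypothetical, not taken from the PR):

```python
from urllib.parse import unquote


def extract_job_id(path: str) -> str:
    """Extract <job_id> from /api/sessions/<job_id>, rejecting empty ids
    so a trailing slash cannot fall through to the stop-all branch."""
    prefix = "/api/sessions/"
    if not path.startswith(prefix):
        raise ValueError(f"unexpected path: {path}")
    # Strip any trailing slash before decoding, then refuse empty ids.
    job_id = unquote(path[len(prefix):].rstrip("/"))
    if not job_id:
        raise ValueError("missing job id in path")
    return job_id
```

The DELETE handler could then call this helper and return 400 on ValueError, keeping the deliberate stop-all behavior reachable only via the bare /api/sessions route.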

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a non-blocking, service-based workflow with a web dashboard for KeepGPU while maintaining backward compatibility with the existing CLI. The most critical concern is security: the new local service API has no authentication, is vulnerable if exposed to the network, and is susceptible to memory exhaustion and crashes from malformed JSON payloads. The implementation also has a potential crash on Windows when auto-starting the service and an unhandled error case in the REST API, suggesting opportunities for more robust error handling and a more user-friendly UI.

Comment on lines +100 to +116
        subprocess.Popen(
            [
                sys.executable,
                "-m",
                "keep_gpu.mcp.server",
                "--mode",
                "http",
                "--host",
                host,
                "--port",
                str(port),
            ],
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=log_file,
            start_new_session=True,
        )

Severity: high

The use of start_new_session=True is specific to POSIX-like systems and will cause an error on Windows. To ensure cross-platform compatibility for detaching the service process, you should use creationflags on Windows. Here is a suggested implementation that handles both cases.

        popen_kwargs = {
            "stdin": subprocess.DEVNULL,
            "stdout": log_file,
            "stderr": log_file,
        }
        if sys.platform == "win32":
            # DETACHED_PROCESS makes the new process not inherit the parent's console.
            popen_kwargs["creationflags"] = subprocess.DETACHED_PROCESS
        else:
            # start_new_session creates a new session, detaching it from the
            # parent's terminal, which is what we want for a background service.
            popen_kwargs["start_new_session"] = True

        subprocess.Popen(
            [
                sys.executable,
                "-m",
                "keep_gpu.mcp.server",
                "--mode",
                "http",
                "--host",
                host,
                "--port",
                str(port),
            ],
            **popen_kwargs,
        )

Comment on lines +247 to +308
    def do_GET(self):  # noqa: N802
        parsed = urlparse(self.path)
        path = parsed.path
        server_ref = self.server.keepgpu_server  # type: ignore[attr-defined]

        if path == "/health":
            self._json_response(200, {"ok": True})
            return
        if path == "/api/gpus":
            self._json_response(200, server_ref.list_gpus())
            return
        if path == "/api/sessions":
            self._json_response(200, server_ref.status())
            return
        if path.startswith("/api/sessions/"):
            job_id = unquote(path.rsplit("/", 1)[-1])
            self._json_response(200, server_ref.status(job_id=job_id))
            return

        self._serve_static(path)

    def do_POST(self):  # noqa: N802
        """
        Handle an HTTP JSON-RPC request and write a JSON response.

        Expects application/json bodies containing {"method", "params", "id"}.
        Returns 400 with an error object if parsing fails.
        """
        parsed = urlparse(self.path)
        path = parsed.path
        server_ref = self.server.keepgpu_server  # type: ignore[attr-defined]
        try:
            payload = self._read_json_body()
            if path == "/api/sessions":
                result = server_ref.start_keep(**payload)
                self._json_response(200, result)
                return

            # JSON-RPC compatibility endpoint.
            if path in ("/", "/rpc"):
                response = _handle_request(server_ref, payload)
                self._json_response(200, response)
                return

            self._json_response(404, {"error": {"message": "Unknown endpoint"}})
        except (json.JSONDecodeError, ValueError, UnicodeDecodeError) as exc:
            self._json_response(400, {"error": {"message": f"Bad request: {exc}"}})

    def do_DELETE(self):  # noqa: N802
        parsed = urlparse(self.path)
        path = parsed.path
        server_ref = self.server.keepgpu_server  # type: ignore[attr-defined]

        if path == "/api/sessions":
            self._json_response(200, server_ref.stop_keep(job_id=None))
            return
        if path.startswith("/api/sessions/"):
            job_id = unquote(path.rsplit("/", 1)[-1])
            self._json_response(200, server_ref.stop_keep(job_id=job_id))
            return
        self._json_response(404, {"error": {"message": "Unknown endpoint"}})


Severity: medium (security)

The KeepGPUServer exposes sensitive GPU control endpoints (e.g., starting/stopping sessions) without authentication or authorization, making it vulnerable if exposed to the network. An attacker could manipulate GPU resources. Additionally, the /api/sessions POST endpoint's try...except block is incomplete, failing to handle TypeError for unexpected payload keys, which can lead to unhandled exceptions and 500 errors, further degrading API robustness.

Comment on lines +215 to +218
    def _read_json_body(self) -> Dict[str, Any]:
        length = int(self.headers.get("content-length", "0"))
        body = self.rfile.read(length).decode("utf-8")
        return json.loads(body)

Severity: medium (security)

The _read_json_body method reads the entire request body into memory based on the Content-Length header without enforcing any maximum size limit. An attacker could send a request with a very large Content-Length to exhaust the server's memory, leading to a Denial of Service.
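One way to bound the read, sketched with an arbitrary 1 MiB cap (the cap, the helper's signature, and the plain-dict headers are illustrative simplifications, not the PR's code):

```python
import io
import json

MAX_BODY_BYTES = 1 << 20  # illustrative 1 MiB cap on request bodies


def read_json_body(headers: dict, rfile) -> dict:
    """Read and parse a JSON request body, refusing Content-Length
    values outside [0, MAX_BODY_BYTES] to avoid memory exhaustion."""
    length = int(headers.get("content-length", "0"))
    if not 0 <= length <= MAX_BODY_BYTES:
        raise ValueError(f"request body too large: {length} bytes")
    return json.loads(rfile.read(length).decode("utf-8"))
```

In the real handler the ValueError would surface as a 400 or 413 response rather than an unbounded allocation.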

Comment on lines +279 to +281
            payload = self._read_json_body()
            if path == "/api/sessions":
                result = server_ref.start_keep(**payload)

security-medium medium

In the do_POST method, user-supplied JSON payloads are unpacked as keyword arguments to the start_keep method using the **payload syntax. If the payload contains keys that do not match the method signature, a TypeError is raised. This exception is not caught by the specific try...except block in the handler, which only catches ValueError and json.JSONDecodeError. This could lead to failed requests and server instability when processing malformed input.
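One way to make the unpacking safe is to filter the payload against the target's signature before calling it. A sketch, where `filter_kwargs` and the `start_keep` stand-in are illustrative rather than the PR's code:

```python
import inspect

def filter_kwargs(func, payload):
    """Keep only payload keys that match func's named parameters."""
    allowed = set(inspect.signature(func).parameters)
    return {k: v for k, v in payload.items() if k in allowed}

def start_keep(gpu_ids=None, vram="500MiB", interval=5.0):
    # Stand-in for the real method; the parameter names are assumptions.
    return {"gpu_ids": gpu_ids, "vram": vram, "interval": interval}

# A stray "unexpected" key is dropped instead of raising TypeError.
safe = filter_kwargs(start_keep, {"gpu_ids": [0, 1], "unexpected": True})
result = start_keep(**safe)
```

An alternative is to keep the strict `**payload` call but add `TypeError` to the handler's `except` clause so the client gets a 400 instead of a stack trace.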

    try:
        payload = _http_json_request("GET", f"{_service_base_url(host, port)}/health")
        return bool(payload.get("ok"))
    except Exception:
medium

Catching a broad Exception can mask bugs and make debugging harder. It's better to catch more specific exceptions that are expected during a network request, such as URLError and json.JSONDecodeError.

Suggested change
except Exception:
except (URLError, json.JSONDecodeError):

Comment on lines +251 to +253
GPUs {JSON.stringify(session.params.gpu_ids)} / {session.params.vram} /
{" "}
{session.params.interval}s / threshold {session.params.busy_threshold}%
medium

The UI currently displays gpu_ids as a raw JSON string (e.g., null or [0,1]). This could be made more user-friendly. For example, you could display "all" when gpu_ids is null or empty, and a simple comma-separated string like "0, 1" otherwise.

Suggested change
GPUs {JSON.stringify(session.params.gpu_ids)} / {session.params.vram} /
{" "}
{session.params.interval}s / threshold {session.params.busy_threshold}%
GPUs {session.params.gpu_ids ? session.params.gpu_ids.join(", ") : "all"} / {session.params.vram} /
{" "}
{session.params.interval}s / threshold {session.params.busy_threshold}%

@gemini-code-assist
Contributor

@Wangmerlyn Thanks for the explicit invocation! I'll proceed with a full code review of the pull request now.

@Wangmerlyn
Owner Author

/gemini review
@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

@Wangmerlyn Sure! I'll kick off a review of the PR now.


🧠 Learnings used
Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: If a review-related subagent exists, call it when the code is ready for review

Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: Applies to src/keep_gpu/cli.py : CLI interface at `keep-gpu` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes

Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: Applies to src/keep_gpu/mcp/server.py : MCP server interface `keep-gpu-mcp-server` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant new feature: a non-blocking service-based workflow for "keep-gpu", complete with a REST API and a React-based web dashboard. The implementation preserves backward compatibility by keeping the original blocking CLI behavior as the default. However, it introduces several security concerns, primarily the lack of authentication on the local HTTP service, which allows any user on a multi-user system to control GPU reservations. The service is also vulnerable to memory exhaustion via large request bodies and potential thread exhaustion through unvalidated GPU ID lists. Additionally, a potential robustness issue in the new API server and a minor inconsistency in the documentation were identified. Implementing basic authentication and input validation is recommended to ensure the service is safe for use in shared environments.

Comment on lines +247 to 308
    def do_GET(self):  # noqa: N802
        parsed = urlparse(self.path)
        path = parsed.path
        server_ref = self.server.keepgpu_server  # type: ignore[attr-defined]

        if path == "/health":
            self._json_response(200, {"ok": True})
            return
        if path == "/api/gpus":
            self._json_response(200, server_ref.list_gpus())
            return
        if path == "/api/sessions":
            self._json_response(200, server_ref.status())
            return
        if path.startswith("/api/sessions/"):
            job_id = unquote(path.rsplit("/", 1)[-1])
            self._json_response(200, server_ref.status(job_id=job_id))
            return

        self._serve_static(path)

    def do_POST(self):  # noqa: N802
        """
        Handle an HTTP POST request and write a JSON response.

        Expects application/json bodies: /api/sessions takes REST session
        parameters, while / and /rpc take JSON-RPC {"method", "params", "id"}.
        Returns 400 with an error object if parsing fails.
        """
        parsed = urlparse(self.path)
        path = parsed.path
        server_ref = self.server.keepgpu_server  # type: ignore[attr-defined]
        try:
            payload = self._read_json_body()
            if path == "/api/sessions":
                result = server_ref.start_keep(**payload)
                self._json_response(200, result)
                return

            # JSON-RPC compatibility endpoint.
            if path in ("/", "/rpc"):
                response = _handle_request(server_ref, payload)
                self._json_response(200, response)
                return

            self._json_response(404, {"error": {"message": "Unknown endpoint"}})
        except (json.JSONDecodeError, ValueError, UnicodeDecodeError) as exc:
            self._json_response(400, {"error": {"message": f"Bad request: {exc}"}})

    def do_DELETE(self):  # noqa: N802
        parsed = urlparse(self.path)
        path = parsed.path
        server_ref = self.server.keepgpu_server  # type: ignore[attr-defined]

        if path == "/api/sessions":
            self._json_response(200, server_ref.stop_keep(job_id=None))
            return
        if path.startswith("/api/sessions/"):
            job_id = unquote(path.rsplit("/", 1)[-1])
            self._json_response(200, server_ref.stop_keep(job_id=job_id))
            return
        self._json_response(404, {"error": {"message": "Unknown endpoint"}})


security-medium medium

The KeepGPU local service lacks authentication, allowing any user on a multi-user system to control GPU reservations and potentially cause resource exhaustion. This is a critical security vulnerability. Additionally, the current handling of the payload dictionary by passing it directly with ** can lead to TypeError if unexpected keys are sent, resulting in unhandled exceptions and 500 errors, further contributing to robustness issues and potential resource exhaustion. It is recommended to implement a token-based authentication mechanism for the service and to filter the payload to only include expected keys for methods like start_keep to improve both security and robustness.
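A minimal sketch of the suggested token check; the header name and token handling are assumptions, not the PR's design:

```python
import hmac
import secrets

# Generated once at service startup and shown to the local user; the
# real service would persist it somewhere only that user can read.
SERVICE_TOKEN = secrets.token_hex(16)

def is_authorized(headers):
    """Constant-time comparison of the client-supplied token."""
    supplied = headers.get("x-keepgpu-token", "")
    return hmac.compare_digest(supplied, SERVICE_TOKEN)

ok = is_authorized({"x-keepgpu-token": SERVICE_TOKEN})
denied = is_authorized({})
```

A handler would call `is_authorized(self.headers)` before dispatching any `/api/*` request and return 401 on failure; `hmac.compare_digest` avoids leaking the token length through timing.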

Comment on lines +215 to +218
    def _read_json_body(self) -> Dict[str, Any]:
        length = int(self.headers.get("content-length", "0"))
        body = self.rfile.read(length).decode("utf-8")
        return json.loads(body)
security-medium medium

The _read_json_body method reads the entire request body into memory based on the Content-Length header without any size limits. An attacker could send a request with a very large Content-Length to cause memory exhaustion on the host machine. It is recommended to implement a maximum allowed size for the request body and verify the Content-Length header against this limit before reading the data.

Suggested change
def _read_json_body(self) -> Dict[str, Any]:
length = int(self.headers.get("content-length", "0"))
body = self.rfile.read(length).decode("utf-8")
return json.loads(body)
def _read_json_body(self) -> Dict[str, Any]:
length = int(self.headers.get("content-length", "0"))
if length > 1_000_000: # Example 1MB limit
raise ValueError("Request body too large")
body = self.rfile.read(length).decode("utf-8")
return json.loads(body)

        payload = self._read_json_body()
        if path == "/api/sessions":
            result = server_ref.start_keep(**payload)

security-medium medium

The /api/sessions endpoint passes the user-supplied payload directly to server_ref.start_keep(**payload). The start_keep method (and the underlying GlobalGPUController) creates a new thread for each GPU ID provided in the gpu_ids parameter. An attacker could send a request with an excessively large list of GPU IDs, causing the service to attempt to spawn a massive number of threads, potentially leading to process or system instability. The gpu_ids parameter should be validated against the actual number of available GPUs on the system before proceeding.
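A sketch of the suggested validation, assuming the caller can supply the visible device count; `validate_gpu_ids` is an illustrative helper, not the PR's API:

```python
def validate_gpu_ids(gpu_ids, available_count):
    """Return a deduplicated, sorted list of valid IDs or raise ValueError."""
    if gpu_ids is None:
        return list(range(available_count))
    ids = sorted(set(gpu_ids))
    bad = [i for i in ids if not isinstance(i, int) or i < 0 or i >= available_count]
    if bad:
        raise ValueError(f"unknown GPU ids: {bad}")
    return ids

# Duplicates collapse, None means "all", out-of-range lists are rejected
# before any per-GPU worker thread is spawned.
deduped = validate_gpu_ids([1, 0, 1], 2)
everything = validate_gpu_ids(None, 2)
try:
    validate_gpu_ids(list(range(10_000)), 2)
    rejected = False
except ValueError:
    rejected = True
```

Because the result is bounded by `available_count`, a hostile payload can no longer make the service spawn an arbitrary number of threads.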

keep-gpu serve --host 127.0.0.1 --port 8765
```

`http://127.0.0.1:8765/`

medium

For consistency with other documentation files (README.md, guides/cli.md, etc.), it would be better to wrap this URL in a text code block. This will ensure uniform rendering of URLs across the documentation.

Suggested change
`http://127.0.0.1:8765/`
http://127.0.0.1:8765/


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (5)
docs/getting-started.md (1)

93-106: Add the service teardown command to complete the workflow.

This quickstart shows how to start/serve, but not how to stop the daemon. Adding keep-gpu service-stop here would prevent orphaned local services.

📝 Proposed doc tweak

 keep-gpu serve --host 127.0.0.1 --port 8765

 http://127.0.0.1:8765/
+
+When finished:
+
+```bash
+keep-gpu service-stop
+```

Based on learnings: CLI interface at `keep-gpu` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/getting-started.md around lines 93 - 106, The docs' quickstart shows
starting and serving the daemon but omits how to stop it; update the
getting-started.md section that lists keep-gpu start and keep-gpu serve to
include the teardown command keep-gpu service-stop with a short "When
finished:" lead-in so users can cleanly stop the daemon; locate the block
containing keep-gpu start --gpu-ids and the keep-gpu serve --host example
and append the service-stop instruction and example invocation.


docs/plans/cli-service-dashboard/findings.md (1)

`1-47`: **Add an explicit TODO checklist section to match plan-doc conventions.**

Given this file lives under `docs/plans/`, include a clear todo/status block (done/in-progress/pending) so execution tracking remains explicit.


As per coding guidelines: `docs/plans/**/*.md`: Before starting complex work, capture background, goal, solution, and todo items in a Markdown plan under `docs/plans/`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @docs/plans/cli-service-dashboard/findings.md around lines 1 - 47, Add a
"TODO / Status" checklist section to the existing plan document "Findings:
Non-blocking CLI + Dashboard" that follows docs/plans conventions: under the
top-level findings heading add a "## TODO / Status" (or "## Roadmap / TODO")
subsection that explicitly lists Background, Goal, Solution, and a checklist of
actionable items with statuses (done / in-progress / pending) and owner/ETA
where useful; ensure it references implemented artifacts like
src/keep_gpu/cli.py, src/keep_gpu/mcp/server.py, and web/dashboard/ so readers
can map tasks to code (e.g., entries for "serve subcommand implemented (done)",
"service-stop command tests (done/in-progress)", "dashboard build pipeline
(pending)"); keep the checklist concise and use Markdown task list syntax so
it's easy to track progress.


docs/index.md (1)

`35-36`: **Make the “Dashboard + API” entry clickable for consistency.**

This is the only item in the list without a link, which makes navigation slightly inconsistent.


🔗 Proposed tweak

```diff
-- :material-monitor-dashboard: **Dashboard + API** – Run `keep-gpu serve` and open
+- :material-monitor-dashboard: **[Dashboard + API](guides/mcp.md)** – Run `keep-gpu serve` and open
   `http://127.0.0.1:8765/` for session controls and telemetry.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/index.md` around lines 35 - 36, The list entry for
":material-monitor-dashboard: **Dashboard + API** – Run `keep-gpu serve` and
open `http://127.0.0.1:8765/` is the only item missing a link; update that entry
to make the "Dashboard + API" text (or the whole entry) a clickable Markdown
link to the dashboard URL (e.g., `[Dashboard + API](http://127.0.0.1:8765/)`) so
it matches the other linked items and improves navigation.
tests/mcp/test_http_api.py (1)

40-68: Optional: extract HTTP server spin-up/cleanup into a shared fixture/helper.

Both tests repeat identical server/thread lifecycle code. A small fixture would reduce maintenance and make future endpoint tests easier to add.

Also applies to: 70-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/mcp/test_http_api.py` around lines 40 - 68, Refactor the duplicated
HTTP server spin-up/cleanup in test_http_health_and_static_index (and the block
at lines 70-109) into a reusable pytest fixture or helper that constructs
KeepGPUServer(controller_factory=dummy_factory), creates the _Server subclassing
TCPServer with allow_reuse_address=True, instantiates it with _JSONRPCHandler,
sets httpd.keepgpu_server, starts a daemon thread running httpd.serve_forever,
yields the base URL (or httpd and thread) to tests, and ensures proper teardown
by calling httpd.shutdown(), httpd.server_close(), server.shutdown(), and
joining the thread; update test_http_health_and_static_index to use that fixture
and keep calls to _request_json, Request, and urlopen unchanged.
tests/test_cli_service_commands.py (1)

76-80: Optional lint-hygiene cleanup for test doubles.

To avoid ARG001/ARG005 noise, consider prefixing intentionally unused mock parameters with _ (for example _host, _port, _params).

Also applies to: 94-95, 104-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cli_service_commands.py` around lines 76 - 80, The test doubles
fake_ensure and fake_rpc are triggering linter ARG001/ARG005 because unused
parameters are named without a leading underscore; update their signatures to
prefix unused parameters (e.g., change host, port, auto_start to _host, _port,
_auto_start in fake_ensure and method, params, host, port to _method, _params,
_host, _port in fake_rpc) and apply the same renaming pattern to the other test
doubles referenced around lines 94-95 and 104-114 to silence the lint warnings
while keeping behavior identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/guides/cli.md`:
- Around line 101-106: Update the Troubleshooting entry for daemon shutdown to
mention active sessions: add guidance that if sessions are active you must run
the session shutdown command (keep-gpu stop --all) first or use the forced
service stop (keep-gpu service-stop --force) before trying keep-gpu
service-stop, so users aren't blocked by active sessions; update the line under
"Need to close background service" to include these two commands and a brief
note about active sessions.

In `@docs/plans/cli-service-dashboard/progress.md`:
- Around line 23-27: Update the progress section that currently reads "Preparing
final summary and PR after successful validation." and the "Next" list item
"Create commit and open PR for review." to reflect that the PR is already open:
replace those lines with a current-status statement (e.g., "PR open: awaiting
review/feedback") and update the "Next" items to actions for an open PR such as
"Address review feedback," "Respond to reviewer comments," and "Prepare
merge-ready changes"; ensure the heading "Next" and the list content are
consistent with the PR-open lifecycle.

In `@docs/reference/cli.md`:
- Around line 7-14: Update the command summary lines to include the host/port
flags for the status and service-stop commands: modify the entries for "keep-gpu
status" and "keep-gpu service-stop" so they show "[--host 127.0.0.1] [--port
8765]" (matching the detailed sections) to ensure the summary reflects the
documented flags for keep-gpu status and keep-gpu service-stop.

In `@skills/gpu-keepalive-with-keepgpu/SKILL.md`:
- Around line 94-105: Add one explicit sentence clarifying that Ctrl+C only
stops foreground runs and does not affect background service sessions; for
service mode users must run keep-gpu stop (to stop sessions) and optionally
keep-gpu service-stop (to stop the service). Mention the specific commands
keep-gpu start, keep-gpu status, keep-gpu stop, and keep-gpu service-stop so
readers can locate relevant behavior and that this applies after the
non-blocking vs service mode split.

In `@src/keep_gpu/cli.py`:
- Around line 186-206: The stop routine (_stop_service_process) blindly signals
whatever PID is read from disk; add an ownership check before sending signals:
implement a helper _is_managed_keepgpu_pid(pid: int) that verifies the PID
corresponds to our managed keep_gpu process (e.g., inspect process command/args
for "keep_gpu.mcp.server" and "--mode http" or equivalent), call this helper
after _read_service_pid and before any _pid_alive/os.kill actions, and if the
check fails treat the PID as invalid (clear the PID with _clear_service_pid and
return False) to avoid killing unrelated processes.
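The ownership check described above can be sketched as follows on Linux, matching the process command line via `/proc`; the matched strings come from the prompt and the helper name is hypothetical:

```python
import os

def is_managed_keepgpu_pid(pid):
    """Return True only if pid looks like our managed HTTP service."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            # cmdline is NUL-separated argv; join for substring checks.
            argv = fh.read().decode("utf-8", "replace").split("\0")
    except OSError:
        # Missing /proc entry (dead PID, or non-Linux host): not ours.
        return False
    joined = " ".join(argv)
    return "keep_gpu.mcp.server" in joined and "--mode http" in joined

# The current test process is not the managed service, so this is False;
# a missing /proc also safely yields False.
own = is_managed_keepgpu_pid(os.getpid())
```

On platforms without `/proc`, a `psutil.Process(pid).cmdline()` lookup would serve the same purpose; the key point is that the check fails closed.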
- Around line 125-127: The code currently assumes the HTTP response is JSON and
calls json.loads(body) directly after urlopen(request...), which raises on
non-JSON replies and bypasses command error handling; wrap the json.loads(body)
call in a try/except that catches json.JSONDecodeError (and ValueError as a
fallback), and in the except path convert the raw response into a controlled CLI
error (or raise the same command-level exception type used elsewhere) including
the response body and HTTP status/context from response, so non-KeepGPU
endpoints produce a handled error instead of an unhandled exception; keep the
happy path returning the parsed JSON or {} when body is empty.
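The tolerant parsing described above might look like this; `KeepGPUServiceError` and `parse_service_response` are hypothetical names:

```python
import json

class KeepGPUServiceError(RuntimeError):
    """Hypothetical CLI-level error for unusable service replies."""

def parse_service_response(body, status=200):
    """Return parsed JSON, {} for empty bodies, or a controlled error."""
    if not body:
        return {}
    try:
        return json.loads(body)
    except (json.JSONDecodeError, ValueError) as exc:
        # Include a snippet of the raw reply so the user can see what
        # actually answered on that host:port.
        raise KeepGPUServiceError(
            f"service returned non-JSON (HTTP {status}): {body[:120]!r}"
        ) from exc

ok = parse_service_response('{"ok": true}')
empty = parse_service_response("")
try:
    parse_service_response("<html>It works!</html>")
    raised = False
except KeepGPUServiceError:
    raised = True
```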

In `@src/keep_gpu/mcp/server.py`:
- Around line 261-264: The handler that extracts job_id with
unquote(path.rsplit("/", 1)[-1]) (in the GET branch that calls server_ref.status
and the DELETE branch that calls server_ref.stop_keep) can produce an empty
string and thus trigger a global "stop all" behavior; add a guard after
extracting job_id to validate it is non-empty (e.g., if not job_id: return a
400/404 via self._json_response) and only call server_ref.status(job_id=job_id)
or server_ref.stop_keep(job_id=job_id) when job_id is present; update both
branches that use unquote(path.rsplit("/", 1)[-1]) to perform this check before
invoking server_ref methods.
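The empty-segment guard can be as small as the following; the helper name is illustrative:

```python
from urllib.parse import unquote

def extract_job_id(path):
    """Return the decoded trailing segment, or None when it is empty."""
    job_id = unquote(path.rsplit("/", 1)[-1])
    return job_id or None

# "/api/sessions/" must not decay into a stop-all/status-all call.
present = extract_job_id("/api/sessions/job%2042")
missing = extract_job_id("/api/sessions/")
```

The handler would return 404 when `extract_job_id` yields None instead of forwarding `job_id=""` to `status` or `stop_keep`.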
- Around line 279-293: The handler only converts JSON parse errors into
structured JSON responses, but exceptions thrown by
server_ref.start_keep(**payload) or during JSON-RPC handling (e.g., TypeError,
KeyError, RuntimeError) bypass that and return non-JSON responses; wrap the
calls to server_ref.start_keep(...) and _handle_request(...) (and the
surrounding request handling block that reads payload via _read_json_body()) so
that any non-parse exception is caught and sent through _json_response with a
500 status and a structured body like {"error": {"message": str(exc), "type":
exc.__class__.__name__}}; keep the existing
JSONDecodeError/ValueError/UnicodeDecodeError handling but add a broader except
Exception as exc branch that uses _json_response(500, ...) to ensure consistent
JSON error formatting for start_keep, _handle_request, and other runtime
failures.
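The structured-500 wrapper described above can be sketched independently of the handler class; `dispatch` and `json_response` are stand-ins for the real methods:

```python
def dispatch(handler, json_response):
    """Run handler() and always reply with structured JSON."""
    try:
        json_response(200, handler())
    except Exception as exc:  # noqa: BLE001 - converted to a JSON 500
        json_response(
            500,
            {"error": {"message": str(exc), "type": exc.__class__.__name__}},
        )

replies = []
dispatch(lambda: {"ok": True}, lambda status, body: replies.append((status, body)))

def bad_payload():
    # Simulates start_keep(**payload) choking on an unexpected key.
    raise TypeError("unexpected keyword argument 'foo'")

dispatch(bad_payload, lambda status, body: replies.append((status, body)))
```

Clients then always get parseable JSON, and the error `type` field preserves enough detail to debug without exposing a traceback.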
- Line 266: The GET handler currently funnels all unmatched paths to
self._serve_static(path), causing /api/* unknown endpoints to return the SPA
HTML; update the GET handling (e.g., in do_GET or the method that calls
self._serve_static) to intercept paths that start with "/api/" and return a
proper JSON 404 instead of falling through: send a 404 status, set
"Content-Type: application/json", write a small JSON body like {"error":"Not
Found"} (or similar) and return early; leave non-/api/ paths to continue to
self._serve_static(path).
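The `/api/` 404 guard might look like the following; the names are illustrative stand-ins for the handler's methods:

```python
def route_get(path, serve_static, json_response):
    """Keep unknown /api/ paths out of the SPA fallback."""
    if path.startswith("/api/"):
        json_response(404, {"error": {"message": "Not Found"}})
        return
    serve_static(path)

calls = []
route_get("/api/nope", calls.append, lambda s, b: calls.append((s, b)))
route_get("/index.html", calls.append, lambda s, b: calls.append((s, b)))
```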

In `@src/keep_gpu/mcp/static/assets/index.css`:
- Line 1: The bundled CSS (index.css) is triggering stylelint errors (e.g.,
font-family-name-quotes, keyframe-selector-notation); update the stylelint
configuration to exclude generated assets and point linting at the source
stylesheet instead: add an ignoreFiles/ignorePatterns entry for the MCP static
assets bundle (the generated index.css) in your .stylelintrc / stylelint config
and ensure the canonical source (the dashboard styles.css) remains in the lint
target list so only the authoring file is linted.

In `@web/dashboard/src/App.jsx`:
- Around line 25-33: The parseGpuIds function currently coerces inputs with
Number(...) causing empty strings to become 0 and allowing negatives/decimals;
update parseGpuIds to reject non-integer or negative tokens by trimming,
splitting on "," and validating each token with a strict integer regex (e.g.
/^\d+$/) or parseInt combined with Number.isInteger and >= 0, then return an
array of integers or null for empty input; also apply the same strict validation
where the session payload is built before calling /api/sessions (the code that
assembles gpu_ids for the POST) so you never send 0 for empty fields or
fractional/negative IDs.

In `@web/dashboard/src/styles.css`:
- Line 75: Replace the quoted font-family declaration "font-family: "Fraunces",
serif;" with an unquoted font family name so it reads font-family: Fraunces,
serif; — update both occurrences of the exact declaration string ("font-family:
"Fraunces", serif;") so they conform to stylelint's font-family-name-quotes
rule.
- Line 1: The `@import` uses url(...) which violates the Stylelint rule expecting
string notation; update the `@import` statement (the line that currently imports
fonts with url(...)) to use a quoted string instead (e.g., `@import`
"https://fonts.googleapis.com/..." ) so the import uses string notation while
preserving the same URL and parameters.

---

Nitpick comments:
In `@docs/getting-started.md`:
- Around line 93-106: The docs' quickstart shows starting and serving the daemon
but omits how to stop it; update the getting-started.md section that lists
`keep-gpu start` and `keep-gpu serve` to include the teardown command `keep-gpu
service-stop` with a short "When finished:" lead-in so users can cleanly stop
the daemon; locate the block containing `keep-gpu start --gpu-ids` and the
`keep-gpu serve --host` example and append the service-stop instruction and
example invocation.

In `@docs/index.md`:
- Around line 35-36: The list entry for ":material-monitor-dashboard:
**Dashboard + API** – Run `keep-gpu serve` and open `http://127.0.0.1:8765/` is
the only item missing a link; update that entry to make the "Dashboard + API"
text (or the whole entry) a clickable Markdown link to the dashboard URL (e.g.,
`[Dashboard + API](http://127.0.0.1:8765/)`) so it matches the other linked
items and improves navigation.

In `@docs/plans/cli-service-dashboard/findings.md`:
- Around line 1-47: Add a "TODO / Status" checklist section to the existing plan
document "Findings: Non-blocking CLI + Dashboard" that follows docs/plans
conventions: under the top-level findings heading add a "## TODO / Status" (or
"## Roadmap / TODO") subsection that explicitly lists Background, Goal,
Solution, and a checklist of actionable items with statuses (done / in-progress
/ pending) and owner/ETA where useful; ensure it references implemented
artifacts like src/keep_gpu/cli.py, src/keep_gpu/mcp/server.py, and
web/dashboard/ so readers can map tasks to code (e.g., entries for "serve
subcommand implemented (done)", "service-stop command tests (done/in-progress)",
"dashboard build pipeline (pending)"); keep the checklist concise and use
Markdown task list syntax so it's easy to track progress.

In `@tests/mcp/test_http_api.py`:
- Around line 40-68: Refactor the duplicated HTTP server spin-up/cleanup in
test_http_health_and_static_index (and the block at lines 70-109) into a
reusable pytest fixture or helper that constructs
KeepGPUServer(controller_factory=dummy_factory), creates the _Server subclassing
TCPServer with allow_reuse_address=True, instantiates it with _JSONRPCHandler,
sets httpd.keepgpu_server, starts a daemon thread running httpd.serve_forever,
yields the base URL (or httpd and thread) to tests, and ensures proper teardown
by calling httpd.shutdown(), httpd.server_close(), server.shutdown(), and
joining the thread; update test_http_health_and_static_index to use that fixture
and keep calls to _request_json, Request, and urlopen unchanged.

In `@tests/test_cli_service_commands.py`:
- Around line 76-80: The test doubles fake_ensure and fake_rpc are triggering
linter ARG001/ARG005 because unused parameters are named without a leading
underscore; update their signatures to prefix unused parameters (e.g., change
host, port, auto_start to _host, _port, _auto_start in fake_ensure and method,
params, host, port to _method, _params, _host, _port in fake_rpc) and apply the
same renaming pattern to the other test doubles referenced around lines 94-95
and 104-114 to silence the lint warnings while keeping behavior identical.

ℹ️ Review info
Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 38c4186d-9560-405e-be2d-cc385aa938c3

📥 Commits

Reviewing files that changed from the base of the PR and between f52bcc0 and 6e57ef5.

⛔ Files ignored due to path filters (1)
  • web/dashboard/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (27)
  • README.md
  • docs/getting-started.md
  • docs/guides/cli.md
  • docs/guides/mcp.md
  • docs/index.md
  • docs/plans/cli-service-dashboard/findings.md
  • docs/plans/cli-service-dashboard/progress.md
  • docs/plans/cli-service-dashboard/task_plan.md
  • docs/reference/cli.md
  • docs/skills/gpu-keepalive-with-keepgpu/skillcheck-free-report.md
  • mkdocs.yml
  • skills/gpu-keepalive-with-keepgpu/SKILL.md
  • skills/gpu-keepalive-with-keepgpu/agents/openai.yaml
  • src/keep_gpu/cli.py
  • src/keep_gpu/mcp/server.py
  • src/keep_gpu/mcp/static/assets/dashboard.js
  • src/keep_gpu/mcp/static/assets/index.css
  • src/keep_gpu/mcp/static/index.html
  • tests/mcp/test_http_api.py
  • tests/test_cli_service_commands.py
  • web/dashboard/.gitignore
  • web/dashboard/index.html
  • web/dashboard/package.json
  • web/dashboard/src/App.jsx
  • web/dashboard/src/main.jsx
  • web/dashboard/src/styles.css
  • web/dashboard/vite.config.js

@@ -0,0 +1 @@
@import"https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap";:root{--bg-0: #0e1116;--bg-1: #171c24;--bg-2: #1f2630;--panel: rgba(25, 31, 40, .82);--border: rgba(196, 171, 127, .24);--text-main: #ece5d9;--text-muted: #b2a996;--accent: #c4ab7f;--accent-soft: #e6d3ac;--cool: #6c8ca9;--alert: #c0695d;--warm: #bf8f52;--shadow: 0 18px 34px rgba(0, 0, 0, .38)}*{box-sizing:border-box}body{margin:0;min-height:100vh;color:var(--text-main);font-family:IBM Plex Sans,Segoe UI,sans-serif;background:radial-gradient(circle at 80% 0%,rgba(196,171,127,.08) 0%,transparent 35%),linear-gradient(180deg,#12161d,#0d1015)}.deck{position:relative;min-height:100vh;padding:2rem clamp(1rem,2vw,2rem) 1.2rem;display:flex;flex-direction:column;gap:1rem}.grid-noise{position:fixed;top:0;right:0;bottom:0;left:0;pointer-events:none;opacity:.2;background-image:linear-gradient(rgba(255,255,255,.02) 1px,transparent 1px),linear-gradient(90deg,rgba(255,255,255,.02) 1px,transparent 1px);background-size:36px 36px}.glass{background:linear-gradient(175deg,rgba(34,41,50,.8),var(--panel));border:1px solid var(--border);box-shadow:inset 0 1px #ffffff08,var(--shadow);border-radius:.7rem}.masthead{padding:1.2rem 1.25rem}.eyebrow{margin:0;color:var(--accent-soft);font-family:IBM Plex Mono,monospace;letter-spacing:.14em;text-transform:uppercase;font-size:.72rem}.masthead h1{margin:.55rem 0 .35rem;font-family:Fraunces,serif;font-weight:700;font-size:clamp(1.4rem,3vw,2.2rem);letter-spacing:.01em}.masthead p{margin:0;color:var(--text-muted);max-width:72ch;line-height:1.45}.service-hint{margin-top:.6rem;font-family:IBM Plex Mono,monospace;font-size:.74rem;color:#bdb29e}.service-hint code{margin:0 .25rem;color:var(--accent-soft)}.stats-row{display:grid;grid-template-columns:repeat(3,minmax(0,1fr));gap:.8rem}.stat-card{padding:.85rem .95rem}.stat-card 
h2{margin:0;color:var(--text-muted);text-transform:uppercase;letter-spacing:.11em;font-size:.72rem;font-family:IBM Plex Mono,monospace}.stat-card p{margin:.45rem 0 0;color:var(--accent-soft);font-size:clamp(1.2rem,2vw,1.75rem);font-weight:600}.panel-grid{display:grid;gap:.85rem;grid-template-columns:minmax(300px,1fr) minmax(300px,1fr)}.panel{padding:.95rem}.span-all{grid-column:1 / -1}.panel-heading{display:flex;justify-content:space-between;align-items:center;margin-bottom:.75rem;gap:.6rem}.panel h2{margin:0;font-family:Fraunces,serif;font-weight:500;font-size:1.05rem}.chip{border:1px solid rgba(196,171,127,.4);color:var(--accent-soft);border-radius:999px;padding:.2rem .55rem;font-size:.68rem;font-family:IBM Plex Mono,monospace;letter-spacing:.08em;text-transform:uppercase}.form-grid{display:grid;grid-template-columns:repeat(2,minmax(0,1fr));gap:.68rem}label{display:flex;flex-direction:column;gap:.34rem}label span{font-family:IBM Plex Mono,monospace;text-transform:uppercase;letter-spacing:.08em;color:var(--text-muted);font-size:.68rem}input{border:1px solid rgba(196,171,127,.28);background:#0e1218e6;border-radius:.5rem;color:var(--text-main);padding:.6rem .66rem;font:inherit}input:focus{outline:none;border-color:var(--accent)}button{border:none;border-radius:.55rem;padding:.64rem .8rem;font:inherit;font-family:IBM Plex Mono,monospace;text-transform:uppercase;letter-spacing:.08em;cursor:pointer}button:disabled{cursor:default;opacity:.5}.primary{background:linear-gradient(180deg,#d6c09a,#b89a67);color:#1a1307}.ghost{color:var(--accent-soft);background:#c4ab7f17;border:1px solid rgba(196,171,127,.35)}.danger{color:#f1cec8;background:#c0695d29;border:1px solid rgba(192,105,93,.5)}.session-list{display:flex;flex-direction:column;gap:.6rem}.session-row{border:1px solid rgba(196,171,127,.2);border-radius:.55rem;padding:.65rem;display:flex;justify-content:space-between;align-items:center;gap:.7rem;background:#0e1218bd}.session-row 
h3{margin:0;font-size:.88rem;font-family:IBM Plex Mono,monospace;color:var(--accent-soft)}.session-row p{margin:.3rem 0 0;color:var(--text-muted);font-size:.78rem}.telemetry-grid{display:grid;grid-template-columns:repeat(auto-fill,minmax(220px,1fr));gap:.68rem}.telemetry-card{border:1px solid rgba(196,171,127,.22);background:#0e1218c2;border-radius:.55rem;padding:.68rem}.telemetry-card header{display:flex;justify-content:space-between;align-items:flex-start;gap:.5rem}.telemetry-card h3{margin:0;font-size:.86rem}.telemetry-card h3 small{display:block;margin-top:.24rem;color:var(--text-muted);font-size:.67rem;font-family:IBM Plex Mono,monospace}.meter{margin-top:.58rem;width:100%;height:.44rem;border-radius:999px;overflow:hidden;background:#ffffff12}.meter-fill{height:100%;background:linear-gradient(90deg,#7a96ad,#be9d67,#bf6d61)}.util-pill{font-size:.67rem;border-radius:999px;padding:.2rem .46rem;border:1px solid;font-family:IBM Plex Mono,monospace}.util-pill.cool{color:#c0d4e4;border-color:#6c8ca973}.util-pill.warm{color:#efd4a9;border-color:#bf8f5280}.util-pill.alert{color:#f2cac4;border-color:#c0695d80}.util-pill.muted{color:var(--text-muted);border-color:#b2a99666}.telemetry-card p,.empty{margin:.5rem 0 0;color:var(--text-muted);font-size:.78rem}.status-line{margin-top:auto;border:1px solid rgba(196,171,127,.28);border-radius:.52rem;background:#0f1318d9;padding:.56rem .72rem;display:flex;gap:.42rem;align-items:center;color:var(--text-muted);font-family:IBM Plex Mono,monospace;font-size:.72rem}.blink{width:.42rem;height:.42rem;border-radius:50%;background:var(--accent);opacity:.8;animation:pulse 1.4s ease-in-out infinite}@keyframes pulse{0%,to{transform:scale(.85)}50%{transform:scale(1.1)}}@media (max-width: 980px){.stats-row,.panel-grid,.form-grid{grid-template-columns:1fr}}

⚠️ Potential issue | 🟡 Minor

Generated CSS currently violates stylelint rules.

Line 1 triggers multiple stylelint errors (font-family-name-quotes, keyframe-selector-notation). Since this file is bundled output, consider linting web/dashboard/src/styles.css as the source of truth and excluding generated artifacts under src/keep_gpu/mcp/static/assets/ from stylelint checks.

🧰 Tools
🪛 Stylelint (17.3.0)

[error] 1-1: Expected quotes around "IBM Plex Sans" (font-family-name-quotes)

[error] 1-1: Expected quotes around "Segoe UI" (font-family-name-quotes)

[error] 1-1: Expected quotes around "IBM Plex Mono" (font-family-name-quotes) (reported ten times, once per repeated use on line 1)

[error] 1-1: Expected "to" to be "100%" (keyframe-selector-notation)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/mcp/static/assets/index.css` at line 1, The bundled CSS
(index.css) is triggering stylelint errors (e.g., font-family-name-quotes,
keyframe-selector-notation); update the stylelint configuration to exclude
generated assets and point linting at the source stylesheet instead: add an
ignoreFiles/ignorePatterns entry for the MCP static assets bundle (the generated
index.css) in your .stylelintrc / stylelint config and ensure the canonical
source (the dashboard styles.css) remains in the lint target list so only the
authoring file is linted.

Comment on lines +25 to +33
function parseGpuIds(raw) {
if (!raw.trim()) {
return null
}
return raw
.split(",")
.map((part) => Number(part.trim()))
.filter((value) => Number.isFinite(value))
}

⚠️ Potential issue | 🟠 Major

Harden payload validation before calling /api/sessions.

Lines 25-33 and 112-117 currently allow invalid values (e.g., gpu_ids with negative/decimal IDs, or empty numeric inputs coercing to 0). This can send malformed session parameters to the server.

🛡️ Proposed fix
 function parseGpuIds(raw) {
-  if (!raw.trim()) {
+  const text = raw.trim()
+  if (!text) {
     return null
   }
-  return raw
-    .split(",")
-    .map((part) => Number(part.trim()))
-    .filter((value) => Number.isFinite(value))
+  const ids = text
+    .split(",")
+    .map((part) => part.trim())
+    .filter(Boolean)
+    .map((part) => Number(part))
+
+  if (ids.some((value) => !Number.isInteger(value) || value < 0)) {
+    throw new Error("GPU IDs must be non-negative integers separated by commas.")
+  }
+  return ids
 }
@@
   async function onStartSession(event) {
     event.preventDefault()
     setBusy(true)
     try {
+      const interval = Number(form.interval)
+      const busyThreshold = Number(form.busyThreshold)
+      if (!Number.isInteger(interval) || interval < 1) {
+        throw new Error("Interval must be an integer >= 1.")
+      }
+      if (!Number.isInteger(busyThreshold) || busyThreshold < -1) {
+        throw new Error("Busy threshold must be an integer >= -1.")
+      }
+
       const payload = {
         gpu_ids: parseGpuIds(form.gpuIds),
         vram: form.vram,
-        interval: Number(form.interval),
-        busy_threshold: Number(form.busyThreshold)
+        interval,
+        busy_threshold: busyThreshold
       }

Also applies to: 112-117

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/dashboard/src/App.jsx` around lines 25 - 33, The parseGpuIds function
currently coerces inputs with Number(...) causing empty strings to become 0 and
allowing negatives/decimals; update parseGpuIds to reject non-integer or
negative tokens by trimming, splitting on "," and validating each token with a
strict integer regex (e.g. /^\d+$/) or parseInt combined with Number.isInteger
and >= 0, then return an array of integers or null for empty input; also apply
the same strict validation where the session payload is built before calling
/api/sessions (the code that assembles gpu_ids for the POST) so you never send 0
for empty fields or fractional/negative IDs.
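
A self-contained sketch of that stricter parser; the name `parseGpuIdsStrict` and the exact error message are illustrative, not the merged code:

```javascript
// Hypothetical strict variant of parseGpuIds from the diff above.
function parseGpuIdsStrict(raw) {
  const text = raw.trim()
  if (!text) {
    return null // empty input keeps the server-side default (all GPUs)
  }
  const tokens = text
    .split(",")
    .map((part) => part.trim())
    .filter(Boolean)
  // Only non-negative integer tokens such as "0" or "12" pass the regex;
  // "1.5", "-2", and "abc" all fail and raise instead of coercing.
  if (!tokens.every((token) => /^\d+$/.test(token))) {
    throw new Error("GPU IDs must be non-negative integers separated by commas.")
  }
  return tokens.map(Number)
}
```

Validating in the parser (rather than after `Number(...)` coercion) is what prevents an empty field from silently becoming GPU `0`.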

@@ -0,0 +1,387 @@
@import url("https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap");

⚠️ Potential issue | 🟡 Minor

Fix @import notation to satisfy configured Stylelint rule.

Line 1 uses url(...), but your lint config expects string import notation.

🔧 Proposed fix
-@import url("https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap");
+@import "https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@import url("https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap");
@import "https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap";
🧰 Tools
🪛 Stylelint (17.3.0)

[error] 1-1: Expected "url("https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap")" to be ""https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@400;500;600;700&family=IBM+Plex+Mono:wght@400;500&family=Fraunces:opsz,wght@9..144,500;9..144,700&display=swap"" (import-notation)

(import-notation)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/dashboard/src/styles.css` at line 1, The `@import` uses url(...) which
violates the Stylelint rule expecting string notation; update the `@import`
statement (the line that currently imports fonts with url(...)) to use a quoted
string instead (e.g., `@import` "https://fonts.googleapis.com/..." ) so the import
uses string notation while preserving the same URL and parameters.


.masthead h1 {
margin: 0.55rem 0 0.35rem;
font-family: "Fraunces", serif;

⚠️ Potential issue | 🟡 Minor

Remove quotes around Fraunces in font-family.

Lines 75 and 150 violate font-family-name-quotes and will fail Stylelint.

🔧 Proposed fix
-  font-family: "Fraunces", serif;
+  font-family: Fraunces, serif;

Also applies to: 150-150

🧰 Tools
🪛 Stylelint (17.3.0)

[error] 75-75: Unexpected quotes around "Fraunces" (font-family-name-quotes)

(font-family-name-quotes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/dashboard/src/styles.css` at line 75, Replace the quoted font-family
declaration "font-family: "Fraunces", serif;" with an unquoted font family name
so it reads font-family: Fraunces, serif; — update both occurrences of the exact
declaration string ("font-family: "Fraunces", serif;") so they conform to
stylelint's font-family-name-quotes rule.

@Wangmerlyn

/gemini review\n@coderabbitai review

@gemini-code-assist

@Wangmerlyn I'm already performing a full code review of this pull request based on your previous invocation. There's no need to re-issue the /gemini review command. I will post my detailed feedback once the review is complete.

@coderabbitai

coderabbitai bot commented Mar 4, 2026

@Wangmerlyn Sure! I'll kick off a full review of the PR now.


🧠 Learnings used
Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: If a review-related subagent exists, call it when the code is ready for review

Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: Applies to src/keep_gpu/cli.py : CLI interface at `keep-gpu` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes

Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: Applies to src/keep_gpu/mcp/server.py : MCP server interface `keep-gpu-mcp-server` is a first-class interface that must be kept in sync with related documentation and tests when behavior changes
✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/keep_gpu/cli.py (2)

123-131: ⚠️ Potential issue | 🟠 Major

Handle non-JSON success responses as controlled CLI errors.

Line 125 assumes every 2xx body is JSON; a non-JSON body raises an unhandled exception that escapes the command-level error handling.

🛠️ Suggested fix
     try:
         with urlopen(request, timeout=timeout) as response:  # nosec B310
             body = response.read().decode("utf-8")
-            return json.loads(body) if body else {}
+            if not body:
+                return {}
+            try:
+                return json.loads(body)
+            except json.JSONDecodeError as exc:
+                raise RuntimeError(
+                    f"Non-JSON response from service endpoint: {url}"
+                ) from exc
#!/bin/bash
# Verify that successful-response JSON parsing is guarded.
rg -n -C3 'return json\.loads\(body\) if body else \{\}' src/keep_gpu/cli.py
rg -n -C3 'JSONDecodeError' src/keep_gpu/cli.py

Expected result: first command currently matches; second should show a handler added in _http_json_request after the fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/cli.py` around lines 123 - 131, The code currently assumes every
successful HTTP body is JSON (the urlopen(request, timeout=timeout) block that
does "return json.loads(body) if body else {}"), which lets a JSONDecodeError
escape; update the _http_json_request implementation to catch
json.JSONDecodeError around json.loads(body) and raise a controlled RuntimeError
(e.g., "Invalid JSON response from KeepGPU service" including the url and a
snippet of body) so callers see a clear CLI-level error; keep the existing
HTTPError and URLError handlers intact and reference the urlopen(request,
timeout=timeout) call and the json.loads(body) invocation when making the
change.
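
The parsing step can be isolated into a pure helper, which is roughly what the fix above amounts to (`parse_service_body` is a hypothetical name, not a function in `cli.py`):

```python
import json


def parse_service_body(body: str, url: str) -> dict:
    """Decode a service response body, turning bad JSON into a CLI-level error."""
    if not body:
        return {}
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Non-JSON response from service endpoint: {url}") from exc
```

Keeping the decode in one place also makes the guarded path trivially unit-testable without a live server.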

190-210: ⚠️ Potential issue | 🔴 Critical

Do not signal persisted PID without ownership check.

Line 191 trusts the PID file blindly; a reused or stale PID can target an unrelated process. Also guard os.kill against process-exit races.

🛠️ Suggested fix
 def _stop_service_process(host: str, port: int, timeout: float = 3.0) -> bool:
     pid = _read_service_pid(host, port)
     if pid is None:
         return False
+    if not _is_managed_keepgpu_pid(pid):
+        _clear_service_pid(host, port)
+        return False

     if not _pid_alive(pid):
         _clear_service_pid(host, port)
         return True

-    os.kill(pid, signal.SIGTERM)
+    try:
+        os.kill(pid, signal.SIGTERM)
+    except OSError:
+        _clear_service_pid(host, port)
+        return False
@@
-    os.kill(pid, signal.SIGKILL)
+    try:
+        os.kill(pid, signal.SIGKILL)
+    except OSError:
+        _clear_service_pid(host, port)
+        return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/cli.py` around lines 190 - 210, The _stop_service_process
function currently trusts a PID from _read_service_pid and sends signals
directly; change it to first verify ownership before signaling by adding a check
(e.g. new helper like _process_matches_service or reuse _pid_alive plus an
ownership check using process cmdline/argv or UID via /proc or psutil) to ensure
the PID belongs to our service; only proceed to send SIGTERM/SIGKILL if that
check passes, and wrap os.kill calls in try/except to handle ProcessLookupError
and PermissionError (treat ProcessLookupError as already-exited and avoid
clearing the PID unless ownership was confirmed); keep using _clear_service_pid
and _read_service_pid, but do not clear the file if ownership verification
fails.
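
On Linux, ownership can be checked against `/proc/<pid>/cmdline` before signaling. A sketch with the matching logic split out so it is testable; the helper names and the substring policy are assumptions, not the repo's code:

```python
def cmdline_is_keepgpu(cmdline: bytes) -> bool:
    """True when a NUL-separated /proc cmdline looks like a keep-gpu process."""
    argv = [part for part in cmdline.split(b"\0") if part]
    return any(b"keep-gpu" in part or b"keep_gpu" in part for part in argv)


def is_managed_keepgpu_pid(pid: int) -> bool:
    """Best-effort ownership check; False if the process vanished or is foreign."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as handle:
            return cmdline_is_keepgpu(handle.read())
    except OSError:
        return False
```

Only after this check passes should SIGTERM/SIGKILL be sent, each wrapped in try/except for ProcessLookupError and PermissionError as the comment describes.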
docs/reference/cli.md (1)

11-13: ⚠️ Potential issue | 🟡 Minor

Keep command summary flags aligned with detailed sections.

Lines 11-13 omit --host/--port in the quick summary, while the detailed command sections include them.

Suggested doc patch
-keep-gpu status [--job-id ID]
-keep-gpu stop (--job-id ID | --all)
-keep-gpu service-stop [--force]
+keep-gpu status [--job-id ID] [--host 127.0.0.1] [--port 8765]
+keep-gpu stop (--job-id ID | --all) [--host 127.0.0.1] [--port 8765]
+keep-gpu service-stop [--force] [--host 127.0.0.1] [--port 8765]

As per coding guidelines: "Update documentation when changing CLI flags, controller behavior, platform support, or MCP methods".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/cli.md` around lines 11 - 13, The quick summary lines for the
CLI commands (the entries for "keep-gpu status", "keep-gpu stop", and "keep-gpu
service-stop") are missing the global flags shown in the detailed sections;
update the summary to include the --host and --port flags (e.g., "keep-gpu
status [--job-id ID] [--host HOST] [--port PORT]", likewise for "keep-gpu stop"
and "keep-gpu service-stop") so the summary aligns with the detailed command
docs and coding guidelines.
🧹 Nitpick comments (3)
docs/getting-started.md (1)

89-106: Add teardown commands in the non-blocking quickstart.

Lines 94-106 explain how to start and inspect service mode, but they should also show how to stop sessions and optionally stop the daemon in the same flow.

Suggested doc patch
 keep-gpu start --gpu-ids 0 --interval 120 --vram 1GiB --busy-threshold 25
 keep-gpu status
+keep-gpu stop --all
+keep-gpu service-stop

Based on learnings: CLI interface at keep-gpu is a first-class interface that must be kept in sync with related documentation and tests when behavior changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/getting-started.md` around lines 89 - 106, Add explicit teardown
commands to the "Non-blocking workflow for agents" quickstart: after the
serve/status examples (keep-gpu start, keep-gpu status, keep-gpu serve), show
how to stop individual sessions and the service/daemon (e.g., keep-gpu stop
[--session-id | --all] and keep-gpu daemon stop) so users can clean up in the
same flow; update examples to include the exact CLI flags and an optional note
to verify session IDs via keep-gpu status and to ensure docs/tests stay in sync
with the keep-gpu CLI behavior.
tests/mcp/test_http_api.py (1)

40-68: Extract shared HTTP server harness into a fixture/context helper.

Both tests duplicate nearly identical server/thread startup and shutdown logic; consolidating it will reduce drift and cleanup mistakes over time.

Also applies to: 70-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/mcp/test_http_api.py` around lines 40 - 68, Extract the duplicated HTTP
server startup/shutdown into a reusable pytest fixture or context manager (e.g.,
http_server) to be used by test_http_health_and_static_index and the other
test(s) around lines 70-109: encapsulate creation of
KeepGPUServer(controller_factory=dummy_factory), the _Server subclass with
allow_reuse_address, binding to ("127.0.0.1", 0) with _JSONRPCHandler, assigning
httpd.keepgpu_server, starting the serve_forever thread, and returning the base
URL plus references to httpd/server/thread; ensure teardown logic
(httpd.shutdown(), httpd.server_close(), server.shutdown(),
thread.join(timeout=2)) runs reliably in the fixture finalizer or context
manager so tests simply call the fixture/context to get the base URL and avoid
duplicating setup/cleanup code.
tests/test_cli_service_commands.py (1)

183-187: Consider pytest.fail instead of assert False.

Using pytest.fail() is more explicit for indicating an expected exception path. The current pattern works but python -O would strip the assertion.

💡 Suggested improvement
+    import pytest
+
     try:
         cli._http_json_request("GET", "http://127.0.0.1:8765/health")
-        assert False, "Expected RuntimeError"
+        pytest.fail("Expected RuntimeError")
     except RuntimeError as exc:
         assert "Cannot reach KeepGPU service" in str(exc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cli_service_commands.py` around lines 183 - 187, Replace the use
of a bare assertion to indicate test failure in the exception path with
pytest.fail to avoid issues when assertions are stripped; specifically, in the
test around cli._http_json_request("GET", "http://127.0.0.1:8765/health") catch
block, call pytest.fail("Expected RuntimeError") (and add an import for pytest
at the top of the test module if missing) instead of using assert False so the
test failure is explicit and not affected by optimization flags.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/dashboard/src/App.jsx`:
- Line 313: The span in App.jsx uses class "refresh-tag" but that selector is
missing from styles.css; add a .refresh-tag rule to styles.css (targeting the
same class used in App.jsx) that sets font-size (~0.68rem), color to
var(--text-muted), and a monospace font-family fallback (e.g., "IBM Plex Mono",
monospace) so the element is styled as intended.

---

Duplicate comments:
In `@docs/reference/cli.md`:
- Around line 11-13: The quick summary lines for the CLI commands (the entries
for "keep-gpu status", "keep-gpu stop", and "keep-gpu service-stop") are missing
the global flags shown in the detailed sections; update the summary to include
the --host and --port flags (e.g., "keep-gpu status [--job-id ID] [--host HOST]
[--port PORT]", likewise for "keep-gpu stop" and "keep-gpu service-stop") so the
summary aligns with the detailed command docs and coding guidelines.

In `@src/keep_gpu/cli.py`:
- Around line 123-131: The code currently assumes every successful HTTP body is
JSON (the urlopen(request, timeout=timeout) block that does "return
json.loads(body) if body else {}"), which lets a JSONDecodeError escape; update
the _http_json_request implementation to catch json.JSONDecodeError around
json.loads(body) and raise a controlled RuntimeError (e.g., "Invalid JSON
response from KeepGPU service" including the url and a snippet of body) so
callers see a clear CLI-level error; keep the existing HTTPError and URLError
handlers intact and reference the urlopen(request, timeout=timeout) call and the
json.loads(body) invocation when making the change.
- Around line 190-210: The _stop_service_process function currently trusts a PID
from _read_service_pid and sends signals directly; change it to first verify
ownership before signaling by adding a check (e.g. new helper like
_process_matches_service or reuse _pid_alive plus an ownership check using
process cmdline/argv or UID via /proc or psutil) to ensure the PID belongs to
our service; only proceed to send SIGTERM/SIGKILL if that check passes, and wrap
os.kill calls in try/except to handle ProcessLookupError and PermissionError
(treat ProcessLookupError as already-exited and avoid clearing the PID unless
ownership was confirmed); keep using _clear_service_pid and _read_service_pid,
but do not clear the file if ownership verification fails.

---

Nitpick comments:
In `@docs/getting-started.md`:
- Around line 89-106: Add explicit teardown commands to the "Non-blocking
workflow for agents" quickstart: after the serve/status examples (keep-gpu
start, keep-gpu status, keep-gpu serve), show how to stop individual sessions
and the service/daemon (e.g., keep-gpu stop [--session-id | --all] and keep-gpu
daemon stop) so users can clean up in the same flow; update examples to include
the exact CLI flags and an optional note to verify session IDs via keep-gpu
status and to ensure docs/tests stay in sync with the keep-gpu CLI behavior.

In `@tests/mcp/test_http_api.py`:
- Around line 40-68: Extract the duplicated HTTP server startup/shutdown into a
reusable pytest fixture or context manager (e.g., http_server) to be used by
test_http_health_and_static_index and the other test(s) around lines 70-109:
encapsulate creation of KeepGPUServer(controller_factory=dummy_factory), the
_Server subclass with allow_reuse_address, binding to ("127.0.0.1", 0) with
_JSONRPCHandler, assigning httpd.keepgpu_server, starting the serve_forever
thread, and returning the base URL plus references to httpd/server/thread;
ensure teardown logic (httpd.shutdown(), httpd.server_close(),
server.shutdown(), thread.join(timeout=2)) runs reliably in the fixture
finalizer or context manager so tests simply call the fixture/context to get the
base URL and avoid duplicating setup/cleanup code.

In `@tests/test_cli_service_commands.py`:
- Around line 183-187: Replace the use of a bare assertion to indicate test
failure in the exception path with pytest.fail to avoid issues when assertions
are stripped; specifically, in the test around cli._http_json_request("GET",
"http://127.0.0.1:8765/health") catch block, call pytest.fail("Expected
RuntimeError") (and add an import for pytest at the top of the test module if
missing) instead of using assert False so the test failure is explicit and not
affected by optimization flags.

ℹ️ Review info
Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c43b2fac-7fe7-4825-91b3-9823c691d674

📥 Commits

Reviewing files that changed from the base of the PR and between f52bcc0 and b8d7f42.

⛔ Files ignored due to path filters (1)
  • web/dashboard/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (27)
  • README.md
  • docs/getting-started.md
  • docs/guides/cli.md
  • docs/guides/mcp.md
  • docs/index.md
  • docs/plans/cli-service-dashboard/findings.md
  • docs/plans/cli-service-dashboard/progress.md
  • docs/plans/cli-service-dashboard/task_plan.md
  • docs/reference/cli.md
  • docs/skills/gpu-keepalive-with-keepgpu/skillcheck-free-report.md
  • mkdocs.yml
  • skills/gpu-keepalive-with-keepgpu/SKILL.md
  • skills/gpu-keepalive-with-keepgpu/agents/openai.yaml
  • src/keep_gpu/cli.py
  • src/keep_gpu/mcp/server.py
  • src/keep_gpu/mcp/static/assets/dashboard.js
  • src/keep_gpu/mcp/static/assets/index.css
  • src/keep_gpu/mcp/static/index.html
  • tests/mcp/test_http_api.py
  • tests/test_cli_service_commands.py
  • web/dashboard/.gitignore
  • web/dashboard/index.html
  • web/dashboard/package.json
  • web/dashboard/src/App.jsx
  • web/dashboard/src/main.jsx
  • web/dashboard/src/styles.css
  • web/dashboard/vite.config.js

<section className="panel span-all">
<div className="panel-heading">
<h2>GPU Telemetry</h2>
<span className="refresh-tag">refresh 3s</span>

⚠️ Potential issue | 🟡 Minor

Missing CSS class refresh-tag.

The refresh-tag class referenced here is not defined in styles.css. The element will render without styling.

💄 Proposed fix - add to styles.css
.refresh-tag {
  font-size: 0.68rem;
  color: var(--text-muted);
  font-family: "IBM Plex Mono", monospace;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/dashboard/src/App.jsx` at line 313, The span in App.jsx uses class
"refresh-tag" but that selector is missing from styles.css; add a .refresh-tag
rule to styles.css (targeting the same class used in App.jsx) that sets
font-size (~0.68rem), color to var(--text-muted), and a monospace font-family
fallback (e.g., "IBM Plex Mono", monospace) so the element is styled as
intended.

@Wangmerlyn

/gemini review\n@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 4, 2026

@Wangmerlyn Sure! I'll kick off a review of the PR now.


🧠 Learnings used
Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: If a review-related subagent exists, call it when the code is ready for review
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@gemini-code-assist

@Wangmerlyn I'm still actively performing a full code review of this pull request. Repeated invocations of /gemini review do not accelerate the process. I will post my detailed feedback as soon as the review is complete. Thank you for your patience!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/keep_gpu/mcp/server.py (3)

308-310: ⚠️ Potential issue | 🔴 Critical

Guard empty job_id to prevent accidental global stop.

A trailing slash (/api/sessions/) yields an empty job_id; in DELETE this flows into stop_keep(job_id=""), which is treated as falsy and becomes “stop all”.

🐛 Proposed fix
         if path.startswith("/api/sessions/"):
-            job_id = unquote(path.rsplit("/", 1)[-1])
+            job_id = unquote(path.rsplit("/", 1)[-1]).strip()
+            if not job_id:
+                self._json_response(400, {"error": {"message": "Missing job_id"}})
+                return
             self._json_response(200, server_ref.status(job_id=job_id))
             return
@@
         if path.startswith("/api/sessions/"):
-            job_id = unquote(path.rsplit("/", 1)[-1])
+            job_id = unquote(path.rsplit("/", 1)[-1]).strip()
+            if not job_id:
+                self._json_response(400, {"error": {"message": "Missing job_id"}})
+                return
             self._json_response(200, server_ref.stop_keep(job_id=job_id))
             return

Also applies to: 350-353

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/mcp/server.py` around lines 308 - 310, The route handling for
"/api/sessions/" extracts job_id with job_id = unquote(path.rsplit("/", 1)[-1])
but doesn't guard for an empty string, allowing a trailing slash to produce
job_id == "" which leads DELETE to call stop_keep(job_id="") (treated as falsy
and stopping all). Update both the status branch (where
server_ref.status(job_id=job_id) is called) and the delete branch (where
stop_keep is invoked) to check for an empty job_id (e.g., if not job_id: return
a 400/404 JSON response via self._json_response and do not call
server_ref.status or stop_keep) so empty job IDs are rejected rather than
interpreted as “stop all.”
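
The extraction itself reduces to a small helper; returning None for the trailing-slash case makes the 400 branch explicit (`extract_job_id` is a hypothetical name for illustration):

```python
from urllib.parse import unquote


def extract_job_id(path: str):
    """Return the trailing job id of an /api/sessions/<id> path, or None."""
    job_id = unquote(path.rsplit("/", 1)[-1]).strip()
    return job_id or None
```

The handler would then reject None with a 400 before ever reaching stop_keep, so an empty id can never fall through to the "stop all" branch.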

325-341: ⚠️ Potential issue | 🟠 Major

Normalize non-parse POST failures into structured JSON errors.

Only parse failures are handled; runtime errors from start_keep(**payload) can bypass consistent JSON error responses.

🛠️ Proposed fix
-        except (json.JSONDecodeError, ValueError, UnicodeDecodeError) as exc:
+        except (json.JSONDecodeError, ValueError, UnicodeDecodeError) as exc:
             self._json_response(400, {"error": {"message": f"Bad request: {exc}"}})
+        except TypeError as exc:
+            self._json_response(400, {"error": {"message": f"Bad request: {exc}"}})
+        except Exception as exc:
+            logger.exception("POST request failed for path %s", path)
+            self._json_response(
+                500,
+                {
+                    "error": {
+                        "message": str(exc),
+                        "type": exc.__class__.__name__,
+                    }
+                },
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/mcp/server.py` around lines 325 - 341, The POST handling
currently only catches JSON parse errors so runtime exceptions from
server_ref.start_keep or _handle_request escape the JSON error API; wrap the
calls to server_ref.start_keep(**payload) and the JSON-RPC
_handle_request(server_ref, payload) in a try/except Exception as exc and use
self._json_response(500, {"error": {"message": str(exc)}}) to return a
consistent structured JSON error; keep the existing except
(json.JSONDecodeError, ValueError, UnicodeDecodeError) for 400s and avoid
re-raising so all failures produce JSON responses.

313-314: ⚠️ Potential issue | 🟠 Major

Return JSON 404 for unknown /api/* GET routes.

Unmatched API GET paths currently fall through to SPA static serving and can return HTML 200 instead of API 404.

🔧 Proposed fix
-        self._serve_static(path)
+        if path.startswith("/api/"):
+            self._json_response(404, {"error": {"message": "Unknown endpoint"}})
+            return
+        self._serve_static(path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/mcp/server.py` around lines 313 - 314, When falling back to
self._serve_static(path) for unmatched GET requests, detect API paths
(path.startswith("/api/")) and return a JSON 404 instead of serving the SPA
HTML. Modify the request handler that currently calls self._serve_static(path)
so that for GET and path.startswith("/api/") it responds with a 404 status,
Content-Type: application/json and a small body like {"error":"Not Found"} (use
the existing response helper on the class or add a simple JSON response path),
otherwise continue to call self._serve_static(path).
docs/plans/cli-service-dashboard/progress.md (1)

30-37: ⚠️ Potential issue | 🟡 Minor

Refresh progress state to match the current PR lifecycle.

As of March 4, 2026, this PR is already open, so “Preparing final summary and PR” and “Create commit and open PR” are stale.

📝 Suggested update
 ### In Progress

-- Preparing final summary and PR after successful validation.
+- PR is open; addressing review feedback and merge-readiness checks.

 ### Next

-1. Create commit and open PR for review.
+1. Address reviewer feedback.
+2. Land follow-up fixes and re-run targeted validation.
+3. Prepare merge once approvals are complete.

As per coding guidelines, "Implementations must follow the plan document and its todo items; update the plan document when tasks or scope change".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/cli-service-dashboard/progress.md` around lines 30 - 37, The
plan's "In Progress" and "Next" sections are stale because the PR is already
open; update the "In Progress" bullets (replace "Preparing final summary and PR
after successful validation" with "Final summary prepared; PR opened on
2026-03-04" or similar) and replace the "Next" list (remove "Create commit and
open PR for review") with current actionable items like "Address review
feedback", "Run final validation", and "Merge after approval"; ensure headings
"In Progress" and "Next" remain and include a timestamp or note about the PR
being open to keep the plan in sync with the PR lifecycle.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/keep_gpu/mcp/server.py`:
- Around line 379-384: The threaded HTTP server created by _Server
(ThreadingMixIn, TCPServer) exposes httpd.keepgpu_server which accesses and
mutates self._sessions from handlers (methods start_keep, stop_keep, status)
without synchronization; add a threading.Lock (e.g., self._sessions_lock) to the
keepgpu server instance during initialization and wrap every read/write of
self._sessions in start_keep, stop_keep and status with the lock (use
acquire/release or a context manager) to prevent race conditions. Ensure the
lock is referenced by the handlers via httpd.keepgpu_server._sessions_lock (or
an accessor) so all concurrent handler threads synchronize on the same lock.
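A minimal sketch of that locking pattern, using a hypothetical `SessionRegistry` wrapper (the real server keeps `_sessions` directly on the server instance; the class and method names here are illustrative, not the project's API):

```python
from __future__ import annotations

import threading
from typing import Any


class SessionRegistry:
    """Serialize all access to a shared session map across handler threads."""

    def __init__(self) -> None:
        self._sessions: dict[str, Any] = {}
        self._sessions_lock = threading.Lock()

    def add(self, job_id: str, session: Any) -> None:
        with self._sessions_lock:
            self._sessions[job_id] = session

    def pop(self, job_id: str) -> Any | None:
        # pop is atomic under the lock, so two concurrent stop requests
        # cannot both "win" the same job id
        with self._sessions_lock:
            return self._sessions.pop(job_id, None)

    def snapshot(self) -> dict[str, Any]:
        # copy under the lock so callers can iterate without holding it
        with self._sessions_lock:
            return dict(self._sessions)
```

Routing `start_keep`, `stop_keep`, and `status` through one such object (or equivalently, taking a single `self._sessions_lock` around each dict access) ensures every handler thread synchronizes on the same lock.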


ℹ️ Review info
Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f5d3baf9-fd7d-4b6a-b0ca-f6985f5fbacd

📥 Commits

Reviewing files that changed from the base of the PR and between b8d7f42 and 34bceb4.

📒 Files selected for processing (6)
  • docs/plans/cli-service-dashboard/findings.md
  • docs/plans/cli-service-dashboard/progress.md
  • src/keep_gpu/mcp/server.py
  • src/keep_gpu/single_gpu_controller/cuda_gpu_controller.py
  • src/keep_gpu/single_gpu_controller/rocm_gpu_controller.py
  • tests/mcp/test_server.py

@Wangmerlyn
Owner Author

/gemini review
@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

@Wangmerlyn Sure! I'll kick off a review of the PR now.


🧠 Learnings used
Learnt from: CR
Repo: Wangmerlyn/KeepGPU PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-15T07:10:27.370Z
Learning: If a review-related subagent exists, call it when the code is ready for review
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@gemini-code-assist
Contributor

@Wangmerlyn I'm still actively performing a full code review of this pull request. Repeated invocations of /gemini review do not accelerate the process. I will post my detailed feedback as soon as the review is complete. Thank you for your patience!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/keep_gpu/single_gpu_controller/rocm_gpu_controller.py (1)

84-99: ⚠️ Potential issue | 🟡 Minor

Log message emitted even when cache is not cleared.

When the thread doesn't stop within the timeout (lines 84-90), the method returns early without clearing the cache. However, line 99 unconditionally logs "keep thread stopped & cache cleared", which is misleading for the early-return path where torch.cuda.empty_cache() was never called.

🐛 Proposed fix
             if self._thread.is_alive():
                 logger.warning(
                     "rank %s: ROCm keep thread did not stop within %.1fs",
                     self.rank,
                     join_timeout,
                 )
                 return
             torch.cuda.empty_cache()
+            logger.info("rank %s: keep thread stopped & cache cleared", self.rank)
         else:
             logger.warning("rank %s: keep thread not running", self.rank)
         if self._rocm_smi:
             try:
                 self._rocm_smi.rsmi_shut_down()
             except Exception as exc:  # pragma: no cover - best effort
                 logger.debug("rsmi_shut_down failed: %s", exc)
-        logger.info("rank %s: keep thread stopped & cache cleared", self.rank)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/keep_gpu/single_gpu_controller/rocm_gpu_controller.py` around lines 84 -
99, The final info log incorrectly states "keep thread stopped & cache cleared"
even when the method returns early because self._thread.is_alive() timed out and
torch.cuda.empty_cache() was not called; update the control flow so the
logger.info("rank %s: keep thread stopped & cache cleared", self.rank) is only
emitted after the cache clear and rsmi shutdown succeed (i.e., move or guard
that log behind the branch where torch.cuda.empty_cache() is executed and any
rsmi_shut_down handling), and ensure the early-return path logs a distinct
warning (using logger.warning with join_timeout) without claiming the cache was
cleared.
🧹 Nitpick comments (3)
tests/mcp/test_http_api.py (1)

50-77: Consider extracting server setup/teardown into a fixture.

The server setup pattern (TCPServer creation, thread start, base URL computation) and teardown (shutdown, server_close, thread.join) is duplicated across all test functions. A pytest fixture would reduce boilerplate and ensure consistent cleanup.

♻️ Suggested fixture pattern
import pytest

`@pytest.fixture`
def http_server():
    server = make_server()
    
    class _Server(TCPServer):
        allow_reuse_address = True

    httpd = _Server(("127.0.0.1", 0), _JSONRPCHandler)
    httpd.keepgpu_server = server
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    
    base = f"http://127.0.0.1:{httpd.server_address[1]}"
    
    yield base, server
    
    httpd.shutdown()
    httpd.server_close()
    server.shutdown()
    thread.join(timeout=2)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/mcp/test_http_api.py` around lines 50 - 77, The test duplicates HTTP
server setup/teardown (see test_http_health_and_static_index using make_server,
_Server subclass, _JSONRPCHandler, httpd and thread) so extract that logic into
a pytest fixture (e.g., http_server) that creates make_server(), instantiates
the _Server(("127.0.0.1", 0), _JSONRPCHandler), sets httpd.keepgpu_server,
starts the thread running httpd.serve_forever, yields the base URL and server
for tests, and in teardown calls httpd.shutdown(), httpd.server_close(),
server.shutdown(), and thread.join(timeout=2); update
test_http_health_and_static_index and other tests to accept the fixture and use
the yielded base and server instead of in-test setup/teardown.
tests/test_cli_service_commands.py (1)

183-187: Use pytest.fail() instead of assert False.

Lines 185 and 205 use assert False which can be optimized away with python -O. Consider using pytest.fail() for explicit test failures.

♻️ Proposed fix
+    import pytest
+
     try:
         cli._http_json_request("GET", "http://127.0.0.1:8765/health")
-        assert False, "Expected RuntimeError"
+        pytest.fail("Expected RuntimeError")
     except RuntimeError as exc:
         assert "Cannot reach KeepGPU service" in str(exc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cli_service_commands.py` around lines 183 - 187, Replace the
fragile "assert False, 'Expected RuntimeError'" usages with explicit pytest.fail
calls: import pytest in tests/test_cli_service_commands.py if it's not already
imported, and change the failing assertions around the
cli._http_json_request("GET", "http://127.0.0.1:8765/health") call (and the
other occurrence later in the file) to pytest.fail("Expected RuntimeError") so
the test failure cannot be optimized away by python -O and clearly signals the
intended failure.
tests/mcp/test_server.py (1)

116-130: Test relies on job iteration order matching creation order.

The assertion result["stopped"] == [job_a] assumes the server iterates jobs in insertion order. If the underlying data structure (e.g., a dict prior to Python 3.7 or any reordering logic) changes, this test becomes flaky. Consider using set comparisons instead:

assert set(result["stopped"]) == {job_a}
assert set(result["timed_out"]) == {job_b}

Alternatively, verify the server implementation guarantees insertion-order iteration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/mcp/test_server.py` around lines 116 - 130, The test
test_stop_all_tracks_timeouts assumes insertion-order iteration of jobs which
makes it flaky; update the assertions in test_stop_all_tracks_timeouts to
compare as unordered collections (e.g., use set(result["stopped"]) == {job_a}
and set(result["timed_out"]) == {job_b}) or otherwise assert membership (job_a
in result["stopped"], job_b in result["timed_out"]) so the test no longer
depends on server.start_keep/stop_keep iteration order when monkeypatching
_release_with_timeout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a39269fc-37ba-450f-8102-eacdac6699ee

📥 Commits

Reviewing files that changed from the base of the PR and between b8d7f42 and a15315b.

📒 Files selected for processing (12)
  • docs/guides/cli.md
  • docs/plans/cli-service-dashboard/findings.md
  • docs/plans/cli-service-dashboard/progress.md
  • docs/reference/cli.md
  • skills/gpu-keepalive-with-keepgpu/SKILL.md
  • src/keep_gpu/cli.py
  • src/keep_gpu/mcp/server.py
  • src/keep_gpu/single_gpu_controller/cuda_gpu_controller.py
  • src/keep_gpu/single_gpu_controller/rocm_gpu_controller.py
  • tests/mcp/test_http_api.py
  • tests/mcp/test_server.py
  • tests/test_cli_service_commands.py

Comment on lines +12 to +14
keep-gpu stop (--job-id ID | --all)
keep-gpu service-stop [--host 127.0.0.1] [--port 8765] [--force]
keep-gpu list-gpus


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find and read the CLI documentation file
find . -name "cli.md" -type f | head -5

Repository: Wangmerlyn/KeepGPU

Length of output: 106


🏁 Script executed:

# Also search for the CLI implementation to verify flag support
find . -name "cli.py" -type f | head -5

Repository: Wangmerlyn/KeepGPU

Length of output: 83


🏁 Script executed:

# Search for list-gpus and stop command implementations
rg -n "def.*list.*gpu|def.*stop" --type=py -B 2 -A 10 | head -100

Repository: Wangmerlyn/KeepGPU

Length of output: 6858


🏁 Script executed:

# Read the CLI documentation file to see the summary and detailed sections
cat -n ./docs/reference/cli.md | head -80

Repository: Wangmerlyn/KeepGPU

Length of output: 3198


🏁 Script executed:

# Read the CLI implementation to check the actual command definitions
cat -n ./src/keep_gpu/cli.py | head -150

Repository: Wangmerlyn/KeepGPU

Length of output: 5521


🏁 Script executed:

# Search for the stop and list-gpus command definitions in cli.py
rg -n "def stop|def list.?gpus|@app.command" ./src/keep_gpu/cli.py -A 15 | head -150

Repository: Wangmerlyn/KeepGPU

Length of output: 3790


stop and list-gpus command summaries are missing --host/--port flags.

The detailed sections for both commands (lines 64-68 and 70-72) document --host, --port options, but the summary omits them. The implementations confirm both commands support these flags. Update the summaries to be consistent with status and service-stop.

📝 Suggested fix
-keep-gpu stop (--job-id ID | --all)
-keep-gpu list-gpus
+keep-gpu stop (--job-id ID | --all) [--host 127.0.0.1] [--port 8765]
+keep-gpu list-gpus [--host 127.0.0.1] [--port 8765]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/cli.md` around lines 12 - 14, Update the command summaries for
the CLI doc so the "stop" and "list-gpus" entries include the --host and --port
flags like the "status" and "service-stop" lines do; specifically, modify the
summary lines that currently read "keep-gpu stop (--job-id ID | --all)" and
"keep-gpu list-gpus" to include "[--host 127.0.0.1] [--port 8765]" so they match
the detailed sections and the implemented CLI behavior (commands: stop,
list-gpus, and reference service-stop/status for flag format).

Comment on lines +70 to +72
### `keep-gpu list-gpus`

Returns GPU telemetry from service.

## Exit codes


⚠️ Potential issue | 🟡 Minor

list-gpus section lacks an options table.

Other service commands document their --host/--port options. If list-gpus contacts the service (which it must, to return telemetry), add the options table for consistency.

📝 Suggested addition after line 72
 Returns GPU telemetry from service.
+
+| Option | Description |
+| --- | --- |
+| `--host`, `--port` | Service host/port. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/cli.md` around lines 70 - 72, The `keep-gpu list-gpus` command
documentation is missing an options table; add a consistent options section
showing the service connection flags (e.g., --host and --port) for the `keep-gpu
list-gpus` heading so readers know how to point the command at the service.
Under the `### `keep-gpu list-gpus`` block insert an options table listing
`--host <host>` and `--port <port>` with brief descriptions and defaults
consistent with other service commands in the docs.

@Wangmerlyn Wangmerlyn merged commit e296c88 into main Mar 4, 2026
5 checks passed
@Wangmerlyn Wangmerlyn deleted the feat/cli-service-dashboard branch March 4, 2026 07:40