
Codex MCP Server

An MCP (Model Context Protocol) server that uses OpenAI-compatible LLMs to evaluate outputs from other LLMs. It provides specialized evaluation tools for code review, architecture design, UI/UX design, test cases, and custom scenarios.

Features

  • 6 Evaluation Tools:

    • evaluate_code_review - Evaluate code for quality, bugs, security, and performance
    • evaluate_architecture - Evaluate system/software architecture designs
    • evaluate_uiux - Evaluate UI/UX designs for usability and accessibility
    • evaluate_test_cases - Evaluate test case coverage and quality
    • evaluate_custom - Evaluate with custom user-defined criteria
    • list_evaluation_criteria - List predefined evaluation criteria
  • Flexible Input: Accept content as direct text or file paths (see the sketch after this list)

  • Structured Output: JSON results with scores, issues, strengths, and improvements

  • OpenAI Compatible: Works with any OpenAI-compatible API (OpenAI, Azure, local LLMs)
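
As a concrete illustration of the flexible-input feature, here is a minimal sketch of the shape of an MCP tools/call payload for this server. The argument names (content, file_path) are assumptions about the tool schema, not confirmed by this README:

// Hypothetical MCP "tools/call" payload for this server. The argument
// names ("content", "file_path") are assumptions, not a confirmed schema.
const request = {
  method: "tools/call",
  params: {
    name: "evaluate_code_review",
    arguments: {
      // Direct text input...
      content: "def add(a, b):\n    return a + b",
      // ...or, alternatively, a path to a file on disk:
      // file_path: "/path/to/module.py",
    },
  },
};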

Installation

  1. Clone or download this repository
  2. Install dependencies:

npm install

  3. Build the TypeScript code:

npm run build

Configuration

The server requires the following environment variables:

Variable           Description                      Required   Default
LLM_API_KEY        API key for the LLM service      Yes        -
LLM_API_BASE_URL   OpenAI-compatible API base URL   No         https://api.openai.com/v1
LLM_MODEL          Model name to use                No         gpt-5.2
LLM_MAX_TOKENS     Maximum tokens for response      No         4096
LLM_TEMPERATURE    Temperature for generation       No         0.3
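
As a rough sketch of how these variables are consumed (the server's actual implementation may differ), the defaults above map onto environment lookups like this:

// Minimal sketch (not the server's actual code) of resolving the
// environment variables above with their documented defaults:
const config = {
  apiKey: process.env.LLM_API_KEY, // required; no default
  baseURL: process.env.LLM_API_BASE_URL ?? "https://api.openai.com/v1",
  model: process.env.LLM_MODEL ?? "gpt-5.2",
  maxTokens: Number(process.env.LLM_MAX_TOKENS ?? "4096"),
  temperature: Number(process.env.LLM_TEMPERATURE ?? "0.3"),
};

if (!config.apiKey) {
  throw new Error("LLM_API_KEY environment variable is required");
}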

MCP Configuration

Add this server to your MCP settings file:

For Kilo Code (VS Code)

Edit ~/Library/Application Support/Code/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json:

{
  "mcpServers": {
    "codex-evaluator": {
      "command": "node",
      "args": ["/path/to/codex-mcp/build/index.js"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key",
        "LLM_API_BASE_URL": "https://api.openai.com/v1",
        "LLM_MODEL": "gpt-5.2",
        "LLM_MAX_TOKENS": "4096",
        "LLM_TEMPERATURE": "0.3"
      },
      "disabled": false,
      "alwaysAllow": [],
      "disabledTools": []
    }
  }
}

For Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "codex-evaluator": {
      "command": "node",
      "args": ["/path/to/codex-mcp/build/index.js"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key",
        "LLM_MODEL": "gpt-5.2"
      }
    }
  }
}

Usage Examples

Code Review

Use evaluate_code_review to review this Python function:

def calculate_discount(price, discount):
    return price - (price * discount / 100)

Architecture Evaluation

Use evaluate_architecture to evaluate this microservices design:

Service A -> Message Queue -> Service B -> Database
Service A -> Cache -> Service C -> External API

UI/UX Evaluation

Use evaluate_uiux to evaluate this login page design:

- Header with logo
- Email input field
- Password input field  
- "Remember me" checkbox
- "Forgot password" link
- Login button
- "Sign up" link at bottom

Test Case Evaluation

Use evaluate_test_cases to evaluate these unit tests:

describe('Calculator', () => {
  it('should add two numbers', () => {
    expect(add(2, 3)).toBe(5);
  });
});

Custom Evaluation

Use evaluate_custom with criteria "Check for SQL injection vulnerabilities and XSS attacks" on this code:

const query = "SELECT * FROM users WHERE name = '" + userName + "'";

List Criteria

Use list_evaluation_criteria with scenario "code_review" to see what criteria are used.

Output Format

All evaluation tools return a structured JSON response:

{
  "summary": "Brief overall assessment",
  "score": {
    "overall": 7.5,
    "categories": {
      "correctness": 8,
      "security": 7,
      "performance": 8,
      "maintainability": 7,
      "best_practices": 7
    }
  },
  "issues": [
    {
      "severity": "high",
      "category": "security",
      "location": "line 5",
      "description": "SQL injection vulnerability",
      "suggestion": "Use parameterized queries"
    }
  ],
  "strengths": [
    "Clear function naming",
    "Good code organization"
  ],
  "improvements": [
    {
      "priority": "high",
      "description": "Add input validation",
      "rationale": "Prevents invalid data from causing errors"
    }
  ],
  "metadata": {
    "evaluator_model": "gpt-5.2",
    "scenario": "code_review",
    "timestamp": "2024-01-15T10:30:00Z",
    "input_type": "text",
    "processing_time_ms": 2500
  }
}
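
For TypeScript consumers, the response can be described with types like the following. These interface names are illustrative, not exported by the server:

// Illustrative shapes for the JSON result above; the names are
// assumptions, not part of the server's public API.
interface EvaluationIssue {
  severity: string; // e.g. "high" in the sample; the full value set is not documented
  category: string;
  location: string;
  description: string;
  suggestion: string;
}

interface Improvement {
  priority: string;
  description: string;
  rationale: string;
}

interface EvaluationResult {
  summary: string;
  score: {
    overall: number;
    categories: Record<string, number>;
  };
  issues: EvaluationIssue[];
  strengths: string[];
  improvements: Improvement[];
  metadata: {
    evaluator_model: string;
    scenario: string;
    timestamp: string; // ISO 8601
    input_type: string; // "text" in the sample; presumably also "file"
    processing_time_ms: number;
  };
}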

Development

Build

npm run build

Watch Mode

npm run dev

Run Directly

npm start

Evaluation Criteria

Code Review

  • Correctness (25%): Logic errors, edge cases, type safety
  • Security (20%): Vulnerabilities, input validation, data exposure
  • Performance (15%): Efficiency, memory usage, async patterns
  • Maintainability (20%): Organization, naming, complexity
  • Best Practices (20%): Design patterns, SOLID, DRY/KISS
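
Assuming the overall score is a weighted average of the category scores (the exact formula is not documented here), the weights above can be applied like this:

// A plain weighted average using the Code Review weights above; the
// server's actual scoring formula is an assumption.
const codeReviewWeights: Record<string, number> = {
  correctness: 0.25,
  security: 0.2,
  performance: 0.15,
  maintainability: 0.2,
  best_practices: 0.2,
};

function overallScore(categories: Record<string, number>): number {
  let total = 0;
  for (const [category, weight] of Object.entries(codeReviewWeights)) {
    total += (categories[category] ?? 0) * weight;
  }
  return Math.round(total * 10) / 10; // round to one decimal place
}

// Using the category scores from the sample output:
// 8*0.25 + 7*0.20 + 8*0.15 + 7*0.20 + 7*0.20 = 7.4
overallScore({ correctness: 8, security: 7, performance: 8, maintainability: 7, best_practices: 7 });

Note that the sample output reports an overall of 7.5 for these category scores, while a plain weighted average gives 7.4, so the server may round or combine scores differently.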

Architecture

  • Scalability (20%): Scaling capabilities, bottlenecks
  • Reliability (20%): Fault tolerance, redundancy
  • Maintainability (20%): Modularity, coupling, cohesion
  • Security (15%): Authentication, data protection
  • Cost Efficiency (10%): Resource utilization
  • Performance (15%): Latency, throughput

UI/UX

  • Usability (25%): Navigation, efficiency, error prevention
  • Accessibility (20%): WCAG, screen readers, contrast
  • Visual Design (15%): Consistency, hierarchy, typography
  • User Flow (20%): Task completion, feedback
  • Responsiveness (10%): Device adaptation
  • Content (10%): Messaging clarity, help text

Test Cases

  • Coverage (25%): Code/branch/path coverage
  • Edge Cases (20%): Boundaries, errors, null inputs
  • Assertions (20%): Meaningful assertions, clarity
  • Structure (20%): Organization, naming, isolation
  • Maintainability (15%): Independence, data management

License

MIT
