An MCP (Model Context Protocol) server that uses OpenAI-compatible LLMs to evaluate outputs from other LLMs. It provides specialized evaluation tools for code review, architecture design, UI/UX design, test cases, and custom scenarios.
## Features

- **6 Evaluation Tools:**
  - `evaluate_code_review` - Evaluate code for quality, bugs, security, and performance
  - `evaluate_architecture` - Evaluate system and software architecture designs
  - `evaluate_uiux` - Evaluate UI/UX designs for usability and accessibility
  - `evaluate_test_cases` - Evaluate test case coverage and quality
  - `evaluate_custom` - Evaluate against custom, user-defined criteria
  - `list_evaluation_criteria` - List the predefined evaluation criteria
- **Flexible Input:** Accepts content as direct text or as file paths
- **Structured Output:** JSON results with scores, issues, strengths, and improvements
- **OpenAI Compatible:** Works with any OpenAI-compatible API (OpenAI, Azure, local LLMs)
## Installation

1. Clone or download this repository
2. Install dependencies: `npm install`
3. Build the TypeScript code: `npm run build`

## Configuration

The server is configured through the following environment variables:
| Variable | Description | Required | Default |
|---|---|---|---|
| `LLM_API_KEY` | API key for the LLM service | Yes | - |
| `LLM_API_BASE_URL` | OpenAI-compatible API base URL | No | `https://api.openai.com/v1` |
| `LLM_MODEL` | Model name to use | No | `gpt-5.2` |
| `LLM_MAX_TOKENS` | Maximum tokens for the response | No | `4096` |
| `LLM_TEMPERATURE` | Sampling temperature for generation | No | `0.3` |
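Once built, the server can also be launched by hand to verify the configuration. This is a quick sketch; an MCP server of this kind typically communicates over stdio, so when run directly it will simply wait for a client to connect:

```shell
# Only LLM_API_KEY is required; the other variables fall back to the
# defaults listed in the table above.
export LLM_API_KEY="your-openai-api-key"
export LLM_MODEL="gpt-5.2"   # optional override

node build/index.js
```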
## MCP Client Setup

Add this server to your MCP client's settings file.

Edit `~/Library/Application Support/Code/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json`:

```json
{
  "mcpServers": {
    "codex-evaluator": {
      "command": "node",
      "args": ["/path/to/codex-mcp/build/index.js"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key",
        "LLM_API_BASE_URL": "https://api.openai.com/v1",
        "LLM_MODEL": "gpt-5.2",
        "LLM_MAX_TOKENS": "4096",
        "LLM_TEMPERATURE": "0.3"
      },
      "disabled": false,
      "alwaysAllow": [],
      "disabledTools": []
    }
  }
}
```

Edit `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "codex-evaluator": {
      "command": "node",
      "args": ["/path/to/codex-mcp/build/index.js"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key",
        "LLM_MODEL": "gpt-5.2"
      }
    }
  }
}
```

## Usage Examples

Use `evaluate_code_review` to review this Python function:

```python
def calculate_discount(price, discount):
    return price - (price * discount / 100)
```
Use `evaluate_architecture` to evaluate this microservices design:

```
Service A -> Message Queue -> Service B -> Database
Service A -> Cache -> Service C -> External API
```
Use `evaluate_uiux` to evaluate this login page design:
- Header with logo
- Email input field
- Password input field
- "Remember me" checkbox
- "Forgot password" link
- Login button
- "Sign up" link at bottom
Use `evaluate_test_cases` to evaluate these unit tests:

```javascript
describe('Calculator', () => {
  it('should add two numbers', () => {
    expect(add(2, 3)).toBe(5);
  });
});
```
Use `evaluate_custom` with the criteria "Check for SQL injection vulnerabilities and XSS attacks" on this code:

```javascript
const query = "SELECT * FROM users WHERE name = '" + userName + "'";
```
Use `list_evaluation_criteria` with scenario `"code_review"` to see which criteria are used.
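A client consuming these tools might triage the structured response documented in the next section. This is an illustrative sketch, not part of the server; the `EvaluationResult` and `Issue` types are hypothetical names that mirror the response shape shown in this README:

```typescript
// Minimal types mirroring the documented response shape.
interface Issue {
  severity: "low" | "medium" | "high";
  category: string;
  location: string;
  description: string;
  suggestion: string;
}

interface EvaluationResult {
  summary: string;
  score: { overall: number; categories: Record<string, number> };
  issues: Issue[];
  strengths: string[];
  improvements: { priority: string; description: string; rationale: string }[];
}

// Pull out only the issues severe enough to block a merge.
function blockingIssues(result: EvaluationResult): Issue[] {
  return result.issues.filter((i) => i.severity === "high");
}

// Example, using the sample response from this README:
const sample: EvaluationResult = {
  summary: "Brief overall assessment",
  score: { overall: 7.5, categories: { correctness: 8, security: 7 } },
  issues: [
    {
      severity: "high",
      category: "security",
      location: "line 5",
      description: "SQL injection vulnerability",
      suggestion: "Use parameterized queries",
    },
  ],
  strengths: ["Clear function naming"],
  improvements: [],
};

console.log(blockingIssues(sample).length); // 1
```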
## Output Format

All evaluation tools return a structured JSON response:
```json
{
  "summary": "Brief overall assessment",
  "score": {
    "overall": 7.5,
    "categories": {
      "correctness": 8,
      "security": 7,
      "performance": 8,
      "maintainability": 7,
      "best_practices": 7
    }
  },
  "issues": [
    {
      "severity": "high",
      "category": "security",
      "location": "line 5",
      "description": "SQL injection vulnerability",
      "suggestion": "Use parameterized queries"
    }
  ],
  "strengths": [
    "Clear function naming",
    "Good code organization"
  ],
  "improvements": [
    {
      "priority": "high",
      "description": "Add input validation",
      "rationale": "Prevents invalid data from causing errors"
    }
  ],
  "metadata": {
    "evaluator_model": "gpt-5.2",
    "scenario": "code_review",
    "timestamp": "2024-01-15T10:30:00Z",
    "input_type": "text",
    "processing_time_ms": 2500
  }
}
```

## Development

- `npm run build` - Build the TypeScript code
- `npm run dev` - Development mode
- `npm start` - Start the server

## Evaluation Criteria

### Code Review (`code_review`)

- Correctness (25%): Logic errors, edge cases, type safety
- Security (20%): Vulnerabilities, input validation, data exposure
- Performance (15%): Efficiency, memory usage, async patterns
- Maintainability (20%): Organization, naming, complexity
- Best Practices (20%): Design patterns, SOLID, DRY/KISS
### Architecture

- Scalability (20%): Scaling capabilities, bottlenecks
- Reliability (20%): Fault tolerance, redundancy
- Maintainability (20%): Modularity, coupling, cohesion
- Security (15%): Authentication, data protection
- Cost Efficiency (10%): Resource utilization
- Performance (15%): Latency, throughput
### UI/UX

- Usability (25%): Navigation, efficiency, error prevention
- Accessibility (20%): WCAG, screen readers, contrast
- Visual Design (15%): Consistency, hierarchy, typography
- User Flow (20%): Task completion, feedback
- Responsiveness (10%): Device adaptation
- Content (10%): Messaging clarity, help text
### Test Cases

- Coverage (25%): Code/branch/path coverage
- Edge Cases (20%): Boundaries, errors, null inputs
- Assertions (20%): Meaningful assertions, clarity
- Structure (20%): Organization, naming, isolation
- Maintainability (15%): Independence, data management
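The per-category percentages above suggest that the overall score is a weighted average of the category scores. That formula isn't stated in this README, so the sketch below is an assumption, shown with the code-review weights:

```typescript
// Hypothetical: combine category scores (0-10) into an overall score
// using the code-review weights listed above. The weight values are
// taken from this README; the averaging formula itself is assumed.
const codeReviewWeights: Record<string, number> = {
  correctness: 0.25,
  security: 0.2,
  performance: 0.15,
  maintainability: 0.2,
  best_practices: 0.2,
};

function weightedOverall(
  categories: Record<string, number>,
  weights: Record<string, number>
): number {
  let total = 0;
  for (const [name, weight] of Object.entries(weights)) {
    total += (categories[name] ?? 0) * weight;
  }
  // Round to one decimal place, matching the sample response.
  return Math.round(total * 10) / 10;
}

// Category scores from the sample response in this README:
const scores = {
  correctness: 8,
  security: 7,
  performance: 8,
  maintainability: 7,
  best_practices: 7,
};
console.log(weightedOverall(scores, codeReviewWeights)); // 7.4
```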
## License

MIT