
Conversation

@therealnb
Collaborator

Some changes to align the docs and demo config.

  • Fixed proxyMode in mcpserver_github.yaml - Changed from sse to streamable-http
  • Fixed Helm chart default image repository - Changed from ghcr.io/stacklok/mcp-optimizer to ghcr.io/stackloklabs/mcp-optimizer in values.yaml (see the snippet below)
  • Fixed Helm chart appVersion - Changed from 0.1.0 to 0.1.1 in Chart.yaml
  • Updated documentation - Fixed all references to use the correct image repository path
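
For reference, a rough sketch of the corrected Helm values and chart metadata (the image.repository key name follows the usual Helm convention and is an assumption here; the actual files may differ slightly):

# helm/mcp-optimizer/values.yaml
image:
  repository: ghcr.io/stackloklabs/mcp-optimizer

# helm/mcp-optimizer/Chart.yaml
appVersion: "0.1.1"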

Signed-off-by: nigel brown <nigel@stacklok.com>
@claude

claude bot commented Nov 17, 2025

PR Review

Overall Assessment: Approved

This PR correctly aligns documentation and configuration with the actual repository organization (stackloklabs vs stacklok).

Positive Points

  • Consistency: All image references now correctly point to ghcr.io/stackloklabs/mcp-optimizer
  • Version Bump: appVersion updated to 0.1.1 appropriately
  • Config Fix: proxyMode changed from sse to streamable-http for GitHub MCP server example

Minor Observations

  1. Clarity: The proxyMode change (sse → streamable-http) in mcpserver_github.yaml is mentioned in the PR body but without rationale. Consider if this deserves a brief inline comment explaining why streamable-http is preferred over sse.

  2. Consistency Check: Verify that the 0.1.1 image actually exists at ghcr.io/stackloklabs/mcp-optimizer:0.1.1 before merging.
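
On observation 1, one way to capture the rationale inline (assuming proxyMode sits directly under spec, as in the example file):

# examples/mcp-servers/mcpserver_github.yaml (illustrative fragment)
spec:
  # SSE is deprecated in favor of streamable-http, so use the recommended proxy mode
  proxyMode: streamable-http

On observation 2, a quick existence check for the tag (assuming pull access and a recent Docker CLI; crane or skopeo would work equally well):

$ docker manifest inspect ghcr.io/stackloklabs/mcp-optimizer:0.1.1

If the manifest resolves, the tag exists; a "manifest unknown" or "denied" error means it does not (or is not public).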

Code Quality

  • ✅ No code logic changes
  • ✅ No security concerns
  • ✅ No performance impact
  • ✅ No breaking changes
  • ✅ Clear and minimal changes

Recommendation: Merge after confirming 0.1.1 image availability.

@therealnb therealnb requested a review from ptelang November 17, 2025 11:10
@therealnb
Collaborator Author

With these changes it is still not working. There was a change in how the toolhive-operator handles environment variables.
I checked the ToolHive git history for recent changes to environment variable handling; there was a significant architectural change on November 10, 2025:

The Change

Commit 2485c15e - "removes unused cli flags from proxyrunner" fundamentally changed how configuration is passed:

Before (CLI flags):

  • The operator would pass configuration via command-line flags like --env, --transport, etc.
  • Environment variables from podTemplateSpec would be converted to CLI arguments

After (File-based config):

  • The proxyrunner now looks for /etc/runconfig/runconfig.json
  • If found, it uses runWithFileBasedConfig() which bypasses all CLI flags
  • The file is supposed to contain the complete RunConfig

The Bug

Looking at the logs we saw:

{"level":"info","ts":1763374999.877413,"caller":"logger/logger.go:39","msg":"Successfully loaded configuration from /etc/runconfig/runconfig.json"}

The operator IS creating the runconfig.json file, BUT it's not properly including the environment variables from the podTemplateSpec.spec.containers[0].env in that file.

The operator needs to be updated to ensure that when it creates the /etc/runconfig/runconfig.json, it includes all the environment variables from the MCPServer spec.
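
One way to confirm this from the cluster, without assuming anything about the runconfig.json schema beyond the ConfigMap naming pattern seen here, is to grep the rendered config for the expected variables:

$ kubectl get configmap mcp-optimizer-runconfig \
    -o jsonpath='{.data.runconfig\.json}' | jq . | grep -iE 'RUNTIME_MODE|DB_URL'

No matches means the env entries never made it from podTemplateSpec into the generated file.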

@therealnb
Collaborator Author

Reported stacklok/toolhive#2606 and stacklok/toolhive#2605.

I don't think we can make this work for now.

@therealnb
Collaborator Author

Some background:

Kubernetes Deployment Issues - Investigation Summary

Date: November 17, 2025
Status: ⚠️ Partially Working - MCP Optimizer deployment blocked by operator bugs

Overview

This document summarizes the investigation into the accuracy of the kubernetes-integration.md documentation and the deployment issues discovered while testing in Rancher Desktop.

Documentation Issues Found & Fixed

1. ✅ FIXED: Wrong Container Registry Path

  • Issue: Documentation referenced ghcr.io/stacklok/mcp-optimizer
  • Correct: ghcr.io/stackloklabs/mcp-optimizer (note: "stackloklabs")
  • Impact: Image pull failed with "denied" error
  • Files Updated:
    • docs/kubernetes-integration.md - All registry references
    • helm/mcp-optimizer/values.yaml - Default image repository
    • helm/mcp-optimizer/Chart.yaml - appVersion updated to 0.1.1

2. ✅ FIXED: GitHub Server ProxyMode Mismatch

  • Issue: Example showed proxyMode: sse but docs recommended streamable-http
  • Fixed: examples/mcp-servers/mcpserver_github.yaml updated to use streamable-http
  • Note: SSE is deprecated; streamable-http is recommended

3. ✅ FIXED: Helm Chart Using PodTemplateSpec (Incompatible with Current Operator)

  • Issue: Default values.yaml used podTemplateSpec for env vars and volumes
  • Root Cause: ToolHive operator refactoring on Nov 10, 2025 (commit 2485c15e)
  • Fixed: Updated Helm chart to use top-level spec.env and spec.volumes
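
A minimal sketch of the reworked values for item 3, assuming the chart passes these fields through verbatim to the MCPServer spec (whether they sit at the top level of values.yaml or under a spec: key depends on the chart layout); names and values are taken from the workaround and evidence shown below:

env:
  - name: RUNTIME_MODE
    value: "k8s"
  - name: DB_URL
    value: "sqlite:///tmp/mcp_optimizer.db"
volumes:
  - name: tmp
    hostPath: ""
    mountPath: /tmp
    readOnly: false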

ToolHive Operator Bugs Discovered

Bug #1: PodTemplateSpec Environment Variables Not Transferred ⚠️

Introduced: November 10, 2025 (commit 2485c15e)
Severity: HIGH - Breaks existing deployments using podTemplateSpec

Description:

  • Operator switched from CLI flags to file-based config (runconfig.json)
  • Function createRunConfigFromMCPServer() only reads m.Spec.Env
  • Environment variables in m.Spec.PodTemplateSpec.spec.containers[0].env are IGNORED

Location: cmd/thv-operator/controllers/mcpserver_runconfig.go:86

envVars := convertEnvVarsFromMCPServer(m.Spec.Env)  // Only reads top-level Env!

Workaround: Use top-level spec.env instead of spec.podTemplateSpec

Timeline:

  • Before Nov 10, 2025: Operator passed config via CLI flags ✅ WORKED
  • After Nov 10, 2025: Operator creates runconfig.json from top-level fields only ❌ BROKEN

Bug #2: Volumes Not Applied to Pods 🛑

Severity: CRITICAL - Blocks MCP Optimizer deployment

Description:

  • spec.volumes field is accepted in MCPServer CRD
  • Operator reads volumes but does NOT:
    • Add them to runconfig.json
    • Mount them in the actual pod spec
    • Create volume definitions in StatefulSet/Deployment

Impact:

  • MCP Optimizer requires writable /tmp for SQLite database
  • Without volume mounts, container crashes: "unable to open database file"
  • NO WORKAROUND AVAILABLE

Evidence:

# Volumes in MCPServer spec:
$ kubectl get mcpserver mcp-optimizer -o jsonpath='{.spec.volumes}'
[{"hostPath":"","mountPath":"/tmp","name":"tmp","readOnly":false}]

# But NOT in runconfig.json:
$ kubectl get configmap mcp-optimizer-runconfig -o jsonpath='{.data.runconfig\.json}' | jq '.volumes'
null

# And NOT in pod spec:
$ kubectl get pod mcp-optimizer-0 -o jsonpath='{.spec.volumes}'
[only serviceaccount volumes, no user-defined volumes]

What Used to Work (Before Nov 10, 2025)

The original configuration in helm/mcp-optimizer/values.yaml (commit 48921bc):

spec:
  podTemplateSpec:
    spec:
      volumes:
        - name: data
          emptyDir: {}
        - name: tmp
          emptyDir: {}
      containers:
        - name: mcp
          volumeMounts:
            - name: data
              mountPath: /data
            - name: tmp
              mountPath: /tmp
          env:
            - name: DB_URL
              value: "sqlite:///data/mcp_optimizer.db"

This worked because the operator:

  1. Read podTemplateSpec
  2. Converted it to CLI flags for thv-proxyrunner
  3. Proxyrunner created pods with proper volumes and env vars

Current Workarounds Applied

For Environment Variables: ✅ Working

Use top-level spec.env instead of spec.podTemplateSpec:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
spec:
  env:
    - name: RUNTIME_MODE
      value: "k8s"
    - name: DB_URL
      value: "sqlite:///tmp/mcp_optimizer.db"

For Volumes: ❌ Not Working

No workaround available. Volumes defined in spec.volumes are not applied to pods.
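
To confirm you are hitting this bug rather than something else, the crash string and the missing volumes are both visible from the cluster (pod name as in the evidence above):

# Add --previous if the container has just restarted
$ kubectl logs mcp-optimizer-0 | grep -i "unable to open database file"
$ kubectl get pod mcp-optimizer-0 -o jsonpath='{.spec.volumes}'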

Files Updated

  1. Documentation:

    • docs/kubernetes-integration.md - Fixed all registry paths
  2. Helm Chart:

    • helm/mcp-optimizer/values.yaml - Changed to use top-level env, removed podTemplateSpec, added volumes
    • helm/mcp-optimizer/Chart.yaml - Updated appVersion to 0.1.1
    • helm/mcp-optimizer/templates/mcpserver.yaml - Added volume template support (sketched after this list)
  3. Examples:

    • examples/mcp-servers/mcpserver_github.yaml - Fixed proxyMode, added comments
    • examples/mcp-servers/mcpserver_fetch.yaml - Added best practice comments
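
A rough sketch of the volume template support added to templates/mcpserver.yaml (item 2 above); the actual template may differ, but the idea is simply to render .Values.volumes into the MCPServer spec:

spec:
  {{- with .Values.volumes }}
  volumes:
    {{- toYaml . | nindent 4 }}
  {{- end }}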

Current Deployment Status

  • ToolHive Operator: ✅ Running (version 0.2.22)
  • Fetch MCP Server: ✅ Running (working correctly)
  • GitHub MCP Server: ✅ Running (working correctly after the proxyMode fix)
  • MCP Optimizer: ❌ CrashLoopBackOff (blocked by the volume bug)

Recommendations

For ToolHive Team

  1. Fix Volume Handling (Critical):

    • Update createRunConfigFromMCPServer() to include volumes in runconfig.json
    • OR apply volumes directly to pod spec (bypassing runconfig.json)
    • Test with volume configurations
  2. Consider PodTemplateSpec Support:

    • Either: Support podTemplateSpec in runconfig.json creation
    • Or: Document that podTemplateSpec is no longer supported
    • Or: Apply podTemplateSpec directly to pods (bypass runconfig.json for these fields)
  3. Add Migration Guide:

    • Document the Nov 10, 2025 breaking changes
    • Provide migration path from podTemplateSpec to top-level fields
    • Update all examples and documentation

For MCP Optimizer Users

Current Status: Cannot deploy MCP Optimizer successfully in Kubernetes due to volume mounting bug.

Temporary Options:

  1. Wait for ToolHive operator fix
  2. Use local/Docker deployment instead of Kubernetes (a minimal sketch follows this list)
  3. Build custom operator version with volume fix
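
For option 2, a minimal local-run sketch: the listening port is a placeholder assumption (check the image documentation for the real one), and DB_URL reuses the value from the Kubernetes workaround. Locally the container gets a writable /tmp by default, which is exactly what the volume bug prevents in Kubernetes:

$ docker run --rm \
    -e DB_URL="sqlite:///tmp/mcp_optimizer.db" \
    -p 8080:8080 \
    ghcr.io/stackloklabs/mcp-optimizer:0.1.1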

Test Environment

  • Kubernetes: Rancher Desktop
  • ToolHive Operator Version: 0.2.22
  • ToolHive Operator CRDs: 0.0.34
  • MCP Optimizer Image: ghcr.io/stackloklabs/mcp-optimizer:0.1.1
  • Test Date: November 17, 2025

References

  • ToolHive Operator Commit: 2485c15e - "removes unused cli flags from proxyrunner"
  • MCP Optimizer Initial Commit: 48921bc - Original working configuration
  • Operator Source: /Users/nigel/code/toolhive/cmd/thv-operator/controllers/mcpserver_runconfig.go

@ptelang ptelang merged commit 9f61c4a into main Nov 17, 2025
6 checks passed
@ptelang ptelang deleted the k8s-config-doc-tweaks branch November 17, 2025 14:12