Add Moonshot AI Integration #2891

@jelveh

Description

Add Moonshot AI (Kimi) as a new AI provider integration to Puter. Moonshot AI offers the Kimi model family, featuring long context windows (up to 262K tokens), multimodal input (text, image, video), built-in reasoning/thinking mode, and tool calling with up to 128 functions. Their flagship model kimi-k2.6 is a trillion-parameter model competitive with frontier models.

Moonshot AI provides an OpenAI-compatible API (https://api.moonshot.ai/v1), which simplifies integration significantly.

Scope of Work

1. Backend - Chat Completion Provider

Directory: src/backend/drivers/ai-chat/providers/moonshot/

Files to create:

  • Moonshot.ts - Main provider class extending the base chat provider
  • models.ts - Model definitions with costs, context windows, and capabilities

Provider class must implement:

interface IChatProvider {
    models(extra_params?: unknown): IChatModel[] | Promise<IChatModel[]>;
    list(): string[] | Promise<string[]>;
    getDefaultModel(): string;
    complete(arg: ICompleteArguments): Promise<IChatCompleteResult>;
}
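
A minimal sketch of how a class could satisfy this interface. The types `ChatModel`, `CompleteArgs`, `CompleteResult` and the injected `send` function are simplified stand-ins invented for illustration, not Puter's actual definitions; the real provider would extend the base chat provider and call the Moonshot API.

```typescript
// Simplified stand-ins for Puter's real types (assumptions, for illustration only).
interface ChatModel { id: string; }
interface CompleteArgs { model?: string; messages: { role: string; content: string }[]; }
interface CompleteResult { message: { role: string; content: string }; }

// Hypothetical transport: the real provider would call the Moonshot API here.
type Send = (model: string, args: CompleteArgs) => Promise<CompleteResult>;

class MoonshotProviderSketch {
    private readonly modelDefs: ChatModel[];
    private readonly send: Send;

    constructor(modelDefs: ChatModel[], send: Send) {
        this.modelDefs = modelDefs;
        this.send = send;
    }

    models(): ChatModel[] {
        return this.modelDefs;
    }

    list(): string[] {
        // Return the canonical model ids in definition order.
        return this.modelDefs.map((m) => m.id);
    }

    getDefaultModel(): string {
        // The issue names kimi-k2.6 as the default.
        return 'kimi-k2.6';
    }

    complete(args: CompleteArgs): Promise<CompleteResult> {
        // Fall back to the default model when the caller does not pick one.
        return this.send(args.model ?? this.getDefaultModel(), args);
    }
}
```

Injecting the transport keeps the sketch testable without network access; the actual class may well be structured differently.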

Models to include:

Flagship

| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
| --- | --- | --- | --- | --- | --- |
| kimi-k2.6 | 262,144 | $0.95/1M | $0.16/1M | $4.00/1M | Chat, Tool calling, JSON mode, Thinking mode, Partial mode |

Kimi K2.5

| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
| --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | 262,144 | $0.60/1M | $0.10/1M | $3.00/1M | Chat, Vision (image + video), Tool calling, JSON mode, Thinking mode |

Kimi K2 Series (to be discontinued on May 25, 2026; migrate to kimi-k2.6)

| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
| --- | --- | --- | --- | --- | --- |
| kimi-k2-0905-preview | 262,144 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode |
| kimi-k2-0711-preview | 131,072 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode |
| kimi-k2-turbo-preview | 262,144 | $1.15/1M | $0.15/1M | $8.00/1M | Chat, Tool calling, JSON mode, Fast (60-100 tok/s) |
| kimi-k2-thinking | 262,144 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode, Reasoning |
| kimi-k2-thinking-turbo | 262,144 | $1.15/1M | $0.15/1M | $8.00/1M | Chat, Tool calling, JSON mode, Reasoning, Fast |

Moonshot V1 (Legacy)

| Model | Context | Input | Output | Capabilities |
| --- | --- | --- | --- | --- |
| moonshot-v1-8k | 8,192 | $0.20/1M | $2.00/1M | Chat, Tool calling |
| moonshot-v1-32k | 32,768 | $1.00/1M | $3.00/1M | Chat, Tool calling |
| moonshot-v1-128k | 131,072 | $2.00/1M | $5.00/1M | Chat, Tool calling |
| moonshot-v1-auto | Auto | Auto | Auto | Chat, Tool calling (auto-selects context) |
| moonshot-v1-8k-vision-preview | 8,192 | $0.20/1M | $2.00/1M | Chat, Vision, Tool calling |
| moonshot-v1-32k-vision-preview | 32,768 | $1.00/1M | $3.00/1M | Chat, Vision, Tool calling |
| moonshot-v1-128k-vision-preview | 131,072 | $2.00/1M | $5.00/1M | Chat, Vision, Tool calling |

Note: The default model should be kimi-k2.6 as it is the current flagship. The kimi-k2 series is scheduled for discontinuation on May 25, 2026. Consider whether to include K2 models at all given the timeline.
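
If the K2 models are included, one option is to tag them with a discontinuation date and filter them out once it passes. The sketch below assumes a hypothetical `sunset` field and treats the cutoff as midnight UTC; the issue gives only the day, so the exact moment is an assumption.

```typescript
interface ModelEntry {
    id: string;
    // Hypothetical field: when the model stops being served.
    sunset?: Date;
}

// May 25, 2026 at 00:00 UTC (months are 0-based in Date.UTC).
const K2_SUNSET = new Date(Date.UTC(2026, 4, 25));

const candidates: ModelEntry[] = [
    { id: 'kimi-k2.6' },
    { id: 'kimi-k2.5' },
    { id: 'kimi-k2-0905-preview', sunset: K2_SUNSET },
    { id: 'kimi-k2-thinking', sunset: K2_SUNSET },
];

// Keep only models that are still available at the given moment.
function availableModels(models: ModelEntry[], now: Date): ModelEntry[] {
    return models.filter(
        (m) => m.sunset === undefined || now.getTime() < m.sunset.getTime(),
    );
}
```

Calling `availableModels(candidates, new Date())` inside `models()` would drop the K2 entries automatically after the cutoff, without a follow-up code change.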

Implementation notes:

  • Use the OpenAI SDK with custom baseURL: https://api.moonshot.ai/v1 (similar to xAI and DeepSeek providers)
  • Authentication: Bearer token via Authorization header
  • Support streaming (stream: true) and non-streaming responses
  • Tool/function calling is supported on all models (up to 128 tools, with optional strict parameter)
  • response_format supports text, json_object, and json_schema
  • kimi-k2.6 supports a thinking parameter: { type: "enabled" | "disabled" } - consider exposing this
  • kimi-k2.5 supports multimodal input: text, image_url, and video_url content types
  • Image/video can be passed as base64 data URIs or Moonshot file references (ms://<file_id>)
  • Prompt caching is automatic; usage response includes cached_tokens field
  • finish_reason values: stop, length, tool_calls
  • The n parameter supports up to 5 completions (but only 1 when temperature is near 0)
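
To make the notes above concrete, here is a dependency-free sketch of the request the provider would end up sending. The issue proposes the OpenAI SDK; this builds the raw HTTP request instead so it can run on its own. The `/chat/completions` path follows the OpenAI-compatible convention, and `buildChatRequest` and its types are hypothetical names.

```typescript
const MOONSHOT_BASE_URL = 'https://api.moonshot.ai/v1';
const MAX_TOOLS = 128; // limit stated in the notes above

interface ChatMessage { role: 'system' | 'user' | 'assistant' | 'tool'; content: string; }
interface ToolDef {
    type: 'function';
    function: { name: string; description?: string; parameters?: object };
}
interface RequestOptions {
    model: string;
    messages: ChatMessage[];
    stream?: boolean;
    tools?: ToolDef[];
}
interface HttpRequest {
    url: string;
    method: 'POST';
    headers: Record<string, string>;
    body: string;
}

function buildChatRequest(apiKey: string, opts: RequestOptions): HttpRequest {
    // Reject oversized tool lists locally instead of waiting for an API error.
    if (opts.tools && opts.tools.length > MAX_TOOLS) {
        throw new Error(`Moonshot accepts at most ${MAX_TOOLS} tools, got ${opts.tools.length}`);
    }
    return {
        url: `${MOONSHOT_BASE_URL}/chat/completions`,
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${apiKey}`,
        },
        // Non-streaming unless the caller opts in.
        body: JSON.stringify({ ...opts, stream: opts.stream ?? false }),
    };
}
```

With the OpenAI SDK the same information would go into the client's `baseURL` and `apiKey` options, and the tool-count check could stay as a pre-flight guard.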

Model definition example (kimi-k2.6):

{
    puterId: 'moonshot:moonshot/kimi-k2.6',
    id: 'kimi-k2.6',
    name: 'Kimi K2.6',
    aliases: ['kimi-k26', 'kimi'],
    modalities: { input: ['text'], output: ['text'] },
    costs_currency: 'usd-cents',
    input_cost_key: 'prompt_tokens',
    output_cost_key: 'completion_tokens',
    costs: {
        tokens: 1_000_000,
        prompt_tokens: 95,       // $0.95 per 1M = 95 cents per 1M
        completion_tokens: 400,  // $4.00 per 1M = 400 cents per 1M
        cached_tokens: 16,       // $0.16 per 1M = 16 cents per 1M
    },
    context: 262144,
    max_tokens: 262144,
    tool_call: true,
    knowledge: '2025-01',
}

Model definition example (kimi-k2.5 with vision):

{
    puterId: 'moonshot:moonshot/kimi-k2.5',
    id: 'kimi-k2.5',
    name: 'Kimi K2.5',
    aliases: ['kimi-k25'],
    modalities: { input: ['text', 'image', 'video'], output: ['text'] },
    costs_currency: 'usd-cents',
    input_cost_key: 'prompt_tokens',
    output_cost_key: 'completion_tokens',
    costs: {
        tokens: 1_000_000,
        prompt_tokens: 60,       // $0.60 per 1M
        completion_tokens: 300,  // $3.00 per 1M
        cached_tokens: 10,       // $0.10 per 1M
    },
    context: 262144,
    max_tokens: 262144,
    tool_call: true,
    knowledge: '2025-01',
}
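
Since each definition carries `id`, `puterId`, and `aliases`, the provider needs some way to map a user-supplied name to a definition. A minimal sketch, assuming case-insensitive matching (not something the issue specifies) and a hypothetical `resolveModel` helper:

```typescript
interface ModelDef { id: string; puterId: string; aliases?: string[]; }

// Trimmed copies of the two example definitions above.
const MODELS: ModelDef[] = [
    { id: 'kimi-k2.6', puterId: 'moonshot:moonshot/kimi-k2.6', aliases: ['kimi-k26', 'kimi'] },
    { id: 'kimi-k2.5', puterId: 'moonshot:moonshot/kimi-k2.5', aliases: ['kimi-k25'] },
];

// Match on id, puterId, or any alias; returns undefined for unknown names.
function resolveModel(name: string, models: ModelDef[] = MODELS): ModelDef | undefined {
    const wanted = name.trim().toLowerCase();
    return models.find(
        (m) =>
            m.id === wanted ||
            m.puterId === wanted ||
            (m.aliases ?? []).indexOf(wanted) !== -1,
    );
}
```

For example, `resolveModel('kimi')` returns the kimi-k2.6 definition, so the bare alias tracks whichever model is listed with it.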

2. Backend - Provider Registration

File to modify: src/backend/drivers/ai-chat/ChatCompletionDriver.ts

In #registerProviders(), add:

const moonshotKey = providers['moonshot']?.apiKey
    ?? providers['moonshot']?.secret_key
    ?? providers['moonshot']?.key;

if (moonshotKey) {
    this.#providers['moonshot'] = new MoonshotProvider(
        { apiKey: moonshotKey },
        m,
    );
}

3. puter.js Client Integration

File to modify: src/puter-js/src/modules/AI.js

Add Moonshot to the provider/driver alias normalization:

// In the chat provider normalization
if (['moonshot', 'kimi', 'moonshot-ai'].includes(lower)) return 'moonshot';

File to modify: src/puter-js/types/modules/ai.d.ts

Add Moonshot types to the TypeScript definitions for chat completion options.

4. Configuration

Add configuration support in the config system:

{
    "providers": {
        "moonshot": {
            "apiKey": "sk-..."
        }
    }
}

5. Cost Metering

File to create: src/backend/drivers/ai-chat/providers/moonshot/costs.ts

Implement getReportedCosts() following the standard pattern. Note that Moonshot has two-tier input pricing (cache hit vs. cache miss), so metering should account for cached_tokens from the usage response:

// Example for kimi-k2.6
// Assuming ucents means microcents (1 cent = 1,000,000 microcents) and the unit is
// a single token, $X per 1M tokens works out to 100 * X microcents per token.
{
    usageType: 'moonshot:kimi-k2.6:input-tokens',
    ucentsPerUnit: 95,          // $0.95/1M tokens = 95 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
},
{
    usageType: 'moonshot:kimi-k2.6:cached-tokens',
    ucentsPerUnit: 16,          // $0.16/1M tokens = 16 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
},
{
    usageType: 'moonshot:kimi-k2.6:output-tokens',
    ucentsPerUnit: 400,         // $4.00/1M tokens = 400 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
}
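
The cached vs. uncached split can be sketched as below. It assumes `cached_tokens` is a subset of `prompt_tokens` (as in OpenAI-style usage objects; worth confirming against a real Moonshot response) and expresses rates in microcents per token, taking 1 cent = 1,000,000 microcents, so $0.95 per 1M tokens is 95 microcents per token. `costInMicrocents` is a hypothetical helper, not part of Puter.

```typescript
interface Usage {
    prompt_tokens: number;
    completion_tokens: number;
    cached_tokens?: number;
}

// kimi-k2.6 rates in microcents per token, derived from the listed prices:
// $0.95, $0.16 and $4.00 per 1M tokens.
const KIMI_K26_RATES = { input: 95, cached: 16, output: 400 };

function costInMicrocents(usage: Usage, rates = KIMI_K26_RATES): number {
    // Never bill more cached tokens than there were prompt tokens.
    const cached = Math.min(usage.cached_tokens ?? 0, usage.prompt_tokens);
    const uncached = usage.prompt_tokens - cached;
    return (
        uncached * rates.input +
        cached * rates.cached +
        usage.completion_tokens * rates.output
    );
}
```

As a sanity check, 1,000,000 uncached prompt tokens cost 95,000,000 microcents, which is 95 cents and matches the $0.95/1M list price.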

6. Moonshot-Specific Features

These features are part of the Moonshot API and should be supported in the integration:

  • Thinking mode (thinking parameter on kimi-k2.6/k2.5): Enables chain-of-thought reasoning. Expose via an extra parameter in the chat options (e.g. thinking: true), similar to how other providers handle reasoning modes. The thinking object accepts { type: "enabled" | "disabled" } and on kimi-k2.6 also supports { keep: "all" } for preserved thinking output.
  • Prompt caching (prompt_cache_key): Moonshot supports explicit cache keys for similar requests. The response usage object includes cached_tokens. Metering must correctly attribute cached vs uncached input tokens at their different rates.
  • Partial mode (partial on assistant messages): Allows prefilling assistant responses. Expose this to support use cases like constrained generation and guided completions.
  • Web search: Moonshot supports built-in internet search. Expose as an option (e.g. web_search: true) so users can ground responses in live data.
  • moonshot-v1-auto: Auto-selects context window size based on input length. Include as a supported model.
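
A sketch of how two of these options might be translated into the Moonshot request. `PuterChatOptions`, `prefill`, and `applyMoonshotOptions` are hypothetical names chosen for illustration; only the `thinking` object shape and the `partial` flag on assistant messages come from the description above.

```typescript
interface Message {
    role: 'system' | 'user' | 'assistant';
    content: string;
    partial?: boolean;
}

// Hypothetical Puter-side options; the names are illustrative, not an existing API.
interface PuterChatOptions {
    thinking?: boolean;
    prefill?: string;
}

interface MoonshotFields {
    thinking?: { type: 'enabled' | 'disabled' };
    messages: Message[];
}

function applyMoonshotOptions(messages: Message[], opts: PuterChatOptions): MoonshotFields {
    // Copy the array so the caller's messages are not mutated.
    const out: MoonshotFields = { messages: messages.slice() };
    if (opts.thinking !== undefined) {
        out.thinking = { type: opts.thinking ? 'enabled' : 'disabled' };
    }
    if (opts.prefill !== undefined) {
        // Partial mode: a trailing assistant message flagged as partial,
        // which the model is expected to continue.
        out.messages.push({ role: 'assistant', content: opts.prefill, partial: true });
    }
    return out;
}
```

Leaving `thinking` unset when the caller does not specify it keeps the model's default behaviour, which seems safer than forcing `disabled`.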

7. Documentation & Examples

  • Add a usage example following existing patterns
  • Add Moonshot to any provider listing documentation
  • Example usage:
// Basic chat completion with flagship model
const response = await puter.ai.chat('Hello from Kimi!', {
    provider: 'moonshot',
    model: 'kimi-k2.6'
});

// With streaming
const stream = await puter.ai.chat('Tell me a story', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    stream: true
});

// With tool calling (up to 128 tools supported)
const toolResponse = await puter.ai.chat('What is the weather?', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    tools: [{ type: 'function', function: { name: 'get_weather', description: '...', parameters: { ... } } }]
});

// Vision with kimi-k2.5 (supports image and video)
const visionResponse = await puter.ai.chat([
    { text: 'What do you see in this image?' },
    { image_url: 'data:image/png;base64,...' }
], {
    provider: 'moonshot',
    model: 'kimi-k2.5'
});

// JSON mode
const jsonResponse = await puter.ai.chat('List 3 colors as JSON', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    response_format: { type: 'json_object' }
});
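
For the streaming case, the caller still has to consume the result. A small sketch, assuming the streamed response is an async iterable whose parts may carry a `text` fragment (the shape is an assumption here and should be checked against the actual Moonshot driver output); `collectText` is a hypothetical helper:

```typescript
// Assumed chunk shape: each streamed part may carry a `text` fragment.
interface StreamPart { text?: string; }

// Concatenate all text fragments from a streamed response.
async function collectText(stream: AsyncIterable<StreamPart>): Promise<string> {
    let out = '';
    for await (const part of stream) {
        out += part.text ?? '';
    }
    return out;
}
```

With the `stream` value from the example above, usage would look like `const story = await collectText(stream);`, though most UIs would render each fragment as it arrives instead of buffering.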

Implementation Checklist

  • Create src/backend/drivers/ai-chat/providers/moonshot/Moonshot.ts
  • Create src/backend/drivers/ai-chat/providers/moonshot/models.ts
  • Create src/backend/drivers/ai-chat/providers/moonshot/costs.ts
  • Register provider in ChatCompletionDriver.ts (#registerProviders())
  • Add provider to model map building in ChatCompletionDriver.ts (#buildModelMap())
  • Add Moonshot aliases in src/puter-js/src/modules/AI.js
  • Update TypeScript types in src/puter-js/types/modules/ai.d.ts
  • Add configuration documentation
  • Add usage examples
  • Test chat completion (streaming and non-streaming)
  • Test tool/function calling
  • Test vision input with kimi-k2.5 and moonshot-v1-*-vision-preview models
  • Test JSON mode (response_format)
  • Test thinking/reasoning mode on kimi-k2.6
  • Verify cost metering (including cached token tracking)
  • Test web search feature
  • Test partial mode (assistant message prefilling)
  • Regression: verify existing providers (OpenAI, Anthropic, etc.) still work after adding Moonshot to the driver registry and model map
