Add Moonshot AI Integration

Add [Moonshot AI](https://platform.kimi.ai/) (Kimi) as a new AI provider integration to Puter. Moonshot AI offers the Kimi model family, featuring long context windows (up to 262K tokens), multimodal input (text, image, video), built-in reasoning/thinking mode, and tool calling with up to 128 functions. Their flagship model `kimi-k2.6` is a trillion-parameter model competitive with frontier models.

Moonshot AI provides an OpenAI-compatible API (`https://api.moonshot.ai/v1`), which simplifies integration significantly.


## Scope of Work

### 1. Backend - Chat Completion Provider

**Directory:** `src/backend/drivers/ai-chat/providers/moonshot/`

**Files to create:**

- `Moonshot.ts` - Main provider class extending the base chat provider
- `models.ts` - Model definitions with costs, context windows, and capabilities

**Provider class must implement:**

```typescript
interface IChatProvider {
    models(extra_params?: unknown): IChatModel[] | Promise<IChatModel[]>;
    list(): string[] | Promise<string[]>;
    getDefaultModel(): string;
    complete(arg: ICompleteArguments): Promise<IChatCompleteResult>;
}
```
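
The interface above can be satisfied by a fairly small class. The following is a minimal sketch, not the actual provider: the `ChatMessage`, `IChatModel`, `ICompleteArguments`, and `IChatCompleteResult` shapes are simplified stand-ins for the real Puter types, and `Transport` and `MoonshotProviderSketch` are hypothetical names. The transport is injected so the sketch runs without network access.

```typescript
// Hypothetical minimal shapes so the sketch compiles on its own; the real
// IChatModel, ICompleteArguments and IChatCompleteResult types will differ.
interface ChatMessage { role: string; content: string; }
interface IChatModel { id: string; name: string; aliases?: string[]; context: number; }
interface ICompleteArguments { model?: string; messages: ChatMessage[]; stream?: boolean; }
interface IChatCompleteResult { message: ChatMessage; usage: Record<string, number>; }

interface IChatProvider {
    models(extra_params?: unknown): IChatModel[] | Promise<IChatModel[]>;
    list(): string[] | Promise<string[]>;
    getDefaultModel(): string;
    complete(arg: ICompleteArguments): Promise<IChatCompleteResult>;
}

// Hypothetical transport: stands in for the HTTP/SDK call to Moonshot.
type Transport = (body: {
    model: string;
    messages: ChatMessage[];
    stream: boolean;
}) => Promise<IChatCompleteResult>;

const MOONSHOT_MODELS: IChatModel[] = [
    { id: 'kimi-k2.6', name: 'Kimi K2.6', aliases: ['kimi-k26', 'kimi'], context: 262144 },
    { id: 'kimi-k2.5', name: 'Kimi K2.5', aliases: ['kimi-k25'], context: 262144 },
];

class MoonshotProviderSketch implements IChatProvider {
    private readonly transport: Transport;

    constructor(transport: Transport) {
        this.transport = transport;
    }

    models(): IChatModel[] {
        return MOONSHOT_MODELS;
    }

    // Every canonical id followed by its aliases.
    list(): string[] {
        const names: string[] = [];
        for (const m of MOONSHOT_MODELS) {
            names.push(m.id, ...(m.aliases ?? []));
        }
        return names;
    }

    getDefaultModel(): string {
        return 'kimi-k2.6';
    }

    // Resolves an id or alias to the canonical id, then delegates to the transport.
    async complete(arg: ICompleteArguments): Promise<IChatCompleteResult> {
        const requested = arg.model ?? this.getDefaultModel();
        const model = MOONSHOT_MODELS.filter(
            m => m.id === requested || (m.aliases ?? []).indexOf(requested) !== -1,
        )[0];
        if (!model) throw new Error(`Unknown Moonshot model: ${requested}`);
        return this.transport({
            model: model.id,
            messages: arg.messages,
            stream: arg.stream ?? false,
        });
    }
}
```

Injecting the transport keeps alias resolution and default-model handling testable in isolation; the real class would presumably wrap the OpenAI SDK client instead.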

**Models to include:**

#### Flagship

| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
|-------|---------|--------------------|--------------------|--------|-------------|
| `kimi-k2.6` | 262,144 | $0.95/1M | $0.16/1M | $4.00/1M | Chat, Tool calling, JSON mode, Thinking mode, Partial mode |

#### Kimi K2.5

| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
|-------|---------|--------------------|--------------------|--------|-------------|
| `kimi-k2.5` | 262,144 | $0.60/1M | $0.10/1M | $3.00/1M | Chat, Vision (image + video), Tool calling, JSON mode, Thinking mode |

#### Kimi K2 Series (being discontinued May 25, 2026 - migrate to kimi-k2.6)

| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
|-------|---------|--------------------|--------------------|--------|-------------|
| `kimi-k2-0905-preview` | 262,144 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode |
| `kimi-k2-0711-preview` | 131,072 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode |
| `kimi-k2-turbo-preview` | 262,144 | $1.15/1M | $0.15/1M | $8.00/1M | Chat, Tool calling, JSON mode, Fast (60-100 tok/s) |
| `kimi-k2-thinking` | 262,144 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode, Reasoning |
| `kimi-k2-thinking-turbo` | 262,144 | $1.15/1M | $0.15/1M | $8.00/1M | Chat, Tool calling, JSON mode, Reasoning, Fast |

#### Moonshot V1 (Legacy)

| Model | Context | Input | Output | Capabilities |
|-------|---------|-------|--------|-------------|
| `moonshot-v1-8k` | 8,192 | $0.20/1M | $2.00/1M | Chat, Tool calling |
| `moonshot-v1-32k` | 32,768 | $1.00/1M | $3.00/1M | Chat, Tool calling |
| `moonshot-v1-128k` | 131,072 | $2.00/1M | $5.00/1M | Chat, Tool calling |
| `moonshot-v1-auto` | Auto | Auto | Auto | Chat, Tool calling (auto-selects context) |
| `moonshot-v1-8k-vision-preview` | 8,192 | $0.20/1M | $2.00/1M | Chat, Vision, Tool calling |
| `moonshot-v1-32k-vision-preview` | 32,768 | $1.00/1M | $3.00/1M | Chat, Vision, Tool calling |
| `moonshot-v1-128k-vision-preview` | 131,072 | $2.00/1M | $5.00/1M | Chat, Vision, Tool calling |

> **Note:** The default model should be `kimi-k2.6` as it is the current flagship. The kimi-k2 series is scheduled for discontinuation on May 25, 2026. Consider whether to include K2 models at all given the timeline.

**Implementation notes:**
- Use the OpenAI SDK with custom `baseURL`: `https://api.moonshot.ai/v1` (similar to xAI and DeepSeek providers)
- Authentication: Bearer token via `Authorization` header
- Support streaming (`stream: true`) and non-streaming responses
- Tool/function calling is supported on all models (up to 128 tools, with optional `strict` parameter)
- `response_format` supports `text`, `json_object`, and `json_schema`
- kimi-k2.6 and kimi-k2.5 support a `thinking` parameter: `{ type: "enabled" | "disabled" }` - consider exposing this
- kimi-k2.5 supports multimodal input: text, `image_url`, and `video_url` content types
- Image/video can be passed as base64 data URIs or Moonshot file references (`ms://<file_id>`)
- Prompt caching is automatic; usage response includes `cached_tokens` field
- `finish_reason` values: `stop`, `length`, `tool_calls`
- The `n` parameter supports up to 5 completions (but only 1 when temperature is near 0)
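
Several of the notes above are request-side limits that can be enforced before anything is sent. The sketch below builds a plain description of the HTTP request rather than using the OpenAI SDK, so it runs without dependencies. `buildChatRequest`, `ChatOptions`, and `HttpRequest` are hypothetical names. The `LOW_TEMPERATURE_CUTOFF` value is an assumption, since the notes only say "near 0". Clamping `n` instead of rejecting the request is one possible design choice.

```typescript
const MOONSHOT_BASE_URL = 'https://api.moonshot.ai/v1';
const MAX_TOOLS = 128; // limit stated in the implementation notes
const MAX_N = 5;       // limit stated in the implementation notes
// Assumption: the notes only say "near 0"; this cutoff is a placeholder.
const LOW_TEMPERATURE_CUTOFF = 0.3;

interface ChatOptions {
    model: string;
    messages: { role: string; content: string }[];
    stream?: boolean;
    temperature?: number;
    n?: number;
    tools?: { type: 'function'; function: { name: string; strict?: boolean } }[];
    response_format?: { type: 'text' | 'json_object' | 'json_schema' };
}

interface HttpRequest {
    url: string;
    method: 'POST';
    headers: Record<string, string>;
    body: string;
}

function buildChatRequest(apiKey: string, opts: ChatOptions): HttpRequest {
    const toolCount = opts.tools ? opts.tools.length : 0;
    if (toolCount > MAX_TOOLS) {
        throw new Error(`Moonshot accepts at most ${MAX_TOOLS} tools, got ${toolCount}`);
    }

    // Clamp n into [1, 5], then force 1 when sampling is close to deterministic.
    let n = Math.min(Math.max(opts.n ?? 1, 1), MAX_N);
    if (opts.temperature !== undefined && opts.temperature < LOW_TEMPERATURE_CUTOFF) {
        n = 1;
    }

    const payload = { ...opts, stream: opts.stream ?? false, n };
    return {
        url: `${MOONSHOT_BASE_URL}/chat/completions`,
        method: 'POST',
        headers: {
            Authorization: `Bearer ${apiKey}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify(payload),
    };
}
```

With the OpenAI SDK, the same base URL and key would be passed to the client constructor and the SDK would produce the equivalent request.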

**Model definition example (kimi-k2.6):**
```typescript
{
    puterId: 'moonshot:moonshot/kimi-k2.6',
    id: 'kimi-k2.6',
    name: 'Kimi K2.6',
    aliases: ['kimi-k26', 'kimi'],
    modalities: { input: ['text'], output: ['text'] },
    costs_currency: 'usd-cents',
    input_cost_key: 'prompt_tokens',
    output_cost_key: 'completion_tokens',
    costs: {
        tokens: 1_000_000,
        prompt_tokens: 95,       // $0.95 per 1M = 95 cents per 1M
        completion_tokens: 400,  // $4.00 per 1M = 400 cents per 1M
        cached_tokens: 16,       // $0.16 per 1M = 16 cents per 1M
    },
    context: 262144,
    max_tokens: 262144,
    tool_call: true,
    knowledge: '2025-01',
}
```

**Model definition example (kimi-k2.5 with vision):**
```typescript
{
    puterId: 'moonshot:moonshot/kimi-k2.5',
    id: 'kimi-k2.5',
    name: 'Kimi K2.5',
    aliases: ['kimi-k25'],
    modalities: { input: ['text', 'image', 'video'], output: ['text'] },
    costs_currency: 'usd-cents',
    input_cost_key: 'prompt_tokens',
    output_cost_key: 'completion_tokens',
    costs: {
        tokens: 1_000_000,
        prompt_tokens: 60,       // $0.60 per 1M
        completion_tokens: 300,  // $3.00 per 1M
        cached_tokens: 10,       // $0.10 per 1M
    },
    context: 262144,
    max_tokens: 262144,
    tool_call: true,
    knowledge: '2025-01',
}
```
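
The `costs` blocks in both examples follow the same conversion from the pricing tables: dollars per 1M tokens times 100 gives cents per 1M tokens. A small helper, sketched here with the hypothetical names `TokenCosts` and `toCostsInCents`, makes that conversion explicit and avoids hand-typed mistakes when filling in the remaining models.

```typescript
interface TokenCosts {
    tokens: number;            // denominator: costs apply per this many tokens
    prompt_tokens: number;     // cents per `tokens` uncached input tokens
    completion_tokens: number; // cents per `tokens` output tokens
    cached_tokens: number;     // cents per `tokens` cached input tokens
}

// Converts USD-per-1M-token list prices into a `usd-cents` costs block.
function toCostsInCents(inputUsd: number, cachedUsd: number, outputUsd: number): TokenCosts {
    // Math.round absorbs floating-point noise such as 0.95 * 100 = 94.99999999999999.
    const cents = (usd: number): number => Math.round(usd * 100);
    return {
        tokens: 1_000_000,
        prompt_tokens: cents(inputUsd),
        completion_tokens: cents(outputUsd),
        cached_tokens: cents(cachedUsd),
    };
}

// Prices taken from the tables above.
const kimiK26Costs = toCostsInCents(0.95, 0.16, 4.0);
const kimiK25Costs = toCostsInCents(0.6, 0.1, 3.0);
```

Both results match the hand-written `costs` blocks in the examples above.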

### 2. Backend - Provider Registration

**File to modify:** `src/backend/drivers/ai-chat/ChatCompletionDriver.ts`

In `#registerProviders()`, add:
```typescript
const moonshotKey = providers['moonshot']?.apiKey
    ?? providers['moonshot']?.secret_key
    ?? providers['moonshot']?.key;

if (moonshotKey) {
    this.#providers['moonshot'] = new MoonshotProvider(
        { apiKey: moonshotKey },
        m,
    );
}
```

### 3. puter.js Client Integration

**File to modify:** `src/puter-js/src/modules/AI.js`

Add Moonshot to the provider/driver alias normalization:
```javascript
// In the chat provider normalization
if (['moonshot', 'kimi', 'moonshot-ai'].includes(lower)) return 'moonshot';
```

**File to modify:** `src/puter-js/types/modules/ai.d.ts`

Add Moonshot types to the TypeScript definitions for chat completion options.

### 4. Configuration

Add configuration support in the config system:

```json
{
    "providers": {
        "moonshot": {
            "apiKey": "sk-..."
        }
    }
}
```

### 5. Cost Metering

**File to create:** `src/backend/drivers/ai-chat/providers/moonshot/costs.ts`

Implement `getReportedCosts()` following the standard pattern. Moonshot prices input tokens at two rates (cache hit and cache miss), so metering must use `cached_tokens` from the usage response to bill each portion at its own rate:

```typescript
// Example for kimi-k2.6
// Conversion: $1 per 1M tokens = 100 cents per 1M tokens = 100 microcents per token
{
    usageType: 'moonshot:kimi-k2.6:input-tokens',
    ucentsPerUnit: 95,          // $0.95/1M tokens = 95 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
},
{
    usageType: 'moonshot:kimi-k2.6:cached-tokens',
    ucentsPerUnit: 16,          // $0.16/1M tokens = 16 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
},
{
    usageType: 'moonshot:kimi-k2.6:output-tokens',
    ucentsPerUnit: 400,         // $4.00/1M tokens = 400 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
}
```
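
The split between cached and uncached input can be sketched as follows. This is an illustration under stated assumptions, not the actual `getReportedCosts()` implementation: it assumes `prompt_tokens` is the total input count and `cached_tokens` is a subset of it (the OpenAI-style convention, which should be verified against real Moonshot responses). `MoonshotUsage`, `MeteredLine`, and `meterKimiK26` are hypothetical names. Rates are derived from the dollar prices, where $X per 1M tokens works out to X × 100 microcents per token.

```typescript
interface MoonshotUsage {
    prompt_tokens: number;
    completion_tokens: number;
    cached_tokens?: number;
}

interface MeteredLine {
    usageType: string;
    units: number;  // number of tokens billed on this line
    ucents: number; // total microcents for this line
}

// $X per 1M tokens = (X * 100) cents per 1M tokens = (X * 100) microcents per token.
const ucentsPerToken = (usdPerMillion: number): number => Math.round(usdPerMillion * 100);

const KIMI_K26_RATES = {
    input: ucentsPerToken(0.95),
    cached: ucentsPerToken(0.16),
    output: ucentsPerToken(4.0),
};

function meterKimiK26(usage: MoonshotUsage): MeteredLine[] {
    // Assumption: cached_tokens is a subset of prompt_tokens; clamp defensively.
    const cached = Math.min(Math.max(usage.cached_tokens ?? 0, 0), usage.prompt_tokens);
    const uncached = usage.prompt_tokens - cached;

    const line = (kind: string, units: number, rate: number): MeteredLine => ({
        usageType: `moonshot:kimi-k2.6:${kind}`,
        units,
        ucents: units * rate,
    });

    return [
        line('input-tokens', uncached, KIMI_K26_RATES.input),
        line('cached-tokens', cached, KIMI_K26_RATES.cached),
        line('output-tokens', usage.completion_tokens, KIMI_K26_RATES.output),
    ];
}
```

For a response with 1,000 prompt tokens (400 of them cached) and 200 completion tokens, this bills 600 × 95 + 400 × 16 + 200 × 400 = 143,400 microcents, or roughly 0.14 cents.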

### 6. Moonshot-Specific Features

These features are part of the Moonshot API and should be supported in the integration:

- **Thinking mode** (`thinking` parameter on kimi-k2.6/k2.5): Enables chain-of-thought reasoning. Expose via an extra parameter in the chat options (e.g. `thinking: true`), similar to how other providers handle reasoning modes. The `thinking` object accepts `{ type: "enabled" | "disabled" }` and on kimi-k2.6 also supports `{ keep: "all" }` for preserved thinking output.
- **Prompt caching** (`prompt_cache_key`): Moonshot supports explicit cache keys for similar requests. The response `usage` object includes `cached_tokens`. Metering must correctly attribute cached vs uncached input tokens at their different rates.
- **Partial mode** (`partial` on assistant messages): Allows prefilling assistant responses. Expose this to support use cases like constrained generation and guided completions.
- **Web search**: Moonshot supports built-in internet search. Expose as an option (e.g. `web_search: true`) so users can ground responses in live data.
- **`moonshot-v1-auto`**: Auto-selects context window size based on input length. Include as a supported model.
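
One way to map these options onto the request body is sketched below. The `thinking: true` and `web_search: true` option names come from the list above; `prefill`, `PuterMoonshotOptions`, `MoonshotBody`, and `applyMoonshotFeatures` are hypothetical. Representing web search as a `builtin_function` tool named `$web_search` is an assumption about the Moonshot API that should be checked against the current documentation before relying on it.

```typescript
interface PuterMoonshotOptions {
    thinking?: boolean;
    web_search?: boolean;
    prefill?: string;          // hypothetical option name for partial mode
    prompt_cache_key?: string;
}

interface Message {
    role: 'system' | 'user' | 'assistant';
    content: string;
    partial?: boolean;
}

interface Tool {
    type: string;
    function: { name: string };
}

interface MoonshotBody {
    model: string;
    messages: Message[];
    tools?: Tool[];
    thinking?: { type: 'enabled' | 'disabled' };
    prompt_cache_key?: string;
}

// Returns a new body with Moonshot-specific fields applied; the input is not mutated.
function applyMoonshotFeatures(body: MoonshotBody, opts: PuterMoonshotOptions): MoonshotBody {
    const out: MoonshotBody = { ...body, messages: [...body.messages] };

    if (opts.thinking !== undefined) {
        out.thinking = { type: opts.thinking ? 'enabled' : 'disabled' };
    }
    if (opts.prompt_cache_key) {
        out.prompt_cache_key = opts.prompt_cache_key;
    }
    if (opts.prefill !== undefined) {
        // Partial mode: a trailing assistant message flagged with `partial`.
        out.messages.push({ role: 'assistant', content: opts.prefill, partial: true });
    }
    if (opts.web_search) {
        // Assumption: built-in search is requested through a builtin_function tool.
        out.tools = [
            ...(body.tools ?? []),
            { type: 'builtin_function', function: { name: '$web_search' } },
        ];
    }
    return out;
}
```

Keeping this mapping in one function means the provider's `complete()` only has to call it once before sending, and each feature can be unit-tested without a live API key.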

### 7. Documentation & Examples

- Add a usage example following existing patterns
- Add Moonshot to any provider listing documentation
- Example usage:

```javascript
// Basic chat completion with flagship model
const response = await puter.ai.chat('Hello from Kimi!', {
    provider: 'moonshot',
    model: 'kimi-k2.6'
});

// With streaming
const stream = await puter.ai.chat('Tell me a story', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    stream: true
});

// With tool calling (up to 128 tools supported)
const toolResponse = await puter.ai.chat('What is the weather?', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    tools: [{ type: 'function', function: { name: 'get_weather', description: '...', parameters: { ... } } }]
});

// Vision with kimi-k2.5 (supports image and video)
const visionResponse = await puter.ai.chat([
    { text: 'What do you see in this image?' },
    { image_url: 'data:image/png;base64,...' }
], {
    provider: 'moonshot',
    model: 'kimi-k2.5'
});

// JSON mode
const jsonResponse = await puter.ai.chat('List 3 colors as JSON', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    response_format: { type: 'json_object' }
});
```

---

## Implementation Checklist

- [ ] Create `src/backend/drivers/ai-chat/providers/moonshot/Moonshot.ts`
- [ ] Create `src/backend/drivers/ai-chat/providers/moonshot/models.ts`
- [ ] Create `src/backend/drivers/ai-chat/providers/moonshot/costs.ts`
- [ ] Register provider in `ChatCompletionDriver.ts` (`#registerProviders()`)
- [ ] Add provider to model map building in `ChatCompletionDriver.ts` (`#buildModelMap()`)
- [ ] Add Moonshot aliases in `src/puter-js/src/modules/AI.js`
- [ ] Update TypeScript types in `src/puter-js/types/modules/ai.d.ts`
- [ ] Add configuration documentation
- [ ] Add usage examples
- [ ] Test chat completion (streaming and non-streaming)
- [ ] Test tool/function calling
- [ ] Test vision input with kimi-k2.5 and moonshot-v1-*-vision-preview models
- [ ] Test JSON mode (`response_format`)
- [ ] Test thinking/reasoning mode via the `thinking` parameter on kimi-k2.6
- [ ] Verify cost metering (including cached token tracking)
- [ ] Test web search feature
- [ ] Test partial mode (assistant message prefilling)
- [ ] Regression: verify existing providers (OpenAI, Anthropic, etc.) still work after adding Moonshot to the driver registry and model map

Add Moonshot AI Integration #2891
