Add Moonshot AI (Kimi) as a new AI provider integration to Puter. Moonshot AI offers the Kimi model family, featuring long context windows (up to 262K tokens), multimodal input (text, image, video), built-in reasoning/thinking mode, and tool calling with up to 128 functions. Their flagship model kimi-k2.6 is a trillion-parameter model competitive with frontier models.
Moonshot AI provides an OpenAI-compatible API (https://api.moonshot.ai/v1), which simplifies integration significantly.
Scope of Work
1. Backend - Chat Completion Provider
Directory: src/backend/drivers/ai-chat/providers/moonshot/
Files to create:
Moonshot.ts - Main provider class extending the base chat provider
models.ts - Model definitions with costs, context windows, and capabilities
Provider class must implement:
interface IChatProvider {
    models(extra_params?: unknown): IChatModel[] | Promise<IChatModel[]>;
    list(): string[] | Promise<string[]>;
    getDefaultModel(): string;
    complete(arg: ICompleteArguments): Promise<IChatCompleteResult>;
}
Models to include:
Flagship
| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
| --- | --- | --- | --- | --- | --- |
| kimi-k2.6 | 262,144 | $0.95/1M | $0.16/1M | $4.00/1M | Chat, Tool calling, JSON mode, Thinking mode, Partial mode |
Kimi K2.5
| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
| --- | --- | --- | --- | --- | --- |
| kimi-k2.5 | 262,144 | $0.60/1M | $0.10/1M | $3.00/1M | Chat, Vision (image + video), Tool calling, JSON mode, Thinking mode |
Kimi K2 Series (being discontinued May 25, 2026 - migrate to kimi-k2.6)
| Model | Context | Input (cache miss) | Input (cache hit) | Output | Capabilities |
| --- | --- | --- | --- | --- | --- |
| kimi-k2-0905-preview | 262,144 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode |
| kimi-k2-0711-preview | 131,072 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode |
| kimi-k2-turbo-preview | 262,144 | $1.15/1M | $0.15/1M | $8.00/1M | Chat, Tool calling, JSON mode, Fast (60-100 tok/s) |
| kimi-k2-thinking | 262,144 | $0.60/1M | $0.15/1M | $2.50/1M | Chat, Tool calling, JSON mode, Reasoning |
| kimi-k2-thinking-turbo | 262,144 | $1.15/1M | $0.15/1M | $8.00/1M | Chat, Tool calling, JSON mode, Reasoning, Fast |
Moonshot V1 (Legacy)
| Model | Context | Input | Output | Capabilities |
| --- | --- | --- | --- | --- |
| moonshot-v1-8k | 8,192 | $0.20/1M | $2.00/1M | Chat, Tool calling |
| moonshot-v1-32k | 32,768 | $1.00/1M | $3.00/1M | Chat, Tool calling |
| moonshot-v1-128k | 131,072 | $2.00/1M | $5.00/1M | Chat, Tool calling |
| moonshot-v1-auto | Auto | Auto | Auto | Chat, Tool calling (auto-selects context) |
| moonshot-v1-8k-vision-preview | 8,192 | $0.20/1M | $2.00/1M | Chat, Vision, Tool calling |
| moonshot-v1-32k-vision-preview | 32,768 | $1.00/1M | $3.00/1M | Chat, Vision, Tool calling |
| moonshot-v1-128k-vision-preview | 131,072 | $2.00/1M | $5.00/1M | Chat, Vision, Tool calling |
Note: The default model should be kimi-k2.6 as it is the current flagship. The kimi-k2 series is scheduled for discontinuation on May 25, 2026. Consider whether to include K2 models at all given the timeline.
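One way to handle the K2 timeline without dropping the models up front is to tag each entry with its cutoff date and filter at listing time. The sketch below is only an illustration: the `discontinuedOn` field, the `MODEL_SCHEDULE` list, and `availableModels` are hypothetical names, not part of Puter's existing model schema.

```typescript
// Hypothetical shape: `discontinuedOn` is an ISO date and is NOT an existing Puter field.
interface ScheduledModel {
    id: string;
    discontinuedOn?: string;
}

// A small subset of the models listed above, for illustration only.
const MODEL_SCHEDULE: ScheduledModel[] = [
    { id: "kimi-k2.6" },
    { id: "kimi-k2.5" },
    { id: "kimi-k2-0905-preview", discontinuedOn: "2026-05-25" },
    { id: "kimi-k2-thinking", discontinuedOn: "2026-05-25" },
];

// Keep models that have no cutoff, or whose cutoff is still in the future.
function availableModels(models: ScheduledModel[], now: Date): ScheduledModel[] {
    return models.filter(
        (m) => !m.discontinuedOn || now.getTime() < Date.parse(m.discontinuedOn),
    );
}
```

With this approach the K2 entries would disappear from the listing on the cutoff date without a code change, at the cost of carrying soon-to-be-dead definitions in models.ts.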
Implementation notes:
- Use the OpenAI SDK with custom baseURL: https://api.moonshot.ai/v1 (similar to xAI and DeepSeek providers)
- Authentication: Bearer token via Authorization header
- Support streaming (stream: true) and non-streaming responses
- Tool/function calling is supported on all models (up to 128 tools, with optional strict parameter)
- response_format supports text, json_object, and json_schema
- kimi-k2.6 supports a thinking parameter: { type: "enabled" | "disabled" } - consider exposing this
- kimi-k2.5 supports multimodal input: text, image_url, and video_url content types
- Image/video can be passed as base64 data URIs or Moonshot file references (ms://<file_id>)
- Prompt caching is automatic; usage response includes cached_tokens field
- finish_reason values: stop, length, tool_calls
- The n parameter supports up to 5 completions (but only 1 when temperature is near 0)
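The limits in the notes above (128 tools, at most 5 completions) could be enforced before the request reaches the SDK. The following is a minimal sketch under that assumption; `MoonshotChatOptions` and `buildRequestBody` are hypothetical names, and the returned object is simply an OpenAI-style request body rather than anything taken from Puter's code.

```typescript
// Hypothetical option and helper names; only the limits come from the notes above.
interface ToolDefinition {
    type: "function";
    function: { name: string; description?: string; parameters?: object; strict?: boolean };
}

interface MoonshotChatOptions {
    model: string;
    messages: { role: string; content: unknown }[];
    stream?: boolean;
    tools?: ToolDefinition[];
    n?: number;
    response_format?: { type: "text" | "json_object" | "json_schema"; [key: string]: unknown };
}

const MAX_TOOLS = 128;
const MAX_COMPLETIONS = 5;

// Validate the documented limits and assemble an OpenAI-compatible request body.
function buildRequestBody(options: MoonshotChatOptions): Record<string, unknown> {
    const tools = options.tools ?? [];
    if (tools.length > MAX_TOOLS) {
        throw new Error(`at most ${MAX_TOOLS} tools are supported, got ${tools.length}`);
    }
    const n = options.n ?? 1;
    if (!Number.isInteger(n) || n < 1 || n > MAX_COMPLETIONS) {
        throw new Error(`n must be an integer between 1 and ${MAX_COMPLETIONS}`);
    }

    const body: Record<string, unknown> = {
        model: options.model,
        messages: options.messages,
        stream: options.stream ?? false,
    };
    // Optional fields are only sent when the caller actually set them.
    if (tools.length > 0) body.tools = tools;
    if (n > 1) body.n = n;
    if (options.response_format) body.response_format = options.response_format;
    return body;
}
```

Failing early with a clear message is a design choice here; the provider could equally pass the values through and surface Moonshot's own error.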
Model definition example (kimi-k2.6):
{
    puterId: 'moonshot:moonshot/kimi-k2.6',
    id: 'kimi-k2.6',
    name: 'Kimi K2.6',
    aliases: ['kimi-k26', 'kimi'],
    modalities: { input: ['text'], output: ['text'] },
    costs_currency: 'usd-cents',
    input_cost_key: 'prompt_tokens',
    output_cost_key: 'completion_tokens',
    costs: {
        tokens: 1_000_000,
        prompt_tokens: 95, // $0.95 per 1M = 95 cents per 1M
        completion_tokens: 400, // $4.00 per 1M = 400 cents per 1M
        cached_tokens: 16, // $0.16 per 1M = 16 cents per 1M
    },
    context: 262144,
    max_tokens: 262144,
    tool_call: true,
    knowledge: '2025-01',
}
Model definition example (kimi-k2.5 with vision):
{
    puterId: 'moonshot:moonshot/kimi-k2.5',
    id: 'kimi-k2.5',
    name: 'Kimi K2.5',
    aliases: ['kimi-k25'],
    modalities: { input: ['text', 'image', 'video'], output: ['text'] },
    costs_currency: 'usd-cents',
    input_cost_key: 'prompt_tokens',
    output_cost_key: 'completion_tokens',
    costs: {
        tokens: 1_000_000,
        prompt_tokens: 60, // $0.60 per 1M
        completion_tokens: 300, // $3.00 per 1M
        cached_tokens: 10, // $0.10 per 1M
    },
    context: 262144,
    max_tokens: 262144,
    tool_call: true,
    knowledge: '2025-01',
}
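Since each definition carries an id, a puterId, and a list of aliases, the provider needs some way to resolve any of the three to the same entry. A possible lookup is sketched below; `KIMI_MODELS` and `findModel` are illustrative names and the real resolution may live elsewhere in Puter (for example in the driver's model map).

```typescript
// Trimmed-down version of the model definitions above; only the lookup fields are kept.
interface ModelLookupEntry {
    puterId: string;
    id: string;
    aliases: string[];
}

const KIMI_MODELS: ModelLookupEntry[] = [
    { puterId: "moonshot:moonshot/kimi-k2.6", id: "kimi-k2.6", aliases: ["kimi-k26", "kimi"] },
    { puterId: "moonshot:moonshot/kimi-k2.5", id: "kimi-k2.5", aliases: ["kimi-k25"] },
];

// Resolve a user-supplied name against id, puterId, and aliases (case-insensitive).
function findModel(
    name: string,
    models: ModelLookupEntry[] = KIMI_MODELS,
): ModelLookupEntry | undefined {
    const wanted = name.trim().toLowerCase();
    return models.find(
        (m) => m.id === wanted || m.puterId === wanted || m.aliases.includes(wanted),
    );
}
```

Note that the short alias kimi resolves to kimi-k2.6 in this sketch, so it would need to be moved whenever the flagship changes.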
2. Backend - Provider Registration
File to modify: src/backend/drivers/ai-chat/ChatCompletionDriver.ts
In #registerProviders(), add:
const moonshotKey = providers['moonshot']?.apiKey
    ?? providers['moonshot']?.secret_key
    ?? providers['moonshot']?.key;
if (moonshotKey) {
    this.#providers['moonshot'] = new MoonshotProvider(
        { apiKey: moonshotKey },
        m,
    );
}
3. puter.js Client Integration
File to modify: src/puter-js/src/modules/AI.js
Add Moonshot to the provider/driver alias normalization:
// In the chat provider normalization
if (['moonshot', 'kimi', 'moonshot-ai'].includes(lower)) return 'moonshot';
File to modify: src/puter-js/types/modules/ai.d.ts
Add Moonshot types to the TypeScript definitions for chat completion options.
4. Configuration
Add configuration support in the config system:
{
    "providers": {
        "moonshot": {
            "apiKey": "sk-..."
        }
    }
}
5. Cost Metering
File to create: src/backend/drivers/ai-chat/providers/moonshot/costs.ts
Implement getReportedCosts() following the standard pattern. Note that Moonshot has two-tier input pricing (cache hit vs cache miss), so metering should account for cached_tokens from the usage response:
// Example for kimi-k2.6 (1 cent = 1,000,000 microcents)
{
    usageType: 'moonshot:kimi-k2.6:input-tokens',
    ucentsPerUnit: 95, // $0.95 per 1M tokens = 95 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
},
{
    usageType: 'moonshot:kimi-k2.6:cached-tokens',
    ucentsPerUnit: 16, // $0.16 per 1M tokens = 16 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
},
{
    usageType: 'moonshot:kimi-k2.6:output-tokens',
    ucentsPerUnit: 400, // $4.00 per 1M tokens = 400 microcents per token
    unit: 'token',
    source: 'driver:aiChat/moonshot',
}
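To make the cached vs uncached split concrete, here is a rough sketch of how a usage response might be turned into metered line items. It rests on two assumptions that should be checked against Moonshot's API: that prompt_tokens includes the cached tokens, and that rates are expressed in microcents per token (1 cent = 1,000,000 microcents, so $0.95 per 1M tokens works out to 95 microcents per token). `meterUsage`, `TokenRates`, and `UsageLine` are hypothetical names.

```typescript
// Assumed usage shape: cached_tokens is a subset of prompt_tokens (not confirmed here).
interface MoonshotUsage {
    prompt_tokens: number;
    completion_tokens: number;
    cached_tokens?: number;
}

// Rates in microcents per token.
interface TokenRates {
    input: number;
    cached: number;
    output: number;
}

interface UsageLine {
    usageType: string;
    units: number;
    ucents: number;
}

// $X per 1M tokens = X * 100 cents per 1M tokens = X * 100 microcents per token.
function usdPerMillionToUcentsPerToken(usd: number): number {
    return Math.round(usd * 100);
}

const KIMI_K26_RATES: TokenRates = {
    input: usdPerMillionToUcentsPerToken(0.95),
    cached: usdPerMillionToUcentsPerToken(0.16),
    output: usdPerMillionToUcentsPerToken(4.0),
};

// Bill cached prompt tokens at the cache-hit rate and the remainder at the cache-miss rate.
function meterUsage(model: string, usage: MoonshotUsage, rates: TokenRates): UsageLine[] {
    const cached = Math.min(usage.cached_tokens ?? 0, usage.prompt_tokens);
    const uncached = usage.prompt_tokens - cached;
    return [
        { usageType: `moonshot:${model}:input-tokens`, units: uncached, ucents: uncached * rates.input },
        { usageType: `moonshot:${model}:cached-tokens`, units: cached, ucents: cached * rates.cached },
        {
            usageType: `moonshot:${model}:output-tokens`,
            units: usage.completion_tokens,
            ucents: usage.completion_tokens * rates.output,
        },
    ];
}
```

For a request with 1,000 prompt tokens (400 of them cached) and 200 completion tokens, this yields 57,000 + 6,400 + 80,000 = 143,400 microcents, or about 0.14 cents.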
6. Moonshot-Specific Features
These features are part of the Moonshot API and should be supported in the integration:
- Thinking mode (thinking parameter on kimi-k2.6/k2.5): Enables chain-of-thought reasoning. Expose via an extra parameter in the chat options (e.g. thinking: true), similar to how other providers handle reasoning modes. The thinking object accepts { type: "enabled" | "disabled" } and on kimi-k2.6 also supports { keep: "all" } for preserved thinking output.
- Prompt caching (prompt_cache_key): Moonshot supports explicit cache keys for similar requests. The response usage object includes cached_tokens. Metering must correctly attribute cached vs uncached input tokens at their different rates.
- Partial mode (partial on assistant messages): Allows prefilling assistant responses. Expose this to support use cases like constrained generation and guided completions.
- Web search: Moonshot supports built-in internet search. Expose as an option (e.g. web_search: true) so users can ground responses in live data.
- moonshot-v1-auto: Auto-selects context window size based on input length. Include as a supported model.
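Two of the features above come down to small translations between Puter's chat options and the Moonshot request. The sketch below shows one possible mapping: a boolean thinking option expanded into the { type } object, and a prefill string appended as an assistant message flagged partial. `normalizeThinking` and `withPrefill` are hypothetical helper names, and the exact option names Puter exposes remain an open design decision.

```typescript
type ThinkingParam = { type: "enabled" | "disabled" };

// Accept either the shorthand boolean or the full object and always emit the object form.
function normalizeThinking(value: boolean | ThinkingParam | undefined): ThinkingParam | undefined {
    if (value === undefined) return undefined;
    if (typeof value === "boolean") return { type: value ? "enabled" : "disabled" };
    return value;
}

interface PrefillMessage {
    role: "system" | "user" | "assistant";
    content: string;
    partial?: boolean;
}

// Append an assistant message marked partial so the model continues from the prefill text.
function withPrefill(messages: PrefillMessage[], prefill: string): PrefillMessage[] {
    return [...messages, { role: "assistant", content: prefill, partial: true }];
}
```

For example, withPrefill(messages, '{') would nudge the model to continue a JSON object, which is the constrained-generation use case mentioned above.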
7. Documentation & Examples
- Add a usage example following existing patterns
- Add Moonshot to any provider listing documentation
- Example usage:
// Basic chat completion with flagship model
const response = await puter.ai.chat('Hello from Kimi!', {
    provider: 'moonshot',
    model: 'kimi-k2.6'
});

// With streaming
const stream = await puter.ai.chat('Tell me a story', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    stream: true
});

// With tool calling (up to 128 tools supported)
const toolResponse = await puter.ai.chat('What is the weather?', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    tools: [{ type: 'function', function: { name: 'get_weather', description: '...', parameters: { /* ... */ } } }]
});

// Vision with kimi-k2.5 (supports image and video)
const visionResponse = await puter.ai.chat([
    { text: 'What do you see in this image?' },
    { image_url: 'data:image/png;base64,...' }
], {
    provider: 'moonshot',
    model: 'kimi-k2.5'
});

// JSON mode
const jsonResponse = await puter.ai.chat('List 3 colors as JSON', {
    provider: 'moonshot',
    model: 'kimi-k2.6',
    response_format: { type: 'json_object' }
});
Implementation Checklist
- src/backend/drivers/ai-chat/providers/moonshot/Moonshot.ts
- src/backend/drivers/ai-chat/providers/moonshot/models.ts
- src/backend/drivers/ai-chat/providers/moonshot/costs.ts
- ChatCompletionDriver.ts (#registerProviders())
- ChatCompletionDriver.ts (#buildModelMap())
- src/puter-js/src/modules/AI.js
- src/puter-js/types/modules/ai.d.ts
- response_format
- thinking parameter on kimi-k2.6