A provider-agnostic Swift API for chat, embeddings, image generation, agents, and retrieval. A single protocol surface fronts on-device Apple Intelligence, OpenAI, Anthropic, Gemini, Mistral, local Ollama, and Cohere.
It's built for client- and server-side Swift alike: drop it into an iOS/macOS app — including fully on-device inference with Apple Intelligence, no API key required — or run it inside a server (Vapor, Hummingbird) or CLI on Linux. The same protocols, agents, and retrieval stack work in both, so a model you prototype in an app moves to a backend unchanged.
- Platforms: macOS 15+, iOS 18+, tvOS 18+, watchOS 11+, Linux, Android, Wasm
- Swift: 6.1+ toolchain, language mode 6
- License: MIT
Add the package to your Package.swift:
    .package(url: "https://github.com/Dean151/swift-ai.git", from: "0.1.0"),

Then depend on whichever provider (or higher-level library) you need:
    .target(
        name: "MyApp",
        dependencies: [
            .product(name: "OpenAI", package: "swift-ai"),
        ]
    )

The Retrieval package trait unlocks retrieval helpers in Agent or Conversation:
    .product(name: "Agent", package: "swift-ai", traits: ["Retrieval"]),

| Module | Use it for |
|---|---|
| AppleIntelligence | On-device Foundation Models (iOS/macOS/visionOS 26+). No API key. Structured output, tool calling, streaming; image input, reasoning levels, and Private Cloud Compute on 27+. |
| OpenAI | GPT family, DALL·E / gpt-image-*, native structured outputs. |
| Anthropic | Claude family, with prompt-caching helpers. |
| Gemini | Gemini chat, task-typed embeddings, Imagen. |
| Mistral | Mistral chat (incl. Pixtral vision) and embeddings. |
| Ollama | Local / self-hosted open models via a running Ollama server. Keyless by default. |
| Cohere | Command chat, task-typed embeddings, and document reranking. |
| Voyage | Embeddings and reranking specialist. |
| Jina | Embeddings and reranking specialist (Matryoshka dimensions, late chunking). |
Every provider conforms to ModelProvider, so swapping backends is a one-line change. Use provider.capabilities(for: id) to check what a given model supports before requesting features like tool calling or structured outputs.
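The capability check described above can be pictured with a small standalone sketch. It does not use the package: the `Capabilities` option set, the `capabilities(for:)` free function, and the model ids `full-model` and `small-model` are all hypothetical stand-ins, since this README does not show the real capability type. It only illustrates checking capabilities first and choosing a fallback when a feature is missing.

```swift
// Hypothetical capability flags; the package's real capability type may differ.
struct Capabilities: OptionSet, Sendable {
    let rawValue: Int
    static let toolCalling = Capabilities(rawValue: 1 << 0)
    static let structuredOutput = Capabilities(rawValue: 1 << 1)
    static let streaming = Capabilities(rawValue: 1 << 2)
}

// Hypothetical lookup standing in for a provider's capability query.
func capabilities(for modelID: String) -> Capabilities {
    switch modelID {
    case "full-model": return [.toolCalling, .structuredOutput, .streaming]
    case "small-model": return [.streaming]
    default: return []
    }
}

enum ResponseStrategy: Equatable {
    case schemaBacked        // request a schema-constrained response
    case plainTextThenParse  // fall back to free text and parse it yourself
}

// Pick a strategy based on what the model reports it can do.
func strategy(for modelID: String) -> ResponseStrategy {
    capabilities(for: modelID).contains(.structuredOutput)
        ? .schemaBacked
        : .plainTextThenParse
}

print(strategy(for: "full-model"))
print(strategy(for: "small-model"))
```

Gating on capabilities up front keeps the fallback decision in one place instead of scattering error handling across call sites.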
    import OpenAI

    let provider = OpenAI(
        apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? "",
        transport: URLSessionTransport()
    )
    let model = provider.languageModel(.gpt4oMini) // or "gpt-4o-mini", or any model id
    let output = try await model.generate("Say hi in one word.")
    print(output.message)

Each provider ships a strongly typed model identifier with discoverable constants: OpenAIModel.gpt4oMini, AnthropicModel.sonnet, GeminiModel.gemini25Flash, MistralModel.large, OllamaModel.llama33, CohereModel.commandA, plus VoyageEmbedding/JinaEmbedding (and rerank) for the embedding specialists. They're ExpressibleByStringLiteral, so any new or fine-tuned model name still works as a plain string. Apple Intelligence, whose model set is closed, uses an exhaustive enum (.onDevice / .privateCloudCompute) instead.
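For readers unfamiliar with the pattern, here is a minimal sketch of how a string-literal-expressible identifier can offer both named constants and free-form strings. `ModelID` and `describe(_:)` are hypothetical names for illustration; the package's own identifier types may be shaped differently.

```swift
// Hypothetical stand-in for a provider's model identifier type.
struct ModelID: Hashable, Sendable, ExpressibleByStringLiteral {
    let rawValue: String

    // Lets a plain string literal be used wherever a ModelID is expected.
    init(stringLiteral value: String) {
        self.rawValue = value
    }

    // A discoverable constant for a well-known model.
    static let gpt4oMini: ModelID = "gpt-4o-mini"
}

// Any API that takes the typed identifier accepts both spellings.
func describe(_ model: ModelID) -> String {
    "model=\(model.rawValue)"
}

let fromConstant = describe(.gpt4oMini)        // autocomplete-friendly constant
let fromLiteral = describe("my-fine-tune-001") // arbitrary id, still type-checked as ModelID
print(fromConstant)
print(fromLiteral)
```

The constant gives autocomplete and typo protection for known models, while the literal initializer keeps the door open for ids the library has never heard of.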
ChatRequest accepts a string literal for the common single-user-message case, and an array literal of messages when you need system prompts or multi-turn history:
    let output = try await model.generate(
        [
            .system("You answer in haiku."),
            .user("Why does the wind blow?")
        ],
        options: GenerationOptions(maxTokens: 200)
    )

To receive events as they arrive, stream the request instead:

    for try await event in model.stream(request) {
        switch event {
        case .messageDelta(let chunk): print(chunk, terminator: "")
        case .toolInvocation(let call): print("\ntool:", call.name)
        case .completed(let final): print("\ndone:", final.usage ?? "")
        }
    }

AgentRunner drives a model through a tool-calling loop with retries, observers, and parallel tool execution. Give it a ToolBox, then call run for a final result or stream for live events.
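As a rough mental model of a tool-calling loop, the standalone sketch below may help. It is not AgentRunner's implementation and uses none of the package's types: `ModelReply`, `Tool`, `runLoop`, and the scripted model are hypothetical, and retries, observers, and parallel execution are left out.

```swift
// Hypothetical reply type: the model either asks for a tool or finishes.
enum ModelReply {
    case toolCall(name: String, argument: String)
    case final(String)
}

typealias Tool = (String) -> String

// Calls the model repeatedly, feeding tool results back into the transcript,
// until it produces a final answer or the step budget runs out.
func runLoop(
    model: ([String]) -> ModelReply,
    tools: [String: Tool],
    prompt: String,
    maxSteps: Int = 4
) -> String? {
    var transcript = ["user: \(prompt)"]
    for _ in 0..<maxSteps {
        switch model(transcript) {
        case .final(let text):
            return text
        case .toolCall(let name, let argument):
            // Unknown tools are reported back to the model instead of crashing.
            let result = tools[name]?(argument) ?? "error: unknown tool \(name)"
            transcript.append("tool(\(name)): \(result)")
        }
    }
    return nil // step budget exhausted
}

// Scripted stand-in model: requests the tool once, then answers with its result.
let toolPrefix = "tool(current_time): "
let scriptedModel: ([String]) -> ModelReply = { transcript in
    if let last = transcript.last, last.hasPrefix(toolPrefix) {
        return .final("It is \(last.dropFirst(toolPrefix.count))")
    }
    return .toolCall(name: "current_time", argument: "")
}

let answer = runLoop(
    model: scriptedModel,
    tools: ["current_time": { _ in "2024-01-01T00:00:00Z" }],
    prompt: "What time is it?"
)
print(answer ?? "no answer")
```

The step limit is the important design choice: without it, a model that keeps requesting tools would loop forever.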
    import Agent
    import OpenAI

    let model = OpenAI(apiKey: apiKey, transport: URLSessionTransport())
        .languageModel(.gpt4oMini)

    let tools = try ToolBox([
        AnyTool.callback(
            name: "current_time",
            description: "Returns the current ISO-8601 timestamp.",
            handler: { _, _ in .string(ISO8601DateFormatter().string(from: Date())) }
        )
    ])

    let result = try await AgentRunner(model: model, tools: tools)
        .run(messages: [.user("What time is it?")])
    print(result.finalOutput.message)

Conversation layers a MessageStore and a chain of MemoryPolicy values on top of AgentRunner. Every turn persists user, assistant, and tool messages. Policies shape what the model actually sees (system prompt, recent window, rolling summary, retrieval) without losing anything from the durable transcript.
    import Conversation
    import OpenAI

    let conversation = Conversation(
        model: OpenAI(apiKey: apiKey, transport: URLSessionTransport())
            .languageModel(.gpt4oMini),
        store: FileMessageStore(directory: URL.documentsDirectory.appending(path: "chats")),
        policies: [
            StaticSystemMemoryPolicy("You are a friendly Swift tutor."),
            RecentWindowPolicy(maxMessages: 30, maxTokens: 8_000),
        ],
        tokenBudget: 12_000
    )

    let session = try await conversation.newSession()
    let turn = try await conversation.send(.user("Explain async/await in one sentence."), to: session.id)
    print(turn.appended.last?.message ?? "")

Each higher-level module re-exports the layers underneath it (Conversation brings in Agent and AI), so a single import Conversation plus your provider import is enough.
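To make the idea of a recent-window policy concrete, here is a standalone sketch of trimming a transcript to a message and token limit while leaving the stored transcript untouched. It is not the package's RecentWindowPolicy: `StoredMessage`, `recentWindow`, and the four-characters-per-token estimate are assumptions made for illustration.

```swift
// Hypothetical message type with a crude token estimate.
struct StoredMessage: Equatable {
    let role: String
    let text: String
    // Rough heuristic: about four characters per token.
    var estimatedTokens: Int { max(1, text.count / 4) }
}

// Keeps the newest messages that fit both limits, in chronological order.
func recentWindow(
    _ transcript: [StoredMessage],
    maxMessages: Int,
    maxTokens: Int
) -> [StoredMessage] {
    var kept: [StoredMessage] = []
    var tokens = 0
    // Walk from newest to oldest and stop at the first message that does not fit.
    for message in transcript.reversed() {
        guard kept.count < maxMessages,
              tokens + message.estimatedTokens <= maxTokens else { break }
        kept.append(message)
        tokens += message.estimatedTokens
    }
    return Array(kept.reversed())
}

let transcript = [
    StoredMessage(role: "user", text: String(repeating: "a", count: 40)),      // ~10 tokens
    StoredMessage(role: "assistant", text: String(repeating: "b", count: 40)), // ~10 tokens
    StoredMessage(role: "user", text: String(repeating: "c", count: 40)),      // ~10 tokens
]

// With a 25-token budget only the two newest messages fit.
let visible = recentWindow(transcript, maxMessages: 30, maxTokens: 25)
print(visible.map(\.role))
```

Because the function returns a filtered view rather than mutating its input, the full transcript stays available for persistence or for other policies to draw on.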
The AppleIntelligence provider runs Foundation Models entirely on-device — no API key, no network. It speaks the same ModelProvider surface as every other backend, so structured output, tool calling, and streaming all work through the shared API:
    import AppleIntelligence

    let provider = AppleIntelligence()

    // Structured output — bridged to Foundation Models guided generation.
    // Apple Intelligence has a closed model set, so you pick it with a typed
    // enum (.onDevice / .privateCloudCompute), not a string.
    let output = try await provider.languageModel(.onDevice).generate(
        "Extract the city and temperature from: It's 22°C in Paris.",
        options: GenerationOptions(responseFormat: .jsonSchema(Weather.outputSchema))
    )
    let weather = try output.decodeJSON(Weather.self)

Features that need the iOS 27 / macOS 27 SDK degrade gracefully: the package still builds and runs on 26, where those options are no-ops or fall back to on-device behavior:
    // Reasoning effort and tool-calling mode (no-op before 27).
    let options = GenerationOptions(appleIntelligence: .init(reasoningLevel: .deep, toolCallingMode: .required))

    // Route through Private Cloud Compute (falls back to on-device before 27).
    let cloud = provider.languageModel(.privateCloudCompute)

    // Image input — attach an image to a multimodal prompt (27+).
    let described = try await provider.languageModel(.onDevice).generate(
        [.user(content: ["Write alt text for this:", .image(.url(screenshotURL))])]
    )

provider.capabilities(for:) reflects what the running OS actually supports, so you can gate features at runtime. The provider also exposes on-device introspection: contextSize, supportedLanguages, tokenCount(for:), prewarm(), a useCase/guardrails initializer, and (on 27+) privateCloudComputeStatus() for quota and availability.
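The structured-output example earlier in this section decodes into a `Weather` type that the snippet never defines. A plausible minimal shape is sketched below using only Foundation. The field names are assumptions, the JSON string is illustrative rather than captured model output, and the real type would also need whatever the package requires to derive `outputSchema`, which this README does not show.

```swift
import Foundation

// Hypothetical shape for the Weather type used in the structured-output example.
struct Weather: Codable, Equatable {
    let city: String
    let temperature: Double
}

// Illustrative JSON of the kind a schema-constrained response might contain.
let json = #"{"city": "Paris", "temperature": 22}"#

// Decode with plain JSONDecoder; a library helper would presumably do something similar.
let weather = try JSONDecoder().decode(Weather.self, from: Data(json.utf8))
print(weather.city, weather.temperature)
```

Keeping the type a plain Codable struct means the same value can be produced by any provider that honors a JSON schema, not just the on-device model.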
The rendered DocC catalog (target AI) has guides for everything beyond hello-world:
- Tool calling: define Tools, run them through Agent.AgentRunner.
- Structured outputs: typed, schema-backed responses.
- Conversation: persistent multi-turn sessions with pluggable memory policies.
- Retrieval: VectorStore + retrieval policies for RAG, with optional RerankModel reranking.
- Testing: AITesting ships mocks, recorders, and replayers.
- Prompt caching, availability & fallbacks, embeddings & images.
Browse the catalog on the Swift Package Index or via swift package generate-documentation.
MIT — see LICENSE.