On-device reinforcement learning for Swift. Built on SwiftGrad's autograd engine.
SwiftRL brings reinforcement learning to iOS, macOS, and visionOS: no Python, no server, no cloud. Train RL agents directly on Apple devices, with real-time gradient computation through SwiftGrad's `backward()`.
SwiftRL is in active development. The autograd foundation (SwiftGrad) is complete and tested.
There is no maintained reinforcement learning library for Swift. The only notable attempt (swift-rl) died in 2021 when Swift for TensorFlow was archived. Every mainstream RL tool today (Stable-Baselines3, CleanRL, RLlib, Unity ML-Agents) requires Python and cannot run on iOS.
Meanwhile:
- AI in gaming is a $5.85B market growing to $38B by 2034
- Mobile holds 52% of the AI gaming market
- Games using adaptive AI see ~30% higher engagement
- Apple has 28 million registered developers with zero RL tools
| Advantage | Why It Matters for RL |
|---|---|
| Real-time performance | Policy updates within 16ms frame budgets. No GIL, no GC pauses. |
| Privacy by default | RL agents learn from user behavior that never leaves the device. |
| Native game integration | Direct access to SpriteKit, RealityKit, GameplayKit game loops. |
| Unified memory | Apple Silicon shares CPU/GPU memory - no data copies for training. |
| visionOS exclusive | Spatial computing is Swift-only. Adaptive spatial agents require Swift. |
```
SwiftRL
├── Core
│   ├── Environment  - Protocol: step(action) → (state, reward, done)
│   ├── ReplayBuffer - Uniform and prioritized experience replay
│   ├── Policy       - Protocol for policy networks
│   └── Trainer      - Training loop orchestration
├── Algorithms
│   ├── REINFORCE    - Simplest policy gradient
│   ├── DQN          - Deep Q-Network with target network
│   ├── A2C          - Advantage Actor-Critic
│   └── PPO          - Proximal Policy Optimization
├── Environments
│   ├── GridWorld    - Navigation with obstacles
│   ├── CartPole     - Classic control benchmark
│   └── Bandit       - Multi-armed bandit
└── Optimizers
    ├── SGD          - Stochastic gradient descent (from SwiftGrad)
    └── Adam         - Adaptive moment estimation
```
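A sketch of what the `Environment` protocol described above could look like in Swift. The protocol shape and the `Corridor` example are illustrative assumptions, not SwiftRL's final API:

```swift
// Hypothetical sketch of an Environment protocol (illustrative, not the final API).
protocol Environment {
    associatedtype Action
    var observationSize: Int { get }
    var actionCount: Int { get }
    mutating func reset() -> [Double]   // returns the initial observation
    mutating func step(_ action: Action) -> (state: [Double], reward: Double, done: Bool)
}

// A trivial conforming environment: walk a 1-D corridor to the right end.
struct Corridor: Environment {
    let length: Int
    var position = 0
    var observationSize: Int { 1 }
    var actionCount: Int { 2 }   // 0 = left, 1 = right
    mutating func reset() -> [Double] {
        position = 0
        return [0]
    }
    mutating func step(_ action: Int) -> (state: [Double], reward: Double, done: Bool) {
        position = max(0, min(length - 1, position + (action == 1 ? 1 : -1)))
        let done = position == length - 1
        return ([Double(position)], done ? 1.0 : 0.0, done)
    }
}
```

Anything conforming to such a protocol (GridWorld, CartPole, or a SpriteKit scene wrapper) could then be driven by the same training loop.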
```swift
import SwiftRL

// Define an environment
let env = GridWorld(size: 8)

// Create a policy network (powered by SwiftGrad)
let policy = MLP(inputSize: env.observationSize, layerSizes: [64, 32, env.actionCount])

// Train with DQN
let agent = DQN(
    policy: policy,
    learningRate: 0.001,
    gamma: 0.99,
    epsilon: DecayingEpsilon(start: 1.0, end: 0.01, decay: 0.995)
)

// Training loop
for episode in 0..<1000 {
    let reward = agent.train(environment: env)
    if episode % 100 == 0 {
        print("Episode \(episode): reward = \(reward)")
    }
}

// Use the trained agent
let action = agent.act(observation: env.reset())
```

| Use Case | RL Algorithm | Platform |
|---|---|---|
| Adaptive game NPCs | PPO / DQN | iOS, visionOS |
| Dynamic difficulty | Contextual bandits → PPO | iOS |
| Smart notifications | Multi-armed bandit | iOS, watchOS |
| Spatial agents | PPO with continuous actions | visionOS |
| Automated playtesting | DQN / A2C | macOS |
| Personalized fitness | Contextual bandits | watchOS, iOS |
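Several rows above reduce to the multi-armed bandit listed under Environments. As a sense of scale, an epsilon-greedy bandit fits in a few lines of pure Swift; this is an independent sketch, not SwiftRL's API:

```swift
// Epsilon-greedy multi-armed bandit: explore a random arm with probability
// epsilon, otherwise exploit the arm with the highest running mean reward.
// Illustrative sketch; not SwiftRL's API.
struct EpsilonGreedyBandit {
    let epsilon: Double
    var counts: [Int]
    var values: [Double]   // running mean reward per arm

    init(arms: Int, epsilon: Double = 0.1) {
        self.epsilon = epsilon
        counts = Array(repeating: 0, count: arms)
        values = Array(repeating: 0.0, count: arms)
    }

    func selectArm() -> Int {
        if Double.random(in: 0..<1) < epsilon {
            return Int.random(in: 0..<values.count)   // explore
        }
        return values.indices.max { values[$0] < values[$1] }!   // exploit (arms >= 1)
    }

    mutating func update(arm: Int, reward: Double) {
        counts[arm] += 1
        // Incremental mean: v += (r - v) / n, so no reward history is stored.
        values[arm] += (reward - values[arm]) / Double(counts[arm])
    }
}
```

For smart notifications, say, each arm would be a delivery-time slot and the reward whether the user engaged.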
See SwiftRLDemos for playable iOS apps showcasing SwiftRL:
- Snake - DQN learns to hunt food in real-time
- 2048 - Policy gradient discovers tile-merging strategies
- Connect Four - Self-play TD learning
- Blackjack - Monte Carlo policy evaluation
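The Blackjack demo's technique, first-visit Monte Carlo policy evaluation, can be sketched in isolation. The `Int` state type and function name here are assumptions for illustration, not code from the demo:

```swift
// First-visit Monte Carlo policy evaluation: estimate V(s) as the running
// average of discounted returns observed from each state's first visit.
// Illustrative sketch; not taken from SwiftRLDemos.
func monteCarloEvaluate(
    episodes: [[(state: Int, reward: Double)]],
    gamma: Double
) -> [Int: Double] {
    var values: [Int: Double] = [:]
    var counts: [Int: Int] = [:]
    for episode in episodes {
        var g = 0.0
        var firstVisitReturn: [Int: Double] = [:]
        // Walk backwards so g accumulates the discounted return; for a state
        // visited twice, the later overwrite keeps its *first* visit's return.
        for step in episode.reversed() {
            g = gamma * g + step.reward
            firstVisitReturn[step.state] = g
        }
        for (state, g) in firstVisitReturn {
            let n = counts[state, default: 0] + 1
            counts[state] = n
            values[state, default: 0.0] += (g - values[state, default: 0.0]) / Double(n)
        }
    }
    return values
}
```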
| Repository | Description | Status |
|---|---|---|
| SwiftGrad | Autograd engine | Working |
| SwiftRL | Reinforcement learning (you are here) | In development |
| SwiftRLDemos | Demo apps | Planned |
- micrograd by Andrej Karpathy - the autograd design SwiftGrad is based on
- CleanRL - single-file RL implementations we aim to match in clarity
- Unity ML-Agents - the closest analog (but Python-dependent, desktop-only training)
- Stable-Baselines3 - the API design standard for RL libraries
- Apple ML Research - 8+ RL papers published 2023-2025
SwiftRL is in early development. If you're interested in contributing, open an issue to discuss before submitting a PR.
MIT - see LICENSE.