
Feature/swiftlmchat ios runtime #7

Merged

solderzzc merged 4 commits into main from feature/swiftlmchat-ios-runtime on Mar 31, 2026

Conversation

@solderzzc
Member

No description provided.

…code project

- Regenerate project.pbxproj with:
  - MLXInferenceCore .swift files as direct compile sources
  - XCLocalSwiftPackageReference for mlx-swift and mlx-swift-lm
  - XCSwiftPackageProductDependency for MLX, MLXLLM, MLXLMCommon
- Add ModelManagementView (download manager, disk usage, delete)
- Add ModelDownloadManager to MLXInferenceCore
- Wire progress callbacks in InferenceEngine.load()

ModelStorage (new):
- macOS: Library/Caches/huggingface/hub/ (matches defaultHubApi)
- iOS: Library/Application Support/SwiftLMChat/Models/ + isExcludedFromBackup
- Platform-agnostic scan, sizeOnDisk, delete primitives
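The storage layout above can be sketched as follows. `ModelStorage` and `sizeOnDisk` are named in the PR; the method bodies here are illustrative assumptions, not the PR's actual code.

```swift
import Foundation

// Hedged sketch of the ModelStorage layer: platform-specific cache root
// plus a size primitive. Bodies are assumptions based on the PR bullets.
enum ModelStorage {
    /// Platform-specific root for downloaded model weights.
    static var cacheRoot: URL {
        let fm = FileManager.default
        #if os(macOS)
        // Matches HubApi's default cache location.
        return fm.urls(for: .cachesDirectory, in: .userDomainMask)[0]
            .appendingPathComponent("huggingface/hub")
        #else
        var dir = fm.urls(for: .applicationSupportDirectory, in: .userDomainMask)[0]
            .appendingPathComponent("SwiftLMChat/Models")
        try? fm.createDirectory(at: dir, withIntermediateDirectories: true)
        // Weights are re-downloadable, so keep them out of iCloud backups.
        var values = URLResourceValues()
        values.isExcludedFromBackup = true
        try? dir.setResourceValues(values)
        return dir
        #endif
    }

    /// Total bytes under a model directory (the sizeOnDisk primitive).
    static func sizeOnDisk(of dir: URL) -> Int64 {
        let fm = FileManager.default
        guard let files = fm.enumerator(at: dir, includingPropertiesForKeys: [.fileSizeKey])
        else { return 0 }
        var total: Int64 = 0
        for case let url as URL in files {
            total += Int64((try? url.resourceValues(forKeys: [.fileSizeKey]))?.fileSize ?? 0)
        }
        return total
    }
}
```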

ModelDownloader (new, iOS only):
- URLSession background session (survives app suspension)
- HuggingFace API file enumeration (GET /api/models/{id})
- Per-file download with progress streaming
- macOS: LLMModelFactory handles download directly (no change)
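A minimal sketch of the downloader's two halves, file enumeration via the public HuggingFace API and a background `URLSession` that survives suspension. The `ModelDownloader` name comes from the PR; the session identifier, `HFModelInfo` type, and URL shapes are assumptions.

```swift
import Foundation

// Sketch of the iOS download path described above. The HF repo listing
// endpoint returns a `siblings` array of filenames.
struct HFModelInfo: Decodable {
    struct Sibling: Decodable { let rfilename: String }
    let siblings: [Sibling]
}

final class ModelDownloader {
    // Background session: downloads continue while the app is suspended.
    // The identifier string here is illustrative.
    private lazy var session = URLSession(
        configuration: .background(withIdentifier: "chat.swiftlm.model-downloads"))

    /// Enumerate a repo's files via GET /api/models/{id}.
    func fileList(for modelId: String) async throws -> [String] {
        let url = URL(string: "https://huggingface.co/api/models/\(modelId)")!
        let (data, _) = try await URLSession.shared.data(from: url)
        return try JSONDecoder().decode(HFModelInfo.self, from: data)
            .siblings.map(\.rfilename)
    }

    /// Kick off a per-file background download; progress is observed on
    /// the task (or via the session delegate in real code).
    func download(_ file: String, from modelId: String) -> URLSessionDownloadTask {
        let url = URL(string: "https://huggingface.co/\(modelId)/resolve/main/\(file)")!
        let task = session.downloadTask(with: url)
        task.resume()
        return task
    }
}
```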

ModelDownloadManager refactor:
- Built on ModelStorage + ModelDownloader layers
- NWPathMonitor for WiFi/cellular/offline detection
- iOS RAM budget: 40% (vs 75% macOS) via modelsForDevice()
- Cellular threshold: warn before >200MB downloads on cellular
- updateProgress() / clearProgress() for InferenceEngine bridge
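The budget and threshold rules above reduce to a few lines. The 40% / 75% split and the 200 MB cellular threshold come from the PR text; the free-function names are assumptions (in the app this logic would sit behind `NWPathMonitor`'s path status).

```swift
import Foundation

// Illustrative version of the RAM-budget and cellular-threshold rules.
let cellularWarnThresholdBytes: Int64 = 200 * 1_000_000

/// Bytes of physical RAM a model may occupy on this platform.
func ramBudgetBytes(physicalMemory: UInt64, isIOS: Bool) -> UInt64 {
    // iOS apps are jetsammed well below total RAM, so budget conservatively.
    let fraction = isIOS ? 0.40 : 0.75
    return UInt64(Double(physicalMemory) * fraction)
}

/// True when a download should trigger the cellular confirmation dialog.
func needsCellularWarning(downloadBytes: Int64, onCellular: Bool) -> Bool {
    onCellular && downloadBytes > cellularWarnThresholdBytes
}
```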

InferenceEngine:
- UIApplication.didReceiveMemoryWarningNotification → auto-unload (iOS)
- ProcessInfo.thermalStateDidChangeNotification → ThermalLevel (@Published)
- Critical thermal → stop generation immediately
- HubApi.downloadBase redirected to ModelStorage.cacheRoot
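The two reactive observers above can be sketched like this (iOS-only, since `didReceiveMemoryWarningNotification` is UIKit). `InferenceEngine`'s real internals may differ; the callback shape and the `ReactiveGuards` type are assumptions.

```swift
import UIKit

// Sketch of the memory-warning and thermal observers described above.
final class ReactiveGuards {
    private var tokens: [NSObjectProtocol] = []

    init(onMemoryWarning: @escaping () -> Void,
         onThermalChange: @escaping (ProcessInfo.ThermalState) -> Void) {
        let nc = NotificationCenter.default
        // Memory pressure: unload the model before the OS kills the app.
        tokens.append(nc.addObserver(
            forName: UIApplication.didReceiveMemoryWarningNotification,
            object: nil, queue: .main) { _ in onMemoryWarning() })
        // Thermal change: republish; the caller stops generation on .critical.
        tokens.append(nc.addObserver(
            forName: ProcessInfo.thermalStateDidChangeNotification,
            object: nil, queue: .main) { _ in
                onThermalChange(ProcessInfo.processInfo.thermalState)
            })
    }

    deinit {
        for t in tokens { NotificationCenter.default.removeObserver(t) }
    }
}
```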

ModelPickerView:
- Network status banner (offline / cellular warning)
- Thermal warning banner
- Cellular confirmation dialog before large downloads
- handleModelTap() blocks download when offline
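The gating in `handleModelTap()` distills to a small decision function. This is a hypothetical pure-logic version of the bullets above; the real view code is stateful SwiftUI, and the enum names are illustrative.

```swift
// Sketch of the tap gating: offline blocks, large-on-cellular confirms.
enum NetworkStatus { case wifi, cellular, offline }
enum TapOutcome { case startDownload, confirmCellular, blockedOffline }

func downloadDecision(status: NetworkStatus, downloadMB: Int) -> TapOutcome {
    switch status {
    case .offline:
        return .blockedOffline                 // block download when offline
    case .cellular where downloadMB > 200:
        return .confirmCellular                // confirmation dialog first
    default:
        return .startDownload                  // Wi-Fi, or small on cellular
    }
}
```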

SwiftLMChat.entitlements (new):
- com.apple.developer.kernel.increased-memory-limit
- UIBackgroundModes: fetch, processing

Package.swift: add Hub product to MLXInferenceCore dependencies

…load on foreground

InferenceEngine:
- willResignActiveNotification → stopGeneration() + unload() + save backgroundedModelId
- didBecomeActiveNotification → reload backgroundedModelId (or lastLoadedModelId)
- autoOffloadOnBackground: Bool (default true on iOS, false on macOS)
- Observers consolidated into [NSObjectProtocol] for clean deinit
- Reactive memory warning still kept as safety fallback
- Thermal observer migrated to same consolidated array
- Background unload sets .idle (not .error) — clean UX on return
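The offload/reload cycle above can be sketched with consolidated observers and closure-based engine hooks. The notification names and the `autoOffloadOnBackground` flag come from the PR; everything else here is an illustrative assumption.

```swift
import UIKit

// Sketch of background offload on resign-active and reload on
// become-active, with observers collected for clean deinit.
final class BackgroundOffloader {
    var autoOffloadOnBackground = true     // iOS default per the PR
    private var backgroundedModelId: String?
    private var observers: [NSObjectProtocol] = []

    init(currentModelId: @escaping () -> String?,
         stopAndUnload: @escaping () -> Void,
         reload: @escaping (String) -> Void) {
        let nc = NotificationCenter.default
        observers.append(nc.addObserver(
            forName: UIApplication.willResignActiveNotification,
            object: nil, queue: .main) { [weak self] _ in
                guard let self, self.autoOffloadOnBackground else { return }
                // Remember what was loaded, then free the weights
                // (engine state goes to .idle, not .error).
                self.backgroundedModelId = currentModelId()
                stopAndUnload()
            })
        observers.append(nc.addObserver(
            forName: UIApplication.didBecomeActiveNotification,
            object: nil, queue: .main) { [weak self] _ in
                guard let self, let id = self.backgroundedModelId else { return }
                self.backgroundedModelId = nil
                reload(id)                 // transparent restore on return
            })
    }

    deinit {
        for o in observers { NotificationCenter.default.removeObserver(o) }
    }
}
```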

ExpertStreamingConfig (new, MLXLMCommon):
- Replaces EXPERIMENTAL_SSD_STREAM env var with a proper Swift API
- .mmapPageCache mode: APFS page-cache (iOS + macOS without directIO)
- .directNVMe mode: pread() at 5GB/s NVMe (macOS default for MoE)
- activate(modelDirectory:useDirectIO:) + deactivate()
- legacyEnvPath shim for any remaining C-level consumers
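A hedged reconstruction of that surface: the type, mode names, `activate(modelDirectory:useDirectIO:)`, `deactivate()`, and `legacyEnvPath` all appear in the bullets above, but the bodies are assumptions.

```swift
import Foundation

// Sketch of the ExpertStreamingConfig API replacing the env-var gate.
public final class ExpertStreamingConfig {
    public enum Mode {
        case mmapPageCache   // APFS page cache: iOS, and macOS without direct I/O
        case directNVMe      // pread() straight from NVMe (macOS MoE default)
    }

    public static let shared = ExpertStreamingConfig()

    public private(set) var isEnabled = false
    public private(set) var mode: Mode = .mmapPageCache
    public private(set) var modelDirectory: URL?
    // Shim for any remaining C-level consumers of the old env var.
    public private(set) var legacyEnvPath: String?

    public func activate(modelDirectory: URL, useDirectIO: Bool) {
        self.modelDirectory = modelDirectory
        mode = useDirectIO ? .directNVMe : .mmapPageCache
        legacyEnvPath = modelDirectory.path
        isEnabled = true
    }

    public func deactivate() {
        isEnabled = false
        modelDirectory = nil
        legacyEnvPath = nil
    }
}
```

Per the load-path bullets further down, macOS would call `activate(..., useDirectIO: true)` and iOS `activate(..., useDirectIO: false)`, with `deactivate()` on error or unload.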

SwitchLayers.swift:
- ExpertStreamingConfig.shared.isEnabled replaces env var gate
- #if os(macOS) / #else: directNVMe path locked to macOS only
- iOS always routes to mmap prefault fallback (was dead code before)

Load.swift / LayerPartitioning.swift:
- Both env var gates replaced with ExpertStreamingConfig.shared.isEnabled

InferenceEngine.load():
- MoE models get config.lazyLoad = true + ExpertStreamingConfig.activate()
- macOS: useDirectIO=true (5GB/s NVMe pread)
- iOS: useDirectIO=false (APFS mmap, ~2-3GB/s, fits in sandbox)
- Deactivated on error or unload()

ModelCatalog:
- ramRequiredGB for MoE = peak-resident (active experts only)
- Qwen3 30B MoE: ramRequired=4.5GB (targets iPad Pro M4 8GB+)
- DeepSeek R1 0528: ramRequired=8GB (targets iPad Pro M4 16GB+)
- Qwen3.5 122B: ramRequired=12GB (macOS / iPad Pro M4 Max 32GB)
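As data, the catalog entries above might look like this; `CatalogEntry` and the filter function are illustrative names, while the GB figures come from the PR text.

```swift
// Sketch of the ModelCatalog entries with peak-resident MoE footprints.
struct CatalogEntry {
    let name: String
    let ramRequiredGB: Double   // for MoE: peak-resident set (active experts only)
}

let catalog = [
    CatalogEntry(name: "Qwen3 30B MoE", ramRequiredGB: 4.5),
    CatalogEntry(name: "DeepSeek R1 0528", ramRequiredGB: 8),
    CatalogEntry(name: "Qwen3.5 122B", ramRequiredGB: 12),
]

/// Entries whose peak-resident footprint fits a given RAM budget.
func modelsThatFit(budgetGB: Double) -> [CatalogEntry] {
    catalog.filter { $0.ramRequiredGB <= budgetGB }
}
```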

This enables 30B-class MoE reasoning models on iPad Pro M4
without any system swap — purely via OS page-cache eviction.
@solderzzc merged commit f2065f3 into main on Mar 31, 2026
0 of 4 checks passed
@solderzzc deleted the feature/swiftlmchat-ios-runtime branch on March 31, 2026 at 20:24