Merged
…code project
- Regenerate project.pbxproj with:
  - MLXInferenceCore .swift files as direct compile sources
  - XCLocalSwiftPackageReference for mlx-swift and mlx-swift-lm
  - XCSwiftPackageProductDependency for MLX, MLXLLM, MLXLMCommon
- Add ModelManagementView (download manager, disk usage, delete)
- Add ModelDownloadManager to MLXInferenceCore
- Wire progress callbacks in InferenceEngine.load()
ModelStorage (new):
- macOS: Library/Caches/huggingface/hub/ (matches defaultHubApi)
- iOS: Library/Application Support/SwiftLMChat/Models/ + isExcludedFromBackup
- Platform-agnostic scan, sizeOnDisk, delete primitives
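The ModelStorage layer above could look roughly like this. This is a hedged sketch, not the actual implementation: the type name, `cacheRoot`, and `sizeOnDisk` come from the notes, but the exact signatures and directory handling are assumptions.

```swift
import Foundation

// Sketch of ModelStorage: platform-specific cache root plus a
// platform-agnostic size primitive. Details are assumptions.
enum ModelStorage {
    // Root directory for downloaded model weights.
    static var cacheRoot: URL {
        let fm = FileManager.default
        #if os(iOS)
        // iOS: Application Support; the caller sets isExcludedFromBackup.
        let base = fm.urls(for: .applicationSupportDirectory, in: .userDomainMask)[0]
        return base.appendingPathComponent("SwiftLMChat/Models", isDirectory: true)
        #else
        // macOS: mirror the default Hugging Face hub cache layout.
        let base = fm.urls(for: .cachesDirectory, in: .userDomainMask)[0]
        return base.appendingPathComponent("huggingface/hub", isDirectory: true)
        #endif
    }

    // Recursive on-disk size of a model directory, in bytes.
    static func sizeOnDisk(of directory: URL) -> Int64 {
        let fm = FileManager.default
        guard let files = fm.enumerator(at: directory,
                                        includingPropertiesForKeys: [.fileSizeKey]) else { return 0 }
        var total: Int64 = 0
        for case let url as URL in files {
            total += Int64((try? url.resourceValues(forKeys: [.fileSizeKey]).fileSize ?? 0) ?? 0)
        }
        return total
    }
}
```

Keeping these as free-standing primitives (scan, size, delete) lets both the downloader and the management UI share one source of truth for where models live.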
ModelDownloader (new, iOS only):
- URLSession background session (survives app suspension)
- HuggingFace API file enumeration (GET /api/models/{id})
- Per-file download with progress streaming
- macOS: LLMModelFactory handles download directly (no change)
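A minimal sketch of the ModelDownloader pieces above. The `GET /api/models/{id}` endpoint is from the notes; the `siblings`/`rfilename` field names follow the public Hugging Face Hub API. The session identifier and type shapes are assumptions.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

struct ModelDownloader {
    // Background session so downloads survive app suspension (iOS).
    static func makeBackgroundSession(delegate: URLSessionDownloadDelegate?) -> URLSession {
        #if canImport(Darwin)
        let config = URLSessionConfiguration.background(withIdentifier: "com.swiftlmchat.model-downloads")
        config.sessionSendsLaunchEvents = true
        #else
        // Non-Apple platforms have no background sessions; fall back.
        let config = URLSessionConfiguration.default
        #endif
        return URLSession(configuration: config, delegate: delegate, delegateQueue: nil)
    }

    // File-listing endpoint for a repo id like "mlx-community/SomeModel".
    static func fileListURL(for modelID: String) -> URL {
        URL(string: "https://huggingface.co/api/models/\(modelID)")!
    }

    // Extract per-file names from the Hub API response.
    static func fileNames(fromAPIResponse data: Data) throws -> [String] {
        struct Repo: Decodable {
            struct Sibling: Decodable { let rfilename: String }
            let siblings: [Sibling]
        }
        return try JSONDecoder().decode(Repo.self, from: data).siblings.map(\.rfilename)
    }
}
```

Per-file downloads (rather than one archive) are what make resumable progress streaming straightforward: each completed file is an independent checkpoint.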
ModelDownloadManager refactor:
- Built on ModelStorage + ModelDownloader layers
- NWPathMonitor for WiFi/cellular/offline detection
- iOS RAM budget: 40% (vs 75% macOS) via modelsForDevice()
- Cellular threshold: warn before >200MB downloads on cellular
- updateProgress() / clearProgress() for InferenceEngine bridge
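The policy side of the manager (RAM budget, cellular threshold) is pure logic and can be sketched separately from NWPathMonitor. The 40%/75% budgets and the 200 MB cellular threshold are from the notes; the names below are assumptions.

```swift
import Foundation

// Simplified stand-in for NWPathMonitor's observed state.
enum NetworkPath { case wifi, cellular, offline }

struct DownloadPolicy {
    // Warn before downloads larger than this on cellular.
    static let cellularWarnBytes: Int64 = 200 * 1_000_000

    // RAM budget: 40% of physical memory on iOS, 75% on macOS.
    static func ramBudgetBytes(physicalMemory: UInt64, isiOS: Bool) -> UInt64 {
        let fraction = isiOS ? 0.40 : 0.75
        return UInt64(Double(physicalMemory) * fraction)
    }

    // modelsForDevice(): keep only models whose peak RAM fits the budget.
    static func modelsForDevice(requirements: [String: UInt64], budget: UInt64) -> [String] {
        requirements.filter { $0.value <= budget }.map(\.key).sorted()
    }

    // Cellular confirmation gate used by the picker UI.
    static func shouldWarn(bytes: Int64, path: NetworkPath) -> Bool {
        path == .cellular && bytes > cellularWarnBytes
    }
}
```

Keeping this logic free of Network.framework types makes it trivially unit-testable; the real manager just feeds it the current NWPathMonitor state.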
InferenceEngine:
- UIApplication.didReceiveMemoryWarningNotification → auto-unload (iOS)
- ProcessInfo.thermalStateDidChangeNotification → ThermalLevel @published
- Critical thermal → stop generation immediately
- HubApi.downloadBase redirected to ModelStorage.cacheRoot
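The reactive safeguards above reduce to a small decision table. This sketch models only that table; the real engine observes `ProcessInfo.thermalStateDidChangeNotification` and, on iOS, `UIApplication.didReceiveMemoryWarningNotification`. `ThermalLevel`, `EngineAction`, and the `.serious` throttle step are assumptions.

```swift
import Foundation

// Assumed @Published thermal level mirroring ProcessInfo.ThermalState.
enum ThermalLevel: Int, Comparable {
    case nominal, fair, serious, critical
    static func < (l: ThermalLevel, r: ThermalLevel) -> Bool { l.rawValue < r.rawValue }
}

enum EngineAction { case none, throttle, stopGeneration, unloadModel }

// Memory warning wins: auto-unload (iOS). Critical thermal stops
// generation immediately, per the notes above.
func engineAction(for level: ThermalLevel, memoryWarning: Bool) -> EngineAction {
    if memoryWarning { return .unloadModel }
    switch level {
    case .critical: return .stopGeneration
    case .serious:  return .throttle
    default:        return .none
    }
}
```

Separating the decision from the notification wiring means the same function can drive both the engine and the thermal warning banner in the UI.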
ModelPickerView:
- Network status banner (offline / cellular warning)
- Thermal warning banner
- Cellular confirmation dialog before large downloads
- handleModelTap() blocks download when offline
SwiftLMChat.entitlements (new):
- com.apple.developer.kernel.increased-memory-limit
- UIBackgroundModes: fetch, processing
Package.swift: add Hub product to MLXInferenceCore dependencies
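The Package.swift change might look like the fragment below. This is a sketch, not the repo's actual manifest: only the `Hub` product line is from the notes (Hub is a real product of huggingface/swift-transformers); the package name, platforms, version, and other dependencies are placeholders.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MLXInferenceCore",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Hub ships with swift-transformers; version is a placeholder.
        .package(url: "https://github.com/huggingface/swift-transformers", from: "0.1.0"),
    ],
    targets: [
        .target(
            name: "MLXInferenceCore",
            dependencies: [
                .product(name: "Hub", package: "swift-transformers"), // new
            ]
        ),
    ]
)
```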
…load on foreground
InferenceEngine:
- willResignActiveNotification → stopGeneration() + unload() + save backgroundedModelId
- didBecomeActiveNotification → reload backgroundedModelId (or lastLoadedModelId)
- autoOffloadOnBackground: Bool (default true on iOS, false on macOS)
- Observers consolidated into [NSObjectProtocol] for clean deinit
- Reactive memory warning still kept as safety fallback
- Thermal observer migrated to same consolidated array
- Background unload sets .idle (not .error) for clean UX on return
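The offload/reload round-trip described in this commit can be sketched as a small state machine, separate from the notification wiring. Type and member names other than `autoOffloadOnBackground` and `backgroundedModelId` are assumptions.

```swift
import Foundation

// Sketch of the background-offload flow: resign-active unloads and
// remembers the model; become-active reloads it.
final class OffloadCoordinator {
    enum State: Equatable { case idle, loaded(String), error(String) }

    var state: State = .idle
    var autoOffloadOnBackground: Bool   // default true on iOS, false on macOS
    private(set) var backgroundedModelId: String?

    init(autoOffloadOnBackground: Bool) {
        self.autoOffloadOnBackground = autoOffloadOnBackground
    }

    // willResignActive: stop generation, unload, remember the model.
    func appWillResignActive() {
        guard autoOffloadOnBackground, case .loaded(let id) = state else { return }
        backgroundedModelId = id
        state = .idle   // .idle, not .error: clean UX on return
    }

    // didBecomeActive: reload whatever was offloaded.
    func appDidBecomeActive(reload: (String) -> Void) {
        guard let id = backgroundedModelId else { return }
        backgroundedModelId = nil
        reload(id)
        state = .loaded(id)
    }
}
```

Setting `.idle` rather than `.error` on background unload matters because the UI treats `.error` as a failure banner; returning to the app should look like a fresh, fast reload instead.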
ExpertStreamingConfig (new, MLXLMCommon):
- Replaces EXPERIMENTAL_SSD_STREAM env var with a proper Swift API
- .mmapPageCache mode: APFS page cache (iOS + macOS without directIO)
- .directNVMe mode: pread() at 5 GB/s NVMe (macOS default for MoE)
- activate(modelDirectory:useDirectIO:) + deactivate()
- legacyEnvPath shim for any remaining C-level consumers
SwitchLayers.swift:
- ExpertStreamingConfig.shared.isEnabled replaces env var gate
- #if os(macOS) / #else: directNVMe path locked to macOS only
- iOS always routes to mmap prefault fallback (was dead code before)
Load.swift / LayerPartitioning.swift:
- Both env var gates replaced with ExpertStreamingConfig.shared.isEnabled
InferenceEngine.load():
- MoE models get config.lazyLoad = true + ExpertStreamingConfig.activate()
- macOS: useDirectIO=true (5 GB/s NVMe pread)
- iOS: useDirectIO=false (APFS mmap, ~2-3 GB/s, fits in sandbox)
- Deactivated on error or unload()
ModelCatalog:
- ramRequiredGB for MoE = peak-resident (active experts only)
- Qwen3 30B MoE: ramRequired=4.5GB (targets iPad Pro M4 8GB+)
- DeepSeek R1 0528: ramRequired=8GB (targets iPad Pro M4 16GB+)
- Qwen3.5 122B: ramRequired=12GB (macOS / iPad Pro M4 Max 32GB)

This enables 30B-class MoE reasoning models on iPad Pro M4 without any system swap, purely via OS page-cache eviction.
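The ExpertStreamingConfig surface described in this commit might look like the sketch below. The mode names, the `activate(modelDirectory:useDirectIO:)`/`deactivate()` shape, and the macOS-only directNVMe gate are from the notes; storage details and the I/O code itself are omitted and everything else is an assumption.

```swift
import Foundation

// Sketch of the Swift API that replaces the EXPERIMENTAL_SSD_STREAM
// env var gate for expert streaming in MoE models.
final class ExpertStreamingConfig {
    enum Mode {
        case mmapPageCache   // APFS page cache: iOS, or macOS without direct I/O
        case directNVMe      // pread() straight from NVMe (macOS MoE default)
    }

    static let shared = ExpertStreamingConfig()

    private(set) var isEnabled = false
    private(set) var mode: Mode = .mmapPageCache
    private(set) var modelDirectory: URL?

    // Callers check `isEnabled` where the env var used to be read.
    func activate(modelDirectory: URL, useDirectIO: Bool) {
        self.modelDirectory = modelDirectory
        #if os(macOS)
        mode = useDirectIO ? .directNVMe : .mmapPageCache
        #else
        mode = .mmapPageCache   // directNVMe path is locked to macOS
        #endif
        isEnabled = true
    }

    // Called on unload() or load error.
    func deactivate() {
        isEnabled = false
        modelDirectory = nil
    }
}
```

The key property is that iOS callers can pass any `useDirectIO` value and still land on the mmap path, which turns the previously dead iOS branch in SwitchLayers.swift into the only branch.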