Add MPS support for Apple Silicon devices #55
Conversation
- Added device field to TrainingConfig schema (auto/cuda/mps/cpu)
- Created device_utils.py with device resolution functions
- Updated hardware detector to detect both CUDA and MPS
- Added MPS support to system info detection

Co-authored-by: RETR0-OS <74290459+RETR0-OS@users.noreply.github.com>

- Added device resolution in training_service
- Updated HuggingFace provider to support MPS device
- Added MPS validation in Unsloth provider
- Disabled bitsandbytes on MPS automatically
- Pass device_type to model loading

Co-authored-by: RETR0-OS <74290459+RETR0-OS@users.noreply.github.com>

- Modified chat_playground to support CUDA/MPS/CPU
- Updated hardware service to expose device_type
- Replaced hardcoded CUDA references with device checks

Co-authored-by: RETR0-OS <74290459+RETR0-OS@users.noreply.github.com>

- Created macOS MPS installation guide
- Updated README with MPS support and platform table
- Updated HuggingFace provider docs for MPS
- Updated Unsloth provider docs with MPS limitations
- Added MPS troubleshooting section to common issues

Co-authored-by: RETR0-OS <74290459+RETR0-OS@users.noreply.github.com>

- Fixed type annotation to use List[str] for Python 3.8 compatibility
- Clarified comment about MPS device_map handling
- Updated docs README to include macOS MPS installation link

Co-authored-by: RETR0-OS <74290459+RETR0-OS@users.noreply.github.com>

- Created tests directory
- Added comprehensive unit tests for device resolution
- Tests cover auto-detection, availability checks, and error cases
- All tests use mocks to avoid torch dependency

Co-authored-by: RETR0-OS <74290459+RETR0-OS@users.noreply.github.com>
- Make bitsandbytes an optional dependency (moved to [cuda] extras) so MPS installations don't fail on import
- Add MPS memory management: torch.mps.empty_cache(), gc.collect(), PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to prevent OOM kills
- Load models with torch_dtype=float16 and low_cpu_mem_usage=True on MPS to halve peak memory during model loading
- Switch default optimizer from paged_adamw_32bit (requires bnb) to adamw_torch, which works on all platforms
- Force fp16 precision on MPS (bf16 unreliable); auto-default max_seq_length=512 for memory efficiency
- Add conditional BitsAndBytesConfig imports in quantization.py and legacy tuners to prevent import crashes on MPS
- Add get_mps_memory_info() for memory monitoring during training
- Clear device cache before and after training on MPS
- Update MPS documentation with new optimizer and memory settings

https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
Set dataloader_num_workers=0 and dataloader_pin_memory=False on MPS to avoid macOS multiprocessing fork issues and unnecessary pin_memory on unified memory systems. https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
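The dataloader settings described in this commit can be sketched as a small override helper (the function name is hypothetical, not from the PR):

```python
# Sketch (assumed helper): compute MPS-specific dataloader overrides to
# merge into TrainingArguments kwargs before training starts.
def mps_dataloader_overrides(device_type: str) -> dict:
    """Return dataloader settings that are safe for the given device."""
    if device_type == "mps":
        # Fork-based dataloader workers are unreliable on macOS, and
        # pin_memory is pointless on a unified-memory system.
        return {"dataloader_num_workers": 0, "dataloader_pin_memory": False}
    return {}
```

On CUDA the helper returns nothing, so existing worker and pin-memory defaults are untouched.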
- Restore QLoRA bf16 default to True for CUDA (MPS override in training_service already forces False)
- Add psutil to pyproject.toml dependencies (used by device_utils)
- Remove bitsandbytes from requirements.txt for consistency
- Add _BNB_AVAILABLE guards in legacy tuners to prevent NameError when bitsandbytes is not installed

https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
BitsAndBytesConfig imports from transformers even without bitsandbytes installed. Check for the actual bitsandbytes package to ensure the guard correctly detects when quantization is unavailable. https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
This env var must be set before `import torch` to take effect. Move it to app.py and cli.py entry points which run before any transitive torch imports from routers/strategies/providers. https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
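The entry-point ordering fix can be sketched as follows (a sketch of what app.py/cli.py would do at the very top, before any transitive torch import):

```python
# Must run before `import torch`: torch's MPS allocator reads this
# environment variable at import time, so setting it later has no effect.
import os

os.environ.setdefault("PYTORCH_MPS_HIGH_WATERMARK_RATIO", "0.0")

# Only now is it safe to import torch, directly or transitively via
# routers / strategies / providers:
# import torch
```

`setdefault` keeps a value the user exported themselves, while still providing the MPS-safe default.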
Previously only disabled use_4bit/use_8bit when they were already True. Now always sets them to False on MPS regardless, preventing any path where quantization could be attempted without bitsandbytes. Also changed QuantizationFactory.create_config default to False since the schema already provides True for CUDA users. https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
Resolve conflicts keeping MPS support while incorporating main's changes:

- Accept deletion of legacy tuners (CausalLLMTuner, QATuner, Seq2SeqLMTuner)
- Keep graceful bitsandbytes degradation (return None vs crash) for MPS
- Merge TensorBoard launch with pre-training cache clear
- Add model_validator import alongside Literal for device enum
- Keep MPS device detection in chat_playground
- Use main's [quantization] extra name (renamed from [cuda])
- Keep MPS documentation in README and troubleshooting docs

https://claude.ai/code/session_01RaPUzQs3pZjM2i9JoZ66Lz
fp16/bf16 TrainingArguments flags trigger CUDA-only Accelerate paths. On MPS the model dtype is already set natively at load time via torch_dtype=float16, so no precision flag is needed. Also added a second guard in _adjust_config_for_dtype to catch any remaining paths. Reduced SQLite pool_size to 1 to avoid multiprocessing conflicts that surface during MPS dataloader teardown.
Paths pasted with a trailing newline or space would fail the file existence check. Now strips before validating and before returning so downstream code receives a clean path.
Validation endpoint returns fields at the top level of the response object, not nested under result.validation. Updates field access to match the actual API shape.
Pins react-scripts to 5.0.1 (removing semver range) to prevent unexpected upgrades. Includes updated package-lock and rebuilt static assets.
Adds .venv to .gitignore so virtual environments aren't accidentally staged. Changes uvicorn host from 0.0.0.0 to 127.0.0.1 to restrict the dev server to local access only.
Add comprehensive Apple MPS support with memory management
Pull request overview
This PR adds Metal Performance Shaders (MPS) backend support for Apple Silicon Macs, enabling LLM fine-tuning on M1/M2/M3/M4 hardware. It introduces a device abstraction layer (device_utils.py), updates hardware detection to recognize MPS, and wires MPS-specific configuration overrides through the training pipeline. The HuggingFace provider is updated for MPS-compatible model loading, while Unsloth is blocked on MPS. Comprehensive documentation for macOS MPS setup is included.
Changes:
- New device resolution module (`device_utils.py`) with auto-detection hierarchy (CUDA → MPS → CPU), MPS memory management, and device-agnostic cache clearing
- Training service and provider updates to handle MPS-specific constraints: disabled quantization, optimizer fallback, native float16 loading, and conditional bitsandbytes imports
- Documentation additions including macOS installation guide, troubleshooting, and updated platform support tables across README and provider docs
Reviewed changes
Copilot reviewed 32 out of 42 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `ModelForge/utilities/device_utils.py` | New module: device resolution, MPS memory config, cache management |
| `ModelForge/services/training_service.py` | MPS config overrides, device resolution wiring, Unsloth+MPS validation |
| `ModelForge/providers/huggingface_provider.py` | MPS-optimized model loading (float16, CPU→MPS transfer) |
| `ModelForge/providers/unsloth_provider.py` | MPS guard (blocks Unsloth on MPS hardware) |
| `ModelForge/utilities/hardware_detection/hardware_detector.py` | MPS detection as CUDA fallback in hardware profiling |
| `ModelForge/schemas/training_schemas.py` | Added device field with Literal type |
| `ModelForge/utilities/finetuning/quantization.py` | Conditional bitsandbytes import, graceful degradation |
| `ModelForge/strategies/sft_strategy.py` | Default optimizer change, dataloader settings |
| `ModelForge/strategies/qlora_strategy.py` | Default optimizer change, dataloader settings |
| `ModelForge/utilities/finetuning/settings_builder.py` | Default optimizer change |
| `ModelForge/utilities/chat_playground.py` | MPS device detection, guarded CUDA cache clearing |
| `ModelForge/cli.py` / `ModelForge/app.py` | Early MPS fallback env var, host change |
| `ModelForge/database/database_manager.py` | SQLite pool_size reduction |
| `ModelForge/notebook_cli/prompts.py` | Path stripping fix |
| `ModelForge/services/hardware_service.py` | Expose device_type in hardware specs |
| `tests/test_device_utils.py` | Unit tests for device resolution |
| `requirements.txt` | New requirements file (encoding issue) |
| `pyproject.toml` | Added psutil dependency |
| `docs/installation/macos-mps.md` | New macOS MPS installation guide |
| `docs/providers/huggingface.md` | MPS platform support, MPS config examples |
| `docs/providers/unsloth.md` | MPS unsupported notes |
| `docs/troubleshooting/common-issues.md` | MPS troubleshooting section |
| `README.md` / `docs/README.md` | Updated platform support tables |
| Frontend files | Build artifacts update, validation response fix |
| `.gitignore` | Added .venv |
```python
try:
    import torch
    if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        # Check if we're actually trying to use MPS
        # (this is a secondary check, primary check is in training_service)
        raise ProviderError(
            "Unsloth provider is not supported on Apple MPS devices. "
            "Unsloth requires NVIDIA CUDA GPUs for its optimized kernels. "
            "Please switch to 'huggingface' provider or use 'device: cuda' with an NVIDIA GPU."
        )
except ImportError:
    pass  # torch not available, will fail below anyway
```
Bug: This check unconditionally blocks Unsloth on any system where MPS is available, even when the user explicitly set device: cuda (e.g., a Mac with an external eGPU, or a theoretical dual-GPU setup). The comment says "this is a secondary check" but unlike the primary check in training_service.py (which correctly checks device_type == "mps"), this check looks at hardware availability rather than the user's device selection.
This should accept a device_type parameter (or check the config/kwargs) so that it only raises when the user is actually targeting MPS. As it stands, it will always block Unsloth on Apple Silicon, regardless of the chosen device.
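A minimal sketch of the suggested fix, gating on the resolved device target rather than hardware availability (`ProviderError` stands in for the project's exception type):

```python
class ProviderError(RuntimeError):
    """Stand-in for the project's provider exception type."""

def check_unsloth_supported(device_type: str) -> None:
    # Raise only when the user is actually targeting MPS; a Mac where MPS
    # hardware merely exists but device_type == "cuda" passes through.
    if device_type == "mps":
        raise ProviderError(
            "Unsloth provider is not supported on Apple MPS devices. "
            "Please switch to 'huggingface' provider or use 'device: cuda'."
        )
```

This mirrors the primary check in training_service.py, which already branches on `device_type == "mps"`.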
```python
# If no CUDA GPU, try to detect Apple MPS
if not cuda_available:
    try:
        import torch
```
torch is imported at module level on line 3, but then re-imported locally on line 117 (import torch) and line 298. The local imports are redundant and should be removed since the module-level import already makes torch available throughout the file.
All review feedback has been addressed in commit 69ee0d5:
Bugs Fixed:
- Unsloth now checks the `device_type` parameter instead of MPS hardware availability
- Removed redundant torch imports in hardware_detector.py
- Fixed MPS availability check to use a local variable

Breaking Changes Reverted:
- Optimizer defaults restored to `paged_adamw_32bit` for CUDA users
- Settings builder, SFT, and QLoRA strategies all use original defaults
- MPS override in training_service handles the switch for MPS devices

Documentation Corrected:
- Removed misleading `fp16: true` recommendations (automatically disabled on MPS)
- Added M4 to README for consistency
- Removed redundant device validator
Enhancements:
- Added MPS garbage collection in chat_playground cleanup
```python
if not cuda_available and not (hasattr(torch, 'backends') and hasattr(torch.backends, 'mps') and torch.backends.mps.is_available()):
    raise RuntimeError(
        "No GPU detected. ModelForge requires either an NVIDIA GPU with CUDA "
        "or Apple Silicon with MPS support. If you have a GPU, please ensure "
        "you have the correct drivers installed."
    )
```
Bug: This fallback check re-evaluates torch.backends.mps.is_available(), but if MPS detection succeeded earlier (lines 118-136), the function already returned. This line is only reached when MPS was either unavailable or when MPS detection threw an exception (caught on line 137). In the exception case, calling torch.backends.mps.is_available() again here could throw the same exception, resulting in an unhandled error instead of the intended RuntimeError.
A simpler and more robust approach: track MPS availability in a local variable (like cuda_available) and check that variable here instead of re-calling torch.backends.mps.is_available().
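The suggested restructuring can be sketched like this; the MPS probe is injected as a callable so the detection logic stays testable without torch:

```python
def detect_gpu(cuda_available: bool, probe_mps) -> str:
    """Probe each backend once, then branch on plain booleans so a flaky
    MPS probe cannot raise a second, unhandled exception later."""
    try:
        mps_available = probe_mps()
    except Exception:
        mps_available = False  # a failed probe means MPS is unusable
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    raise RuntimeError(
        "No GPU detected. ModelForge requires either an NVIDIA GPU with "
        "CUDA or Apple Silicon with MPS support."
    )
```

In the real code, `probe_mps` would be `torch.backends.mps.is_available`; the point is that it is called exactly once.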
```diff
 gradient_accumulation_steps=config.get("gradient_accumulation_steps", 4),
 # QLoRA uses paged optimizers for memory efficiency
-optim=config.get("optim", "paged_adamw_32bit"),
+optim=config.get("optim", "adamw_torch"),
```
Breaking change for existing CUDA users: Changing the QLoRA default optimizer from paged_adamw_32bit to adamw_torch increases memory usage for CUDA QLoRA workflows. The paged optimizer from bitsandbytes is specifically designed for QLoRA's memory-efficient training and is the recommended optimizer in the HuggingFace QLoRA documentation.
A better approach would be to keep paged_adamw_32bit as the default for CUDA and only override to adamw_torch for MPS in the training_service.py MPS adjustment block (which already does this on line 198-205). This way existing CUDA users get optimal performance while MPS users get compatibility.
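A sketch of the reviewer's suggestion (hypothetical helper name; the real MPS adjustment block lives in training_service.py):

```python
def adjust_config_for_device(config: dict, device_type: str) -> dict:
    """Keep the CUDA-optimal default and swap it only on MPS."""
    adjusted = dict(config)
    # Strategy default stays paged_adamw_32bit so CUDA QLoRA keeps the
    # memory-efficient paged optimizer from bitsandbytes.
    adjusted.setdefault("optim", "paged_adamw_32bit")
    if device_type == "mps":
        # bitsandbytes paged optimizers require CUDA kernels.
        adjusted["optim"] = "adamw_torch"
    return adjusted
```

Existing CUDA users see no change, while MPS users get the compatible optimizer without touching the strategy defaults.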
```diff
 per_device_eval_batch_size=config.get("per_device_eval_batch_size", 1),
 gradient_accumulation_steps=config.get("gradient_accumulation_steps", 4),
-optim=config.get("optim", "paged_adamw_32bit"),
+optim=config.get("optim", "adamw_torch"),
```
Breaking change for existing CUDA users: Same issue as in qlora_strategy.py — changing the SFT strategy default optimizer from paged_adamw_32bit to adamw_torch increases memory usage for CUDA workflows. The paged optimizer fallback to adamw_torch is already handled for MPS in training_service.py (lines 198-205), so the default here should remain paged_adamw_32bit to preserve existing CUDA behavior.
README.md
Outdated
```markdown
### Installation
- **[Windows Installation](https://github.com/forgeopus/modelforge/tree/main/docs/installation/windows.md)** - Complete Windows setup (including WSL and Docker)
- **[Linux Installation](https://github.com/forgeopus/modelforge/tree/main/docs/installation/linux.md)** - Linux setup guide
- **[macOS (Apple Silicon) Installation](https://github.com/forgeopus/modelforge/tree/main/docs/installation/macos-mps.md)** - Setup for M1/M2/M3 Macs with MPS support
```
Inconsistency: This line says "M1/M2/M3 Macs" but the rest of the documentation (including the PR description, docs/installation/macos-mps.md line 3, and README.md line 50) consistently references "M1/M2/M3/M4". M4 should be included here for consistency.
This feedback was already addressed in commit 69ee0d5. The README.md line 111 now correctly references "M1/M2/M3/M4 Macs" for consistency with the rest of the documentation.
docs/installation/macos-mps.md
Outdated
```markdown
- **HuggingFace provider only** - Unsloth provider is not supported on MPS
- **No 4-bit/8-bit quantization** - bitsandbytes library doesn't support MPS
- **Smaller models recommended** - Unified memory architecture limits model sizes
- **FP16 precision** - Use `fp16: true` for best performance
```
Documentation is inconsistent with the code. Line 13 says "Use fp16: true for best performance," but training_service.py (line 195) explicitly sets config["fp16"] = False on MPS, because fp16 triggers Accelerate's mixed-precision mode which is CUDA-only. The model is loaded with torch_dtype=float16 natively, but the TrainingArguments fp16 flag is always forced to False.
This recommendation (and similar ones on lines 89 and 119) will confuse users who set fp16: true in their config only to have it silently overridden to False. The documentation should explain that the model runs in float16 natively on MPS, and that the fp16 training argument is not needed (and will be ignored).
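A config example consistent with the code's actual behavior would omit the precision flags entirely (an assumed shape, mirroring the JSON examples elsewhere in the docs):

```json
{
  "device": "mps",
  "use_4bit": false,
  "use_8bit": false
}
```

The model runs in float16 natively on MPS via `torch_dtype=float16`, so no `fp16`/`bf16` key is needed, and any value given there is overridden to `false`.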
```python
@field_validator("device")
@classmethod
def validate_device(cls, v):
    if v not in VALID_DEVICES:
        raise ValueError(
            f"Invalid device: {v}. Must be one of {VALID_DEVICES}"
        )
    return v
```
The device field is already typed as Literal["auto", "cuda", "mps", "cpu"] on line 25, which means Pydantic automatically validates that the value is one of those four strings. This @field_validator for device is redundant — Pydantic's Literal type already raises a ValidationError for invalid values. This doesn't cause a bug, but it's unnecessary code duplication.
```python
# Clear device cache if CUDA
if self.device == "cuda" and torch.cuda.is_available():
    torch.cuda.empty_cache()
```
Missing MPS cache cleanup: The clean_up method now correctly guards torch.cuda.empty_cache() behind a CUDA check, but doesn't add equivalent cleanup for MPS. Since MPS devices are now supported in the constructor, MPS cache should also be cleared here. Consider using the new clear_device_cache utility from device_utils.py, or add an elif self.device == "mps" branch that calls gc.collect() and torch.mps.empty_cache().
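The suggested cleanup shape can be sketched as follows. To stay runnable without torch, the device-specific calls are returned as names; the real method would invoke `torch.cuda.empty_cache()` / `torch.mps.empty_cache()` at the marked points:

```python
import gc

def cleanup_actions(device: str) -> list:
    """Decide which cache-clearing calls apply for a device."""
    actions = ["gc.collect"]
    if device == "cuda":
        actions.append("torch.cuda.empty_cache")  # CUDA-only call
    elif device == "mps":
        actions.append("torch.mps.empty_cache")   # MPS-only call
    gc.collect()  # safe on every device
    return actions
```

On CPU only garbage collection runs; each GPU backend gets its own cache-clear call, matching what `clear_device_cache` in device_utils.py does.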
docs/providers/huggingface.md
Outdated
| "device": "mps", | ||
| "use_4bit": false, | ||
| "use_8bit": false, | ||
| "fp16": true, |
Same documentation inconsistency as in macos-mps.md: "fp16": true is recommended here, but training_service.py forces fp16=False and bf16=False on MPS. The model dtype is set natively at load time (torch_dtype=float16), so this config setting is silently overridden. The documentation should clarify that the fp16 flag is not needed on MPS and will be ignored.
…cumentation

- Fix Unsloth provider to check device_type param instead of hardware availability
- Remove redundant torch import in hardware_detector.py
- Fix MPS availability check to use local variable instead of re-calling
- Revert optimizer defaults to paged_adamw_32bit for CUDA compatibility
- Add M4 to README platform support
- Remove redundant device validator (Literal already validates)
- Add MPS cache cleanup to chat_playground
- Fix documentation: remove fp16 recommendations (automatically disabled on MPS)

Co-authored-by: devgunnu <139300251+devgunnu@users.noreply.github.com>
```python
# Store device info in config for downstream use
config["device_type"] = device_type
config["device_obj"] = device_obj
```
Storing device_obj (a torch.device object) in the config dict is unnecessary — it's never read back from config["device_obj"] anywhere in the codebase. It also requires the workaround on line 461 (json.dumps({k: str(v) if not isinstance(v, (str, int, float, bool, type(None))) else v ...})) to avoid a TypeError: Object of type device is not JSON serializable. Removing this line (and just keeping device_type) would simplify the code and avoid the need for the complex serialization logic downstream.
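The simplification can be sketched as (hypothetical helper name):

```python
import json

def store_device_info(config: dict, device_type: str) -> dict:
    """Persist only the device_type string so the config stays
    JSON-serializable, making the str(v) fallback in json.dumps
    unnecessary."""
    config = dict(config)
    config["device_type"] = device_type  # plain string, not torch.device
    return config
```

Any code that needs a `torch.device` can rebuild it cheaply with `torch.device(config["device_type"])` instead of reading a stored object.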
```diff
 # Create quantization config (default False — schema provides True for CUDA;
 # MPS override above already forces False)
 quant_config = QuantizationFactory.create_config(
-    use_4bit=config.get("use_4bit", True),
+    use_4bit=config.get("use_4bit", False),
```
The fallback default for use_4bit was changed from True to False. This is inconsistent with the schema default (TrainingConfig.use_4bit = True at line 33 of training_schemas.py) and with the CLI wizard default ("use_4bit": hyperparams.get("use_4bit", True) in wizard.py:226). While the comment explains the rationale ("schema provides True for CUDA"), having the fallback disagree with the schema creates a subtle behavioral difference for any caller that passes a config dict without use_4bit. The fallback should match the schema default of True to maintain consistent behavior, since the MPS override block above already forces it to False when device_type == "mps".
```json
{
  "use_4bit": false,
  "use_8bit": false,
  "fp16": true
}
```
The troubleshooting section recommends "fp16": true for MPS when disabling quantization, but the code in training_service.py (lines 195-196) explicitly forces both fp16 and bf16 to False on MPS because Accelerate's mixed-precision mode is CUDA-only. The macos-mps.md installation guide (line 119) also states "Do not set fp16: true or bf16: true." This documentation here is contradictory and will confuse MPS users. The "fp16": true should be removed from this JSON example.
```json
{
  "use_4bit": false,
  "use_8bit": false,
  "fp16": true
}
```
Same issue as in common-issues.md: this troubleshooting example recommends "fp16": true but the code forces fp16 to False on MPS (see training_service.py lines 195-196 and _auto_detect_precision_settings). This JSON example contradicts the statement on line 119 of this same file which says "Do not set fp16: true or bf16: true - they will be ignored." The "fp16": true line should be removed from this example.
| elif self.device == "mps": | ||
| # MPS doesn't have an empty_cache() method yet, but we can help with GC | ||
| import gc | ||
| gc.collect() |
The comment "MPS doesn't have an empty_cache() method yet" is incorrect — torch.mps.empty_cache() has been available since PyTorch 2.0. This PR itself uses torch.mps.empty_cache() in device_utils.py (line 243) and huggingface_provider.py (line 105). For consistency, this cleanup method should also call torch.mps.empty_cache() after gc.collect(), and the misleading comment should be corrected.
Changing the default to False caused CUDA users to silently train in full precision when use_4bit was not explicitly set, leading to higher VRAM usage and potential OOM on consumer GPUs. MPS path already forces use_4bit=False via the device override block above.
Changes Made

Bug Fixes
- Unsloth provider now checks the `device_type` parameter instead of hardware availability, allowing Unsloth on systems with MPS when the user explicitly selects CUDA

Breaking Change Reverted
- Optimizer defaults restored to `paged_adamw_32bit` for CUDA users (the MPS override in training_service handles the switch to `adamw_torch` for MPS)

Documentation Fixes
- Removed `fp16: true` recommendations; clarified that training precision flags are automatically disabled on MPS (the model loads natively in float16)

All changes maintain backward compatibility with existing CUDA workflows while properly supporting MPS devices.
Original prompt
Add MPS support for Apple Silicon devices
1. Define a device abstraction in config
Goal: introduce `"device"` with `"auto" | "cuda" | "mps" | "cpu"` and make it flow through the training stack.

In the backend config schema (the Pydantic model used for training config, referenced in `docs/api-reference/training-config.md`), add:
`device: Literal["auto", "cuda", "mps", "cpu"] = "auto"`

In the REST API layer (the file referenced by `docs/api-reference/rest-api.md`, at the endpoint that starts a training job):
- Accept `device` in the request body and pass it into the internal training config object.
- If `device` is missing, default to `"auto"`.

Implement a device resolution helper (e.g., in a `utils/devices.py` or whatever shared utility module exists):
- `device == "cuda"`: assert `torch.cuda.is_available()`.
- `device == "mps"`: assert `torch.backends.mps.is_available()`.
- `device == "cpu"`: return `torch.device("cpu")`.
- `device == "auto"`: if `cuda` is available → `cuda`; else if `mps` is available → `mps`; else `cpu`.
- Return both a `torch.device` and a string name so you can pass it to the HF Trainer.

2. Wire device into HuggingFace provider

Goal: make the HuggingFace provider fully device-agnostic instead of CUDA-specific.

Locate the HuggingFace provider implementation (see `docs/providers/huggingface.md`; class/file names are referenced there). In the provider's `train`/`prepare_model` path:
- Replace hardcoded `device="cuda"` or `torch.device("cuda")` with the resolved device from the config helper above.
- For `Trainer`/`SFTTrainer`/TRL trainers, set `training_args.device` or `training_args.device_map` using the resolved device string: `"cuda"`, `"mps"`, or `"cpu"`.

Remove/guard direct `torch.cuda.*` calls inside the HuggingFace provider:
- Replace `torch.cuda.is_available()` checks with calls into your new device helper.
- Guard `torch.cuda.empty_cache()` with a conditional that only calls it when the resolved device type is `"cuda"`.
- `torch.cuda.amp.autocast` usage should be replaced with `torch.autocast(device_type=device_type, dtype=...)` so it works on `"cuda"` and `"cpu"`; if `mps` autocast is flaky, you can disable autocast when `device_type == "mps"`.

Bitsandbytes / quantization in the HF provider:
- Locate where bitsandbytes options are set (`load_in_4bit=True`, `bnb_4bit_compute_dtype`, etc.).
- When `device_type == "mps"`, either set `load_in_4bit=False` and use fp16/bf16 (if supported), or raise: "4-bit quantization via bitsandbytes is not supported on MPS; use 'precision: fp16' and 'device: mps'".

3. Disable Unsloth on MPS

Goal: prevent users from selecting Unsloth with MPS, since its kernels assume CUDA.

In the Unsloth provider implementation (see `docs/providers/unsloth.md` for location), in `train` (or the constructor), insert a check: on `"mps"`, raise a fast, human-readable error: "Unsloth provider is not supported on Apple MPS. Please switch provider to 'huggingface' or device to 'cuda'."

In any provider selection logic (factory/registry): if `provider == "unsloth"` and `device == "mps"` in the config, reject the config before jobs start, with HTTP 400 and the same message.

Update `docs/providers/unsloth.md` with a short "Unsupported on macOS/MPS" note in the constraints section.

4. Extend hardware detection / profiles

Goal: make "hardware-aware" logic MPS-aware, not CUDA-only.

Locate the hardware detection logic referenced under `docs/configuration/hardware-profiles.md` ("Auto-detects GPU and recommends optimal models"). In that module:
- Find where it calls `torch.cuda.is_available()` and queries CUDA properties.
- Add a branch for `torch.backends.mps.is_available()`: set `device_type = "mps"` (use `psutil.virtual_memory().total` if there's no direct MPS memory query).

In any "recommended models" logic: