Paper: Qwen3.5-27B results by joelteply · Pull Request #574 · CambrianTech/continuum

joelteply · 2026-03-28T03:05:33Z

Forged Qwen3.5-27B: perplexity 3.07 → 2.96 (+3.5%), 17 GB at 4-bit; targets MacBook Pro M1/M2/M3 with 32 GB.

Copilot

Pull request overview

Updates the Experiential Plasticity paper to include Qwen3.5-27B forging results alongside the existing Qwen3.5-4B entry, with added narrative about hardware targets and published model links.

Changes:

Adds Qwen3.5-27B baseline/final PPL and device/quantization details to the Qwen3.5 family results table.
Replaces the prior “queued” narrative with summarized findings, target hardware guidance, and Hugging Face links for both models.
Updates device annotations for the existing Qwen3.5-4B entry.

Comments suppressed due to low confidence (1)

docs/papers/EXPERIENTIAL-PLASTICITY.md:104

This section now lists two published models, but there is still a separate "Published model" entry immediately below that only links the 4B. Please remove the duplicate line or update it to match the new plural list so readers don’t miss the 27B link.

**Published models**: [continuum-ai/qwen3.5-4b-code-forged](https://huggingface.co/continuum-ai/qwen3.5-4b-code-forged) | [continuum-ai/qwen3.5-27b-code-forged](https://huggingface.co/continuum-ai/qwen3.5-27b-code-forged)

**Training configuration**: LoRA (r=16, α=32) with AMP GradScaler for fp16 stability, gradient checkpointing, 3 cycles × 1000 steps, train-then-prune ordering.

**Published model**: [continuum-ai/qwen3.5-4b-code-forged](https://huggingface.co/continuum-ai/qwen3.5-4b-code-forged)


Copilot · 2026-03-28T03:07:37Z

-| Qwen3.5-9B | 8.2B | Code | — | — | — | — | queued |
-| Qwen3.5-27B | 23.6B | Code | — | — | — | — | queued |
+| **Qwen3.5-4B** | 3.4B | Code | CodeFeedback (156K) | 3.04 | **2.31** | **+24.0%** | RTX 5090 (fp16) |
+| **Qwen3.5-27B** | 23.6B | Code | CodeFeedback (156K) | 3.07 | **2.96** | **+3.5%** | RTX 5090 (4-bit) |


The improvement percentage for Qwen3.5-27B looks inconsistent with the PPL values shown. Going from 3.07 → 2.96 is ~3.6% (0.11/3.07), not 3.5%; please align the percentage or adjust the underlying numbers/rounding rule so the table is internally consistent.
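The arithmetic behind this comment can be checked with a few lines of Python. This is a minimal sketch that assumes the improvement column is the relative perplexity reduction, (baseline − final) / baseline; the function name `ppl_improvement_pct` is illustrative and does not come from the paper or the repository.

```python
def ppl_improvement_pct(baseline: float, final: float) -> float:
    """Relative perplexity reduction, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (baseline - final) / baseline * 100.0


# (baseline PPL, final PPL) as shown in the results table of the diff.
TABLE_ROWS = {
    "Qwen3.5-4B": (3.04, 2.31),
    "Qwen3.5-27B": (3.07, 2.96),
}

if __name__ == "__main__":
    for model, (baseline, final) in TABLE_ROWS.items():
        print(f"{model}: +{ppl_improvement_pct(baseline, final):.1f}%")
```

With the two-decimal values from the table, the 4B row reproduces +24.0% and the 27B row comes out at +3.6%. The table's PPL values are themselves rounded, so a +3.5% computed from unrounded measurements is still possible; in that case the rounding rule would need to be stated.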

Copilot · 2026-03-28T03:07:37Z

+| **Qwen3.5-27B** | 23.6B | Code | CodeFeedback (156K) | 3.07 | **2.96** | **+3.5%** | RTX 5090 (4-bit) |

-**Key advance over Qwen2.5 results**: the Qwen3.5-4B improvement (+24%) exceeds the Qwen2.5-7B improvement (+14.6%) despite being a smaller model. This is because domain-specific training data (CodeFeedback: real coding Q&A) is dramatically more effective than generic text (wikitext) at driving head specialization. The heads that survive pruning are the ones that matter for **code**, not for Wikipedia — producing a model that is architecturally optimized for its target domain.
+**Key findings**: Both models improve over baseline. The 4B shows dramatic +24% improvement — domain-specific data (CodeFeedback: real coding Q&A) drives far more head specialization than generic text. The 27B improves +3.5% while running in 17GB (4-bit NF4) instead of 28GB (fp16) — better quality at 36% less VRAM. The 27B was forged with only 2 cycles before early-stopping; more cycles and continuous defrag (§8) should improve further.


The VRAM reduction percentage appears miscomputed: 17GB vs 28GB is ~39% less VRAM (11/28), not 36%. Please recalc or clarify what baseline (e.g., measured allocator peak vs model weights only) you’re comparing against.

Suggested change (original text first, proposed replacement second)

**Key findings**: Both models improve over baseline. The 4B shows dramatic +24% improvement — domain-specific data (CodeFeedback: real coding Q&A) drives far more head specialization than generic text. The 27B improves +3.5% while running in 17GB (4-bit NF4) instead of 28GB (fp16) — better quality at 36% less VRAM. The 27B was forged with only 2 cycles before early-stopping; more cycles and continuous defrag (§8) should improve further.

**Key findings**: Both models improve over baseline. The 4B shows dramatic +24% improvement — domain-specific data (CodeFeedback: real coding Q&A) drives far more head specialization than generic text. The 27B improves +3.5% while running in 17GB (4-bit NF4) instead of 28GB (fp16) — better quality at ~39% less VRAM. The 27B was forged with only 2 cycles before early-stopping; more cycles and continuous defrag (§8) should improve further.
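The reduction figure, and the question of which baseline is being compared, can be made concrete with a short calculation. This is a sketch under stated assumptions: `reduction_pct` and `weights_only_gb` are hypothetical helper names, the 23.6B value is taken from the second column of the results table and is assumed here to be a parameter count, and the weights-only estimate ignores quantization metadata, activations, and KV cache.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def weights_only_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal GB (1e9 bytes).

    Ignores quantization metadata, activations, and KV cache, so it is
    a lower bound on measured VRAM, not a prediction of it.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Figures quoted in the diff: 28 GB (fp16) vs 17 GB (4-bit NF4).
    print(f"VRAM reduction: {reduction_pct(28, 17):.1f}%")

    # Assumption: 23.6e9 is the parameter count from the table.
    n_params = 23.6e9
    print(f"fp16 weights only:  {weights_only_gb(n_params, 16):.1f} GB")
    print(f"4-bit weights only: {weights_only_gb(n_params, 4):.1f} GB")
```

The first line confirms 11/28 ≈ 39.3%. Under the parameter-count assumption, the fp16 weights-only estimate lands well above 28 GB, which is one more reason to state what each quoted figure actually measures (for example allocator peak versus weights only).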

Copilot · 2026-03-28T03:07:38Z

+
+**Published models**: [continuum-ai/qwen3.5-4b-code-forged](https://huggingface.co/continuum-ai/qwen3.5-4b-code-forged) | [continuum-ai/qwen3.5-27b-code-forged](https://huggingface.co/continuum-ai/qwen3.5-27b-code-forged)

 **Training configuration**: LoRA (r=16, α=32) with AMP GradScaler for fp16 stability, gradient checkpointing, 3 cycles × 1000 steps, train-then-prune ordering.


The text says the 27B run used "only 2 cycles" before early-stopping, but the "Training configuration" summary below states "3 cycles × 1000 steps" without qualification. Please clarify whether the config differs per model (4B vs 27B) or update the cycle count so the narrative and config are consistent.

Suggested change (original text first, proposed replacement second)

**Training configuration**: LoRA (r=16, α=32) with AMP GradScaler for fp16 stability, gradient checkpointing, 3 cycles × 1000 steps, train-then-prune ordering.

**Training configuration**: LoRA (r=16, α=32) with AMP GradScaler for fp16 stability, gradient checkpointing; Qwen3.5-4B: 3 cycles × 1000 steps; Qwen3.5-27B: early-stopped after 2 of 3 planned cycles × 1000 steps; train-then-prune ordering.
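One way to keep the narrative and the configuration summary from drifting apart is to record planned and completed cycles separately for each model. The sketch below is illustrative only: the `ForgeRun` class and its field names are hypothetical, not taken from the continuum codebase, and the values are the ones quoted in this review (3 planned cycles × 1000 steps, with the 27B stopping after 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForgeRun:
    """Per-model training summary (hypothetical structure for illustration)."""
    model: str
    planned_cycles: int
    completed_cycles: int
    steps_per_cycle: int = 1000
    lora_r: int = 16
    lora_alpha: int = 32

    def __post_init__(self) -> None:
        if not 0 <= self.completed_cycles <= self.planned_cycles:
            raise ValueError("completed_cycles must be between 0 and planned_cycles")

    @property
    def early_stopped(self) -> bool:
        return self.completed_cycles < self.planned_cycles

    def summary(self) -> str:
        text = (f"{self.model}: {self.completed_cycles} of {self.planned_cycles} "
                f"planned cycles × {self.steps_per_cycle} steps")
        return text + " (early-stopped)" if self.early_stopped else text


# Values as described in the review comment above.
RUNS = [
    ForgeRun("Qwen3.5-4B", planned_cycles=3, completed_cycles=3),
    ForgeRun("Qwen3.5-27B", planned_cycles=3, completed_cycles=2),
]

if __name__ == "__main__":
    for run in RUNS:
        print(run.summary())
```

Generating the "Training configuration" sentence from a record like this would make a per-model difference such as the 27B early stop visible by construction.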

Paper: Qwen3.5-27B forged +3.5% at 17GB — targets MacBook Pro 32GB

dbfc94e

Copilot AI review requested due to automatic review settings March 28, 2026 03:05

joelteply merged commit 7960e53 into main Mar 28, 2026

joelteply deleted the docs/27b-results branch March 28, 2026 03:05

Copilot started reviewing on behalf of joelteply March 28, 2026 03:06

github-actions Bot added the size: S label Mar 28, 2026

Copilot AI reviewed Mar 28, 2026


joelteply mentioned this pull request May 13, 2026

test(generated): ts-rs export sync ratchet (#1132 PR-2) #1137 (Merged, 3 tasks)

joelteply merged 1 commit into main from docs/27b-results

2 participants

