# Notebook Course: Training a Custom Tiny Qwen3 MoE LLM from Scratch



Welcome to this hands-on course, where we build and train a Large Language Model (LLM) from scratch, entirely on Apple Silicon, using MLX-LM and my package MLX-LM-LoRA.

This is not another “fine-tune GPT” course — we’re creating a custom MoE model architecture, pretraining it on web-scale data, finetuning it using supervised data, and optimizing it with preference learning. All of it runs locally on your Mac.

---

📦 Minimum Requirements

To run this notebook locally, you’ll need:
- An Apple Silicon Mac (M1 Pro / M2 Max / M3, etc.)
- At least 32 GB of unified memory (less can work if you scale the parameters down to fit; more is better)
- Python ≥ 3.12

---

📚 Course Overview

This course covers the entire LLM training stack:
1.	Model Design: Define a Tiny Qwen3 MoE-style model
2.	Pretraining: Unsupervised training on `mlx-community/fineweb-200k`
3.	Supervised Finetuning (SFT): Using a subset of `digitalpipelines/wizard_vicuna_70k_uncensored`
4.	Preference Optimization: With `mlx-community/Human-Like-DPO`
5.	Evaluation + Inference: Test generations and evaluation

---

🤖 Model Architecture: Tiny Qwen3 MoE

We’ll build a custom minimal MoE version of the new Qwen3. The model uses:
- Transformer decoder-only blocks
- Sparse Mixture-of-Experts layers
- Rotary embeddings (RoPE)

We aim to keep it light enough to fully train on-device.
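
To make "sparse Mixture-of-Experts" concrete, here is a minimal pure-Python sketch of routing a single token: a router scores every expert, only the top-k experts run, and their outputs are combined using the routing weights. It is an illustration only, not the MLX implementation; the toy experts and router scores are made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_logits, top_k=1, norm_topk_prob=True):
    """Pick the top_k experts for one token and return (indices, weights)."""
    probs = softmax(gate_logits)
    # Expert indices sorted by routing probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    weights = [probs[i] for i in chosen]
    if norm_topk_prob:
        # Renormalize so the selected experts' weights sum to 1.
        s = sum(weights)
        weights = [w / s for w in weights]
    return chosen, weights

def moe_layer(x, gate_logits, experts, top_k=1):
    """Weighted sum of the selected experts' outputs for a single token."""
    chosen, weights = route_token(gate_logits, top_k=top_k)
    return sum(w * experts[i](x) for i, w in zip(chosen, weights))

# Four toy "experts": each just scales its input (hypothetical stand-ins for MLPs).
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 0.5]   # made-up router scores for one token

chosen, weights = route_token(gate_logits, top_k=1)
output = moe_layer(5.0, gate_logits, experts, top_k=1)
print(chosen, weights, output)
```

With one active expert per token, only a quarter of the expert parameters are used for any given token, which is what keeps the compute per token low.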

---

🧪 Pretraining Dataset: mlx-community/fineweb-200k

We’ll use a subset of the FineWeb dataset hosted on Hugging Face:
mlx-community/fineweb-200k

It is:
- Cleaned, deduplicated web content
- Ideal for unsupervised autoregressive training
- Efficiently streamed on Apple Silicon

---

🧾 Prompt Templates

We’ll use a custom prompt template to go with Qwen3.
For finetuning, we adopt the following ChatML-style template:

```text
<|im_start|>system description
This is a conversation between Josie, a helpful AI assistant and a human user.<|im_end|>
<|im_start|>user turn
{prompt}<|im_end|>
<|im_start|>assistant turn, name = 'Josie'
{answer}<|im_end|>
```

This helps align with conversational-tuned models and enables easy preference optimization later.

---

🧠 SFT Dataset: digitalpipelines/wizard_vicuna_70k_uncensored

We’ll use the Wizard Vicuna dataset, known for:
- Rich instruction-following data
- Clean conversational structure
- Multi-turn conversations
- Aligned with assistant-style prompting

This stage teaches the model what a conversation looks like.

---

❤️ Preference Optimization: mlx-community/Human-Like-DPO

To refine the model’s behavior further, we’ll apply Odds Ratio Preference Optimization (ORPO), a monolithic method that needs no separate reference model, using:

mlx-community/Human-Like-DPO

This dataset includes:
- Ranked preference pairs
- Human-like instructions and completions
- Fine-grained reward signal for better alignment

---

🧰 Tools Used
- MLX-LM: Apple-native LLM framework leveraging MLX runtime
- MLX-LM-LoRA: My package for full-precision and LoRA training, supporting DPO, ORPO, GRPO and more
- Hugging Face Datasets + Transformers for data loading and formatting

---

🏁 By The End Of This Course…

You’ll have:
- Designed and built a custom Qwen3 MoE model
- Pretrained it on web-scale text
- Supervised-finetuned it on instruction data
- Aligned it using preference optimization
- Deployed and evaluated it entirely on your Mac

Let’s get started! 🚀

# ⚙️ Installation: Set Up the Environment

Before we begin training, let’s install all necessary dependencies.

We’ll use:
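- `mlx-lm-lora` – pulls in `mlx-lm` and `datasets`
- `mlx-optimizers` – provides the Muon optimizer imported below
- `huggingface_hub` – for uploading our models
- `wandb` and `hf_xet` – optional experiment logging and Xet-backed Hub transfers

In [None]:
!pip install mlx-lm-lora mlx-optimizers huggingface_hub wandb hf_xet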

# 📦 Imports and Setup

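Before building and training our Qwen3-style MoE model, we import everything needed:
- Core Python utilities (os, math, Path, dataclass)
- MLX-LM-LoRA trainers and dataset wrappers for SFT and ORPO
- Hugging Face tools for loading datasets and uploading to the Hub
- The MLX-LM model class, tokenizer loader, and save utilities
- Optimizers (AdamW from MLX, Muon from mlx_optimizers)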

In [1]:
from mlx_lm_lora.trainer.datasets import CacheDataset, ORPODataset, TextDataset
from mlx_lm_lora.trainer.orpo_trainer import ORPOTrainingArgs, train_orpo
from mlx_lm_lora.trainer.sft_trainer import SFTTrainingArgs, train_sft

from datasets import load_dataset
from huggingface_hub import create_repo, HfApi

from mlx_lm.utils import load_model, load_tokenizer, save_config, save_model
from mlx_lm.models.qwen3_moe import Model, BaseModelArgs
from mlx_lm.tuner.utils import get_total_parameters
from mlx_lm.tuner.callbacks import TrainingCallback
from mlx_lm import generate

import mlx.optimizers as optim
from mlx_optimizers import Muon

from dataclasses import dataclass, field
from pathlib import Path
import math
import os

def calculate_iters(train_set, batch_size, epochs) -> int:
    num_samples = len(train_set)
    batches_per_epoch = math.ceil(num_samples / batch_size)
    iters = epochs * batches_per_epoch
    print(f"[INFO] Calculated {iters} iterations from {epochs} epochs (dataset size: {num_samples}, batch size: {batch_size})")
    return iters
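
As a quick sanity check of this arithmetic, here is a standalone sketch using the pretraining values from later in the notebook (2000 samples, batch size 8, 16 epochs). It assumes the 1% validation split holds out 20 samples and that the length of the wrapped dataset equals the number of remaining samples.

```python
import math

def iters_for(num_samples, batch_size, epochs):
    """Same arithmetic as calculate_iters, but on a plain sample count."""
    return epochs * math.ceil(num_samples / batch_size)

# Assumption: 2000 selected samples minus a 1% validation split leaves 1980 for training.
train_samples = 2000 - 20
pretrain_iters = iters_for(train_samples, batch_size=8, epochs=16)
print(pretrain_iters)
```

Because of the `ceil`, the last, partially filled batch of each epoch still counts as one iteration.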



# 📌 Setup: Define HF Token and Model Metadata

Before training or uploading, we set up:
- Your Hugging Face token for authentication
- The model name (used for saving)
- The user or organization name
- The local path where everything will be stored

In [2]:
hf_token = "" # <-- Add your HF token here

new_model_name = "qwen3_tiny_moe"
user_name = "mlx-community"
author = "Gökdeniz Gülmez"

folder = "/Users/gokdenizgulmez/Desktop/mlx-lm-lora/examples/"

pretraining_dataset_name = "mlx-community/fineweb-200k"
pretraining_dataset_samples = 2000

finetuning_dataset_name = "digitalpipelines/wizard_vicuna_70k_uncensored"
finetuning_dataset_samples = 1000

preference_dataset_name = "mlx-community/Human-Like-DPO"
preference_dataset_samples = 100

# 📁 Prepare Target Directory for Model & Tokenizer

We define the full target path for saving the model and tokenizer.

If it doesn’t already exist, we:
- Clone the custom tokenizer repo from Hugging Face
- Place it inside the model folder

Otherwise, we skip cloning.

In [3]:
target_dir = os.path.join(folder, new_model_name)
if not os.path.exists(target_dir):
    !git clone https://huggingface.co/Goekdeniz-Guelmez/qwen3_tokenizer "{target_dir}"
else:
    print(f"Tokenizer already exists at: {target_dir}")

Tokenizer already exists at: /Users/gokdenizgulmez/Desktop/mlx-lm-lora/examples/qwen3_tiny_moe


# 🧱 Define Model Configuration

We subclass the BaseModelArgs to create a tiny Qwen3 MoE architecture:
- Shallow network with only 2 layers
- Small hidden size (128) for fast local training
- 4 experts, 1 active per token (simple MoE routing)
- Tokenizer and embedding sizes that match Qwen3

This keeps the model small enough to:
- Fit on local Apple Silicon
- Still support pretraining and ORPO finetuning

In [4]:
@dataclass
class NewModelArgs(BaseModelArgs):
    mlp_only_layers: list[int] = field(default_factory=list)  # Default: no MLP-only layers

    model_type: str = "qwen3_moe"
    
    hidden_size: int = 128                   # Tiny, but enough for basic functionality
    num_hidden_layers: int = 2               # Very shallow
    intermediate_size: int = 256             # 2x hidden size
    moe_intermediate_size: int = 256         # Per-expert MLP width, same as above
    num_attention_heads: int = 2             # Must divide hidden_size
    num_key_value_heads: int = 1             # Often set to 1 for tiny models
    head_dim: int = field(init=False)        # Will be computed post-init
    num_experts: int = 4                     # Small MoE with minimal experts
    num_experts_per_tok: int = 1             # One expert per token
    decoder_sparse_step: int = 1             # Every layer is an MoE layer

    rms_norm_eps: float = 1e-6               # Standard value
    vocab_size: int = 151936                 # Matches Qwen3 tokenizer size
    rope_theta: float = 1000.0               # Small base for short contexts (released Qwen3 configs use 1e6)

    tie_word_embeddings: bool = True         # Save params, good default
    max_position_embeddings: int = 1028      # Short context window to keep memory low
    norm_topk_prob: bool = True              # Renormalize the top-k routing weights to sum to 1

    def __post_init__(self):
        self.head_dim = self.hidden_size // self.num_attention_heads  # Auto-calculated

args = NewModelArgs()
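
To get a feel for where the parameters go, here is a hand count based on the configuration above. It is a back-of-the-envelope sketch: it assumes bias-free linear layers, three projections per expert (gate, up, down), and one final RMSNorm, so it may differ slightly from what `get_total_parameters` reports.

```python
# Hand count of the parameters implied by the configuration above.
hidden, layers, vocab = 128, 2, 151936
heads, kv_heads = 2, 1
head_dim = hidden // heads                      # 64
experts, moe_inter = 4, 256

embed = vocab * hidden                          # shared with the output head (tied)
attn = (hidden * heads * head_dim               # q_proj
        + 2 * hidden * kv_heads * head_dim      # k_proj + v_proj
        + heads * head_dim * hidden             # o_proj
        + 2 * head_dim)                         # q_norm + k_norm
router = hidden * experts                       # gate
expert_mlps = experts * 3 * hidden * moe_inter  # gate_proj, up_proj, down_proj per expert
norms = 2 * hidden                              # input + post-attention RMSNorm

per_layer = attn + router + expert_mlps + norms
total = embed + layers * per_layer + hidden     # + final RMSNorm (assumed)
print(f"embeddings: {embed:,}  per layer: {per_layer:,}  total: {total:,}")
```

With a vocabulary this large and a hidden size this small, the embedding table makes up the vast majority of the model; the two decoder layers together contribute fewer than one million parameters.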

# 🧪 Initialize the Model

We instantiate the model using the configuration from the previous cell. This builds the full architecture in memory, ready for pretraining.

In [5]:
model = Model(args)

# 🔍 Inspect the Model

We print:
- The full architecture and parameter count

In [6]:
print(model)
print(f"{get_total_parameters(model) / 1e6:.3f}M total parameters.")

Model(
  (model): Qwen3MoeModel(
    (embed_tokens): Embedding(151936, 128)
    (layers.0): Qwen3MoeDecoderLayer(
      (self_attn): Attention(
        (q_proj): Linear(input_dims=128, output_dims=128, bias=False)
        (k_proj): Linear(input_dims=128, output_dims=64, bias=False)
        (v_proj): Linear(input_dims=128, output_dims=64, bias=False)
        (o_proj): Linear(input_dims=128, output_dims=128, bias=False)
        (q_norm): RMSNorm(64, eps=1e-06)
        (k_norm): RMSNorm(64, eps=1e-06)
        (rope): RoPE(64, traditional=False)
      )
      (input_layernorm): RMSNorm(128, eps=1e-06)
      (post_attention_layernorm): RMSNorm(128, eps=1e-06)
      (mlp): Qwen3MoeSparseMoeBlock(
        (gate): Linear(input_dims=128, output_dims=4, bias=False)
        (switch_mlp): SwitchGLU(
          (gate_proj): SwitchLinear()
          (up_proj): SwitchLinear()
          (down_proj): SwitchLinear()
          (activation): SiLU()
        )
      )
    )
    (layers.1): Qwen3MoeDecoderLay

# 💾 Save Model & Config to Disk

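We store:
- The full args configuration as config.json

The weights are not written at this point; save_model(...) is called at the end of the notebook, after all three training stages. We also load the tokenizer from the target folder.

This gives us a reloadable starting point for training.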

In [7]:
save_config(vars(args), f"{target_dir}/config.json")
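
The idea behind saving the configuration is a simple round trip: dataclass to dictionary to JSON, and back again. The sketch below shows that round trip with the standard library only. `TinyArgs` is a hypothetical stand-in for `NewModelArgs`, and `save_config` itself may do extra cleanup that is not reproduced here.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TinyArgs:                      # hypothetical stand-in for NewModelArgs
    model_type: str = "qwen3_moe"
    hidden_size: int = 128
    num_hidden_layers: int = 2

config_text = json.dumps(asdict(TinyArgs()), indent=4)   # what would go into config.json
restored = TinyArgs(**json.loads(config_text))           # rebuilding the args later
print(config_text)
```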

In [8]:
tokenizer = load_tokenizer(Path(target_dir))

# ------------------------------------------- Start Pretraining -------------------------------------------

# ⚙️ Initialize Optimizer (Muon)

We initialize our optimizer — in this case, Muon, a performant optimizer for LLM pretraining. We use a small learning rate to ensure stable convergence during early training.

In [9]:
pretraining_opt = Muon(learning_rate=1e-5)

# 📄 Load Pretraining Dataset

Now we load the unsupervised pretraining dataset. We’re using a cleaned, deduplicated web corpus called FineWeb, hosted on Hugging Face.

We also support subsampling the dataset (e.g., for testing), and we split it into training and validation sets for tracking generalization performance.

In [10]:
pretraining_dataset = load_dataset(pretraining_dataset_name)["train"]

if pretraining_dataset_samples is not None:
    pretraining_dataset = pretraining_dataset.select(range(pretraining_dataset_samples))

pretraining_train_dataset, pretraining_valid_dataset = pretraining_dataset.train_test_split(test_size=0.01, seed=42).values()

# 📦 Format Dataset for Tokenized Training

We wrap the raw dataset into TextDataset objects, which tokenize the samples and prepare them for training. Each sample is a single long-form text entry, streamed and tokenized efficiently on-device.

In [11]:
pretraining_train_set = TextDataset(pretraining_train_dataset, tokenizer, text_key='text')
pretraining_valid_set = TextDataset(pretraining_valid_dataset, tokenizer, text_key='text')
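
Autoregressive pretraining boils down to predicting each token from the ones before it. The sketch below shows the usual input/target shift on a made-up list of token IDs; it illustrates the objective and is not the code `TextDataset` or the trainer runs internally.

```python
def next_token_pairs(token_ids):
    """Split one tokenized sample into (inputs, targets) shifted by one position."""
    return token_ids[:-1], token_ids[1:]

# Made-up token IDs standing in for one tokenized FineWeb sample.
sample = [101, 7, 42, 42, 9, 2]
inputs, targets = next_token_pairs(sample)
for x, y in zip(inputs, targets):
    print(f"after token {x} -> predict {y}")
```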

# 🧪 Test Model Before Training

Let’s try generating a basic output with our untrained model. This helps verify that the architecture is wired up correctly before starting pretraining.

Of course, the output will be gibberish at this point — the model hasn’t learned anything yet!

In [12]:
generate(
    model=model,
    tokenizer=tokenizer,
    prompt="Hello"
)



# 🚀 Begin Pretraining

Now we start pretraining the model using the unsupervised dataset.

Key hyperparameters:
- Batch size and number of epochs
- Number of training iterations
- Evaluation and checkpoint saving intervals
- Gradient checkpointing (to reduce memory usage)

In [15]:
batch_size = 8
epochs = 16

train_sft(
    model=model,
    args=SFTTrainingArgs(
        batch_size=batch_size,
        iters=calculate_iters(train_set=pretraining_train_set, batch_size=batch_size, epochs=epochs),
        val_batches=1,
        steps_per_report=100,
        steps_per_eval=1000,
        steps_per_save=1000,
        max_seq_length=model.args.max_position_embeddings,
        grad_checkpoint=True,
        adapter_file=Path(target_dir) / "pretrain.safetensors",
    ),
    optimizer=pretraining_opt,
    train_dataset=CacheDataset(pretraining_train_set),
    val_dataset=CacheDataset(pretraining_valid_set),
    training_callback=TrainingCallback()
)

Starting training..., iters: 1


Training:   0%|          | 0/1 [00:00<?, ?it/s]

Iter 1: Val loss 12.094, Val took 0.330s


Training: 100%|██████████| 1/1 [00:02<00:00,  2.03s/it, loss=12.109, it/s=61.965]


Iter 1: loss 12.109, lr 1.000e-05, it/s 61.965, tok/s 2221.448, trained_tok 3585, peak_mem 16.992GB
Saved final weights to /Users/gokdenizgulmez/Desktop/mlx-lm-lora/examples/qwen3_tiny_moe/pretrain.safetensors.
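
A useful sanity check for the first logged loss: a model that guesses uniformly over the vocabulary has a cross-entropy of ln(vocab_size). The short calculation below uses the vocabulary size from our config.

```python
import math

vocab_size = 151936
uniform_loss = math.log(vocab_size)   # cross-entropy of a uniform guess, in nats
print(f"{uniform_loss:.3f}")
```

The logged values of 12.094 (validation) and 12.109 (training) sit close to this number, which is what we expect from a randomly initialized model.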





# ✅ Generate After Pretraining

Once training is done, we generate another output using the now pretrained model. This should show noticeable improvements over the gibberish from earlier.

From here, we’ll move on to supervised finetuning using instruction-following data.

In [None]:
generate(
    model=model,
    tokenizer=tokenizer,
    prompt="Hello, "
)

# ------------------------------------------- Start Fine-Tuning -------------------------------------------

# 📥 Load Pretrained Model & Define Sequence Length

We begin by loading the pretrained model weights from the pretraining stage. This gives us a language model with a solid foundation in general web-text patterns.

# 💬 Define Prompt Format

To make our model follow instructions in a conversational way, we define a structured prompt format based on ChatML-style tokens used by Qwen3:
- Each turn in the conversation is clearly labeled (system, user, assistant)
- We also include special formatting tokens and the assistant name “Josie”
- Every turn, including the system prompt, ends with the tokenizer's EOS token

This format is essential for making the model align well with multi-turn dialogue patterns.

# 📚 Load and Format Finetuning Dataset

We load the supervised finetuning dataset — Wizard Vicuna — which contains rich, multi-turn conversations.

We then apply a formatting function that:
- Converts each conversation into a full prompt using the template
- Adds EOS tokens after every message
- Builds up a structured context that can be fed directly into the model

Finally, we split the dataset into training and validation sets for model evaluation.

In [None]:
system_prompt = """This is a conversation between Josie, a helpful AI assistant and a human user."""

EOS_TOKEN = tokenizer.eos_token

full_prompt_format = """<|im_start|>system description
{}<|im_end|>
<|im_start|>user turn
{}<|im_end|>
<|im_start|>assistant turn, name = 'Josie'
"""

system_turn = """<|im_start|>system description
{}"""

user_turn = """
<|im_start|>user turn
{}"""

assistant_turn = """
<|im_start|>assistant turn, name = 'Josie'
{}"""

def format_prompts_func(sample):
    this_conversation = sample["conversations"]

    # Always start from the system turn so `conversation` is defined
    # even when a sample has no usable list of turns.
    conversation = system_turn.format(system_prompt + EOS_TOKEN)
    if isinstance(this_conversation, list):
        for turn in this_conversation:
            if turn["from"] == "human":
                conversation += user_turn.format(turn["value"] + EOS_TOKEN)
            elif turn["from"] == "gpt":
                conversation += assistant_turn.format(turn["value"] + EOS_TOKEN)

    sample["text"] = conversation
    return sample

finetuning_dataset = load_dataset(finetuning_dataset_name)["train"]

if finetuning_dataset_samples is not None:
    finetuning_dataset = finetuning_dataset.select(range(finetuning_dataset_samples))

finetuning_dataset = finetuning_dataset.map(format_prompts_func,)
finetuning_train_dataset, finetuning_valid_dataset = finetuning_dataset.train_test_split(test_size=0.01, seed=42).values()
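
To see what one formatted training sample looks like, here is a standalone sketch that renders a toy two-turn conversation. The `EOS` value is a placeholder assumption (the real one comes from `tokenizer.eos_token`), and the toy conversation and short system prompt are made up.

```python
EOS = "<|im_end|>"  # assumption: placeholder for tokenizer.eos_token

def render(conversation, system_prompt):
    """Build the training text for one Wizard-Vicuna-style sample."""
    text = f"<|im_start|>system description\n{system_prompt}{EOS}"
    for turn in conversation:
        if turn["from"] == "human":
            text += f"\n<|im_start|>user turn\n{turn['value']}{EOS}"
        elif turn["from"] == "gpt":
            text += f"\n<|im_start|>assistant turn, name = 'Josie'\n{turn['value']}{EOS}"
    return text

toy = [
    {"from": "human", "value": "Hi!"},
    {"from": "gpt", "value": "Hello, how can I help?"},
]
rendered = render(toy, "You are Josie.")
print(rendered)
```

Printing a sample like this before training is a cheap way to catch template mistakes such as missing newlines or missing end-of-turn tokens.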

In [17]:
finetuning_train_set = TextDataset(finetuning_train_dataset, tokenizer, text_key='text')
finetuning_valid_set = TextDataset(finetuning_valid_dataset, tokenizer, text_key='text')

# ⚙️ Initialize Finetuning Optimizer

We use the standard AdamW optimizer for supervised finetuning, with the same small learning rate (1e-5) we used for pretraining.

In [18]:
finetuning_opt = optim.AdamW(learning_rate=1e-5)

# 🤖 Generate Before Finetuning

Just like with pretraining, we generate an output from the model before starting finetuning. This gives us a baseline for comparison and ensures that formatting is working correctly.

At this stage, the model understands language but isn’t yet great at instruction-following.

In [None]:
generate(
    model=model,
    tokenizer=tokenizer,
    prompt=full_prompt_format.format(system_prompt + EOS_TOKEN, "Hello, how are you?" + EOS_TOKEN)
)

# 🧠 Train with Supervised Instruction Data

Now we begin the finetuning process!

Using train_sft, we fine-tune the model on instructional conversations using:
- A smaller batch size
- Shorter training schedule (6 epochs)
- Frequent evaluation and saving

This helps the model learn how to answer direct questions, follow instructions, and respond conversationally.

In [None]:
batch_size = 4
epochs = 6

train_sft(
    model=model,
    args=SFTTrainingArgs(
        batch_size=batch_size,
        iters=calculate_iters(train_set=finetuning_train_set, batch_size=batch_size, epochs=epochs),
        val_batches=1,
        steps_per_report=50,
        steps_per_eval=100,
        steps_per_save=100,
        max_seq_length=512,
        grad_checkpoint=True,
        adapter_file=Path(target_dir) / "finetune.safetensors",
    ),
    optimizer=finetuning_opt,
    train_dataset=CacheDataset(finetuning_train_set),
    val_dataset=CacheDataset(finetuning_valid_set),
    training_callback=TrainingCallback()
)

# ✅ Generate After Finetuning

Finally, we generate again after training to see how the model improved. You should now notice:
- More human and aligned responses
- Better task-following ability
- Less randomness compared to pretraining-only output

From here, we’re ready to further refine the model with preference optimization, teaching it not just what to say — but how to say it better.

In [None]:
generate(
    model=model,
    tokenizer=tokenizer,
    prompt=full_prompt_format.format(system_prompt + EOS_TOKEN, "Hello, how are you?" + EOS_TOKEN)
)

# ------------------------------------------- Optimize for Preference -------------------------------------------

# 🗂️ Load and Format Preference Dataset

We load the preference dataset, `mlx-community/Human-Like-DPO`, which pairs each prompt with a preferred and a rejected response.

Each entry is transformed into:
- A chosen version: the preferred assistant response
- A rejected version: the less helpful or lower-quality completion

This dual-format is essential for contrastive training using ORPO.

We split the dataset into training and validation subsets to monitor generalization.
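
The key property of a preference pair is that both texts share exactly the same prompt prefix and differ only in the response. The sketch below builds such a pair from made-up strings; `EOS` is a placeholder assumption for `tokenizer.eos_token`, and how `ORPODataset` consumes these fields internally is not shown here.

```python
import os

EOS = "<|im_end|>"  # assumption: placeholder for tokenizer.eos_token

def build_pair(system_prompt, prompt, chosen, rejected):
    """Return (chosen_text, rejected_text) sharing the same prompt prefix."""
    prefix = (
        f"<|im_start|>system description\n{system_prompt}{EOS}\n"
        f"<|im_start|>user turn\n{prompt}{EOS}\n"
        f"<|im_start|>assistant turn, name = 'Josie'\n"
    )
    return prefix + chosen + EOS, prefix + rejected + EOS

chosen_text, rejected_text = build_pair(
    "You are Josie.", "How was your day?",
    "Pretty good, thanks for asking! How about yours?",   # made-up preferred reply
    "I am an AI and do not have days.",                    # made-up rejected reply
)
# Everything up to the assistant header is identical in both texts.
common = os.path.commonprefix([chosen_text, rejected_text])
print(common)
```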

In [None]:
system_prompt = """This is a conversation between Josie, a helpful AI assistant and a human user."""

def format_prompts_func(sample):
    prompt = sample["prompt"]
    chosen = sample["chosen"]
    rejected = sample["rejected"]

    # full_prompt_format has only two placeholders (system, user) and ends with the
    # assistant header, so a third argument to .format() would be silently ignored.
    # The response is therefore appended after formatting.
    prefix = full_prompt_format.format(system_prompt, prompt)

    sample["chosen"] = prefix + chosen + EOS_TOKEN
    sample["rejected"] = prefix + rejected + EOS_TOKEN
    return sample

preference_dataset = load_dataset(preference_dataset_name)["train"]

if preference_dataset_samples is not None:
    preference_dataset = preference_dataset.select(range(preference_dataset_samples))

preference_dataset = preference_dataset.map(format_prompts_func,)
preference_train_dataset, preference_valid_dataset = preference_dataset.train_test_split(test_size=0.01, seed=42).values()

In [24]:
preference_train_set = ORPODataset(preference_train_dataset, tokenizer)
preference_valid_set = ORPODataset(preference_valid_dataset, tokenizer)

# ⚙️ Set Up Preference Optimizer

We initialize the optimizer — again using AdamW with a modest learning rate, suitable for low-noise fine-tuning.

In [25]:
preference_opt = optim.AdamW(learning_rate=1e-5)

# 🤖 Generate Before Preference Optimization

Before we start optimizing with ORPO, we generate an example using the current finetuned model.

This helps us evaluate how the model behaves before it has been trained to prefer better completions.

In [None]:
generate(
    model=model,
    tokenizer=tokenizer,
    prompt=full_prompt_format.format(system_prompt + EOS_TOKEN, "Hello, how are you?" + EOS_TOKEN)
)

# 🧠 Train with ORPO: Reward-Based Fine-Tuning

Now we train the model using train_orpo, a contrastive objective where the model is encouraged to:
- Increase the likelihood of the chosen response
- Decrease the likelihood of the rejected one
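
The two goals above can be written as one loss. The sketch below follows the formulation in the ORPO paper: a standard negative log-likelihood on the chosen response plus a weighted odds-ratio penalty. It runs on made-up average log-probabilities, treats `beta` as the weight of the penalty term (an assumption), and does not reproduce the exact implementation or the `reward_scaling` option in mlx-lm-lora.

```python
import math

def log_odds(avg_logp):
    """log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def odds_ratio_loss(chosen_logp, rejected_logp):
    """-log(sigmoid(log-odds(chosen) - log-odds(rejected)))."""
    diff = log_odds(chosen_logp) - log_odds(rejected_logp)
    return math.log1p(math.exp(-diff))   # equals -log(sigmoid(diff))

def orpo_loss(chosen_logp, rejected_logp, beta=0.1):
    """NLL on the chosen response plus beta times the odds-ratio penalty."""
    return -chosen_logp + beta * odds_ratio_loss(chosen_logp, rejected_logp)

# Made-up average per-token log-probabilities for one preference pair.
good = orpo_loss(chosen_logp=-0.5, rejected_logp=-2.0)   # model already prefers chosen
bad = orpo_loss(chosen_logp=-2.0, rejected_logp=-0.5)    # model prefers rejected
tie = odds_ratio_loss(-1.0, -1.0)                        # no preference either way
print(good, bad, tie)
```

When the model has no preference, the penalty equals ln 2; it shrinks toward zero as the chosen response becomes more likely than the rejected one, and grows when the ordering is wrong.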

We use:
- Small batch size and modest epoch count
- Frequent evaluation and saving
- Hyperparameters like beta (the weight on the preference term) and reward_scaling to balance reward magnitudes

The weights trained in this step are written to the path given by adapter_file (preference.safetensors), separate from the pretraining and SFT checkpoints.

In [None]:
batch_size = 4
epochs = 4

train_orpo(
    model=model,
    args=ORPOTrainingArgs(
        batch_size=batch_size,
        iters=calculate_iters(train_set=preference_train_set, batch_size=batch_size, epochs=epochs),
        val_batches=1,
        steps_per_report=20,
        steps_per_eval=50,
        steps_per_save=50,
        max_seq_length=512,
        grad_checkpoint=True,
        beta=0.1,
        reward_scaling=0.6,
        adapter_file=Path(target_dir) / "preference.safetensors",
    ),
    optimizer=preference_opt,
    train_dataset=CacheDataset(preference_train_set),
    val_dataset=CacheDataset(preference_valid_set),
    training_callback=TrainingCallback()
)

# ✅ Generate After Preference Optimization

After training, we generate again to compare results. The model should now:
- Prefer clearer, more helpful responses
- Avoid overly long or repetitive completions
- Show improved alignment with user intent

This final stage wraps up the instruction-tuning pipeline, producing a model that’s not only capable of following instructions but also knows how to respond helpfully.

In [None]:
generate(
    model=model,
    tokenizer=tokenizer,
    prompt=full_prompt_format.format(system_prompt + EOS_TOKEN, "Hello, how are you?" + EOS_TOKEN)
)

# 💾 Save Final Model Artifacts

Now that our model has gone through pretraining → SFT → ORPO, we save all important artifacts:
- config.json: The final ModelArgs, useful for reloading the model later.
- tokenizer.model / tokenizer.json: Your tokenizer for inference or future fine-tuning.
- Adapter weights from SFT and ORPO (if applicable).
- A final README.md: Describes what the model is, how it was trained, and how to use it.

This makes it ready for:
- Upload to 🤗 Hugging Face Hub
- Local inference and evaluation
- Further fine-tuning or preference optimization

In [30]:
readme_file = f"""---
tags:
- mlx
- text-generation
pipeline_tag: text-generation
---

# Large Language Model `{user_name}/{new_model_name}`

This model was developed end-to-end on Apple Silicon using the [`mlx-lm-lora`](https://github.com/Goekdeniz-Guelmez/mlx-lm-lora) package, part of the **Creating LLMs from Scratch** course by Gökdeniz Gülmez.

---

## 📖 About This Course

This model is the result of a hands-on, full-stack training series on Apple Silicon covering:

- Custom Qwen3 MoE model creation
- Large-scale pretraining with web data
- Supervised fine-tuning on rich conversational data
- Advanced preference alignment via ORPO
- Efficient local training and inference on Mac

For full details, check out the official notebook:
[Creating LLMs from Scratch — Qwen3 MoE from Scratch](https://github.com/Goekdeniz-Guelmez/mlx-lm-lora/blob/main/examples/qwen3_moe_from_scratch.ipynb)

Also visit the [`mlx-lm-lora` GitHub repository](https://github.com/Goekdeniz-Guelmez/mlx-lm-lora) for more tools and resources.

---

## 🛠 Training Pipeline Overview

1. **Pretraining**
   Dataset: `{pretraining_dataset_name}` ({pretraining_dataset_samples or 'full dataset'})  
   Description: Unsupervised training on cleaned, deduplicated web-scale data from FineWeb-200k.

2. **Supervised Fine-Tuning (SFT)**
   Dataset: `{finetuning_dataset_name}` ({finetuning_dataset_samples or 'full dataset'})  
   Description: Instruction-following, conversational data from Wizard Vicuna 70k uncensored subset.

3. **Preference Optimization**
   Dataset: `{preference_dataset_name}` ({preference_dataset_samples or 'full dataset'})  
   Description: Ranked prompt-completion pairs from Human-Like DPO dataset to align with human preferences.

---

## 📚 About the Model

| Field                 | Value                                                                                 |
|-----------------------|---------------------------------------------------------------------------------------|
| **Model Name**        | `{new_model_name}`                                                        |
| **Architecture**      | {model.args.model_type}                                                    |
| **Alignment Method**  | Odds Ratio Preference Optimization (ORPO)                                            |
| **Training Framework**| [`mlx-lm-lora`](https://github.com/Goekdeniz-Guelmez/mlx-lm-lora) on Apple Silicon   |
| **Author**            | Gökdeniz Gülmez                                                                      |

---

## 📦 Usage Example (Python)

```python
from mlx_lm import load, generate

model, tokenizer = load("{user_name}/{new_model_name}")

print(generate(model, tokenizer, "What is the meaning of life?"))
```
"""

new_readme_path = f"{target_dir}/README.md"
with open(new_readme_path, "w") as new_readme_file:
    new_readme_file.write(readme_file)

save_model(target_dir, model)

In [None]:
api = HfApi(token=hf_token)
create_repo(
  repo_id=f"{user_name}/{new_model_name}",
  repo_type="model",
  exist_ok=True,
  token=hf_token,
  private=True
)
api.upload_folder(
  folder_path=target_dir,
  repo_id=f"{user_name}/{new_model_name}",
  token=hf_token,
  commit_message="Initial Commit"
)

# 🫶 Thanks

You’ve made it! 🎉

By the end of this notebook, you’ve:
- Pretrained a foundation model from scratch 💪
- Fine-tuned it on assistant-style instructions ✍️
- Aligned it using ranked preferences and ORPO ❤️‍🔥
- Saved, organized, and prepped your model for deployment or sharing 🗃️

Thank you for walking through this course — and most importantly, for building helpful AI responsibly.
Josie (and I!) are proud of you. Until next time!

Gökdeniz Gülmez 👋