<h1 align='center'>Synthetic Data Generation and Unsloth Tutorial</h1>

## 📚 Table of Contents:

- [Synthetic Data Kit: Data Generation](#synthetic-data-generation)
- [Unsloth: Fine-Tuning and saving the model](#fine-tuning)

## Synthetic Data Generation

In this section, we use the `synthetic-data-kit` CLI to generate a dataset from source documents.

### Testing Synthetic Data Kit Command

Before running the cells below, make sure vLLM is serving the model. Open a terminal and run `vllm serve Unsloth/Llama-3.3-70B-Instruct --port 8001 --max-model-len 48000 --gpu-memory-utilization 0.85`.

In [1]:
import os

In [2]:
os.chdir("/app/projects/Unsloth-AMD-Fine-Tuning-Synthetic-Data")
!pwd

/app/projects/Unsloth-AMD-Fine-Tuning-Synthetic-Data


In [15]:
!synthetic-data-kit --help

Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint

 Usage: synthetic-data-kit [OPTIONS] COMMAND [ARGS]...

 A toolkit for preparing synthetic datasets for fine-tuning LLMs

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --config              -c      PATH  Path to configuration file               │
│

### Exploring Synthetic Data Kit CLI

This command displays the help menu for the `synthetic-data-kit` CLI tool, showing available commands:
- **system-check**: Verify LLM provider server is running
- **ingest**: Parse documents (PDF, HTML, YouTube, etc.) into clean text
- **create**: Generate synthetic content (Q&A pairs, instructions, etc.) using LLM
- **curate**: Filter and clean generated content based on quality scores
- **save-as**: Convert data to different formats (fine-tuning format, JSON, etc.)
- **server**: Launch web interface for the toolkit

In [16]:
!synthetic-data-kit -c config.yaml system-check

Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: config.yaml
Config has LLM provider set to: vllm
Environment variable check:
API_ENDPOINT_KEY: Not found
get_llm_provider returning: vllm
vLLM server is running at http://localhost:8001/v1
Available models: {'object': 'list', 'data': [{'id':
'Unsloth/Llama-3.3-70B-Instruct', 'object': 'model', 'created': 1768338574,
'owned_by': 'vllm', 'root': 'Unsloth/Llama-3.3-70B-Instruct', 'parent': None,
'max_model_len': 48000, 'permission': [{'id':
[

### Verifying LLM Server Status

This command checks if the vLLM server is running and accessible at `http://localhost:8001/v1`. It displays:
- Server status and endpoint
- Available models (here: Unsloth/Llama-3.3-70B-Instruct)
- Model configuration (max context length: 48000 tokens)

The system is configured to use the vLLM provider as specified in `config.yaml`.
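
The same check can be made from Python. The sketch below queries the server's OpenAI-compatible `/v1/models` route, which is where the model list shown above comes from. The helper names `extract_model_ids` and `list_models` are illustrative and are not part of synthetic-data-kit; the default URL assumes the port from the `vllm serve` command used earlier.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model ids from an OpenAI-style model-list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:8001/v1") -> list[str]:
    """Fetch the model list from a running server (the server must be up)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as response:
        return extract_model_ids(json.load(response))


# Offline example that mirrors the shape of the response printed by system-check.
sample = {
    "object": "list",
    "data": [{"id": "Unsloth/Llama-3.3-70B-Instruct", "object": "model"}],
}
print(extract_model_ids(sample))
```

Calling `list_models()` while the vLLM server is running should return the served model id; if the server is down, `urllib` raises an error instead.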

In [17]:
mkdir -p logical_reasoning/{sources,data/{input,parsed,generated,curated,final}}

### Creating Project Directory Structure

This command creates a well-organized directory structure for the logical reasoning project:
- `sources/`: Store original source documents (PDFs, etc.)
- `data/input/`: Input files for processing
- `data/parsed/`: Parsed text files after document ingestion
- `data/generated/`: Generated synthetic Q&A pairs
- `data/curated/`: Quality-filtered data after curation
- `data/final/`: Final formatted data ready for fine-tuning
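
If shell brace expansion is not available, the same six leaf directories can be created from Python. This is a minimal sketch; `create_layout` and `SUBDIRS` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# The six leaf directories produced by the brace expansion in the mkdir command.
SUBDIRS = [
    "sources",
    "data/input",
    "data/parsed",
    "data/generated",
    "data/curated",
    "data/final",
]


def create_layout(root: str) -> list[Path]:
    """Create the project directories under `root` and return their paths."""
    created = []
    for sub in SUBDIRS:
        path = Path(root) / sub
        path.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
        created.append(path)
    return created
```

For example, `create_layout("logical_reasoning")` reproduces the structure described above and is safe to run more than once.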

In [30]:
cd logical_reasoning

/app/projects/Unsloth-AMD-Fine-Tuning-Synthetic-Data/logical_reasoning


### Navigating to Project Directory

Changes the current working directory to `logical_reasoning/` where all subsequent operations will take place.

In [31]:
!wget -P sources/ -q --show-progress   "https://www.csus.edu/indiv/d/dowdenb/4/logical-reasoning-archives/logical-reasoning-2017-12-02.pdf"   "https://people.cs.umass.edu/~pthomas/solutions/Liar_Truth.pdf"



In [34]:
# UPDATED: current URL for the textbook; the earlier link left no file in sources/ (see the listing below)
!wget -P sources/ -q --show-progress   "https://www.csus.edu/faculty/d/dowden/_internal/_documents/logical-reasoning-12.pdf"   "https://people.cs.umass.edu/~pthomas/solutions/Liar_Truth.pdf"



In [35]:
!ls -lha sources/

total 5.9M
drwxr-xr-x 2 root root 4.0K Jan 13 21:20 .
drwxr-xr-x 4 root root 4.0K Jan 13 21:10 ..
-rw-r--r-- 1 root root 328K May 31  2017 Liar_Truth.pdf
-rw-r--r-- 1 root root 5.6M Jul 31 02:19 logical-reasoning-12.pdf


### Downloading Source Documents

Downloads two PDF documents related to logical reasoning and liar/truth puzzles:
1. "Logical Reasoning" textbook from CSU Sacramento
2. "Liar and Truth Teller Puzzles" from UMass

These documents serve as the source material for generating synthetic training data. `-P sources/` sets the download directory, `-q` suppresses wget's normal output, and `--show-progress` still displays a progress bar.

In [37]:
cp sources/* data/input/

### Copying Source Files to Input Directory

Copies all downloaded source documents from `sources/` to `data/input/` to prepare them for the ingestion pipeline.

In [42]:
!synthetic-data-kit ingest ./data/input/

Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Processing directory: ./data/input/
Found 2 supported files to process
✓ Liar_Truth.pdf
✓ logical-reasoning-12.pdf

Processing Summary:
Total files: 2
Successful: 2
Failed: 0
✅ All files processed successfully!


### Ingesting and Parsing Documents

This command processes the PDF files in `data/input/` using the synthetic-data-kit's **ingest** command:
- Extracts text content from PDFs
- Cleans and normalizes the text
- Saves parsed text files to `data/parsed/`

The output shows successful processing of 2 PDF files (`Liar_Truth.pdf` and `logical-reasoning-12.pdf`).

Note: The generation step in the next cell takes about 10 minutes. Add the `--verbose` flag to see progress, or lower `--num-pairs` for a faster test.

In [43]:
!synthetic-data-kit -c ../config.yaml create ./data/parsed/ --type qa --num-pairs 50

Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: ../config.yaml
Config has LLM provider set to: vllm
get_llm_provider returning: vllm
🔗 Using vllm provider
Processing directory: ./data/parsed/ for qa generation
Found 2 qa files to process
Loading config from: ../config.yaml
Config has LLM provider set to: vllm
🔗 Using vllm provider
Loading config from: ../config.yaml
Config has LLM provider set to: vllm
Processing 1 chunks to generate QA pairs...
Batch processing complete.                                                      
Generated 25 QA pairs total (requested: 50)
Saving result to data/generated/Liar_Truth_qa_pairs.json
Successfully wrote test file to data/generate

### Generating Synthetic Q&A Pairs

The **create** command generates Q&A pairs from the parsed text:
- Reads parsed text files from `data/parsed/`
- Uses the vLLM provider with the Llama-3.3-70B-Instruct model
- Requests 50 Q&A pairs per file (`--num-pairs 50`); the tool may return fewer, as with the 25 pairs generated for `Liar_Truth` above
- Type is set to `qa` for question-answer pair generation
- Outputs are saved to `data/generated/`

The process chunks the text and generates questions with corresponding answers. The full run took about 10 minutes.
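
To confirm how many pairs each file actually contains, the generated JSON can be inspected directly. The sketch below is hedged: it assumes each file is either a JSON list of pairs or a JSON object holding the list under a `qa_pairs` key. That key name is an assumption about the output schema, so check one file and adjust `key` if needed. `count_qa_pairs` is a hypothetical helper.

```python
import json
from pathlib import Path


def count_qa_pairs(path: str, key: str = "qa_pairs") -> int:
    """Count Q&A pairs in one generated file.

    Assumes the file is a JSON list of pairs, or a JSON object that stores
    the list under `key` (an assumed field name, not confirmed by the docs).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    pairs = data if isinstance(data, list) else data.get(key, [])
    return len(pairs)
```

A typical use is to loop over `Path("data/generated").glob("*.json")` and print the count per file, then compare it with the number requested through `--num-pairs`.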

In [44]:
!synthetic-data-kit -c ../config.yaml curate ./data/generated/ --threshold 7.0

Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: ../config.yaml
Config has LLM provider set to: vllm
get_llm_provider returning: vllm
🔗 Using vllm provider
Processing directory: ./data/generated/ for curation
Found 3 JSON files to curate
Loading config from: ../config.yaml
Config has LLM provider set to: vllm
Loading config from: ../config.yaml
Config has LLM provider set to: vllm
Processing 5 batches of QA pairs...
Batch processing complete.                                                      
Rated 25 QA pairs
Retained 22 pairs (threshold: 7.0)
Average score: 8.2
✓ Liar_Truth_qa_pairs.json
Loading config from: ../config.yaml
Config has LLM provider set to: v

### Curating and Quality Filtering

The **curate** command filters generated Q&A pairs by quality:
- Rates each Q&A pair with the LLM
- Keeps only pairs that pass the quality threshold (`--threshold 7.0`, on a 10-point scale)
- Removes low-quality, inconsistent, or malformed pairs
- Saves curated data to `data/curated/`

This ensures only high-quality synthetic data is used for fine-tuning.
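
The thresholding step can be illustrated with a few lines of Python. This is a sketch of the idea, not the tool's implementation: the `rating` field name and the "at or above" comparison are assumptions. The last line reproduces the retention rate from the output above (22 of 25 pairs kept).

```python
def filter_by_rating(pairs: list[dict], threshold: float = 7.0,
                     rating_key: str = "rating") -> list[dict]:
    """Keep pairs whose rating is at or above the threshold (assumed rule)."""
    return [pair for pair in pairs if pair.get(rating_key, 0.0) >= threshold]


def retention_rate(kept: int, total: int) -> float:
    """Fraction of pairs that survived curation."""
    return kept / total if total else 0.0


# Toy data with hypothetical ratings.
toy_pairs = [
    {"question": "Q1", "answer": "A1", "rating": 8.5},
    {"question": "Q2", "answer": "A2", "rating": 6.0},
    {"question": "Q3", "answer": "A3", "rating": 7.0},
]
print(len(filter_by_rating(toy_pairs)))   # 2
print(f"{retention_rate(22, 25):.0%}")    # 88%
```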

In [45]:
!synthetic-data-kit save-as ./data/curated/ --format ft

Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Loading config from: /opt/venv/lib/python3.10/site-packages/synthetic_data_kit/config.yaml
Config has LLM provider set to: api-endpoint
Processing directory: ./data/curated/ for format conversion to ft
Found 2 JSON files to convert to ft format
✓ Liar_Truth_qa_pairs_cleaned.json
✓ logical-reasoning-12_qa_pairs_cleaned.json

Format Conversion Summary (ft, json):
Total files: 2
Successful: 2
Failed: 0
✅ All files converted successfully!


### Converting to Fine-Tuning Format

The **save-as** command converts curated Q&A pairs to fine-tuning format:
- Reads curated JSON files from `data/curated/`
- Converts to format `ft` (fine-tuning format with messages structure)
- Outputs are saved to `data/final/` with proper conversation format
- The resulting format is compatible with standard fine-tuning pipelines

Successfully converted 2 files to fine-tuning format.

In [6]:
os.chdir("/app/projects/Unsloth-AMD-Fine-Tuning-Synthetic-Data/logical_reasoning")
!pwd

/app/projects/Unsloth-AMD-Fine-Tuning-Synthetic-Data/logical_reasoning


In [7]:
import json
import glob
from pathlib import Path
from datasets import Dataset

# ===== CONFIGURATION =====
data_dir = "./data/final"  # Change this to your data directory

# ===== STEP 1: Find all FT files =====
data_path = Path(data_dir)
ft_files = glob.glob(str(data_path / "*.json"))

# ===== STEP 2: Load and convert all files =====
all_data = []

for file_path in ft_files:
    # Load the JSON file
    with open(file_path, 'r') as f:
        ft_data = json.load(f)
    
    # Convert each item
    for item in ft_data:
        if 'messages' not in item:
            continue
        
        # Extract only user and assistant messages
        conversation = []
        for msg in item['messages']:
            if msg['role'] == 'user' or msg['role'] == 'assistant':
                conversation.append({
                    "role": msg['role'],
                    "content": msg['content']
                })
        
        # Add to our data if we have at least one exchange
        if len(conversation) > 0:
            all_data.append({
                "conversations": conversation
            })

print(f"\n🎯 Total conversations: {len(all_data)}")

# ===== STEP 3: Create HuggingFace Dataset =====
dataset = Dataset.from_list(all_data)

# ===== STEP 4: Preview the data =====
print(json.dumps(dataset[0], indent=2))


🎯 Total conversations: 72
{
  "conversations": [
    {
      "content": "If Bradley H. Dowden is the author of the book 'Logical Reasoning' and he dedicated the 2012 edition to his wife Hellan, can we conclude that Hellan is his wife in the 1993 edition as well?",
      "role": "user"
    },
    {
      "content": "To solve this, let's break down the information given. The 1993 edition's acknowledgments mention Hellan Roth Dowden as a friend and colleague who helped with the project, but it does not explicitly state her relationship to Bradley H. Dowden at that time. However, the 2012 edition is dedicated to Hellan, his wife. Since the question asks if we can conclude Hellan is his wife in the 1993 edition, we must consider if the information provided allows us to make that assumption. Given that the 1993 edition does not specify Hellan's relationship to Bradley as his wife, and assuming that marital status can change over time, we cannot conclusively determine from the given informat

### Loading and Converting Data to HuggingFace Dataset

This cell performs comprehensive data processing:

1. **Finding Files**: Locates all JSON files in `data/final/` directory
2. **Loading Data**: Reads each JSON file containing fine-tuning formatted data
3. **Format Conversion**: Extracts user and assistant messages from the fine-tuning format
4. **Structuring Conversations**: Creates a standardized conversation format with role-content pairs
5. **Creating Dataset**: Converts the processed data into a HuggingFace Dataset object

The output shows that 72 conversations were loaded and formatted. The preview displays a sample conversation: a question about the dedication of Bradley H. Dowden's *Logical Reasoning*, followed by a step-by-step answer.
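
Before training, it can help to confirm that every conversation has the expected shape. The following sketch checks that roles alternate user, assistant and that no message is empty. `is_valid_conversation` is a hypothetical helper and the alternation rule is an assumption about what the data should look like, not something the loading cell enforces.

```python
def is_valid_conversation(conversation: list[dict]) -> bool:
    """True if roles alternate user/assistant and every content is non-empty."""
    if not conversation or len(conversation) % 2 != 0:
        return False
    for index, message in enumerate(conversation):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message.get("role") != expected_role:
            return False
        if not str(message.get("content", "")).strip():
            return False
    return True


good = [{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}]
bad = [{"role": "assistant", "content": "A"}]
print(is_valid_conversation(good), is_valid_conversation(bad))  # True False
```

Applied to the notebook's data, `sum(is_valid_conversation(row["conversations"]) for row in all_data)` would count the well-formed conversations.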

## Fine-Tuning

### Note: Please remember to shut down the vLLM instance before fine-tuning!

In [6]:
!pip install --upgrade torch==2.8.0 pytorch-triton-rocm torchvision torchaudio torchao==0.13.0 xformers --index-url https://download.pytorch.org/whl/rocm6.4

Looking in indexes: https://download.pytorch.org/whl/rocm6.4
Collecting pytorch-triton-rocm
  Using cached https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.1-cp310-cp310-linux_x86_64.whl.metadata (1.7 kB)
Collecting torchvision
  Using cached https://download.pytorch.org/whl/rocm6.4/torchvision-0.24.1%2Brocm6.4-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (5.9 kB)
Collecting torchaudio
  Using cached https://download.pytorch.org/whl/rocm6.4/torchaudio-2.9.1%2Brocm6.4-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (6.9 kB)
Collecting xformers
  Using cached https://download.pytorch.org/whl/rocm6.4/xformers-0.0.33.post2-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.2 kB)
INFO: pip is looking at multiple versions of torchvision to determine which version is compatible with other requirements. This could take a while.
Collecting torchvision
  Downloading https://download.pytorch.org/whl/rocm6.4/torchvision-0.24.0%2Brocm6.4-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (5.9 kB)

In [6]:
!pip install --no-deps unsloth unsloth-zoo
!pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git
!pip install "unsloth[amd] @ git+https://github.com/unslothai/unsloth"


[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: pip install --upgrade pip
Collecting git+https://github.com/unslothai/unsloth-zoo.git
  Cloning https://github.com/unslothai/unsloth-zoo.git to /tmp/pip-req-build-tfe0p8xk
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth-zoo.git /tmp/pip-req-build-tfe0p8xk
  Resolved https://github.com/unslothai/unsloth-zoo.git to commit c315ec1b0782a43893f34ed1dc264de9f2600236
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

In [25]:
!pip install trl


In [None]:
!pip install ipywidgets widgetsnbextension


In [8]:
import os
import json
import glob
import torch
import shutil
from pathlib import Path
from datasets import Dataset

In [9]:
os.environ["TORCH_COMPILE_DISABLE"] = "1"                                                                                                                                                     
os.environ["TORCHINDUCTOR_DISABLE"] = "1" 

### Importing Standard Libraries

Imports essential Python libraries for fine-tuning:
- `os`, `json`, `glob`: File system operations and JSON handling
- `torch`: PyTorch deep learning framework
- `shutil`: File operations
- `Path`: Path manipulation
- `Dataset`: HuggingFace datasets library for data handling

In [10]:
print("CUDA available:", torch.cuda.is_available())                                                                                                          
print("Device count:", torch.cuda.device_count())                                                                                                            
                                                                                                                                                            
if hasattr(torch, 'accelerator'):                                                                                                                            
    print("Accelerator available:", torch.accelerator.is_available())                                                                                        
    if torch.accelerator.is_available():                                                                                                                     
        print("Current accelerator:", torch.accelerator.current_accelerator())                                                                               
                                                                                                                                                            
# Check ROCm specifically                                                                                                                                    
print("HIP available:", hasattr(torch.version, 'hip') and torch.version.hip is not None)                                                                     
if hasattr(torch.version, 'hip'):                                                                                                                            
    print("HIP version:", torch.version.hip)    

CUDA available: True
Device count: 1
Accelerator available: True
Current accelerator: cuda
HIP available: True
HIP version: 6.4.43482-0f2d60242


In [11]:
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template, standardize_sharegpt, train_on_responses_only
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
INFO 01-14 19:08:29 [__init__.py:241] Automatically detected platform rocm.
INFO 01-14 19:08:29 [layer.py:37] [Aiter] VLLM_ROCM_USE_AITER_TRITON_FUSED_ROPE_ZEROS_KV_CACHE=False
INFO 01-14 19:08:29 [activation.py:67] [Aiter] VLLM_ROCM_USE_AITER_TRITON_SILU_MUL_FP4_QUANT=False
INFO 01-14 19:08:29 [activation.py:68] [Aiter] VLLM_ROCM_USE_AITER_TRITON_SILU_MUL_FP8_QUANT=False
INFO 01-14 19:08:29 [activation.py:69] [Aiter] VLLM_TRITON_FP4_GEMM_USE_ASM=False
INFO 01-14 19:08:30 [llama.py:72] [Aiter] VLLM_ROCM_USE_AITER_TRITON_FUSED_ROPE_ZEROS_KV_CACHE=False VLLM_ROCM_USE_AITER_MHA=True
Unsloth: Your Flash Attention 2 installation seems to be broken?
A possible explanation is you have a new CUDA version which isn't
yet compatible with FA2? Please file a ticket to Unsloth or FA2.
We shall now use Xformers instead, which does not have any performance hits!
We found this negligible impact by benchmarking on 1x A100.
🦥 Unslo

### Importing Unsloth and Training Libraries

Imports specialized libraries for efficient fine-tuning:
- `FastLanguageModel` from Unsloth: Optimized model loading and training
- `get_chat_template`, `standardize_sharegpt`, `train_on_responses_only`: Chat formatting utilities
- `SFTConfig`, `SFTTrainer`: Supervised fine-tuning configuration and trainer from TRL
- `DataCollatorForSeq2Seq`: Handles batching and padding for sequence-to-sequence training

### Setup Unsloth model and tokenizer for ROCm without bitsandbytes

In [12]:
max_seq_length = 1024
dtype = torch.bfloat16  # Explicit bfloat16 for ROCm
load_in_4bit = False  

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # Explicit for ROCm
    trust_remote_code=True,
)

print(f"✅ Loaded: Llama-3.3-70B-Instruct (bfloat16, ROCm compatible)")

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # Higher rank for 70B model
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Are you certain you want to do remote code execution?
==((====))==  Unsloth 2026.1.2: Fast Llama patching. Transformers: 4.57.3. vLLM: 0.9.2rc2.dev2602+g03b8f9b84.rocm702.
   \\   /|    AMD Radeon Graphics. Num GPUs = 1. Max memory: 191.688 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+rocm6.4. ROCm Toolkit: 6.4.43482-0f2d60242. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


  GPU_BUFFERS = tuple([torch.empty(2*256*2048, dtype = dtype, device = f"{DEVICE_TYPE_TORCH}:{i}") for i in range(n_gpus)])
`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/30 [00:00<?, ?it/s]

✅ Loaded: Llama-3.3-70B-Instruct (bfloat16, ROCm compatible)


Unsloth 2026.1.2 patched 80 layers with 80 QKV layers, 80 O layers and 80 MLP layers.


In [13]:
import torch                                                                                                                                                                                  
                                                                                                                                                                                            
# Basic GPU check                                                                                                                                                                             
print("CUDA available:", torch.cuda.is_available())                                                                                                                                           
print("Device name:", torch.cuda.get_device_name(0))                                                                                                                                          
print("Device count:", torch.cuda.device_count())                                                                                                                                             
                                                                                                                                                                                            
# Confirm it's ROCm (not NVIDIA CUDA)                                                                                                                                                         
print("HIP version:", torch.version.hip)                                                                                                                                                      
                                                                                                                                                                                            
# Check where your model is                                                                                                                                                                   
print("\nModel device:", next(model.parameters()).device) 

CUDA available: True
Device name: AMD Radeon Graphics
Device count: 1
HIP version: 6.4.43482-0f2d60242

Model device: cuda:0


### Loading Llama-3.3-70B Model with LoRA

This cell sets up the model for efficient fine-tuning on AMD ROCm hardware:

**Model Configuration:**
- Model: Llama-3.3-70B-Instruct (70 billion parameters)
- Data type: bfloat16 for ROCm compatibility
- No quantization (load_in_4bit=False) to avoid bitsandbytes dependency
- Max sequence length: 1024 tokens

**LoRA (Low-Rank Adaptation) Configuration:**
- Rank (r): 64 - Higher rank for the large 70B model
- Target modules: All attention and MLP layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- LoRA alpha: 64
- Dropout: 0 (no dropout)
- Gradient checkpointing: "unsloth" for memory efficiency

LoRA enables efficient fine-tuning by only training small adapter layers instead of the entire 70B model, making it feasible to train on a single AMD MI300X GPU with 192GB HBM3 memory.
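
The size of the adapters can be worked out by hand: a rank-`r` adapter on a linear layer with `in` inputs and `out` outputs adds `r * (in + out)` parameters. The sketch below applies that formula to the seven target modules. The layer dimensions are the published Llama 3 70B configuration (hidden size 8192, MLP size 28672, 80 layers, 8 key/value heads of dimension 128) and should be treated as assumptions for any other checkpoint. The total agrees with the trainable-parameter count that Unsloth reports when training starts.

```python
HIDDEN = 8192          # model hidden size
INTERMEDIATE = 28672   # MLP intermediate size
KV_DIM = 8 * 128       # 8 key/value heads x head dimension 128
NUM_LAYERS = 80
RANK = 64

# (input features, output features) for each LoRA target module.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(rank: int) -> int:
    """LoRA adds two matrices per module: (in x rank) and (rank x out)."""
    return sum(rank * (n_in + n_out) for n_in, n_out in MODULE_SHAPES.values())


total_lora_params = NUM_LAYERS * lora_params_per_layer(RANK)
print(f"{total_lora_params:,}")  # 828,375,040
```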

In [14]:
"""Prepare dataset with proper chat template and tensor compatibility"""
print("🔧 Preparing dataset for training...")

# Set chat template
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

# Ensure pad token is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Formatting function that ensures proper tensor conversion
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = []
    
    for convo in convos:
        # Ensure conversation is in correct format
        if isinstance(convo, list) and all(isinstance(msg, dict) for msg in convo):
            text = tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
            texts.append(text)
        else:
            print(f"⚠️  Skipping malformed conversation: {type(convo)}")
            continue
    
    return {"text": texts}

dataset = standardize_sharegpt(dataset)

dataset = dataset.map(formatting_prompts_func, batched=True, remove_columns=dataset.column_names)

dataset = dataset.filter(lambda x: len(x["text"].strip()) > 0)

print(f"✅ Prepared {len(dataset)} valid examples for training")

# Show sample
if len(dataset) > 0:
    print(f"📝 Sample formatted text:")
    print(dataset["text"][0][:200] + "...")

🔧 Preparing dataset for training...


Unsloth: Standardizing formats (num_proc=20):   0%|          | 0/72 [00:00<?, ? examples/s]

Map:   0%|          | 0/72 [00:00<?, ? examples/s]

Filter:   0%|          | 0/72 [00:00<?, ? examples/s]

✅ Prepared 72 valid examples for training
📝 Sample formatted text:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 July 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

If Bradley H. Dowden is...


### Preparing Dataset with Chat Template

This cell formats the dataset for fine-tuning:

**Steps:**
1. **Set Chat Template**: Applies Llama-3.1 chat template formatting
2. **Configure Padding**: Sets pad token to eos token if not already set
3. **Format Conversations**: The `formatting_prompts_func` function:
   - Takes raw conversations from the dataset
   - Applies the chat template to format them properly
   - Validates conversation structure (list of dicts with role/content)
   - Filters out malformed conversations
4. **Standardize Format**: Uses `standardize_sharegpt` to normalize the data structure
5. **Apply Formatting**: Maps the formatting function across all examples
6. **Remove Empty**: Filters out any empty or invalid formatted texts

The output shows that 72 valid examples were prepared. A sample of the formatted text is displayed, showing the Llama-3.1 chat template structure (the 200-character preview cuts off after the system header and the start of the user turn).
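
To make the template structure concrete, here is a simplified, hand-written formatter that wraps each message in the header and end-of-turn markers seen in the sample output. It is only an illustration: the real template applied by `tokenizer.apply_chat_template` also injects the system block with the date lines, so its output is not identical. `format_llama3` is a hypothetical name.

```python
def format_llama3(conversation: list[dict]) -> str:
    """Render messages with Llama-3-style header and end-of-turn markers."""
    parts = ["<|begin_of_text|>"]
    for message in conversation:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    return "".join(parts)


example = [
    {"role": "user", "content": "Is the statement 'I am lying' true?"},
    {"role": "assistant", "content": "It cannot be consistently true or false."},
]
print(format_llama3(example))
```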

In [22]:
"""Train model with ROCm-optimized settings"""
# Ensure tokenizer has proper padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Setup trainer with ROCm-friendly settings and proper data handling
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
    packing=False,
    args=SFTConfig(
        per_device_train_batch_size=64,  # 🚀 MI300X can handle this with 192GB HBM3!
        gradient_accumulation_steps=1,   # Effective batch size = 64 * 1 = 64
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=1e-4,
        logging_steps=1,
        optim="adamw_8bit",  # 8-bit AdamW (bitsandbytes-backed in transformers, not a pure torch optimizer)
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="logical_reasoning_rocm_outputs",
        report_to="none",
        bf16=True,
        dataloader_pin_memory=False,
        remove_unused_columns=True,  # Remove unused columns to avoid tensor issues
        gradient_checkpointing=True,
        dataloader_num_workers=0,  # Single worker for ROCm stability
    ),
)

# Train only on responses
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)

FastLanguageModel.for_training(model)
trainer_stats = trainer.train()

Unsloth: Tokenizing ["text"] (num_proc=24):   0%|          | 0/72 [00:00<?, ? examples/s]

Map (num_proc=24):   0%|          | 0/72 [00:00<?, ? examples/s]

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 72 | Num Epochs = 1 | Total steps = 2
O^O/ \_/ \    Batch size per device = 64 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (64 x 1 x 1) = 64
 "-____-"     Trainable parameters = 828,375,040 of 71,382,081,536 (1.16% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,0.9863
2,1.1067


### Training the Model with ROCm-Optimized Settings

This cell configures and executes the fine-tuning process:

**Training Configuration (SFTConfig):**
- **Batch size**: 64 per device, made possible by the AMD MI300X's 192GB of HBM3 memory
- **Gradient accumulation**: 1 step, so the effective batch size is also 64
- **Warmup**: 5 steps (longer than the 2 total steps in this run, so the learning rate never reaches its 1e-4 peak)
- **Epochs**: 1 full pass through the dataset
- **Learning rate**: 1e-4 with a linear schedule
- **Optimizer**: `adamw_8bit`, which keeps optimizer state in 8-bit to save memory
- **Precision**: bf16 (bfloat16)
- **Gradient checkpointing**: Enabled for memory efficiency
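
The step count in the training log follows from simple arithmetic, sketched below. The warmup part assumes a plain linear ramp (fraction = step / warmup steps); the exact per-step learning rate depends on the scheduler implementation. `warmup_fraction` is an illustrative helper, not a library function.

```python
import math

# Values taken from the SFTConfig and the training log above.
num_examples = 72
per_device_batch_size = 64
gradient_accumulation_steps = 1
num_gpus = 1
warmup_steps = 5

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
total_steps = math.ceil(num_examples / effective_batch_size)


def warmup_fraction(step, warmup_steps):
    """Fraction of the peak learning rate under a simple linear warmup (assumed)."""
    return min(1.0, step / warmup_steps)


print(f"effective batch size: {effective_batch_size}")
print(f"total optimizer steps: {total_steps}")
for step in range(1, total_steps + 1):
    print(f"step {step}: at most {warmup_fraction(step, warmup_steps):.0%} of peak LR")
```

With only two optimizer steps, a run like this is a smoke test of the pipeline rather than a meaningful fine-tune; more data or more epochs would be needed for the warmup and schedule to matter.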

**Special Training Mode:**
Uses `train_on_responses_only` to compute loss only on the assistant's responses, not on the user's questions. This focuses the model on learning to generate accurate answers rather than memorizing the input format.
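
The idea behind response-only training can be illustrated with a toy label mask. This is a conceptual sketch, not Unsloth's implementation: real code works on token IDs and multi-token markers, whereas here each marker is a single string token. The function name `mask_non_response` and the `<user>` / `<assistant>` markers are hypothetical. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # labels with this value are skipped by the loss


def mask_non_response(tokens, instruction_marker, response_marker):
    """Keep labels only for tokens inside an assistant response; mask everything else."""
    labels = []
    in_response = False
    for token in tokens:
        if token == instruction_marker:
            in_response = False
            labels.append(IGNORE_INDEX)  # the marker itself is never a target
        elif token == response_marker:
            in_response = True
            labels.append(IGNORE_INDEX)  # the marker itself is never a target
        else:
            labels.append(token if in_response else IGNORE_INDEX)
    return labels


tokens = ["<user>", "Who", "lies", "?", "<assistant>", "A", "is", "a", "knave", "."]
labels = mask_non_response(tokens, "<user>", "<assistant>")
print(labels)
```

Only the five tokens after `<assistant>` keep their labels, so only they contribute to the loss.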

**Key Features:**
- `DataCollatorForSeq2Seq` handles variable-length sequences with proper padding
- No packing (`packing=False`) to preserve conversation structure
- Single dataloader worker (`dataloader_num_workers=0`) for ROCm stability
- Gradient checkpointing via Unsloth for memory optimization

The model is then trained on the prepared logical reasoning conversations; the training log reports 72 examples and 2 optimizer steps per run.

In [23]:
"""Save the trained model"""
print("\n💾 SAVING ROCM-TRAINED MODEL")

# Save LoRA adapters
lora_path = "logical_reasoning_rocm_lora"
model.save_pretrained(lora_path)
tokenizer.save_pretrained(lora_path)
print(f"✅ LoRA adapters saved to: {lora_path}")

# Save merged model
merged_path = "logical_reasoning_rocm_merged"
print("🔄 Saving merged model...")
model.save_pretrained_merged(merged_path, tokenizer, save_method="merged_16bit")
print(f"✅ Merged model saved to: {merged_path}")

print(f"\n🎉 ROCM MODEL READY!")


💾 SAVING ROCM-TRAINED MODEL
✅ LoRA adapters saved to: logical_reasoning_rocm_lora
🔄 Saving merged model...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub


Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

Checking cache directory for required files...


Unsloth: Copying 30 files from cache to `logical_reasoning_rocm_merged`: 100% 30/30 [02:14<00:00,  4.48s/it]


Successfully copied all 30 files from cache to `logical_reasoning_rocm_merged`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100% 30/30 [00:00<00:00, 466033.78it/s]
Unsloth: Merging weights into 16bit: 100% 30/30 [03:32<00:00,  7.09s/it]


Unsloth: Merge process complete. Saved to `/app/projects/Unsloth-AMD-Fine-Tuning-Synthetic-Data/logical_reasoning/logical_reasoning_rocm_merged`
✅ Merged model saved to: logical_reasoning_rocm_merged

🎉 ROCM MODEL READY!


### Saving the Fine-Tuned Model

This cell saves the trained model in two formats:

1. **LoRA Adapters** (`logical_reasoning_rocm_lora/`):
   - Saves only the trained LoRA adapter weights, which are far smaller than the 70B base model (the training log reports about 828M trainable parameters)
   - Can be loaded later with the base model
   - Useful for sharing or deploying with the original base model

2. **Merged Model** (`logical_reasoning_rocm_merged/`):
   - Merges LoRA adapters back into the base model
   - Creates a standalone model with all weights
   - Saved in 16-bit precision (`save_method="merged_16bit"`) rather than a quantized format
   - Ready for immediate inference without loading adapters

Both formats include the tokenizer configuration. The merged model can be loaded directly, without the adapter files, to generate answers to logical reasoning questions.
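
To make "merging" concrete, here is a toy sketch of the standard LoRA merge, `W + (alpha / r) * B @ A`, on tiny matrices in pure Python. The shapes, values, `alpha`, and `r` are illustrative assumptions (the notebook's LoRA configuration is not shown in this section), and the helpers `matmul` and `merge_lora` are hypothetical. Unsloth's own merge also handles details such as dtype conversion that are omitted here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the base weight: W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy shapes: W is 2x2 and the rank is 1, so B is 2x1 and A is 1x2.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [1.0]]
A = [[2.0, 4.0]]
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)

# The merged weight gives the same output as base + adapter applied separately.
x = [[1.0], [1.0]]
separate = [
    [base[0] + 2.0 * adapter[0]]
    for base, adapter in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
print(matmul(merged, x), separate)
```

Because the update is folded into `W`, the merged model needs no adapter code at inference time, at the cost of storing a full copy of the weights.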

### Testing the Fine-Tuned Model

In [None]:
"""Test the fine-tuned model with inference"""
# Switch model to inference mode
FastLanguageModel.for_inference(model)

# Test question - a classic knight/knave logic puzzle
test_question = "A says 'B is a knave.' B says 'A and I are different types.' What are A and B?"

# Format the prompt using the chat template
messages = [{"role": "user", "content": test_question}]
input_text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True  # Adds assistant header so model knows to respond
)

# Tokenize and move to GPU
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.1,      # Low temperature for more deterministic logical reasoning
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)

# Decode only the generated part (exclude the input prompt)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)

print(f"Question: {test_question}\n")
print(f"Answer: {response}")
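
To judge the model's answer, it helps to know the correct one. The sketch below brute-forces the test puzzle under the usual convention that knights always tell the truth and knaves always lie. It is independent of the model, and the function name `consistent` is illustrative.

```python
from itertools import product


def consistent(a_is_knight, b_is_knight):
    """Check whether an assignment of types agrees with both statements."""
    a_claim = not b_is_knight              # A: "B is a knave."
    b_claim = a_is_knight != b_is_knight   # B: "A and I are different types."
    # A knight's claim must be true; a knave's claim must be false.
    return a_claim == a_is_knight and b_claim == b_is_knight


solutions = [
    (a, b) for a, b in product([True, False], repeat=2) if consistent(a, b)
]

for a, b in solutions:
    print(f"A is a {'knight' if a else 'knave'}, B is a {'knight' if b else 'knave'}")
```

The only consistent assignment is that A is a knave and B is a knight, which is the answer the generated response should reach.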