Decoder-Only Transformer-Powered Spectrum Activity Forecasting via Tokenized RF Data
This repository contains the official training, evaluation, and fine-tuning code accompanying the paper:
📄 Mohammad Mosiur Rahman Lunar and M. C. Vuran, "Large Spectrum Models (LSMs): Decoder-Only Transformer-Powered Spectrum Activity Forecasting via Tokenized RF Data," IEEE DySPAN 2026, Washington, DC, May 11–14, 2026.
If you use this code or build on this work, please cite:
```bibtex
@inproceedings{lunar2026lsm,
  author    = {Mohammad Mosiur Rahman Lunar and M. C. Vuran},
  title     = {Large Spectrum Models ({LSMs}): Decoder-Only Transformer-Powered
               Spectrum Activity Forecasting via Tokenized {RF} Data},
  booktitle = {IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN)},
  address   = {Washington, DC},
  month     = {May},
  year      = {2026}
}
```

This project is licensed under the GNU General Public License v3.0; see `LICENSE` for details.
```
src/
├── GPT/                             # GPT-2 training & evaluation
├── Llama/                           # Llama training & evaluation
├── Mistral/                         # Mistral training & evaluation
├── Gemma/                           # Gemma training & evaluation
├── Phi/                             # Phi training & evaluation
├── LSTM/                            # LSTM baseline
├── finetuning/
│   ├── FineTuning_Avtl/             # Fine-tuning on AVTL dataset
│   ├── FineTuning_Matlab_5g/        # Fine-tuning on MATLAB 5G dataset
│   └── Binary_Classification/       # Binary spectrum occupancy classification
├── Preprocess_Data/                 # Data preprocessing scripts
├── tokenization/                    # Tokenization utilities
├── data_generation_for_simulation/  # Synthetic data generation (MATLAB + Python)
├── Weighted_Kappa/                  # Weighted Kappa evaluation metric
├── plot_codes/                      # MATLAB plotting scripts for paper figures
├── libs/                            # Shared utility libraries
└── Other_Codes/                     # Miscellaneous model exploration scripts
```
```bash
pip install -r requirements.txt
```

- Python ≥ 3.9
- CUDA-capable GPU strongly recommended for training
- MATLAB R2022a or later (for data generation and plotting scripts)
Every script has a configuration block near the top, with `# TODO` comments marking the paths you need to set:
```python
# Training scripts
DATASET_PATH = "./data/high_activity_full_dataset"  # TODO: set path to your dataset
MODEL_SAVE_PATH = "./models"                        # TODO: set path to save trained models
LOGGING_DIR = "./logs"                              # TODO: set path to save training logs

# Test / evaluation scripts
import os
import sys

# Set HF_HOME before importing transformers / huggingface_hub so the cache path takes effect
os.environ['HF_HOME'] = './hf_cache/'               # TODO: set your HuggingFace cache directory
sys.path.append('./libs')                           # TODO: update to local libs directory
MODEL_SAVE_PATH = "./pretrained_model"              # TODO: set path to your pretrained checkpoint
RESULTS_SAVE_PATH = "./results/eval_results.pt"     # TODO: set path to save evaluation results
```

Typical workflow:

- Generate simulation data: `src/data_generation_for_simulation/data_gen.m`
- Convert `.mat` → `.npy`: `src/data_generation_for_simulation/mat2npy.py`
- Preprocess: `src/Preprocess_Data/`
- Tokenize: `src/tokenization/basic/data_tokenization.py`
- Train: `src/<Model>/High/main.py` or `src/<Model>/Low/main.py`
- Evaluate: `src/<Model>/Test_Array/main_test_tensor.py`
- Plot results: `src/plot_codes/`
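The tokenization step turns continuous RF measurements into discrete symbols that a decoder-only model can predict one at a time. As a minimal sketch, assuming uniform quantization of power readings in dB into a fixed vocabulary (the range `MIN_DB`/`MAX_DB` and the vocabulary size `NUM_BINS` below are hypothetical values, not taken from `data_tokenization.py`, which may use a different scheme):

```python
from typing import List, Sequence

# Hypothetical quantization settings; the actual range and vocabulary size
# used by src/tokenization/basic/data_tokenization.py may differ.
MIN_DB = -120.0   # lowest power level represented (dB)
MAX_DB = -20.0    # highest power level represented (dB)
NUM_BINS = 256    # vocabulary size: one token per quantization bin


def tokenize(power_db: Sequence[float],
             min_db: float = MIN_DB,
             max_db: float = MAX_DB,
             num_bins: int = NUM_BINS) -> List[int]:
    """Uniformly quantize power readings (dB) into token IDs 0..num_bins-1.

    Values outside [min_db, max_db] are clipped to the nearest edge.
    """
    step = (max_db - min_db) / num_bins
    tokens = []
    for p in power_db:
        clipped = min(max(p, min_db), max_db)
        idx = int((clipped - min_db) // step)
        tokens.append(min(idx, num_bins - 1))  # p == max_db maps to the last bin
    return tokens


def detokenize(tokens: Sequence[int],
               min_db: float = MIN_DB,
               max_db: float = MAX_DB,
               num_bins: int = NUM_BINS) -> List[float]:
    """Map token IDs back to the center of their bin (dB)."""
    step = (max_db - min_db) / num_bins
    return [min_db + (t + 0.5) * step for t in tokens]


if __name__ == "__main__":
    print(tokenize([-120.0, -70.0, -20.0]))  # [0, 128, 255]
```

Each token ID then plays the role of a word in the model's vocabulary. With these assumed settings the reconstruction error of `detokenize(tokenize(x))` is at most half a bin width, about 0.2 dB.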
| Scenario | Path | Description |
|---|---|---|
| AVTL | `src/finetuning/FineTuning_Avtl/` | Fine-tune the pretrained LSM on AVTL spectrum data |
| MATLAB 5G | `src/finetuning/FineTuning_Matlab_5g/` | Fine-tune on MATLAB-generated 5G data |
| Binary Classification | `src/finetuning/Binary_Classification/` | Binary spectrum occupancy classifier |
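The Binary Classification scenario predicts whether spectrum is occupied rather than its exact power level. One common way to obtain such binary targets is energy detection: compare each power reading against a threshold. A minimal sketch, assuming per-sample thresholding with a hypothetical `THRESHOLD_DB` (the labels used in `Binary_Classification/` may be defined differently):

```python
from typing import List, Sequence

# Hypothetical detection threshold (dB); not taken from the repository.
THRESHOLD_DB = -90.0


def occupancy_labels(psd_db: Sequence[Sequence[float]],
                     threshold_db: float = THRESHOLD_DB) -> List[List[int]]:
    """Label each (time step, channel) cell 1 (occupied) if its power
    exceeds threshold_db, else 0 (idle).

    psd_db[t][c] is the measured power of channel c at time step t.
    """
    return [[1 if p > threshold_db else 0 for p in row] for row in psd_db]


def occupancy_rate(labels: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of time steps in which each channel is labeled occupied."""
    num_steps = len(labels)
    num_channels = len(labels[0])
    return [sum(row[c] for row in labels) / num_steps for c in range(num_channels)]


if __name__ == "__main__":
    psd = [[-100.0, -80.0, -95.0],
           [-85.0, -70.0, -110.0]]
    labels = occupancy_labels(psd)
    print(labels)                  # [[0, 1, 0], [1, 1, 0]]
    print(occupancy_rate(labels))  # [0.5, 1.0, 0.0]
```

A fixed threshold is the simplest choice; in practice it is usually set relative to a measured noise floor, and readings are often averaged over a window before thresholding.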
- `src/Weighted_Kappa/calculate.py`: computes weighted Cohen's Kappa across all models.
- `src/plot_codes/`: MATLAB scripts to reproduce all paper figures; update the CSV/result paths at the top of each `.m` file before running.
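Weighted Cohen's Kappa scores agreement between predicted and true ordinal labels (for example, token bins), penalizing a disagreement more the farther apart the two labels are: κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij. A minimal sketch, assuming integer labels in `range(num_classes)` and either linear or quadratic weights (which weighting `calculate.py` uses is not stated here):

```python
from typing import Sequence


def weighted_kappa(y_true: Sequence[int],
                   y_pred: Sequence[int],
                   num_classes: int,
                   weights: str = "quadratic") -> float:
    """Cohen's weighted kappa for integer labels in range(num_classes).

    kappa_w = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O is the observed
    confusion matrix, E the counts expected under chance agreement, and w_ij
    the disagreement weight |i - j| / (K - 1), squared for "quadratic".
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    k = num_classes
    # Observed confusion matrix: observed[i][j] = count of (true=i, pred=j).
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    # Marginal label counts for the true and predicted labels.
    hist_true = [sum(observed[i]) for i in range(k)]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = 0.0
    den = 0.0
    for i in range(k):
        for j in range(k):
            d = abs(i - j) / (k - 1)
            w = d if weights == "linear" else d * d
            num += w * observed[i][j]
            den += w * hist_true[i] * hist_pred[j] / n  # expected count E_ij
    if den == 0:
        raise ValueError("kappa is undefined when both label sets contain only the same single class")
    return 1.0 - num / den


if __name__ == "__main__":
    # Complete disagreement on two classes.
    print(weighted_kappa([0, 0, 1, 1], [1, 1, 0, 0], num_classes=2))  # -1.0
```

As a cross-check, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, labels=list(range(num_classes)), weights="quadratic")` should give the same value, since the (K − 1) normalization cancels in the ratio.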
Mohammad Mosiur Rahman Lunar
LinkedIn
Email: mlunar2@unl.edu
Mehmet Can Vuran
LinkedIn
Email: mcv@unl.edu