Qipeng Guo2,3, Zhaoxiang Liu4, Shiguo Lian4, Ziwei He2,β, Xipeng Qiu1,2,β
1Fudan University, 2Shanghai Innovation Institute, 3Shanghai AI Lab, 4China Unicom
[Paper] | [HF] | [Code] | [Model]
In this work, we propose RoPE++, which re-injects the discarded imaginary component as a new group of attention heads computed in parallel with the real attention. In particular, we introduce RoPE++_EH, which keeps the number of attention heads equal while halving the QKV parameters and KV cache, and RoPE++_EC, which keeps the cache size equal while doubling the number of attention heads.
We first identify the loss of imaginary information in the complex form of RoPE and find this information advantageous for capturing long-context dependencies. Compared with the real attention, which exhibits stronger semantic locality, the imaginary attention attends more to long-context information on average, promising gains on long-context tasks. Moreover, adding imaginary attention also exposes the query and key to a wider range of positional information, implicitly improving length extrapolation.
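To make this concrete, below is a minimal sketch of how one attention logit splits into a real and an imaginary part when RoPE is written in complex form. This is our own illustration, not the repo's implementation; the shapes and names are invented.

```python
import torch

def rope_scores(q, k, m, n, base=10000.0):
    """Logits between one query (position m) and one key (position n).

    q, k: real vectors of even length head_dim. The real part below is
    exactly the vanilla RoPE score; the imaginary part is the component
    RoPE discards and RoPE++ re-injects as extra attention heads.
    """
    d = q.shape[-1]
    # View adjacent dims as complex pairs: (x0 + i*x1), (x2 + i*x3), ...
    qc = torch.complex(q[0::2], q[1::2])
    kc = torch.complex(k[0::2], k[1::2])
    # Standard RoPE frequencies: theta_j = base^(-2j/d)
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    # Rotating q by m*theta and k by n*theta leaves a pure function of m - n
    score = (qc * kc.conj() * torch.exp(1j * (m - n) * theta)).sum()
    return score.real, score.imag

real_logit, imag_logit = rope_scores(torch.randn(64), torch.randn(64), m=100, n=3)
```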
Pre-training and evaluation at 376M and 776M sizes show that RoPE++_EH and RoPE++_EC outperform vanilla RoPE and other position embeddings on average across short- and long-context tasks. Further analysis reveals that the imaginary attentions play a dominant role in modeling long-context dependencies, confirming the effectiveness of introducing imaginary attention for improved long-context capability.
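As a back-of-envelope check on the EH/EC accounting above (our own illustration with made-up dimensions, not measured numbers):

```python
# Each complex head produces two attention maps (real + imaginary) from a
# single set of Q/K/V projections, which is where the two variants come from.
n_heads, head_dim, bytes_per = 12, 64, 2           # e.g. fp16
kv_per_token = 2 * n_heads * head_dim * bytes_per  # vanilla RoPE: K and V

eh_kv = kv_per_token // 2  # RoPE++_EH: half the complex heads -> same number of
                           # attention maps, half the QKV params and KV cache
ec_kv = kv_per_token       # RoPE++_EC: same projections -> double the attention
                           # maps at equal cache size
print(kv_per_token, eh_kv, ec_kv)  # 3072 1536 3072 bytes per token
```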
We run our downstream evaluation based on OpenCompass.
```bash
git clone https://github.com/open-compass/opencompass
cd opencompass
pip install -e .
```

The necessary Python packages we use, with their corresponding versions:
```
flash-attn==2.7.4.post1
torch==2.6.0
transformers==4.51.0
opencompass==0.4.2
```
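A quick sanity check that the pinned versions are the ones actually installed (assuming each package exposes `__version__`, which these four do):

```python
import flash_attn, opencompass, torch, transformers

for pkg in (flash_attn, torch, transformers, opencompass):
    print(f"{pkg.__name__}=={pkg.__version__}")
```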
Copy the folder `rope_pp/rope_pp/` to `opencompass/models/` and add the following lines to the end of `opencompass/models/__init__.py`.

```python
from .rope_pp.rope_pp_wrapper import RoPEPPCausalLM
from .rope_pp.fope_wrapper import FoPECausalLM
from .rope_pp.alibi_wrapper import AlibiCausalLM
from .rope_pp.pythia_wrapper import PythiaCausalLM
from .rope_pp.mask_wrapper import MaskCausalLM
```
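Once registered, the wrappers can be referenced from an OpenCompass model config by `type`. A minimal illustrative sketch; the abbr, checkpoint path, and generation settings below are placeholders, not the repo's actual eval configs:

```python
from opencompass.models import RoPEPPCausalLM

models = [
    dict(
        type=RoPEPPCausalLM,
        abbr='rope_pp-776m-4k-imag2',              # placeholder name
        path='checkpoints/rope_pp-776m-4k-imag2',  # placeholder checkpoint path
        max_seq_len=4096,
        max_out_len=64,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```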
We pretrain our model with mlfoundations/dclm-baseline-1.0 and calculate validation loss with OpenDataLab/Pile-CC. Please download these datasets and copy their paths to the corresponding positions in our training Python scripts before training.
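For example, assuming both datasets are hosted on the Hugging Face Hub under these ids, one way to fetch them (the `local_dir` targets are our own choice):

```python
from huggingface_hub import snapshot_download

# Pre-training corpus
snapshot_download('mlfoundations/dclm-baseline-1.0',
                  repo_type='dataset', local_dir='data/dclm-baseline-1.0')
# Validation corpus for the loss curve
snapshot_download('OpenDataLab/Pile-CC',
                  repo_type='dataset', local_dir='data/pile-cc')
```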
- Make sure you have downloaded mlfoundations/dclm-baseline-1.0 and OpenDataLab/Pile-CC.
- Make the following directories in `rope_pp/`:

```
checkpoints/
logs/
results/
wandb/
```
- Take 776M RoPE++_EC as an example and execute the following command. Here, `imag2` stands for RoPE++_EC and `imag1` stands for RoPE++_EH.
```bash
set -x
port=$(shuf -i25000-30000 -n1)
wait
export WANDB_RUN_GROUP="rope_pp-776m"
wait
deepspeed --master_port "$port" --include localhost:0,1,2,3,4,5,6,7 \
    train_rope_pp.py --config_abbr '776m' --imag --imag_mode 'imag2' --save_abbr 'rope_pp-776m-4k-imag2' > logs/rope_pp-776m-4k-imag2.log 2>&1
wait
deepspeed --master_port "$port" --include localhost:0,1,2,3,4,5,6,7 \
    train_rope_pp-decay.py --config_abbr '776m' --imag --imag_mode 'imag2' --save_abbr 'rope_pp-776m-4k-imag2' --load_ckpt 90000 --decay_step 10000 > logs/rope_pp-776m-4k-imag2-ckpt90000-decay.log 2>&1
```

- For long-context continual training, take 776M RoPE++_EC as an example and execute the following command.
```bash
set -x
port=$(shuf -i25000-30000 -n1)
wait
export WANDB_RUN_GROUP="rope_pp-776m"
wait # for NTK
deepspeed --master_port "$port" --include localhost:0,1,2,3,4,5,6,7 \
    train_rope_pp-lctx.py --config_abbr '776m' --imag --imag_mode 'imag2' --save_abbr 'rope_pp-776m-4k-imag2-ckpt90000-decay' --load_ckpt 10000 > logs/rope_pp-776m-4k-imag2-ckpt90000-decay-lctx.log 2>&1
wait # for Linear PI
deepspeed --master_port "$port" --include localhost:0,1,2,3,4,5,6,7 \
    train_rope_pp-lctx-linear.py --config_abbr '776m' --factor 8 --imag --imag_mode 'imag2' --save_abbr 'rope_pp-776m-4k-imag2-ckpt90000-decay' --load_ckpt 10000 > logs/rope_pp-776m-4k-imag2-ckpt90000-decay-lctx-pi8.log 2>&1
wait # for YaRN
deepspeed --master_port "$port" --include localhost:0,1,2,3,4,5,6,7 \
    train_rope_pp-lctx-linear.py --config_abbr '776m' --yarn --factor 32 --imag --imag_mode 'imag2' --save_abbr 'rope_pp-776m-4k-imag2-ckpt90000-decay' --load_ckpt 10000 > logs/rope_pp-776m-4k-imag2-ckpt90000-decay-lctx-yarn32.log 2>&1
```

These training commands are detailed in scripts/train-776m-rope_pp_ec.sh. We also provide bash scripts for the other methods and model scales.
Note: please execute these scripts under the `rope_pp/` directory.
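As background for the `--factor` and `--yarn` flags above: Linear PI squeezes positions into the trained range, NTK-aware scaling enlarges the RoPE base instead, and YaRN interpolates between the two regimes per frequency band. Below is a sketch of the first two rules in their textbook form, not necessarily this repo's exact implementation:

```python
import torch

def rope_inv_freq(head_dim, base=10000.0):
    # Standard RoPE frequencies: theta_j = base^(-2j/head_dim)
    return base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)

def linear_pi_positions(positions, factor):
    # Linear position interpolation: positions m -> m / factor, so a
    # factor-times-longer context reuses the trained position range
    return positions / factor

def ntk_scaled_base(head_dim, base=10000.0, factor=8.0):
    # NTK-aware scaling: stretch the base so low frequencies slow down by
    # ~factor while the highest frequency is left almost untouched
    return base * factor ** (head_dim / (head_dim - 2))

inv_freq_pi = rope_inv_freq(64)   # use together with scaled positions
inv_freq_ntk = rope_inv_freq(64, base=ntk_scaled_base(64, factor=8.0))
```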
You can also download our checkpoints from Hugging Face.
Copy the folder `rope_pp/eval/` to your OpenCompass directory, and then you can try the following evaluations.
- We evaluate short-context performance on tasks mainly from the Open LLM Leaderboard, including TruthfulQA, PIQA, HellaSwag, WinoGrande, ARC-e, GPQA, SocialIQA, OpenBookQA, and SuperGLUE. All models are tested within a 4k context length.
- Execute the following commands.

```bash
python run.py eval/eval_rope_pp_short.py --dump-eval-details -r
wait
python run.py eval/eval_fope_short.py --dump-eval-details -r
wait
python run.py eval/eval_alibi_short.py --dump-eval-details -r
wait
python run.py eval/eval_pythia_short.py --dump-eval-details -r
```
- We evaluate long-context performance at varying lengths on the synthetic benchmarks RULER and BABILong.
- Before evaluation, we first need to edit the prompt format of RULER and BABILong so that the base model responds more effectively. In files like `ruler_cwe_gen.py` under the path `opencompass/configs/datasets/ruler/`, and files like `babilong_4k_gen.py` under the path `opencompass/configs/datasets/babilong/`, comment out the '\n' at the end of the prompt. The following is an example in `opencompass/configs/datasets/ruler/ruler_vt_gen.py`.
```python
vt_datasets = [
    {
        'abbr': 'ruler_vt',
        'type': RulerVtDataset,
        'num_chains': 1,
        'num_hops': 4,
        'reader_cfg': dict(input_columns=['prompt'], output_column='answer'),
        'infer_cfg': dict(
            prompt_template=dict(
                type=PromptTemplate,
                template=dict(
                    round=[
                        dict(role='HUMAN', prompt='{prompt}'),
                        # dict(role='BOT', prompt='{answer}\n'),  # comment out this line
                    ]
                ),
            ),
            retriever=dict(type=ZeroRetriever),
            inferencer=dict(type=GenInferencer),
        ),
        'eval_cfg': dict(
            evaluator=dict(type=RulerVtEvaluator),
        ),
    }
]
```

- Execute the following commands.
```bash
python run.py eval/eval_rope_pp_ruler.py --dump-eval-details -r
wait
python run.py eval/eval_rope_pp_babilong.py --dump-eval-details -r
```

We calculate perplexity in the `rope_pp/` directory instead of OpenCompass, as follows.
- We measure perplexity on WikiText and LAMBADA by executing the following command.

```bash
python test_ppl.py
```
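For reference, here is a generic chunked-perplexity loop with transformers. It is a sketch of the standard recipe, not what test_ppl.py does; the checkpoint path and corpus file are placeholders:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'checkpoints/rope_pp-776m-4k-imag2'  # placeholder; custom models may need trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained(path).eval()
tok = AutoTokenizer.from_pretrained(path)

ids = tok(open('data/wikitext.txt').read(), return_tensors='pt').input_ids  # placeholder corpus
nll, n_tokens, window = 0.0, 0, 4096
with torch.no_grad():
    for i in range(0, ids.size(1), window):
        chunk = ids[:, i:i + window]
        if chunk.size(1) < 2:
            break
        out = model(input_ids=chunk, labels=chunk)    # HF shifts labels internally
        nll += out.loss.item() * (chunk.size(1) - 1)  # mean NLL -> summed NLL
        n_tokens += chunk.size(1) - 1
print('perplexity:', math.exp(nll / n_tokens))
```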
- To verify how imaginary attention captures long-context dependencies and to contrast it with real attention in RoPE++, we design a noise experiment.
- We add Gaussian noise with equal standard deviation to the imaginary and real attention components separately, and monitor the change in RoPE++ performance on RULER-4k (a conceptual sketch follows the command below).
- Results can be acquired by executing the following command.

```bash
python run.py eval/eval_rope_pp_ruler_mask.py --dump-eval-details -r
```
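The sketch referenced above shows the perturbation idea in isolation (our illustration, not the eval_rope_pp_ruler_mask implementation):

```python
import torch

def add_branch_noise(real_logits, imag_logits, sigma, target='imag'):
    """Add zero-mean Gaussian noise (std sigma) to one attention branch only.

    real_logits / imag_logits: pre-softmax scores of the real and imaginary
    head groups. Comparing the two perturbations at equal sigma isolates how
    much each branch contributes to RULER-4k performance.
    """
    if target == 'real':
        return real_logits + sigma * torch.randn_like(real_logits), imag_logits
    return real_logits, imag_logits + sigma * torch.randn_like(imag_logits)
```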
```bibtex
@article{liu2025beyond,
  title={Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs},
  author={Liu, Xiaoran and Song, Yuerong and Liu, Zhigeng and Huang, Zengfeng and Guo, Qipeng and Liu, Zhaoxiang and Lian, Shiguo and He, Ziwei and Qiu, Xipeng},
  journal={arXiv preprint arXiv:2512.07525},
  year={2025}
}
```




