HER introduces dual-layer thinking that distinguishes characters' first-person thinking from LLMs' third-person thinking for cognitive-level persona simulation.
LLM role-playingโusing LLMs to simulate specific personasโhas emerged as a key capability in companionship, content creation, and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge.
HER (Hierarchical Emotion Reasoning) is a unified framework for cognitive-level persona simulation that introduces:
- ๐ง Dual-Layer Thinking: Distinguishes characters' first-person thinking from LLMs' third-person thinking
- ๐ Reasoning-Augmented Data: High-quality role-playing data via reverse engineering
- ๐ฏ Human-Aligned Rewards: Principle-based reward models aligned with human preferences
| Rank | Model | CoSER Avg | CoSER SC | CoSER AN | CoSER CF | CoSER SQ | MiniMax Avg | MiniMax Worlds (50%) | MiniMax Stories (25%) | MiniMax Pref (25%) | 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude-4.5-Opus | 62.43 | 63.74 | 64.28 | 58.45 | 63.24 | 76.62 | 67.23 | 82.10 | 89.90 | [75.5, 77.7] |
| 2 | Gemini-3-Pro | 61.80 | 65.95 | 60.42 | 58.34 | 62.49 | 75.60 | 62.72 | 83.87 | 93.08 | [74.5, 76.7] |
| 3 | GPT-5.1 | 61.10 | 64.95 | 53.99 | 60.13 | 65.35 | 80.63 | 76.62 | 72.21 | 97.05 | [79.6, 81.6] |
| 4 | Gemini-2.5-Pro | 60.68 | 61.05 | 60.80 | 57.48 | 63.40 | 68.23 | 52.36 | 82.11 | 86.08 | [67.1, 69.3] |
| 5 | DeepSeek-v3.2 | 58.68 | 55.85 | 57.07 | 57.44 | 64.35 | 60.27 | 45.81 | 66.64 | 82.83 | [59.2, 61.4] |
| 6 | MiniMax-M2-her | 57.30 | 60.03 | 50.11 | 49.30 | 69.77 | 84.65 | 80.55 | 79.97 | 97.51 | [83.6, 85.7] |
| 7 | DeepSeek-v3.1 | 53.50 | 50.15 | 53.18 | 53.93 | 56.72 | 64.22 | 51.11 | 66.45 | 88.21 | [62.9, 65.5] |
| 8 | HER-RL (this model) | 53.12 | 54.33 | 47.26 | 52.78 | 58.12 | 65.73 | 59.13 | 57.74 | 86.90 | [63.0, 68.4] |
| 9 | HER-SFT | 50.92 | 50.52 | 45.99 | 49.78 | 57.37 | 58.44 | 47.29 | 52.78 | 86.40 | [56.5, 60.4] |
| 10 | Grok-4.1-Fast | 47.40 | 49.21 | 47.57 | 42.64 | 50.17 | 48.47 | 29.87 | 47.51 | 86.64 | [47.4, 49.5] |
| 11 | Claude-4.5-Sonnet | 45.21 | 47.18 | 36.02 | 47.55 | 50.09 | 69.35 | 55.72 | 75.66 | 90.28 | [68.2, 70.5] |
| 12 | Claude-3.7-Think | 39.73 | 44.84 | 31.00 | 42.45 | 40.65 | 61.25 | 50.66 | 59.53 | 84.15 | [58.5, 64.0] |
| 13 | CoSER-70B | 35.95 | 35.05 | 31.16 | 32.28 | 45.33 | 45.38 | 34.32 | 30.32 | 82.58 | [43.5, 47.2] |
| 14 | GPT-5-Mini | 32.97 | 38.10 | 24.60 | 27.20 | 42.00 | 57.63 | 43.32 | 50.11 | 93.78 | [55.9, 59.3] |
| 15 | GPT-4o-240806 | 27.69 | 34.00 | 14.90 | 22.90 | 38.90 | 66.39 | 64.96 | 46.23 | 89.40 | [64.1, 68.7] |
| 16 | GPT-OSS-120B | 26.12 | 32.80 | 14.80 | 21.50 | 35.40 | 60.72 | 47.27 | 56.65 | 91.71 | [58.0, 63.4] |
| 17 | Qwen3-32B | 22.86 | 30.56 | 19.61 | 15.52 | 30.56 | 50.76 | 40.38 | 32.82 | 89.48 | [48.4, 53.2] |
HER achieves +30.26% improvement on CoSER and +14.97% on MiniMax Role-Play Bench.
git clone https://github.com/xxx/HER.git
cd HER
pip install -r requirements.txtChat with AI characters from classic literature:
cd chat_demo
# Interactive chat with 200 classic book scenarios
python chat_demo.py
# Show the model's thinking process
python chat_demo.py --show-think --show-rolethink๐ฌ Example Conversation
๐ Pride and Prejudice
๐ฅ AI plays: Elizabeth Bennet | You play: Mr. Darcy
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ญ ใElizabeth Bennet's Responseใ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
[His tone is light, but the air feels heavy. I cannot let him see how much
Lady Catherine's intrusion still stings.]
(takes a steadying breath, smoothing the folds of her dress)
I believe I can manage, Father. Though I must admit, I am curious about
what this letter contains.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
HER/
โโโ ๐ chat_demo/ # ๐ฎ Interactive demo (try this first!)
โ โโโ chat_demo.py # Chat with book characters
โ
โโโ ๐ data_process_code/ # ๐ง Data synthesis pipeline
โ โโโ step1_data_process/ # Data cleaning & format conversion
โ โโโ step2_gen_rolethinking/# Role thinking enhancement
โ โโโ step3_gen_systhinking/ # System thinking generation
โ โโโ step4_setting_completion/ # Character profile enrichment
โ
โโโ ๐ training_code/ # ๐ฏ Model training pipeline
โ โโโ step1_roleplay_sft/ # Supervised fine-tuning
โ โโโ step2_reward_sft/ # Reward model data
โ โโโ step3_reward_rl/ # RM training
โ โโโ step4_roleplay_rl/ # RL training
โ
โโโ ๐ eval_code/ # ๐ Evaluation framework
โโโ benchmarks/coser/ # CoSER multi-turn benchmark
โโโ models/ # Model adapters (vLLM, API)
โโโ configs/ # Evaluation configs
HER introduces a hierarchical thinking mechanism:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ HER Response Structure โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ ๐ System Thinking (Third-Person Analysis) โ โ
โ โ "Elizabeth should respond with wit but maintain dignity. โ โ
โ โ Consider her pride and her evolving feelings..." โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ ๐ญ Role Thinking (First-Person Inner Thoughts) โ โ
โ โ [His manner is so different now. Can I trust this โ โ
โ โ change, or is it merely another game?] โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ ๐ญ Role Response (Speech + Action) โ โ
โ โ (lifts her chin slightly) "Mr. Darcy, I find your โ โ
โ โ sudden interest in conversation rather remarkable." โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
| Layer | Perspective | Purpose | Visibility |
|---|---|---|---|
| System Thinking | Third-person | How to portray the character | Hidden |
| Role Thinking | First-person | Character's inner thoughts | Self only |
| Role Response | First-person | Speech and actions | All |
Raw Dialogues (760 books)
โ
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Data Processing โโโโบ Enhanced with dual-layer thinking
โโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Roleplay SFT โโโโบ Base roleplay capability
โโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Reward Model โโโโบ Principle-based quality assessment
โโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโ
โ Roleplay RL โโโโบ Aligned with human preferences
โโโโโโโโโโโโโโโโโโโโ
โ
โผ
Final Model (HER-RL)
# Step 1: Data Processing
cd data_process_code
# See DATA_PIPELINE.md for detailed workflow
# Step 2: SFT Training
cd training_code/step1_roleplay_sft
python convert_to_sft.py
# Step 3-4: Reward Model & RL
# See training_code/PIPELINE.mdMulti-agent group conversation evaluation with 4 dimensions:
| Metric | Description |
|---|---|
| SC (Storyline Consistency) | Character consistency across turns |
| AN (Anthropomorphism) | Human-like behavior and emotions |
| CF (Character Fidelity) | Adherence to character settings |
| SQ (Storyline Quality) | Overall dialogue quality |
cd eval_code
# Run evaluation
python run_coser.py \
--actor your-model \
--judge qwen \
--max-rounds 20| Document | Description |
|---|---|
| Data Processing Pipeline | Complete data synthesis workflow |
| Training Pipeline | Multi-stage training guide |
| Evaluation Guide | CoSER benchmark details |
| Chat Demo | Interactive demo guide |
For questions or feedback, please open an issue in the repository.
@article{her2025,
title={HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing},
author={Chengyu Du, Xintao Wang, Aili Chen, Weiyuan Li, Rui Xu, Junteng Liu, Zishan Huang, Rong Tian, Zijun Sun, Yuhao Li, Liheng Feng, Deming Ding, Pengyu Zhao, Yanghua Xiao},
journal={arXiv preprint arXiv:2026.21459},
year={2026}
}This project is licensed under the MIT License - see the LICENSE file for details.