Multilingual KokoroChat: A Multi-LLM Ensemble Translation Method for Creating a Multilingual Counseling Dialogue Dataset
This dataset was created by translating KokoroChat, a large-scale, manually authored Japanese counseling corpus, into English and Chinese. We developed and employed a novel Multi-LLM Ensemble method: multiple distinct LLMs first generated diverse translation hypotheses, and a single refiner LLM then produced a high-quality final translation by analyzing the strengths and weaknesses of all candidate hypotheses.
This work has been accepted to the main conference of LREC 2026.
For the English translation, we selected three models representing the state of the art at the time of manuscript preparation: GPT-5 (gpt-5-2025-0907), Gemini-2.5-Pro (gemini-2.5-pro), and Grok-4 (grok-4-0709). For the Chinese translation, we replaced Grok-4 with Qwen-Plus (qwen-plus-2025-07-28).
We selected Gemini-2.5-Pro as the refiner LLM.
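The two-stage flow described above (hypothesis generation by several LLMs, then integration by a single refiner) can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names, prompt wording, and the way hypothesis generators are passed in are all assumptions for exposition.

```python
# Illustrative sketch of the Multi-LLM Ensemble translation flow.
# The real implementation lives in src/Hypothesis_{LLM}.py and
# src/Refine_Gemini.py; names and prompt text here are hypothetical.

def build_refine_prompt(source_ja, hypotheses):
    """Combine one Japanese utterance with the candidate translations
    into a single prompt for the refiner LLM."""
    lines = [f"Source (Japanese): {source_ja}", "Candidate translations:"]
    for model_name, hypothesis in hypotheses.items():
        lines.append(f"- {model_name}: {hypothesis}")
    lines.append(
        "Analyze the strengths and weaknesses of each candidate, "
        "then output a single improved final translation."
    )
    return "\n".join(lines)

def ensemble_translate(source_ja, hypothesis_fns, refine_fn):
    """Step 1: collect one hypothesis per LLM.
    Step 2: ask the refiner LLM to integrate all of them."""
    hypotheses = {name: fn(source_ja) for name, fn in hypothesis_fns.items()}
    return refine_fn(build_refine_prompt(source_ja, hypotheses))
```

In practice each `fn` would wrap an API call to one of the models above, with Gemini-2.5-Pro behind `refine_fn`.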
| Language | Dialogues | Avg. utterances/dialogue |
|---|---|---|
| Japanese (KokoroChat) | 6,589 | 91.2 |
| English | 6,565 | 91.2 |
| Chinese | 6,582 | 91.2 |
Note that the slight reduction in the number of dialogues in the English and Chinese versions is due to the exclusion of content that triggered the LLMs' safety filters.
```
Chinese/                   # Chinese translation dialogues by Multi-LLM Ensemble
└── zh*.json
English/                   # English translation dialogues by Multi-LLM Ensemble
└── en*.json
src/                       # Source code
├── TestHyp/               # Hypotheses generated by each LLM (used for refinement)
│   ├── Chinese/
│   │   └── {LLM}4.json
│   └── English/
│       └── {LLM}4.json
│
├── CancelBatchProcess.py  # Script to cancel running batch jobs
├── config.json            # API keys for each LLM
├── Hypothesis_{LLM}.py    # Prompts and code to generate hypotheses with each LLM
├── main.py                # Main entry point to run the experiments
├── Refine_Gemini.py       # Prompt and code to refine translations by integrating 3 hypotheses
└── utils.py               # Shared utility functions
```
The English translation dialogue data is stored in English/ as en*.json, while the Chinese translation dialogue data is stored in Chinese/ as zh*.json.
src/Hypothesis_{LLM}.py contains the prompts and code used to generate hypotheses with each LLM.
src/Refine_Gemini.py contains the prompt and code used to refine translations by integrating 3 hypotheses.
src/TestHyp/ contains, for each language, the hypotheses generated by each LLM for one sample dialogue; these can be used to reproduce the integration step performed in src/Refine_Gemini.py.
To run the pipeline, execute src/main.py and specify the file ID to be translated, the LLM to use, and the target language.
The table below describes the fields of each utterance object in the dialogue array.
| Key | Type | Description |
|---|---|---|
| role | String | Speaker's role (counselor or client). |
| time | String | The timestamp of the utterance in ISO 8601 format. |
| origin | String | The original utterance in Japanese. |
| content | String | The translated text of origin in the target language. |
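Given the fields above, a dialogue file can be read with a few lines of standard-library Python. This is a minimal sketch under one assumption: that each JSON file's top level is the dialogue array itself; if the released files nest the array under a key, adjust the loader accordingly.

```python
import json

def load_dialogue(path):
    """Load one translated dialogue file (e.g. from English/ or Chinese/).
    Assumes the top-level JSON value is the array of utterance objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def utterance_pairs(dialogue):
    """Yield (role, original Japanese, translated text) for each utterance,
    using the keys documented in the table above."""
    for utt in dialogue:
        yield utt["role"], utt["origin"], utt["content"]
```

For example, iterating over `utterance_pairs(load_dialogue(...))` gives aligned Japanese-target sentence pairs, which is convenient for building parallel corpora from the dataset.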
If you use this dataset, please cite the following paper:
```bibtex
@inproceedings{suzuki2026multilingualkokorochat,
  title     = {Multilingual KokoroChat: A Multi-LLM Ensemble Translation Method for Creating a Multilingual Counseling Dialogue Dataset},
  author    = {Ryoma Suzuki and Zhiyang Qi and Michimasa Inaba},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference},
  year      = {2026},
  url       = {https://github.com/UEC-InabaLab/MultilingualKokoroChat}
}
```

Multilingual KokoroChat is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.