NL2G/Beyond-Reproduction

Beyond-Reproduction
A Paired-Task Framework for Assessing LLM Comprehension and Creativity in Literary Translation

[📄 arXiv](to be updated)

📁 Repository Structure

beyond_imitation/
├── prompt_openrouter.py      # Shared OpenRouter API runner for scalable generation and evaluation
├── model_list_full.txt       # List of evaluated models
├── task1/                    # Claim-evaluation benchmark
│   ├── dataset/              # Task 1 dataset and adversarial dataset (download separately to prevent data contamination; see instructions below)
│   ├── step1_task1_prompt_gen.py # Prompt generation
│   ├── step2_task1_batch_run.py  # Batch evaluation of models
│   ├── utils.py              # JSON/text helpers for the notebooks
│   └── *.ipynb               # Result analysis and plot generation for reproducibility
└── task2/                    # Translational creativity benchmark
    ├── dataset/              # Task 2 dataset (annotated En-Zh/En-Nl parallel corpus)
    ├── step1_task2_TransPrompt_gen.py
    ├── step2_batch_task2_translation.py
    ├── task2_evaluator.py
    ├── task2_bench/          # Generated prompts and model outputs
    └── *.ipynb               # Auto-eval and human-eval analysis

🚀 Usage

  • Two benchmark tasks:

    • Task 1: model benchmark and adversarial test (see task1/)
    • Task 2: analysis of human annotation and automatic annotation (see task2/)

  • All experiments share a unified runner interface (per-task batch-run scripts are also provided). To run translation generation and evaluation with the shared runner:

# Step 1: Build prompts from the datasets
## Task 1:
python step1_task1_prompt_gen.py
## Task 2:
python step1_task2_TransPrompt_gen.py

# Step 2: Run translation generation and evaluation with the shared runner,
# separately or in batch. Recommended temperatures: 0.3 for the Task 1 benchmark
# (less randomness while preserving literary reasoning), 0 for Task 2
# auto-annotation (reproducibility), 0.7 for Task 2 literary translation
# (creative freedom).
python prompt_openrouter.py \
  --file path/to/prompt.csv \
  --model anthropic/claude-3.7-sonnet:thinking \
  --temperature 0.3 \
  --content-column prompt \
  --output-dir path/to/output.csv

Key arguments:
  --file: input CSV with prompts
  --model: OpenRouter model ID
  --content-column: column sent as model input
  --temperature: sampling temperature
  --output-dir: output location
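At its core, a runner like prompt_openrouter.py reads one prompt per CSV row and posts it to OpenRouter's OpenAI-compatible chat-completions endpoint. A minimal sketch of that loop, assuming the API key is read from an OPENROUTER_API_KEY environment variable (function names and error handling here are illustrative, not the repository's actual implementation):

```python
import csv
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model, content, temperature):
    """Assemble the HTTP request for a single prompt row."""
    payload = {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": content}],
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )

def run_file(path, model, temperature=0.3, content_column="prompt"):
    """Yield one completion per CSV row (requires a valid API key)."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            req = build_request(model, row[content_column], temperature)
            with urllib.request.urlopen(req) as resp:
                yield json.load(resp)["choices"][0]["message"]["content"]
```

The per-row design lets the same runner serve both generation (Task 2 translation) and evaluation (Task 1 claims, Task 2 auto-annotation) simply by swapping the input CSV, model ID, and temperature.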
# Step 2 (batch alternative): run evaluation/translation in batch
## Task 1:
python step2_task1_batch_run.py
## Task 2 translation:
python step2_batch_task2_translation.py
## Task 2 evaluation:
# Run step3_1_AutoEval_pipeline.ipynb for prompt and data preparation, then, when instructed, run:
python step3_2_task2_evaluator.py
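A batch-run script presumably iterates over model_list_full.txt and invokes the shared runner once per model. A hedged sketch of that pattern (the output-path convention and comment-skipping in the model list are assumptions, not the repository's actual implementation):

```python
import subprocess
import sys

def read_model_list(path="model_list_full.txt"):
    """Parse one OpenRouter model ID per line, skipping blanks and '#' comments."""
    with open(path, encoding="utf-8") as f:
        return [
            line.strip()
            for line in f
            if line.strip() and not line.lstrip().startswith("#")
        ]

def batch_run(prompt_csv, models, temperature=0.3):
    """Invoke the shared runner once per model (output paths are illustrative)."""
    for model in models:
        subprocess.run(
            [
                sys.executable, "prompt_openrouter.py",
                "--file", prompt_csv,
                "--model", model,
                "--temperature", str(temperature),
                "--content-column", "prompt",
                # Slashes in OpenRouter IDs are flattened for the filename.
                "--output-dir", f"outputs/{model.replace('/', '_')}.csv",
            ],
            check=True,
        )
```

Running sequentially with check=True stops the batch on the first failing model, which keeps partial results easy to audit.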
  • To reproduce the results for Tasks 1 and 2, see the .ipynb notebooks in the task1/ and task2/ folders.
  • The annotated datasets can be downloaded after agreeing to the following conditions:

📊 Dataset Overview

Task summary

🤝 Contributing

Feel free to contribute by submitting a pull request.

1. Fork the repository
2. Create a new branch for your feature or fix
3. Commit your changes with a clear message
4. Push to your fork and submit a PR

📜 License


This project is licensed under the CC License - see the LICENSE file for details.


📖 Citation

If you use this work in your research, please cite it as:

to be updated
