This repo provides the extended environments for CoMLRL.
Install CoMLRL:

```bash
pip install comlrl
# Install PyTorch compatible with your device
```

Or via conda-forge:

```bash
conda install -c conda-forge comlrl
# Install PyTorch compatible with your device
```

- Contents: each sample includes a class skeleton, method stubs (with docstrings or `pass`), and canonical hidden tests.
- Splitting: `train/train_grpo.py` performs a stable random train/eval split (default 80/20) on top of the Hugging Face dataset; seeds are optionally randomized via `fixed_seed=false` (see the sketch after this list).
- Prompting: prompts include the sanitized class skeleton, explicit method names per agent, and any collaboration instructions.
- Testing: reward code merges agent completions back into the skeleton and runs the provided unit tests inside a temporary directory to isolate state.
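A minimal sketch of the train/eval split described above, assuming the Hugging Face `datasets` library; the dataset name, split name, and default seed are placeholders for illustration, not the values used by `train/train_grpo.py`:

```python
import random
from datasets import load_dataset

def train_eval_split(dataset_name: str, eval_ratio: float = 0.2, fixed_seed: bool = True):
    """Illustrative 80/20 split; the real logic lives in train/train_grpo.py."""
    ds = load_dataset(dataset_name, split="test")  # adjust the split name to the dataset you use
    # A fixed seed keeps the split stable across runs; otherwise randomize it.
    seed = 42 if fixed_seed else random.randint(0, 2**31 - 1)
    split = ds.train_test_split(test_size=eval_ratio, seed=seed)
    return split["train"], split["test"]
```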
Key sections in `configs/config.yaml`:

- `model`: base checkpoint (`Qwen/Qwen2.5-Coder-3B-Instruct` by default), tokenizer/model kwargs, and device mapping.
- `data`: dataset name and split ratio; customize when experimenting with different ClassEval sub-splits or local mirrors.
- `collab`: choose `ONE` or `TAKE_JOB` and set `num_agents` (for `TAKE_JOB`).
- `external`: determines the feedback mode. `token_report` summarizes syntax/tests at each turn; other modes replicate the options documented in the code-generation README (`plain`, `level_feedback`, `group_feedback`, `personal_feedback`, `personal_detailed_feedback`, `passed`, `level_passed`).
- `trainer`: forwarded to `comlrl.trainers.magrpo.MAGRPOTrainer`. Includes sampling settings (`num_generations`, `num_turns`, temperature/top_p), optimization hyperparameters, and IO controls (`output_dir`, `logging_steps`, etc.).
- `reward_processor`: optional extra scaling/shift before the reward hits the trainer.
- `output`: persistence knobs (save final model, keep tmp dirs); environment variables such as `CLASSEVAL_TMP_BASE` are derived from this section to colocate temp files per job.
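As a quick orientation, the sketch below shows how these sections might be consumed in a training script. The nested field names (`name`, `mode`, `split_ratio`, etc.) are assumptions based on the descriptions above, not the actual keys in `train/train_grpo.py`:

```python
import yaml

# Hypothetical illustration of reading configs/config.yaml and pulling out
# the sections documented above; field names are assumed for this sketch.
with open("configs/config.yaml") as f:
    cfg = yaml.safe_load(f)

model_name = cfg["model"]["name"]              # e.g. Qwen/Qwen2.5-Coder-3B-Instruct
split_ratio = cfg["data"].get("split_ratio", 0.8)
collab_mode = cfg["collab"]["mode"]            # ONE or TAKE_JOB
num_agents = cfg["collab"].get("num_agents", 2)
feedback_mode = cfg["external"]["mode"]        # e.g. token_report
trainer_kwargs = cfg["trainer"]                # forwarded to MAGRPOTrainer
```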
`rewards/CE_reward.py` computes structured rewards:

- `lv1`: coverage of unique methods completed.
- `lv2`: penalizes under/over-allocation of total method picks.
- `lv3`: balance term encouraging an even workload across agents.
- `lv4`/`lv5`: syntax + unit-test bonuses (reported for analysis; syntax/test failures short-circuit the run where applicable).
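For intuition, here is a minimal sketch of how `lv1`-`lv3`-style terms could be computed from per-agent method picks. The formulas and equal weighting are assumptions for illustration and do not mirror the actual math in `CE_reward.py`:

```python
def structural_reward(agent_picks, required_methods):
    """Illustrative lv1-lv3 style terms (not the actual CE_reward.py scoring).

    agent_picks: list of lists, the methods each agent chose to implement.
    required_methods: iterable of method names in the class skeleton.
    """
    if not agent_picks:
        return 0.0
    required = set(required_methods)
    all_picks = [m for picks in agent_picks for m in picks]

    # lv1: coverage of unique required methods completed.
    lv1 = len(set(all_picks) & required) / max(len(required), 1)

    # lv2: penalize under/over-allocation of total method picks.
    lv2 = -abs(len(all_picks) - len(required)) / max(len(required), 1)

    # lv3: balance term encouraging an even workload across agents.
    counts = [len(picks) for picks in agent_picks]
    mean = sum(counts) / len(counts)
    lv3 = -sum(abs(c - mean) for c in counts) / max(len(all_picks), 1)

    return lv1 + lv2 + lv3
```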
- Tests execute inside per-sample temporary directories to avoid polluted state, and runs are terminated automatically on timeout.
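A minimal sketch of this isolation pattern using only the standard library; the file names and the `unittest` invocation are assumptions, not the exact commands the reward code runs:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_tests_isolated(merged_source: str, test_source: str, timeout: float = 30.0) -> bool:
    """Write the merged class and its tests into a throwaway directory,
    run them there, and treat a timeout as a failure."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(merged_source)
        Path(tmp, "test_solution.py").write_text(test_source)
        try:
            proc = subprocess.run(
                [sys.executable, "-m", "unittest", "test_solution"],
                cwd=tmp, capture_output=True, timeout=timeout,
            )
            return proc.returncode == 0
        except subprocess.TimeoutExpired:
            return False  # timed-out runs count as failures
```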
- Loggers are inherited from CoMLRL. Enable Weights & Biases by filling in `wandb.entity`, or disable it for offline debugging.
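To sanity-check the logging destination before a long run, something like the following works; the `wandb` section layout in the config is an assumption based on the guidance above:

```python
import wandb
import yaml

# Hypothetical check: initialize W&B only when an entity is configured,
# mirroring the "fill in wandb.entity or disable it" guidance above.
with open("configs/config.yaml") as f:
    wandb_cfg = yaml.safe_load(f).get("wandb", {})

if wandb_cfg.get("entity"):
    wandb.init(entity=wandb_cfg["entity"], project=wandb_cfg.get("project", "comlrl-classeval"))
else:
    print("wandb.entity not set; running offline without W&B logging.")
```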