Update README.md by hrdkbhatnagar · Pull Request #2 · aisa-group/PostTrainBench

hrdkbhatnagar · 2025-12-17T15:37:00Z

No description provided.

rank-and-file · 2025-12-17T16:04:20Z

 Benchmark scores are computed after post-training, for all but the "base model" score.

-All scores are averages over 4 models (Qwen-3-1.7B, Qwen-3-4B, SmolLM3-3B and Gemma-3-4B).
+All scores are averages over 4 models (Qwen3-1.7B, Qwen3-4B, SmolLM3-3B, and Gemma-3-4B-IT).


Nitpick: we can keep Gemma-3-4B (or Gemma3-4B) here, because "IT" means instruction-tuned; we use the base model for the agents and the instruction-tuned model only for the human baseline.

rank-and-file · 2025-12-17T16:07:14Z

+Add your code to `agents/<agent_name>/` with:
+1. `solve.sh` - Script that calls the agent



This renders a bit oddly (the `1.` looks off).

Re-ran the full 22-eval baseline against base Qwen/Qwen3-1.7B IT on image :18 (was :16). _index__limit100.json now records git_sha=5352aca, computed_at=2026-05-11T03:39Z.

Same four evals still fail under :18: bfcl, spiralbench_mini, political_bias_openai, moru (see CHANGELOG 2026-05-11 entry + design-todos aisa-group#2 / aisa-group#6 for each).

Source run: 2026-05-11_11-59_baseline_qwen_qwen3-1.7b_limit100.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Update README.md

63d07e9

hrdkbhatnagar requested a review from rank-and-file December 17, 2025 15:38

rank-and-file reviewed Dec 17, 2025


fix nits

2dd7ae0

hrdkbhatnagar merged commit 6c0bfb7 into main Dec 17, 2025

hrdkbhatnagar deleted the readme-new-1 branch January 11, 2026 11:55

hrdkbhatnagar merged 2 commits into main from readme-new-1
