Question about parallel evaluation #444
Replies: 3 comments 1 reply
The AlgoTune blog showed serial (1) was catastrophically bad vs parallel (16), but that doesn't mean more parallelism is always better. For the Erdős problem with Qwen3-8B, maybe the mathematical refinements need that tighter feedback loop.
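A toy simulation (not OpenEvolve itself — the search loop and numbers below are invented purely for illustration) of why a tight feedback loop can matter: with serial evaluation every result is folded back into the search immediately, while a parallel batch is generated from the same stale best candidate before any feedback arrives.

```python
import random

def evolve(batch_size, budget, seed=0):
    """Toy hill-climber minimizing x**2 under a fixed evaluation budget.

    batch_size == 1 updates the best candidate after every evaluation
    (tight feedback loop); batch_size == 16 generates the whole batch
    from a stale best first, mimicking parallel evaluation.
    """
    rng = random.Random(seed)
    best = 10.0
    evaluated = 0
    while evaluated < budget:
        sigma = 0.5 * abs(best) + 1e-12  # step size shrinks near the optimum
        batch = [best + rng.gauss(0, sigma) for _ in range(batch_size)]
        evaluated += batch_size
        candidate = min(batch, key=lambda x: x * x)
        if candidate * candidate < best * best:
            best = candidate
    return best * best

# Average over several seeds so the comparison is not a fluke of one run.
seeds = range(10)
serial_avg = sum(evolve(1, 256, s) for s in seeds) / 10
parallel_avg = sum(evolve(16, 256, s) for s in seeds) / 10
print(f"serial avg: {serial_avg:.2e}  parallel avg: {parallel_avg:.2e}")
```

On this toy objective the serial run ends far closer to the optimum for the same budget; tasks like AlgoTune can show the opposite because evaluation throughput dominates instead.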
@codelion Thanks for your kind response.
On individual tasks there is variance in model responses. You may have to test on a number of examples before you can say whether one model is really better than another.
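To make that concrete, here is a small sketch (the scores are made up for illustration, not real benchmark numbers): if the difference between two models' mean scores is smaller than the run-to-run spread, a single run per model cannot separate them.

```python
import statistics

# Hypothetical scores from repeated runs of two models on the same task.
model_a = [0.38, 0.44, 0.41, 0.50, 0.39, 0.46]
model_b = [0.42, 0.40, 0.47, 0.43, 0.49, 0.41]

def summarize(scores):
    """Return the mean and sample standard deviation of a score list."""
    return statistics.mean(scores), statistics.stdev(scores)

mean_a, sd_a = summarize(model_a)
mean_b, sd_b = summarize(model_b)

# If the means differ by less than the per-run spread, one run per
# model tells you essentially nothing about which is better.
overlap = abs(mean_a - mean_b) < max(sd_a, sd_b)
print(f"A: {mean_a:.3f}±{sd_a:.3f}  B: {mean_b:.3f}±{sd_b:.3f}  "
      f"indistinguishable from one run: {overlap}")
```

With these illustrative numbers the gap between means (~0.007) is well inside one standard deviation (~0.04), so any single-run comparison would just be sampling noise.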
I have one question. When I run Qwen3-8B with OpenEvolve on the Erdős problem, the performance varies depending on the parallel evaluation configuration. For example, when I set the parallel evaluation to 1, it shows 0.3810, but when I set it to 16, it shows 0.495 (which is worse).
In the AlgoTune task, your team already demonstrated that parallel evaluation performs much better than sequential evaluation. In my case, however, I am trying to understand why the opposite trend appears. Do you have any comments on this?