Feature: Better support for LLM feedback and handling of LLM ensembles. #47
Conversation
Better support for LLM feedback and handling of LLM ensembles with an arbitrary number of models.
- config.py supports configuration of n-model ensembles for evolution and, optionally, a separate ensemble for evaluation; the YAML format stays backwards compatible; settings can be made for all models under `llm:` or per model under `llm: models:`; new `evaluator_system_message` setting (see the config sketch below)
- ensemble.py supports n-model ensembles
- OpenAILLM supports individual parameter configuration per model
- ensemble.py has a new generate_all_with_context() function
- evaluator.py uses the prompt sampler to generate LLM feedback prompts
- templates.py contains default prompts for LLM feedback
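As a rough illustration, a multi-model configuration might look like the sketch below. The `llm:`, `llm: models:`, `evaluator_system_message`, and `use_llm_feedback` keys are named in this PR; the model names, weights, and exact nesting are assumptions for illustration only.

```yaml
llm:
  # Settings made here apply to every model in the ensemble.
  temperature: 0.7
  models:
    # Per-model overrides; the names and weights are placeholders.
    - name: model-a
      weight: 0.8
    - name: model-b
      weight: 0.2
      temperature: 0.9
  # Optional separate ensemble for evaluation (key name assumed).
  evaluator_models:
    - name: model-c
      weight: 1.0

prompt:
  # New setting from this PR; its placement under prompt: is an assumption.
  evaluator_system_message: "You are an expert code reviewer."

evaluator:
  use_llm_feedback: true
```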
With the function_minimization example, set `use_llm_feedback: true` in its config.yaml. The LLM feedback will provide output such as:
```
{
"readability": 0.92,
"maintainability": 0.88,
"efficiency": 0.82,
"reasoning": "The code is quite readable, with clear function and variable names, concise comments, and a docstring explaining the purpose and arguments of the main search function. There is some minor room for improvement, such as splitting up large inner loops or extracting repeated logic, but overall it is easy to follow. Maintainability is high due to modularization and descriptive naming, but could be slightly improved by reducing the nesting level and possibly moving the annealing routine to its own top-level function. Efficiency is good for a simple global optimization approach; vectorized numpy operations are used where appropriate, and the population-based simulated annealing is a reasonable trade-off between exploration and exploitation. However, the algorithm could be further optimized (e.g., by fully vectorizing more of the walker updates or parallelizing restarts), and the approach is not the most efficient for high-dimensional or more complex landscapes."
}
```
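As a hedged sketch, a JSON reply like the one above could be mapped to the llm_-prefixed metrics shown below roughly as follows; the helper name and the exact mapping are assumptions, not code from this PR.

```python
import json

def parse_llm_feedback(reply: str) -> dict[str, float]:
    """Hypothetical helper: turn the model's JSON reply into llm_* metrics."""
    data = json.loads(reply)
    return {
        f"llm_{key}": float(value)
        for key, value in data.items()
        # Skip the free-text "reasoning" field; keep only numeric scores.
        if key != "reasoning" and isinstance(value, (int, float))
    }
```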
The evolution can then consider the additional values:
```
Evolution complete!
Best program metrics:
runs_successfully: 1.0000
value_score: 0.9997
distance_score: 0.9991
overall_score: 0.9905
standard_deviation_score: 0.9992
speed_score: 0.0610
reliability_score: 1.0000
combined_score: 0.9525
success_rate: 1.0000
llm_readability: 0.0904
llm_maintainability: 0.0816
llm_efficiency: 0.0764
```
Note: I have not evaluated the results yet.
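To make "consider the additional values" concrete, here is a minimal Python sketch of one way a downstream score could weight the LLM feedback metrics alongside the existing combined score; the helper and the 0.2 weighting are illustrative assumptions, not part of this PR.

```python
# Illustrative only: fold LLM feedback metrics into an overall score.
def combined_with_feedback(metrics: dict[str, float], llm_weight: float = 0.2) -> float:
    llm_scores = [v for k, v in metrics.items() if k.startswith("llm_")]
    if not llm_scores:
        return metrics["combined_score"]
    llm_avg = sum(llm_scores) / len(llm_scores)
    return (1 - llm_weight) * metrics["combined_score"] + llm_weight * llm_avg

metrics = {
    "combined_score": 0.9525,
    "llm_readability": 0.0904,
    "llm_maintainability": 0.0816,
    "llm_efficiency": 0.0764,
}
print(combined_with_feedback(metrics))  # ~0.7786
```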
This resolves issue #41 "use_llm_feedback doesn't work".
I had to fix an urgent issue with float formatting that broke the examples; can you please pull from main? I can take a look at this PR, but I would need some time to test the changes.
```
def __post_init__(self):
    """Post-initialization to set up model configurations"""
    # Handle backward compatibility for primary_model(_weight) and secondary_model(_weight).
    if (self.primary_model or self.primary_model_weight) and len(self.models) < 1:
```
primary_model and these parameters have default values, so this branch is always hit and the new way of configuring models does not work.
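A minimal sketch of the kind of guard being described, assuming hypothetical field names and shapes (the actual fix is in PR #56, and this is not that patch): defaulting the legacy fields to None, rather than to a concrete model name, makes the truthiness check meaningful.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LLMConfig:
    # Sketch only: None defaults (instead of concrete model names) mean the
    # truthiness check below fires only when a user actually set the fields.
    primary_model: Optional[str] = None
    primary_model_weight: Optional[float] = None
    models: list = field(default_factory=list)

    def __post_init__(self):
        """Post-initialization to set up model configurations."""
        # Backward compatibility: synthesize `models` from the legacy fields
        # only when they were explicitly set and no ensemble was configured.
        if (self.primary_model or self.primary_model_weight) and len(self.models) < 1:
            self.models.append(
                {"name": self.primary_model, "weight": self.primary_model_weight or 1.0}
            )
```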
Thanks for the review, @Weaverzhu. Can you confirm whether PR #56 fixes it?
I think it will fix the problem. I hope the fix lands soon.