Feature: Better support for LLM feedback and handling of LLM ensembles. #47
Conversation
Better support for LLM feedback and handling of LLM ensembles with an arbitrary number of models.
- config.py supports configuration of n-model ensembles for evolution and, optionally, a separate ensemble for evaluation; the YAML format stays backwards compatible; settings can be made for all models under `llm:` or per model under `llm: models:`; new `evaluator_system_message` setting (see the config sketch below)
- ensemble.py supports n-model ensembles
- OpenAILLM supports individual parameter configuration per model
- ensemble.py has a new generate_all_with_context() function
- evaluator.py uses the prompt sampler to generate LLM feedback prompts
- templates.py contains default prompts for LLM feedback
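As a rough illustration, a multi-model configuration might look like the sketch below. The `llm:`, `llm: models:`, `evaluator_system_message`, and `use_llm_feedback` keys are named in this PR; the model names, weights, and exact nesting are assumptions for illustration only.

```yaml
llm:
  # Settings made here apply to every model in the ensemble.
  temperature: 0.7
  models:
    # Per-model overrides; the names and weights are placeholders.
    - name: model-a
      weight: 0.8
    - name: model-b
      weight: 0.2
      temperature: 0.9
  # Optional separate ensemble for evaluation (key name assumed).
  evaluator_models:
    - name: model-c
      weight: 1.0

prompt:
  # New setting from this PR; its placement under prompt: is an assumption.
  evaluator_system_message: "You are an expert code reviewer."

evaluator:
  use_llm_feedback: true
```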
With the function_minimization example, set `use_llm_feedback: true` in its config.yaml. The LLM feedback will provide output such as:
```
{
"readability": 0.92,
"maintainability": 0.88,
"efficiency": 0.82,
"reasoning": "The code is quite readable, with clear function and variable names, concise comments, and a docstring explaining the purpose and arguments of the main search function. There is some minor room for improvement, such as splitting up large inner loops or extracting repeated logic, but overall it is easy to follow. Maintainability is high due to modularization and descriptive naming, but could be slightly improved by reducing the nesting level and possibly moving the annealing routine to its own top-level function. Efficiency is good for a simple global optimization approach; vectorized numpy operations are used where appropriate, and the population-based simulated annealing is a reasonable trade-off between exploration and exploitation. However, the algorithm could be further optimized (e.g., by fully vectorizing more of the walker updates or parallelizing restarts), and the approach is not the most efficient for high-dimensional or more complex landscapes."
}
```
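As a hedged sketch, a JSON reply like the one above could be mapped to the llm_-prefixed metrics shown below roughly as follows; the helper name and the exact mapping are assumptions, not code from this PR.

```python
import json

def parse_llm_feedback(reply: str) -> dict[str, float]:
    """Hypothetical helper: turn the model's JSON reply into llm_* metrics."""
    data = json.loads(reply)
    return {
        f"llm_{key}": float(value)
        for key, value in data.items()
        # Skip the free-text "reasoning" field; keep only numeric scores.
        if key != "reasoning" and isinstance(value, (int, float))
    }
```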
The evolution can then consider the additional values:
```
Evolution complete!
Best program metrics:
runs_successfully: 1.0000
value_score: 0.9997
distance_score: 0.9991
overall_score: 0.9905
standard_deviation_score: 0.9992
speed_score: 0.0610
reliability_score: 1.0000
combined_score: 0.9525
success_rate: 1.0000
llm_readability: 0.0904
llm_maintainability: 0.0816
llm_efficiency: 0.0764
```
Note: I have not evaluated the results yet.
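To make "consider the additional values" concrete, here is a minimal Python sketch of one way a downstream score could weight the LLM feedback metrics alongside the existing combined score; the helper and the 0.2 weighting are illustrative assumptions, not part of this PR.

```python
# Illustrative only: fold LLM feedback metrics into an overall score.
def combined_with_feedback(metrics: dict[str, float], llm_weight: float = 0.2) -> float:
    llm_scores = [v for k, v in metrics.items() if k.startswith("llm_")]
    if not llm_scores:
        return metrics["combined_score"]
    llm_avg = sum(llm_scores) / len(llm_scores)
    return (1 - llm_weight) * metrics["combined_score"] + llm_weight * llm_avg

metrics = {
    "combined_score": 0.9525,
    "llm_readability": 0.0904,
    "llm_maintainability": 0.0816,
    "llm_efficiency": 0.0764,
}
print(combined_with_feedback(metrics))  # ~0.7786
```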
This resolves issue #41 "use_llm_feedback doesn't work".
I had to fix an urgent issue with float formatting that broke the examples; can you please pull from main? I can take a look at this PR, but I would need some time to test the changes.
```
def __post_init__(self):
    """Post-initialization to set up model configurations"""
    # Handle backward compatibility for primary_model(_weight) and secondary_model(_weight).
    if (self.primary_model or self.primary_model_weight) and len(self.models) < 1:
```
primary_model and these parameters have default values, so this branch is always hit and the new way of configuring models does not work.
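A minimal sketch of the kind of guard being described, assuming hypothetical field names and shapes (the actual fix is in PR #56, and this is not that patch): defaulting the legacy fields to None, rather than to a concrete model name, makes the truthiness check meaningful.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LLMConfig:
    # Sketch only: None defaults (instead of concrete model names) mean the
    # truthiness check below fires only when a user actually set the fields.
    primary_model: Optional[str] = None
    primary_model_weight: Optional[float] = None
    models: list = field(default_factory=list)

    def __post_init__(self):
        """Post-initialization to set up model configurations."""
        # Backward compatibility: synthesize `models` from the legacy fields
        # only when they were explicitly set and no ensemble was configured.
        if (self.primary_model or self.primary_model_weight) and len(self.models) < 1:
            self.models.append(
                {"name": self.primary_model, "weight": self.primary_model_weight or 1.0}
            )
```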
Thanks for the review, @Weaverzhu. Can you confirm whether PR #56 fixes it?
I think it will fix the problem. I hope the fix lands soon.