Hi, thanks for OpenEvolve! There seems to be inconsistent use of `combined_score` and `safe_numeric_average` in `database.py`. The intended behavior appears to be: use `combined_score` when it is present, and fall back to `safe_numeric_average` otherwise, e.g. `p.metrics.get("combined_score", safe_numeric_average(p.metrics))`.
However, there appear to be several places where fitness is computed without first checking for `combined_score`, so programs are not always compared on the same basis. For example, I believe this happens here: https://github.com/codelion/openevolve/blob/main/openevolve/database.py#L1285
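One possible fix is to route every fitness comparison through a single helper so the fallback is applied the same way everywhere. The sketch below is only an illustration under stated assumptions: `get_fitness` is a hypothetical name (not an existing OpenEvolve function), and the `safe_numeric_average` defined here is a simplified stand-in, not the library's actual implementation. The usage lines show why mixing the two can flip a ranking.

```python
from typing import Any, Dict


def safe_numeric_average(metrics: Dict[str, Any]) -> float:
    """Simplified stand-in (assumption): mean of numeric, non-bool values, 0.0 if none."""
    values = [
        float(v)
        for v in metrics.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    return sum(values) / len(values) if values else 0.0


def get_fitness(metrics: Dict[str, Any]) -> float:
    """Hypothetical helper: prefer combined_score, else fall back to the average."""
    if "combined_score" in metrics:
        return float(metrics["combined_score"])
    # Only computed when needed, unlike the eager default in dict.get(...)
    return safe_numeric_average(metrics)


# Two programs whose rankings disagree depending on which metric is used
a = {"combined_score": 0.9, "runtime": 0.1}
b = {"combined_score": 0.6, "runtime": 1.0}

print(get_fitness(a) > get_fitness(b))                    # True: a ranks higher
print(safe_numeric_average(a) > safe_numeric_average(b))  # False: b ranks higher
```

If one code path uses `combined_score` and another uses the plain average, the same pair of programs can be ordered differently, which is the inconsistency described above.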