Open
Description
We can use the LLM to determine which rules can be checked with code, and then generate code that applies those rules to the test cases. Since the number of distinct code checkers is probably limited, we could instead provide a set of pre-defined checkers that the LLM only has to choose between.
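As a rough sketch of the pre-defined variant, assuming the LLM is asked to reply with a small JSON object naming a checker and a parameter: the checker names (`max_words`, `matches_regex`, `is_json`), the reply format, and `build_checker` are all hypothetical and only illustrate the idea.

```python
import json
import re
from typing import Callable, Dict, Optional


def max_words(text: str, param: str) -> bool:
    """True if the text has at most `param` whitespace-separated words."""
    return len(text.split()) <= int(param)


def matches_regex(text: str, param: str) -> bool:
    """True if the regular expression `param` matches somewhere in the text."""
    return re.search(param, text) is not None


def is_json(text: str, param: str) -> bool:
    """True if the text parses as JSON (the parameter is ignored)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical registry of pre-defined checkers the LLM can choose between.
CHECKERS: Dict[str, Callable[[str, str], bool]] = {
    "max_words": max_words,
    "matches_regex": matches_regex,
    "is_json": is_json,
}


def build_checker(llm_reply: str) -> Optional[Callable[[str], bool]]:
    """Turn the LLM's choice into a callable.

    Assumes a reply such as '{"checker": "max_words", "param": "5"}'.
    Returns None when the LLM names no known checker, i.e. the rule is
    treated as not checkable with code.
    """
    choice = json.loads(llm_reply)
    name = choice.get("checker")
    if name not in CHECKERS:
        return None
    param = str(choice.get("param", ""))
    check = CHECKERS[name]
    return lambda text: check(text, param)
```

With this shape, a rule like "answer in five words or fewer" would map to `build_checker('{"checker": "max_words", "param": "5"}')`, and a rule the LLM judges uncheckable would map to `None` and be skipped.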
We talked about this feature but didn't have time to add it.
Checkers can be generated both as input filters (rejecting inputs that don't meet the requirements) and as output filters.
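The two filter roles could look roughly like the following. This is a minimal sketch under the assumption that a checker is a plain function from text to `bool`; `filter_inputs`, `check_outputs`, and the example rules are hypothetical names, not existing code.

```python
from typing import Callable, Dict, List, Tuple

Checker = Callable[[str], bool]


def filter_inputs(inputs: List[str], input_checkers: List[Checker]) -> List[str]:
    """Input filter: keep only the inputs that pass every input checker."""
    return [text for text in inputs if all(check(text) for check in input_checkers)]


def check_outputs(
    cases: List[Tuple[str, str]], output_checkers: Dict[str, Checker]
) -> List[Tuple[str, str]]:
    """Output filter: return (input, violated rule) pairs for failing outputs.

    `cases` holds (input, model output) pairs; `output_checkers` maps a
    human-readable rule description to the checker that enforces it.
    """
    failures = []
    for test_input, output in cases:
        for rule, check in output_checkers.items():
            if not check(output):
                failures.append((test_input, rule))
    return failures


# Example usage with two illustrative rules.
valid_inputs = filter_inputs(["hello", "", "summarize this"], [lambda t: bool(t.strip())])
failures = check_outputs(
    [("hello", "hi there"), ("summarize this", "a much longer reply")],
    {"at most 3 words": lambda t: len(t.split()) <= 3},
)
```

Keeping the rule description next to each output checker means a failure report can say which rule was violated, not just that a test case failed.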