add this to the main repo here
add a variety of various autonomous sandbox builders that build software, these will function as benchmarks
one builds using ObjectiveAI judgment, the other doesn't use ObjectiveAI judgment
will need to collect detailed metrics
this will be something you execute and it runs to completion. completion is whenever the agents have decided the software is completed
this feature is high priority, must resolve ASAP after blockers are out of the way