Thank you for sharing this incredible framework.
I’ve used many of these approaches, but I later realized that while they work well with OpenAI and Anthropic models, smaller models often run into context-limit errors, even with 128k-token context windows.
Recently, I came across datasets and projects such as Mind2Web and ScribeAgent.
Interestingly, they achieved results with 7B models that were competitive with flagship models.
Since we control this browser, my goal is to explore whether we can implement a LogTrail or LogTrace system that lets the browser export a few samples after, say, 50–100 iterations. These samples could then be used to train highly specialized models, for example, a model dedicated to flight booking.
I’d be happy to contribute if someone could point me to where this functionality should be integrated and which parts of the code would need to change.