Goal:
Try building https://workflowai.com/sortlist/agents/briefing-custom-questions-suggestion/1/ in AnotherAI, but as if the agent didn't already exist. That is, I tried to explain the agent I wanted built as best I could, without anticipating the issues that the existing WorkflowAI prompt has already been tweaked to handle.
I did use some of the existing inputs/outputs as a reference for gauging quality, because it proved a bit challenging for me to gauge quality entirely on my own at first (it got easier the more prompts/inputs I tested).
Some findings on the experience:
- I had to iterate on the prompt a lot more than I expected in order to get the model to ask questions that were not repetitive and/or overly specific (my understanding is that these questions are meant to be high-level)
- As a result, the process took up a lot of time, because each iteration meant waiting for a new experiment to be created. I missed the quick iterations that the web app playground enabled.
- Annotations were a fantastic feature to use for this early stage of agent development
- I struggled with Claude (Opus 4.1) adjusting the prompt to address the annotations too specifically, which caused the responses for all inputs to become nearly identical. I don't consider this in scope for AnotherAI, but it was annoying
Proposed next steps/adjustments to documentation:
- Better emphasize the prompt iteration flow in the agent-building use case
  - The flow being: describe prompt -> annotate runs -> request that updates be made -> check new experiment -> repeat as needed
- Clarify the goal of finding the right prompt first, then testing models to optimize
  - I think this was something we had in our old documentation. I found myself doing it by default in this process, but it would be helpful for new users if we said it explicitly.
- Add a note that Claude Code tends to apply annotation feedback to prompts very narrowly, and that the user might need to coach/remind it to abstract the themes of the feedback rather than make the prompt overly specific to just a few cases.