Findings + documentation updates from agent building process (Sortlist use case) #339

@anyacherniss

Description

Goal:
Try building https://workflowai.com/sortlist/agents/briefing-custom-questions-suggestion/1/ in AnotherAI, but as if the agent didn't exist already. That is, I tried to explain the agent I wanted built as best I could, without anticipating the issues that the existing workflowai prompt has already been tweaked to handle.

I did use some of the existing inputs/outputs as a reference for gauging quality, because it proved a bit challenging for me to gauge quality entirely on my own at first (it got easier the more prompts/inputs I tested).

Some findings on the experience

  • I had to iterate on the prompt a lot more than I expected to get the model to ask questions that were not repetitive and/or overly specific (my understanding is that these questions are meant to be high level)
    • To that end, the process took up a lot of time because I had to wait for a new experiment to be created. I missed the quick iterations that the web app playground enabled.
  • Annotations were a fantastic feature to use for this early stage of agent development
    • I struggled with Claude (Opus 4.1) adjusting the prompt to address the annotations too specifically, which then caused the responses for all inputs to be nearly identical. I don't consider this in scope for AnotherAI, but it was annoying

Proposed next steps/adjustments to documentation:

  • Better emphasize the prompt iteration flow in the agent building use case
    • The flow being: describe prompt -> annotate runs -> request updates be made -> check new experiment -> repeat as needed
  • Clarify the goal of finding the right prompt first, then testing models to optimize
    • This was something we had in our old documentation, I think. I noticed myself doing it by default in this process, but I think it would be helpful for new users if we say it explicitly.
  • Add a note that Claude Code tends to apply annotation feedback to prompts very literally, and that the user may need to coach/remind it to abstract the themes of the feedback rather than make the prompt overly specific to just a few cases.
