Fix system prompt to prevent automatic markdown file creation #1288
- Changed TROUBLESHOOTING section to say 'Explain your reasoning process' instead of 'Document your reasoning process'
- Made DOCUMENTATION section much more explicit about NOT creating markdown files
- Added clear instruction: Do NOT create README.md, CHANGES.md, NOTES.md, or any other documentation files unless explicitly requested
- Emphasized that explanations should ALWAYS be in conversation responses, not separate files

Co-authored-by: openhands <openhands@all-hands.dev>
Evaluation Triggered
Just a thought about this PR, not a review. I’d love your thoughts 😅

I should note, for the record, that this is LLM-specific: it looks like a very Sonnet thing, it must have been trained to do it. Sorry, Xingyao, I feel maybe you believe context engineering is not really taking us far 😅 and maybe you’ll ultimately be proven right, I just feel we don’t get the best experience if we apply a blanket prompt to all LLMs either. I’m pretty sure there is no need for this for Gemini 2.5 or any GPT-5 variant, or for a number of older models like R1.

The important part, it seems to me, is that this is the kind of thing we have mostly needed to adjust system prompts for over the past year and a half, and these adjustments are LLM-specific. The majority, as far as I recall, were LLM-specific: tweaking something that a particular LLM kept doing or not doing.

(There is a reasonable argument to be made that the particular phrases in this PR may not hurt other models, and that’s totally possible, but idk, it doesn’t seem completely obvious: for example, according to OpenAI docs, for Codex variants, adding instructions to talk to the user may hurt performance or make the model stop or finish early. Idk, this is about talking to the user? “ALWAYS include explanations in your conversation responses”. It does seem like a similar topic.)

TBH this is one reason why I think we may need to add dedicated prompts for a few SOTA families. I think it may prove easier for us,
One thing I'm worried about is the fairness of evals: there's an additional variable of "prompts" when we evaluate and compare different models, so when one model performs worse than another, it is hard to tell whether that is due to model capability or prompt optimization. This makes the OH evaluation number less trusted, since it is no longer easily comparable.

And especially now that we don't have a systematic way to optimize the system prompt for any model, I'm worried about maintaining separate files: those files can easily get out of sync, and fixes made for one model don't propagate to other models.

IMO, we should only separate system prompts out when (1) we have enough manpower to track and maintain system prompts for different models, so we are sure each change is properly evaluated, OR (2) we have an automated system that optimizes system prompts based on a list of evaluation instances (real-world tasks, with llm-as-judge used to monitor agent behavior).

On the other hand, the system prompt describes the expected behavior of agents, which I think is valuable to keep across models (although a lot of the time we don't need these prompts for other models like GPT-5 / Codex), and I'd be happy to revert the relevant parts of this PR that may hurt GPT-5 performance.
Not really, claude-code doesn't do that often. I suspect it is the "DOCUMENTATION BLOCK" we have in our system prompt. The other alternative would be to remove the documentation block completely to simplify things. It was there to inhibit some Sonnet 4 behavior.
The DOCUMENTATION block is redundant since FILE_SYSTEM_GUIDELINES already contains the guidance about not creating documentation files. Removing this block simplifies the prompt while maintaining the same behavior. Co-authored-by: openhands <openhands@all-hands.dev>
Evaluation Triggered
🎉 Evaluation Job Completed
Evaluation Name:
Results Summary
Actually, I re-ran the patch eval locally and it gives 39/50, which is comparable to or better than @ryanhoangt's number here (35/50). I think this PR is ready for review and merge.
hieptl left a comment
Thank you! 🙏
Problem
The agent was frequently creating markdown files (README.md, CHANGES.md, NOTES.md, etc.) during its work to document changes for users, even when not explicitly requested. This created unnecessary files that needed to be cleaned up.
Root Cause
The system prompt had ambiguous language that could be interpreted as encouraging documentation file creation:
Solution
HUMAN: I removed the documentation section completely since it does not seem necessary.
Impact
The default behavior is now that the agent should NOT write markdown files at all unless the user explicitly requests them. All explanations should be provided in conversation responses instead.
Testing
No code changes - only prompt modifications. Testing will be done through agent interactions to verify markdown files are no longer created automatically.
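The check described above could be partly automated. The following is a minimal sketch, not part of this PR: it assumes the agent works inside a local workspace directory, and the helper names `snapshot_markdown` and `new_markdown_files` are hypothetical. It records which markdown files exist before a run and reports any that appear afterwards.

```python
import tempfile
from pathlib import Path


def snapshot_markdown(workspace: Path) -> set[Path]:
    """Return the relative paths of all .md files under the workspace."""
    return {
        path.relative_to(workspace)
        for path in workspace.rglob("*.md")
        if path.is_file()
    }


def new_markdown_files(before: set[Path], after: set[Path]) -> list[Path]:
    """Return markdown files present after the run but not before, sorted."""
    return sorted(after - before)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        # A markdown file that already exists must not be reported.
        (workspace / "README.md").write_text("existing docs\n")
        before = snapshot_markdown(workspace)

        # Stand-in for an agent run (hypothetical): it edits code and,
        # undesirably, writes an unrequested documentation file.
        (workspace / "main.py").write_text("print('hello')\n")
        (workspace / "CHANGES.md").write_text("summary of changes\n")

        after = snapshot_markdown(workspace)
        print("Unrequested markdown files:", new_markdown_files(before, after))
```

In a real test, the stand-in block would be replaced by an actual agent interaction, and the check would assert that the returned list is empty unless the task explicitly asked for a documentation file.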
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.12-nodejs22
• golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:21a4a2b-python
Run
All tags pushed for this build
About Multi-Architecture Support
21a4a2b-python) is a multi-arch manifest supporting both amd64 and arm6421a4a2b-python-amd64) are also available if needed