chore: refactor and updates to crucible challenge example #50

GangGreenTemperTatum · 2025-04-21T17:36:34Z

Notes

A small update to the existing Crucible challenge example (for prompt injection challenges, since there's no tool calling etc. in this specific example)
Also fixes the platform domain name change
Removes the odd Crucible generator setup and turns it into an asyncio process
The model currently does not submit the flag itself; submission is done within the code

Generated Summary

Enhanced the SYSTEM_PROMPT to provide clearer guidance on interaction with the challenge, emphasizing techniques for prompt injection and flag extraction.
Implemented CrucibleState dataclass to track attempts, successful techniques, and potential flags, improving state management between attempts.
Added methods for querying and submitting flags to the challenge API, streamlining interactions with the backend.
Introduced new logic in main to vary the temperature parameter between attempts to explore different strategies, potentially increasing success rates.
Updated command line interface to include a random temperature option and configurable maximum steps, allowing for greater flexibility during execution.
Revised logging for more detailed progress reporting, including success rates and information regarding encountered flags.
Removed the old CrucibleGenerator class in favor of simplified function calls, enhancing code clarity and maintainability.
Improved error handling for API requests with clearer feedback for failure cases.

These changes significantly improve the functionality, usability, and robustness of the crucible example, setting a solid foundation for further development and experimentation.
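
For readers who want a picture of the state tracking described above, here is a minimal sketch of what a `CrucibleState` dataclass might look like. The field names, the `record` method, and the `success_rate` property are hypothetical and chosen for illustration; the actual definition lives in the PR's diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CrucibleState:
    """Hypothetical state carried between attempts (names are assumptions)."""

    attempts: int = 0
    successful_techniques: List[str] = field(default_factory=list)
    failed_techniques: List[str] = field(default_factory=list)
    potential_flags: List[str] = field(default_factory=list)

    def record(self, technique: str, success: bool, flag: Optional[str] = None) -> None:
        """Log one attempt and remember any candidate flag it produced."""
        self.attempts += 1
        if success:
            self.successful_techniques.append(technique)
        else:
            self.failed_techniques.append(technique)
        if flag is not None and flag not in self.potential_flags:
            self.potential_flags.append(flag)

    @property
    def success_rate(self) -> float:
        """Fraction of attempts marked successful; 0.0 before any attempt."""
        if self.attempts == 0:
            return 0.0
        return len(self.successful_techniques) / self.attempts


if __name__ == "__main__":
    state = CrucibleState()
    state.record("roleplay", success=False)
    state.record("encoding", success=True, flag="gAAAAAabc")
    print(state.attempts, state.success_rate, state.potential_flags)
```

Keeping the counters in one object makes it straightforward to log a success rate after each attempt and to feed earlier results into the next prompt.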

This summary was generated with ❤️ by rigging

Generated Summary

Enhanced the SYSTEM_PROMPT with additional context on the challenge and guidance for participants, potentially improving their approach.
Introduced CrucibleState dataclass for tracking the state during attempts, including metrics for successful and failed techniques, which aids in strategizing.
Added detailed logging for tracking attempts, including success rates and temperature adjustments for strategy exploration, improving debugging capabilities.
Refactored the way responses and flags are handled during the challenge, introducing error handling for API interactions to improve robustness.
Updated cli function to use the new challenge URL format and added options for temperature randomization and maximum steps, enhancing configurability.
Removed the CrucibleGenerator class in favor of direct asynchronous API calls, simplifying the code and improving maintainability.
Overall, these changes elevate the challenge experience by providing clearer guidance and a more structured approach to flag extraction.
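
The temperature adjustment mentioned in the summary could be approached in several ways. The sketch below assumes one possible scheme: a linear ramp across the step budget, with an optional random draw when randomization is switched on. The function name `pick_temperature`, its parameters, and the 0.2 to 1.2 range are illustrative assumptions, not values taken from the PR.

```python
import random
from typing import Optional


def pick_temperature(
    step: int,
    max_steps: int,
    randomize: bool = False,
    rng: Optional[random.Random] = None,
    low: float = 0.2,
    high: float = 1.2,
) -> float:
    """Choose a sampling temperature for the given attempt (hypothetical scheme)."""
    if randomize:
        # Random exploration: draw uniformly from the allowed range.
        rng = rng or random.Random()
        return round(rng.uniform(low, high), 2)
    if max_steps <= 1:
        return low
    # Deterministic ramp: start conservative, get more exploratory over time.
    clamped = min(max(step, 0), max_steps - 1)
    fraction = clamped / (max_steps - 1)
    return round(low + fraction * (high - low), 2)


if __name__ == "__main__":
    for step in range(5):
        print(step, pick_temperature(step, max_steps=5))
```

Accepting a seeded `random.Random` instance makes randomized runs reproducible, which helps when comparing success rates across strategies.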

This summary was generated with ❤️ by rigging

GangGreenTemperTatum added 2 commits April 21, 2025 13:35

chore: refactor and updates to crucible challenge example

3771f58

chore: refactor prompt slightly

4ce8d97

GangGreenTemperTatum merged commit 5d7604c into main Apr 22, 2025
4 checks passed

monoxgas deleted the ads/eng-1740-fix-example-crucible-challenge-code branch May 30, 2025 15:09

2 participants