This project was initially developed as part of a class project. The work is currently under review, so the code is not provided.
This project extends the CacheForge LLM-in-the-loop framework into a full agentic discovery system capable of automatically generating, evaluating, and refining last-level cache (LLC) replacement policies. Our goal was to outperform state-of-the-art baseline policies across SPEC CPU 2006 workloads—specifically targeting improvements in Instructions Per Cycle (IPC) under strict metadata constraints.
Graph of Thought (GoT): a persistent, structured reasoning graph that stores:
- Candidate policies
- Lineage relationships
- Performance metrics
- Design patterns
GoT enables long-range reasoning across iterations, avoids repeated mistakes, and guides the LLM toward productive regions of the design space.
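As a rough sketch of what such a reasoning graph might look like, the snippet below stores candidate policies and lineage edges in SQLite. The table and column names are hypothetical, chosen for illustration only; the actual `GoT.db` schema is not part of this README.

```python
import sqlite3

# Hypothetical sketch of a GoT store: one table for candidate
# policies, one for lineage edges (parent -> refined child).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policies (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    source  TEXT,   -- generated C++ policy code
    ipc     REAL,   -- mean IPC measured by ChampSim
    pattern TEXT    -- tagged design pattern, e.g. 'RRIP-like'
);
CREATE TABLE lineage (
    parent_id INTEGER REFERENCES policies(id),
    child_id  INTEGER REFERENCES policies(id)
);
""")

# A parent policy and a refined child; query the child's ancestor.
conn.execute("INSERT INTO policies VALUES (1, 'srrip_v1', '...', 1.02, 'RRIP-like')")
conn.execute("INSERT INTO policies VALUES (2, 'srrip_v2', '...', 1.05, 'RRIP-like')")
conn.execute("INSERT INTO lineage VALUES (1, 2)")

parent = conn.execute("""
    SELECT p.name, p.ipc FROM policies p
    JOIN lineage l ON l.parent_id = p.id
    WHERE l.child_id = 2
""").fetchone()
print(parent)  # ('srrip_v1', 1.02)
```

Persisting lineage this way is what lets the system ask questions like "which design patterns keep regressing?" across many iterations.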
Surrogate model: a lightweight, feature-based predictor (ridge regression) that evaluates candidate policies without costly ChampSim simulations.
This model:
- Filters poor candidates early
- Reduces the number of full simulations by up to 10×
The model was trained on more than 2,000 samples.
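A minimal sketch of the filtering idea, using closed-form ridge regression over hypothetical policy features (the feature set, dimensions, and data here are synthetic placeholders, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: each candidate policy is summarized by 8 numeric
# features (e.g. metadata bits used, recency/frequency signals);
# the target is its measured mean IPC.
w_true = rng.random(8)
X = rng.random((2000, 8))
y = X @ w_true + 0.05 * rng.standard_normal(2000)

# Closed-form ridge fit: w = (X^T X + alpha*I)^-1 X^T y
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(8), X.T @ y)

# Score 50 fresh candidates and forward only the top 5 to a full
# ChampSim run -- roughly the 10x reduction in simulations.
candidates = rng.random((50, 8))
scores = candidates @ w
top5 = np.argsort(scores)[-5:]
print(len(top5), "of", len(candidates), "candidates simulated")
```

The cheap predictor only has to rank candidates well enough that the expensive simulator is spent on promising ones.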
Search strategy: a dynamic mixture of:
- Lineage diversity (exploration)
- Performance-guided refinement (exploitation)
- Novelty injection (breakthrough discovery)
This structure ensures both breadth and depth as the system evolves policies over time. The strategy also draws on our DAG and discovery log to guide reasoning in future iterations.
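One simple way to realize such a mixture is weighted sampling over the three modes, with a different prompt template per mode. The weights, function names, and templates below are illustrative assumptions, not the actual implementation:

```python
import random

# Hypothetical static weights; the real mixture is adapted each
# iteration from the lineage DAG and discovery log.
WEIGHTS = {"explore": 0.4, "exploit": 0.4, "novelty": 0.2}

def pick_strategy(rng: random.Random) -> str:
    """Sample which mode drives the next LLM prompt."""
    modes, weights = zip(*WEIGHTS.items())
    return rng.choices(modes, weights=weights, k=1)[0]

def build_prompt(strategy: str, best_policy: str) -> str:
    """Turn the sampled mode into an instruction for the LLM."""
    if strategy == "explore":
        return "Propose a policy from an unexplored lineage branch."
    if strategy == "exploit":
        return f"Refine the current best policy:\n{best_policy}"
    return "Invent a novel replacement heuristic unlike prior attempts."

rng = random.Random(42)
print(build_prompt(pick_strategy(rng), "srrip_v2"))
```

Exploitation compounds small wins on a strong lineage, while exploration and novelty injection guard against the search collapsing onto one design family.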
Policy summarizer: an automatic C++-to-JSON summarizer that interprets cache replacement policies and generates structured, human-readable representations.
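To illustrate only the shape of the output, here is a toy heuristic that scans a policy's C++ source for a few well-known signals and emits JSON. The real summarizer is more capable; the field names and regexes below are assumptions for the sketch:

```python
import json
import re

# Toy C++ replacement-policy fragment to summarize.
CPP_POLICY = """
uint32_t rrpv[NUM_SET][NUM_WAY];   // 2-bit re-reference counters
void UpdateReplacementState(..., uint8_t hit) {
    if (hit) rrpv[set][way] = 0;   // promote on hit
}
"""

def summarize(source: str) -> str:
    # Hypothetical summary fields: which metadata the policy keeps
    # and how it reacts to hits.
    summary = {
        "uses_rrpv": "rrpv" in source,
        "promotes_on_hit": bool(re.search(r"if\s*\(hit\)", source)),
        "metadata_arrays": re.findall(r"(\w+)\[NUM_SET\]\[NUM_WAY\]", source),
    }
    return json.dumps(summary, indent=2)

print(summarize(CPP_POLICY))
```

Structured summaries like this let the reasoning graph compare policies by design features rather than raw source text.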
The system outputs:
- Discovery database (`GoT.db`)
- Lineage graphs
- All generated policies
- Scored surrogate predictions
- ChampSim results for every iteration
Note: Different experiments, files, and scripts vary across branches.
See our final report for more details.
You may execute the evolutionary search loop on the HPC cluster or locally.

Inside `run_loop/`, submit:

```
bsub < reproduce.sh
```

or run locally:

```
python3 run_loop.py
```