GreenCache is a carbon-aware cache management framework that dynamically derives resource allocation plans for LLM serving.
It includes:
- Dataset preprocessing and chat-history generation from the ShareGPT dataset.
- Cache simulation tooling to build cache lists and request slices.
- A multi-round QA workload driver (LMCache + vLLM).
- Automation scripts for running parameter sweeps and collecting power metrics.
- dataset/: scripts for preprocessing ShareGPT data and generating chat-history pickles.
- src/70BMulti/: cache simulation, workload driver, and automation scripts.
- Python 3.10+.
- LMCache + vLLM.
- Place the ShareGPT V3 JSON at:
/dataset/ShareGPT_V3_unfiltered_cleaned_split.json
- Preprocess and add token lengths:
python dataset/dataset_preprocessing.py --parse 1
- Generate chat histories and a request sequence:
python dataset/dataset_creation.py
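Before running the preprocessing step, it can help to confirm that the ShareGPT file is readable and shaped as expected. The following is a minimal sketch and not part of GreenCache: it assumes the file is a JSON list of records that each hold a `conversations` list of turns (the commonly distributed ShareGPT layout), and `load_sharegpt` and `summarize_sharegpt` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_sharegpt(path):
    """Load the dataset JSON, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"ShareGPT JSON not found at {path}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize_sharegpt(records):
    """Return (number of conversations, total turns) for ShareGPT-style records.

    Assumption: each record is a dict with a "conversations" list.
    Records without that key are counted as having zero turns.
    """
    n_conversations = len(records)
    n_turns = sum(len(r.get("conversations", [])) for r in records)
    return n_conversations, n_turns
```

Calling `summarize_sharegpt(load_sharegpt(path))` with the dataset path from the setup step returns the two counts, which gives a quick sanity check before the longer preprocessing run.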
src/70BMulti/70BMulti_automation.sh sweeps cache sizes and lambdas, starts LMCache, runs the workloads, and records power metrics.
Run src/70BMulti/70BMulti_automation.sh to start the end-to-end automation.
Warning: these scripts are destructive and environment-specific. Review paths and commands before running.
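The sweep described above amounts to one run per combination of cache size and lambda. A minimal Python sketch of that loop structure follows; it is illustrative only and does not reproduce the shell script. The values and the `sweep_configs` name are assumptions, and the actual script defines its own parameter lists.

```python
from itertools import product


def sweep_configs(cache_sizes, lambdas):
    """Yield one config per (cache_size, lambda) pair, cache size varying slowest."""
    for cache_size, lam in product(cache_sizes, lambdas):
        yield {"cache_size": cache_size, "lambda": lam}


# Hypothetical sweep values, chosen only for illustration.
configs = list(sweep_configs(cache_sizes=[16, 32], lambdas=[0.5, 1.0, 2.0]))
for cfg in configs:
    # A real run would start LMCache, drive the workload, and record power here.
    print(cfg)
```

Enumerating the configurations up front makes it easy to count the runs and estimate total sweep time before launching anything destructive.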
If you find GreenCache useful in your research or project, please consider citing our paper:
@misc{tian2026cachepromptitsgreen,
title={Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving},
author={Yuyang Tian and Desen Sun and Yi Ding and Sihang Liu},
year={2026},
eprint={2505.23970},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2505.23970}
}
You can find the full paper on arXiv: Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving (https://arxiv.org/abs/2505.23970)