## Further research

- Grun, Peter, Nikil Dutt, and Alex Nicolau. "APEX: access pattern based memory architecture exploration." *Proceedings of the 14th International Symposium on Systems Synthesis*. 2001.
  - Work by the same group. Besides caches, it also integrates other custom memory modules, such as stream buffers, so the memory architecture can match more access patterns.
- Ono, Takatsugu, Koji Inoue, and Kazuaki Murakami. "Adaptive cache-line size management on 3D integrated microprocessors." *2009 International SoC Design Conference (ISOCC)*. IEEE, 2009.
  - Their L1 data cache can dynamically change its cache-line size with little overhead.

## Summary Q&A

## Take-home message

- What's the problem?
  - Since higher bandwidth (BW) means higher power, can we achieve the same performance with less BW?
- How did we solve it?
  - Use a spatial cache plus a temporal cache to reduce redundant data traffic from memory
  - The Mr. Tall and Mr. Overweight analogy
- What's the result?
  - 30% energy savings on average while preserving the same hit rate (performance)
  - Slight overhead from the extra cache control logic
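A toy simulation can make the bandwidth argument concrete. This is a minimal sketch under stated assumptions, not the design from the paper: every cache is fully associative with LRU replacement, each access in the trace is already labeled "spatial" or "temporal" (how the real hardware classifies accesses is not covered here), bytes fetched from memory stand in for bandwidth, and energy is not modeled. The trace, the cache sizes (1 KiB total in both configurations), and the helper names `LRUCache`, `make_trace`, `run_unified`, and `run_split` are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that counts hits, misses and bytes fetched."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # tag -> None, least recently used first
        self.hits = 0
        self.misses = 0
        self.bytes_fetched = 0

    def access(self, addr):
        tag = addr // self.line_size
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.bytes_fetched += self.line_size  # a miss pulls the whole line from memory
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None


def make_trace(rounds):
    """Hypothetical trace, pre-labeled by locality type. Each round touches
    8 scattered words twice (reuse, no spatial locality) and streams once
    through one 64-byte block (spatial locality, no reuse)."""
    stream_base = 1 << 20
    trace = []
    for r in range(rounds):
        scattered = [4096 * (8 * r + k) for k in range(8)]
        trace += [("temporal", a) for a in scattered]
        trace += [("spatial", stream_base + 64 * r + 4 * j) for j in range(16)]
        trace += [("temporal", a) for a in scattered]
    return trace


def run_unified(trace):
    cache = LRUCache(num_lines=16, line_size=64)  # 1 KiB, one line size for all data
    for _, addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses, cache.bytes_fetched


def run_split(trace):
    spatial = LRUCache(num_lines=8, line_size=64)   # 512 B with long lines
    temporal = LRUCache(num_lines=64, line_size=8)  # 512 B with short lines
    for kind, addr in trace:
        (spatial if kind == "spatial" else temporal).access(addr)
    hits = spatial.hits + temporal.hits
    misses = spatial.misses + temporal.misses
    return hits, misses, spatial.bytes_fetched + temporal.bytes_fetched


if __name__ == "__main__":
    trace = make_trace(rounds=10)
    print("unified (hits, misses, bytes):", run_unified(trace))
    print("split   (hits, misses, bytes):", run_split(trace))
```

On this contrived trace both configurations get 230 hits out of 320 accesses, but the split design fetches 1,280 bytes instead of 5,760, because each scattered word brings in an 8-byte line rather than a 64-byte line of which only one word is ever touched. These numbers come from the chosen trace and sizes and say nothing about the 30% energy figure above.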