An interactive educational tool for visualizing and analyzing memory access patterns in General Matrix Multiply (GEMM) operations. Compare different loop orderings, understand cache behavior, and explore the impact of blocking/tiling optimizations.
- Real-time animation of memory access patterns
- Multiple loop orderings: ijk, ikj, jik, jki, kij, kji
- Blocking comparison: blocked vs unblocked implementations
- Play/pause/step controls for detailed analysis
- Live memory access animation showing A, B, C matrix accesses
- Cache hit rate tracking with real-time performance graphs
- Access frequency heatmaps for pattern analysis
- Detailed statistics on cache performance
- Generate MP4 animations for presentations
- Cross-platform video encoding support
- Customizable animation speed and quality
# Clone the repository
git clone https://github.com/bugparty/gemm_visualizations.git
cd gemm_visualizations
# Install dependencies
pip install -r requirements.txtpython interactive_viz.pyThen open your browser to: http://127.0.0.1:8050
python gen.pyThis will generate MP4 animations for different loop orderings.
gemm_visualizations/
โโโ interactive_viz.py # ๐ฎ Main Dash web application
โโโ gemm_simulator.py # ๐งฎ Core GEMM simulation engine
โโโ cache_simulator.py # ๐พ Cache behavior simulator
โโโ gen.py # ๐ฅ Video generation script (legacy)
โโโ requirements.txt # ๐ฆ Python dependencies
โโโ README.md # ๐ This file
โโโ drawGemmForLoop.ipynb # ๐ Jupyter notebook examples
โโโ cacheSimu.ipynb # ๐ Cache simulation examples
Matrix multiplication: C[i][j] += A[i][k] * B[k][j]
Different loop orderings affect memory access patterns:
| Loop Order | Description | Cache Behavior |
|---|---|---|
| IJK | Row-Column-Depth | Good for C, poor for B |
| IKJ | Row-Depth-Column | Good spatial locality |
| JIK | Column-Row-Depth | Poor for row-major matrices |
| JKI | Column-Depth-Row | Poor spatial locality |
| KIJ | Depth-Row-Column | Moderate performance |
| KJI | Depth-Column-Row | Good for blocked algorithms |
- Matrix Size: 4ร4 to 32ร32
- Block Size: 2 to 16 (for blocked algorithms)
- Loop Order: Select from 6 different orderings
- Blocking: Toggle between blocked/unblocked
- โถ Play: Start automatic animation
- โธ Pause: Pause animation
- ๐ Reset: Reset to frame 0
- Speed Slider: Control animation speed (1-100 fps)
- Frame Slider: Jump to specific frame
- > 90%: Excellent cache utilization
- 70-90%: Good performance
- < 70%: Poor cache behavior, consider optimization
- Bright spots: Frequently accessed elements
- Uniform color: Good spatial locality
- Scattered patterns: Cache-unfriendly access
- Set matrix size to 16ร16, block size to 4
- Select "KJI" loop order, set "Blocked" to ON
- Note the cache hit rate
- Switch "Blocked" to OFF
- Compare performance!
For blocked algorithms (16ร16, block=4):
- Best: KJI or IKJ (~85%+ hit rate)
- Worst: JKI or JIK (~60-70% hit rate)
This tool is designed for:
- Computer Architecture courses: Teaching cache hierarchies
- Performance optimization: Understanding memory access patterns
- Algorithm analysis: Comparing loop transformations
- Research: Experimenting with blocking strategies
Edit cache_simulator.py:
cache = CacheSimulator(
cache_size=32768, # 32KB L1 cache
line_size=64, # 64-byte cache lines
associativity=8, # 8-way set associative
element_size=8 # 8 bytes per double
)Edit gemm_simulator.py to add custom loop transformations.
Modify interactive_viz.py to add CSV/JSON export functionality.
# Change port in interactive_viz.py
app.run_server(debug=True, host='0.0.0.0', port=8051)# Install FFmpeg
# Ubuntu/Debian:
sudo apt-get install ffmpeg
# macOS:
brew install ffmpeg
# Windows:
# Download from https://ffmpeg.org/# Ensure all dependencies are installed
pip install -r requirements.txt --upgrade- Row-major order: C[i][j] stored at
base + (i*n + j)*sizeof(element) - Cache line: 64 bytes (8 doubles)
- Spatial locality: Sequential elements benefit from cache prefetch
- LRU replacement: Least Recently Used eviction policy
- Set-associative: Configurable N-way associativity
- Address mapping: Tag-Index-Offset breakdown
Contributions welcome! Areas for improvement:
- Additional cache replacement policies (FIFO, Random)
- 3D visualization of memory access
- Performance prediction models
- Multi-level cache hierarchy
- Non-square matrix support
MIT License - feel free to use for educational purposes!
Based on classical GEMM optimization techniques from:
- Computer Architecture: A Quantitative Approach (Hennessy & Patterson)
- Optimizing Matrix Multiply (Goto & Geijn)
For questions or feedback, please open an issue on GitHub.
Happy Learning! ๐