The ASPLOS 2025 / EuroSys 2025 Contest on an Optimized Neuron Kernel Interface (NKI) Implementation of Llama 3.2 1B (Inference)
Teams from around the globe are invited to contribute submissions toward producing the fastest implementation of the Llama 3.2 1B model (inference only), written using the Neuron Kernel Interface (NKI) and running on Amazon ML hardware (Trainium/Inferentia). Prizes will be awarded to top-ranking teams who commit to open-sourcing their solutions prior to next year's conference.
| Rank | Prize | Team | Links |
|---|---|---|---|
| First Place 🥇 | $25,000 | Shiwei Gao + Ruwen Fan + Shaoxun Zeng + Haodi Jiang + Huajun Bai + Yitian Yang + Hao Guo + Qing Wang + Jiwu Shu + Youyou Lu (Tsinghua University) | [code] |
| Second Place 🥈 | $10,000 | Dong Li + Dinghong Song + Jierui Xu (University of California Merced / University of Wisconsin Madison) | |
| Third Place 🥉 | $5,000 | Dongkwan Kim + Chan Lee + Seonyoung Cheon + Kunmo Jeong + Hoyun Youm + Sungwoo Yun + Yongwoo Lee (Yonsei University) | |
| Event | Date |
|---|---|
| Contest Announced | |
| Contest GitHub Repository & Benchmark Subset Released | |
| Application Deadline for Student Travel Grants | |
| Contest Registrations & Preliminary Submissions Due* (deadline extended!) | |
| Early Registration Deadline for ASPLOS 2025 / EuroSys 2025 | |
| Contest Final Submissions Due* (EXTENDED) | |
| Special Session Schedule Finalized | |
| Contest Special Session during ASPLOS 2025 / EuroSys 2025 Workshops | |
| Contest Winners Announced during ASPLOS 2025 / EuroSys 2025 Conference | |
*Submissions are due by 11:59pm Anywhere on Earth (AoE, UTC-12).
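
As a rough illustration of the deadline footnote above, the sketch below converts an 11:59pm cutoff to UTC, assuming the footnote refers to the usual "Anywhere on Earth" convention (a fixed UTC-12 offset). The helper name `aoe_deadline_in_utc` and the date used are hypothetical and chosen only for illustration; they are not actual contest deadlines.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Anywhere on Earth" (AoE) means a fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59pm AoE on the given calendar date, expressed in UTC."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


# Hypothetical date for illustration only; not a contest deadline.
example = aoe_deadline_in_utc(2030, 1, 15)
print(example.isoformat())
```

Under this convention a deadline has not passed as long as the due date is still current somewhere on Earth, so the UTC cutoff falls on the following calendar day.
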
Amazon Web Services offers two families of machine learning chips, Trainium and Inferentia. The AWS Neuron SDK provides a compiler and profiling tools for programming these devices through high-level libraries such as PyTorch. AWS recently released a new programming interface, the Neuron Kernel Interface (NKI), that gives programmers down-to-the-metal access to Trainium/Inferentia hardware features, potentially unlocking even greater performance.
For this contest, teams will submit code that leverages NKI to implement the Llama 3.2 1B model, targeting a single Trainium1 (trn1) chip.
Please consult the following GitHub repository for full contest details: https://github.com/aws-samples/nki-llama.
- Emery Berger (Amazon Web Services), emerydb@amazon.com
- Aninda Manocha (Amazon Web Services)
- Wei Tang (Amazon Web Services)
- Emily Webber (Amazon Web Services)
- Ziyang Xu (Amazon Web Services)
Amazon Web Services
