This repository contains benchmark data and documentation for evaluating the inference speeds of various large language models (LLMs) on different GPUs.
Massed Compute offers scalable, efficient distributed computing. We provide flexible computing power for AI research, visual effects production, data science, and more. Our goal is to give organizations the tools they need to get the most out of their computational resources.
For more information, visit Massed Compute.
The benchmarks measure LLM inference speed on different GPUs for models including:
- Llama 3
- Qwen
- Mixtral
- Magnum
- Other popular models
Each benchmark includes:
- Model Description: Overview of the model being tested.
- Hardware Specifications: Details about the GPUs used.
- Benchmark Results: Inference speed and performance metrics.
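As a rough illustration of how the three parts of a benchmark entry fit together, here is a minimal Python sketch. It assumes inference speed is reported as generated tokens per second, which this README does not state explicitly. The `BenchmarkRecord` class, its field names, and the values are hypothetical placeholders, not the repository's actual schema or measured results.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRecord:
    """Hypothetical record layout; the repository's real schema may differ."""

    model: str              # model being tested
    gpu: str                # GPU the run was executed on
    generated_tokens: int   # number of tokens produced during the run
    elapsed_seconds: float  # wall-clock generation time in seconds

    def tokens_per_second(self) -> float:
        """Return throughput, assuming speed is measured in tokens per second."""
        if self.elapsed_seconds <= 0:
            raise ValueError("elapsed_seconds must be positive")
        return self.generated_tokens / self.elapsed_seconds


# Placeholder values for illustration only, not measured results.
record = BenchmarkRecord(
    model="example-model",
    gpu="example-gpu",
    generated_tokens=512,
    elapsed_seconds=8.0,
)
print(f"{record.model} on {record.gpu}: {record.tokens_per_second():.1f} tokens/s")
```

Keeping the raw token count and elapsed time, rather than only the derived rate, makes it easier to recompute or aggregate results later.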
