Deploying LLMs in constrained environments is a challenging task; here is a collection of resources that can help. This is a continuous work in progress: both the structure and the content will be improved over time. Any PR or comment is more than welcome.
- ExLLAMAv2 - Framework to explore different quantization levels for LLMs and their impact on accuracy and runtime.
- LQ-LoRA: Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning.
- vLLM - A fast and easy-to-use library for LLM inference and serving.
- LangChain - A framework for developing applications powered by language models (Python).
- DataScience-StudyMaterial - Resources for studying LLMs & ML, with annotated papers.
- Hugging Face LLM Leaderboard - The HF Open LLM Leaderboard aims to track, rank and evaluate open LLMs and chatbots.
- Hugging Face LLM Perf Leaderboard - Aims to benchmark the performance (latency, throughput, memory & energy) of Large Language Models (LLMs) across different hardware, backends, and optimizations.
- Intel Low-bit Quantized Open LLM Leaderboard - Quantization is a key technique for making LLMs more accessible and practical for a wide range of applications, especially where computational resources are a limiting factor. Use this benchmark to find the best LLM for low-end hardware.
- Can it run LLM? - A tool that analyzes the hardware requirements for running a specific LLM.
- Hugging Face LLMs - The list of models hosted on Hugging Face.
- GPT4All - a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.
- ExploreLLM - Explore and compare the LLMs that fit your needs.
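Before reaching for tools like "Can it run LLM?", a back-of-the-envelope estimate often answers the question. A common rule of thumb is that weight memory is roughly `parameters × bits-per-parameter / 8`, plus some headroom for the KV cache and activations. The sketch below encodes that rule; the function name and the 20% overhead factor are assumptions of this sketch, not taken from any of the tools above.

```python
def estimate_inference_memory_gb(n_params_billions: float,
                                 bits_per_param: int = 16,
                                 overhead: float = 1.2) -> float:
    """Rough memory estimate for serving a model.

    n_params_billions: model size in billions of parameters.
    bits_per_param: 16 for fp16/bf16; 8 or 4 for quantized weights.
    overhead: assumed multiplier for KV cache and activations.
    """
    weight_gb = n_params_billions * 1e9 * bits_per_param / 8 / 1e9
    return weight_gb * overhead

# A 7B model in fp16 needs about 14 GB for the weights alone,
# ~16.8 GB with the assumed 20% overhead; 4-bit quantization
# brings the same model down to roughly 4.2 GB.
fp16_gb = estimate_inference_memory_gb(7)
int4_gb = estimate_inference_memory_gb(7, bits_per_param=4)
```

This is only a first-order check: the real requirement depends on sequence length, batch size, and the serving framework's memory management.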
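To make the quantization trade-off that tools like ExLLAMAv2 and the low-bit leaderboards measure more concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is illustrative only: real quantizers work per-channel or per-group on tensors, and the function names here are ours.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Maps floats into the signed range [-127, 127] using a single
    scale factor derived from the largest absolute value.
    """
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

# Round-tripping a tensor shows the quantization error:
# it is bounded by half the scale step.
w = [0.02, -1.27, 0.635, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The accuracy drop reported on the quantization leaderboards is this rounding error accumulated across billions of weights, which is why lower bit widths (4-bit and below) need more careful schemes such as the matrix decomposition used by LQ-LoRA.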