Deploying LLMs in constrained environments has proven to be a challenging task. Here is a collection of resources that can help.

e-dupuis/awesome-llm-deployment

awesome-llm-deployment

Deploying LLMs in constrained environments has proven to be a challenging task. Here is a collection of resources that can help. This is a continuous work in progress, and both the structure and the content will be improved over time. Any PR or comment is more than welcome.

Inference Optimization Techniques

  • ExLlamaV2 - Framework to explore different quantization levels for LLMs and their impact on accuracy and runtime.
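
To make the accuracy side of that trade-off concrete, here is a minimal sketch of symmetric round-to-nearest quantization at several bit widths. It is not how ExLlamaV2 quantizes models; the function names, the synthetic Gaussian "weights" and the choice of bit widths are all assumptions made for illustration.

```python
import random


def quantize_dequantize(weights, bits):
    """Round each weight to a signed `bits`-bit integer grid, then map it back to float."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    restored = []
    for w in weights:
        q = round(w / scale)                      # nearest integer level
        q = max(-qmax, min(qmax, q))              # clamp to the representable range
        restored.append(q * scale)
    return restored


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for one weight tensor (assumption: small zero-mean Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {
    bits: mean_squared_error(weights, quantize_dequantize(weights, bits))
    for bits in (8, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit: MSE = {err:.3e}")
```

Fewer bits give a coarser grid and a larger reconstruction error, which is the effect a quantization framework has to balance against memory and runtime gains.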

Fine-Tuning Optimization Techniques

  • LQ-LoRA - Low-Rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning.
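
As a rough illustration of why low-rank methods are attractive for fine-tuning, the sketch below compares the number of trainable parameters for a full weight matrix with that of a rank-r factorization. It only shows the generic low-rank arithmetic, not the LQ-LoRA algorithm itself; the layer shape and rank are assumed values.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_params(d_out, d_in, rank):
    """Trainable parameters for two factors of shape (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative layer shape and rank (assumptions, not taken from any specific model).
d_out, d_in, rank = 4096, 4096, 16

full = full_finetune_params(d_out, d_in)
low = low_rank_params(d_out, d_in, rank)
print(f"full: {full:,} params, rank-{rank}: {low:,} params ({low / full:.2%} of full)")
```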

Model Serving / Inference Engine

  • vLLM - A fast and easy-to-use library for LLM inference and serving.
  • LangChain - A framework for developing applications powered by language models (Python).

Learning Material

Benchmark

  • Hugging Face LLM Leaderboard - The HF Open LLM Leaderboard aims to track, rank and evaluate open LLMs and chatbots.
  • Hugging Face LLM Perf Leaderboard - Benchmarks the performance (latency, throughput, memory and energy) of Large Language Models (LLMs) across different hardware, backends and optimizations.
  • Intel Low-bit Quantized Open LLM Leaderboard - Quantization is a key technique for making LLMs more accessible and practical for a wide range of applications, especially where computational resources are a limiting factor. Use this benchmark to find the best LLM for resource-limited hardware.
  • Can it run LLM? - A tool designed to comprehensively analyze the hardware requirements necessary for the execution of a specific LLM.
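
For a quick back-of-the-envelope check before reaching for these tools, the memory needed for the weights alone can be estimated from the parameter count and the bits per weight. The sketch below is a simplification, not the method used by any tool listed above: the function names are hypothetical, and the 20% overhead for KV cache and activations is an assumed placeholder that varies widely with context length and batch size.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Memory taken by the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 1024 ** 3


def fits(num_params, bits_per_weight, available_gib, overhead_fraction=0.2):
    """Rough check: weights plus an assumed overhead fraction must fit in memory."""
    needed = weight_memory_gib(num_params, bits_per_weight) * (1 + overhead_fraction)
    return needed <= available_gib


# Example: a 7-billion-parameter model on a device with 8 GiB of memory (assumed figures).
params = 7_000_000_000
for bits in (16, 8, 4):
    size = weight_memory_gib(params, bits)
    print(f"{bits}-bit: {size:.2f} GiB weights, fits in 8 GiB: {fits(params, bits, 8)}")
```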

Open Models

  • Hugging Face LLMs - List of models hosted on Hugging Face.
  • GPT4All - A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.
  • ExploreLLM - Explore and compare the LLMs that fit your needs.
