Large Language Models (LLMs)

The world of Large Language Models (LLMs) is complex and varied. This resource collates the things that matter, helping to make sense of this increasingly important topic.

CONTENTS

LLM CHAT

Everyone knows ChatGPT, but do you know these others?

CUSTOM GPTs

OpenAI's custom GPTs are on fire - check out what people are developing:

RESEARCH PAPERS

A selection of interesting & noteworthy research papers related to LLMs.

EDUCATION

Get skilled up with these free and paid-for courses.

BENCHMARKS

These various benchmarks are commonly used to compare LLM performance.

LEADERBOARDS

These leaderboards show how LLMs compare relative to each other.

GEN-AI FOR DEVELOPERS

Coding assistants and the like can have a major positive impact on development productivity. There's now a burgeoning market of such tools with integration into popular IDEs.

INFERENCING FRAMEWORKS

If you want to host an LLM yourself, you're going to need one of these frameworks.

GPT4V ALTERNATIVES

Turn images into text just like GPT-4V with these models.

CLOUD GPUs

Training and running inference on your own model needs GPUs. You can get these from any cloud provider, but there are some specialist providers worth considering.

OPEN SOURCE MODELS

Open source models are generally understood to be free to use, but some have restrictive licensing that prohibits commercial use or limits usage in some way. Be sure to check the exact license for the model you want to use, making sure you understand exactly what is permissible.

Gemma

Parameters: 2B, 7B
Origin: Google
License: Gemma
Release date: February 2024
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/models?search=google/gemma
Training cost:

φ Phi-2

Parameters: 2.7B
Origin: Microsoft
License: MIT
Release date: December 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/microsoft/phi-2
Training cost:
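
Most of the open models listed here are hosted on Hugging Face, so one quick way to try them is the transformers library. The sketch below loads the Phi-2 checkpoint from the entry above and generates a short completion; it assumes transformers and torch are installed, and note that some checkpoints additionally require accepting a license on the Hub or passing trust_remote_code=True.

```python
# Minimal, illustrative sketch (not an official example): load a Hugging
# Face-hosted checkpoint such as microsoft/phi-2 and generate a completion.
# Assumes enough RAM/VRAM for a ~2.7B-parameter model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Explain the difference between fine-tuning and prompt engineering."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```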

🌬️ DeciLM-7B-Instruct

Parameters: 7B
Origin: Deci.ai
License: Apache 2.0
Release date: December 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/Deci/DeciLM-7B-instruct
Training cost:

🌬️ Mixtral 8x7B

Parameters: 8x7B Mixture of Experts
Origin: Mistral
License: Apache 2.0
Release date: December 2023
Paper: https://arxiv.org/abs/2401.04088
Commercial use possible: YES
GitHub: https://huggingface.co/mistralai
Training cost:
Comment: Seems to rival GPT-3.5 in benchmarks at a fraction of the size
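
The "8x7B" refers to a sparse Mixture-of-Experts design: each transformer layer holds eight expert feed-forward networks plus a router that sends every token to only two of them, so just a fraction of the total parameters is active per token. Below is a minimal, illustrative top-2 MoE layer, not Mixtral's actual implementation; the dimensions and class name are made up for the example.

```python
# Illustrative top-2 Mixture-of-Experts feed-forward layer (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)          # one routing logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        weights, idx = self.router(x).topk(2, dim=-1)        # pick the 2 best experts per token
        weights = F.softmax(weights, dim=-1)                 # normalise the 2 selected scores
        out = torch.zeros_like(x)
        for k in range(2):                                   # each of the two chosen slots
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

print(Top2MoE()(torch.randn(4, 512)).shape)                  # torch.Size([4, 512])
```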

🌬️ Notus

Parameters: 7B
Origin: Argilla, fine-tuned from Mistral
License: MIT
Release date: December 2023
Paper:
Commercial use possible: No - uses synthetic data from OpenAI GPT models
GitHub: https://huggingface.co/argilla/notus-7b-v1
Training cost:
Comment: Strong performance for its size; uses DPO fine-tuning.

🍃 Zephyr

Parameters: 7B
Origin: Hugging Face, fine-tuned from Mistral
License: MIT
Release date: November 2023
Paper:
Commercial use possible: No - uses synthetic data from OpenAI GPT models
GitHub: https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66
Training cost:
Comment: Strong performance for its size; uses DPO fine-tuning.
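
Zephyr (and Notus above) are aligned with DPO (Direct Preference Optimization), which fine-tunes directly on preference pairs instead of training a separate reward model and running RL. A minimal sketch of the DPO objective is below; it assumes you already have the summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model, and the function name and beta value are illustrative.

```python
# Illustrative DPO loss: push the policy to prefer the "chosen" response over
# the "rejected" one, relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward margin of the policy relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # Maximise the probability that the chosen response beats the rejected one.
    return -F.logsigmoid(beta * margin).mean()

loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # scalar the optimiser would minimise
```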

🐦‍⬛ Starling

Parameters: 7B
Origin: Berkeley, based on LLaMA2
License: LLaMA2 Community License
Release date: November 2023
Paper:
Commercial use possible: No - uses synthetic data from OpenAI GPT models
GitHub: https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha
Training cost:
Comment: Strong reasoning performance for its size.

1️⃣ Yi

Parameters: 7B, 34B
Origin: 01.AI
License: Apache 2.0
Release date: November 2023
Paper:
Commercial use possible: Via request
GitHub: https://github.com/01-ai/Yi
Training cost:
Comment: Strong performance for its size.

🐳 Orca 2

Parameters: 7B, 13B
Origin: Microsoft, fine-tuned from LLaMA2
License: MS Research License
Release date: November 2023
Paper: https://arxiv.org/abs/2311.11045
Commercial use possible: NO
GitHub: 7B: https://huggingface.co/microsoft/Orca-2-7b, 13B: https://huggingface.co/microsoft/Orca-2-13b
Training cost: Orca 2 was trained on 32 NVIDIA A100 GPUs with 80GB memory. For the 13B checkpoint, it took ~17 hours to train on the FLAN dataset for one epoch, ~40 hours to train on 5 million ChatGPT data points for 3 epochs, and ~23 hours to continue training on ~1.8 million GPT-4 data points for 4 epochs.
Comment: Strong reasoning abilities for a small model.

🌬️ Mistral

Parameters: 7B
Origin: Mistral
License: Apache 2.0
Release date: October 2023
Paper: https://arxiv.org/abs/2310.06825
Commercial use possible: YES
GitHub: https://huggingface.co/mistralai
Training cost:
Comment: Outperforms LLaMA2 13B

📏 LongChat

Parameters: 7B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Apache 2.0
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/DachengLi1/LongChat
Training cost:
Comment: 32k context length!

Qwen

Parameters: 7B, 14B, 72B
Origin: Alibaba
License: Tongyi Qianwen
Release date: August 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/QwenLM/Qwen-7B
Training cost:

🦙 Vicuna 1.5

Parameters: 13B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Apache 2.0
Release date: August 2023 (v1.5 uses LLaMA2 instead of LLaMA of prior releases)
Paper:
Commercial use possible: NO (trained on https://sharegpt.com conversations, which potentially breaches OpenAI's license)
GitHub: https://github.com/lm-sys/FastChat
Training cost:

🐋 Stable Beluga

Parameters: 7B, 13B, 70B
Origin: Stability AI.
License: CC BY-NC-4.0
Release date: July 2023
Paper:
Commercial use possible: NO
GitHub: https://huggingface.co/stabilityai/StableBeluga2
Training cost:

🦙 LLaMA2

Parameters: 7B, 13B, 70B
Origin: Meta.
License: Llama 2 Community License Agreement
Release date: July 2023
Paper: https://arxiv.org/abs/2307.09288
Commercial use possible: YES
GitHub: https://huggingface.co/meta-llama
Training cost: A cumulative of 3.3M GPU hours of computation was performed on hardware of type A100-80GB (TDP of 400W or 350W). We estimate the total emissions for training to be 539 tCO2eq, of which 100% were directly offset by Meta's sustainability program.

🦅 Falcon

Parameters: 7B, 40B
Origin: UAE Technology Innovation Institute.
License: Apache 2.0
Release date: May 2023
Paper: https://arxiv.org/abs/2311.16867
Commercial use possible: YES
GitHub: https://huggingface.co/tiiuae/falcon-7b
GitHub: https://huggingface.co/tiiuae/falcon-7b-instruct
GitHub: https://huggingface.co/tiiuae/falcon-40b
GitHub: https://huggingface.co/tiiuae/falcon-40b-instruct
Training cost: Falcon-40B was trained on AWS SageMaker, on 384 A100 40GB GPUs in P4d instances.

🧩 MosaicML MPT-30B

Parameters: 30B
Origin: Open source from MosaicML.
License (MPT-30B Base, Instruct): Attribution-ShareAlike 3.0 Unported
License (MPT-30B Chat): Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Release date: June 2023
Paper:
Commercial use possible: YES (Base & Instruct), NO (Chat)
GitHub: Base: https://huggingface.co/mosaicml/mpt-30b
GitHub: Instruct: https://huggingface.co/mosaicml/mpt-30b-instruct
GitHub: Chat: https://huggingface.co/mosaicml/mpt-30b-chat
Training cost: From scratch: 512xA100-40GB, 28.3 days, ~$871,000.
Training cost: Fine-tune 30B base model: 16xA100-40GB, 21.8 hours, ~$871.
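
As a rough cross-check, the quoted figures imply an A100-40GB rate of about $2.50 per GPU-hour; that rate is an inference from the numbers above, not something MosaicML states.

```python
# Back-of-envelope check of the MosaicML cost figures quoted above.
pretrain_gpu_hours = 512 * 28.3 * 24   # ~347,750 GPU-hours to train from scratch
finetune_gpu_hours = 16 * 21.8         # ~349 GPU-hours to fine-tune the 30B base model

print(871_000 / pretrain_gpu_hours)    # ~2.5 USD per GPU-hour
print(871 / finetune_gpu_hours)        # ~2.5 USD per GPU-hour (consistent)
```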

🧩 MosaicML MPT-7B

Parameters: 7B
Origin: Open source from MosaicML. Claimed to be competitive with LLaMA-7B. Base, StoryWriter, Instruct, and Chat fine-tunings available.
License (MPT-7B Base): Apache 2.0
License (MPT-7B-StoryWriter-65k+): Apache 2.0
License (MPT-7B-Instruct): CC-By-SA-3.0
License (MPT-7B-Chat): CC-By-NC-SA-4.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/mosaicml/mpt-7b
Training cost: Nearly all of the training budget was spent on the base MPT-7B model, which took ~9.5 days to train on 440xA100-40GB GPUs, and cost ~$200k

RedPajama-INCITE

Parameters: 3B, 7B
Origin: "Official" version of the Open Source recreation of LLaMA + chat/instruction-tuned versions
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
Training cost: The training of the first collection of RedPajama-INCITE models is performed on 3,072 V100 GPUs provided as part of the INCITE compute grant on Summit supercomputer at the Oak Ridge Leadership Computing Facility (OLCF).

🦙 OpenAlpaca

Parameters: 7B
Origin: An instruction tuned version of OpenLLaMA
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/yxuansu/OpenAlpaca

🦙 OpenLLaMA

Parameters: 7B
Origin: A claimed recreation of Meta's LLaMA without the licensing restrictions
License: Apache 2.0
Release date: May 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/openlm-research/open_llama

🐪 Camel

Parameters: 5B, (20B coming)
Origin: Writer
License: Apache 2.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/basetenlabs/camel-5b-truss

🏛️ Palmyra

Parameters: 5B, (20B coming)
Origin: Writer
License: Apache 2.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub:

StableLM

Parameters: 3B, 7B, (15B, 65B coming)
Origin: Stability.ai
License: CC BY-SA-4.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/Stability-AI/StableLM

🧱 Databricks Dolly 2

Parameters: 12B
Origin: Databricks, an instruction tuned version of EleutherAI pythia
License: CC BY-SA-4.0
Release date: April 2023
Paper:
Commercial use possible: YES
GitHub: https://github.com/databrickslabs/dolly
Training cost: Databricks cite "for thousands of dollars and in a few hours, Dolly 2.0 was built by fine tuning a 12B parameter open-source model (EleutherAI's Pythia) on a human-generated dataset of 15K Q&A pairs". This, of course, is just for the fine-tuning and the cost of training the underlying Pythia model also needs to be taken into account when estimating total training cost.

🦙 Vicuna

Parameters: 13B
Origin: UC Berkeley, CMU, Stanford, and UC San Diego
License: Requires access to LLaMA; trained on https://sharegpt.com conversations, which potentially breaches OpenAI's license
Release date: April 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/lm-sys/FastChat

🧠 Cerebras-GPT

Parameters: 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B
Origin: Cerebras
License: Apache 2.0
Release date: March 2023
Paper: https://arxiv.org/abs/2304.03208
Commercial use possible: YES

🦙 Stanford Alpaca

Parameters: 7B
Origin: Stanford, based on Meta's LLaMA
License: Requires access to LLaMA; trained on GPT conversations, which is against OpenAI's license
Release date: March 2023
Paper:
Commercial use possible: NO
GitHub: https://github.com/tatsu-lab/stanford_alpaca
Training cost: Replicate posted a blog post where they replicated the Alpaca fine-tuning process. They used 4x A100 80GB GPUs for 1.5 hours. For total training cost, the cost of training the underlying LLaMA model also needs to be taken into account.

Pythia

Parameters: 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B
Origin: EleutherAI
License: Apache 2.0
Release date: February 2023
Paper: https://arxiv.org/pdf/2304.01373.pdf
Commercial use possible: YES

🦙 LLaMA

Parameters: 7B, 13B, 33B, 65B
Origin: Meta
License: Model weights available for non-commercial use by application to Meta
Release date: February 2023
Paper: https://arxiv.org/abs/2302.13971
Commercial use possible: NO
Training cost: Meta cite "When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days... Finally, we estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models." However, that cost is for all the different model sizes combined. Separately in the LLaMA paper Meta cite 1,022,362 GPU hours on A100-80GB GPUs.
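
Those throughput figures are self-consistent; a quick back-of-envelope check using only the numbers quoted above:

```python
# Sanity check of Meta's quoted LLaMA training throughput.
cluster_tokens_per_sec = 380 * 2048          # tokens/sec/GPU x number of GPUs
seconds = 1.4e12 / cluster_tokens_per_sec    # time to process 1.4T tokens
print(seconds / 86_400)                      # ~20.8 days, matching "approximately 21 days"
```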

🌸 Bloom

Parameters: 176B
Origin: BigScience
License: BigScience Rail License
Release date: July 2022
Paper: https://arxiv.org/abs/2211.05100
Commercial use possible: YES

🌴 Google PaLM

Parameters: 540B
Origin: Google
License: Unknown - only announcement of intent to open
Release date: April 2022
Paper: https://arxiv.org/abs/2204.02311
Commercial use possible: Awaiting more information

🤖 GPT-NeoX-20B

Parameters: 20B
Origin: EleutherAI
License: Apache 2.0
Release date: January 2022
Paper: https://aclanthology.org/2022.bigscience-1.9/
Commercial use possible: YES
GitHub: https://github.com/EleutherAI/gpt-neox

🤖 GPT-J

Parameters: 6B
Origin: EleutherAI
License: Apache 2.0
Release date: June 2021
Paper:
Commercial use possible: YES

🍮 Google FLAN-T5

Parameters: 80M, 250M, 780M, 3B, 11B
Origin: Google
License: Apache 2.0
Release date: October 2022
Paper: https://arxiv.org/pdf/2210.11416.pdf
Commercial use possible: YES
GitHub: https://github.com/google-research/t5x

🦙 IBM Dromedary

Parameters: 7B, 13B, 33B and 65B
Origin: IBM, based on Meta's LLaMA
License: GNU General Public License v3.0
Release date:
Paper: https://arxiv.org/abs/2305.03047
Commercial use possible: NO
GitHub: https://github.com/IBM/Dromedary

COMMERCIAL MODELS

These commercial models are generally available through some form of usage-based payment model: the more you use, the more you pay.

GPT-4
Parameters: undeclared
Availability: Wait-list https://openai.com/waitlist/gpt-4-api
Fine-tuning: No fine-tuning yet available or announced.
Paper: https://arxiv.org/abs/2303.08774
Pricing: https://openai.com/pricing
Endpoints: Chat API endpoint, which also serves as a completions endpoint.
Privacy: Data from API calls not collected or used to train models https://openai.com/policies/api-data-usage-policies
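
For reference, a minimal sketch of calling the chat endpoint with the current (v1+) openai Python SDK; the model name and prompt are placeholders, and it assumes the OPENAI_API_KEY environment variable is set.

```python
# Minimal sketch of an OpenAI chat completion call (openai>=1.0 SDK).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarise what an LLM is in one sentence."}],
)
print(response.choices[0].message.content)
```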

GPT-3.5
Parameters: undeclared (GPT-3 had 175B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper: https://arxiv.org/pdf/2005.14165.pdf
Pricing: https://openai.com/pricing
Endpoints: A variety of endpoints available, including: chat, embeddings, fine-tuning, moderation, completions.
Privacy: Data from API calls not collected or used to train models.

ChatGPT
Parameters: undeclared (uses GPT-3.5 model)
Availability: GA
Fine-tuning: N/A - consumer web-based solution.
Paper:
Pricing: https://openai.com/pricing
Endpoints: N/A - consumer web-based solution.
Privacy: Data submitted on the web-based ChatGPT service is collected and used to train models https://openai.com/policies/api-data-usage-policies

Jurassic-2
Parameters: undeclared (Jurassic-1 had 178B)
Availability: GA
Fine-tuning: Yes, fine-tuning available through APIs.
Paper:
Pricing: https://www.ai21.com/studio/pricing
Endpoints: A variety of task-specific endpoints available, including paraphrase, grammatical error correction, text improvement, summarisation, text segmentation, and contextual answers.
Privacy:

Claude
Parameters: undeclared
Availability: Waitlist https://www.anthropic.com/product
Fine-tuning: Not standard; large enterprises may contact via https://www.anthropic.com/earlyaccess to discuss.
Paper: https://arxiv.org/abs/2204.05862
Pricing: https://cdn2.assets-servd.host/anthropic-website/production/images/apr-pricing-tokens.pdf
Endpoints: Completions endpoint.
Privacy: Data sent to/from is not used to train models unless feedback is given - https://vault.pactsafe.io/s/9f502c93-cb5c-4571-b205-1e479da61794/legal.html#terms

Google Bard
Parameters: 770M
Availability: Waitlist https://bard.google.com
Fine-tuning: No
Paper:
Pricing:
Endpoints: Consumer UI only, API via PaLM
Privacy:

Google PaLM API
Parameters: Up to 540B
Availability: Announced but not yet available – https://blog.google/technology/ai/ai-developers-google-cloud-workspace/
Fine-tuning: unknown
Paper: https://arxiv.org/abs/2204.02311
Pricing: unknown
Endpoints: unknown
Privacy: unknown

Amazon Titan
Parameters: unknown
Availability: Announced but not yet available – https://aws.amazon.com/bedrock/titan/
Fine-tuning: unknown
Paper:
Pricing: unknown
Endpoints: unknown
Privacy: unknown

Cohere
Parameters: 52B
Availability: GA
Fine-tuning:
Paper:
Pricing: https://cohere.com/pricing
Endpoints: A variety of endpoints including embedding, text completion, classification, summarisation, tokenisation, and language detection.
Privacy: Data submitted is used to train models - https://cohere.com/terms-of-use

Granite
Parameters: 13B, 20B
Availability: GA (granite.13b - .instruct, .chat; granite.20b.code - .ansible, .cobol; other variants on roadmap)
Fine-tuning: Currently Prompt Tuning via APIs
Paper: https://www.ibm.com/downloads/cas/X9W4O6BM
Pricing: https://www.ibm.com/products/watsonx-ai/pricing
Endpoints: Various endpoints - Q&A; Generate; Extract; Summarise; Classify
Privacy: IBM curated 6.48 TB of data before pre-processing, 2.07 TB after pre-processing. 1T tokens generated from a total of 14 datasets. Detail in paper. Prompt data is not saved by IBM for other training purposes. Users have complete control in their storage area of any saved prompts, prompt sessions, or prompt tuned models.
Training cost: granite.13b trained on 256 A100 for 1056 GPU hours.
Legal: IBM indemnifies customer use of these models on the watsonx platform
