A Survey on Language, Multimodal, and Scientific GPT Models: Examining User-Friendly and Open-Sourced Large GPT Models

Continuously updating

The original paper is available on arXiv.

Introduction

The advent of GPT models has brought about a significant transformation in the field of NLP. These models, such as GPT-4, demonstrate exceptional capabilities across a wide range of NLP tasks. Despite these impressive capabilities, however, large GPT models have inherent limitations that restrict their widespread adoption, usability, and fine-tuning. The need for user-friendly, relatively small, and open-sourced alternative GPT models arises from the desire to overcome these limitations while retaining high performance. In this survey, we examine open-sourced alternatives to large GPT models, focusing on user-friendly and relatively small models (around 10B parameters) that facilitate easier deployment and accessibility.

  • Investigate the architecture, design principles, and trade-offs of user-friendly and relatively small alternative GPT models, focusing on their ability to overcome the challenges posed by large GPT models.
  • Present the data collection pipelines and analyze the pre-training data in terms of source, quality, quantity, and diversity, as well as the fine-tuning data, including instruction data, alignment data, and domain-specific data for domain-specific models.
  • Survey the techniques for efficient deployment and fine-tuning of these GPT models.
  • Introduce ongoing open-source projects and initiatives for user-friendly GPT model reproduction and deployment.
  • Provide a thorough analysis of benchmark evaluations and offer human evaluations of these relatively small GPT models to give practical recommendations for real-world usage.
  • Explore the extension of GPT models to multimodal settings, focusing on models that integrate NLP with computer vision, with special attention to user-friendly scientific GPT models and the biomedical domain.

The overview of the content is shown in Figure 1.

Figure 1: Overview of the content

GPT and GPT-like models

Related papers/links for open LLMs (list is continuously updated)

Language Domain

  1. Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR 2020. [paper] [code & models] [Huggingface models]
  2. mT5: A massively multilingual pre-trained text-to-text transformer. NAACL 2021. [paper] [code & models] [Huggingface models]
  3. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow. [code & models] [Huggingface models]
  4. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. arxiv 2022. [paper] [code] [original models] [Huggingface models]
  5. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. [code & models] [Huggingface models]
  6. OPT: Open Pre-trained Transformer Language Models. arxiv 2022. [paper] [code] [Huggingface models]
  7. BLOOM: A 176B-parameter open-access multilingual language model. arxiv 2022. [paper] [Huggingface models]
  8. Crosslingual Generalization through Multitask Finetuning. arxiv 2022. [paper] [Huggingface models]
  9. GLM: General language model pretraining with autoregressive blank infilling. ACL 2022. [paper] [code & models] [Huggingface models]
  10. GLM-130B: An Open Bilingual Pre-trained Model. ICLR 2023. [paper] [code & models]
  11. ChatGLM-6B [code & models] [Huggingface models]
  12. ChatGLM2-6B [code & models] [Huggingface models]
  13. LLaMA: Open and Efficient Foundation Language Models. arxiv 2023. [paper] [code & models]
  14. OpenLLaMA: An Open Reproduction of LLaMA. [code & models]
  15. Stanford Alpaca: An Instruction-following LLaMA Model. [code & models]
  16. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality. [blog] [code & models]
  17. StableLM: Stability AI Language Models. [code & models]
  18. Baize. [code & models]
  19. Koala: A Dialogue Model for Academic Research. [blog] [code & models]
  20. WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions. [code & models]
  21. Large-scale, Informative, and Diverse Multi-round Dialogue Data, and Models. [code & models]
  22. YuLan-Chat: An Open-Source Bilingual Chatbot. [code & models]
  23. Pythia: Interpreting Transformers Across Time and Scale. arxiv 2023. [paper] [code & models]
  24. Dolly. [code & models]
  25. OpenChatKit. [code & models]
  26. BELLE: Be Everyone's Large Language model Engine. [code & models]
  27. RWKV: Reinventing RNNs for the Transformer Era. arxiv 2023. [paper] [code & models] [Huggingface models]
  28. ChatRWKV. [code & models]
  29. MOSS. [code & models]
  30. RedPajama-INCITE. [blog] [Huggingface models]
  31. Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs. [blog] [code] [Huggingface models]
  32. Introducing Falcon LLM. [blog] [Huggingface models]
  33. InternLM. [code & models]
  34. Baichuan-7B. [code & models]
  35. Llama 2: Open Foundation and Fine-Tuned Chat Models. arxiv 2023. [paper] [code & models]
  36. Introducing Qwen-7B: Open foundation and human-aligned models. [code & models]
  37. XVERSE-13B. [code & models]

Multimodal Domain

  1. Flamingo: a Visual Language Model for Few-Shot Learning. NeurIPS 2022. [paper]
  2. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arxiv 2023. [paper] [code]
  3. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arxiv 2023. [paper] [website, code & models]
  4. Visual Instruction Tuning. arxiv 2023. [paper] [website, code & models]
  5. mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality. arxiv 2023. [paper] [code & models]
  6. Transfer Visual Prompt Generator across LLMs. arxiv 2023. [paper] [website, code & models]
  7. Otter: A Multi-Modal Model with In-Context Instruction Tuning. arxiv 2023. [paper] [code & models]
  8. MultiModal-GPT: A Vision and Language Model for Dialogue with Humans. arxiv 2023. [paper] [code & models]

Scientific Domain

  1. BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. Bioinformatics 2022. [paper] [code & models]
  2. Galactica: A Large Language Model for Science. arxiv 2022. [paper] [models]
  3. BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks. arxiv 2023. [paper] [code & models]
  4. MolXPT: Wrapping Molecules with Text for Generative Pre-training. ACL 2023. [paper] [code & models]
  5. Translation between Molecules and Natural Language. EMNLP 2022. [paper] [code & models]

Figure 2: Model Evolution

Table 1. Statistical overview of open large language models in recent years, categorized by base models.

| Model | #Param | Backbone | Release Date | Training Data Size |
| --- | --- | --- | --- | --- |
| T5 (enc-dec) [github] | 60M, 220M, 770M, 3B, 11B | Base Model | 2019-10 | 1T tokens |
| mT5 (enc-dec) [github] | 300M, 580M, 1.2B, 3.7B, 13B | Base Model | 2020-10 | 1T tokens |
| GPT-Neo [github] | 125M, 350M, 1.3B, 2.7B | Base Model | 2021-03 | 825GB |
| GPT-NeoX [github] | 20B | Base Model | 2022-02 | 825GB |
| GPT-J [github] | 6B | Base Model | 2021-06 | 825GB |
| OPT [github] | 125M, 1.3B, 2.7B, 6.7B, 13B, 30B, 66B, 175B | Base Model | 2022-05 | 180B tokens |
| BLOOM | 560M, 1.1B, 1.7B, 3B, 7.1B, 176B | Base Model | 2022-07 | 366B tokens |
| BLOOMZ | 560M, 1.1B, 1.7B, 3B, 7.1B, 176B | BLOOM | 2022-11 | - |
| GLM [github] | 110M, 335M, 410M, 515M, 2B, 10B, 130B | Base Model | 2021-03 | English Wikipedia |
| GLM-130B [github] | 130B | Base Model | 2022-08 | - |
| ChatGLM [github] | 6B | GLM | 2023-03 | - |
| ChatGLM2 [github] | 6B | GLM | 2023-06 | - |
| LLaMA [github] | 7B, 13B, 33B, 65B | Base Model | 2023-02 | 1.4T tokens |
| OpenLLaMA [github] | 3B, 7B | Replication of LLaMA | 2023-05 | - |
| Alpaca [github] | 7B | LLaMA | 2023-03 | 52K |
| Vicuna [github] | 7B, 13B | LLaMA | 2023-03 | 70K |
| StableVicuna [github] | 13B | Vicuna (LLaMA) | - | - |
| BAIZE [github] | 7B, 13B, 30B | LLaMA | 2023-04 | 54K/57K/47K |
| Koala [github] | 13B | LLaMA | 2023-04 | - |
| WizardLM [github] | 7B, 13B, 30B | LLaMA | 2023-06 | 250k/70k |
| UltraLM [github] | 13B | LLaMA | 2023-06 | - |
| Pythia [github] | 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B | Base Model | 2023-01 | 299.9B tokens/207B tokens |
| Dolly-v2 [github] | 12B | Pythia | 2023-04 | ~15k |
| Openchatkit [github] | 7B | Pythia | 2023-03 | - |
| BELLE-7B [github] | 7B | Pythia | 2023-03 | 1.5M |
| StableLM-Alpha [github] | 3B, 7B | Base Model | 2023-04 | 1.5T tokens |
| StableLM-Tuned-Alpha [github] | 7B | StableLM | 2023-04 | - |
| RWKV [github] | 169M, 430M, 1.5B, 3B, 7B, 14B | Base Model | - | 825GB |
| ChatRWKV [github] | 7B, 14B | RWKV | 2022-12 | - |
| moss-moon-003-base [github] | 16B | Base Model | 2023-04 | 700B tokens |
| moss-moon-003-sft [github] | 16B | moss-moon-003-base | 2023-04 | 1.1 million |
| RedPajama-INCITE | 3B, 7B | Base Model | 2023-05 | 1.2T tokens |
| MPT-7B [github] | 7B | Base Model | 2023-05 | 1T tokens |
| MPT-7B-Chat [github] | 7B | MPT-7B | 2023-05 | - |
| Falcon LLM | 7B, 40B | Base Model | 2023-06 | 1T tokens |
| InternLM [github] | 7B | Base Model | 2023-06 | trillions of tokens |
| InternLM Chat [github] | 7B | InternLM | 2023-06 | - |
| Baichuan [github] | 7B | Base Model | 2023-06 | 1.2T tokens |
| LLaMA 2 [github] | 7B, 13B, 70B | Base Model | 2023-07 | 2T tokens |
| LLaMA 2-Chat [github] | 7B, 13B, 70B | LLaMA 2 | 2023-07 | 27,540 instruction tuning samples, 2,919,326 human preference samples |
| Qwen [github] | 7B | Base Model | 2023-08 | 2.2T tokens |
| Qwen-Chat [github] | 7B | Qwen | 2023-08 | - |
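
Most of the relatively small models in Table 1 ship Hugging Face checkpoints. As a minimal sketch (not part of the survey), the snippet below shows how such a model might be loaded and queried locally with the `transformers` library; the model ID is only an illustrative placeholder, and any entry from Table 1 with Hugging Face weights could be substituted.

```python
# Minimal local inference sketch for a relatively small open model.
# Assumptions: `transformers` and `torch` are installed; the model ID below
# is a placeholder and can be swapped for any Hugging Face entry in Table 1.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neo-1.3B"  # placeholder ~1.3B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Open-sourced alternatives to large GPT models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```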

Training/Fine-tuning Data Sources

Deployment and fine-tuning techniques

Efficient Deployment

Related papers

  1. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. arxiv 2022. [paper]
  2. LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models. arxiv 2022. [paper]
  3. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. NeurIPS 2022. [paper]
  4. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. ICLR 2023. [paper]
  5. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. ICML 2023. [paper]
  6. LLM-QAT: Data-Free Quantization Aware Training for Large Language Models. arxiv 2023. [paper]
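
The papers above quantize weights (and sometimes activations) so that these models fit on commodity hardware. As a minimal, hedged sketch in the spirit of LLM.int8(), the snippet below loads a model with 8-bit weights through `transformers` and `bitsandbytes`; the model ID is a placeholder, and the exact flags may vary across library versions.

```python
# Post-training 8-bit quantization for inference (LLM.int8()-style).
# Assumptions: `transformers`, `accelerate`, and `bitsandbytes` are installed
# and a CUDA GPU is available; the model ID is only a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-6.7b"  # placeholder ~7B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # store weights in int8 via bitsandbytes
    device_map="auto",   # let accelerate place layers on available devices
)

inputs = tokenizer("Quantization reduces memory usage by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```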

Efficient Fine-tuning

Related papers

  1. Parameter-Efficient Transfer Learning for NLP. ICML 2019. [paper]
  2. LoRA: Low-Rank Adaptation of Large Language Models. ICLR 2022. [paper]
  3. The power of scale for parameter-efficient prompt tuning. EMNLP 2021. [paper]
  4. GPT Understands, Too. arxiv 2021. [paper]
  5. Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL 2021. [paper]
  6. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL 2022. [paper]
  7. QLoRA: Efficient Finetuning of Quantized LLMs. arxiv 2023. [paper]
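
Most of the methods listed above add a small number of trainable parameters on top of a frozen backbone. As an illustrative sketch (not the original authors' code), the snippet below wraps a placeholder model with LoRA adapters using the Hugging Face PEFT library; the rank, scaling, and target modules are assumed hyperparameters for illustration, not recommendations from the survey.

```python
# LoRA-style parameter-efficient fine-tuning with the PEFT library.
# Assumptions: `transformers` and `peft` are installed; the model ID and
# hyperparameters below are placeholders for illustration only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model_id = "facebook/opt-1.3b"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters

# The wrapped model can now be trained with a standard training loop or
# `transformers.Trainer`; only the LoRA adapter weights receive gradients.
```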

Open-sourced tools

TABLE 5: Overview of open-source efforts and tools development

| Category | Tool | Application | Released by | Link |
| --- | --- | --- | --- | --- |
| Deployment | Transformers | LLM training and deployment | Huggingface | https://huggingface.co/transformers |
| | Colossal-AI | Unified system to train and deploy large-scale models | HPC-AI Tech | https://colossalai.org/ |
| | GPT4all | Large and personalized language model training and deployment on common hardware | Nomic AI | https://gpt4all.io/ |
| | PandaLM | System providing automated and reproducible comparisons among various LLMs | Westlake University | https://github.com/WeOpenML/PandaLM |
| | MLC LLM | Solution allowing LLMs to be deployed natively | MLC AI | https://mlc.ai/mlc-llm/ |
| Accelerating | Deepspeed | Accelerating training and inference of large-scale models | Microsoft | https://github.com/microsoft/DeepSpeed |
| | Megatron-LM | Accelerating training and inference of large-scale models | Nvidia | https://github.com/NVIDIA/Megatron-LM |
| Reproduction | MinGPT | Re-implementation of GPT which is clean, interpretable, and educational | Stanford University | https://github.com/karpathy/minGPT |
| | RedPajama | An effort to produce reproducible and fully-open language models | ETH Zurich | https://together.xyz/blog/redpajama |
| Framework | LangChain | Framework for integration of LLMs with other computational sources and knowledge | LangChain | https://python.langchain.com/ |
| | xTuring | Framework providing fast, efficient, and simple fine-tuning of LLMs | Stochastic | https://github.com/stochasticai/xturing |
| Evaluation | Open LLM Leaderboard | LLM evaluation leaderboard | Huggingface | https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard |
| Framework | Scikit-LLM | Framework integrating LLMs into scikit-learn for enhanced text analysis tasks | Tractive | https://github.com/iryna-kondr/scikit-llm |
| | AlpacaFarm | Simulation framework for methods that learn from human feedback | Stanford | https://github.com/tatsu-lab/alpaca_farm/ |
| | h2oGPT | LLM fine-tuning framework and chatbot UI with document question-answering capabilities | H2O.ai | https://github.com/h2oai/h2ogpt |
| Software | Open-Assistant | Customized and personalized chat-based assistant | LAION AI | https://github.com/LAION-AI/Open-Assistant |
| | MetaGPT | Multi-agent framework to tackle tasks with multiple agents | Open-Source Community | https://github.com/geekan/MetaGPT |
| Finetuning | PEFT | Library for fine-tuning LLMs with only part of their parameters | Huggingface | https://huggingface.co/docs/peft |
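
To make the "Accelerating" category above concrete, the sketch below wraps an already loaded model with DeepSpeed's inference engine. This is only an assumed usage pattern: the model ID is a placeholder, and the `deepspeed.init_inference` keyword arguments may differ slightly between DeepSpeed versions.

```python
# Accelerated single-GPU inference with DeepSpeed (see the Deepspeed row in Table 5).
# Assumptions: `deepspeed`, `transformers`, and a CUDA GPU are available;
# the model ID is a placeholder and argument names may vary by version.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-2.7b"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Swap supported modules for DeepSpeed's fused inference kernels.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree (1 = single GPU)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("Efficient deployment of open LLMs", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```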

Benchmark evaluations

Coming soon ...

Misc

TABLE 16. ChatGPT Alternatives on Different Applications

| Field | Software | Backbone | URL |
| --- | --- | --- | --- |
| Writing | ChatSonic | GPT-4 | https://writesonic.com/chat |
| | Jasper Chat | GPT-3.5 and others | https://www.jasper.ai/chat |
| Search Engines | ChatSonic on Opera | GPT-4 | https://writesonic.com/chatsonic-opera |
| | NeevaAI | ChatGPT | https://neeva.com/ |
| Coding | Copilot | Codex | https://github.com/features/copilot |
| | Tabnine | GPT-2 | https://www.tabnine.com/ |
| | Codewhisperer | - | https://aws.amazon.com/cn/codewhisperer |
| Language Learning | Elsa | - | https://elsaspeak.com/en |
| | DeepL Write | - | https://www.deepl.com/translator |
| Research | Elicit | - | https://elicit.org |
| | ChatPDF | ChatGPT | https://www.chatpdf.com |
| | Copilot in Azure Quantum | GPT-4 | https://quantum.microsoft.com/ |
| Productivity (team work) | CoGram | - | https://www.cogram.com |
| | Otter | - | https://otter.ai |
| | Chatexcel | - | https://chatexcel.com |
| | AI Anywhere | ChatGPT, GPT-4 | https://www.ai-anywhere.com/#/dashboard |
| Conversation | Replika | A model with 774M parameters | https://replika.com |
| | Character AI | GPT-4 | https://beta.character.ai |
| | Poe | Multiple models (GPT-4, LLaMA, ...) | https://poe.com |
| Building customized AI | Botsonic AI chatbot | GPT-4 | https://writesonic.com/botsonic |

Reference

If you find our paper or this repository useful, please cite our paper:

@misc{gao2023examining,
      title={Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models}, 
      author={Kaiyuan Gao and Sunan He and Zhenyu He and Jiacheng Lin and QiZhi Pei and Jie Shao and Wei Zhang},
      year={2023},
      eprint={2308.14149},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
