Following the newest & important LLM papers🤗
| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| LLaMA: Open and Efficient Foundation Language Models | LLaMA | Meta | Feb. 2023 | #Model #Foundation | |
| *Alpaca: A Strong, Replicable Instruction-Following Model | Alpaca | Stanford University | Mar. 2023 | #Model #Finetuning #Self-instruct | |
| *Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | Vicuna | LMSYS Org. | Mar. 2023 | #Model #Finetuning #Methodology | |
| Training Compute-Optimal Large Language Models | Chinchilla | DeepMind | May 2022 | #Model #Foundation #Methodology | |
| LIMA: Less Is More for Alignment | LIMA | Meta | May 2023 | #Model #Finetuning #Data-centric | |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Orca | Microsoft | Jun. 2023 | #Model #Finetuning #Methodology | |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | Platypus | Boston University | Aug. 2023 | #Model #Finetuning #Methodology | |
| Mistral 7B | Mistral | Mistral AI | Oct. 2023 | #Model #Finetuning #LightWeight | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| Knowledge Distillation of Large Language Models | MiniLLM | CoAI Group | Jun. 2023 | #LightWeight #Distillation | |
| Efficient Memory Management for Large Language Model Serving with PagedAttention | vLLM | UC Berkeley | Sep. 2023 | #LightInference #Attention #KVcache | |
| GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | GQA | Google Research | Oct. 2023 | #LightWeight #Attention #Distillation | |
| *Flash-Decoding for long-context inference | FlashDecoding | Stanford University | Oct. 2023 | #LightInference #Attention #Parallelization | |
| Efficient Streaming Language Models with Attention Sinks | StreamingLLM | MIT | Sep. 2023 | #LightInference #Attention #KVcache | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| RoFormer: Enhanced Transformer with Rotary Position Embedding | RoPE | Zhuiyi Technology | Aug. 2022 | #PE #RPE #ComplexPlane | |
| Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | ALiBi | - | Apr. 2022 | #seq_len #Extrapolation #Efficient | |
| A Length-Extrapolatable Transformer | xPos | Microsoft | Dec. 2022 | #PE #RoPE #ComplexPlane | |
| *Extending Context is Hard…but not Impossible | kaiokendev | - | Feb. 2023 | | |
| Extending Context Window of Large Language Models via Position Interpolation | post-kaiokendev | Meta | Jun. 2023 | #seq_len #Interpolation #RoPE | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| Training language models to follow instructions with human feedback | InstructGPT | OpenAI | Mar. 2022 | #Finetuning #PPO #Instruction | |
| Voyager: An Open-Ended Embodied Agent with Large Language Models | Voyager | NVIDIA | Oct. 2023 | #Prompting #Game | |
| Motif: Intrinsic Motivation from Artificial Intelligence Feedback | Motif | Mila | Sep. 2023 | #LLMfeedback #Game | |
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO | Stanford University | May 2023 | #Finetuning #DPO | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| RWKV: Reinventing RNNs for the Transformer Era | RWKV | RWKV Foundation | May 2023 | #Architecture #Recurrent #Efficient | |
| Retentive Network: A Successor to Transformer for Large Language Models | RetNet | Microsoft | Jul. 2023 | #Architecture #Recurrent #Efficient | |
| Hyena Hierarchy: Towards Larger Convolutional Language Models | Hyena | Stanford University | Apr. 2023 | #Architecture #Recurrent #Efficient | |
| BitNet: Scaling 1-bit Transformers for Large Language Models | 1-Bit Transformer | Microsoft | Oct. 2023 | #Architecture #Quantized | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| Who's Harry Potter? Approximate Unlearning in LLMs | - | Microsoft | Oct. 2023 | #EraseMemory #Forgetting #Finetuning | |
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | Prometheus | KAIST | Oct. 2023 | #Evaluation | |
| In-Context Learning Creates Task Vectors | Task-Vector | Tel Aviv University | Oct. 2023 | #In-Context Learning | |
*: not a paper