Following the newest & important LLM papers🤗
| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| LLaMA: Open and Efficient Foundation Language Models | LLaMA | Meta | Feb. 2023 | #Model #Foundation | |
| *Alpaca: A Strong, Replicable Instruction-Following Model | Alpaca | Stanford University | Mar. 2023 | #Model #Finetuning #Self-instruct | |
| *Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | Vicuna | LMSYS Org. | Mar. 2023 | #Model #Finetuning #Methodology | |
| Training Compute-Optimal Large Language Models | Chinchilla | DeepMind | May 2022 | #Model #Foundation #Methodology | |
| LIMA: Less Is More for Alignment | LIMA | Meta | May 2023 | #Model #Finetuning #Data-centric | |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Orca | Microsoft | Jun. 2023 | #Model #Finetuning #Methodology | |
| Platypus: Quick, Cheap, and Powerful Refinement of LLMs | Platypus | Boston University | Aug. 2023 | #Model #Finetuning #Methodology | |
| Mistral 7B | Mistral | Mistral AI | Oct. 2023 | #Model #Finetuning #LightWeight | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| Knowledge Distillation of Large Language Models | MiniLLM | CoAI Group | Jun. 2023 | #LightWeight #Distillation | |
| Efficient Memory Management for Large Language Model Serving with PagedAttention | vLLM | UC Berkeley | Sep. 2023 | #LightInference #Attention #KVcache | |
| GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | GQA | Google Research | Oct. 2023 | #LightWeight #Attention #Distillation | |
| *Flash-Decoding for long-context inference | FlashDecoding | Stanford University | Oct. 2023 | #LightInference #Attention #Parallelization | |
| Efficient Streaming Language Models with Attention Sinks | StreamingLLM | MIT | Sep. 2023 | #LightInference #Attention #KVcache | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| RoFormer: Enhanced Transformer with Rotary Position Embedding | RoPE | Zhuiyi Technology | Aug. 2022 | #PE #RPE #ComplexPlane | |
| Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | ALiBi | - | Apr. 2022 | #seq_len #Extrapolation #Efficient | |
| A Length-Extrapolatable Transformer | xPos | Microsoft | Dec. 2022 | #PE #RoPE #ComplexPlane | |
| *Extending Context is Hard…but not Impossible | kaiokendev | - | Feb. 2023 | | |
| Extending Context Window of Large Language Models via Position Interpolation | post-kaiokendev | Meta | Jun. 2023 | #seq_len #Interpolation #RoPE | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| Training language models to follow instructions with human feedback | InstructGPT | OpenAI | Mar. 2022 | #Finetuning #PPO #Instruction | |
| Voyager: An Open-Ended Embodied Agent with Large Language Models | Voyager | NVIDIA | Oct. 2023 | #Prompting #Game | |
| Motif: Intrinsic Motivation from Artificial Intelligence Feedback | Motif | Mila | Sep. 2023 | #LLMfeedback #Game | |
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO | Stanford University | May 2023 | #Finetuning #DPO | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| RWKV: Reinventing RNNs for the Transformer Era | RWKV | RWKV Foundation | May 2023 | #Architecture #Recurrent #Efficient | |
| Retentive Network: A Successor to Transformer for Large Language Models | RetNet | Microsoft | Jul. 2023 | #Architecture #Recurrent #Efficient | |
| Hyena Hierarchy: Towards Larger Convolutional Language Models | Hyena | Stanford University | Apr. 2023 | #Architecture #Recurrent #Efficient | |
| BitNet: Scaling 1-bit Transformers for Large Language Models | 1-Bit Transformer | Microsoft | Oct. 2023 | #Architecture #Quantized | |

| Paper | a.k.a | Affiliation | Published date | # | Desc. |
|---|---|---|---|---|---|
| Who's Harry Potter? Approximate Unlearning in LLMs | - | Microsoft | Oct. 2023 | #EraseMemory #Forgetting #Finetuning | |
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | Prometheus | KAIST | Oct. 2023 | #Evaluation | |
| In-Context Learning Creates Task Vectors | Task-Vector | Tel Aviv University | Oct. 2023 | #In-Context Learning | |
*: not a paper