
Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning


Bolin97/awesome-instruction-selector


awesome-instruction-selector


Latest update date: February 2, 2024 UTC.

Labels: venue and year (e.g. ACL 2023), 📄PDF, 🔗Codes, 💡Report

Useful instruction sets

Self-Instruct: Aligning Language Models with Self-Generated Instructions. ACL 2023 📄PDF, 🔗Data

Alpaca: A Strong, Replicable Instruction-Following Model. Report 2023 💡Blog, 🔗Data

WizardLM: Empowering Large Language Models to Follow Complex Instructions. arXiv 2023 📄PDF, 🔗Data

LIMA: Less Is More for Alignment. arXiv 2023 📄PDF, 🔗Data

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM. Report 2023 💡Blog, 🔗Data

Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022 📄PDF, 🔗Data

Instruction Selection Methods

Systems of indicators as data selectors

Instruction Mining: High-Quality Instruction Data Selection for Large Language Models. arXiv 2023 📄PDF

InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4. arXiv 2023 📄PDF, 🔗Codes

Dataset Quantization. ICCV 2023 📄PDF, 🔗Codes

Trainable LLMs as data selectors

From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning. arXiv 2023 📄PDF, 🔗Codes

Self-Alignment with Instruction Backtranslation. arXiv 2023 📄PDF

One Shot Learning as Instruction Data Prospector for Large Language Models. arXiv 2023 📄PDF, 🔗Codes

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning. arXiv 2023 📄PDF, 🔗Codes

TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design. ICLR 2024 📄PDF

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks. EMNLP 2023 📄PDF, 🔗Codes

Powerful LLMs like ChatGPT as data selectors

AlpaGasus: Training A Better Alpaca with Fewer Data. arXiv 2023 📄PDF, 💡Blog

#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models. arXiv 2023 📄PDF, 🔗Codes

Rethinking the Instruction Quality: LIFT is What You Need. arXiv 2023 📄PDF

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning. arXiv 2023 📄PDF, 🔗Codes

A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment. arXiv 2023 📄PDF

WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation. arXiv 2023 📄PDF

Small reward models as data selectors

MoDS: Model-oriented Data Selection for Instruction Tuning. arXiv 2023 📄PDF, 🔗Codes

Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning. arXiv 2023 📄PDF, 🔗Codes
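The four selector families above differ mainly in what produces the quality score for each instruction example (a system of indicators, a trainable LLM, a powerful LLM such as ChatGPT, or a small reward model); the step that follows is commonly a ranking or threshold over those scores. Below is a minimal sketch of that generic ranking step, assuming a pool of (instruction, response) pairs and a hypothetical `score` callable standing in for whichever selector is used. The `Example` type, `select_top_fraction`, and `toy_score` (response word count) are illustrative assumptions, not the method of any listed paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One instruction-tuning pair (hypothetical container for illustration)."""
    instruction: str
    response: str


def select_top_fraction(
    pool: List[Example],
    score: Callable[[Example], float],
    fraction: float,
) -> List[Example]:
    """Keep the highest-scoring `fraction` of the pool.

    `score` is a stand-in for any data selector (indicator system,
    trainable LLM, ChatGPT-style rater, or reward model).
    At least one example is kept when the pool is non-empty.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not pool:
        return []
    k = max(1, int(len(pool) * fraction))
    # Python's sort is stable even with reverse=True, so ties keep pool order.
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[:k]


def toy_score(ex: Example) -> float:
    """Hypothetical toy indicator: number of whitespace-separated words in the response."""
    return float(len(ex.response.split()))


pool = [
    Example("Say hi", "Hi"),
    Example("Explain gravity", "Gravity is the attraction between masses."),
    Example("List primes below 10", "2, 3, 5, 7"),
    Example("Translate 'cat' to French", "chat"),
]
subset = select_top_fraction(pool, toy_score, fraction=0.5)
print([ex.instruction for ex in subset])  # ['Explain gravity', 'List primes below 10']
```

Swapping `toy_score` for a real selector leaves the selection step unchanged; a fixed score threshold instead of a fixed fraction is an equally common design choice.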

Evaluation Results

[Figure: evaluation results (3 images)]
