[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Updated Oct 20, 2024 (Jupyter Notebook)
Official Repository of "LLM × DATA" Survey Paper
DSIR large-scale data selection framework for language model training
A Survey on Data Selection for Language Models
⛔ [DEPRECATED] Adapt Transformer-based language models to new text domains
InstructionGPT-4
🐂 🔥 Official repository for the paper "LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning".
[ACL 2023] Code for the paper "Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach"
Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning".
This is an official repository for "Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources" (NeurIPS 2023).
Enhancing Efficiency in Multidevice Federated Learning through Data Selection
Enhanced spatio-temporal electric load forecasts with less data using active deep learning
Keras sentence classification
Repository for the experiments in my paper accepted to the CLIN Journal: "Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts"
A Python package for studying neural learning
Dynamic Transfer Learning for Low-Resource Neural Machine Translation
Code for the NeurIPS 2023 paper "Imitation Learning from Imperfection: Theoretical Justifications and Algorithms"
An Approach to Enhancing the Efficacy of Post-Training Using Synthetic Data by Iterative Data Selection