Dataset Distillation with Attention Labels for Fine-tuning BERT, ACL'23 #827

AkihikoWatanabe · 2023-07-14T07:20:19Z

https://virtual2023.aclweb.org/paper_P5706.html

AkihikoWatanabe · 2023-07-22T15:41:25Z

Dataset distillation aims to create a small dataset of informative synthetic samples to rapidly train neural networks that retain the performance of the original dataset. In this paper, we focus on constructing distilled few-shot datasets for natural language processing (NLP) tasks to fine-tune pre-trained transformers. Specifically, we propose to introduce attention labels, which can efficiently distill the knowledge from the original dataset and transfer it to the transformer models via attention probabilities. We evaluated our dataset distillation methods in four various NLP tasks and demonstrated that it is possible to create distilled few-shot datasets with the attention labels, yielding impressive performances for fine-tuning BERT. Specifically, in AGNews, a four-class news classification task, our distilled few-shot dataset achieved up to 93.2% accuracy, which is 98.5% performance of the original dataset even with only one sample per class and only one gradient step.
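The abstract says that attention labels pass knowledge to the model "via attention probabilities", but it does not give the training objective. The sketch below is one plausible reading, not the paper's confirmed formulation: a distilled sample carries a soft class label plus a target attention distribution per row, and the fine-tuning loss adds a KL-divergence term that pulls the model's attention probabilities toward those targets. The function names, the use of KL divergence, and the weight `lam` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(soft_label, logits):
    """Cross-entropy between a (possibly soft) class label and model logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs) if t > 0)


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


def attention_label_loss(attn_labels, attn_probs):
    """Mean KL divergence over attention rows.

    Each row is a distribution over the tokens a query position attends to.
    Assumption: labels and model attention have the same shape.
    """
    rows = list(zip(attn_labels, attn_probs))
    return sum(kl_divergence(t, p) for t, p in rows) / len(rows)


def distilled_sample_loss(soft_label, logits, attn_labels, attn_probs, lam=1.0):
    """Hypothetical combined loss: class term plus weighted attention term."""
    class_term = cross_entropy(soft_label, logits)
    attn_term = attention_label_loss(attn_labels, attn_probs)
    return class_term + lam * attn_term


# Toy example: a four-class label (as in AGNews) and two attention rows.
soft_label = [0.9, 0.05, 0.03, 0.02]
logits = [2.0, 0.1, -0.3, 0.0]
attn_labels = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
attn_probs = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
loss = distilled_sample_loss(soft_label, logits, attn_labels, attn_probs, lam=0.5)
print(f"combined loss: {loss:.4f}")
```

This covers only the fine-tuning side. Learning the synthetic samples and their attention labels in the first place requires optimizing them against the original dataset, which is not shown here.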

Summary (by gpt-3.5-turbo)

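This study focuses on using dataset distillation to create a small dataset that trains neural networks quickly while retaining the performance obtained with the original dataset. Specifically, it proposes constructing distilled few-shot datasets for natural language processing tasks, to be used for fine-tuning pre-trained transformers. The experiments show that few-shot datasets built with attention labels yield impressive performance when fine-tuning BERT. For example, on a news classification task, a single sample per class and a single gradient step reached 98.5% of the performance of the original dataset.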

AkihikoWatanabe · 2023-10-22T04:24:52Z

An astonishing study: with dataset distillation, just one sample per class was enough to reach 98.5% of the performance of the original dataset (by Maekawa-kun).

AkihikoWatanabe added the translation_required label Jul 22, 2023

AkihikoWatanabe changed the title ~~Dataset Distillation with Attention Labels for Fine-tuning BERT~~ Dataset Distillation with Attention Labels for Fine-tuning BERT, ACL'23 Oct 22, 2023

AkihikoWatanabe added DataDistillation NLP Zero/Few-shot Attention labels Oct 22, 2023
