[TACL] How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN, TACL'23 #828

AkihikoWatanabe · 2023-07-14T07:23:35Z

https://virtual2023.aclweb.org/paper_T4503.html

AkihikoWatanabe · 2023-10-22T04:20:44Z

Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To tease apart these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2). For local structure—e.g., individual dependencies—text generated with a standard sampling scheme is substantially less novel than our baseline of human-generated text from each model’s test set. For larger-scale structure—e.g., overall sentence structure—model-generated text is as novel or even more novel than the human-generated baseline, but models still sometimes copy substantially, in some cases duplicating passages over 1,000 words long from the training set. We also perform extensive manual analysis, finding evidence that GPT-2 uses both compositional and analogical generalization mechanisms and showing that GPT-2’s novel text is usually well-formed morphologically and syntactically but has reasonably frequent semantic issues (e.g., being self-contradictory).

Translation (by gpt-3.5-turbo)

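Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? To distinguish these possibilities, we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2). For local structure (e.g., individual dependencies), text generated with a standard sampling scheme is substantially less novel than the baseline of human-written text from each model's test set. For larger-scale structure (e.g., overall sentence structure), model-generated text is as novel as the human baseline or even more novel, but models still sometimes copy substantially, in some cases duplicating passages of over 1,000 words from the training set. We also perform extensive manual analysis, finding evidence that GPT-2 uses both compositional and analogical generalization mechanisms, and showing that GPT-2's novel text is usually well-formed morphologically and syntactically but has reasonably frequent semantic issues (e.g., being self-contradictory).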

Summary (by gpt-3.5-turbo)

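This study introduces RAVEN, a suite of analyses for evaluating the novelty of text generated by language models. It evaluates the novelty of local structure and larger-scale structure for four neural language models trained on English. The generated text is less novel than the human baseline for local structure and at least as novel as the baseline for larger-scale structure, although models sometimes reproduce text from the training set. A detailed manual analysis of GPT-2 finds evidence of both compositional and analogical generalization mechanisms and shows that its novel text is morphologically and syntactically well-formed but has reasonably frequent semantic issues.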
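To make the n-gram side of the novelty analysis concrete, here is a minimal sketch. It is not RAVEN's actual implementation: the function names (`ngrams`, `novelty_rate`, `longest_copied_span`), the whitespace tokenization, and the toy sentences are all assumptions made for illustration. The first function estimates how many generated n-grams never occur in the training text; the second finds the longest generated span that is copied verbatim from training, which is the kind of check that would surface long duplicated passages.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield every contiguous n-gram in a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def novelty_rate(generated: List[str], training: List[str], n: int) -> float:
    """Fraction of generated n-grams (repeats counted) that are absent from training."""
    seen: Set[Tuple[str, ...]] = set(ngrams(training, n))
    grams = list(ngrams(generated, n))
    if not grams:
        return 0.0  # text shorter than n has no n-grams to score
    novel = sum(1 for g in grams if g not in seen)
    return novel / len(grams)


def longest_copied_span(generated: List[str], training: List[str]) -> int:
    """Length in tokens of the longest contiguous generated span found in training.

    Longest-common-substring dynamic programme over tokens; quadratic time,
    so only suitable for small illustrative inputs.
    """
    best = 0
    prev = [0] * (len(training) + 1)
    for g in generated:
        curr = [0] * (len(training) + 1)
        for j, t in enumerate(training, start=1):
            if g == t:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Toy data (assumption): whitespace-tokenized sentences, not real corpora.
    training = "the cat sat on the mat".split()
    generated = "the cat sat on the rug".split()
    for n in (1, 2, 3):
        print(n, novelty_rate(generated, training, n))
    print("longest copied span:", longest_copied_span(generated, training))
```

On a real training corpus the quadratic span search would be too slow; a suffix array or hashed n-gram index would be the usual substitute, though which one the paper uses is not stated here.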

AkihikoWatanabe added translation_required NLP Novelty Evaluation NaturalLanguageGeneration labels Oct 22, 2023
