Learning to Imagine: Visually-Augmented Natural Language Generation, ACL'23 #831

AkihikoWatanabe · 2023-07-15T22:14:41Z

https://virtual2023.aclweb.org/paper_P3022.html

AkihikoWatanabe · 2023-07-22T15:40:10Z

People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual information for composition in the same manner as humans. We propose a method, LIVE, that makes pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input texts. Second, we use CLIP to determine whether the text can evoke the imagination in a posterior way. Finally, our imagination is dynamic, and we conduct synthesis for each sentence rather than generate only one image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain visually-augmented representations for each text. Our vision-text fusion layer is compatible with Transformer-based architecture. We have conducted extensive experiments on four generation tasks using BART and T5, and the automatic results and human evaluation demonstrate the effectiveness of our proposed method. We will release the code, model, and data at the link: https://github.com/RUCAIBox/LIVE.
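The abstract describes the fusion layer only as plug-and-play and compatible with Transformer-based architectures. As a minimal sketch, assuming the fusion is a cross-attention step in which text hidden states attend over image features and the result is added back residually (an assumption, not confirmed by the abstract), the idea might look like the following. The names `fuse`, `text_states`, and `image_states` are hypothetical, and the learned projection matrices a real layer would have are omitted for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(text_states: Matrix, image_states: Matrix) -> Matrix:
    """Hypothetical fusion: each text vector attends over the image vectors.

    Assumes both inputs share the same hidden size. Queries are the text
    vectors; keys and values are the image vectors (no learned projections).
    """
    dim = len(text_states[0])
    scale = math.sqrt(dim)
    fused = []
    for query in text_states:
        # Scaled dot-product attention weights over all image vectors.
        weights = softmax(
            [sum(q * k for q, k in zip(query, key)) / scale for key in image_states]
        )
        # Weighted sum of image vectors: the "visual context" for this token.
        context = [
            sum(w * value[i] for w, value in zip(weights, image_states))
            for i in range(dim)
        ]
        # Residual connection keeps the original text representation intact.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


text = [[1.0, 0.0], [0.0, 1.0]]
visually_augmented = fuse(text, [[2.0, 3.0]])
```

Because the output has the same shape as the text input, a layer of this kind could in principle be dropped between existing Transformer blocks, which is one plausible reading of "plug-and-play".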

Translation (by gpt-3.5-turbo)

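People often imagine relevant scenes to help with the writing process. In this work, we aim to use visual information for composition in the same way humans do. We propose LIVE, a method that makes pre-trained language models (PLMs) learn to imagine for visually-augmented natural language generation. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input texts. Second, we use CLIP to determine, in a posterior way, whether the text can evoke the imagination. Finally, our imagination is dynamic: we synthesize an image for each sentence rather than generating a single image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain a visually-augmented representation for each text. Our vision-text fusion layer is compatible with Transformer-based architectures. We conducted extensive experiments on four generation tasks using BART and T5, and both the automatic results and the human evaluation demonstrate the effectiveness of the proposed method. We will release the code, model, and data at: https://github.com/RUCAIBox/LIVE.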

Summary (by gpt-3.5-turbo)

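This work proposes LIVE, a method for natural language generation that makes use of visual information. LIVE has a pre-trained language model imagine the scene described by the text, using a diffusion model to synthesize high-quality images conditioned on the input. It also uses CLIP to judge whether a text can evoke imagination, and it generates an image for each sentence rather than one per paragraph. Experiments on four generation tasks show that LIVE is effective. The authors state that the code, model, and data will be released.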

AkihikoWatanabe · 2023-10-22T04:12:30Z

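First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input texts. Second, we use CLIP to determine, in a posterior way, whether the text can evoke the imagination. Finally, our imagination is dynamic: we synthesize an image for each sentence rather than generating a single image for an entire paragraph.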

Interesting.
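The three steps quoted above (synthesize an image per sentence, check it with CLIP after the fact, keep it only if the text is "imaginable") can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `synthesize`, `embed_text`, and `embed_image` are hypothetical callables standing in for a diffusion model and CLIP encoders, the cosine-similarity gate and the `threshold` value of 0.3 are illustrative choices, and the sentence splitter is deliberately naive.

```python
import math
import re
from typing import Any, Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def imagine(
    paragraph: str,
    synthesize: Callable[[str], Any],       # hypothetical text-to-image model
    embed_text: Callable[[str], Vector],    # hypothetical CLIP-style text encoder
    embed_image: Callable[[Any], Vector],   # hypothetical CLIP-style image encoder
    threshold: float = 0.3,                 # illustrative value, not from the paper
) -> List[Tuple[str, Optional[Any]]]:
    """Return (sentence, image-or-None) pairs, one per sentence."""
    results = []
    for sentence in split_sentences(paragraph):
        image = synthesize(sentence)  # step 1: one image per sentence
        # Step 2: posterior check, scoring the text against its own image.
        score = cosine(embed_text(sentence), embed_image(image))
        # Step 3: keep the image only when the score passes the gate.
        results.append((sentence, image if score >= threshold else None))
    return results


# Toy stand-ins so the sketch runs without downloading any model.
def toy_synthesize(sentence: str) -> str:
    return "img:" + sentence


def toy_embed_text(sentence: str) -> Vector:
    return [1.0, 0.0]


def toy_embed_image(image: str) -> Vector:
    return [1.0, 0.0] if "dog" in image else [0.0, 1.0]


pairs = imagine(
    "A dog runs on the beach. Therefore the claim holds.",
    toy_synthesize,
    toy_embed_text,
    toy_embed_image,
)
```

With the toy encoders, the concrete first sentence keeps its image while the abstract second sentence is gated out, which mirrors the per-sentence, dynamic behaviour the abstract describes.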

AkihikoWatanabe added the translation_required label Jul 22, 2023

AkihikoWatanabe changed the title ~~Learning to Imagine: Visually-Augmented Natural Language Generation~~ Learning to Imagine: Visually-Augmented Natural Language Generation, ACL'23 Oct 22, 2023

AkihikoWatanabe added ComputerVision NaturalLanguageGeneration NLP MulltiModal DiffusionModel TextToImage labels Oct 22, 2023
