Learning Multimodal Data Augmentation in Feature Space, ICLR'23 #546

AkihikoWatanabe · 2023-04-26T12:02:23Z

AkihikoWatanabe · 2023-04-26T12:05:21Z

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.

Translation (by gpt-3.5-turbo)

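The ability to learn jointly from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems.
Although there has been promising progress in designing neural networks that exploit multimodal data, the enormous success of data augmentation is currently limited to single-modality tasks such as image classification.
Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data. For example, a caption may no longer be a good description of an image after standard augmentations such as translation have been applied.
Moreover, it is difficult to specify reasonable transformations that are not tailored to a particular modality.
This paper introduces LeMDA (Learning Multimodal Data Augmentation), an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationships between them.
The paper shows that LeMDA can (1) substantially improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been considered before, and (3) achieve state-of-the-art results on a wide range of applications involving image, text, and tabular data.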

Summary (by gpt-3.5-turbo)

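The ability to learn jointly from multimodal data is a defining feature of intelligent systems, but the success of data augmentation has been limited to single-modality tasks. This work proposes a method called LeMDA and shows that it can jointly augment multimodal data without constraints on the identities of the modalities or the relationships between them. LeMDA improves the performance of multimodal deep learning and achieves state-of-the-art results across a wide range of applications.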

AkihikoWatanabe · 2023-10-22T06:47:37Z

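Data augmentation is usually confined to a single modality.
In a multimodal setting, when it is unclear how the modalities relate to each other or which transformations should be used, how can the data be augmented while preserving its overall semantic structure? That seems to be the question this paper addresses.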
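
To make "augmenting jointly in feature space" a bit more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not describe the architecture, so the function name `augment_jointly`, the random `weight` matrix, the `scale` factor, and the feature dimensions are all assumptions made for illustration. The only idea it shows is that the perturbation is applied to encoder outputs rather than raw inputs, and that each modality's perturbation depends on the features of every modality. In LeMDA the augmentation is learned; here the weights are random and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def augment_jointly(features, weight, scale=0.1):
    """Perturb all modality features together with one shared map.

    features: dict of modality name -> (batch, dim) array of encoder outputs.
    weight:   (total_dim, total_dim) matrix, a stand-in for a learned network.
    scale:    upper bound on the size of each perturbation entry.
    """
    names = list(features)
    dims = [features[name].shape[1] for name in names]
    # Concatenate along the feature axis so one map sees every modality.
    joint = np.concatenate([features[name] for name in names], axis=1)
    # tanh keeps each perturbation entry within [-scale, scale].
    augmented = joint + scale * np.tanh(joint @ weight)
    # Split back into per-modality blocks so a downstream fusion module
    # receives the same shapes it would get without augmentation.
    boundaries = np.cumsum(dims)[:-1]
    return dict(zip(names, np.split(augmented, boundaries, axis=1)))


# Hypothetical encoder outputs for a batch of 4 examples.
features = {
    "image": rng.normal(size=(4, 8)),
    "text": rng.normal(size=(4, 6)),
    "tabular": rng.normal(size=(4, 3)),
}
total_dim = sum(f.shape[1] for f in features.values())
weight = rng.normal(size=(total_dim, total_dim)) / np.sqrt(total_dim)

augmented = augment_jointly(features, weight)
```

Because the map acts on the concatenated vector, changing the image features also changes how the text and tabular features are perturbed, which is one simple way to avoid choosing a transformation per modality by hand.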

AkihikoWatanabe changed the title ~~Learning Multimodal Data Augmentation in Feature Space~~ Learning Multimodal Data Augmentation in Feature Space, ICLR'23 Oct 22, 2023

AkihikoWatanabe added MachineLearning DataAugmentation MulltiModal labels Oct 22, 2023

AkihikoWatanabe added the translation_required label Oct 22, 2023
