Multi-modal-Deep-Learning

Recent Multi-modal Deep Learning Advances (list of papers and highlights).


Introduction

Prelude

There have been many recent advances in using unified models (e.g., Transformers) to create representations for multiple modalities. Some of these models even enable fusion of multiple modalities, allowing different modalities to help each other. Here, modalities include not only natural language, vision, and speech, but also formal languages (e.g., code), (semi-)structured knowledge (e.g., tables, knowledge graphs, etc.), and biological/chemical compounds (e.g., proteins, molecules, etc.). This is a list of recent important papers in this field. Contributions are welcome.
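To make the idea concrete, here is a minimal, purely illustrative sketch of how a unified model can fuse modalities: modality-specific inputs (e.g., word embeddings and image-patch embeddings) are projected into a shared space, concatenated into one sequence, and processed by a single shared self-attention layer so positions from one modality can attend to the other. All names, dimensions, and weights below are hypothetical and not taken from any specific paper in this list.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding dimension

# Stand-ins for modality-specific embeddings already projected into
# the shared space: 4 text tokens and 9 image patches (e.g., a 3x3 grid).
text_tokens = rng.normal(size=(4, d))
image_patches = rng.normal(size=(9, d))

# Fuse both modalities into a single sequence, as unified models typically do.
x = np.concatenate([text_tokens, image_patches], axis=0)  # shape (13, d)

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One shared single-head self-attention layer: the same weights are applied
# across modalities, so text positions can attend to image patches and
# vice versa -- this cross-modal mixing is what "fusion" refers to.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = softmax(q @ k.T / np.sqrt(d))  # (13, 13) attention over the fused sequence
fused = attn @ v                      # (13, d) modality-mixed representations
```

Real systems stack many such layers, add positional and modality-type embeddings, and train with objectives such as masked modeling or contrastive alignment, but the core mechanism of attending across a fused multi-modal sequence is the same.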

Resources

Natural Language

Vision

Supervised Vision Tasks

Unsupervised Vision Representation Learning

Speech

Unsupervised Speech Representation Learning

Unsupervised Automatic Speech Recognition

Formal Language

Structured Knowledge

Table

Knowledge Graph

Retrieved Paragraphs as Knowledge

Biology and Chemistry

Protein

Molecules

Modality Fusion

Vision and Natural Language
