This is the source code of our TCYB 2020 paper "Unsupervised Visual-textual Correlation Learning with Fine-grained Semantic Alignment". Please cite the following paper if you use our code.
Yuxin Peng, Zhaoda Ye, Jinwei Qi and Yunkan Zhuo, "Unsupervised Visual-textual Correlation Learning with Fine-grained Semantic Alignment", IEEE Transactions on Cybernetics (TCYB), DOI:10.1109/TCYB.2020.3015084, Sep. 2020.
The main code is implemented with PyTorch.
We adopt the object detection model (https://github.com/peteanderson80/bottom-up-attention) and SceneGraphParser (https://github.com/vacancy/SceneGraphParser) to extract the image and text entities. The entity files for Flickr can be found in the ./data folder.
- Based on IoU: use the script in ./caption/IOU.
- Based on a generation model: we adopt OpenNMT (https://github.com/OpenNMT/OpenNMT) for caption generation. The generated captions for Flickr can be found in the ./data folder.
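The IoU-based option pairs entities by bounding-box overlap. A minimal sketch of the idea (function names and the 0.5 threshold are illustrative, not the actual interface of the script in ./caption/IOU):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_regions(boxes_a, boxes_b, threshold=0.5):
    """Greedily pair each box in boxes_a with its best-overlapping box in
    boxes_b, keeping only pairs whose IoU exceeds the threshold."""
    pairs = []
    for i, a in enumerate(boxes_a):
        best_j, best_iou = -1, threshold
        for j, b in enumerate(boxes_b):
            score = iou(a, b)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j >= 0:
            pairs.append((i, best_j))
    return pairs
```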
cd ./Cross-modal/local
Train the model: sh script.sh
Test and obtain the similarity score: python test.py
cd ./Cross-modal/global
Train the model: sh script.sh
Test and obtain the global representation: python test.py
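Given the global representations produced by test.py, cross-modal similarity between an image vector and a text vector is commonly measured by cosine similarity. A pure-Python sketch of that scoring step (illustrative only, not the repo's exact code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```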
Use the script in ./Merge.
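A common way to merge the local and global results is a weighted sum of the two similarity matrices; the sketch below assumes that scheme (the alpha weight and function name are illustrative, not the actual interface of the ./Merge script):

```python
def merge_scores(local_sim, global_sim, alpha=0.5):
    """Fuse two image-by-text similarity matrices (lists of rows) with a
    weighted sum: alpha * local + (1 - alpha) * global."""
    return [[alpha * l + (1 - alpha) * g for l, g in zip(row_l, row_g)]
            for row_l, row_g in zip(local_sim, global_sim)]
```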