Sort-of-CLEVR

TensorFlow implementation of Relation Networks and of Sort-of-CLEVR, a VQA dataset proposed by DeepMind. In addition to the original approach, a few custom questions, both relational and non-relational, are also used.

Description

This project includes a TensorFlow implementation of Relation Networks and a generator for Sort-of-CLEVR, the synthetic VQA dataset proposed in the paper A Simple Neural Network Module for Relational Reasoning. The dataset consists of non-relational and relational questions. For natural language questions, the network is augmented with an LSTM that generates question embeddings. Without the LSTM, questions are hard-coded as one-hot encoded vectors.

Relation Networks

Relational reasoning is an essential component of intelligent systems. To this end, Relation Networks (RNs) are proposed to solve problems that hinge on inherently relational concepts. More specifically, an RN is a composite function:

$$\mathrm{RN}(O) = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j)\right)$$

where o_i and o_j represent individual objects in the set O, while f and g are functions dealing with relational reasoning, implemented as MLPs with parameters φ and θ. Note that the objects mentioned here need not be real objects; they could consist of background, particular physical objects, textures, conjunctions of physical objects, etc. In the implementation, objects are defined by convolutional features. The model architecture proposed to solve Visual Question Answering (VQA) problems is as follows:

In addition to the RN model, a baseline model which consists of convolutional layers followed by MLPs is also provided in this implementation.
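The composite function described above can be illustrated with a minimal, dependency-free sketch. It is not the TensorFlow code in this repository: the helper `make_mlp`, the layer sizes, and the random weights are all assumptions chosen for illustration. Following the paper, g also receives the question vector alongside each object pair.

```python
import random

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Build a one-hidden-layer MLP with random weights (illustrative, untrained)."""
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def mlp(x):
        # Hidden layer with ReLU, followed by a linear output layer.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return mlp

def relation_network(objects, question, g, f):
    """Compute f(sum over all ordered pairs (i, j) of g(o_i, o_j, q))."""
    total = None
    for o_i in objects:
        for o_j in objects:
            pair_out = g(o_i + o_j + question)  # concatenate the pair and the question
            total = pair_out if total is None else [a + b for a, b in zip(total, pair_out)]
    return f(total)

# Hypothetical sizes: 6 objects with 4 features each, a 3-dim question, 5 answers.
rng = random.Random(0)
obj_dim, q_dim, rel_dim, n_answers = 4, 3, 8, 5
g = make_mlp(2 * obj_dim + q_dim, 16, rel_dim, rng)
f = make_mlp(rel_dim, 16, n_answers, rng)

objects = [[rng.random() for _ in range(obj_dim)] for _ in range(6)]
question = [1.0, 0.0, 0.0]
logits = relation_network(objects, question, g, f)
```

Because g is summed over every pair, the output does not depend on the order in which the objects are listed, which is the property that makes the module suitable for set-like inputs such as convolutional feature cells.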

Sort-of-CLEVR

To verify the effectiveness of RNs, a synthesized VQA dataset named Sort-of-CLEVR is proposed in the paper. The dataset consists of paired questions and answers as well as images containing colorful shapes.

Each image has a number of shapes (rectangle or circle) in different colors (red, blue, green, yellow, cyan, or magenta). Each image in the Sort-of-CLEVR dataset contains 6 objects drawn from two unique shapes and six colors, one color per object. For each image we have a set of twenty-three custom-written questions comprising both relational and non-relational questions. Here are some examples of images.
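As a rough sketch of the scene layout described above, the following samples one object per color with a random shape and position. It is not taken from generator.py: the function name `sample_scene`, the default `img_size` of 128, the object radius, and the non-overlap rule are assumptions made for illustration.

```python
import random

COLORS = ["red", "blue", "green", "yellow", "cyan", "magenta"]
SHAPES = ["rectangle", "circle"]

def sample_scene(img_size=128, obj_radius=8, rng=None):
    """Sample six objects, one per color, each with a random shape and center."""
    if rng is None:
        rng = random.Random()
    scene = []
    for color in COLORS:
        while True:
            x = rng.randint(obj_radius, img_size - obj_radius)
            y = rng.randint(obj_radius, img_size - obj_radius)
            # Assumed rule: reject centers that overlap an already placed object.
            if all((x - o["x"]) ** 2 + (y - o["y"]) ** 2 > (2 * obj_radius) ** 2
                   for o in scene):
                break
        scene.append({"color": color, "shape": rng.choice(SHAPES), "x": x, "y": y})
    return scene

scene = sample_scene(rng=random.Random(0))
```

Keeping the scene as a list of plain records makes it straightforward to derive both the rendered image and the ground-truth answers from the same source.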

Questions are separated into relational and non-relational questions. The questions are encoded into vectors in the following three ways, while answers are represented as one-hot vectors.

Approach 1: The questions and answers are generated in natural language and then converted to vectors by assigning a unique integer to each word. A dictionary maps every unique word token to its integer value, so the word tokens in the questions and answers receive consecutive integer values.
Approach 2: In this approach, every question is converted to a vector of length 20 using Doc2Vec.
Approach 3: Lastly, we follow the original approach of encoding questions as binary strings to avoid the effects of language parsing and embedding, while answers are represented as one-hot vectors.
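The word-to-integer mapping of Approach 1 can be sketched as follows. The helper names `tokenize`, `build_vocab`, and `encode`, the lowercase tokenization, and the use of 0 as a padding value are assumptions for illustration, not the code in this repository.

```python
def tokenize(sentence):
    """Lowercase the sentence and split the question mark off as its own token."""
    return sentence.lower().replace("?", " ?").split()

def build_vocab(sentences):
    """Assign consecutive integers, starting at 1, to each unique word token."""
    vocab = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab) + 1  # 0 is reserved for padding (assumption)
    return vocab

def encode(sentence, vocab, max_len):
    """Convert a sentence to a fixed-length list of integer ids, padded with 0."""
    ids = [vocab[token] for token in tokenize(sentence)]
    return ids + [0] * (max_len - len(ids))

questions = ["Is it a circle or a rectangle?", "Is it on the left of the image?"]
vocab = build_vocab(questions)
max_len = max(len(tokenize(q)) for q in questions)
encoded = [encode(q, vocab, max_len) for q in questions]
```

Integer sequences of this kind are the usual input to an embedding layer followed by an LSTM, which matches the "With LSTM" variant described earlier.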

Given a queried color, all the possible questions are as follows.

Non-relational questions

Is it a circle or a rectangle?
Is it closer to the bottom of the image?
Is it on the left of the image?
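The three non-relational questions above depend only on the queried object, which the following sketch makes explicit. The function name, the short question keys, the scene record layout, and the thresholds (image midpoint, with y growing downward) are assumptions rather than the repository's actual rules.

```python
def answer_non_relational(scene, color, question, img_size=128):
    """Answer a question about the single object that has the queried color."""
    target = next(o for o in scene if o["color"] == color)
    if question == "shape":
        return target["shape"]
    if question == "bottom":
        # Image coordinates: larger y means closer to the bottom (assumption).
        return "yes" if target["y"] > img_size / 2 else "no"
    if question == "left":
        return "yes" if target["x"] < img_size / 2 else "no"
    raise ValueError(f"unknown question: {question}")

# A hand-written two-object scene used only for this example.
scene = [
    {"color": "red", "shape": "circle", "x": 20, "y": 100},
    {"color": "blue", "shape": "rectangle", "x": 90, "y": 30},
]
answers = [answer_non_relational(scene, "red", q) for q in ("shape", "bottom", "left")]
```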

Relational questions

The color of the nearest object?
The color of the farthest object?
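Unlike the non-relational case, the two relational questions require comparing the queried object with every other object. A minimal sketch, assuming Euclidean distance between object centers and the same hypothetical scene records as a list of dictionaries:

```python
import math

def nearest_and_farthest(scene, color):
    """Return the colors of the objects nearest to and farthest from the queried one."""
    target = next(o for o in scene if o["color"] == color)
    others = [o for o in scene if o["color"] != color]

    def distance(o):
        # Euclidean distance between centers (assumed metric).
        return math.hypot(o["x"] - target["x"], o["y"] - target["y"])

    return min(others, key=distance)["color"], max(others, key=distance)["color"]

# A hand-written scene used only for this example.
scene = [
    {"color": "red", "shape": "circle", "x": 10, "y": 10},
    {"color": "blue", "shape": "rectangle", "x": 20, "y": 10},
    {"color": "green", "shape": "circle", "x": 100, "y": 100},
]
nearest, farthest = nearest_and_farthest(scene, "red")
```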

File format

Generated files use the HDF5 file format. Each data point contains an image, a one-hot vector q encoding a question, and a one-hot vector a encoding the corresponding answer.
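The q and a vectors of a data point might be built as in the sketch below. The layout is an assumption: q concatenates a one-hot color with a one-hot question index, restricted to the five questions listed above, and a is a one-hot vector over a hypothetical answer vocabulary. The actual vector lengths in the generated files may differ, and HDF5 writing is not shown.

```python
COLORS = ["red", "blue", "green", "yellow", "cyan", "magenta"]
QUESTIONS = ["shape", "bottom", "left", "nearest_color", "farthest_color"]
ANSWERS = COLORS + ["circle", "rectangle", "yes", "no"]

def one_hot(index, length):
    """Return a list of zeros with a single 1 at the given index."""
    vec = [0] * length
    vec[index] = 1
    return vec

def encode_question(color, question):
    """Concatenate a one-hot queried color with a one-hot question index."""
    return (one_hot(COLORS.index(color), len(COLORS))
            + one_hot(QUESTIONS.index(question), len(QUESTIONS)))

def encode_answer(answer):
    """Encode the answer as a one-hot vector over the assumed answer vocabulary."""
    return one_hot(ANSWERS.index(answer), len(ANSWERS))

q = encode_question("blue", "nearest_color")
a = encode_answer("red")
```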

Note that this implementation follows only the main idea of the original paper and differs substantially in implementation details such as model architecture, hyperparameters, and the optimizer used. Likewise, the design of Sort-of-CLEVR follows only the high-level ideas of the dataset proposed in the original paper.

*This code is still being developed and subject to change.

Prerequisites

Usage

Datasets

Generate a default Sort-of-CLEVR dataset:

$ python generator.py

Or generate your own Sort-of-CLEVR dataset by specifying args:

$ python generator.py --train_size 12345 --img_size 256
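For readers unfamiliar with how such flags are typically handled, here is a sketch of parsing `--train_size` and `--img_size` with the standard argparse module. Only the two flag names come from the command above. The default values, the help strings, and the `parse_args` helper are placeholders, and generator.py may define further options.

```python
import argparse

def parse_args(argv=None):
    """Parse the two generator flags shown above (defaults are placeholders)."""
    parser = argparse.ArgumentParser(description="Generate a Sort-of-CLEVR dataset (sketch).")
    # The defaults below are assumptions, not the values used by generator.py.
    parser.add_argument("--train_size", type=int, default=10000,
                        help="number of training examples to generate (assumed meaning)")
    parser.add_argument("--img_size", type=int, default=128,
                        help="image height and width in pixels (assumed meaning)")
    return parser.parse_args(argv)

args = parse_args(["--train_size", "12345", "--img_size", "256"])
```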

*This code is still being developed and is subject to change. The README.md will be edited further to include a few more details.
