Generating scene graphs using Transformers with Knowledge infusion and Residual Connections

Scene Graph Generation is an active research topic that involves representing a visual scene in terms of nodes and edges. Given an image, the objective is to determine the actors or objects present in the image and identify the relationships between them. The nodes in a scene graph are the proposed objects, and the edges correspond to the relationships between the nodes. For instance, given an image containing a car and a person, the model needs to identify whether an action connects the car and the person and, if so, which one. In this project, we extend the Relation Transformer (RelTR) work. We introduce residual connections between modules and infuse prior knowledge about the objects into the system. We hypothesize that, in doing so, the model will be able to identify the objects and their relationships faster and more accurately. We compare the performance of the customized architecture against the baseline RelTR model on the Visual Genome dataset with 5,000 and 7,500 training samples. We provide a detailed discussion of the pros and cons of the proposed model.
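To make the node and edge terminology concrete, here is a minimal sketch of a scene graph as a plain data structure. It is an illustration only and not the representation used in this repository; the class names `Relation` and `SceneGraph` and the example labels (`driving`, `on`, `road`) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One directed edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Nodes are object labels; edges are subject-predicate-object triples."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_relation(self, subject, predicate, obj):
        # Adding an edge also registers both endpoints as nodes.
        self.nodes.update((subject, obj))
        self.edges.append(Relation(subject, predicate, obj))

    def relations_between(self, subject, obj):
        # Edges are directed, so (person, car) and (car, person) differ.
        return [e.predicate for e in self.edges
                if e.subject == subject and e.obj == obj]


graph = SceneGraph()
graph.add_relation("person", "driving", "car")
graph.add_relation("car", "on", "road")

print(graph.relations_between("person", "car"))  # ['driving']
print(graph.relations_between("car", "person"))  # []
```

An empty result for a pair of nodes corresponds to the case where no relationship connects the two objects.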

Dependency Installation

Clone the repo

```
git clone https://github.com/rewanth22/RelTResidual.git
```

For accounts with SSH configured

```
git clone git@github.com:rewanth22/RelTResidual.git
```

Upgrade pip
```
python -m pip install --upgrade pip
```

Create and Activate Virtual Environment (Linux)

```
python3 -m venv [environment-name]
source [environment-name]/bin/activate
```

Install dependencies
```
pip install -r requirement.txt
```

Training/Evaluation on Visual Genome

a) Follow [README](https://github.com/yrcong/RelTR/blob/main/data/README.md) in the data directory to prepare the datasets.

```
# compile the code computing box intersection
cd lib/fpn
sh make.sh
```

Inference

a) Download our RelTR model pretrained on the Visual Genome dataset and put it under

```
ckpt/checkpoint0149.pth
```

b) Infer the relationships in an image with the command:

```
python inference.py --img_path $IMAGE_PATH --resume $MODEL_PATH
```
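To run inference over several images, the command above can be wrapped in a small script. The following is a minimal sketch, assuming it is run from the repository root. The helper names `build_inference_commands` and `run_all` and the extension filter are illustrative and not part of the repository; only the `--img_path` and `--resume` flags come from the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_commands(image_dir, model_path,
                             extensions=(".jpg", ".jpeg", ".png")):
    """Return one inference.py command per image in image_dir, sorted by name."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in extensions)
    return [
        [sys.executable, "inference.py",
         "--img_path", str(img),
         "--resume", str(model_path)]
        for img in images
    ]


def run_all(image_dir, model_path):
    """Run the commands one after another; stop at the first failure."""
    for cmd in build_inference_commands(image_dir, model_path):
        subprocess.run(cmd, check=True)
```

For example, `run_all("my_images", "ckpt/checkpoint0149.pth")` would process every matching image in a hypothetical `my_images` folder with the checkpoint placed as described in step a).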

Training

```
python main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --batch_size 2 --output_dir ckpt
```

Evaluation

```
python main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --eval --batch_size 1 --resume ckpt/checkpoint0149.pth
```

NOTE:

To work with the baseline, you need to swap three files: main.py with main_src.py, transformer.py with transformer_src.py, and reltr.py with reltr_src.py. To return to the custom model, swap them back. This is done to prevent import errors across the different Python scripts.
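The swap can be scripted so that it is easy to repeat and to undo. The following is a minimal sketch and not a script shipped with this repository. The function names are hypothetical, and the `models/` prefix on two of the paths is an assumption about where transformer.py and reltr.py live; adjust the pairs to match your checkout.

```python
import os
import tempfile
from pathlib import Path

# Assumed locations: the models/ prefix is a guess, not stated in this README.
SWAP_PAIRS = [
    ("main.py", "main_src.py"),
    ("models/transformer.py", "models/transformer_src.py"),
    ("models/reltr.py", "models/reltr_src.py"),
]


def swap_files(path_a, path_b):
    """Exchange two files so each path ends up with the other's contents."""
    a, b = Path(path_a), Path(path_b)
    if not a.is_file() or not b.is_file():
        raise FileNotFoundError(f"both files must exist: {a}, {b}")
    # Reserve a temporary name next to the first file for the three-way rename.
    fd, tmp = tempfile.mkstemp(dir=a.parent)
    os.close(fd)
    os.replace(a, tmp)
    os.replace(b, a)
    os.replace(tmp, b)


def swap_all(pairs, root="."):
    """Swap every (active, alternate) pair relative to root."""
    for first, second in pairs:
        swap_files(Path(root) / first, Path(root) / second)
```

Calling `swap_all(SWAP_PAIRS)` from the repository root switches between the two variants, and calling it a second time restores the previous state.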

Authors

Krish Rewanth Sevuga Perumal, Prasannakumaran Dhanasekaran
