This repository contains a Python driver for Visual Genome, a large-scale dataset and knowledge base designed to connect structured image concepts with language. Visual Genome includes 108,077 images annotated with objects, attributes, relationships, and region descriptions.
The main goal of this repository is to offer a simple, easy-to-use interface for:
- Visualizing images and their annotations, including scene graphs.
- Performing exploratory data analysis.
- Applying sentence clustering to categorize the dataset.
This driver enhances functionality from the original Visual Genome Python Driver and incorporates a Graph Visualization tool based on Graphviz.
First, clone the repository to your local machine.

```bash
git clone https://github.com/KarahanS/Visual-Genome-Python-Driver.git
```

As the official API is not supported, you have to download the following data files from the Visual Genome website:
- image_data.json
- objects.json
- region_descriptions.json
- attributes.json
- relationships.json
- attributes_synsets.json
- object_synsets.json
- relationship_synsets.json
- object_alias.txt
- relationship_alias.txt
Place all downloaded files in a directory called `data/` at the root of the repository. In total, they take up ~2.5 GB of disk space. Alternatively, you can run setup.py directly to download the files and place them in the correct directory.
```bash
python setup.py
```

Then, to use the demo.ipynb notebook, you have to create a virtual environment and install the necessary packages. You can use any virtual environment manager you like; one example is given below.
```bash
python -m venv env
source env/bin/activate  # For Linux
env\Scripts\activate     # For Windows
```

Install the required packages using the following command.
```bash
pip install -r vg-requirements.txt
```

As part of our experiments, we also ran SAM, SAM-2, and FC-CLIP on the Visual Genome dataset to collect the number of segmentations (with SAM) and the number of unique classes (with FC-CLIP) in order to explain the perceived complexity. If you wish to obtain these statistics as well, download the necessary JSON files from this link, and then call the `load_sam_results`, `load_sam2_results`, and `load_fc_clip_results` functions of the `VisualGenome` class.
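The exact constructor and arguments are defined in the driver itself; the snippet below is only a rough sketch. The import path, constructor argument, and JSON file names are assumptions for illustration — only the three loader function names come from this repository.

```python
from visual_genome import VisualGenome  # assumed import path; adjust to the repository layout

# Assumed constructor argument -- check the class definition / demo.ipynb for the real signature.
vg = VisualGenome(data_dir="data/")

# Load the statistics collected with SAM, SAM-2, and FC-CLIP (file paths are placeholders).
vg.load_sam_results("data/sam_results.json")          # segmentation counts from SAM
vg.load_sam2_results("data/sam2_results.json")        # segmentation counts from SAM-2
vg.load_fc_clip_results("data/fc_clip_results.json")  # unique class counts from FC-CLIP
```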
Data loading, querying, and visualization are all done through the `VisualGenome` class. To visualize the images and their annotations, have a look at the demo.ipynb notebook; some statistics about the dataset are presented in the analysis.ipynb notebook.
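The notebooks demonstrate the real API; the snippet below is a purely hypothetical illustration of the workflow, where every method name (other than the class name) is a placeholder rather than the driver's actual interface.

```python
from visual_genome import VisualGenome  # assumed import path

vg = VisualGenome(data_dir="data/")  # assumed constructor argument

# Placeholder method names for illustration only -- see demo.ipynb for the real calls.
image = vg.get_image(1)            # hypothetical: fetch metadata for image id 1
vg.visualize_regions(image)        # hypothetical: overlay region descriptions on the image
vg.visualize_scene_graph(image)    # hypothetical: render the scene graph with Graphviz
```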
We also offer a clustering approach for the images in the dataset, based on the K-Means algorithm. This method utilizes sentence transformers to vectorize the region descriptions of the images. The results can be found in the clustering.ipynb notebook.
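The actual pipeline lives in clustering.ipynb; the snippet below is only a minimal sketch of the idea — embedding region descriptions with a sentence transformer and clustering the embeddings with K-Means. The model name, sample descriptions, and number of clusters are arbitrary choices for illustration, and the packages come from clustering-requirements.txt.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Example region descriptions; in practice these come from the Visual Genome annotations.
descriptions = [
    "a man riding a horse on the beach",
    "a clock tower next to a brick building",
    "two dogs playing with a frisbee in the park",
]

# Embed each description into a dense vector (model name is an arbitrary choice here).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(descriptions)

# Cluster the sentence embeddings with K-Means (number of clusters chosen for illustration).
kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto")
labels = kmeans.fit_predict(embeddings)
print(labels)  # cluster id assigned to each description
```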
To run the notebook, install the requirements listed in clustering-requirements.txt, which is a superset of vg-requirements.txt. For reference, it uses PyTorch 2.4.1 for CUDA 12.1; you might need to install a different version of PyTorch for your system. Please refer to the official PyTorch website for more information.
```bash
pip install -r clustering-requirements.txt
```