This repository contains a Python driver for Visual Genome, a large-scale dataset and knowledge base designed to connect structured image concepts with language. Visual Genome includes 108,077 images annotated with objects, attributes, relationships, and region descriptions.
The main goal of this repository is to offer a simple, easy-to-use interface for:
- Visualizing images and their annotations, including scene graphs.
- Performing exploratory data analysis.
- Applying sentence clustering to categorize the dataset.
This driver enhances functionality from the original Visual Genome Python Driver and incorporates a Graph Visualization tool based on Graphviz.
First, clone the repository to your local machine.

```bash
git clone https://github.com/KarahanS/Visual-Genome-Python-Driver.git
```

As the official API is not supported, you have to download the following data files from the Visual Genome website:
- image_data.json
- objects.json
- region_descriptions.json
- attributes.json
- relationships.json
- attributes_synsets.json
- object_synsets.json
- relationship_synsets.json
- object_alias.txt
- relationship_alias.txt
Place all downloaded files in a directory called `data/` at the root of the repository. In total, they take up ~2.5 GB of disk space. Alternatively, you can run setup.py directly to download the files and place them in the correct directory.
```bash
python setup.py
```

Then, to use the demo.ipynb notebook, you have to create a virtual environment and install the necessary packages. You can use any virtual environment manager you like; one example is given below.
```bash
python -m venv env
source env/bin/activate  # For Linux
env\Scripts\activate     # For Windows
```

Install the required packages using the following command.
```bash
pip install -r vg-requirements.txt
```

As part of our experiments, we also ran SAM, SAM-2, and FC-CLIP on the Visual Genome dataset to collect the number of segmentations (with SAM) and the number of unique classes (with FC-CLIP) in order to explain the perceived complexity. If you wish to obtain these statistics as well, download the necessary JSON files from this link, and then call the `load_sam_results`, `load_sam2_results`, and `load_fc_clip_results` functions of the `VisualGenome` class.
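The exact constructor and arguments are defined in the driver itself; the snippet below is only a rough sketch. The import path, constructor argument, and JSON file names are assumptions for illustration — only the three loader function names come from this repository.

```python
from visual_genome import VisualGenome  # assumed import path; adjust to the repository layout

# Assumed constructor argument -- check the class definition / demo.ipynb for the real signature.
vg = VisualGenome(data_dir="data/")

# Load the statistics collected with SAM, SAM-2, and FC-CLIP (file paths are placeholders).
vg.load_sam_results("data/sam_results.json")          # segmentation counts from SAM
vg.load_sam2_results("data/sam2_results.json")        # segmentation counts from SAM-2
vg.load_fc_clip_results("data/fc_clip_results.json")  # unique class counts from FC-CLIP
```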
Data loading, querying, and visualization are all done through the `VisualGenome` class. To visualize the images and their annotations, have a look at the demo.ipynb notebook; some statistics about the dataset are presented in the analysis.ipynb notebook.
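The notebooks demonstrate the real API; the snippet below is a purely hypothetical illustration of the workflow, where every method name (other than the class name) is a placeholder rather than the driver's actual interface.

```python
from visual_genome import VisualGenome  # assumed import path

vg = VisualGenome(data_dir="data/")  # assumed constructor argument

# Placeholder method names for illustration only -- see demo.ipynb for the real calls.
image = vg.get_image(1)            # hypothetical: fetch metadata for image id 1
vg.visualize_regions(image)        # hypothetical: overlay region descriptions on the image
vg.visualize_scene_graph(image)    # hypothetical: render the scene graph with Graphviz
```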
We also offer a clustering approach for the images in the dataset, based on the K-Means algorithm. This method utilizes sentence transformers to vectorize the region descriptions of the images. The results can be found in the clustering.ipynb notebook.
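The actual pipeline lives in clustering.ipynb; the snippet below is only a minimal sketch of the idea — embedding region descriptions with a sentence transformer and clustering the embeddings with K-Means. The model name, sample descriptions, and number of clusters are arbitrary choices for illustration, and the packages come from clustering-requirements.txt.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Example region descriptions; in practice these come from the Visual Genome annotations.
descriptions = [
    "a man riding a horse on the beach",
    "a clock tower next to a brick building",
    "two dogs playing with a frisbee in the park",
]

# Embed each description into a dense vector (model name is an arbitrary choice here).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(descriptions)

# Cluster the sentence embeddings with K-Means (number of clusters chosen for illustration).
kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto")
labels = kmeans.fit_predict(embeddings)
print(labels)  # cluster id assigned to each description
```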
To run the notebook, install the requirements listed in clustering-requirements.txt, which is a superset of vg-requirements.txt. For reference, it uses PyTorch 2.4.1 for CUDA 12.1; you might need to install a different version of PyTorch for your system. Please refer to the official PyTorch website for more information.
```bash
pip install -r clustering-requirements.txt
```