Improving generalization by mimicking the human visual diet

Paper • Overview • Using the Codebase • Accessing the data

This repository contains the official implementation of our paper: Improving generalization by mimicking the human visual diet. Here you can find the code and the data used for this project.

The paper can be accessed here.

Authors

Spandan Madan • Li You • Mengmi Zhang • Hanspeter Pfister • Gabriel Kreiman

Overview

We present a new perspective on bridging the generalization gap between biological and computer vision---mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from limited 3D scenes under diverse real-world transformations with objects in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations such as lighting, viewpoint, and material changes. This improvement also extends to generalizing from synthetic to real-world data---all models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural image data. These experiments are enabled by our two key contributions: a novel dataset capturing scene context and diverse real-world transformations to mimic the human visual diet, and a transformer model tailored to leverage these aspects of the human visual diet.

Codebase

Our work builds on three existing codebases: Openrooms as the base renderer, which we modify to create the HVD dataset; WhenPigsFly for the base architecture, which we modify to create our proposed model, HDNet; and [DomainBed](https://github.com/facebookresearch/DomainBed/tree/main) for domain generalization benchmarks. Each of these codebases was modified significantly from its original version to adapt it for our work. We therefore provide the adapted versions in this repository so that it is standalone.

Code Structure

rendering: Contains all code used for rendering the Human Visual Diet (HVD) dataset.

training_models: Contains all code for training models on the HVD dataset.

dataset: Placeholder directory into which the data must be downloaded. See the Data section below for details.

System Requirements

Analysis was conducted on Harvard's FASRC clusters (https://www.rc.fas.harvard.edu). Machine architectures are as follows:

Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-1062.el7.x86_64
Architecture: x86-64

The code should work out of the box on most Linux distributions, but it was tested only on CentOS Linux 7 (Core); it was not tested on macOS or on any other Linux distribution. Code was run using Python 3.8.5. Exact version numbers for Python packages can be found in the hvd_requirements.txt file. GPUs are needed to accelerate training and inference.

Installation

Clone this GitHub repository.
Install the required Python packages using pip in accordance with the in_dist_requirements.txt file, which lists all package names and versions used in our analysis.
Download the data (not needed to run demos) following the instructions below.
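
After installation, it can be useful to confirm that the installed packages match the pinned versions. The following is a minimal sketch of such a check, not part of this repository: the helper names `parse_pins` and `find_mismatches` are hypothetical, and it assumes the requirements file uses plain `name==version` lines.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form 'name==version'.

    Assumption: the requirements file pins packages with '=='; other
    specifiers and unpinned entries are skipped.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # blank line or entry without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (wanted, installed)} for packages whose installed
    version differs from the pin; installed is None if the package is missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

To use it, read the requirements file into a string, pass it to `parse_pins`, and pass the result to `find_mismatches`; an empty dictionary means every pinned package is installed at the expected version.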

Data

The HVD dataset can be found here: https://drive.google.com/drive/folders/1W0Wxp3DYGjzHNxOUnZI-H9uyR6D9mU83?usp=share_link
