Sample model: Cauthron

A model, selected at random from the dataset of the paper Taskonomy: Disentangling Task Transfer Learning. See the main repository of the Taskonomy dataset for more details about the full data.

This repository contains only a single model, a small fraction of the dataset, as a sample. The full dataset includes more than 4.5 million images from more than 500 buildings in a similar format. Each image has annotations for every one of the 2D, 3D, and semantic tasks in Taskonomy's dictionary (see below).

For more details, please see the CVPR 2018 paper. Please also see the main repository and project website.

Data Statistics

The dataset consists of over 4.6 million images from 537 different buildings. All images are of indoor scenes. Images with people visible were excluded, and the cameras have no roll (pitch and yaw are included). Below are some statistics about the images that make up the dataset.

Image-level statistics

Property	Mean
Camera Pitch	0.24°
Camera Roll	0.0°
Camera Field of View	61.2°
Distance (from camera to scene content)	5.3 m
3D Obliqueness of Scene Content (wrt camera)	52.9°
Points in View (for point correspondences)	(median) 55

Point-level statistics

Property	Mean
Cameras per Point	(median) 5

Camera-level statistics

Property	Mean
Points/Camera	20.8

Model-level Statistics

Property	Mean
Image Count	0.0°
Point Count	-0.77°
Camera Count	75°

Data structure

One model, selected at random from the paper's training set, is shared in this repository. Its folder structure is described below:

class_object/
    Object classification (ImageNet 1000) annotations distilled from ResNet-152
class_scene/
    Scene classification annotations distilled from PlaceNet
depth_euclidean/
    Euclidean distance images.
        Units of 1/512m with a max range of 128m.
depth_zbuffer/
    Z-buffer depth images.
        Units of 1/512m with a max range of 128m.
edge_occlusion/
    Occlusion (3D) edge images.
edge_texture/ 
    2D texture edge images.
keypoints2d/
    2D keypoint heatmaps.
keypoints3d/
    3D keypoint heatmaps.
nonfixated_matches/
    All (point', view') pairs that have line-of-sight to "point" and a view of it within the camera frustum
normal/
    Surface normal images.
        127-centered
point_info/
    Metadata about each (point, view).
    For each image, we keep track of the optical center of the image.
    This is uniquely identified by the pair (point, view).
    Contains annotations for:
        Room layout
        Vanishing point
        Point matching
        Relative camera pose estimation (fixated)
        Egomotion
    And other low-dimensional geometry tasks.
principal_curvature/
    Curvature images.
        Principal curvatures are encoded in the first two channels.
        Zero curvature is encoded as the pixel value 127.
reshading/
    Images of the mesh rendered with new lighting.
rgb/
    RGB images in 512x512 resolution.
rgb_large/
    RGB images in 1024x1024 resolution.
segment_semantic/
    Semantic segmentation annotations distilled from [FCIS](https://arxiv.org/pdf/1611.07709.pdf)
segment_unsup2d/
    Pixel-level unsupervised superpixel annotations based on RGB.
segment_unsup25d/
    Pixel-level unsupervised superpixel annotations based on RGB + Normals + Depth + Curvature.
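
The encodings listed above (depth in units of 1/512m, 127-centered normals, zero curvature at pixel value 127) can be undone with a few lines of NumPy. The following is a minimal sketch rather than the official loading code. It assumes the images have already been read into integer arrays by an image library of your choice. The function names, the optional invalid_value parameter, and the renormalization of normals to unit length are illustrative assumptions, not part of the dataset's documentation.

```python
import numpy as np

DEPTH_UNITS_PER_METER = 512.0  # from the folder description: units of 1/512m
NORMAL_ZERO = 127.0            # from the folder description: 127-centered
CURVATURE_ZERO = 127.0         # from the folder description: zero curvature is 127


def depth_to_meters(raw, invalid_value=None):
    """Convert raw depth pixel values (units of 1/512m) to meters.

    invalid_value is a hypothetical parameter: if given, pixels equal to
    it are treated as missing and returned as NaN.
    """
    raw = np.asarray(raw)
    depth = raw.astype(np.float64) / DEPTH_UNITS_PER_METER
    if invalid_value is not None:
        depth[raw == invalid_value] = np.nan
    return depth


def normals_to_vectors(raw):
    """Map 127-centered normal pixels of shape (..., 3) to unit vectors.

    Assumption: only the direction matters, so each vector is
    renormalized after subtracting the center value.
    """
    centered = np.asarray(raw, dtype=np.float64) - NORMAL_ZERO
    length = np.linalg.norm(centered, axis=-1, keepdims=True)
    return centered / np.maximum(length, 1e-8)  # guard against zero length


def curvature_offsets(raw):
    """Return the first two channels shifted so zero curvature maps to 0.

    The scale of the encoded values is not stated in the description,
    so the result stays in raw pixel units.
    """
    return np.asarray(raw, dtype=np.float64)[..., :2] - CURVATURE_ZERO


if __name__ == "__main__":
    depth = depth_to_meters(np.array([[512, 1024]], dtype=np.uint16))
    print(depth)  # 512 raw units is 1 m, 1024 raw units is 2 m
    normal = normals_to_vectors(np.array([[[254, 127, 127]]], dtype=np.uint8))
    print(normal)  # a vector along the first axis
```

Keeping the curvature in raw pixel units avoids guessing a scale factor that the folder description does not give. If a particular pixel value marks missing depth in your copy of the data, pass it as invalid_value so that those pixels do not appear as valid distances.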

Citation

If you find the code, data, or the models useful, please cite this paper:

@inproceedings{zamir2018taskonomy,
  title={Taskonomy: Disentangling Task Transfer Learning},
  author={Zamir, Amir R and Sax, Alexander and Shen, William B and Guibas, Leonidas and Malik, Jitendra and Savarese, Silvio},
  booktitle={2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018},
  organization={IEEE}
}
