Challenge dataset format
Currently, the challenge data is distributed in the form of gzipped tarfiles, one per branch. For example, one tarfile is
control-ground-constant.tar.gz. If you extract its contents using
tar xvfz control-ground-constant.tar.gz
then it will create a directory structure,
containing a large number of files. Note that each tarball will result in a new directory of the form
The following types of files are included:
The FITS images containing the 100x100 grids of galaxies are called image-###-#.fits. The 3-digit number is the subfield index, which ranges from 0-199. The 1-digit number is the epoch index. For single-epoch branches (belonging to the "control", "real galaxy", and "variable PSF" experiments), there is just a single epoch with index 0. For multi-epoch branches, there are 6 epochs with indices from 0-5.
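As a minimal sketch of the naming convention above, the following helper builds a galaxy image filename from a subfield index and an epoch index. The function name `image_filename` is hypothetical and is not part of the challenge software; only the `image-###-#.fits` pattern and the index ranges come from the description above.

```python
def image_filename(subfield, epoch=0):
    """Build a galaxy image filename of the form image-###-#.fits.

    Hypothetical helper for illustration; not part of the challenge software.
    """
    if not 0 <= subfield <= 199:
        raise ValueError("subfield index must be in the range 0-199")
    if not 0 <= epoch <= 5:
        raise ValueError("epoch index must be in the range 0-5")
    # Zero-pad the subfield index to 3 digits; the epoch index is a single digit.
    return f"image-{subfield:03d}-{epoch:d}.fits"


# Single-epoch branch: only epoch 0 exists.
single_epoch_name = image_filename(7)

# Multi-epoch branch: one file per epoch, with indices 0 through 5.
multi_epoch_names = [image_filename(7, epoch) for epoch in range(6)]
```

For a single-epoch branch only the epoch-0 file exists, so the default `epoch=0` covers that case.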
The grids correspond to 10x10 degree regions. The pixel scale depends on the branch: for ground-based images, the pixel scale is 0.2 arcsec, whereas for space-based images, the pixel scale is 0.05 arcsec (single-epoch branches) or 0.10 arcsec (multi-epoch branches). For all but single-epoch space simulations, the galaxies lie at the center of 48x48 pixel postage stamps, modulo sub-pixel offsets to be determined by the participants; for single-epoch space simulations, due to the small pixel scales, we use 96x96 postage stamps. Note that the postage stamp sizes can be deduced automatically from the catalogs described below, since the galaxy (x, y) positions are always at the centers of postage stamps.
Important reminder: if using NumPy to extract the postage stamps from the 100x100 grids, please remember that its indexing convention is [y, x] instead of the usual (x, y).
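The two points above (deducing the stamp size from the catalog positions, and NumPy's [y, x] indexing) can be sketched as follows. This is an illustration under stated assumptions, not the challenge's own code: the function names are hypothetical, the image and catalog are small synthetic stand-ins rather than real challenge files, and the zero-indexed catalog position is assumed to sit `size/2 - 1` pixels from the first pixel of its stamp (for example, 23 for the first 48-pixel stamp).

```python
import numpy as np


def stamp_size_from_positions(positions):
    """Infer the stamp size in pixels as the smallest spacing between
    distinct catalog positions along one axis."""
    unique_positions = np.unique(np.asarray(positions))
    return int(np.diff(unique_positions).min())


def extract_stamp(image, x, y, size):
    """Cut out the size x size stamp whose zero-indexed catalog position is (x, y).

    Assumption: the catalog position is size/2 - 1 pixels from the stamp's
    first pixel, e.g. 23 for the first 48-pixel stamp.
    """
    half = size // 2
    x0 = int(x) - half + 1
    y0 = int(y) - half + 1
    # NumPy indexes arrays as [y, x]: rows (y) first, then columns (x).
    return image[y0:y0 + size, x0:x0 + size]


# Synthetic stand-in for a galaxy image: a 2-row by 3-column grid of 48x48
# stamps, each filled with the constant value 10 * row + col.
size = 48
n_rows, n_cols = 2, 3
image = np.zeros((n_rows * size, n_cols * size))
for row in range(n_rows):
    for col in range(n_cols):
        image[row * size:(row + 1) * size, col * size:(col + 1) * size] = 10 * row + col

# Synthetic catalog positions matching that grid.
x_catalog = np.array([23, 71, 119, 23, 71, 119])
y_catalog = np.array([23, 23, 23, 71, 71, 71])

inferred_size = stamp_size_from_positions(x_catalog)
stamp = extract_stamp(image, x=71, y=23, size=inferred_size)
```

Swapping the two slices in `extract_stamp` would silently return the wrong stamp on a square image, which is why the [y, x] order deserves a comment wherever stamps are cut out.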
The FITS images containing the star fields are called starfield_image-###-#.fits. The meaning of the 3- and 1-digit numbers in the filename is the same as for the galaxy image files. The format of the star fields depends on the branch:
For branches with constant PSFs, there are 9 PSF images in the star field. The lower-left one is perfectly centered within its 48x48 postage stamp (i.e., centered on a corner where 4 pixels meet), and the other 8 are offset. All 9 are images of the same underlying PSF, and no pixel noise is added. The pixel scale is the same as for the galaxy images. Note a change: in the originally released multi-epoch simulations, the offset stars had different offsets in each epoch. In the new version as of mid-December, this is no longer true: the pattern of sub-pixel shifts between stars in a star field is the same in each epoch.
For branches with variable PSFs, there are a large number of stars in the star field. For convenience, we have distributed them as postage stamps on grids; however, the observed positions in the images do not correspond to the true (random) positions, which are given in the catalog. Noise is included in the images, so the stars have an observationally motivated S/N distribution from 25 to 400.
By construction the PSF is the same for subfields in a field for any given epoch, but we distribute star fields for all subfields despite the fact that they are identical. For variable PSF branches, stars are located at different locations in the starfields for different subfields, so they can be used together to better reconstruct the spatial variation of the PSF.
Galaxy catalogs are stored as FITS and ASCII tables, in files called galaxy_catalog-###.fits and galaxy_catalog-###.txt. The meaning of the 3-digit number in the filename is the same as for the galaxy image files. These files have 3 entries per galaxy for the case of constant PSF, or 9 for the case of variable PSF. We first describe the 3 entries that are in all galaxy catalogs:
- x position in the image, in units of pixels, zero-indexed (contrary to the FITS and Fortran conventions, but consistent with C/C++ and Python). It does not include information about sub-pixel centroid offsets. For example, "23" means the galaxy is at the center of the first postage stamp, which has 48 pixels on a side, so in a 1-indexed convention the centroid would be at the edge between pixels 24 and 25 along the x direction.
- y position in the image, in units of pixels, zero-indexed. It does not include information about sub-pixel centroid offsets.
- Object ID: a 9-digit identifier for this object. This number encodes information about where the galaxy lies in the subfield and in the field containing that subfield.
Galaxy catalogs for variable PSF branches have additional information relating to how the galaxy lies with respect to the tiles on which the PSF is defined. See the handbook for more information, but the basic picture is that there are multiple tiles within an image on which the PSF is defined, so that participants can define a per-tile PSF. We provide information regarding the tile on which the object is located and where on the tile, so that it's easy to map out the galaxy positions with respect to the grid of PSF tiles:
- x_tile_index, y_tile_index: The (x, y) indices of the tile on which the object is located, zero-indexed.
- tile_x_pos_deg, tile_y_pos_deg: x and y position on that galaxy's tile, in degrees.
- x_field_true_deg, y_field_true_deg: x and y location of the galaxy within the larger field that contains 20 subfields, in degrees.
Star catalogs are stored as FITS and ASCII tables, in files called star_catalog-###-#.fits and star_catalog-###-#.txt. For constant PSF branches, these files have 2 entries per star: the x and y positions in the image in units of pixels, zero-indexed (contrary to the FITS and Fortran conventions, but consistent with C/C++ and Python). The first star in the catalog is precisely centered within a postage stamp, meaning the centroid is at a corner where 4 pixels meet since the stamps have an even number of pixels; the others have random sub-pixel offsets.
Star catalogs for variable PSF branches have additional information relating to how the star lies with respect to the tiles on which the PSF is defined. This is particularly important since the gridded star positions bear no relation to the simulated star position within the field. In addition to the x and y positions of the postage stamp, the star catalogs for variable PSF branches also include the following information:
- x_tile_index, y_tile_index: x and y indices of the PSF tile on which the star is truly located (zero-indexed). This is an indication of the true star position, which is unrelated to its position in the gridded images given in the (x, y) fields.
- tile_x_pos_deg, tile_y_pos_deg: x and y positions on that star's tile, in degrees.
- x_field_true_deg, y_field_true_deg: x and y location of the star within the larger field that contains 20 subfields, in degrees.
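Since the PSF is defined per tile, a natural first step is to group catalog rows by their tile indices. The sketch below does this for a small synthetic star catalog held in a NumPy structured array. The function name `group_by_tile` and the sample values are made up for illustration; only the column names `x_tile_index`, `y_tile_index`, `tile_x_pos_deg`, and `tile_y_pos_deg` come from the description above. The same approach should apply to any table object that supports column access by name, including the galaxy catalogs for variable PSF branches.

```python
from collections import defaultdict

import numpy as np


def group_by_tile(catalog):
    """Map each (x_tile_index, y_tile_index) pair to the list of row
    indices of the catalog entries located on that tile."""
    groups = defaultdict(list)
    tile_indices = zip(catalog["x_tile_index"], catalog["y_tile_index"])
    for row, (ix, iy) in enumerate(tile_indices):
        groups[(int(ix), int(iy))].append(row)
    return dict(groups)


# Synthetic stand-in for a variable-PSF star catalog (values are made up).
star_catalog = np.array(
    [
        (0, 0, 0.10, 0.20),
        (1, 0, 0.30, 0.05),
        (0, 0, 0.40, 0.45),
        (1, 1, 0.25, 0.15),
    ],
    dtype=[
        ("x_tile_index", "i4"),
        ("y_tile_index", "i4"),
        ("tile_x_pos_deg", "f8"),
        ("tile_y_pos_deg", "f8"),
    ],
)

stars_by_tile = group_by_tile(star_catalog)

# Rows of the stars on tile (0, 0), e.g. as input to a per-tile PSF model.
stars_on_first_tile = star_catalog[stars_by_tile[(0, 0)]]
```

Grouping by the tile indices rather than by the gridded (x, y) positions matters here, because the gridded positions carry no information about where a star truly lies in the field.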
Information about the offset of each subfield from the coordinate system defined by the field is given in files called subfield_offset-###.txt and subfield_offset-###.yaml (the same information in two different formats). These contain two numbers: the x and y offset of that subfield in units of degrees. By definition they are (0, 0) for the first subfield in the field.
For the multi-epoch imaging, we provide information about the x and y dithers between different epochs, unlike in reality, where observers must determine this empirically (which is its own separate, complex software problem). In our simulations, these dithers are sub-pixel distances given in files called epoch_dither-###-#.txt and epoch_dither-###-#.yaml (the same information in two different formats). These contain two numbers: the x and y dither of that epoch relative to an underlying common reference origin, in units of pixels. The sign convention matches the shift of apparent galaxy positions between epochs: if epoch 0 has x_dither_0 and epoch 1 has x_dither_1 > x_dither_0, then the images in epoch 1 will all appear shifted by an amount x_dither_1 - x_dither_0 along the FITS positive x direction relative to the images in epoch 0. By definition the dithers are (0, 0) for single-epoch simulations.
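The sign convention can be checked with a short calculation. The sketch below uses made-up dither values for the 6 epochs of one subfield (it does not read real epoch_dither files, whose exact layout is not assumed here) and computes the apparent shift of each epoch relative to epoch 0. The function name `shift_between` is hypothetical.

```python
import numpy as np


def shift_between(dither_from, dither_to):
    """Apparent (x, y) shift, in pixels, of images in the epoch with dither
    `dither_to` relative to images in the epoch with dither `dither_from`."""
    return np.asarray(dither_to, dtype=float) - np.asarray(dither_from, dtype=float)


# Made-up (x_dither, y_dither) values in pixels for epochs 0 through 5.
epoch_dithers = np.array(
    [
        [0.10, -0.20],
        [0.35, 0.05],
        [-0.15, 0.30],
        [0.00, 0.00],
        [0.45, -0.40],
        [-0.30, 0.10],
    ]
)

# Shift of every epoch relative to epoch 0; row 0 is (0, 0) by construction.
shifts_relative_to_epoch0 = epoch_dithers - epoch_dithers[0]

# Epoch 1 has a larger x dither than epoch 0, so its images appear shifted
# along the positive x direction relative to those in epoch 0.
shift_0_to_1 = shift_between(epoch_dithers[0], epoch_dithers[1])
```

A positive entry in `shift_0_to_1` means the later epoch's images appear displaced in the positive direction along that axis, matching the convention described above.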
Deep field information
There are also files for the 5 "deep" subfields, which extend 1.5 magnitudes deeper. These can be used to infer something about the galaxy p(e) or the distributions of other properties, but shear results for them should not be submitted. These files have the same format and naming convention as those for regular subfields, except that their filenames all start with "deep".