Gaia2 Star Extractor

Jussi Saarivirta edited this page Aug 1, 2021 · 6 revisions

The Gaia2 Star Extractor is a simple program that processes the Gaia2 dataset and extracts the bare minimum data out of it (RA, Dec coordinates + magnitude per star) for further processing into quads by the Gaia2 Quad Database Creator.

The program's input is the Gaia Data Release 2. The raw data files are available at http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source/csv/ and add up to a huge dataset: 547 GiB (587 761 693 368 bytes), so downloading all of it takes time, patience and disk space. If you're on Linux, you can download the entire directory with this command:

wget -r -np -nH --cut-dirs 4 http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source/csv/

To run the program, you only need to give it a few parameters. Example:

WatneyAstrometry.GaiaStarExtractor.exe --max-magnitude 18.0 --out z:\gaia2stars --files z:\gaia2db --threads 10
  • --max-magnitude: the maximum magnitude to include in the output
  • --out: the output directory
  • --files: the directory where the dataset *.csv.gz files are
  • --threads: how many concurrent threads to use
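
To illustrate how the `--files`, `--max-magnitude` and `--threads` parameters relate to each other, here is a minimal Python sketch. It is not the extractor's actual code. It assumes the magnitude filter is inclusive and is applied to the `phot_g_mean_mag` column of the Gaia DR2 CSV files; the function names `count_bright_stars` and `process_directory` are hypothetical, and the sample files are made up for the demonstration.

```python
import csv
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_bright_stars(path, max_magnitude):
    """Count rows in one .csv.gz file at or below max_magnitude (assumed inclusive)."""
    kept = 0
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text):
            mag = row["phot_g_mean_mag"]  # assumed magnitude column
            if mag and float(mag) <= max_magnitude:
                kept += 1
    return kept


def process_directory(files_dir, max_magnitude, threads):
    """Process every *.csv.gz file in files_dir with a pool of worker threads."""
    paths = sorted(Path(files_dir).glob("*.csv.gz"))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        counts = pool.map(lambda p: count_bright_stars(p, max_magnitude), paths)
        return dict(zip((p.name for p in paths), counts))


def write_sample(directory, name, magnitudes):
    """Write a tiny made-up .csv.gz file with the three columns used here."""
    lines = ["ra,dec,phot_g_mean_mag"] + [f"1.0,2.0,{m}" for m in magnitudes]
    with gzip.open(Path(directory) / name, "wt", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    write_sample(tmp, "a.csv.gz", [12.0, 19.5])
    write_sample(tmp, "b.csv.gz", [17.9, 18.0, 18.1])
    result = process_directory(tmp, 18.0, threads=2)

print(result)  # {'a.csv.gz': 1, 'b.csv.gz': 2}
```

Each worker handles one file at a time, so the thread count caps how many files are read and decompressed concurrently.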

The resulting output files are much smaller: with max-magnitude at 18, the output is just 6 GB. The job takes a while because the files are GZip compressed and are decompressed on the fly. There's a lot of IO involved, so good read speeds on your drive will speed things up. Extracting all the files first and then reading the raw CSVs might have been faster, but the disk space that requires is so immense, and the time needed to run this operation so short, that it really isn't worth it. The last time I ran the star extractor, it took ~2.5 hours to extract ~301 million stars on a modern PC.
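
The on-the-fly decompression described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the extractor's implementation: the column names `ra`, `dec` and `phot_g_mean_mag` come from the Gaia DR2 source table, but treating `phot_g_mean_mag` as the filtered magnitude and the cutoff as inclusive are assumptions, and `extract_stars` and `build_sample` are hypothetical names.

```python
import csv
import gzip
import io


def extract_stars(gz_stream, max_magnitude):
    """Yield (ra, dec, magnitude) for each row at or below max_magnitude."""
    # Decompress and decode lazily, so the uncompressed CSV never
    # has to exist on disk or fit in memory as a whole.
    with gzip.open(gz_stream, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text):
            mag = row["phot_g_mean_mag"]  # assumed magnitude column
            if not mag:
                continue  # skip rows with no magnitude value
            mag = float(mag)
            if mag <= max_magnitude:  # assumed inclusive cutoff
                yield float(row["ra"]), float(row["dec"]), mag


def build_sample():
    """Build a tiny made-up gzip-compressed CSV in memory."""
    rows = (
        "ra,dec,phot_g_mean_mag\n"
        "10.5,-20.25,12.3\n"
        "11.0,5.0,19.1\n"
        "12.0,6.0,\n"
    )
    return io.BytesIO(gzip.compress(rows.encode("utf-8")))


stars = list(extract_stars(build_sample(), 18.0))
print(stars)  # [(10.5, -20.25, 12.3)]
```

Only the three values kept per star leave the loop, which is why the output is so much smaller than the input.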

The extracted stars are organized into 406 files (roughly equal-area cells), ready for quad formation, which uses the same cell division. These files are the input that the Gaia2 Quad Database Creator needs to form the quad database.