FraunhoferChalmersCentre/ska-sdc-2

Description

This is the repository for the pipeline of team FORSKA-Sweden, which participated in SKA Science Data Challenge 2 (SDC2). The challenge was to detect galaxies in an HI data cube and to estimate their characteristics, such as line flux integral and major axis. The production-ready pipeline consists of the following components:

  1. Traverse the HI cube with a segmentation model (U-Net) to produce a binary mask. The purpose of the mask is to separate voxels belonging to a galaxy from the background.
  2. Refine the mask and compute a list of separated mask objects. This step is mainly performed by the SoFiA software, which is integrated into the pipeline.
  3. Given the objects and corresponding masks, estimate the characterization properties evaluated in the challenge to produce the full catalogue.
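As an illustration of steps 1 and 2, the sketch below thresholds per-voxel scores into a binary mask and groups the mask voxels into separate objects. It is a simplified stand-in, not the repository's code: in the actual pipeline the scores come from the U-Net and the object extraction is done mainly by SoFiA. The function names, the 0.5 threshold and the 6-connectivity rule are assumptions made for this example.

```python
from collections import deque


def binarize(scores, threshold=0.5):
    """Return the set of (z, y, x) indices whose score exceeds the threshold.

    `scores` is a nested list indexed as scores[z][y][x]. The threshold
    value is an assumption chosen for illustration.
    """
    return {
        (z, y, x)
        for z, plane in enumerate(scores)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value > threshold
    }


def label_objects(mask):
    """Group mask voxels into 6-connected objects with a breadth-first search."""
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    unvisited = set(mask)
    objects = []
    while unvisited:
        seed = unvisited.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in neighbours:
                candidate = (z + dz, y + dy, x + dx)
                if candidate in unvisited:
                    unvisited.remove(candidate)
                    component.add(candidate)
                    queue.append(candidate)
        objects.append(component)
    return objects


# Toy 2 x 2 x 4 cube of per-voxel scores (made-up numbers).
scores = [
    [[0.9, 0.8, 0.1, 0.0],
     [0.1, 0.2, 0.1, 0.7]],
    [[0.7, 0.1, 0.0, 0.1],
     [0.0, 0.1, 0.2, 0.9]],
]
mask = binarize(scores)
objects = label_objects(mask)
print(len(objects), sorted(len(o) for o in objects))
```

Each resulting object is a set of voxel indices, which is the kind of per-object mask that step 3 would take as input when estimating source properties.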

This implementation is consistent with the paper "Utilization of convolutional neural networks for HI source finding" (Håkansson et al., in preparation).

The code in this repository is mainly aimed at scientists doing research on source-finding methods. It is not intended as a ready-to-use source-finding tool.

Code overview

All code for the pipeline is in the pipeline module, which in turn has six sub-modules with the following responsibilities:

  • common: Miscellaneous tasks used in different parts of the pipeline, currently all related to file reading and writing.
  • data: Generation of target masks and creation of data sets readable by PyTorch-based components.
  • hyperparameter: Utilities for hyperparameter optimization, given a trained segmentation model.
  • segmentation: The segmentation model, based on PyTorch, together with tools for training and validation.
  • traversing: Efficient traversal of a full FITS file, given a segmentation model and hyperparameters.
  • downstream: Refinement of a binary mask, object extraction and estimation of characteristics. This is mainly done by calling SoFiA.
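To make the idea behind the traversing sub-module more concrete, here is a minimal sketch of how a large cube could be covered by fixed-size, overlapping sub-cubes that are passed to the segmentation model one at a time. This is an assumption about the general technique, not a description of the repository's implementation: the function name window_starts, the cube shape, the window size and the overlap are all hypothetical.

```python
from itertools import product


def window_starts(axis_length, window, overlap):
    """Start indices of windows of size `window` that cover one axis.

    Consecutive windows share `overlap` elements. A final window is
    aligned to the end of the axis so that no elements are left uncovered.
    """
    if window >= axis_length:
        return [0]
    stride = window - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than window")
    starts = list(range(0, axis_length - window + 1, stride))
    if starts[-1] + window < axis_length:
        starts.append(axis_length - window)
    return starts


# Hypothetical cube shape (frequency, dec, ra) and window settings.
cube_shape = (11, 10, 10)
window = 4
overlap = 1

per_axis = [window_starts(length, window, overlap) for length in cube_shape]
corners = list(product(*per_axis))  # one (z, y, x) corner per sub-cube

print(per_axis)      # [[0, 3, 6, 7], [0, 3, 6], [0, 3, 6]]
print(len(corners))  # 36
```

The overlap lets predictions near sub-cube borders be discarded or averaged, at the cost of processing some voxels more than once.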

Requirements

Linux is the only OS that has been used in development. A CUDA-compatible GPU is highly recommended to accelerate the training and traversing tasks.

Installation

If you are using the Swiss supercomputer Piz Daint (https://user.cscs.ch/), simply run source environment/daint.bash.

Otherwise, start with an empty environment with Python 3.8 and run pip install -r environment/requirements.txt

You also need to install the correct version of the CUDA toolkit for your hardware. Find out more here: https://pytorch.org/get-started/previous-versions/

SoFiA 1.3.2 also needs to be installed. See https://github.com/SoFiA-Admin/SoFiA for installation instructions.

Instructions

A typical use of the code may look like this:

  1. Download data files: python download_data.py --type dev_s. Available data types are dev_s, dev_l & eval.
  2. Create data set files: python create_dataset.py. This creates the data set files used in model fitting.
  3. Train the model: python model_fitting.py. The models with the best validation performance are saved to /saved_models/
  4. Select the saved model to use by changing traversing -> checkpoint in config.yaml
  5. Create data for hyperparameter optimization: python save_hparam_dataset.py
  6. Run hyperparameter optimization: python hyperparameter_search.py
  7. Currently, hyperparameters need to be set manually. Hyperparameters and their corresponding scores are logged to Tensorboard in the hparam_logs directory.
  8. Traverse an HI cube (FITS file) to produce a full catalogue: python traverse_cube.py. The argument --n-parallel can be used to split the work into n-parallel separate and independent jobs. In that case, also pass --i-job, with 0 <= i-job < n-parallel, to specify the index of the job to start.
  9. If traversing was split into several jobs, there will be a separate catalogue for each job. To merge these into one single file: python merge_catalogues.py
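The relationship between --n-parallel and --i-job in step 8 can be illustrated with a small sketch that divides a list of work items (for example sub-cubes) into independent jobs. How traverse_cube.py actually partitions the cube is not described here, so the contiguous partitioning below and the helper name job_slice are assumptions made for illustration only.

```python
def job_slice(n_items, n_parallel, i_job):
    """Return the range of work-item indices handled by job `i_job`.

    Items are split into `n_parallel` contiguous, near-equal parts.
    Valid job indices satisfy 0 <= i_job < n_parallel.
    """
    if n_parallel < 1:
        raise ValueError("n_parallel must be at least 1")
    if not 0 <= i_job < n_parallel:
        raise ValueError("i_job must satisfy 0 <= i_job < n_parallel")
    base, extra = divmod(n_items, n_parallel)
    # The first `extra` jobs each take one additional item.
    start = i_job * base + min(i_job, extra)
    stop = start + base + (1 if i_job < extra else 0)
    return range(start, stop)


# Ten hypothetical work items split over three jobs.
parts = [list(job_slice(10, 3, i)) for i in range(3)]
print(parts)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because every item belongs to exactly one job, the per-job catalogues do not overlap in this scheme and can be combined afterwards, which is the role merge_catalogues.py plays in step 9.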

Useful links:

Web page for SDC2: https://sdc2.astronomers.skatelescope.org/

Discussion and support forum: https://sdc2-discussion.astronomers.skatelescope.org/
