This repository shares the ICRAR team's machine learning solution to the SKA Science Data Challenge 1 (SDC1). The solution earned the team second place in the challenge. The data preparation pipeline proceeds as follows:
- Convert the raw catalogues to CSV files
- Split the entire image into a set I of small (205 by 205 pixel) cutouts (see the cutout sketch after this list)
- Spatially index each image cutout, and manage all indexes in the PostgreSQL database D
- Go through each "ground-truth" source S in the CSV catalogue
  - Find the cutout C that contains S using its index in D
  - Calculate the background noise level rms of S (see the noise-filtering sketch after this list)
  - Check whether the flux of S is greater than k sigma above rms, where k ranges from 0.5 to 3
    - If so, keep S in the training catalogue T
    - Else, discard S
- Go through each valid source V in T
  - Calculate the pixel coordinates of its bounding box B from the sky coordinates encoded in the catalogue (see the bounding-box sketch after this list)
  - Obtain the class label CL for V
  - Assemble B and CL, together with some other identifiers (e.g. the source id), into a valid source record R
- Create the final JSON file J that contains
  - the names of all cutout images, each of which has at least one valid source
  - a set of valid source records (many Rs)
- Pass both I and J on to the machine learning pipeline described in the next section
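The cutout step can be illustrated with astropy's `Cutout2D`. This is a minimal sketch only: the input file name, the output naming scheme, and the choice to write FITS cutouts (before any conversion to PNG) are assumptions for illustration, not the repository's exact code.

```python
# Minimal sketch (not the repository's actual script): tile a large SKA image
# into 205x205-pixel cutouts with astropy. File names are illustrative only.
import numpy as np
from astropy.io import fits
from astropy.nddata import Cutout2D
from astropy.wcs import WCS

CUTOUT_SIZE = 205  # pixels, as described above

def split_into_cutouts(fits_path, out_prefix="cutout"):
    with fits.open(fits_path) as hdul:
        data = np.squeeze(hdul[0].data)      # drop degenerate freq/Stokes axes
        wcs = WCS(hdul[0].header).celestial  # keep only the two sky axes
    ny, nx = data.shape
    for j, y0 in enumerate(range(0, ny, CUTOUT_SIZE)):
        for i, x0 in enumerate(range(0, nx, CUTOUT_SIZE)):
            centre = (x0 + CUTOUT_SIZE // 2, y0 + CUTOUT_SIZE // 2)
            cut = Cutout2D(data, position=centre, size=CUTOUT_SIZE,
                           wcs=wcs, mode="partial", fill_value=np.nan)
            hdu = fits.PrimaryHDU(cut.data, header=cut.wcs.to_header())
            hdu.writeto(f"{out_prefix}_{j:03d}_{i:03d}.fits", overwrite=True)

# Example call (hypothetical file name):
# split_into_cutouts("SKAMid_B1_1000h_v3.fits", out_prefix="SKAMid_B1_1000h_v3_cutout")
```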
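The noise-filtering step can be sketched with astropy's sigma-clipped statistics. The helper names below (`keep_source`, `find_cutout`) and the catalogue column `"flux"` are hypothetical, and the repository's own rms estimate may differ.

```python
# Minimal sketch of the source-selection step: estimate the background rms of
# the cutout containing a source and keep the source only if its flux exceeds
# k * rms. Helper and column names are illustrative assumptions.
from astropy.stats import sigma_clipped_stats

def keep_source(source_flux, cutout_data, k=1.0):
    """Return True if the source flux is at least k sigma above the cutout rms."""
    # Robust background statistics over the cutout pixels (3-sigma clipping).
    mean, median, rms = sigma_clipped_stats(cutout_data, sigma=3.0)
    return source_flux > k * rms

# Building the training catalogue T from the ground-truth catalogue
# (find_cutout is a hypothetical lookup against the spatial index in D):
# T = [s for s in catalogue if keep_source(s["flux"], find_cutout(s), k=1.0)]
```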
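The bounding-box and JSON assembly steps might look roughly like the following. The `instances_*.json` naming in the next section suggests a COCO-style layout, but the field names, the bbox convention, and the way source sizes map to box extents are illustrative assumptions here, not the repository's exact schema.

```python
# Minimal sketch of turning a valid source V into a record R and writing the
# final JSON file J. Field names and the size-to-box mapping are assumptions.
import json
from astropy.io import fits
from astropy.wcs import WCS
from astropy.wcs.utils import proj_plane_pixel_scales

def bounding_box(cutout_fits, ra_deg, dec_deg, major_arcsec, minor_arcsec):
    """Convert a source's sky position and size into a pixel bounding box."""
    with fits.open(cutout_fits) as hdul:
        wcs = WCS(hdul[0].header).celestial
    # Pixel scale in arcsec/pixel (assumes square pixels).
    scale = proj_plane_pixel_scales(wcs)[0] * 3600.0
    x, y = wcs.all_world2pix(ra_deg, dec_deg, 0)
    half_w = 0.5 * major_arcsec / scale
    half_h = 0.5 * minor_arcsec / scale
    # [x1, y1, x2, y2] is just one possible bbox convention.
    return [float(x - half_w), float(y - half_h),
            float(x + half_w), float(y + half_h)]

def write_training_json(image_names, records, out_path):
    """Assemble the cutout image names and source records into one JSON file."""
    with open(out_path, "w") as f:
        json.dump({"images": image_names, "annotations": records}, f, indent=2)

# A single record R might look like (hypothetical field names):
# {"source_id": 123, "image": "cutout_000_017.png",
#  "bbox": bounding_box("cutout_000_017.fits", ra, dec, bmaj, bmin),
#  "class": 2}
```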
Given I and J for each dataset (e.g. B1, 1000h), we trained ClaRAN (Classifying Radio Galaxies Automatically with Neural Networks) to detect sources in all cutout images. In particular, we used ClaRAN V0.2, which requires I and J to be organised in the following directory layout:
```
SKASDC1/DATA_DIR/
    annotations/
        instances_train_B1_1000h.json
        instances_test_B1_1000h.json
        ...
    train_B1_1000h/
        SKAMid_B1_1000h_v3_train_image*.png
        ...
    val_B1_1000h/
        SKAMid_B1_1000h_v3_train_image*.png
        ...
```
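Before training, it can be worth confirming that the layout above is in place. The following sanity check is illustrative and not part of ClaRAN; adjust `data_dir` to wherever your DATA_DIR actually lives.

```python
# Quick sanity check (illustrative): confirm the expected directory layout and
# annotation files exist before starting a ClaRAN training run.
from pathlib import Path

data_dir = Path("SKASDC1/DATA_DIR")  # adjust to your actual data root
expected = [
    data_dir / "annotations" / "instances_train_B1_1000h.json",
    data_dir / "annotations" / "instances_test_B1_1000h.json",
    data_dir / "train_B1_1000h",
    data_dir / "val_B1_1000h",
]
for path in expected:
    status = "ok" if path.exists() else "MISSING"
    print(f"{status:8s} {path}")
```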
All of the above data is publicly available. For a detailed description of ClaRAN's detection algorithms, please refer to our paper.
We have also prepared a Python notebook that walks through the basic steps of training ClaRAN v0.2 on the SDC1 dataset (B1, 1000 hours).