<a href="https://colab.research.google.com/github/OllyK/Cata2Data/blob/colab/examples/lotsddr2/Create_LoTTS_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Create a LoTSS Dataset Using Cata2Data

To start, clone the Cata2Data repository and install it into your local environment. We recommend using a virtual environment (venv) on your local machine.



In [1]:
!git clone https://github.com/mb010/Cata2Data.git && pip install ./Cata2Data

Cloning into 'Cata2Data'...
remote: Enumerating objects: 554, done.
remote: Counting objects: 100% (187/187), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 554 (delta 112), reused 135 (delta 94), pack-reused 367 (from 1)
Receiving objects: 100% (554/554), 6.16 MiB | 11.28 MiB/s, done.
Resolving deltas: 100% (230/230), done.
Processing ./Cata2Data
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting astropy==5.3.4 (from cata2data==0.1.dev111)
  Downloading astropy-5.3.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.4 kB)
Collecting regions==0.7 (from cata2data==0.1.dev111)
  Downloading regions-0.7-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting spectral-cube==0.6.0 (from cata2data==0.1.dev111)
  Downloading spectral_cube-0

## Download the data

Use the `data_scrapper.py` script to download the image files. To download a single pointing instead of all 841 pointings (434 GB), pass the `--test` flag:

In [2]:
%%python Cata2Data/examples/lotssdr2/data_scrapper.py --dir downloaded_data/ --test

  0%|          | 0/841 [00:00<?, ?it/s]  0%|          | 0/841 [00:26<?, ?it/s]


This downloads a single `.fits` image file:

In [3]:
!ls downloaded_data/public/DR2/mosaics/P000+23/

mosaic-blanked.fits
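The same check can be done from Python, which is handy once more than one pointing has been downloaded. The following is a minimal sketch, assuming the `downloaded_data/public/DR2/mosaics/<field>/mosaic-blanked.fits` layout shown above; `find_mosaics` is a hypothetical helper name and is not part of Cata2Data.

```python
from pathlib import Path


def find_mosaics(root, filename="mosaic-blanked.fits"):
    """Return every file called `filename` below `root`, sorted by path.

    Hypothetical helper for illustration; not part of Cata2Data.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing has been downloaded (or the path is wrong).
        return []
    return sorted(root.rglob(filename))


# Print the field name (the parent folder) next to each mosaic found.
for path in find_mosaics("downloaded_data"):
    print(path.parent.name, path)
```

An empty result here means that the download did not finish or that the folder name differs from the one passed to `--dir`.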


Next, download the catalogue (3.9 GB) directly from the LOFAR Surveys website using the command in the next cell. This dataloader is currently built to work with the [Radio-optical cross match](https://lofar-surveys.org/dr2_release.html#:~:text=Radio%2Doptical%20crossmatch%20catalogue) catalogue described in [Hardcastle et al. 2023](https://arxiv.org/abs/2309.00102).

In [4]:
!wget -P downloaded_data/ https://lofar-surveys.org/public/DR2/catalogues/combined-release-v1.1-LM_opt_mass.fits

--2024-09-23 16:22:14--  https://lofar-surveys.org/public/DR2/catalogues/combined-release-v1.1-LM_opt_mass.fits
Resolving lofar-surveys.org (lofar-surveys.org)... 147.197.135.238
Connecting to lofar-surveys.org (lofar-surveys.org)|147.197.135.238|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3934039680 (3.7G) [image/fits]
Saving to: ‘downloaded_data/combined-release-v1.1-LM_opt_mass.fits’


2024-09-23 16:24:55 (23.3 MB/s) - ‘downloaded_data/combined-release-v1.1-LM_opt_mass.fits’ saved [3934039680/3934039680]



## Split the Catalogue

This step takes the full catalogue and splits it into one catalogue per image, saving each one in the folder that holds the corresponding image. Cata2Data currently expects this layout: a list of images and a list of catalogues of equal length, from which it constructs a dataloader.
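To make the idea concrete, here is a minimal sketch of splitting a catalogue by field with pandas. It is not the implementation in `catalog_splitter.py`: `split_by_field` is a hypothetical helper, the toy rows are invented for illustration, and the only name taken from the real catalogue is the `Mosaic_ID` column.

```python
import pandas as pd


def split_by_field(catalogue, field_column="Mosaic_ID"):
    """Return one sub-catalogue per field, keyed by the field name.

    Hypothetical helper for illustration; not part of Cata2Data.
    """
    return {
        field: rows.reset_index(drop=True)
        for field, rows in catalogue.groupby(field_column)
    }


# Toy catalogue with invented values; the real one has about 4.2 million rows.
toy = pd.DataFrame(
    {
        "Source_Name": ["A", "B", "C"],
        "RA": [0.1, 0.2, 10.0],
        "DEC": [23.0, 23.1, 30.0],
        "Mosaic_ID": ["P000+23", "P000+23", "P010+30"],
    }
)

per_field = split_by_field(toy)
for field, rows in per_field.items():
    print(field, len(rows))
```

Each sub-catalogue could then be written next to the matching mosaic, which gives the equal-length lists of images and catalogues described above.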

In [1]:
from astropy.table import Table

t = Table.read("downloaded_data/combined-release-v1.1-LM_opt_mass.fits", memmap=True)
t.meta



OrderedDict([('EXTNAME', 'Joined'),
             ('TCOMM53',
              'Profile-weighted fraction of the flux from other sources divided by the total flux in r (typically [0,1])'),
             ('TUCD53', 'pos.angDistance'),
             ('DATE-HDU', '2023-06-10T10:05:15'),
             ('STILVERS', '4.1'),
             ('STILCLAS', 'uk.ac.starlink.votable.FitsPlusTableWriter'),
             ('TCOMM1', 'Object identifier (ILT name)'),
             ('TCOMM2', 'Radio right ascension (mean position )'),
             ('TCOMM3', 'Radio declination (mean position)'),
             ('TCOMM4', 'Error on radio right ascension'),
             ('TCOMM5', 'Error on radio declination'),
             ('TCOMM6', '144-MHz total flux density'),
             ('TCOMM7', 'Error on total flux density'),
             ('TCOMM8', '144-MHz peak flux density'),
             ('TCOMM9', 'Error on peak flux density'),
             ('TCOMM10', 'PyBDSF source code or Z for composite source'),
             ('TCOMM

In [4]:
t['RA']

<Column name='RA' dtype='float64' unit='deg' length=4167359>
 0.0001397891057877132
0.00011082670675932604
 0.0001816635306256531
0.00023666072613770517
 0.0003524535386459082
 0.0003573802828782391
0.00044235317748270164
 0.0005162365512774159
 0.0006475578061639897
 0.0006889562333691757
 0.0007453990920034403
 0.0008821790719935052
                   ...
     359.9991599081277
     359.9992068877408
     359.9993720624978
    359.99935940683304
     359.9994620133685
    359.99955121832613
     359.9996404494238
    359.99966051410416
     359.9997741092799
    359.99984246358594
    359.99996960231243
    359.99996945769766
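The column length gives a quick sanity check on what to expect per field. The sketch below uses only numbers quoted in this notebook (4,167,359 catalogue rows, 841 pointings, 434 GB of images) and assumes, unrealistically, that sources and file sizes are spread evenly across pointings, so treat the results as rough estimates only.

```python
# Figures quoted in this notebook.
n_sources = 4_167_359   # length of the RA column
n_pointings = 841       # number of pointings in the full download
total_size_gb = 434     # size of the full image download

# Even-split assumption: real fields differ in source density and size.
sources_per_pointing = n_sources // n_pointings
size_per_pointing_gb = round(total_size_gb / n_pointings, 2)

print(sources_per_pointing)   # 4955
print(size_per_pointing_gb)   # 0.52
```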

In [21]:
!python Cata2Data/examples/lotssdr2/catalog_splitter.py --catalogue_path downloaded_data/combined-release-v1.1-LM_opt_mass.fits --image_paths downloaded_data/public/DR2/mosaics/P000+23/


['downloaded_data/public/DR2/mosaics/P000+23/mosaic-blanked.fits']
^C


In [32]:
from astropy.io import fits
from astropy.wcs import WCS

with fits.open('downloaded_data/public/DR2/mosaics/P000+23/mosaic-blanked.fits') as hdul:
    wcs = WCS(hdul[0].header)
    image_data = hdul[0].data
image_data.shape

(8938, 8938)

In [2]:
df = t.to_pandas()

NameError: name 't' is not defined

In [1]:
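import os

df = df.dropna(subset=["RA", "DEC"])
# The field name is the folder that holds the mosaic, e.g. "P000+23".
field_name = os.path.dirname('downloaded_data/public/DR2/mosaics/P000+23/mosaic-blanked.fits').split("/")[-1]
original_sample_count = df.shape[0]
# Build the mask from df itself so it stays aligned with the rows kept by dropna.
df = df.loc[df["Mosaic_ID"] == field_name.encode("UTF-8")]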
print(
    f"SIMPLE FILTER: field_name: {field_name}; samples: {df.shape[0]}; original_sample_count: {original_sample_count}; sample_estimate: {int(1/841*original_sample_count)}"
)
df

NameError: name 'df' is not defined
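The filter in the cell above relies on two details: the field name is taken from the folder that contains the mosaic, and it is encoded to bytes before the comparison. The sketch below illustrates both with the standard library only. That `Mosaic_ID` is stored as bytes is an assumption inferred from the `.encode("UTF-8")` call in that cell.

```python
import os

image_path = "downloaded_data/public/DR2/mosaics/P000+23/mosaic-blanked.fits"

# The field name is the name of the folder that holds the mosaic.
field_name = os.path.basename(os.path.dirname(image_path))
print(field_name)  # P000+23

# In Python 3 a str never compares equal to bytes, so the field name
# has to be encoded before it is compared with byte-string values.
print(field_name.encode("UTF-8") == b"P000+23")  # True
print(field_name == b"P000+23")                  # False
```

If the column holds ordinary strings instead, the `.encode` step should be dropped, otherwise the filter silently selects zero rows.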

In [29]:
! ls downloaded_data/public/DR2/mosaics/P000+23/

mosaic-blanked.fits


In [14]:
import site
site.main()

In [24]:
from data import LoTTSDataset
from torchvision.transforms import v2
import torch

transforms = v2.Compose(
    [
        v2.ToImage(),
        v2.ToDtype(torch.float32),
        v2.Resize(size=(64, 64)),
    ]
)

data = LoTTSDataset(
    data_folder="downloaded_data",  # Change this to where you saved your data
    cutout_scaling=1.5,
    transform=transforms,
)

for i in range(len(data)):
    if i > 10:
        break
    data.plot(
        i,
        contours=True,
        sigma_name="Isl_rms",
        min_sigma=2,
        title=data.df.iloc[i]["Source_Name"] + data.df.iloc[i]["S_Code"],
    )

data.df.head()

Found 0 images and 0 catalogues in .downloaded_data. Using 0 fields.


ValueError: No objects to concatenate

In [10]:
import cata2data


ModuleNotFoundError: No module named 'cata2data'
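When an import fails right after a `pip install`, it helps to confirm which interpreter the kernel is running and whether that interpreter can see the package. This is a minimal diagnostic sketch using only the standard library; `is_importable` is a hypothetical helper name.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running interpreter can locate `module_name`.

    Hypothetical helper for illustration; expects a top-level module name.
    """
    return importlib.util.find_spec(module_name) is not None


# The interpreter shown here should be the one that pip installed into.
print("Interpreter:", sys.executable)
print("cata2data importable:", is_importable("cata2data"))
```

If the package is reported as missing, restarting the kernel after installation or installing with the interpreter printed above are the usual things to try.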

In [11]:
!pip list

Package                          Version               Editable project location
-------------------------------- --------------------- -------------------------
absl-py                          1.4.0
accelerate                       0.34.2
aiohappyeyeballs                 2.4.0
aiohttp                          3.10.5
aiosignal                        1.3.1
alabaster                        0.7.16
albucore                         0.0.16
albumentations                   1.4.15
altair                           4.2.2
annotated-types                  0.7.0
anyio                            3.7.1
argon2-cffi                      23.1.0
argon2-cffi-bindings             21.2.0
array_record                     0.5.1
arviz                            0.19.0
astropy                          5.3.4
astropy-iers-data                0.2024.9.16.0.32.21
astunparse                       1.6.3
async-timeout                    4.0.3
atpublic                         4.1.0
attrs                            24.