## Concatenating data sets from multiple CCDs to make a more diverse data set.

**Author:** Aram Lee
**Date:** 2025-03-06
**File Name:** Concatenator.ipynb

### [Description]
A data set from a single CCD did not give the model enough variation in backgrounds and moving objects, so we use a concatenated data set from at least 3 CCDs (44 exposures per CCD). This notebook also selects the pairs with the desired labels and object magnitudes.

### [Required Libraries]
- numpy: 1.26.4

### [Workflow]  

Steps 1-3 are for training the model, and steps 4-6 are for using the model to detect TNOs.

|Step|File|Input|Output|Purpose|
|-|-|-|-|-|
|1|ImageCutter.ipynb|.fits (with artificial moving objects), .plantlist (artificial objects info)| .npy|Extract sub-images for training|
|2|Concatenator.ipynb **(Here)**|.npy (sub-images from ImageCutter)|.npy|Prepare dataset for training|
|3|Trainer.ipynb|.npy (dataset from Concatenator), .npy (target information)|.h5 (trained CNN models)|Train the model|
|-|-|-|-|-|
|4|ImageCutter.ipynb|.fits (without artificial moving objects)|.npy|Extract sub-images for detection|
|5|Predictor.ipynb|.npy (sub-images from ImageCutter), .npy (target info), .h5 (model)|.npy|Apply trained model to detect objects|
|6a|Link_sources_to_objects.py|.npy (classification and regression output from Predictor)|.npy|Detect moving objects (linear fitting method)|
|6b|CandidateFinder.ipynb|.npy (classification output from Predictor), .npy (sub-images, target info)|.csv|Detect moving objects (scoring method)|

In [1]:
import numpy as np
from numpy.random import permutation as perm
from math import ceil

In [2]:
# load multiple numpy arrays

def load_data(prefixes, path='trainingsets/'):
    return [np.load(f'{path}inp_{p}_P99NN.npy', allow_pickle=True) for p in prefixes], \
           [np.load(f'{path}tar_{p}_P99NN.npy', allow_pickle=True) for p in prefixes]

img, tar = load_data(['ch05', 'ch10', 'ch20'])

In [3]:
# Extract only the information used in this research.
# target: (p1, p2, x1, y1, x2, y2, mag1, mag2)

def tar_maker(var):
    return [[i[0], i[1], i[2][0], i[2][1], i[3][0], i[3][1], i[6][7], i[7][7]] for i in var]

In [4]:
tar = [tar_maker(t) for t in tar]

In [5]:
# Create a balanced set for the binary labels [11, 10, 01, 00] with a ratio of 4:1:1:4.
# Samples beyond this ratio, and labelled objects with magnitudes outside 0 < mag <= 25, are dropped.

def ind(var):
    where = {"11": [], "10": [], "01": [], "00": []}
    
    for i, e in enumerate(var):
        # e[:2] is (p1, p2); e[6] and e[7] are mag1 and mag2.
        if e[:2] == [1, 1] and 0 < e[6] <= 25:
            where["11"].append(i)
        elif e[:2] == [1, 0] and 0 < e[6] <= 25:
            where["10"].append(i)
        elif e[:2] == [0, 1] and 0 < e[7] <= 25:
            where["01"].append(i)
        elif e[:2] == [0, 0]:
            where["00"].append(i)
    
    permuted = {k: perm(v) for k, v in where.items()}
    min_count = min(len(permuted["11"]), len(permuted["00"]))
    permuted["10"] = permuted["10"][:ceil(min_count / 4)]
    permuted["01"] = permuted["01"][:ceil(min_count / 4)]
    permuted["00"] = permuted["00"][:min_count]
    permuted["11"] = permuted["11"][:min_count]
    
    return tuple(permuted.values())
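
The truncation in `ind` can be checked with plain arithmetic. The sketch below is not part of the pipeline: `balanced_counts` is a hypothetical helper and the class sizes are made-up numbers, used only to show how many indices of each label survive the 4:1:1:4 cut.

```python
from math import ceil

def balanced_counts(n11, n10, n01, n00):
    """Number of indices kept per label, mirroring the slicing done in ind()."""
    min_count = min(n11, n00)        # "11" and "00" are cut to the smaller of the two
    quarter = ceil(min_count / 4)    # "10" and "01" are cut to a quarter of that
    return {
        "11": min_count,
        "10": min(n10, quarter),     # slicing never yields more than is available
        "01": min(n01, quarter),
        "00": min_count,
    }

# Hypothetical class sizes for one CCD.
counts = balanced_counts(n11=1000, n10=400, n01=350, n00=1203)
print(counts)  # {'11': 1000, '10': 250, '01': 250, '00': 1000}

# If a mixed label is rarer than a quarter of min_count, the ratio is not reached.
sparse = balanced_counts(n11=1000, n10=120, n01=350, n00=1203)
print(sparse["10"])  # 120
```

Because slicing past the end of an array returns only what is there, a CCD with few "10" or "01" pairs ends up below the 4:1:1:4 target rather than raising an error.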

In [6]:
# Use the ind function to categorize and randomize the indices.

categories = [ind(t) for t in tar]

In [7]:
# Split the data set from each CCD and concatenate them into one train/validation set.

# Split data into train and test
def split(var, ratio=0.8):
    cut = ceil(len(var) * ratio)
    return var[:cut], var[cut:]

# Concatenate train and test sets
def concat(var):
    train, test = zip(*(split(v) for v in var))
    return np.concatenate(train), np.concatenate(test)
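
As a small illustration of how `split` behaves, the sketch below repeats its definition and applies it to toy index arrays (the array sizes are arbitrary). Because the cut point uses `ceil`, a fractional cut is rounded towards the training side.

```python
from math import ceil
import numpy as np

def split(var, ratio=0.8):
    # Same logic as the notebook's split(): the first `cut` entries go to train.
    cut = ceil(len(var) * ratio)
    return var[:cut], var[cut:]

# 11 indices: ceil(8.8) = 9 go to train, 2 to validation.
train, val = split(np.arange(11))
print(len(train), len(val))  # 9 2

# 3 indices: ceil(2.4) = 3 go to train, so the validation part is empty.
small_train, small_val = split(np.arange(3))
print(len(small_train), len(small_val))  # 3 0
```

With the default ratio, a label class holding four or fewer indices contributes nothing to the validation set, which may matter for CCDs with very few "10" or "01" pairs.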

In [8]:
# Use the concat function to get the indices of the training and validation sets.
Train, Test = zip(*(concat(cat) for cat in categories))

In [9]:
# Use the built indices to concatenate the images and targets from the 3 CCDs, and save them.

np.save('trainingsets/M_img_train.npy', np.concatenate([np.array(i)[t] for i, t in zip(img, Train)]))
np.save('trainingsets/M_tar_train.npy', np.concatenate([np.array(i)[t] for i, t in zip(tar, Train)]))
np.save('trainingsets/M_img_test.npy', np.concatenate([np.array(i)[t] for i, t in zip(img, Test)]))
np.save('trainingsets/M_tar_test.npy', np.concatenate([np.array(i)[t] for i, t in zip(tar, Test)]))
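
Before passing the saved files on to Trainer.ipynb, it may be worth checking that each image array lines up with its target array. The sketch below is one possible check, not part of the original notebook: `check_aligned` is a hypothetical helper, the image shape `(5, 2, 16, 16)` is an assumed placeholder, and the only facts taken from the notebook are that each target has 8 fields and that images and targets are indexed by the same indices.

```python
import numpy as np

def check_aligned(images, targets, n_fields=8):
    """Raise ValueError if images and targets cannot describe the same samples."""
    images, targets = np.asarray(images), np.asarray(targets)
    if len(images) != len(targets):
        raise ValueError(f"{len(images)} images but {len(targets)} targets")
    if targets.ndim != 2 or targets.shape[1] != n_fields:
        raise ValueError(f"targets should have shape (N, {n_fields}), got {targets.shape}")
    # Assumption: a stack of 2-D sub-images has at least 3 dimensions.
    if images.ndim < 3:
        raise ValueError(f"images look like a table, not image data: {images.shape}")
    return True

# Synthetic stand-ins for the arrays loaded from the saved .npy files.
images = np.zeros((5, 2, 16, 16))   # assumed shape, for illustration only
targets = np.zeros((5, 8))
ok = check_aligned(images, targets)
print(ok)  # True
```

In practice the two arguments would be the arrays returned by `np.load` on a matching pair of image and target files.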