# Unit test data

This directory contains very small toy data sets that are used
for unit tests.

## Object catalog: small_sky

This "object catalog" consists of 131 randomly generated (RA, Dec) positions.

- All (RA, Dec) positions fall in HEALPix order 0, pixel 11.
- IDs are integers in the range 700-831.

In [None]:
import tempfile
from pathlib import Path

import hipscat_import.pipeline as runner
from dask.distributed import Client
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.index.arguments import IndexArguments
from hipscat_import.margin_cache.margin_cache_arguments import MarginCacheArguments
from hipscat_import.soap import SoapArguments

# Scratch space for intermediate pipeline files; cleaned up in the last cell.
tmp_path = tempfile.TemporaryDirectory()
tmp_dir = tmp_path.name

# Relative path to the test data of a sibling hipscat-import checkout.
hipscat_import_dir = "../../../hipscat-import/tests/hipscat_import/data/"

# Single-worker, single-thread Dask client shared by every pipeline run below.
client = Client(n_workers=1, threads_per_worker=1, local_directory=tmp_dir)

### small_sky_order1

This catalog has the same data points as the other small sky catalogs,
but the points are forced into partitions at order 1 instead
of order 0.

This means there are 4 leaf partition files instead of just 1, so it is
useful for confirming reads/writes over multiple leaf partition files.

NB: Setting `constant_healpix_order=1` forces the import pipeline to create
leaf partitions at order 1.
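
The count of 4 leaf partitions follows from how HEALPix subdivides pixels. As a minimal sketch, assuming the nested numbering scheme (where the children of pixel `p` are `4p` through `4p + 3` one order deeper), the order 1 pixels under order 0, pixel 11 can be listed as follows. The helper `child_pixels` is hypothetical and is not part of `hipscat` or `hipscat_import`.

```python
def child_pixels(order, pixel):
    """Return the four (order, pixel) pairs one level below a HEALPix pixel.

    Assumes nested numbering: children of pixel p are 4p .. 4p + 3.
    """
    return [(order + 1, 4 * pixel + offset) for offset in range(4)]


# Order 0, pixel 11 is where all small sky points live.
leaves = child_pixels(0, 11)
print(leaves)  # [(1, 44), (1, 45), (1, 46), (1, 47)]
```

Under that assumption, any of these four pixels that contains at least one point becomes a leaf partition.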

In [None]:
args = ImportArguments(
    input_file_list=["small_sky_order1/small_sky_order1.csv"],
    output_path=".",
    file_reader="csv",
    output_artifact_name="small_sky_order1",
    constant_healpix_order=1,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky

This is the base object catalog described at the top of this section: 131 randomly generated (RA, Dec) positions, all in HEALPix order 0, pixel 11, with integer IDs in the range 700-831.

This catalog was generated with the following snippet:

In [None]:
args = ImportArguments(
    input_file_list=["small_sky_order1/small_sky_order1.csv"],
    output_path=".",
    file_reader="csv",
    output_artifact_name="small_sky",
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky_order1_id_index

In [None]:
args = IndexArguments(
    input_catalog_path="./small_sky_order1",
    indexing_column="id",
    output_path=".",
    output_artifact_name="small_sky_order1_id_index",
    include_hipscat_index=False,
    compute_partition_size=200_000,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

## Source catalog: small_sky_source

This "source catalog" has 131 detections for each of the 131 objects
in the "small_sky" catalog. Each detection has a random magnitude, MJD, and
band (selected from ugrizy). The full script that generated the values
can be found [here](https://github.com/delucchi-cmu/hipscripts/blob/main/twiddling/small_sky_source.py).
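
For orientation only, here is a rough sketch of how detections of this shape could be generated. It is not the linked script. The column names `mjd`, `mag`, and `band`, the value ranges, the starting source ID, and the example position are all assumptions made for illustration; `object_id`, `source_id`, `object_ra`, and `object_dec` mirror the column names used in the cells below.

```python
import random

BANDS = "ugrizy"


def make_detections(object_id, ra, dec, n_detections, rng, first_source_id):
    """Build n_detections rows for one object with random mag, MJD, and band."""
    rows = []
    for i in range(n_detections):
        rows.append(
            {
                "source_id": first_source_id + i,
                "object_id": object_id,
                "object_ra": ra,
                "object_dec": dec,
                "mjd": rng.uniform(58000.0, 59000.0),  # assumed date range
                "mag": rng.uniform(15.0, 25.0),  # assumed magnitude range
                "band": rng.choice(BANDS),
            }
        )
    return rows


rng = random.Random(0)  # fixed seed so the rows are reproducible
# Arbitrary example position; not taken from the real catalog.
detections = make_detections(700, 300.0, -50.0, 131, rng, 70000)
print(len(detections))  # 131
```

Repeating this for every object would give 131 x 131 rows in total.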

### small_sky_order1_source

In [None]:
args = ImportArguments(
    input_file_list=["raw/small_sky_source/small_sky_source.csv"],
    output_path=".",
    file_reader="csv",
    ra_column="object_ra",
    dec_column="object_dec",
    catalog_type="source",
    output_artifact_name="small_sky_order1_source",
    constant_healpix_order=1,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky_source_margin

This one is tricky because its input catalog exists only in the `hipscat` and `hipscat-import` test directories, so it is read via `hipscat_import_dir`.

In [None]:
args = MarginCacheArguments(
    input_catalog_path=Path(hipscat_import_dir) / "small_sky_source_catalog",
    output_path=".",
    output_artifact_name="small_sky_source_margin",
    margin_threshold=180,
    margin_order=8,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky_order1_source_margin

In [None]:
args = MarginCacheArguments(
    input_catalog_path="small_sky_order1_source",
    output_path=".",
    output_artifact_name="small_sky_order1_source_margin",
    margin_threshold=7200,
    margin_order=4,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky_order3_source_margin

This one is similar to the previous margin catalogs, but it is generated from a source catalog partitioned at order 3, which is imported first.

In [None]:
args = ImportArguments(
    input_file_list=["raw/small_sky_source/small_sky_source.csv"],
    output_path=".",
    file_reader="csv",
    ra_column="source_ra",
    dec_column="source_dec",
    catalog_type="source",
    output_artifact_name="small_sky_order3_source",
    constant_healpix_order=3,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

args = MarginCacheArguments(
    input_catalog_path="small_sky_order3_source",
    output_path=".",
    output_artifact_name="small_sky_order3_source_margin",
    margin_threshold=300,
    margin_order=7,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

## Connections between tables

### small_sky_to_o1source

In [None]:
args = SoapArguments(
    object_catalog_dir="small_sky",
    object_id_column="id",
    source_catalog_dir="small_sky_order1_source",
    source_object_id_column="object_id",
    source_id_column="source_id",
    output_path=".",
    output_artifact_name="small_sky_to_o1source",
    write_leaf_files=True,
    overwrite=True,
)
runner.pipeline_with_client(args, client)

### small_sky_to_o1source_soft

In [None]:
args = SoapArguments(
    object_catalog_dir="small_sky",
    object_id_column="id",
    source_catalog_dir="small_sky_order1_source",
    source_object_id_column="object_id",
    source_id_column="source_id",
    output_path=".",
    output_artifact_name="small_sky_to_o1source_soft",
    write_leaf_files=False,
    overwrite=True,
)
runner.pipeline_with_client(args, client)

## Perturbed object catalog

To test the validity of cross-matching, we create a new version of the "small sky" catalog in which each (RA, Dec) position is slightly perturbed.
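
The stored perturbation is the one actually used by the tests. Purely as an illustration of the idea, a perturbation of this kind could look like the sketch below. The helper `perturb`, the uniform offset distribution, and the offset size are assumptions and do not describe how the CSV was produced.

```python
import random


def perturb(ra, dec, max_offset_deg, rng):
    """Shift a position by a small uniform random offset in each coordinate."""
    # Keep RA in [0, 360) and clamp Dec to [-90, 90].
    new_ra = (ra + rng.uniform(-max_offset_deg, max_offset_deg)) % 360.0
    new_dec = dec + rng.uniform(-max_offset_deg, max_offset_deg)
    new_dec = max(-90.0, min(90.0, new_dec))
    return new_ra, new_dec


rng = random.Random(42)  # fixed seed for reproducibility
original = (300.0, -50.0)  # arbitrary example position
shifted = perturb(original[0], original[1], 0.01, rng)
print(original, shifted)
```

Each perturbed point stays close to its original, so a cross-match with a suitable radius should pair them back up.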

### small_sky_xmatch

The initial perturbation is stored as a CSV, and we can re-import from this raw data set.

In [None]:
args = ImportArguments(
    input_file_list=["raw/xmatch/small_sky_xmatch.csv"],
    output_path=".",
    file_reader="csv",
    output_artifact_name="small_sky_xmatch",
    pixel_threshold=100,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky_to_xmatch

Association table between the original "small sky" object catalog, and the perturbed "small sky xmatch" catalog.

Used to test joining THROUGH the association catalog.
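
Conceptually, joining through an association table means following ID pairs from one catalog to the other. The plain-Python sketch below shows the idea on made-up rows. The function `join_through` and the sample values are hypothetical, and this is not how `hipscat` performs the join.

```python
# Made-up rows keyed by ID; the real catalogs are partitioned parquet files.
objects = {700: "object A", 701: "object B"}
xmatch = {700: "perturbed A", 701: "perturbed B"}

# Each association row links an ID in the left catalog to an ID in the right one.
association = [(700, 700), (701, 701)]


def join_through(left, links, right):
    """Pair up left and right rows whose IDs are linked by the association rows."""
    return [
        (left[left_id], right[right_id])
        for left_id, right_id in links
        if left_id in left and right_id in right
    ]


joined = join_through(objects, association, xmatch)
print(joined)  # [('object A', 'perturbed A'), ('object B', 'perturbed B')]
```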

In [None]:
args = SoapArguments(
    object_catalog_dir="small_sky",
    object_id_column="id",
    source_catalog_dir="small_sky_xmatch",
    source_object_id_column="id",
    source_id_column="id",
    output_path=".",
    write_leaf_files=True,
    output_artifact_name="small_sky_to_xmatch",
    overwrite=True,
)
runner.pipeline_with_client(args, client)

### small_sky_to_xmatch_soft

Similar to the above catalog, but does not generate leaf files.

In [None]:
args = SoapArguments(
    object_catalog_dir="small_sky",
    object_id_column="id",
    source_catalog_dir="small_sky_xmatch",
    source_object_id_column="id",
    source_id_column="id",
    output_path=".",
    write_leaf_files=False,
    output_artifact_name="small_sky_to_xmatch_soft",
    overwrite=True,
)
runner.pipeline_with_client(args, client)

### small_sky_xmatch_margin

Create a margin catalog from the perturbed data points.

In [None]:
args = MarginCacheArguments(
    input_catalog_path="small_sky_xmatch",
    output_path=".",
    output_artifact_name="small_sky_xmatch_margin",
    margin_threshold=7200,
    margin_order=4,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

### small_sky_left_xmatch

This adds a new point that's outside of the (0,11) pixel of the small sky catalog. Otherwise, the points are the same.

In [None]:
args = ImportArguments(
    input_file_list=["raw/xmatch/small_sky_left_xmatch.csv"],
    output_path=".",
    file_reader="csv",
    output_artifact_name="small_sky_left_xmatch",
    pixel_threshold=100,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline_with_client(args, client)

In [None]:
# Close the Dask client first, since its workers use tmp_dir as their local directory.
client.close()
tmp_path.cleanup()