# Unit test data

This directory contains very small toy data sets that are used
for unit tests.

## Object catalog: small_sky

This "object catalog" is 131 randomly generated radec values. 

- All radec positions are in Healpix order 0, pixel 11.
- IDs are integers from 700 to 830 (131 values).

In [None]:
import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.index.arguments import IndexArguments
from hipscat_import.margin_cache.margin_cache_arguments import MarginCacheArguments
import tempfile
from pathlib import Path

tmp_path = tempfile.TemporaryDirectory()
tmp_dir = tmp_path.name

hipscat_import_dir = "../../../hipscat-import/tests/hipscat_import/data/"

### small_sky

This catalog was generated with the following snippet:

In [None]:
args = ImportArguments(
    input_path=Path(hipscat_import_dir)/"small_sky",
    output_path=".",
    file_reader="csv",
    output_artifact_name="small_sky",
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline(args)

### small_sky_order1

This catalog has the same data points as the other small sky catalogs,
but the import is forced to spread these data points over partitions at order 1
instead of order 0.

This means there are 4 leaf partition files instead of just 1, so the catalog is
useful for confirming reads and writes over multiple leaf partition files.

NB: Setting `constant_healpix_order` coerces the import pipeline to create
leaf partitions at order 1.
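
The relationship between the single order 0 partition and the four order 1 partitions can be checked with a little arithmetic. The sketch below assumes the HEALPix NESTED numbering scheme, in which pixel `p` at order `k` has children `4p` through `4p + 3` at order `k + 1`. `child_pixels` is a hypothetical helper written for illustration and is not part of hipscat.

```python
def child_pixels(order, pixel, target_order):
    """Return the NESTED-scheme pixels at target_order that cover `pixel` at `order`.

    Hypothetical helper for illustration only; assumes NESTED numbering.
    """
    if target_order < order:
        raise ValueError("target_order must be >= order")
    # Each step down in order splits every pixel into 4 children.
    factor = 4 ** (target_order - order)
    start = pixel * factor
    return list(range(start, start + factor))


print(child_pixels(0, 11, 1))  # [44, 45, 46, 47]
```

Under that assumption, order 0 pixel 11 splits into exactly the four order 1 pixels that hold the leaf partition files.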

This catalog was generated with the following snippet:

In [None]:
args = ImportArguments(
    input_path=Path(hipscat_import_dir)/"small_sky",
    output_path=".",
    file_reader="csv",
    output_artifact_name="small_sky_order1",
    constant_healpix_order=1,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline(args)

### small_sky_order1_id_index

An index table (that is NOT a "true" hipscat catalog) that maps the "id" field to the partition in which the row can be found.

In [None]:
args = IndexArguments(
    input_catalog_path="small_sky_order1",
    indexing_column="id",
    output_path=".",
    output_artifact_name="small_sky_order1_id_index",
    include_hipscat_index=False,
    compute_partition_size=200_000,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline(args)

### small_sky_order1_margin

This catalog exists as a margin cache of the small_sky_order1 table,
allowing spatial operations to be performed efficiently and accurately.

NB: 

- The `margin_threshold` setting of 7200 arcseconds (2 degrees) is much higher than
  the threshold a margin cache would usually be generated with, but it is used because
  the small sky test dataset is sparse.
- The `small_sky_order1` catalog only contains points in Norder1, Npix=[44, 45, 46, 47], but the margin catalog also contains points in Norder0, Npix=4 due to negative pixel margins.
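
To put the threshold in context, the sketch below converts it to degrees and compares it with a rough order 1 pixel size. It relies only on general HEALPix facts (12 × 4^order equal-area pixels over the full sky); the helper names are hypothetical, and the "side length" is just the square root of the pixel area, not an exact pixel boundary.

```python
import math

ARCSEC_PER_DEGREE = 3600
# Area of the full sky in square degrees (about 41253).
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2


def arcsec_to_degrees(arcsec):
    """Convert an angle in arcseconds to degrees."""
    return arcsec / ARCSEC_PER_DEGREE


def approx_pixel_side_degrees(order):
    """Square root of the area of one equal-area pixel at the given order."""
    n_pixels = 12 * 4 ** order
    return math.sqrt(FULL_SKY_SQ_DEG / n_pixels)


margin_threshold = 7200
print(arcsec_to_degrees(margin_threshold))      # 2.0
print(round(approx_pixel_side_degrees(1), 1))   # 29.3
```

So the 2 degree margin is still small next to an order 1 pixel, which is roughly 29 degrees across.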


This catalog was generated using the following snippet:

In [None]:
margin_args = MarginCacheArguments(
    margin_threshold=7200,
    input_catalog_path="small_sky_order1",
    output_path=".",
    output_artifact_name="small_sky_order1_margin",
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline(margin_args)

### small_sky_to_small_sky_order1

Association table that maps (pretty naively) `small_sky` to `small_sky_order1`. Note that these are the *same* catalog data, but the stored pixels are at different healpix orders.

Note also that this doesn't really create a catalog! This is faking out a "soft" association catalog, which just contains the partition join information, and not the actual matching rows. It's not generated by any "import pipeline", but just through writing the files directly.

In [None]:
import pandas as pd
from hipscat.catalog.association_catalog.partition_join_info import PartitionJoinInfo

join_pixels = pd.DataFrame.from_dict(
    {
        "Norder": [0, 0, 0, 0],
        "Npix": [11, 11, 11, 11],
        "join_Norder": [1, 1, 1, 1],
        "join_Npix": [44, 45, 46, 47],
    }
)
join_info = PartitionJoinInfo(join_pixels, catalog_base_dir="small_sky_to_small_sky_order1")
join_info.write_to_csv()
join_info.write_to_metadata_files()
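
The pixel pairs written above are not arbitrary: each `join_Npix` at order 1 sits inside the order 0 pixel it is paired with. The sketch below checks that with plain tuples instead of `PartitionJoinInfo`. It assumes the HEALPix NESTED numbering scheme, and `parent_pixel` is a hypothetical helper, not a hipscat function.

```python
def parent_pixel(order, pixel, target_order):
    """Return the NESTED-scheme ancestor of `pixel` at the coarser target_order.

    Hypothetical helper for illustration only; assumes NESTED numbering.
    """
    if target_order > order:
        raise ValueError("target_order must be <= order")
    # Each step up in order drops the two lowest bits of the pixel number.
    return pixel >> (2 * (order - target_order))


# (Norder, Npix, join_Norder, join_Npix) rows, matching the join table above.
join_rows = [
    (0, 11, 1, 44),
    (0, 11, 1, 45),
    (0, 11, 1, 46),
    (0, 11, 1, 47),
]

for norder, npix, join_norder, join_npix in join_rows:
    # Every order 1 join pixel must fall inside its order 0 partner.
    assert parent_pixel(join_norder, join_npix, norder) == npix
```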


## Source catalog: small_sky_source

This "source catalog" is 131 detections at each of the 131 objects
in the "small_sky" catalog. These have a random magnitude, MJD, and 
band (selected from ugrizy). The full script that generated the values
can be found [here](https://github.com/delucchi-cmu/hipscripts/blob/main/twiddling/small_sky_source.py).

The catalog was generated with the following snippet, using raw data
from the `hipscat-import` test data directory.

NB: `pixel_threshold=3000` is set just to make sure that we're generating
a handful of files at various healpix orders.
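
A quick count shows why a threshold of 3000 forces several partitions. This sketch assumes that `pixel_threshold` acts as an upper bound on the number of rows in a single partition, and that 131 detections for each of 131 objects means 131 × 131 rows. The real partition count depends on how the sources are distributed on the sky, so the figure computed here is only a lower bound.

```python
import math

n_objects = 131
detections_per_object = 131
pixel_threshold = 3000  # assumed to be the maximum number of rows per partition

n_sources = n_objects * detections_per_object
# Fewest partitions that could hold every row without exceeding the threshold.
min_partitions = math.ceil(n_sources / pixel_threshold)

print(n_sources)       # 17161
print(min_partitions)  # 6
```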

In [None]:
args = ImportArguments(
    input_path=Path(hipscat_import_dir)/"small_sky_source",
    output_path=".",
    file_reader="csv",
    ra_column="source_ra",
    dec_column="source_dec",
    catalog_type="source",
    pixel_threshold=3000,
    output_artifact_name="small_sky_source",
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline(args)

### small_sky_source_object_index

This catalog exists as an index of the SOURCE table, using the OBJECT ID
as the indexed column. This means you should be able to quickly find
partitions of SOURCES for a given OBJECT ID.

NB: 

- Setting `compute_partition_size` to something less than `1_000_000`
  coerces the indexing pipeline to create smaller result partitions,
  and so we have three distinct index partitions.
- Setting `include_hipscat_index=False` keeps us from needing a row for every
  source and lets the indexing pipeline create only one row per
  unique objectId/Norder/Npix.
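
The effect of keeping one row per unique objectId/Norder/Npix can be pictured with plain Python. The rows below are hypothetical values made up for illustration, not data read from the catalog, and the sketch is not how the indexing pipeline is implemented. It only shows the deduplication idea and the lookup it enables.

```python
# Hypothetical (object_id, Norder, Npix) triples, one per source row.
# The values are invented for illustration only.
source_rows = [
    (700, 1, 44),
    (700, 1, 44),
    (700, 1, 44),
    (701, 1, 45),
    (701, 1, 45),
    (702, 2, 180),
]

# Keep one entry per unique object_id / Norder / Npix combination.
index_rows = sorted(set(source_rows))


def partitions_for(object_id):
    """Return the (Norder, Npix) partitions that hold sources for object_id."""
    return [(norder, npix) for oid, norder, npix in index_rows if oid == object_id]


print(len(source_rows), len(index_rows))  # 6 3
print(partitions_for(701))                # [(1, 45)]
```

Six source rows collapse to three index rows here, and a lookup by object ID returns only the partitions that need to be read.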

In [None]:
args = IndexArguments(
    input_catalog_path="small_sky_source",
    indexing_column="object_id",
    output_path=".",
    output_artifact_name="small_sky_source_object_index",
    include_hipscat_index=False,
    compute_partition_size=200_000,
    overwrite=True,
    tmp_dir=tmp_dir,
)
runner.pipeline(args)

In [None]:
tmp_path.cleanup()