# CLOUD unit test data

The unit tests in this repo use two types of data: local and cloud. This notebook covers only the CLOUD versions of the test data and contains the steps needed to re-generate them.

The same steps can be used to initialize the data in a new cloud provider, instead of simply copying an existing data set.

## Object catalog: small sky

This is the same "object catalog" used in the hipscat and LSDB unit test suites: 131 randomly generated (ra, dec) values, all inside the order 0, pixel 11 HEALPix pixel.

In [None]:
import os
import tempfile
from upath import UPath

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.index.arguments import IndexArguments
from hipscat_import.margin_cache.margin_cache_arguments import MarginCacheArguments

tmp_path = tempfile.TemporaryDirectory()
tmp_dir = tmp_path.name

storage_options = {
    "account_key": os.environ.get("ABFS_LINCCDATA_ACCOUNT_KEY"),
    "account_name": os.environ.get("ABFS_LINCCDATA_ACCOUNT_NAME"),
}
storage_options

output_path = UPath("abfs://hipscat/pytests/data", protocol="abfs", **storage_options)
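
Because `os.environ.get` returns `None` for an unset variable, a missing credential would only surface later as a failed cloud write. A minimal sketch of an early check, assuming the two environment variable names used above; the helper `load_storage_options` is hypothetical and not part of hipscat-import:

```python
import os

# Environment variables read by the cell above, keyed by storage option name.
REQUIRED_VARS = {
    "account_key": "ABFS_LINCCDATA_ACCOUNT_KEY",
    "account_name": "ABFS_LINCCDATA_ACCOUNT_NAME",
}


def load_storage_options(environ=None):
    """Hypothetical helper: build storage options, failing fast on unset variables."""
    environ = os.environ if environ is None else environ
    missing = [var for var in REQUIRED_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {option: environ[var] for option, var in REQUIRED_VARS.items()}
```

Calling `storage_options = load_storage_options()` in place of the dictionary literal would raise before any pipeline work starts.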

### small_sky

This catalog was generated with the following snippet:

In [None]:
args = ImportArguments(
    input_path="small_sky_parts",
    highest_healpix_order=1,
    file_reader="csv",
    output_path=output_path,
    output_artifact_name="small_sky",
    tmp_dir=tmp_dir,
    dask_tmp=tmp_dir,
)
runner.pipeline(args)

### small_sky_order1

This catalog has the same data points as the other small sky catalogs, but the points are forced to spread over partitions at order 1 instead of order 0.

This means there are 4 leaf partition files instead of just 1, which makes the catalog useful for confirming reads and writes over multiple leaf partition files.

NB: Setting `constant_healpix_order` coerces the import pipeline to create leaf partitions at order 1.

This catalog was generated with the following snippet:

In [None]:
args = ImportArguments(
    input_path="small_sky_parts",
    file_reader="csv",
    constant_healpix_order=1,
    output_path=output_path,
    output_artifact_name="small_sky_order1",
    tmp_dir=tmp_dir,
    dask_tmp=tmp_dir,
)
runner.pipeline(args)
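
The count of 4 leaf partitions follows from HEALPix nesting: each pixel splits into four children at the next order. A small sketch of that arithmetic, assuming the NESTED numbering scheme (in which the children of pixel `p` are `4p` to `4p + 3`); `child_pixels` is a hypothetical helper, not a hipscat function:

```python
def child_pixels(pixel, orders_down=1):
    """Return the NESTED-scheme descendants of `pixel`, `orders_down` orders finer."""
    factor = 4 ** orders_down
    return list(range(pixel * factor, (pixel + 1) * factor))


# The order 0, pixel 11 region holding the small sky data, split at order 1.
print(child_pixels(11))  # [44, 45, 46, 47]
```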

### small_sky_object_index

An index table mapping each value of the `"id"` field in the `small_sky_order1` catalog to the pixel it can be found in.

In [None]:
args = IndexArguments(
    input_catalog_path="small_sky_order1",
    indexing_column="id",
    output_path=output_path,
    output_artifact_name="small_sky_object_index",
    tmp_dir=tmp_dir,
    dask_tmp=tmp_dir,
)
runner.pipeline(args)

### small_sky_order1_margin

A margin cache for the `small_sky_order1` catalog.

In [None]:
margin_args = MarginCacheArguments(
    margin_threshold=7200,
    input_catalog_path="small_sky_order1",
    output_path=output_path,
    output_artifact_name="small_sky_order1_margin",
    tmp_dir=tmp_dir,
    dask_tmp=tmp_dir,
)
runner.pipeline(margin_args)
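
To put the `margin_threshold` value above in perspective: assuming the threshold is given in arcseconds (worth confirming against the hipscat-import documentation for the installed version), 7200 corresponds to 2 degrees. A short sketch of the conversion, with `arcsec_to_degrees` as a hypothetical helper:

```python
ARCSEC_PER_DEGREE = 3600.0


def arcsec_to_degrees(arcsec):
    """Convert an angle in arcseconds to degrees."""
    return arcsec / ARCSEC_PER_DEGREE


# Assumption: margin_threshold is expressed in arcseconds.
print(arcsec_to_degrees(7200))  # 2.0
```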

### small_sky_xmatch

This catalog was generated from `xmatch/xmatch_catalog_raw.csv` with the following snippet:

In [None]:
args = ImportArguments(
    input_file_list=["xmatch/xmatch_catalog_raw.csv"],
    file_reader="csv",
    constant_healpix_order=1,
    output_path=output_path,
    output_artifact_name="small_sky_xmatch",
    pixel_threshold=100,
    tmp_dir=tmp_dir,
    dask_tmp=tmp_dir,
)
runner.pipeline(args)

In [None]:
tmp_path.cleanup()
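
An explicit `cleanup()` call is skipped if an earlier cell raises. As an alternative sketch for script-style use (not how this notebook is organized), `tempfile.TemporaryDirectory` can serve as a context manager so the directory is removed even on failure; `run_with_tmp_dir` and its `work` callback are hypothetical names:

```python
import os
import tempfile


def run_with_tmp_dir(work):
    """Call `work(tmp_dir)` with a temporary directory that is always cleaned up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        return work(tmp_dir)


# Example: the directory exists inside the callback and is gone afterwards.
seen = run_with_tmp_dir(lambda tmp_dir: (tmp_dir, os.path.isdir(tmp_dir)))
print(seen[1], os.path.isdir(seen[0]))  # True False
```

In a script, the pipeline calls would go inside the callback, with `tmp_dir` passed to both the `tmp_dir` and `dask_tmp` arguments as in the cells above.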