## Architecture

Each data schema supported by XRADIO is organized into its own sub-package, with a shared `_utils` directory that contains code common to multiple sub-packages. The current architecture includes the `measurement_set` and `image` sub-packages ([see the list of planned XRADIO schemas](https://xradio.readthedocs.io/en/latest/overview.html#XRADIO-Schemas)).

The user-facing API is implemented in the `.py` files located at the top level of each sub-package directory, while private functions are housed in a dedicated sub-directory, such as `_measurement_set`. This sub-directory contains folders for each supported storage backend, as well as a `_utils` folder for common functions used across backends.

For instance, in the `measurement_set` sub-package, XRADIO currently supports a `zarr`-based backend. It also offers limited support for Measurement Set v2 (`msv2`) through a conversion function that converts data from Measurement Set v2 (stored in Casacore tables) to Measurement Set v4 (stored using zarr).

<img src="https://docs.google.com/drawings/d/1afPe5oro26NMTkAKpK9iif0adNA0B4R9otLookOixvI/pub?w=943&amp;h=732">

<!--Link to google drawing: https://docs.google.com/drawings/d/1afPe5oro26NMTkAKpK9iif0adNA0B4R9otLookOixvI/edit?usp=sharing -->

## Software Framework

XRADIO is built using the following core packages:
- `xarray`: Provides the framework for defining and implementing data schemas.
- `dask` and `distributed`: Enable parallel execution for handling large datasets efficiently.
- `zarr` ([zarr specification](https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html)): Used as a storage backend for scalable, chunked data.
- `python-casacore` ([Casacore Table Data System (CTDS) File Formats](https://casacore.github.io/casacore-notes/260.pdf)): Used as a storage backend, with ongoing development toward a lightweight, pure Python replacement.
- `pyasdm` (under development): A Python-based storage backend in progress, designed for accessing ASDM (Astronomy Science Data Model) data.


## Schema Design

Before reading this section, please complete the [foundational reading](https://xradio.readthedocs.io/en/latest/overview.html#Foundational-Reading) on Xarray terminology.

For each schema, data is organized into:
- [xarray Datasets](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html): A multi-dimensional, in-memory array database of labeled n-dimensional arrays.
- `XRADIO Processing Sets`: An XRADIO-specific data structure, based on a Python dictionary, that holds a collection of `xarray Datasets`. We plan to investigate replacing the Processing Set with [`xarray Datatree`](https://xarray-datatree.readthedocs.io/en/latest/) in the future.
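
To make the "dictionary of Datasets" idea concrete, here is a minimal sketch that uses a plain Python `dict` as a stand-in for a Processing Set. The helper `make_dataset` and the keys `example_ms_0` and `example_ms_1` are made up for illustration; the actual XRADIO Processing Set class may offer more than a plain dictionary does.

```python
import numpy as np
import xarray as xr


def make_dataset(n_time: int) -> xr.Dataset:
    """Build a tiny placeholder Dataset (hypothetical helper, not part of XRADIO)."""
    return xr.Dataset(
        data_vars={"VISIBILITY": (("time",), np.zeros(n_time))},
        coords={"time": np.arange(n_time)},
    )


# A plain dict stands in for a Processing Set; the keys are invented names.
processing_set = {
    "example_ms_0": make_dataset(3),
    "example_ms_1": make_dataset(5),
}

# Iterate over the collection as with any dictionary.
sizes = {name: xds.sizes["time"] for name, xds in processing_set.items()}
print(sizes)
```

Each value is an ordinary `xarray.Dataset`, so the usual xarray selection and computation methods apply to every member of the collection.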


### Coordinates and Data Variables

In XRADIO, we follow these conventions when defining a data schema:
- **Coordinates**: Values used to label plots (e.g., numbers or strings). Coordinate names are always lower case, with words separated by underscores (e.g., `antenna_name`).
- **Data Variables**: Numerical values used for plotting. Data variable names are always upper case, with words separated by underscores (e.g., `FIELD_PHASE_CENTER`).

For instance, in the [Measurement Set v4 schema](https://xradio.readthedocs.io/en/latest/measurement_set/schema_and_api/measurement_set_schema.html), `antenna_name` and `frequency` are coordinates, while `VISIBILITY` is a data variable.
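
The naming convention can be illustrated with a small sketch. The Dataset below is a toy: the dimensions given to `VISIBILITY` are simplified for illustration and are not the real Measurement Set v4 layout, and `follows_naming_convention` is a hypothetical helper, not an XRADIO function.

```python
import numpy as np
import xarray as xr

# Toy dataset: lower-case coordinates, upper-case data variable.
xds = xr.Dataset(
    data_vars={
        "VISIBILITY": (
            ("antenna_name", "frequency"),
            np.zeros((2, 3), dtype=complex),
        ),
    },
    coords={
        "antenna_name": ["ant_a", "ant_b"],
        "frequency": [1.0e9, 1.1e9, 1.2e9],
    },
)


def follows_naming_convention(ds: xr.Dataset) -> bool:
    """Check that coordinate names are lower case and data variable names upper case."""
    coords_ok = all(str(name).islower() for name in ds.coords)
    data_vars_ok = all(str(name).isupper() for name in ds.data_vars)
    return coords_ok and data_vars_ok


print(follows_naming_convention(xds))
```

With this convention, the case of a name alone tells you whether it refers to a coordinate or a data variable.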

### Measures

Both data variables and coordinates can have additional metadata, such as associated coordinates and units, stored in their attributes. XRADIO’s measures are based on [`python-casacore` measures](https://casacore.github.io/python-casacore/casacore_measures.html), with updates to align with [astropy coordinate](https://docs.astropy.org/en/stable/coordinates/index.html) naming conventions. The table below outlines the different types of XRADIO measures:

<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQRZyrmK41kXbeaq1V7UFK8IDO5u-zIt5I-4xUbxjOX7oK5muw0vFufreSLMn23KOqtawWjkgtGyfTR/pubhtml?gid=1504318014&single=true" 
        width="80%" 
        height="600" 
        frameborder="0" 
        scrolling="no">
</iframe>

### Coordinate Labels

For some types of measures, the data consists of values that are labeled using coordinate labels. These labels provide context for interpreting the data:

<iframe src="https://docs.google.com/spreadsheets/d/e/2PACX-1vQRZyrmK41kXbeaq1V7UFK8IDO5u-zIt5I-4xUbxjOX7oK5muw0vFufreSLMn23KOqtawWjkgtGyfTR/pubhtml?gid=1901188197&single=true"
        width="100%" 
        height="600" 
        frameborder="0" 
        scrolling="no">
</iframe>


### Measures Example

The following example illustrates how measures information is included in both a data variable (`FIELD_PHASE_CENTER`) and a coordinate (`time`). The `FIELD_PHASE_CENTER` data variable has the dimensions `time` and `sky_dir_label`. Note that the `sky_coord` measure requires only the `sky_dir_label` dimension, not the `time` dimension. 

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Create an empty xarray Dataset.
    xds = xr.Dataset()

    # Create the time coordinate with time measure attributes.
    # The values are integer seconds since the Unix epoch.
    time_values = (
        pd.date_range("2000-01-01", periods=3)
        .to_numpy()
        .astype("datetime64[s]")
        .astype("int64")
    )
    time = xr.DataArray(
        time_values,
        dims="time",
        attrs={"type": "time", "units": "s", "format": "unix", "scale": "utc"},
    )

    # Create the FIELD_PHASE_CENTER data variable with dimensions time x sky_dir_label.
    coords = {"time": time, "sky_dir_label": ["ra", "dec"]}

    data = np.array(
        [
            [-2.10546176, -0.29611873],
            [-2.10521098, -0.29617315],
            [-2.1050196, -0.2961987],
        ]
    )

    xds["FIELD_PHASE_CENTER"] = xr.DataArray(
        data, coords=coords, dims=["time", "sky_dir_label"]
    )

    # Add sky_coord measure attributes to FIELD_PHASE_CENTER.
    xds["FIELD_PHASE_CENTER"].attrs = {
        "type": "sky_coord",
        "units": ["rad", "rad"],
        "frame": "icrs",
    }

    xds

The measures attributes can then be used to create an Astropy `SkyCoord` object from the `FIELD_PHASE_CENTER` data variable:

    from astropy.coordinates import SkyCoord

    phase_center = xds.FIELD_PHASE_CENTER
    astropy_skycoord = SkyCoord(
        ra=phase_center.sel(sky_dir_label="ra").values,
        dec=phase_center.sel(sky_dir_label="dec").values,
        unit="rad",
        frame=phase_center.attrs["frame"],
    )
    astropy_skycoord

Output:

    <SkyCoord (ICRS): (ra, dec) in deg
        [(239.36592723, -16.96635346), (239.38029586, -16.9694715 ),
         (239.39126113, -16.97093541)]>
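
The `time` coordinate's measure attributes can be used in a similar way. The sketch below converts Unix-second values into timezone-aware timestamps with pandas. The helper `time_measure_to_datetime` is hypothetical (it is not part of XRADIO), and mapping the `utc` scale directly to a UTC timezone is a simplifying assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Time coordinate with the same measure attributes as in the example above.
time = xr.DataArray(
    np.array([946684800, 946771200, 946857600]),  # seconds since the Unix epoch
    dims="time",
    attrs={"type": "time", "units": "s", "format": "unix", "scale": "utc"},
)


def time_measure_to_datetime(da: xr.DataArray) -> pd.DatetimeIndex:
    """Interpret a unix-format time measure using its attributes (illustrative only)."""
    attrs = da.attrs
    if attrs.get("type") != "time" or attrs.get("format") != "unix":
        raise ValueError("expected a time measure in unix format")
    # Assumption: a 'utc' scale is treated as the UTC timezone.
    return pd.to_datetime(
        da.values, unit=attrs["units"], utc=(attrs.get("scale") == "utc")
    )


timestamps = time_measure_to_datetime(time)
print(timestamps)
```

Reading the unit and format from the attributes, rather than hard-coding them, keeps the conversion tied to the metadata stored with the data.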

## Lazy and Eager Functions

- Functions prefixed with `open_` perform **lazy execution**: only metadata (such as coordinates and attributes) is loaded into memory. Data variables are not loaded immediately; they are represented as lazy [Dask Arrays](https://docs.dask.org/en/stable/generated/dask.array.Array.html), which load data into memory only when you explicitly call the `.compute()` method.

- Functions prefixed with `load_` perform **eager execution**, loading all data into memory immediately. These functions can be integrated with [dask.delayed](https://docs.dask.org/en/stable/delayed.html) for more flexible execution.
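
The difference between the two modes can be sketched with Dask alone, without any XRADIO calls. In the example below, `load_block` is a hypothetical stand-in for an eager `load_`-style function; it is not an XRADIO API.

```python
import dask
import dask.array as da
import numpy as np

# Lazy: building the array and the sum only records a task graph.
lazy_array = da.ones((4, 4), chunks=(2, 2))
lazy_total = lazy_array.sum()

# Values are produced only when .compute() is called.
total = lazy_total.compute()


def load_block(i: int) -> np.ndarray:
    """Hypothetical eager loader: returns a fully materialized NumPy array."""
    return np.full((2, 2), i, dtype=float)


# Wrapping the eager function in dask.delayed defers each call until compute time.
tasks = [dask.delayed(load_block)(i) for i in range(3)]
block_sums = dask.compute(*[dask.delayed(np.sum)(task) for task in tasks])

print(total, block_sums)
```

Wrapping eager loaders in `dask.delayed` lets you decide which pieces of data each task loads, while still leaving scheduling to Dask.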