## Import xradio

In [None]:
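import pprint
import subprocess
import sys
from importlib.metadata import version

try:
    # Install/upgrade xradio into the same interpreter that runs this notebook
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "xradio"], check=False)

    import xradio

    print("Using xradio version", version("xradio"))

except ImportError as exc:
    print(f"Could not import xradio: {exc}")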

# Preparation

## Download example MSv2

In [None]:
import toolviper

msv2_name = "Antennae_North.cal.lsrk.ms"
toolviper.utils.data.download(file=msv2_name)

# Processing Set

## Convert MSv2 => Processing Set (PS)

In [None]:
from xradio.correlated_data.convert_msv2_to_processing_set import convert_msv2_to_processing_set

msv2_name = "Antennae_North.cal.lsrk.ms"
convert_out = "Antennae_North.cal.lsrk.vis.zarr"
convert_msv2_to_processing_set(
    in_file=msv2_name,
    out_file=convert_out,
    overwrite=True,
)

## Lazy read PS

In [None]:
from xradio.correlated_data import open_processing_set

convert_out = "Antennae_North.cal.lsrk.vis.zarr"
intents = ["OBSERVE_TARGET#ON_SOURCE"]
ps = open_processing_set(convert_out, intents=intents)

In [None]:
ps.summary()

## PS Structure

A processing set is simply a dictionary of MSv4s (one per observation, field, intent, spectral window/polarization setup, etc.):

In [None]:
len(ps)

In [None]:
ps.keys()
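Since the processing set is a dictionary, the usual dict operations (`len`, `keys`, `items`, indexing by key) apply. The sketch below is a toy stand-in, not the real `ps`: a plain `dict` of small xarray Datasets whose names, dimensions and sizes are made up, so that it runs without the example data.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a processing set: a plain dict of xarray Datasets.
# The keys, dimension names and sizes are hypothetical, not read from the example MS.
toy_ps = {
    f"toy_ms_{i:02d}": xr.Dataset(
        {"VISIBILITY": (("time", "frequency"), np.zeros((3, n_chan), dtype=complex))},
        coords={"time": np.arange(3.0), "frequency": np.linspace(1.0e11, 1.1e11, n_chan)},
    )
    for i, n_chan in enumerate([4, 8])
}

print(len(toy_ps))   # same kind of call as len(ps) above
print(list(toy_ps))  # same information as ps.keys()

# items() yields (name, MSv4-like dataset) pairs
for name, xds in toy_ps.items():
    print(name, dict(xds.sizes))
```

With the real processing set, the same loop over `ps.items()` should visit every MSv4, assuming it keeps the standard dict interface.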

# MSv4


## Main dataset

We can take one item of the processing set to inspect the contents of that MSv4. Every MSv4 represents the data as an xarray dataset, similarly to earlier CNGI prototypes. The data variables (visibilities, weights, flags, etc.) can be manipulated and used in computations through the xarray API.

In [None]:
main_xds = ps["Antennae_North.cal.lsrk_01"]  # the key must be one of ps.keys()

In [None]:
main_xds

#### Coordinates

In [None]:
main_xds.polarization

In [None]:
main_xds.uvw_label

In [None]:
main_xds.coords["baseline_id"]

In [None]:
main_xds.time

#### Data vars

In [None]:
main_xds.VISIBILITY

In [None]:
main_xds.FLAG

In [None]:
main_xds.VISIBILITY.max()

In [None]:
main_xds.VISIBILITY.max().compute()
# main_xds.VISIBILITY.max().values
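As one sketch of computing with the data variables, flagged samples can be excluded by masking `VISIBILITY` with `FLAG` before reducing. The toy dataset below uses made-up values and reuses the dimension names seen above. On the real data, assuming `FLAG` shares the dimensions of `VISIBILITY`, the equivalent `main_xds.VISIBILITY.where(~main_xds.FLAG)` would stay lazy until `.compute()` is called.

```python
import numpy as np
import xarray as xr

# Toy main dataset with made-up values: 2 times x 3 channels x 2 polarizations
vis = np.array(
    [[[1 + 1j, 2 + 0j], [3 + 4j, 0 + 1j], [5 + 0j, 1 + 1j]],
     [[0 + 2j, 1 + 0j], [6 + 8j, 2 + 2j], [1 + 0j, 0 + 0j]]]
)
flag = np.zeros(vis.shape, dtype=bool)
flag[1, 1, 0] = True  # flag the 6+8j sample (amplitude 10)

dims = ("time", "frequency", "polarization")
toy_xds = xr.Dataset(
    {"VISIBILITY": (dims, vis), "FLAG": (dims, flag)},
    coords={"polarization": ["XX", "YY"]},
)

# Amplitude of the visibilities; flagged samples become NaN
amp = np.abs(toy_xds.VISIBILITY).where(~toy_xds.FLAG)

# xarray reductions skip NaN by default for float data
max_amp = float(amp.max())
mean_amp_per_pol = amp.mean(dim=("time", "frequency"))
print(max_amp)
print(mean_amp_per_pol.values)
```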

## Metadata

The MS metadata can be found in the attributes of `main_xds`. Metadata is stored in different ways:
- in additional xarray (sub)datasets, "sub-xds"
- in attributes of coordinates and data variables
- in Python dictionaries.

An example of a sub-xds is the antenna dataset. An example of a dictionary is the field info dict.

### Metadata in sub-xds. Antenna dataset

The MSv4 stores, in its attributes, xarray datasets that hold metadata including n-dimensional arrays. These are the equivalent of the MSv2 subtables. Let's look into the antenna sub-xds:


In [None]:
ant_xds = main_xds.attrs["antenna_xds"]

In [None]:
ant_xds

As an xarray dataset, the antenna sub-xds can be used via the same API as the main xds.

In [None]:
ant_xds.ANTENNA_POSITION  # .values to load and see them
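A sketch of the same idea on toy data: a small antenna sub-xds is kept in the attrs of a toy main dataset and then queried with ordinary selection, arithmetic and reductions. The antenna names, positions and the `cartesian_pos_label` dimension are hypothetical, and the list-valued `units` attribute is an assumption.

```python
import numpy as np
import xarray as xr

# Toy antenna sub-xds: names, positions (m) and dimension names are hypothetical
ant_toy = xr.Dataset(
    {
        "ANTENNA_POSITION": (
            ("antenna_name", "cartesian_pos_label"),
            np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 3.0]]),
            {"type": "earth_location", "units": ["m", "m", "m"]},
        )
    },
    coords={"antenna_name": ["A01", "A02"], "cartesian_pos_label": ["x", "y", "z"]},
)

# A sub-xds is just a Dataset kept in the attrs of the main dataset
main_toy = xr.Dataset(attrs={"antenna_xds": ant_toy})
sub = main_toy.attrs["antenna_xds"]

# Same xarray API as for the main xds: label-based selection, arithmetic, reductions
pos = sub.ANTENNA_POSITION
print(pos.sel(antenna_name="A02").values)
offset = pos.diff("antenna_name")  # A02 minus A01, per cartesian component
baseline_length = float(np.sqrt((offset**2).sum()))
print(baseline_length, pos.attrs["units"])
```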

### Attributes of Data Arrays and Coordinates. Quantities and Measures

All data variables and coordinates can carry quantity and measure information in their attributes, along with other relevant metadata. These measures are specified as dictionaries in the attributes of the data variable or coordinate, with the keys `units` and `type` plus other keys that depend on the type of quantity. The naming conventions are based on `astropy`. For example, a casacore `position` measure, such as the antenna positions, has `type: "earth_location"`.

For reference, the list of measures in the current Processing Set/MSv4 spec is available at
https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit#gid=1504318014. For example, a casacore `direction` is a `sky_coord`.



#### Time coordinate
The time coordinate is a time measure (keys: `type`, `units`, `time_scale`, `format`). Its attributes also hold other metadata, for example `integration_time`, which is a quantity.

In [None]:
main_xds.time
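The measure keys say how to interpret the raw numbers, so code should check them before converting values. The sketch below uses only the standard library and hypothetical attrs: the values `"utc"`, `"unix"` and seconds are assumptions chosen for illustration, and only that one case is handled. For real work, `astropy.time.Time(value, format="unix", scale="utc")` covers many more formats and scales.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical time-measure attrs using the keys named above;
# the values ("utc", "unix", seconds) are assumptions for illustration only.
time_attrs = {"type": "time", "units": ["s"], "time_scale": "utc", "format": "unix"}


def to_datetime(value, attrs):
    """Convert one time value to an aware datetime, for the unix-seconds/UTC case only."""
    if attrs["format"] != "unix" or attrs["units"] != ["s"]:
        raise ValueError("this sketch only handles unix seconds")
    if attrs["time_scale"].lower() != "utc":
        raise ValueError("this sketch does not convert between time scales")
    # Unix time counts seconds since 1970-01-01 UTC, ignoring leap seconds
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=value)


print(to_datetime(0.0, time_attrs))
print(to_datetime(86400.5, time_attrs))
```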

##### Quantities and measures that are not xarray objects

When a quantity or a measure is not an xarray object, it is specified as a dictionary whose format is based on xarray's [xarray.DataArray.from_dict()](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.from_dict.html) and has the following keys:
`{"dims": ..., "data": ..., "attrs": quantity/measures_dict}`. The `integration_time` entry in the attributes of the time coordinate is an example:

In [None]:
pprint.pprint(main_xds.time.attrs)
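Because the format follows `xarray.DataArray.from_dict()`, such a dictionary can be turned back into a DataArray and serialized again with `to_dict()`. The dict below is hypothetical (a made-up value; `"type": "quantity"` and the list-valued `units` are assumptions), not the actual `integration_time` of the example MS. Assuming the real attribute follows the described layout exactly, `xr.DataArray.from_dict(main_xds.time.attrs["integration_time"])` would work the same way.

```python
import xarray as xr

# Hypothetical quantity dict in the {"dims", "data", "attrs"} layout
integration_time = {
    "dims": [],
    "data": 6.048,
    "attrs": {"type": "quantity", "units": ["s"]},
}

# dict -> DataArray: a 0-d array carrying the quantity metadata as attrs
da = xr.DataArray.from_dict(integration_time)
print(da.ndim, float(da), da.attrs["units"])

# DataArray -> dict: to_dict() gives the same layout, plus "coords" and "name"
back = da.to_dict()
print(back["data"], back["attrs"])
```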

#### Frequency coordinate

The `frequency` coordinate is a `spectral_coord` measure and as such has the following keys in its attributes: `type`, `units`, and `frame`. In addition, the attributes contain the `channel_width`, `spectral_window_name`, and `reference_frequency`.

Any metadata that is a quantity or measure (that is, numeric values other than ids) is stored in the corresponding quantity or measure dictionary.

In [None]:
main_xds.frequency

The frequency coordinate attributes include examples of:
- quantity given as a dict: `channel_width`
- measure given as a dict: `reference_frequency` (a `spectral_coord` ~= casacore/frequency)

In [None]:
pprint.pprint(main_xds.frequency.attrs)
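Because `channel_width` is stored as a quantity dict, it can be rebuilt as a DataArray and combined with the coordinate values, for example to get channel edges after checking that the units agree. The toy coordinate below uses made-up values; the `"LSRK"` frame string and the exact layout of `channel_width` are assumptions modelled on the description above.

```python
import numpy as np
import xarray as xr

# Toy frequency coordinate: made-up channel centres (Hz) and assumed attrs layout
chan_freq = 3.4390e11 + 4.0e6 * np.arange(4)
frequency = xr.DataArray(
    chan_freq,
    dims="frequency",
    attrs={
        "type": "spectral_coord",
        "units": ["Hz"],
        "frame": "LSRK",
        "channel_width": {"dims": [], "data": 4.0e6, "attrs": {"type": "quantity", "units": ["Hz"]}},
    },
)

# Rebuild the quantity and make sure units match before combining
width = xr.DataArray.from_dict(frequency.attrs["channel_width"])
if width.attrs["units"] != frequency.attrs["units"]:
    raise ValueError("channel_width and frequency use different units")

# Channel edges; arithmetic drops attrs by default
lower = frequency - width / 2
upper = frequency + width / 2
print(lower.values)
print(upper.values)
```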

### Metadata in dicts. Field info.

The MSv4 also allows info dictionaries in the attributes of the dataset. These are used when no n-dimensional data is required. The relevant measures metadata is included in the same way as for the (non-id) coordinates and data variables of the xarray datasets.

An example is the `field_info`, where `delay_direction`, `phase_direction`, and `reference_direction` are stored as `sky_coord` measures (keys: `type`, `units`, `reference_frame`).

In [None]:
main_xds.VISIBILITY.field_and_source_xds
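A `sky_coord` measure stored as a dict should be interpreted by reading its `units` before using the values. The sketch below is plain Python with a hypothetical `phase_direction` entry: the coordinates are made up, and both the `{"dims", "data", "attrs"}` layout and the `sky_dir_label` dimension name are assumptions.

```python
import math

# Hypothetical phase_direction measure with made-up coordinates
phase_direction = {
    "dims": ["sky_dir_label"],
    "data": [math.radians(180.4), math.radians(-18.87)],
    "attrs": {"type": "sky_coord", "units": ["rad", "rad"], "reference_frame": "fk5"},
}


def _to_degrees(value, unit):
    """Convert one angular component to degrees according to its unit string."""
    if unit == "rad":
        return math.degrees(value)
    if unit == "deg":
        return value
    raise ValueError(f"unsupported angular unit: {unit!r}")


def sky_coord_to_degrees(measure):
    """Return (lon_deg, lat_deg, frame) for a two-component sky_coord measure dict."""
    attrs = measure["attrs"]
    if attrs["type"] != "sky_coord":
        raise ValueError(f"not a sky_coord: {attrs['type']!r}")
    lon, lat = (_to_degrees(v, u) for v, u in zip(measure["data"], attrs["units"]))
    return lon, lat, attrs["reference_frame"]


print(sky_coord_to_degrees(phase_direction))
```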

## Selection examples

The usual xarray selection functionality works with all arrays, the main dataset and all sub-datasets. For example, selection by label with `sel()`:

In [None]:
sel_xds = main_xds.sel(frequency=slice(3.43939e11, 3.4397e11))
sel_xds.frequency

Or selection by index with `isel()`:

In [None]:
isel_xds = main_xds.isel(frequency=slice(1, 4))
isel_xds.frequency

In [None]:
sel_xds.equals(isel_xds)

In [None]:
sel_xds.identical(isel_xds)

In [None]:
isel_xds

In [None]:
sel_xds
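The toy example below reproduces the comparison above on made-up data. A label-based `slice` in `sel()` includes both bounds, so the bounds only need to bracket the wanted channel frequencies, while `isel()` uses ordinary Python slicing with the end excluded. `equals()` compares dimensions, coordinates and values; `identical()` additionally compares names and attributes.

```python
import numpy as np
import xarray as xr

# Toy dataset with made-up channel frequencies (Hz) and attributes
freq = 3.4390e11 + 1.0e7 * np.arange(6)
toy_xds = xr.Dataset(
    {"VISIBILITY": ("frequency", np.arange(6) * (1 + 1j))},
    coords={"frequency": freq},
    attrs={"note": "toy"},
)

# Label-based selection: both slice bounds are inclusive
by_label = toy_xds.sel(frequency=slice(3.4391e11, 3.4393e11))
# Position-based selection: end index excluded
by_index = toy_xds.isel(frequency=slice(1, 4))

print(by_label.frequency.values)
print(by_label.equals(by_index), by_label.identical(by_index))

# Changing only the attributes keeps equals() True but makes identical() False
changed = by_index.assign_attrs(note="changed")
print(by_label.equals(changed), by_label.identical(changed))
```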