# Processing Set Visibility Tutorial

This tutorial can be run on Google Colaboratory via this [link](https://colab.research.google.com/github/casangi/xradio/blob/main/docs/source/measurement_set/tutorials/ps_vis.ipynb).



# Preparation

## Import xradio

In [1]:
import os, pprint
from importlib.metadata import version

try:
    os.system("pip install --upgrade xradio")

    import xradio

    print("Using xradio version", version("xradio"))

except ImportError as exc:
    print(f"Could not import xradio: {exc}")

Collecting xradio
  Downloading xradio-0.0.44-py3-none-any.whl.metadata (4.5 kB)
Downloading xradio-0.0.44-py3-none-any.whl (202 kB)
Installing collected packages: xradio
  Attempting uninstall: xradio
    Found existing installation: xradio 0.0.43
    Uninstalling xradio-0.0.43:
      Successfully uninstalled xradio-0.0.43
Successfully installed xradio-0.0.44
Using xradio version 0.0.44


## Download example MSv2

In [2]:
import toolviper

toolviper.utils.data.download(file="Antennae_North.cal.lsrk.split.ms")

[2024-11-19 11:28:03,997]     INFO   toolviper:  Updating file metadata information ...

Antennae_North.cal.lsrk.split.ms.zip:   0%|          | 0.00/1.49M [00:00<?, ?iB/s]

# Processing Set

## Convert MSv2 => Processing Set (PS)

Before running the conversion function we can get an estimate of the resources that will be needed:

In [3]:
from xradio.measurement_set import estimate_conversion_memory_and_cores

msv2_name = "Antennae_North.cal.lsrk.split.ms"
mem_estimate, max_reasonable_cores, suggested_cores = estimate_conversion_memory_and_cores(msv2_name)
mem_estimate, max_reasonable_cores, suggested_cores

ImportError: cannot import name 'estimate_conversion_memory_and_cores' from 'xradio.measurement_set' (/home/fedemp/ws_xradio_dev/venv_xradio_python312/lib/python3.12/site-packages/xradio/measurement_set/__init__.py)

The function used to estimate resources gives:
- an estimate of memory required in GiB,
- a maximum "reasonable" number of cores to use when converting in parallel, which is the number of partitions or MSv4s in the output processing set,
- and a suggested number of cores to use, which as a rule of thumb is the maximum divided by 4.
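
The rule of thumb in the last item can be sketched in plain Python. The helper name `suggest_cores` and the rounding (integer division, with a floor of one core) are assumptions made for illustration; they are not taken from xradio.

```python
def suggest_cores(max_reasonable_cores: int) -> int:
    """Hypothetical helper: about a quarter of the maximum, never below 1."""
    # Integer division is an assumption; the tutorial only says "maximum / 4".
    return max(1, max_reasonable_cores // 4)


# Example: a processing set with 8 partitions (MSv4s).
print(suggest_cores(8))
```

With very small processing sets the floor of one core keeps the suggestion usable.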

If we want to run the conversion in parallel, using Dask, we can initialize a "VIPER" client. In this example we use a local Dask client with the suggested number of cores = Dask workers:

In [None]:
do_parallel = True
if do_parallel:
    from toolviper import dask
    viper_client = toolviper.dask.local_client(cores=suggested_cores)
    viper_client

Convert the example MeasurementSet v2 to Processing Set:

In [None]:
from xradio.measurement_set import convert_msv2_to_processing_set

convert_out = "Antennae_North.cal.lsrk.split.vis.zarr"
convert_msv2_to_processing_set(
    in_file=msv2_name,
    out_file=convert_out,
    overwrite=True,
)

## Lazy open PS

In [None]:
from xradio.measurement_set import open_processing_set
convert_out = "Antennae_North.cal.lsrk.split.vis.zarr"

ps = open_processing_set(convert_out, intents=["OBSERVE_TARGET#ON_SOURCE"])

In [None]:
ps.summary()

## PS Structure

A processing set is simply a dictionary of MSv4s (one per observation, field, intent, spectral window - polarization...):

In [None]:
len(ps)

In [None]:
ps.keys()

In [None]:
ps.plot_phase_centers()

In [None]:
ps.plot_antenna_positions()
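
Since a processing set is described above as a dictionary of MSv4s, the usual dictionary idioms should apply to it. The sketch below uses a plain `dict` with invented names and string placeholders standing in for the MSv4 datasets, so it runs without any data; it assumes the processing set supports the standard dictionary methods.

```python
# Stand-in for a processing set: keys are MSv4 names, values would be xarray datasets.
# The names and placeholder values are invented for illustration.
toy_ps = {
    "example_00": "<MSv4 dataset 0>",
    "example_01": "<MSv4 dataset 1>",
    "example_02": "<MSv4 dataset 2>",
}

# Number of MSv4s, analogous to len(ps).
n_msv4 = len(toy_ps)

# Iterate over names and datasets.
for name, msv4 in toy_ps.items():
    print(name, "->", msv4)

# Pick one MSv4 by name, analogous to ps["..."].
first_name = sorted(toy_ps)[0]
first_msv4 = toy_ps[first_name]
```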

## MSv4


## Main dataset

We can take one of the items of the Processing Set to look into the contents of that MSv4. Every MSv4 represents the data as an xarray dataset, similar to earlier CNGI prototypes. The data variables (visibilities, weights, flags, etc.) can be manipulated and used in computations through the xarray API.

In [None]:
main_xds = ps[
    "Antennae_North.cal.lsrk.split_00"
]

### Coordinates

In [None]:
main_xds

In [None]:
main_xds.polarization

In [None]:
main_xds.uvw_label

In [None]:
main_xds.coords["baseline_id"]

In [None]:
main_xds.time

### Data vars

In [None]:
main_xds.VISIBILITY

In [None]:
main_xds.FLAG

In [None]:
main_xds.VISIBILITY.max()

In [None]:
main_xds.VISIBILITY.max().compute()
# main_xds.VISIBILITY.max().values

## Metadata

The MS metadata can be found in the attributes of the `main_xds`. Metadata is stored in different ways:
- in additional xarray sub-datasets, "sub-xds"
- in attributes of coordinates and data variables
- in Python dictionaries.

Most sub-xds are found in the attributes of the `main_xds`, but there are also sub-xds in the attributes of some data variables.
An example of sub-xds of the `main_xds` is the antenna dataset (`antenna_xds`). An example of dictionary is the `partition_info` dict.
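
The three storage styles can be illustrated with a toy dataset built directly with xarray. This is only a sketch: the variable names mimic the tutorial, but all values, the `spectral_window_name` key in the dict, and the attribute contents are made up and do not come from a real MSv4.

```python
import numpy as np
import xarray as xr

# Toy antenna sub-dataset (values are made up).
toy_antenna_xds = xr.Dataset(
    {"ANTENNA_DISH_DIAMETER": ("antenna_name", np.array([12.0, 12.0, 7.0]))},
    coords={"antenna_name": ["A1", "A2", "A3"]},
)

# Toy main dataset carrying the three kinds of metadata.
toy_main_xds = xr.Dataset(
    {"VISIBILITY": (("time", "frequency"), np.zeros((2, 3), dtype=complex))},
    coords={
        # Metadata in the attributes of a coordinate.
        "time": ("time", [0.0, 6.0], {"type": "time", "units": ["s"]}),
        "frequency": [1.0e11, 2.0e11, 3.0e11],
    },
    attrs={
        # Metadata as a sub-xds.
        "antenna_xds": toy_antenna_xds,
        # Metadata as a plain Python dictionary (hypothetical key).
        "partition_info": {"spectral_window_name": "toy_spw"},
    },
)

print(type(toy_main_xds.attrs["antenna_xds"]).__name__)
print(toy_main_xds.time.attrs)
print(toy_main_xds.attrs["partition_info"])
```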

### Metadata in sub-xds. Antenna dataset

The MSv4 has xarray datasets in its attributes that represent metadata that includes n-dimensional arrays. Some examples are the `antenna_xds`, `weather_xds` and `pointing_xds`. These are the equivalent of some subtables of the MSv2. Let's look into the antenna sub-xds:


In [None]:
ant_xds = main_xds.attrs["antenna_xds"]

In [None]:
ant_xds

As an xarray dataset, the antenna sub-xds can be used via the same API as the main xds.

In [None]:
ant_xds.ANTENNA_POSITION  # .values to load and see them

In [None]:
ant_xds.antenna_name.values

In [None]:
ant_xds.ANTENNA_DISH_DIAMETER

In [None]:
ant_xds.ANTENNA_RECEPTOR_ANGLE

### Attributes of Data Arrays and Coordinates. Quantities and Measures

Data variables and coordinates can have quantity and measures information in their attributes section along with other relevant metadata. These measures are specified as dictionaries in the attributes of the data variable or coordinate, with keys `units` and `type` in addition to other keys depending on the type of measure. The naming conventions are based on `astropy`. For example, a quantity of casacore/`position` type, such as the antenna positions, is a measure with `type: "location"`.

For reference, this is the list of measures in the current Processing Set/MSv4 spec:
https://docs.google.com/spreadsheets/d/1KIaYp6Qru1appToleyVqRdOEy9hmPhirpg0yR3ovAx0/edit?gid=1504318014#gid=1504318014. As another example of the astropy-based naming, a casacore `direction` is a `sky_coord`.



#### Time coordinate
The time coordinate is a time measure (keys: `type`, `units`, `time_scale`, `format`), but its attributes also contain, for example, `integration_time`, which is a quantity.

In [None]:
main_xds.time

##### Quantities and measures that are not xarray

When a quantity or a measure is not an xarray object, it is specified as a dictionary with a format based on xarray's [xarray.DataArray.from_dict()](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.from_dict.html) and it has the following keys:
`{"dims": ..., "data": ..., "attrs": quantity/measures_dict}`. The `integration_time` attribute included in the attributes of the time coordinate is an example where we can see the metadata of a time measure:

In [None]:
pprint.pprint(main_xds.time.attrs)
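
To illustrate the `{"dims", "data", "attrs"}` layout, the sketch below builds such a dictionary by hand and converts it with `xarray.DataArray.from_dict()`. The numeric value and the contents of `attrs` are invented for this example and are not read from the dataset.

```python
import xarray as xr

# Made-up integration time quantity in the {"dims", "data", "attrs"} layout.
integration_time = {
    "dims": [],
    "data": 6.048,
    "attrs": {"type": "quantity", "units": ["s"]},  # illustrative values
}

# xarray can rebuild a DataArray from this layout.
da = xr.DataArray.from_dict(integration_time)
print(da.values, da.attrs["units"])
```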

#### Frequency coordinate

The `frequency` coordinate is a `spectral_coord` measure and as such has the following keys in its attributes: `type`, `units`, and `observer`. In addition, the attributes contain fields such as `channel_width`, `spectral_window_name`, and `reference_frequency`.

Any metadata that is a quantity or measure (non-id numbers) is placed in the relevant measures or quantity dictionary.

In [None]:
main_xds.frequency

In the frequency coordinate we have examples of:
- quantity given as a dict: `channel_width`
- measure given as a dict: `reference_frequency` (a `spectral_coord` ~= casacore/frequency)

In [None]:
pprint.pprint(main_xds.frequency.attrs)

### Metadata in dicts. Observation, processor and partition info.

The MSv4 also allows for info dictionaries in the attribute section of the dataset. These are used when no n-dimensional data is required. The relevant measures metadata is included, as is done for (non-id) coordinates and data variables in xarray datasets.

An MSv4 has observation and processor info dicts, for example:

In [None]:
main_xds.observation_info

In [None]:
main_xds.processor_info

Another example is the `partition_info` dict, which describes the partition of the original MSv2 that is included in the `main_xds`:

In [None]:
main_xds.partition_info

### Metadata in sub-xds of data variables. Field_and_source sub-dataset.

A special example of sub-xds is the `field_and_source_xds`, which is included in the attributes of the VISIBILITY data variable. This way, transformations applied on the visibilities can be reflected in variables such as the field phase center or the source direction. Here data variables such as `FIELD_PHASE_CENTER` or `SOURCE_DIRECTION` are stored as `sky_coord` measures (their attributes contain the following keys: `type`, `units`, `frame`).

In [None]:
field_and_source_xds = main_xds.VISIBILITY.field_and_source_xds

In [None]:
field_and_source_xds

In [None]:
field_and_source_xds.FIELD_PHASE_CENTER

In [None]:
field_and_source_xds.SOURCE_LOCATION

## Selection examples

One can use the usual selection functionality of xarray with all arrays, the main dataset and all sub datasets. For example, selection by labels, `sel()`:

In [None]:
sel_xds = main_xds.sel(frequency=slice(3.43939e11, 3.4397e11))
sel_xds.frequency

Or selection by indices, `isel()`:

In [None]:
isel_xds = main_xds.isel(frequency=slice(1, 4))
isel_xds.frequency

In [None]:
sel_xds.equals(isel_xds)

In [None]:
sel_xds.identical(isel_xds)
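
The difference between the two selections and between the two comparisons can be seen on a small synthetic dataset. The sketch below does not use the tutorial data; the variable `WEIGHT`, the frequencies and the `note` attribute are all made up.

```python
import numpy as np
import xarray as xr

freqs = np.array([1.0e11, 2.0e11, 3.0e11, 4.0e11, 5.0e11])
toy_xds = xr.Dataset(
    {"WEIGHT": ("frequency", np.arange(5.0))},
    coords={"frequency": freqs},
    attrs={"note": "toy"},
)

# Label-based slices include both end points ...
by_label = toy_xds.sel(frequency=slice(2.0e11, 4.0e11))
# ... while index-based slices exclude the stop index.
by_index = toy_xds.isel(frequency=slice(1, 4))

# equals() compares dimensions, coordinates and values.
same_values = by_label.equals(by_index)
# identical() additionally compares names and attributes.
same_everything = by_label.identical(by_index)

# Changing only an attribute breaks identical() but not equals().
changed = by_index.assign_attrs(note="changed")
still_equal = by_label.equals(changed)
still_identical = by_label.identical(changed)

print(same_values, same_everything, still_equal, still_identical)
```

Both selections pick the same three channels here, so `equals()` and `identical()` agree until an attribute is modified.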