# Overview

The Measurement Set v4 (MS v4) schema defines how correlated data (interferometer visibilities and single-dish spectra) can be represented in memory using datasets that consist of n-dimensional arrays labeled with coordinates and meta-information contained in attributes (see [foundational reading](https://xradio.readthedocs.io/en/latest/overview.html#Foundational-Reading)). The MS v4 implementation differs from the `MS v2` implementation in [`casacore`](https://github.com/casacore/casacore) primarily in its use of Python and off-the-shelf packages for data structures and data serialization, contrasting with casacore's bespoke C++ approach.

Reference documents consulted for the MS v4 schema design:

- [MeasurementSet definition version 2.0](https://casacore.github.io/casacore-notes/229.pdf)
- [MeasurementSet definition version 3.0β](https://casacore.github.io/casacore-notes/264.pdf)
- [MeasurementSet VLBI (Very Long Baseline Interferometry) extensions](https://casacore.github.io/casacore-notes/265.pdf)
- [CASA Ephemeris Data](https://casadocs.readthedocs.io/en/latest/notebooks/external-data.html#Ephemeris-Data)
- [ASDM (Astronomy Science Data Model): SDM Tables Short Description](https://drive.google.com/file/d/16a3g0GQxgcO7N_ZabfdtexQ8r2jRbYIS/view)

The current MS v4 schema focuses on offline processing capabilities and does not encompass all information present in the ASDM. However, its design allows for future expansion to incorporate additional data as needed (see [schema versioning](https://xradio.readthedocs.io/en/latest/overview.html#Schema-Versioning) section). It's important to note that MS v4 is not backward compatible with either MS v2 or MS v3, representing a significant evolution in the data model.

## Key Changes

- Data is stored in Datasets of labeled n-dimensional arrays (called data variables) instead of tables.
- The table concept of rows has been replaced by relevant dimensions. For example, the VISIBILITY column in the MAIN table of MS v2 is now an n-dimensional array with dimensions time x baseline x frequency x polarization (row has been split into time x baseline).
- Most keys that used to be implicit numbered indices have been changed to descriptive names. This improves code readability, allows for sub-selecting data without reindexing, and enables easy data combination. For example, `antenna_name` is used instead of `antenna_id`.
- The concept of data description is deprecated and replaced by `spectral_window_name` and `polarization_setup`.
- Versioning of the VISIBILITY/SPECTRUM, WEIGHT, UVW, and FLAG data variables is done using [data groups](https://xradio.readthedocs.io/en/latest/measurement_set_overview.html#Data-Groups).
- The ephemeris schema is based on the [JPL Horizons ephemerides](https://casadocs.readthedocs.io/en/latest/notebooks/external-data.html#Ephemeris-Data) rather than on the MS v2 definition.
- Field, source, and ephemeris data have been combined into a single dataset.
- Antenna and feed data have been combined into a single dataset (an MS v4 can only have one feed type per antenna).
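
The first three changes can be illustrated with a small sketch. It assumes `numpy` and `xarray` are installed and builds a toy dataset from MS v2-style rows. The array sizes, antenna names, the dimension name `baseline_id`, and the coordinate names `baseline_antenna1_name` and `baseline_antenna2_name` are assumptions made for illustration; consult the schema itself for the authoritative names.

```python
import numpy as np
import xarray as xr

# Toy sizes (illustrative assumptions, not schema requirements).
n_time, n_baseline, n_freq, n_pol = 3, 3, 4, 2

# MS v2 style: one row per (time, baseline) pair, ordered time-major.
rows = np.arange(n_time * n_baseline * n_freq * n_pol, dtype=float)
rows = rows.reshape(n_time * n_baseline, n_freq, n_pol)

# MS v4 style: the row axis is split into explicit time and baseline dimensions.
vis = rows.reshape(n_time, n_baseline, n_freq, n_pol)

# Hypothetical antenna names for the three baselines.
ant1 = ["DA41", "DA41", "DA42"]
ant2 = ["DA42", "DA43", "DA43"]

ms_xds = xr.Dataset(
    data_vars={
        "VISIBILITY": (("time", "baseline_id", "frequency", "polarization"), vis),
    },
    coords={
        "time": np.arange(n_time) * 10.0,
        "baseline_id": np.arange(n_baseline),
        "frequency": np.linspace(1.0e9, 1.3e9, n_freq),
        "polarization": ["XX", "YY"],
        "baseline_antenna1_name": ("baseline_id", ant1),
        "baseline_antenna2_name": ("baseline_id", ant2),
    },
)

# Select every baseline that includes antenna "DA42" by name, without reindexing.
has_da42 = (ms_xds.baseline_antenna1_name == "DA42") | (
    ms_xds.baseline_antenna2_name == "DA42"
)
sub_xds = ms_xds.isel(baseline_id=has_da42.values)
```

Because antennas are identified by name, the selected subset keeps meaningful labels and can be combined with other datasets without remapping integer IDs.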

## Use Cases

The MS v4 has been designed to satisfy the following use cases:

- Radio Interferometry 
- Single Dish Observations
- On-the-fly (OTF) Mosaic Observations
- Ephemeris Observations 
- Heterogeneous Antenna VLBI 
- Phased Array Stations (PAS)
- Phased Array Feeds (PAF)


| Dataset | Use Case |
|---------|----------|
| Antennae_North.cal.lsrk.split | ALMA Interferometer |
| small_meerkat | MeerKAT Interferometer |
| AA2-Mid-sim_00000 | SKA Mid Interferometer |
| small_lofar | LOFAR Interferometer |
| global_vlbi_gg084b_reduced | VLBI (VLBA+EVN) |
| VLBA_TL016B_split.ms | VLBA |
| ngEHT_E17A10.0.bin0000.source0000_split | ngEHT |
| venus_ephem_test.ms | ALMA Ephemeris |
| sdimaging.ms | ALMA Single Dish |
| ALMA_uid___A002_X1003af4_X75a3.split.avg.ms | Ephemeris Mosaic (Sun) |
| VLASS3.2.sb45755730.eb46170641.60480.16266136574.split.v6.ms | VLASS (OTF Mosaic) |


## Deprecation List

- `field_ids`, `spw_ids`, `antenna_ids`, `source_ids`, `ephemeris_ids`, `feed_ids`, `polarization_ids`, `processor_ids` (`scan_numbers` and `baseline_id` remain)
- FIELD_DELAY_CENTER
- ddi
- SOURCE_PROPER_MOTION (use ephemeris)
- FEED_AXIS_OFFSET
- FEED_OFFSET
- BEAM_OFFSET
- POLARIZATION_RESPONSE
- BASELINE_REFERENCE
- TARGET
- SOURCE_OFFSET
- ON_SOURCE
- POINTING_OFFSET
- TRACKING
- Flag command
- Pulsars

Since the schema is under review, let us know if anything on this list is critical for your use case.

## Planned Future Work

- Beam models
- Interferometer model

## Schema Layout

Each `measurement_v4_set` contains the data for a single observation, spectral window, polarization setup, and observation mode (set of intents). <!-- TODO: say something about intents -->
Because of these simplifying assumptions, many of the current subtables are no longer needed. Each MS v4 also has a single feed and beam.

For mosaics where the phase center varies rapidly, such as VLASS, do not partition on field.
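
The partitioning rule can be sketched in plain Python. The record keys, intent names, and the `partition` helper below are all hypothetical and only illustrate the idea: each distinct combination of observation, spectral window, polarization setup, and set of intents yields one MS v4, and field is added to the key only on request.

```python
from collections import defaultdict

# Hypothetical row metadata; keys and values are illustrative only.
records = [
    {"observation": 0, "spw": "spw_0", "pol_setup": "XX,YY",
     "intents": ("OBSERVE_TARGET",), "field": "f0"},
    {"observation": 0, "spw": "spw_0", "pol_setup": "XX,YY",
     "intents": ("OBSERVE_TARGET",), "field": "f1"},
    {"observation": 0, "spw": "spw_1", "pol_setup": "XX,YY",
     "intents": ("OBSERVE_TARGET",), "field": "f0"},
    {"observation": 0, "spw": "spw_0", "pol_setup": "XX,YY",
     "intents": ("CALIBRATE_PHASE",), "field": "f2"},
]

BASE_KEYS = ("observation", "spw", "pol_setup", "intents")


def partition(records, extra_keys=()):
    """Group records so that each group shares one value for every key."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in BASE_KEYS + tuple(extra_keys))
        groups[key].append(rec)
    return dict(groups)


# Rapidly varying phase center (e.g. an OTF mosaic): do not partition on field.
mosaic_partitions = partition(records)

# Alternative: additionally split on field.
per_field_partitions = partition(records, extra_keys=("field",))
```

In this toy input, leaving field out of the key keeps the two target pointings that share a spectral window in one partition instead of splitting them.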

<!--Link to google drawing: https://docs.google.com/drawings/d/1afPe5oro26NMTkAKpK9iif0adNA0B4R9otLookOixvI/edit?usp=sharing -->

<img src="https://docs.google.com/drawings/d/e/2PACX-1vQVgjF5xNeIv8gpi2G3R8JXw2bNkVIUXdizIZluCGdnHc4z79ryW2fNUycJAd_CQh9sXLwdlx1oiAAX/pub?w=690&amp;h=510">


## Processing Set


## Data Groups 

```Python
# Example: ms_xds is the xarray Dataset of a single MS v4.
ms_xds.attrs['data_groups'] = {
    'base': {'correlated_data': 'VISIBILITY', 'flag': 'FLAG', 'weight': 'WEIGHT', 'uvw': 'UVW'},
    'imaging': {'correlated_data': 'VISIBILITY_CORRECTED', 'flag': 'FLAG', 'weight': 'WEIGHT_IMAGING', 'uvw': 'UVW'},
}
```
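
A data group can be read as a mapping from a role (`correlated_data`, `flag`, `weight`, `uvw`) to the name of the data variable that fills it. The following is a minimal sketch of how a consumer might resolve a group. It uses a plain dictionary in place of `ms_xds.attrs['data_groups']`, and `resolve_data_group` is a hypothetical helper, not part of any existing API.

```python
# Plain dict standing in for ms_xds.attrs['data_groups'] (same content as above).
data_groups = {
    "base": {
        "correlated_data": "VISIBILITY",
        "flag": "FLAG",
        "weight": "WEIGHT",
        "uvw": "UVW",
    },
    "imaging": {
        "correlated_data": "VISIBILITY_CORRECTED",
        "flag": "FLAG",
        "weight": "WEIGHT_IMAGING",
        "uvw": "UVW",
    },
}


def resolve_data_group(data_groups, group_name):
    """Return the role -> data variable name mapping for one data group."""
    if group_name not in data_groups:
        raise KeyError(
            f"unknown data group {group_name!r}; available: {sorted(data_groups)}"
        )
    return dict(data_groups[group_name])


imaging = resolve_data_group(data_groups, "imaging")

# Roles whose data variable differs between the 'base' and 'imaging' groups.
changed = {
    role for role, name in imaging.items() if data_groups["base"][role] != name
}
```

Because both groups point at the same `FLAG` and `UVW` variables, a new version of the data only needs to add the variables that actually change.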


# Subpackage Layout

We aim to support reading directly from WSU-ASDM, Zarr, and NetCDF.
Reading directly from MS v2 is not currently planned, since querying is expensive.

python-casacore will eventually be replaced.
The ASDM library does not exist yet.

Types of correlated datasets: visibility (interferometer) and spectrum (single dish).

<!-- TODO: add link to API -->

<!--Link to google drawing: https://docs.google.com/drawings/d/1PMnElSu6YcMC9ovLOTARdYlLtZn7SJhUBdIHZYX52iU/edit?usp=sharing -->


<!--Link to google drawing: https://docs.google.com/drawings/d/1P9AI3D4VzPGw8O72dPBkz1iED8-ITM96anLeru0dvgM/edit?usp=sharing -->

<img src="https://docs.google.com/drawings/d/e/2PACX-1vSxpmAjQ9Zmg2g5DqmwfiyE2i83Ci1EDeBuY7h8mHPYiokX-il9Omp2h7qmg5ZGbDPOJYUoCFHcL8E3/pub?w=885&h=706">


| Dataset | Telescope | Mosaic | Ephemeris | VLBI | PAS | PAF |
|---------|-----------|:------:|:---------:|:----:|:---:|:---:|
| Antennae_North.cal.lsrk.split | ALMA |  x  |    |   | | |
| AA2-Mid-sim_00000 | Simulated SKA Mid |   |     |   | | |
| small_meerkat | MeerKAT  |   |     |   |  | | 
| small_lofar | LOFAR  |   |     |   | x | | 
| global_vlbi_gg084b_reduced | VLBA+EVN |   |     |  x  |  | | 
| VLBA_TL016B_split.ms | VLBA |   |     |  x  |  | | 
| ngEHT_E17A10.0.bin0000.source0000_split | Simulated ngEHT |    |     |  x  |  | | 
| venus_ephem_test.ms | ALMA |  x  |   x  |   |  | | 
| ALMA_uid___A002_X1003af4_X75a3.split.avg.ms | ALMA |  x  |   x  |   |  | | 
| VLASS3.2.sb45755730.eb46170641.60480.16266136574.split.v6.ms | VLA (VLASS) | x  |    |   |  | | 
| askap_59750_altaz_2settings| ASKAP |  |    |   |  | x | 
| askap_59754_altaz_2weights_0| ASKAP |  |    |   |  | x | 
| askap_59754_altaz_2weights_15| ASKAP |  |    |   |  | x | 
| askap_59755_eq_interleave_0| ASKAP |  |    |   |  | x | 
| askap_59755_eq_interleave_15| ASKAP |  |    |   |  | x | 


| Dataset | Telescope | Multi-Target | OFF | Ephemeris | Scan Pattern |
|---------|-----------|--------------|-----|-----------|--------------|
| sdimaging | GBT |  | relative |  | raster |
| uid___A002_Xced5df_Xf9d9small | ALMA | | horizontal |  | raster |
| uid___A002_Xe3a5fd_Xe38e.small | ALMA | | relative | x | raster |
| uid___A002_X1015532_X1926f.small | ALMA | x | absolute |  | raster |
| uid___A002_Xae00c5_X2e6b.small | ALMA |  | relative | x | fast |


