# Review Instructions

Please review the MSv4 `antenna_xds` schema and the XRADIO interface (`ps['MSv4_name'].antenna_xds`). Note that the PS (processing set) interface or the main_xds should not be reviewed.

The `antenna_xds` schema specification: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/edit#gid=257301047

## Preparatory Material
Go over Xarray nomenclature and selection syntax:
- https://docs.xarray.dev/en/latest/user-guide/terminology.html
- https://docs.xarray.dev/en/latest/user-guide/indexing.html

MSv2 and CASA documentation:
- MSv2 schema: https://casacore.github.io/casacore-notes/229.pdf
- MSv3 schema: https://casacore.github.io/casacore-notes/264.pdf

## `antenna_xds` Schema
The ANTENNA, FEED, and INTERFEROMETER_MODEL (VLBI) tables in the MSv2 contain closely related information:

- ANTENNA:
- FEED:
- INTERFEROMETER_MODEL (VLBI): (single field and spectral window)


Use cases:

## Key Questions to Answer
### Schema Questions
- 1.1) Are there missing use cases?
- 1.2) Is all the information present needed for offline processing?
- 1.3) Should we get rid of antenna_ids and move to just using antenna_name + "_" + station (this would also require a change to main_xds)? This would simplify doing baseline parallelism over multiple converted MS v2s since no reindexing would be required.
- 1.4) (VLBI) Instead of storing BASELINE_REFERENCE in main_xds can we store it in the antenna_xds? This would assume that for the duration of the MS v4 that the reference antennas remain constant.
- 1.5) Is the order of the dims correct (antenna_id)?
- 1.6) Should BEAM_OFFSET be sky_dir_label (Ra, Dec) or local_sky_label (Az, Alt)?
- 1.7) Do we need time dimension with BEAM_OFFSET, FEED_OFFSET, RECEPTOR_ANGLE, 
- 1.8) Should we add prefex to organize data variables? For example PHASE_DELAY -> VLBI_PHASE_DELAY?
- 1.9) Should we include the POLARIZATION_RESPONSE doesn't seem to be used?

  
### XRADIO
2.1) After reviewing the XARRAY documentation and the descriptions of the data variables in the `antenna_xds` schema, do you find the XARRAY interface intuitive and easy to use?


# Environment instructions

It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:

```bash
conda create --name xradio python=3.11 --no-default-packages
conda activate xradio
```

Clone the repository, checkout the review branch and do a local install:

```bash
git clone https://github.com/casangi/xradio.git
git checkout 168-review-ms_xdsattrsantenna_xds-schema-and-xradio-interface
cd xradio
pip install -e .
```

On macOS it is required to pre-install python-casacore using ```bash conda install -c conda-forge python-casacore```.

# Download Data

In [1]:
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set
from xradio.vis.read_processing_set import read_processing_set
import graphviper

graphviper.utils.data.download(file="VLBA_TL016B_split_lsrk.ms")

[[38;2;128;05;128m2024-08-06 19:34:12,815[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m  graphviper: [0m Updating file metadata information ...  
 

[[38;2;128;05;128m2024-08-06 19:34:13,742[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m  graphviper: [0m File exists: VLBA_TL016B_split_lsrk.ms 


# Start Dask cluster 
Choose an approriate number of cores and memory_limit (this is per core).

In [2]:
from graphviper.dask.client import local_client

viper_client = local_client(cores=4, memory_limit="4GB")
viper_client

[[38;2;128;05;128m2024-08-06 19:34:13,824[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m  graphviper: [0m Checking parameter values for [38;2;50;50;205mclient[0m.[38;2;50;50;205mlocal_client[0m 
[[38;2;128;05;128m2024-08-06 19:34:13,825[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m  graphviper: [0m Module path: [38;2;50;50;205m/Users/jsteeb/Dropbox/graphviper/[0m 
[[38;2;128;05;128m2024-08-06 19:34:14,487[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m Created client <MenrvaClient: 'tcp://127.0.0.1:62663' processes=4 threads=4, memory=14.90 GiB> 


0,1
Connection method: Cluster object,Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status,

0,1
Dashboard: http://127.0.0.1:8787/status,Workers: 4
Total threads: 4,Total memory: 14.90 GiB
Status: running,Using processes: True

0,1
Comm: tcp://127.0.0.1:62663,Workers: 4
Dashboard: http://127.0.0.1:8787/status,Total threads: 4
Started: Just now,Total memory: 14.90 GiB

0,1
Comm: tcp://127.0.0.1:62675,Total threads: 1
Dashboard: http://127.0.0.1:62678/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:62666,
Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-iluf60ua,Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-iluf60ua
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 64.36 MiB,Spilled bytes: 0 B
Read bytes: 0.0 B,Write bytes: 0.0 B

0,1
Comm: tcp://127.0.0.1:62676,Total threads: 1
Dashboard: http://127.0.0.1:62677/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:62668,
Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-xs0k8m_2,Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-xs0k8m_2
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 64.19 MiB,Spilled bytes: 0 B
Read bytes: 0.0 B,Write bytes: 0.0 B

0,1
Comm: tcp://127.0.0.1:62683,Total threads: 1
Dashboard: http://127.0.0.1:62684/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:62670,
Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-qt3k7dcy,Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-qt3k7dcy
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 65.30 MiB,Spilled bytes: 0 B
Read bytes: 0.0 B,Write bytes: 0.0 B

0,1
Comm: tcp://127.0.0.1:62674,Total threads: 1
Dashboard: http://127.0.0.1:62679/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:62672,
Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-hfce7142,Local directory: /var/folders/b7/dx896v1x4yjb9v6rvs_n2hs00000gp/T/dask-scratch-space/worker-hfce7142
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 64.88 MiB,Spilled bytes: 0 B
Read bytes: 0.0 B,Write bytes: 0.0 B


# Convert dataset

In [3]:
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set

in_file = "VLBA_TL016B_split_lsrk.ms"
out_file = "VLBA_TL016B_split_lsrk.vis.zarr"

convert_msv2_to_processing_set(
    in_file=in_file,
    out_file=out_file,
    parallel=False,
    overwrite=True,
)

[[38;2;128;05;128m2024-08-06 19:34:14,516[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m Partition scheme that will be used: ['DATA_DESC_ID', 'OBSERVATION_ID', 'FIELD_ID'] 
[[38;2;128;05;128m2024-08-06 19:34:14,569[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m Number of partitions: 4 
[[38;2;128;05;128m2024-08-06 19:34:14,570[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m OBSERVATION_ID [0], DDI [0], STATE [-1], FIELD [0], SCAN [0] 
[[38;2;128;05;128m2024-08-06 19:34:14,814[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m OBSERVATION_ID [0], DDI [0], STATE [-1], FIELD [1], SCAN [0] 
[[38;2;128;05;128m2024-08-06 19:34:15,100[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m OBSERVATION_ID [0], DDI [1], STATE [-1], FIELD [0], SCAN [0] 
[[38;2;128;05;128m2024-08-06 19:34:15,346[0m] [38;2;50;50;205m    INFO[0m[38;2;112;128;144m      client: [0m OBSERVATION_ID 

In [4]:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])

np.concatenate([a, b])


array([1, 2, 3, 4, 5, 5, 4, 3, 2, 1])

# Inspect Processing Set

In [5]:
import pandas as pd

# Set the maximum number of rows displayed before scrolling
pd.set_option("display.max_rows", 1000)

from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("VLBA_TL016B_split_lsrk.vis.zarr")
ps.summary()

Unnamed: 0,name,obs_mode,shape,polarization,spw_name,field_name,source_name,field_coords,start_frequency,end_frequency
0,VLBA_TL016B_split_lsrk_3,obs_0,"(540, 55, 6, 2)","[RR, LL]",spw_1,[J1154+6022_1],[Unknown],"[fk5, 11h54m04.54s, 60d22m20.82s]",5068199000.0,5070699000.0
1,VLBA_TL016B_split_lsrk_2,obs_0,"(200, 55, 6, 2)","[RR, LL]",spw_1,[4C39.25_0],[Unknown],"[fk5, 9h27m03.01s, 39d02m20.85s]",5068199000.0,5070699000.0
2,VLBA_TL016B_split_lsrk_0,obs_0,"(200, 55, 6, 2)","[RR, LL]",spw_0,[4C39.25_0],[Unknown],"[fk5, 9h27m03.01s, 39d02m20.85s]",5004196000.0,5006697000.0
3,VLBA_TL016B_split_lsrk_1,obs_0,"(540, 55, 6, 2)","[RR, LL]",spw_0,[J1154+6022_1],[Unknown],"[fk5, 11h54m04.54s, 60d22m20.82s]",5004196000.0,5006697000.0


# Inspect antenna_xds:

In [6]:
ant_xds = ps['VLBA_TL016B_split_lsrk_0'].attrs['antenna_xds'].load()
ant_xds

In [7]:
ant_xds.mount

In [8]:
import pandas as pd

# Assuming 'df' is your DataFrame and 'planets' is the column with the lists
df = pd.DataFrame({
    'planets': [['sun', 'neptune', 'jupiter'], ['sun', 'saturn','mars'], ['neptune'], ['mars', 'mecurary', 'venus']]
})

# Select all rows where the list in the 'planets' column contains 'sun'
df_sun = df[df['planets'].apply(lambda x: 'sun' in x)]

print(df_sun)

                   planets
0  [sun, neptune, jupiter]
1      [sun, saturn, mars]
