# Review Instructions

Please review the MSv4 `processing_set` class: https://github.com/casangi/xradio/blob/main/src/xradio/vis/_processing_set.py

The processing set is a loose collection of MSv4 datasets that might come from multiple MSv2s (or ASDMs). Consequently, arbitrary ids are avoided in favor of descriptive strings.

Run the notebook twice, once with each of the following settings:
- `partition_scheme=['FIELD_ID']`
- `partition_scheme=[]`

## Key Questions to Answer
1) Is there additional information to display in the summary table?
2) Are the docstrings sufficient?
3) Are there missing data selection use cases?
4) ...

# Environment instructions

It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:

```bash
conda create --name xradio python=3.11 --no-default-packages
conda activate xradio
```

Clone the repository, check out the review branch, and do a local install:

```bash
git clone https://github.com/casangi/xradio.git
cd xradio
git checkout 213-fix-ps-selection
pip install -e .
```

On macOS, python-casacore must be installed first using `conda install -c conda-forge python-casacore`.
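Before running the notebook, it can help to confirm that the environment actually contains the packages used below. The following is a minimal sketch using only the standard library; the helper name `check_importable` is hypothetical, and `casacore` is the import name of the python-casacore package.

```python
import importlib.util


def check_importable(module_names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


# Packages this notebook relies on; "casacore" is provided by python-casacore.
for name, found in check_importable(["xradio", "graphviper", "casacore"]).items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any package reported as `MISSING` should be installed into the active conda environment before continuing.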

# Download Data

In [1]:
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set
from xradio.vis.read_processing_set import read_processing_set
import graphviper

graphviper.utils.data.download(file="ALMA_uid___A002_X1003af4_X75a3.split.avg.ms")

[2024-08-05 13:15:50,640]     INFO  graphviper:  Updating file metadata information ...
[2024-08-05 13:15:51,543]     INFO  graphviper:  File exists: ALMA_uid___A002_X1003af4_X75a3.split.avg.ms


# Start Dask cluster 
Choose an appropriate number of cores and `memory_limit` (the limit applies per core).

In [2]:
from graphviper.dask.client import local_client

viper_client = local_client(cores=4, memory_limit="4GB")
viper_client

[2024-08-05 13:15:51,633]     INFO  graphviper:  Checking parameter values for client.local_client
[2024-08-05 13:15:51,634]     INFO  graphviper:  Module path: /Users/jsteeb/Dropbox/graphviper/

Perhaps you already have a cluster running?
Hosting the HTTP server on port 61448 instead

[2024-08-05 13:15:52,323]     INFO      client:  Created client <MenrvaClient: 'tcp://127.0.0.1:61449' processes=4 threads=4, memory=14.90 GiB>


Client summary (from the notebook widget):
- Connection method: Cluster object; cluster type: distributed.LocalCluster
- Scheduler: tcp://127.0.0.1:61449; dashboard: http://127.0.0.1:61448/status
- Status: running; using processes: True
- Workers: 4; total threads: 4; total memory: 14.90 GiB
- Each worker: 1 thread, 3.73 GiB memory
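Because `memory_limit` applies per core, the total memory claimed by the cluster is the per-core limit multiplied by the number of cores. The sketch below reproduces the figure in the client log above (`memory=14.90 GiB`), assuming that `"4GB"` is interpreted as 4 × 10^9 bytes and that the log reports binary GiB. The helper name `cluster_memory_gib` is hypothetical.

```python
def cluster_memory_gib(cores, memory_limit_gb):
    """Convert a per-worker limit given in decimal GB into GiB, per worker and in total."""
    per_worker_bytes = memory_limit_gb * 10**9  # assumption: "4GB" means 4e9 bytes
    per_worker_gib = per_worker_bytes / 2**30   # 1 GiB = 2**30 bytes
    return per_worker_gib, per_worker_gib * cores


per_worker, total = cluster_memory_gib(cores=4, memory_limit_gb=4)
print(f"per worker: {per_worker:.2f} GiB, total: {total:.2f} GiB")
# prints "per worker: 3.73 GiB, total: 14.90 GiB"
```

When choosing `cores` and `memory_limit`, the total computed this way should stay below the memory physically available on the machine.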


# Convert dataset

In [3]:
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set
import shutil

in_file = "ALMA_uid___A002_X1003af4_X75a3.split.avg.ms"
out_file = "ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr"
shutil.rmtree(out_file, ignore_errors=True)  # Remove any previous conversion output

partition_scheme = ['FIELD_ID']  # Default; works with ephemeris_interpolate=True or False
# partition_scheme = []  # Rapid OTF mode; requires ephemeris_interpolate=True

convert_msv2_to_processing_set(
    in_file=in_file,
    out_file=out_file,
    parallel=True,
    overwrite=True,
    ephemeris_interpolate=True,
    partition_scheme=partition_scheme
)

[2024-08-05 13:15:53,690]     INFO      client:  Partition scheme that will be used: ['DATA_DESC_ID', 'OBS_MODE', 'OBSERVATION_ID', 'FIELD_ID']
[2024-08-05 13:15:54,396]     INFO      client:  Number of partitions: 96
[2024-08-05 13:15:54,396]     INFO      client:  OBSERVATION_ID [0], DDI [0], STATE [0], FIELD [0], SCAN [7]
[2024-08-05 13:15:54,397]     INFO      client:  OBSERVATION_ID [0], DDI [0], STATE [16], FIELD [0], SCAN [7]
[2024-08-05 13:15:54,397]     INFO      client:  OBSERVATION_ID [0], DDI [0], STATE [17], FIELD [0], SCAN [7]
[2024-08-05 13:15:54,398]     INFO      client:  OBS
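The log above shows the requested `['FIELD_ID']` being expanded to a four-key partition scheme. To illustrate why adding `FIELD_ID` increases the number of partitions, here is a toy sketch that groups rows by the unique combination of key values. It is not the xradio implementation: the helper `partition_rows` and the metadata rows are hypothetical, and treating the first three keys as an always-present base set is an assumption inferred from the log.

```python
from collections import defaultdict


def partition_rows(rows, keys):
    """Group row indices by the tuple of values found in the given key columns."""
    partitions = defaultdict(list)
    for index, row in enumerate(rows):
        partitions[tuple(row[key] for key in keys)].append(index)
    return dict(partitions)


# Hypothetical metadata rows (values are made up for illustration).
target = "OBSERVE_TARGET#ON_SOURCE"
rows = [
    {"DATA_DESC_ID": 0, "OBS_MODE": target, "OBSERVATION_ID": 0, "FIELD_ID": 0},
    {"DATA_DESC_ID": 0, "OBS_MODE": target, "OBSERVATION_ID": 0, "FIELD_ID": 1},
    {"DATA_DESC_ID": 1, "OBS_MODE": target, "OBSERVATION_ID": 0, "FIELD_ID": 0},
    {"DATA_DESC_ID": 1, "OBS_MODE": target, "OBSERVATION_ID": 0, "FIELD_ID": 1},
]

base_keys = ["DATA_DESC_ID", "OBS_MODE", "OBSERVATION_ID"]  # assumed base set
with_field = partition_rows(rows, base_keys + ["FIELD_ID"])  # like partition_scheme=['FIELD_ID']
without_field = partition_rows(rows, base_keys)              # like partition_scheme=[]
print(len(with_field), len(without_field))  # prints "4 2"
```

With `FIELD_ID` in the scheme each field lands in its own partition; without it, all fields sharing the remaining keys are kept together in one partition.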

# Inspect Processing Set

In [4]:
import pandas as pd

# Set the maximum number of rows displayed before scrolling
pd.set_option("display.max_rows", 1000)

from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr")
ps.summary()

| | name | obs_mode | shape | polarization | spw_name | field_name | source_name | field_coords | start_frequency | end_frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ALMA_uid___A002_X1003af4_X75a3.split.avg_52 | OBSERVE_TARGET#ON_SOURCE | (9, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_18] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
| 1 | ALMA_uid___A002_X1003af4_X75a3.split.avg_55 | OBSERVE_TARGET#ON_SOURCE | (9, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_21] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
| 2 | ALMA_uid___A002_X1003af4_X75a3.split.avg_63 | OBSERVE_TARGET#ON_SOURCE | (6, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_29] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
| 3 | ALMA_uid___A002_X1003af4_X75a3.split.avg_90 | OBSERVE_TARGET#ON_SOURCE | (9, 1326, 7, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_1#SW-01#FULL_RES_2 | [Sun_10_24] | [Sun_10_0] | Ephemeris | 229960900000.0 | 230054700000.0 |
| 4 | ALMA_uid___A002_X1003af4_X75a3.split.avg_64 | CALIBRATE_ATMOSPHERE#OFF_SOURCE,CALIBRATE_WVR#... | (2, 1326, 7, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_1#SW-01#FULL_RES_2 | [Sun_10_0] | [Sun_10_0] | Ephemeris | 229960900000.0 | 230054700000.0 |
| 5 | ALMA_uid___A002_X1003af4_X75a3.split.avg_30 | OBSERVE_TARGET#ON_SOURCE | (8, 51, 1, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_4#SQLD_0 | [Sun_10_28] | [Sun_10_0] | Ephemeris | 248000000000.0 | 248000000000.0 |
| 6 | ALMA_uid___A002_X1003af4_X75a3.split.avg_37 | OBSERVE_TARGET#ON_SOURCE | (9, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_3] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
| 7 | ALMA_uid___A002_X1003af4_X75a3.split.avg_39 | OBSERVE_TARGET#ON_SOURCE | (9, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_5] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
| 8 | ALMA_uid___A002_X1003af4_X75a3.split.avg_65 | CALIBRATE_ATMOSPHERE#AMBIENT,CALIBRATE_WVR#AMB... | (2, 1326, 7, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_1#SW-01#FULL_RES_2 | [Sun_10_0] | [Sun_10_0] | Ephemeris | 229960900000.0 | 230054700000.0 |
| 9 | ALMA_uid___A002_X1003af4_X75a3.split.avg_91 | OBSERVE_TARGET#ON_SOURCE | (9, 1326, 7, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_1#SW-01#FULL_RES_2 | [Sun_10_25] | [Sun_10_0] | Ephemeris | 229960900000.0 | 230054700000.0 |


# Using ps.sel() with summary table column names

In [5]:
# Note: the selection keeps or drops whole MSv4s; no selection is applied to the data inside each MSv4.
# An MSv4 that matches field_name=['Sun_10_10', 'Sun_10_11'] therefore keeps all of its fields.
# Select all Ephemeris data where any of the fields is 'Sun_10_10' or 'Sun_10_11'.
ps.sel(field_coords='Ephemeris', field_name=['Sun_10_10', 'Sun_10_11']).summary()



| | name | obs_mode | shape | polarization | spw_name | field_name | source_name | field_coords | start_frequency | end_frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ALMA_uid___A002_X1003af4_X75a3.split.avg_13 | OBSERVE_TARGET#ON_SOURCE | (12, 51, 1, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_4#SQLD_0 | [Sun_10_11] | [Sun_10_0] | Ephemeris | 248000000000.0 | 248000000000.0 |
| 1 | ALMA_uid___A002_X1003af4_X75a3.split.avg_76 | OBSERVE_TARGET#ON_SOURCE | (9, 1326, 7, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_1#SW-01#FULL_RES_2 | [Sun_10_10] | [Sun_10_0] | Ephemeris | 229960900000.0 | 230054700000.0 |
| 2 | ALMA_uid___A002_X1003af4_X75a3.split.avg_12 | OBSERVE_TARGET#ON_SOURCE | (12, 51, 1, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_4#SQLD_0 | [Sun_10_10] | [Sun_10_0] | Ephemeris | 248000000000.0 | 248000000000.0 |
| 3 | ALMA_uid___A002_X1003af4_X75a3.split.avg_77 | OBSERVE_TARGET#ON_SOURCE | (9, 1326, 7, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_1#SW-01#FULL_RES_2 | [Sun_10_11] | [Sun_10_0] | Ephemeris | 229960900000.0 | 230054700000.0 |
| 4 | ALMA_uid___A002_X1003af4_X75a3.split.avg_44 | OBSERVE_TARGET#ON_SOURCE | (9, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_10] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
| 5 | ALMA_uid___A002_X1003af4_X75a3.split.avg_45 | OBSERVE_TARGET#ON_SOURCE | (9, 51, 4, 1) | [XX] | WVR#NOMINAL_1 | [Sun_10_11] | [Sun_10_0] | Ephemeris | 184550000000.0 | 190550000000.0 |
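The selection semantics used in this section can be sketched in plain Python: a row is kept when every filtered column matches at least one of the requested values, and a kept row retains all of its fields. This is only an illustration of the behaviour described here, not the xradio implementation. The helper `select_rows`, the row names, and the `"ICRS"` value are hypothetical.

```python
def select_rows(rows, **filters):
    """Keep rows where every filtered column matches at least one wanted value."""

    def matches(value, wanted):
        wanted = wanted if isinstance(wanted, list) else [wanted]
        values = value if isinstance(value, list) else [value]
        return any(v in wanted for v in values)

    return [
        row for row in rows
        if all(matches(row[column], wanted) for column, wanted in filters.items())
    ]


# Hypothetical summary rows; field_name holds a list, as in the summary table.
rows = [
    {"name": "ms_0", "field_coords": "Ephemeris", "field_name": ["Sun_10_10"]},
    {"name": "ms_1", "field_coords": "Ephemeris", "field_name": ["Sun_10_11", "Sun_10_12"]},
    {"name": "ms_2", "field_coords": "Ephemeris", "field_name": ["Sun_10_3"]},
    {"name": "ms_3", "field_coords": "ICRS", "field_name": ["Sun_10_10"]},
]

selected = select_rows(rows, field_coords="Ephemeris", field_name=["Sun_10_10", "Sun_10_11"])
print([row["name"] for row in selected])  # prints "['ms_0', 'ms_1']"
print(selected[1]["field_name"])          # prints "['Sun_10_11', 'Sun_10_12']"
```

Note that `ms_1` keeps `Sun_10_12` even though it was not requested, mirroring the point that the data inside a selected MSv4 is left untouched.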


# Using ps.sel() with query and summary table column names

In [6]:
# Select all Ephemeris data with start_frequency above 2.46e11 where any of the fields is 'Sun_10_10' or 'Sun_10_11'.
ps.sel(query="start_frequency > 2.46e11", field_coords='Ephemeris', field_name=['Sun_10_10', 'Sun_10_11']).summary()

| | name | obs_mode | shape | polarization | spw_name | field_name | source_name | field_coords | start_frequency | end_frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ALMA_uid___A002_X1003af4_X75a3.split.avg_13 | OBSERVE_TARGET#ON_SOURCE | (12, 51, 1, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_4#SQLD_0 | [Sun_10_11] | [Sun_10_0] | Ephemeris | 248000000000.0 | 248000000000.0 |
| 1 | ALMA_uid___A002_X1003af4_X75a3.split.avg_12 | OBSERVE_TARGET#ON_SOURCE | (12, 51, 1, 2) | [XX, YY] | X767114449#ALMA_RB_06#BB_4#SQLD_0 | [Sun_10_10] | [Sun_10_0] | Ephemeris | 248000000000.0 | 248000000000.0 |
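The `query` string has the same form as an expression accepted by pandas `DataFrame.query`. Assuming that is how it is evaluated against the summary table (an assumption, not confirmed here), the combined selection can be reproduced on a small stand-in table. The rows below are taken from the summary outputs above, with names shortened to their `avg_N` suffix and only three columns kept.

```python
import pandas as pd

# Stand-in for ps.summary(), using values from the tables above (names shortened).
summary = pd.DataFrame(
    {
        "name": ["avg_13", "avg_76", "avg_12", "avg_44", "avg_30"],
        "field_name": [["Sun_10_11"], ["Sun_10_10"], ["Sun_10_10"], ["Sun_10_10"], ["Sun_10_28"]],
        "start_frequency": [248000000000.0, 229960900000.0, 248000000000.0, 184550000000.0, 248000000000.0],
    }
)

wanted_fields = ["Sun_10_10", "Sun_10_11"]

# Step 1: numeric predicate, expressed as a pandas query string.
by_query = summary.query("start_frequency > 2.46e11")

# Step 2: keep rows where any listed field is one of the wanted fields.
has_field = by_query["field_name"].apply(lambda names: any(n in wanted_fields for n in names))
selected = by_query[has_field]

print(selected["name"].tolist())  # prints "['avg_13', 'avg_12']"
```

The two surviving rows are the same ones returned by the `ps.sel(query=...)` call above: `avg_30` passes the frequency cut but is dropped by the field filter, while `avg_76` and `avg_44` match the field filter but fail the frequency cut.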
