# Review Instructions

Please review the MSv4 processing_set class https://github.com/casangi/xradio/blob/main/src/xradio/vis/_processing_set.py

The processing set is a loose collection of MSv4 datasets, which may come from multiple MSv2s (or ASDMs). Consequently, arbitrary IDs are avoided in favor of descriptive strings.

Run the notebook twice, once with each of:
- `partition_scheme=['FIELD_ID']`
- `partition_scheme=[]`

## Key Questions to Answer
1) Is there additional information to display in the summary table?
2) Are the docstrings sufficient?
3) Are there missing data selection use cases?
4) ...

# Environment instructions

It is recommended to use the conda environment manager to create a clean, self-contained runtime where xradio and all its dependencies can be installed:

```bash
conda create --name xradio python=3.11 --no-default-packages
conda activate xradio
```

Clone the repository, check out the review branch, and do an editable local install:

```bash
git clone https://github.com/casangi/xradio.git
cd xradio
git checkout 213-fix-ps-selection
pip install -e .
```

On macOS, python-casacore must be pre-installed from conda-forge before running `pip install`: `conda install -c conda-forge python-casacore`.

# Download Data

```python
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set
from xradio.vis.read_processing_set import read_processing_set
import graphviper

graphviper.utils.data.download(file="ALMA_uid___A002_X1003af4_X75a3.split.avg.ms")
```

[2024-08-13 11:11:14,186]     INFO  graphviper:  Updating file metadata information ...

[2024-08-13 11:11:15,240]     INFO  graphviper:  File exists: ALMA_uid___A002_X1003af4_X75a3.split.avg.ms


# Start Dask cluster 
Choose an appropriate number of cores and `memory_limit` (the limit applies to each worker; with one thread per worker, as here, that is per core).

```python
from graphviper.dask.client import local_client

viper_client = local_client(cores=4, memory_limit="4GB")
viper_client
```

[2024-08-13 11:11:15,323]     INFO  graphviper:  Checking parameter values for client.local_client
[2024-08-13 11:11:15,323]     INFO  graphviper:  Module path: /Users/jsteeb/Downloads/yes/envs/zinc/lib/python3.11//site-packages/


Perhaps you already have a cluster running?
Hosting the HTTP server on port 63329 instead


[2024-08-13 11:11:16,105]     INFO      client:  Created client <MenrvaClient: 'tcp://127.0.0.1:63330' processes=4 threads=4, memory=14.90 GiB>


| Property | Value |
|---|---|
| Cluster type | distributed.LocalCluster |
| Connection method | Cluster object |
| Dashboard | http://127.0.0.1:63329/status |
| Comm | tcp://127.0.0.1:63330 |
| Workers | 4 (1 thread and 3.73 GiB memory each) |
| Total threads | 4 |
| Total memory | 14.90 GiB |
| Status | running, using processes |
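The per-worker memory of 3.73 GiB and the total of 14.90 GiB reported by the client follow from `memory_limit="4GB"`: Dask parses "GB" as 10^9 bytes but reports memory in GiB (2^30 bytes). A quick arithmetic check of that conversion, assuming four workers with the same limit:

```python
GB = 10**9   # decimal gigabyte, as in memory_limit="4GB"
GiB = 2**30  # binary gibibyte, as used in the Dask report

n_workers = 4
limit_per_worker_bytes = 4 * GB

per_worker_gib = limit_per_worker_bytes / GiB
total_gib = n_workers * limit_per_worker_bytes / GiB

print(f"per worker: {per_worker_gib:.2f} GiB")  # per worker: 3.73 GiB
print(f"total:      {total_gib:.2f} GiB")       # total:      14.90 GiB
```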


# Convert dataset

```python
from xradio.vis.convert_msv2_to_processing_set import convert_msv2_to_processing_set
import shutil

in_file = "ALMA_uid___A002_X1003af4_X75a3.split.avg.ms"
out_file = "ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr"
# Remove any output left over from a previous run
shutil.rmtree(out_file, ignore_errors=True)

partition_scheme = ['FIELD_ID']  # Default: works with ephemeris_interpolate=True or False
# partition_scheme = []  # Rapid OTF mode: requires ephemeris_interpolate=True

convert_msv2_to_processing_set(
    in_file=in_file,
    out_file=out_file,
    parallel=True,
    overwrite=True,
    ephemeris_interpolate=True,
    partition_scheme=partition_scheme,
)
```

[2024-08-13 11:11:16,407]     INFO      client:  Partition scheme that will be used: ['DATA_DESC_ID', 'OBS_MODE', 'OBSERVATION_ID', 'FIELD_ID']
[2024-08-13 11:11:16,985]     INFO      client:  Number of partitions: 96
[2024-08-13 11:11:16,985]     INFO      client:  OBSERVATION_ID [0], DDI [0], STATE [0], FIELD [0], SCAN [7]
[2024-08-13 11:11:17,257]     INFO      client:  OBSERVATION_ID [0], DDI [0], STATE [16], FIELD [0], SCAN [7]
[2024-08-13 11:11:17,549]     INFO      client:  OBSERVATION_ID [0], DDI [0], STATE [17], FIELD [0], SCAN [7]
[2024-08-13 11:11:17,719]     INFO      client:  OBS
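The log suggests that the converter partitions on `DATA_DESC_ID`, `OBS_MODE` and `OBSERVATION_ID` in addition to the user-supplied `partition_scheme` (here `FIELD_ID`), so each MSv4 corresponds to one unique combination of those values. A toy sketch of that grouping; the `rows` data and the `count_partitions` helper are hypothetical and do not read a real MSv2:

```python
# Hypothetical MSv2 main-table rows, reduced to the partitioning columns.
rows = [
    {"DATA_DESC_ID": 0, "OBS_MODE": "TARGET", "OBSERVATION_ID": 0, "FIELD_ID": 0},
    {"DATA_DESC_ID": 0, "OBS_MODE": "TARGET", "OBSERVATION_ID": 0, "FIELD_ID": 1},
    {"DATA_DESC_ID": 0, "OBS_MODE": "TARGET", "OBSERVATION_ID": 0, "FIELD_ID": 1},
    {"DATA_DESC_ID": 1, "OBS_MODE": "TARGET", "OBSERVATION_ID": 0, "FIELD_ID": 0},
    {"DATA_DESC_ID": 1, "OBS_MODE": "ATMOSPHERE", "OBSERVATION_ID": 0, "FIELD_ID": 2},
]

# Columns the log shows being used in every case (assumption based on one run).
BASE_SCHEME = ["DATA_DESC_ID", "OBS_MODE", "OBSERVATION_ID"]


def count_partitions(rows, partition_scheme):
    """Count unique value combinations of the base columns plus partition_scheme."""
    columns = BASE_SCHEME + list(partition_scheme)
    keys = {tuple(row[c] for c in columns) for row in rows}
    return len(keys)


print(count_partitions(rows, ["FIELD_ID"]))  # 4
print(count_partitions(rows, []))            # 3
```

Dropping `FIELD_ID` merges fields into fewer, larger MSv4s, which is presumably why `partition_scheme=[]` is labelled the rapid OTF mode in the conversion cell.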

# Inspect Processing Set

```python
import pandas as pd

# Set the maximum number of rows displayed before scrolling
pd.set_option("display.max_rows", 1000)

from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr")
ps.summary()
```

# Using ps.sel() with summary table column names

```python
# ps.sel() selects whole MSv4s; it does not sub-select the data inside them.
# An MSv4 is kept if its field_coords is 'Ephemeris' and any of its fields is
# 'Sun_10_10' or 'Sun_10_11'; all other fields in that MSv4 are kept as well.
ps.sel(field_coords='Ephemeris', field_name=['Sun_10_10', 'Sun_10_11']).summary()
```
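As the comment in the cell notes, `ps.sel()` keeps or drops whole MSv4s. A plain-Python sketch of that rule over hypothetical summary rows: an MSv4 is kept when `field_coords` matches and *any* of its field names is requested, and its `field_name` list is not trimmed. The `select` helper and the `"Fixed"` coords value are made up for illustration; this is not xradio's implementation:

```python
# Hypothetical summary rows: one dict per MSv4; field_name lists all its fields.
summary = [
    {"name": "ms_0", "field_coords": "Ephemeris", "field_name": ["Sun_10_10", "Sun_10_12"]},
    {"name": "ms_1", "field_coords": "Ephemeris", "field_name": ["Sun_10_13"]},
    {"name": "ms_2", "field_coords": "Fixed", "field_name": ["Sun_10_11"]},
]


def select(summary, field_coords, field_names):
    """Keep whole MSv4 rows whose coords match and that share any field name."""
    wanted = set(field_names)
    return [
        row for row in summary
        if row["field_coords"] == field_coords and wanted & set(row["field_name"])
    ]


selected = select(summary, "Ephemeris", ["Sun_10_10", "Sun_10_11"])
print([row["name"] for row in selected])  # ['ms_0']
print(selected[0]["field_name"])          # ['Sun_10_10', 'Sun_10_12'] (not trimmed)
```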



# Using ps.sel() with query and summary table column names

```python
# Same selection as above, additionally restricted to MSv4s with start_frequency > 2.46e11
ps.sel(query="start_frequency > 2.46e11", field_coords='Ephemeris', field_name=['Sun_10_10', 'Sun_10_11']).summary()
```
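If the `query` string is evaluated like pandas `DataFrame.query` on the summary table (an assumption; check the `ps.sel` docstring), the combined selection behaves like a row filter followed by the column-name criteria. A sketch on a made-up summary DataFrame:

```python
import pandas as pd

# Made-up summary table with only the columns used here.
summary_df = pd.DataFrame(
    {
        "name": ["ms_0", "ms_1", "ms_2"],
        "field_coords": ["Ephemeris", "Ephemeris", "Ephemeris"],
        "start_frequency": [2.40e11, 2.47e11, 2.50e11],
        "field_name": [["Sun_10_10"], ["Sun_10_11"], ["Sun_10_13"]],
    }
)

# Step 1: the query string filters rows with a boolean expression on columns.
df = summary_df.query("start_frequency > 2.46e11")

# Step 2: the column-name criteria; field_name matches if any listed name is present.
wanted = {"Sun_10_10", "Sun_10_11"}
name_match = df["field_name"].apply(lambda names: bool(wanted & set(names)))
mask = (df["field_coords"] == "Ephemeris") & name_match

result = df[mask]["name"].tolist()
print(result)  # ['ms_1']
```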

# Ordering MSv4 Selection

```python
summary_df = ps.sel(obs_mode='OBSERVE_TARGET#ON_SOURCE').summary()
summary_df = summary_df.sort_values(by=['start_frequency'], ascending=True)

summary_df
```

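```python
# The first MSv4 (lowest start frequency) is then given by the first row by position.
# Use .iloc: summary_df['name'][0] would look up the index label 0 instead.
first_ms_name = summary_df['name'].iloc[0]
ps[first_ms_name]
```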
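Indexing with `summary_df['name'][0]` looks up the row whose index *label* is 0. After `sort_values` that row is not necessarily the first one, and if the summary keeps the original row labels after a selection, label 0 may not exist at all. `.iloc[0]` returns the first row by position. A self-contained demonstration with made-up values:

```python
import pandas as pd

summary_df = pd.DataFrame(
    {"name": ["ms_a", "ms_b", "ms_c"], "start_frequency": [2.5e11, 2.3e11, 2.4e11]}
)
summary_df = summary_df.sort_values(by=["start_frequency"], ascending=True)

by_label = summary_df["name"].loc[0]      # row labelled 0: 'ms_a' (highest frequency)
by_position = summary_df["name"].iloc[0]  # first row after sorting: 'ms_b'

print(by_label, by_position)  # ms_a ms_b
```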

# Selecting by a numeric value

```python
from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr")
summary_df = ps.summary()

# Select the MSv4(s) with the lowest start frequency
min_freq = summary_df['start_frequency'].min()
ps.sel(start_frequency=min_freq).summary()
```
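Selecting by a number presumably compares values for equality, so it is safest with a value taken from the summary table itself, as done with `min()` in the cell, rather than a hand-typed float that may differ in the last digits. A pandas sketch of the same idea on made-up values, also showing `idxmin` to get the matching name directly:

```python
import pandas as pd

summary_df = pd.DataFrame(
    {"name": ["ms_0", "ms_1", "ms_2"], "start_frequency": [2.47e11, 2.33e11, 2.45e11]}
)

min_freq = summary_df["start_frequency"].min()

# Equality filter, analogous to ps.sel(start_frequency=min_freq)
lowest = summary_df[summary_df["start_frequency"] == min_freq]["name"].tolist()
print(lowest)  # ['ms_1']

# Alternatively, look up the name of the lowest-frequency row directly
lowest_name = summary_df.loc[summary_df["start_frequency"].idxmin(), "name"]
print(lowest_name)  # ms_1
```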

# Require exact match in selection criteria

```python
from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr")
ps.sel(name='ALMA_uid___A002_X1003af4_X75a3.split.avg_01', string_exact_match=True).summary()
```
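With `string_exact_match=True` the requested string must equal the column value; with `False` it presumably only has to occur within it. A plain-Python sketch of the two rules using hypothetical names; the `matches` helper is illustrative, not xradio code:

```python
def matches(value, requested, exact):
    """Exact equality, or substring containment when exact is False."""
    return value == requested if exact else requested in value


names = ["ms_1", "ms_10", "ms_11", "ms_2"]

exact_hits = [n for n in names if matches(n, "ms_1", exact=True)]
partial_hits = [n for n in names if matches(n, "ms_1", exact=False)]

print(exact_hits)    # ['ms_1']
print(partial_hits)  # ['ms_1', 'ms_10', 'ms_11']
```

The partial rule also matches `ms_10` and `ms_11`, which is why an exact match is needed to pick out a single MSv4 by name.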

# Partial string match: select all MSv4s with a field name containing "Sun_10"

```python
from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr")
ps.sel(field_name='Sun_10', string_exact_match=False).summary()
```
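For a list-valued column such as `field_name`, a partial match presumably keeps an MSv4 when the substring occurs in any of its field names. A sketch of that rule with hypothetical rows:

```python
# Hypothetical summary rows: one dict per MSv4.
summary = [
    {"name": "ms_0", "field_name": ["Sun_10_10", "Sun_10_11"]},
    {"name": "ms_1", "field_name": ["Sun_11_10"]},
    {"name": "ms_2", "field_name": ["J1924-2914", "Sun_10_12"]},
]


def partial_match(row, substring):
    """True if substring occurs in any of the row's field names."""
    return any(substring in field for field in row["field_name"])


hits = [row["name"] for row in summary if partial_match(row, "Sun_10")]
print(hits)  # ['ms_0', 'ms_2']
```

Note that substring matching on "Sun_10" would also accept names such as "Sun_100", if any existed.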

# Partial string match on the MSv4 name: select all MSv4s whose name contains "ALMA_uid___A002_X1003af4_X75a3.split.avg"

Here every MSv4 matches, because all of them were converted from the same MSv2, so the selection has no effect. It becomes useful for a processing set built from MSv4s converted from different MSv2s.

```python
from xradio.vis.read_processing_set import read_processing_set

ps = read_processing_set("ALMA_uid___A002_X1003af4_X75a3.split.avg.zarr")
ps.sel(name='ALMA_uid___A002_X1003af4_X75a3.split.avg', string_exact_match=False).summary()
```
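The MSv4 names in this notebook carry the source MSv2 name as a prefix (for example `ALMA_uid___A002_X1003af4_X75a3.split.avg_01`), so a partial match on that prefix separates MSv4s by origin once a processing set mixes several MSv2s. A sketch with two hypothetical source names, `obs_A` and `obs_B`:

```python
# Hypothetical MSv4 names converted from two different MSv2s.
names = ["obs_A.split.avg_0", "obs_A.split.avg_1", "obs_B.split.avg_0"]

# Partial match on the source-MS prefix picks out everything converted from obs_A
from_a = [n for n in names if "obs_A.split.avg" in n]
print(from_a)  # ['obs_A.split.avg_0', 'obs_A.split.avg_1']
```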