# SP-5327 Demo - Improve MWA data discoverability

<https://confluence.skatelescope.org/display/SRCSC/SP-5327+Demo>

We will:

- obtain MWA [visibility](https://confluence.skatelescope.org/display/SRCSC/MWA+Visibilities+on+SRCNet) data
- upload this to Rucio
- update ObsCore metadata in Rucio
- find the observation in TAP

## Running this Notebook

This notebook is designed to be run inside a modified Karabo Docker image.
`sp5327demo.Dockerfile` is available in the `sp5327demo` branch.

I recommend using the VSCode DevContainers extension.

## (Optional) Query MWA visibility data

We can query the MWA TAP service to see what data is available. Details:

- [MWA ASVO documentation](https://mwatelescope.atlassian.net/wiki/spaces/MP/pages/24970532/MWA+ASVO+VO+Services)
- [MWA TAP Schema](https://mwatelescope.atlassian.net/wiki/spaces/MP/pages/24970424/TAP+mwa.observation+Schema+and+Examples)

In this example query, we look for all publicly available observations pointed within 5 degrees of Centaurus A that include the 175 MHz channel.

In [1]:
import pyvo
from astropy.time import Time, TimeDelta

# GPS time 18 months (548 days) ago; observations older than this
# are outside the proprietary period
proprietary = (Time.now() - TimeDelta(548, format="jd")).gps
tap = pyvo.dal.TAPService("http://vo.mwatelescope.org/mwa_asvo/tap")
obs = (
    tap.search(
        f"""
SELECT TOP 10
    obs_id, starttime_utc, ra_pointing, dec_pointing, channel_numbers_csv,
    mwa_array_configuration, good_tiles, dataquality, sun_elevation,
    deleted_flag, gpubox_files_archived, total_archived_data_bytes,
    freq_res, int_time
FROM mwa.observation
WHERE CONTAINS(
    POINT('ICRS', ra_pointing, dec_pointing),  -- pointing center
    CIRCLE('ICRS', 201.3667, -43.0192, 5)      -- within 5 degrees of CenA
) = 1
AND channel_numbers_csv LIKE '%137%'           -- has channel 137 (175MHz)
AND obs_id < {proprietary}                     -- nonproprietary
AND good_tiles >= 112                          -- 14 good receivers
AND dataquality <= 1                           -- no known issues
AND sun_elevation < 0                          -- sun is not up
AND deleted_flag!='TRUE'                       -- not deleted
AND gpubox_files_archived > 1                  -- data available
-- AND freq_res <= 10                          -- (optional) 10kHz resolution or less
-- AND int_time <= 1                           -- (optional) 1s integration or less
-- AND mwa_array_configuration = 'Phase II Compact' -- (optional) compact => more short baselines
ORDER BY obs_id DESC
"""
    )
    .to_table()
    .to_pandas()
    .dropna(axis=1, how="all")
)
obs["config"] = obs["mwa_array_configuration"].str.split(" ").str[-1]
del obs["mwa_array_configuration"]
obs["gigabytes"] = obs["total_archived_data_bytes"] / 1e9
for col in [
    "dataquality",
    "sun_elevation",
    "deleted_flag",
    "gpubox_files_archived",
    "total_archived_data_bytes",
    "channel_numbers_csv",
]:
    del obs[col]
display(obs)

| | obs_id | starttime_utc | ra_pointing | dec_pointing | good_tiles | freq_res | int_time | config | gigabytes |
|---|--------|---------------|-------------|--------------|------------|----------|----------|--------|-----------|
| 0 | 1376647872 | 2023-08-21T10:10:54.000Z | 198.376404 | -42.690933 | 117 | 10.0 | 0.5 | Compact | 115.409457 |
| 1 | 1376561440 | 2023-08-20T10:10:22.000Z | 197.259338 | -42.690193 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 2 | 1376475016 | 2023-08-19T10:09:58.000Z | 196.175659 | -42.689518 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 3 | 1376388584 | 2023-08-18T10:09:26.000Z | 205.159927 | -44.801117 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 4 | 1376302160 | 2023-08-17T10:09:02.000Z | 204.076279 | -44.800129 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 5 | 1376215728 | 2023-08-16T10:08:30.000Z | 202.959305 | -44.79916 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 6 | 1376129304 | 2023-08-15T10:08:06.000Z | 201.875687 | -44.798267 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 7 | 1376042872 | 2023-08-14T10:07:34.000Z | 200.758743 | -44.797398 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
| 8 | 1375956448 | 2023-08-13T10:07:10.000Z | 199.675171 | -44.7966 | 132 | 10.0 | 0.5 | Compact | 115.409457 |
| 9 | 1375870016 | 2023-08-12T10:06:38.000Z | 198.558258 | -44.795826 | 133 | 10.0 | 0.5 | Compact | 115.409457 |
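
Two of the columns used in the query are easier to interpret once converted: an MWA `obs_id` is the observation's GPS start time in seconds, and a coarse channel number maps to a centre frequency of `channel × 1.28 MHz`. The following is a minimal standard-library sketch of both conversions. The helper names are hypothetical, and the fixed 18 s GPS-UTC offset is an assumption that only holds for observations from 2017 onwards; `astropy.time.Time(obs_id, format="gps")` handles leap seconds properly.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
# Assumption: GPS is 18 s ahead of UTC, true only from 2017-01-01 onwards.
GPS_UTC_OFFSET_S = 18
# MWA coarse channels are 1.28 MHz wide.
COARSE_CHANNEL_MHZ = 1.28


def obs_id_to_utc(obs_id: int) -> datetime:
    """Convert an MWA obs_id (GPS seconds) to a UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=obs_id - GPS_UTC_OFFSET_S)


def channel_centre_mhz(channel: int) -> float:
    """Centre frequency in MHz of an MWA coarse channel number."""
    return channel * COARSE_CHANNEL_MHZ


print(obs_id_to_utc(1376647872).isoformat())  # 2023-08-21T10:10:54+00:00
print(f"{channel_centre_mhz(137):.2f} MHz")   # 175.36 MHz
```

The first value agrees with the `starttime_utc` reported for `obs_id` 1376647872 in the query result.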


## (Optional) Download MWA visibility data

Create an account on [MWA ASVO](https://asvo.mwatelescope.org/), obtain your API key by following these [instructions](https://mwatelescope.atlassian.net/wiki/spaces/MP/pages/24972779/MWA+ASVO+Command+Line+Clients), and set the environment variable:

```bash
export MWA_ASVO_API_KEY="..."
```

Submit a conversion job to ASVO. Many conversion parameters are available, and both Measurement Set and uvfits output formats are supported.

```bash
docker run -it --rm -e MWA_ASVO_API_KEY mwatelescope/giant-squid:latest \
    submit-conv -w -p output=uvfits,avg_freq_res=40,avg_time_res=2,flag_edge_width=80 \
    1184702048
```
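
The `-p` argument above is a comma-separated list of `key=value` pairs. As a rough sketch, the same command line could be assembled in Python so that the conversion parameters live in a dictionary. The helper names `format_conv_params` and `giant_squid_submit_cmd` are hypothetical and not part of giant-squid; only the flags already shown above are used.

```python
import shlex


def format_conv_params(params: dict) -> str:
    """Join conversion parameters into the key=value,key=value form used by -p."""
    return ",".join(f"{key}={value}" for key, value in params.items())


def giant_squid_submit_cmd(obs_id: int, params: dict) -> list:
    """Build the docker command that submits a conversion job for one obs_id."""
    return [
        "docker", "run", "-it", "--rm", "-e", "MWA_ASVO_API_KEY",
        "mwatelescope/giant-squid:latest",
        "submit-conv", "-w", "-p", format_conv_params(params),
        str(obs_id),
    ]


params = {
    "output": "uvfits",
    "avg_freq_res": 40,
    "avg_time_res": 2,
    "flag_edge_width": 80,
}
# Print the command rather than running it, so it can be inspected first.
print(shlex.join(giant_squid_submit_cmd(1184702048, params)))
```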

Conversion can take anywhere from a few minutes to a few hours. You can check the job status with:

```bash
docker run -it --rm -e MWA_ASVO_API_KEY mwatelescope/giant-squid:latest \
    list
```

Once the job is ready, download the result with:

```bash
docker run -it --rm -v ${PWD}:${PWD} -w ${PWD} -e MWA_ASVO_API_KEY mwatelescope/giant-squid:latest \
    download \
    1184702048
```

## Here's one I prepared earlier

For this demo, we will add a new MWA observation, 1184702048, to Rucio.

The following processing steps have already been applied:

- [aoflagger](https://aoflagger.readthedocs.io/en/latest/) flags applied, and data averaged to 2s_40kHz during preprocessing
- [ssins](https://ssins.readthedocs.io/en/latest/) flags calculated from the averaged data (ssins should be applied at the raw resolution, but we don't have enough memory on Setonix for this)
- calibrated by [hyperdrive](https://mwatelescope.github.io/mwa_hyperdrive/) with 8000 sources from an EoR sky model (300 calibration iterations, 30 lambda minimum uv cutoff)
- averaged to 8s_80kHz
- hyperdrive ionospheric subtraction of 1000 sources (regular subtraction of an additional 7000 sources)

I've made this data publicly available for this demo.

In [None]:
import requests

url = "https://projects.pawsey.org.au/high0.uvfits/hyp_1184702048_ionosub_ssins_30l_src8k_300it_8s_80kHz_i1000.uvfits"
vis_path = "hyp_1184702048_ionosub.uvfits"

response = requests.get(url, stream=True)
response.raise_for_status()

with open(vis_path, "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)

## (Optional) What about Measurement Sets?

Unlike uvfits files, CASA Measurement Sets are not supported in Rucio because a Measurement Set is a hierarchical format, a directory of tables rather than a single file (see: [SP-4620](https://jira.skatelescope.org/browse/SP-4620)). You can, however, convert uvfits to a Measurement Set with CASA:

```bash
docker run --rm -it -v $PWD:$PWD -w $PWD d3vnull0/casa casa -c "importuvfits(fitsfile='vis.uvfits', vis='vis.ms')"
```

## Generate ObsCore Metadata

We will use Karabo's [`Visibility`](https://i4ds.github.io/Karabo-Pipeline/karabo.simulation.html#karabo.simulation.visibility.Visibility) class to read the visibility metadata and generate ObsCore metadata.

This takes a visibility file, and generates a new ObsCore metadata file in json format.

Previously, this worked with both Measurement Sets and OSKAR visibility files.

Uvfits support was added in [CHOC-234](https://jira.skatelescope.org/browse/CHOC-234).

A [pull request](https://github.com/i4Ds/Karabo-Pipeline/pull/646) is open; it will be merged once an issue with Conda has been sorted out.

In [None]:
from karabo.data.obscore import ObsCoreMeta
from karabo.data.src import RucioMeta
from karabo.simulation.visibility import Visibility
from astropy.time import Time
import os

# default values
lifetime=365.25*86400 # lifetime on Rucio in seconds (1 year)
namespace="sp4896"

# create a visibility object from the downloaded file
vis = Visibility(vis_path)
# generate ObsCore metadata from visibility
vis_ocm = ObsCoreMeta.from_visibility(
    vis=vis,
    calibrated=False,
)
# convert to Rucio metadata
vis_rm = RucioMeta(
    namespace=namespace,  # needs to be specified by Rucio service
    name=os.path.split(vis.path)[-1],  # remove path-infos for `name`
    lifetime=lifetime,  # 1 year (see default above)
    dataset_name=None,
    meta=vis_ocm,
)

# Set additional ObsCore fields
# -> collection
vis_ocm.obs_collection = "MRO/MWA"

# -> for MWA, obs_id is the GPS start time rounded down to a multiple of 8 seconds
start_time = int(Time(vis_ocm.t_min, format="mjd").gps)
vis_ocm.obs_id = str(int(start_time // 8) * 8)
vis_ocm.obs_publisher_did = ObsCoreMeta.get_ivoid(
    authority="org.mwatelescope",
    path=f"/obs_id/{vis_ocm.obs_id}",
    query=None,
    fragment=None,
)

vis_path_meta = RucioMeta.get_meta_fname(fname=vis.path)
_ = vis_rm.to_dict(fpath=vis_path_meta)
print(f"Created {vis_path_meta=}")

display(vis_ocm.to_dict())
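
To make the `obs_id` arithmetic above concrete, here is a small standalone sketch of the rounding step and of the identifier it feeds into. Both helper names are hypothetical. The `ivo://<authority><path>` layout follows the general IVOA identifier form; that `ObsCoreMeta.get_ivoid` produces exactly this string is an assumption, not something checked against Karabo.

```python
def mwa_obs_id(gps_start: float) -> str:
    """Round a GPS start time down to a multiple of 8 seconds, as a string."""
    return str(int(gps_start) // 8 * 8)


def ivoid(authority: str, path: str) -> str:
    """Assumed IVOA identifier layout: ivo://<authority><path>."""
    return f"ivo://{authority}{path}"


obs_id = mwa_obs_id(1184702053.5)  # a start time a few seconds into the 8 s block
print(obs_id)                      # 1184702048
print(ivoid("org.mwatelescope", f"/obs_id/{obs_id}"))
# ivo://org.mwatelescope/obs_id/1184702048
```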

## Upload to Rucio

We will use the `rucio` command line client to upload the data to Rucio.

We'll start a docker shell in the rucio client container

```bash
docker run -it --rm \
    -e RUCIO_CFG_CLIENT_ACCOUNT=dev_null \
    -e RUCIO_CFG_OIDC_SCOPE="openid profile offline_access wlcg.groups storage.create:/" \
    -v $PWD:$PWD \
    -w $PWD \
    registry.gitlab.com/ska-telescope/src/src-dm/ska-src-dm-da-rucio-client:release-35.6.0
```

Ensure that `storage.create:/` is included in your OAuth scope. This must be the first command you run in the container:

```bash
rucio --oidc-scope "openid profile offline_access wlcg.groups storage.create:/" whoami
```

### List endpoints

```bash
rucio list-rses
```

### Upload to a working Rucio endpoint

```bash
rucio upload --rse JPSRC_STORM --scope sp4896_mwa --lifetime 86400 karabo/examples/hyp_1184702048_ionosub.uvfits
```

### Check the upload was successful

```bash
rucio list-file-replicas sp4896_mwa:hyp_1184702048_ionosub.uvfits
```

| SCOPE | NAME | FILESIZE | ADLER32 | RSE: REPLICA |
|-------|------|----------|---------|-------------|
| sp4896_mwa | hyp_1184702048_ionosub.uvfits | 1.908 GB | f3c2e027 | JPSRC_STORM: davs://jp-src-s000.mtk.nao.ac.jp:8443/storm/sa/sp4896_mwa/88/5d/hyp_1184702048_ionosub.uvfits |
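
The `ADLER32` column gives a quick way to confirm that the replica matches the local file. A minimal sketch, assuming the checksum is shown as eight lowercase hex digits as in the listing above; `adler32_hex` is a hypothetical helper built on the standard library's `zlib.adler32`.

```python
import zlib


def adler32_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the Adler-32 checksum of a file as 8 lowercase hex digits."""
    checksum = 1  # Adler-32 starts from 1, not 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte visibility files never sit in memory.
        while chunk := f.read(chunk_size):
            checksum = zlib.adler32(chunk, checksum)
    return f"{checksum & 0xFFFFFFFF:08x}"
```

For the file uploaded here, `adler32_hex("hyp_1184702048_ionosub.uvfits")` would be expected to equal the `f3c2e027` shown in the listing if the local copy is identical to the replica.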

## Set Rucio Metadata

<details>

Originally found on Confluence: <https://confluence.skatelescope.org/download/attachments/300810226/set_metadata.py>

Referenced on this page: <https://confluence.skatelescope.org/display/SRCSC/CHOC-92%3A+Ingest+image+data+with+rucio-upload+and+add-metadata>

This should be run in the Rucio container, e.g.

```bash
docker run -it --rm \
    -e RUCIO_CFG_CLIENT_ACCOUNT=dev_null \
    -e RUCIO_CFG_RUCIO_HOST='https://rucio.srcnet.skao.int' \
    -e RUCIO_CFG_AUTH_HOST='https://rucio-auth.srcnet.skao.int' \
    -e RUCIO_CFG_AUTH_TYPE=oidc \
    -v $PWD:$PWD \
    -w $PWD \
    registry.gitlab.com/ska-telescope/src/src-dm/ska-src-dm-da-rucio-client:release-35.6.0
```
</details>

In [None]:
from json import load
from os.path import basename
from rucio.client.didclient import DIDClient

# set with your own Rucio account
rucio_account = "dev_null"

# load the Rucio metadata file written in the previous step
with open(vis_path_meta) as f:
    metadata = load(f)
did_client = DIDClient()
# attach all ObsCore fields to the uploaded DID in a single call
result = did_client.set_metadata_bulk(
    scope=namespace,
    name=basename(vis_path),
    meta=metadata["meta"],
)
print(result)
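
After setting the metadata, it can be worth reading it back and comparing it with what was sent. The sketch below takes any client object exposing `get_metadata(scope, name, plugin)`; it assumes that Rucio's `DIDClient.get_metadata` accepts `plugin="ALL"` to include custom metadata, and that values come back in the same type they were stored in, neither of which has been checked against this deployment. `metadata_mismatches` is a hypothetical helper.

```python
def metadata_mismatches(client, scope: str, name: str, expected: dict) -> dict:
    """Return {key: (expected, stored)} for every field that differs in Rucio."""
    # Assumption: plugin="ALL" returns custom (ObsCore) fields as well as DID columns.
    stored = client.get_metadata(scope=scope, name=name, plugin="ALL")
    return {
        key: (value, stored.get(key))
        for key, value in expected.items()
        if stored.get(key) != value
    }
```

In this notebook it might be called as `metadata_mismatches(did_client, namespace, basename(vis_path), metadata["meta"])`, where an empty dictionary means every field was stored as sent.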

## Query SRCNet TAP service

Let's check that the data we uploaded is available in the SRCNet TAP service.

In [None]:
import pyvo
tap_srcnet = pyvo.dal.TAPService('https://dachs.ivoa.srcnet.skao.int:443/tap')
obs_srcnet = tap_srcnet.search(f"""
SELECT * FROM ivoa.obscore where obs_publisher_did = '{vis_ocm.obs_publisher_did}'
""").to_table().to_pandas().dropna(axis=1, how='all')
if not len(obs_srcnet):
    print("no obs found!")
else:
    display(obs_srcnet.iloc[0])
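
The query above interpolates a value directly into the ADQL string. If the same lookup is repeated for other columns or values, a small builder that escapes single quotes by doubling them (standard SQL and ADQL string-literal escaping) keeps the query well-formed. This is only a sketch with hypothetical helper names; it does not validate the column or table names.

```python
def adql_literal(value: str) -> str:
    """Quote a string for ADQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def obscore_query(column: str, value: str, table: str = "ivoa.obscore") -> str:
    """Build a SELECT that matches one ObsCore column against a string value."""
    return f"SELECT * FROM {table} WHERE {column} = {adql_literal(value)}"


print(obscore_query("obs_id", "1184702048"))
# SELECT * FROM ivoa.obscore WHERE obs_id = '1184702048'
```

The resulting string can be passed to `tap_srcnet.search(...)` in place of the inline f-string.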

## Cleanup After Demo

In [None]:
# deliberate break to avoid running the rest of the code
raise SystemExit(0)

In [None]:
# remove the files created by this notebook
os.remove(vis_path)
os.remove(vis_path_meta)