[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/casangi/astrohack/blob/main/docs/tutorial_vla.ipynb)

![astrohack](_media/astrohack_logo.png)

In [None]:
import os

from importlib.metadata import version

try:
    import astrohack
    
    print('AstroHACK version', version('astrohack'), 'already installed.')
except ImportError as e:
    print(e)
    print('Installing AstroHACK')
    
    os.system("pip install astrohack")
    
    import astrohack 
    print('astrohack version', version('astrohack'), ' installed.')

# VLA Data Tutorial

### Important External Information

- #### xarray Official Documentation ([docs](https://docs.xarray.dev/en/stable/)).
- #### Dask Official Documentation ([docs](https://www.dask.org/)).
- #### zarr Official Documentation ([docs](https://zarr.readthedocs.io/en/stable/)).

## Download Tutorial Data

In [None]:
from astrohack.gdown_utils import gdown_data

gdown_data('ea25_cal_small_spw1_4_60_ea04_after.ms', download_folder='data')

## Holography Data File API

As part of the `astroHACK` API, a set of functions is provided that lets users easily open holography files stored on disk. Each function takes an `astroHACK` holography file name as an argument and returns an object related to the given file. Each object gives access to the data via dictionary keys, with values consisting of the relevant holography dataset. Each object also provides a `summary()` helper method that prints the available keys for each file. An example call for each file type is shown below and the API documentation for all data-io functions can be found [here](https://astrohack.readthedocs.io/en/latest/_api/autoapi/astrohack/dio/index.html).

In [4]:
from astrohack.dio import open_holog
from astrohack.dio import open_image
from astrohack.dio import open_panel
from astrohack.dio import open_pointing

holog_data = open_holog(file='./data/ea25_cal_small_spw1_4_60_ea04_after.holog.zarr')
image_data = open_image(file='./data/ea25_cal_small_spw1_4_60_ea04_after.image.zarr')
panel_data = open_panel(file='./data/ea25_cal_small_spw1_4_60_ea04_after.panel.zarr')
pointing_data = open_pointing(file='./data/ea25_cal_small_spw1_4_60_ea04_after.point.zarr')

## Setup Dask Local Cluster

The local Dask client, which handles scheduling and worker management for the parallelization, can be initialized as shown below. The user can choose the number of cores and the memory allocation for each worker; however, we recommend a minimum of 8 GB per core with standard settings.


A significant amount of information related to the client and scheduling can be found using the [Dask Dashboard](https://docs.dask.org/en/stable/dashboard.html). This built-in dashboard comes with Dask and allows the user to monitor the workers during processing, which is especially useful for profiling. For those interested in working solely within JupyterLab, a dashboard extension is available for [JupyterLab](https://github.com/dask/dask-labextension#dask-jupyterlab-extension).

![dashboard](_media/dashboard.png)

In [None]:
from astrohack.astrohack_client import astrohack_local_client

client = astrohack_local_client(cores=2, memory_limit='8GB')
client
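
As a rough sizing aid for the recommendation above, the sketch below picks the largest core count that still leaves the recommended 8 GB for each worker. The helper `max_workers` and the example machine (32 GB of RAM, 16 CPU cores) are hypothetical and not part of `astrohack`; only the 8 GB per core figure comes from this tutorial.

```python
def max_workers(total_memory_gb, cpu_count, min_gb_per_core=8):
    """Largest core count that keeps at least min_gb_per_core per worker.

    Hypothetical helper for illustration; not part of astrohack.
    """
    by_memory = int(total_memory_gb // min_gb_per_core)
    return max(0, min(cpu_count, by_memory))


# Example machine (assumed values): 32 GB of RAM and 16 CPU cores.
cores = max_workers(total_memory_gb=32, cpu_count=16)
print(f"cores={cores}, memory_limit='8GB'")
```

The resulting number could then be passed as the `cores` argument of `astrohack_local_client`, with `memory_limit` kept at `'8GB'`.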

## Extract Holog

The extraction and restructuring of the holography data is done using the `extract_holog` function. This function is a direct replacement for the `UVHOL` AIPS function. The input to `extract_holog` is a compound dictionary containing the holography-relevant run information, including *scan*, *mapping* and *antenna* information. The structure of the compound dictionary is shown below and a detailed description of the structure of the *holog_obs_description* dictionary can be found in the documentation [here](https://astrohack.readthedocs.io/en/latest/_api/autoapi/astrohack/extract_holog/index.html).

Inline information on the input parameters can also be obtained by running `help(extract_holog)` in a cell.

In [None]:
from astrohack.extract_holog import extract_holog

scans=[
    8, 9, 10, 12, 13, 14, 16, 17, 18, 23, 24, 25, 
    27, 28, 29, 31, 32, 33, 38, 39, 40, 42, 43, 44, 
    46, 47, 48, 53, 54, 55, 57
]

holog_obs_description = {
    'map_0' :{
        'scans': scans,
        'ant':{
            'ea25':[
                'ea04'
            ]
        }
    }, 
    'ddi':[0]
}

#holog_obs_description['ddi'] = [0]

holog_mds = extract_holog(
    ms_name='data/ea25_cal_small_spw1_4_60_ea04_after.ms', 
    holog_obs_dict=holog_obs_description,
    data_col='CORRECTED_DATA',
    parallel=True,
    overwrite=True
)
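
The nested layout of `holog_obs_description` used above can also be assembled programmatically, which helps when the scan list or antenna pairing changes between runs. The following is a minimal sketch: the helper `make_holog_obs_description` is hypothetical and not part of `astrohack`, and it only mirrors the single-map layout shown in this tutorial.

```python
def make_holog_obs_description(scans, ant, ddi, map_name="map_0"):
    """Build a one-map description dictionary (hypothetical helper).

    scans: list of scan numbers that belong to the map
    ant:   dict of antenna name -> list of antenna names, copied into the
           'ant' entry exactly as in the tutorial's dictionary
    ddi:   list of DDI ids to process
    """
    if not scans:
        raise ValueError("at least one scan is required")
    return {
        map_name: {
            "scans": sorted(set(scans)),  # drop duplicates, keep order stable
            "ant": {name: list(others) for name, others in ant.items()},
        },
        "ddi": list(ddi),
    }


description = make_holog_obs_description(
    scans=[8, 9, 10, 12, 13, 14],
    ant={"ea25": ["ea04"]},
    ddi=[0],
)
print(description)
```

The returned dictionary has the same keys as the hand-written one, so it could be passed as `holog_obs_dict` in the same way.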

Once `extract_holog` has finished, two files are created: the extracted pointing information is written to disk as `<point_name>.point.zarr` and the extracted holography data is written to disk as `<holog_name>.holog.zarr`. In addition, a holography data object is returned. The `holog_mds` object is a Python dict containing the extracted holography data found in `.holog.zarr`, with extended functionality such as providing a summary of the run information in table form. Below, for each `DDI`, we can see the available `scan` and `antenna` information.


___point_name.point.zarr:___ <span style="color:red"> The pointing zarr file contains position and pointing information extracted from the pointing table of the input measurement set. In addition, the antenna and mapping scan information is listed for each antenna. The pointing object is structured as a simple dictionary of `key:value` pairs, with the key being the antenna id and the value being the pointing dataset. </span>

```
point_mds = 
{
   ant_0: point_ds,
            ⋮
   ant_n: point_ds
}
```


___holog_name.holog.zarr:___ <span style="color:red"> The holog zarr file contains ungridded data extracted from the pointing and main tables of the measurement set. The holog file includes the directional, visibility and weight information recorded on a shared time axis; this resampling is needed because the native sample rates of the pointing and main tables are not the same. In addition, metadata such as the sampled parallactic data (beginning, middle and end of scan) and the `l`, `m` extent is recorded in the file attributes. The holog file structure is a compound dictionary keyed according to `ddi` -> `map` -> `ant` with values consisting of the holog dataset. </span>

```
holog_mds = 
{
   ddi_0:{
          map_0:{
                 ant_0: holog_ds,
                          ⋮
                 ant_n: holog_ds
                },
              ⋮
          map_p: …
         },
       ⋮
   ddi_m: …
}

```
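
To make the `ddi` -> `map` -> `ant` nesting concrete, the sketch below walks a nested dictionary and yields every leaf together with its key path. It is an illustration only: `iter_leaves` is a hypothetical helper, and `mock_holog_mds` is a plain-dict stand-in in which a string takes the place of each holog dataset. It has not been run against a real `holog_mds` object.

```python
def iter_leaves(tree, path=()):
    """Yield (key_path, value) for every non-dict value in a nested dict."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Descend one level and remember the key we came through.
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value


# Stand-in for holog_mds: strings replace the real holog datasets.
mock_holog_mds = {
    "ddi_0": {
        "map_0": {"ant_ea25": "holog_ds"},
    },
}

for (ddi, holog_map, ant), dataset in iter_leaves(mock_holog_mds):
    print(ddi, holog_map, ant, dataset)
```

Because the walker does not assume a fixed depth, the same function would also cover a two-level `ant` -> `ddi` layout.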

An example of the holog dataset object is shown below.

In [None]:
holog_mds['ddi_0']['map_0']['ant_ea25']

A summary of the available key values can be obtained using the summary convenience function.

In [None]:
holog_mds.summary()

In this case there is only one selection in the holography file, but the `mds` metadata can be examined in the ordinary way. In addition, the `numpy` arrays holding the data are accessed in a manner similar to `pandas` tables. For instance, accessing the data for `DIRECTIONAL_COSINES` would simply be:
```
>>> holog_mds['ddi_0']['map_0']['ant_ea25'].DIRECTIONAL_COSINES.values
array([[-0.00433549, -0.0027946 ],
       [-0.00870191, -0.00682571],
       [-0.00965634, -0.00908509],
       ...,
       [ 0.00966373,  0.00957556],
       [ 0.00966267,  0.00957601],
       [ 0.00965895,  0.00956941]])

>>> holog_mds['ddi_0']['map_0']['ant_ea25'].DIRECTIONAL_COSINES.values.shape
(9145, 2)

```
where the dimensions are given in the `mds` output for each data variable (in this case `(time, lm)`). A more in-depth overview of how to interact with these datasets can be found [here](https://tutorial.dask.org/).

## Holog

The `holog` function processes the holography data and produces a holog image file on disk with the suffix `.image.zarr`. This function is a direct replacement for the `HOLOG` AIPS function. The user must provide the `grid_size` and `cell_size` when processing holography data. The `grid_size` defines the number of `l x m` points used for the gridding. The `cell_size` defines the value in arcseconds of each grid spacing. More in-depth parameter information can be found on Read the Docs [here](https://astrohack.readthedocs.io/en/latest/_api/autoapi/astrohack/holog/index.html).

Inline information on the input parameters can also be obtained by running `help(holog)` in a cell.

In [None]:
import numpy as np

from astrohack.holog import holog

cell_size = np.array([-0.0006442, 0.0006442]) # arcseconds
grid_size = np.array([31, 31])                # pixels

image_mds = holog(
    holog_name='data/ea25_cal_small_spw1_4_60_ea04_after.holog.zarr',
    grid_size=grid_size, 
    cell_size=cell_size, 
    overwrite=True,
    phase_fit=True,
    apply_mask=True,
    to_stokes=True,
    parallel=True
)
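
To see what the chosen `grid_size` and `cell_size` imply, the sketch below computes the cell-centre coordinates along each axis. It assumes a grid centred on zero with one coordinate per cell centre; this is an illustrative convention and not necessarily how `holog` lays out its grid internally. The helper `grid_axis` is hypothetical.

```python
cell_size = (-0.0006442, 0.0006442)  # values from the tutorial cell
grid_size = (31, 31)                 # number of grid points along l and m


def grid_axis(n_points, cell):
    """Return the cell-centre coordinates of a grid centred on zero."""
    half = (n_points - 1) / 2
    return [(i - half) * cell for i in range(n_points)]


l_axis = grid_axis(grid_size[0], cell_size[0])
m_axis = grid_axis(grid_size[1], cell_size[1])

print(f"l centres: {l_axis[0]:+.6f} to {l_axis[-1]:+.6f}")
print(f"m centres: {m_axis[0]:+.6f} to {m_axis[-1]:+.6f}")
```

Under this assumption the outermost cell centre sits at 15 × 0.0006442 ≈ 0.00966, which is close to the largest `DIRECTIONAL_COSINES` values printed earlier, so the grid would just cover the sampled `l`, `m` range if `cell_size` is interpreted in the same units as those values.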

___image_name.image.zarr:___ <span style="color:red"> The image zarr file contains the gridded image data: the beam, the extracted aperture, and the amplitude and phase components. It also contains all of the relevant coordinate information. The image file structure is a compound dictionary keyed according to `ant` -> `ddi` with the dictionary values consisting of the image dataset. </span>

```
image_mds = 
{
   ant_0:{
          ddi_0: image_ds,
                 ⋮               
          ddi_m: image_ds
         },
       ⋮
   ant_n: …
}

```


An example of the image dataset object is shown below.

In [None]:
image_mds['ant_ea25']['ddi_0']

A summary of the available key values can be obtained using the summary convenience function.

In [None]:
image_mds.summary()

Each of the holography output files is a compound dictionary keyed by the run parameters and contains xarray Datasets, which means that all native xarray functionality is available. The user can use their favorite plotting package to visualize the data or use xarray's built-in methods to do simple filtering and plotting.

In [None]:
image_mds['ant_ea25']['ddi_0'].ANGLE.isel(chan=0, pol=0).plot()

## Panel

The `panel` function takes the place of, and expands on, the `PANEL` AIPS function: it processes the image information and derives adjustments to the dish panels. This produces a file on disk with the suffix `.panel.zarr` containing information on corrections, residuals and screw adjustments. As an added bonus, the `panel` function can also process AIPS data, which is converted to astrohack format using the `aips_holog_to_astrohack` helper function. For a full description of the operation and arguments of the `panel` function see [docs](https://astrohack.readthedocs.io/en/latest/_api/autoapi/astrohack/panel/index.html).

In [None]:
from astrohack.panel import panel

panel_model = 'rigid'

panel_mds = panel(
    image_name='data/ea25_cal_small_spw1_4_60_ea04_after.image.zarr', 
    panel_model=panel_model, 
    parallel=True,
    overwrite=True
)

___panel_name.panel.zarr:___ <span style="color:red"> The panel zarr file contains processed information regarding the per-panel screw corrections as well as the residuals, masks and phase corrections used to produce them. The panel file structure is a compound dictionary keyed according to `ant` -> `ddi` with the value consisting of the panel dataset.</span>

```
panel_mds = 
{
   ant_0:{
          ddi_0: panel_ds,
                 ⋮               
          ddi_m: panel_ds
         },
       ⋮
   ant_n: …
}

```

An example of the panel dataset object is shown below.

In [None]:
panel_mds['ant_ea25']['ddi_0']

A summary of the available key values can be obtained using the summary convenience function.

In [None]:
panel_mds.summary()