# Flagging

Demonstration of prototype ngCASA flagging functions. This is work in progress.

This walkthrough is designed to be run in a Jupyter notebook on Google Colaboratory. To open the notebook in Colab, go [here](https://colab.research.google.com/github/FedeMPouzols/cngi_prototype/blob/ngcasa_flagging_experiments/docs/flagging.ipynb).



## Get required packages and data

In [16]:
# TEMPORARY WORKAROUND for Python 3.7 - you must run this notebook twice!!
import os
import numpy as np

if np.__version__ != '1.20.1':
    print('updating numpy and killing runtime....\nPlease re-run this notebook (Runtime -> Run all)')
    os.system('pip install --upgrade numpy==1.20.1')
    os._exit(00)

In [17]:
# Installation
import os
# (WIP note: we might not need casatools if a prepared .vis.zarr is made available for download; re-check)
print("installing casa6 + cngi (takes a minute or two)...")
!apt-get install libgfortran3

!pip install --index-url https://casa-pip.nrao.edu:443/repository/pypi-group/simple casatasks==6.2.0.96
!pip install --index-url https://casa-pip.nrao.edu:443/repository/pypi-group/simple casadata
!pip install --extra-index-url https://test.pypi.org/simple/ cngi-prototype==0.0.75

installing casa6 + cngi (takes a minute or two)...
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libgfortran3 is already the newest version (6.5.0-2ubuntu1~18.04).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.
Looking in indexes: https://casa-pip.nrao.edu:443/repository/pypi-group/simple
Looking in indexes: https://casa-pip.nrao.edu:443/repository/pypi-group/simple
Looking in indexes: https://pypi.org/simple, https://test.pypi.org/simple/


In [18]:
# Retrieve and extract demonstration datasets
print('retrieving MS tarfiles...')
!gdown -q --id 15HfB4rJKqEH7df088Ge5YLrCTXBIax6R
!gdown -q --id 1N9QSs2Hbhi-BrEHx5PA54WigXt8GGgx1
print('extracting MS tarfiles...')
!tar -xf M100.ms.tar
!tar -xzf sis14_twhya_calibrated_flagged.ms.tar.gz
# TODO: download the ".flagonline.txt" file
print('complete')

retrieving MS tarfiles...
extracting MS tarfiles...
complete


## Initialize the Environment
Omitted for now. It is probably not needed here, and it should not appear at the ngCASA level anyway.

## Load demo data

In [19]:
from cngi.conversion import convert_ms

zarr_name = 'twhya.vis.zarr' 
mxds = convert_ms('sis14_twhya_calibrated_flagged.ms', outfile=zarr_name) 

Completed ddi 0  process time 22.39 s
Completed subtables  process time 0.96 s
                                                  


In [24]:
from cngi.dio import describe_vis 
# We will be working with the only SPW (xds0) present in this dataset
vis_dset = mxds.xds0
describe_vis(zarr_name)

processing partition 2 of 2                                                  

| xds | spw_id | pol_id | times | baselines | chans | pols | size_MB |
|------|--------|--------|-------|-----------|-------|------|---------|
| xds0 | 0 | 0 | 410 | 210 | 384 | 2 | 1586 |


## Flag summaries
Summaries of flags by different dimensions, returned as a dictionary similar to the one produced by CASA flagdata in 'summary' mode.

In [21]:
from ngcasa.flagging import summary
# <temporary hack>
mxds.xds0['presence_baseline'] = mxds.xds0.DATA.notnull().any(['chan', 'pol'])
# </temporary hack>
# Get the initial flagging status
counts = summary(mxds, 0)
counts

ImportError: ignored
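
Since the cell above did not run, here is a minimal NumPy sketch of what a per-dimension flag summary could look like. The toy `FLAG` array, the `flag_summary` helper and the layout of the returned dictionary are assumptions made for illustration. They are not the actual output format of `ngcasa.flagging.summary`.

```python
import numpy as np

# Hypothetical toy FLAG array with dims (time, baseline, chan, pol); True = flagged.
flags = np.zeros((4, 3, 8, 2), dtype=bool)
flags[:, :, 2:4, :] = True   # two channels flagged everywhere
flags[1, 0] = True           # one baseline flagged at one time step


def flag_summary(flags, dim_names=("time", "baseline", "chan", "pol")):
    """Count flagged and total visibilities overall and per index of each dimension."""
    summary = {"flagged": int(flags.sum()), "total": int(flags.size)}
    for axis, name in enumerate(dim_names):
        # Sum over every axis except the one being summarized.
        other_axes = tuple(a for a in range(flags.ndim) if a != axis)
        flagged_per_index = flags.sum(axis=other_axes)
        total_per_index = flags.size // flags.shape[axis]
        summary[name] = {
            idx: {"flagged": int(n), "total": total_per_index}
            for idx, n in enumerate(flagged_per_index)
        }
    return summary


counts_sketch = flag_summary(flags)
print(counts_sketch["flagged"], "of", counts_sketch["total"], "flagged")
print("channel 2:", counts_sketch["chan"][2])
```

Keeping both the flagged and the total count per index makes it easy to derive flagged fractions later without revisiting the data.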

## Handling flag versions




In [None]:
from ngcasa.flagging import manager_list, manager_add, manager_remove
print('* Printing list of flag variables - checkpoint 0:')
versions = manager_list(vis_dset)
print(versions)

vis_flags = manager_add(vis_dset, 'start', 'flags state at start')
# vis_flags = manual_unflag(vis_flags, [{'time': slice('2011-09-16T15:38:17','2011-10-16T18:39:50')}])
vis_flags = manager_add(vis_flags, 'backup', 'backup description')
#
vis_flags = manager_add(vis_flags, 'final', 'backup second descr')
#
print('* Printing list - checkpoint A:')
versions = manager_list(vis_flags)
print(versions)

# fxds['FLAG'] = fxds['FLAG_final']
vis_flags['FLAG'] = vis_flags['FLAG_backup'] | vis_flags['FLAG_start']

# We can always drop versions that are no longer useful
vis_flags = manager_remove(vis_flags, 'start')
vis_flags = manager_remove(vis_flags, 'backup')
vis_flags = manager_remove(vis_flags, 'final')
print('* Printing list - checkpoint B:')
versions = manager_list(vis_flags)
print(versions)
# FLAG variables are added as regular data variables in the xarray Datasets
# An additional attribute (flag_variables) is added for bookkeeping
vis_flags
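
The bookkeeping idea in the cell above (named copies of `FLAG` plus a `flag_variables` record) can be sketched without xarray. This is only an illustration under stated assumptions: the plain dict standing in for the dataset and the helpers `add_version` and `remove_version` are hypothetical, and unlike the `manager_*` functions they modify their inputs in place instead of returning a new dataset.

```python
import numpy as np

# Hypothetical stand-in for a visibility dataset: a dict of named boolean arrays.
dataset = {"FLAG": np.array([True, False, False, False])}
# Bookkeeping record: flag variable name -> description.
flag_variables = {"FLAG": "default flags"}


def add_version(dataset, flag_variables, name, description):
    """Save a copy of the current FLAG array as FLAG_<name>."""
    key = f"FLAG_{name}"
    if key in flag_variables:
        raise ValueError(f"flag version {name!r} already exists")
    dataset[key] = dataset["FLAG"].copy()
    flag_variables[key] = description


def remove_version(dataset, flag_variables, name):
    """Drop a saved flag version and its bookkeeping entry."""
    key = f"FLAG_{name}"
    del dataset[key]
    del flag_variables[key]


add_version(dataset, flag_variables, "start", "flags state at start")
dataset["FLAG"][2] = True  # some further flagging happens here
add_version(dataset, flag_variables, "backup", "backup description")

# Combine two versions: a visibility is flagged if either version flags it.
dataset["FLAG"] = dataset["FLAG_backup"] | dataset["FLAG_start"]

remove_version(dataset, flag_variables, "start")
print(list(flag_variables))
```

Copying the array when a version is saved matters: without the copy, later changes to `FLAG` would silently alter the saved version too.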

## Running flagging methods
A few illustrative examples, trying to cover all the categories of flagging methods.



### Manual flagging and meta-information based methods
Methods based on data selection and/or meta-information. Simple examples with data selection based flagging and unflagging.

In [None]:
from ngcasa.flagging import manual_flag, manual_unflag

vis_dset = mxds.xds0
#
vis_dset = manager_add(vis_dset, 'checkpointA', 'after a couple of example manual selections')
#
vis_dset = manager_add(vis_dset, 'manual_sels', 'after applying a few manual flags')

# unflag one antenna (not present)
vis_partial = manual_unflag(mxds, 0, [{'antenna': 'DV01'}])
# unflag one antenna (present)
vis_partial = manual_unflag(mxds, 0, [{'antenna': 'DV02'}])
# unflag all
vis_unflagged = manual_unflag(mxds, 0, [])
# <horror> figure out how to deal with return mxds (use cngi._utils._io.vis_xds_packager?)
mxds.xds0['FLAG'] = vis_unflagged['FLAG']
# </horror>
counts = summary(mxds, 0)
print('* Flags after unflagging:')
counts
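
An antenna-based selection such as `{'antenna': 'DV02'}` ultimately has to be translated into a set of baselines. The sketch below shows one way that translation could work. It assumes that each baseline is an unordered pair of antennas and uses made-up antenna names, so it is not a description of how `manual_flag` or `manual_unflag` resolve the selection internally.

```python
from itertools import combinations

# Hypothetical antenna names, chosen so the toy array has 21 antennas
# and therefore 21 * 20 / 2 = 210 cross-correlation baselines.
antennas = [f"DV{n:02d}" for n in range(2, 23)]
baselines = list(combinations(antennas, 2))


def baselines_with(antenna, baselines):
    """Return the indices of all baselines that include the given antenna."""
    return [idx for idx, pair in enumerate(baselines) if antenna in pair]


present = baselines_with("DV02", baselines)  # antenna in the array
absent = baselines_with("DV01", baselines)   # antenna not in the array

print(len(baselines), "baselines in total")
print(len(present), "baselines include DV02")
print(len(absent), "baselines include DV01")
```

Selecting an antenna that is not present yields an empty list, so a command like the first one in the cell above would leave the flags unchanged under this reading.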

In [None]:
from cngi.vis import applyflags
from cngi.vis import visplot
flagged_manual = applyflags(vis_dset, flags=['FLAG'])
plot_axes = ['time', 'chan']
visplot(flagged_manual.DATA, plot_axes)
visplot(vis_dset.DATA, plot_axes)


### Manual flagging with command lists
An example of applying a list of manual flagging commands, resembling use cases from pipelines. An additional required input is the file of flagging commands. Here we use a `.flagonline.txt` file as used in pipelines, which typically holds thousands to tens of thousands of commands. The `.flagonline.txt` file makes up the lion's share of the full list of commands used by pipelines (`.flagcmds.txt`), which extends the `.flagonline.txt` list with a much shorter list of additional commands that may include a number of summaries, selections based on intent and frequency, the shadow method, etc.

The selection syntax used is the Xarray selection syntax (see examples in the [Visibilities walkthrough](visibilities.ipynb)), with selection by label `xds.sel(...)`.

In [None]:
vis_flags = manual_unflag(mxds, 0, [{'time': slice('2011-09-16T15:38:17','2011-10-16T18:39:50')}])
#
vis_flags = manual_unflag(mxds, 0, [{}])

# Flag two groups of adjacent ~20 chans (channel labels are frequencies in Hz)
vis_flags = manual_flag(mxds, 0, [{'chan': slice(3.7266e11, 3.7271e11)}, 
                                  {'chan': slice(3.7276e11, 3.728e11)}])

# Flag some baselines, by ID
vis_flags = manual_flag(mxds, 0, [{'baseline': [133, 134, 135]}])
# Flag polarization, by ID
vis_flags = manual_flag(mxds, 0, [{'pol': 9}])

# vis_dset = manual_flag(vis_dset, cmdfile='.flagonline.txt')
vis_flags = manager_add(vis_flags, 'manual_list', 'after applying list of selection commands')


# <horror> figure out how to deal with return mxds (use cngi._utils._io.vis_xds_packager?)
mxds.xds0['FLAG'] = vis_flags['FLAG']
# </horror>
print('* Printing list - after flagging:')
versions = manager_list(vis_flags)
print(versions)

# vis_flags = manual_flag(mxds, 0, cmd_filename='short_twhya.flagonline.txt')

print('* Flags after flagging some channels and baselines:')
counts = summary(mxds, 0)
print(counts)
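
The channel selections in the cell above are label-based slices over frequency. With xarray's `sel`, as with pandas, label slices include both end points. The following NumPy sketch mimics that behaviour for a list of selection commands. The frequency grid and the `flag_by_freq` helper are hypothetical and serve only to show the inclusive-range logic and how several commands accumulate.

```python
import numpy as np

# Hypothetical channel-frequency coordinate in Hz: 11 channels spaced by 25 MHz.
chan_freqs = 3.7260e11 + 2.5e7 * np.arange(11)


def flag_by_freq(flags, chan_freqs, commands):
    """Flag every channel whose frequency falls inside any commanded slice."""
    out = flags.copy()
    for cmd in commands:
        sel = cmd["chan"]
        # Label-based slices are inclusive on both ends.
        inside = (chan_freqs >= sel.start) & (chan_freqs <= sel.stop)
        out |= inside
    return out


commands = [{"chan": slice(3.7266e11, 3.7271e11)},
            {"chan": slice(3.7276e11, 3.728e11)}]
chan_flags = flag_by_freq(np.zeros(chan_freqs.size, dtype=bool), chan_freqs, commands)
print(np.flatnonzero(chan_flags))
```

Each command only adds flags, so the order of the commands in the list does not change the result.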

### Auto-flagging methods
An illustrative example using the `auto_clip` method. Other auto-flagging methods such as tfcrop, rflag, and uvbin are not implemented.

In [None]:
from ngcasa.flagging import auto_clip
# 
# vis_dset = manager_add(vis_dset, 'auto_clip_test1', 'after applying clip')
versions = manager_list(vis_dset)
print(versions)

vis_clip = auto_clip(vis_dset, 10, 35)
# <horror> figure out how to deal with return mxds (use cngi._utils._io.vis_xds_packager?)
mxds.xds0['FLAG'] = vis_clip['FLAG']
# </horror>
counts_clip = summary(mxds, 0)
print(counts_clip)

flagged_clip = applyflags(vis_clip, flags=['FLAG'])
plot_axes = ['time', 'chan']
visplot(flagged_clip.DATA, plot_axes)
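
As a rough illustration of clipping, the sketch below flags visibilities whose amplitude lies outside a `[clip_min, clip_max]` range and keeps any flags that were already set. It assumes that the two numeric arguments of `auto_clip(vis_dset, 10, 35)` are a minimum and a maximum amplitude. That reading, the `clip_flags` helper and the sample values are assumptions, not a description of the actual implementation.

```python
import numpy as np


def clip_flags(data, flags, clip_min, clip_max):
    """Return flags extended with every sample whose amplitude is outside the range."""
    amplitude = np.abs(data)
    outside = (amplitude < clip_min) | (amplitude > clip_max)
    return flags | outside


# Hypothetical complex visibilities with amplitudes 5, 12, 40 and 29.
data = np.array([3 + 4j, 12 + 0j, 0 + 40j, 20 + 21j])
flags = np.zeros(data.shape, dtype=bool)

clipped = clip_flags(data, flags, 10, 35)
print(clipped)
```

Combining with the existing flags through a logical OR means clipping can only add flags, never remove them.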

## Applying flags
To apply a version of flags to a visibilities dataset before further processing, use the function `cngi.vis.applyflags`. Some examples can be found in the [Continuum Imaging Example](imaging/continuum_imaging_example.ipynb) or the [Visibilities walkthrough](visibilities.ipynb). `cngi.vis.applyflags` sets the flagged data values to NaN, so that other components of CNGI and ngCASA, such as imaging, ignore them in subsequent processing.

In [None]:
from cngi.vis import applyflags
from cngi.vis import visplot

versions = manager_list(vis_dset)
versions

plot_coords = ['time', 'chan']
visplot(vis_dset.DATA, plot_coords)
vis_dataset_flagged = applyflags(vis_dset, flags=['FLAG'])
# Further processing: visualization, calibration, imaging, etc. with flags applied (flagged data excluded)
# ...

#flagged_manual = applyflags(vis_dset, flags=['FLAG_MANUAL_LIST'])
#visplot(flagged_manual.DATA, plot_coords)

#flagged_clip = applyflags(vis_dset, flags=['FLAG_AUTO_CLIP'])
#visplot(flagged_clip.DATA, plot_coords)
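
The effect of setting flagged values to NaN can be seen in a small NumPy sketch. The `apply_flags_sketch` helper and the sample amplitudes are hypothetical. The sketch only mirrors the behaviour described in this section and is not the code of `cngi.vis.applyflags`.

```python
import numpy as np


def apply_flags_sketch(data, flags):
    """Return a copy of data with flagged entries replaced by NaN."""
    return np.where(flags, np.nan, data)


# Hypothetical amplitudes with dims (time, chan); one outlier is flagged.
data = np.array([[1.0, 2.0, 100.0],
                 [3.0, 4.0, 5.0]])
flags = np.array([[False, False, True],
                  [False, False, False]])

masked = apply_flags_sketch(data, flags)

# NaN-aware reductions skip the flagged sample.
print(np.nanmean(masked))          # mean over unflagged values only
print(np.nanmean(masked, axis=0))  # per-channel mean over time
```

A plain `np.mean` over the same array would return NaN, which is why downstream code has to use NaN-aware operations for the flagged data to be excluded.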