### Notebook to explore use of xarray in RASCIL

RASCIL v0.1.19b uses classes and numpy structured arrays as data holders. This simple approach
has been sufficient for development so far, but it will limit future development.
For example, more flexible processing will need selection based on multiple coordinates.
In this notebook, we investigate the capabilities of xarray.

    http://xarray.pydata.org/

xarray must be installed:

    pip install xarray

We use RASCIL functions to create the visibility objects.


In [1]:
import xarray
import pandas
import numpy

from astropy.coordinates import SkyCoord
import astropy.units as u
from astropy.time import Time

from rascil.data_models.polarisation import PolarisationFrame
from rascil.processing_components import create_named_configuration, \
    create_blockvisibility

Create a standard RASCIL BlockVisibility object

In [2]:
lowcore = create_named_configuration('LOWBD2-CORE')
# Hour angles from -4 h to +4 h in 30-minute steps, converted from seconds to radians
times = (numpy.pi / 43200.0) * numpy.arange(-4*3600, +4*3600.0, 1800)
frequency = numpy.linspace(1.0e8, 1.1e8, 3)
channel_bandwidth = numpy.array([1e7, 1e7, 1e7])
# Define the component and give it some spectral behaviour
f = numpy.array([100.0, 20.0, -10.0, 1.0])
flux = numpy.array([f, 0.8 * f, 0.6 * f])
phasecentre = SkyCoord(ra=+180.0 * u.deg, dec=-35.0 * u.deg, frame='icrs',
                       equinox='J2000')
vis = create_blockvisibility(lowcore, times, frequency,
                             channel_bandwidth=channel_bandwidth,
                             phasecentre=phasecentre,
                             integration_time=30.0,
                             polarisation_frame=PolarisationFrame("linear"),
                             weight=1.0)
print(vis)

BlockVisibility:
	Source unknown
	Phasecentre: <SkyCoord (ICRS): (ra, dec) in deg
    (180., -35.)>
	Number of visibility blocks: 16
	Number of integrations: 16
	Visibility shape: (16, 166, 166, 3, 4)
	Number of flags: 2661312
	Number of channels: 3
	Frequency: [1.00e+08 1.05e+08 1.10e+08]
	Channel bandwidth: [10000000. 10000000. 10000000.]
	Number of polarisations: 4
	Polarisation Frame: linear
	Configuration: LOWBD2-CORE
	Metadata: {}



First construct an xarray.DataArray for the visibility. We name the dimensions and give the coordinates.

In [4]:
nants = vis.nants

# Flatten the visibility sample by sample. BlockVisibility has no antenna1/antenna2
# attributes, so the antenna indices simply run over range(nants).
def gen_vis(vis):
    for itime, time in enumerate(vis.time):
        for ant1 in range(nants):
            for ant2 in range(nants):
                for channel, frequency in enumerate(vis.frequency):
                    for ipol, pol in enumerate(vis.polarisation_frame.names):
                        yield vis.vis[itime, ant2, ant1, channel, ipol]

v_array = numpy.array(list(gen_vis(vis)))
print(v_array.shape)

# The same data as a labelled DataArray; the antenna coordinates are the indices 0 .. nants-1
xvis_array = xarray.DataArray(vis.vis,
                              dims=["time", "antenna2", "antenna1", "frequency", "polarisation"],
                              coords={"time": vis.time,
                                      "antenna2": numpy.arange(nants),
                                      "antenna1": numpy.arange(nants),
                                      "frequency": vis.frequency,
                                      "polarisation": vis.polarisation_frame.names})

print(xvis_array)

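The same construction on an array small enough to inspect: a minimal sketch with made-up sizes and coordinate values (2 times, 3 antennas, 2 channels, 4 polarisations), not taken from the LOWBD2-CORE example. It shows that once the dimensions are named, selection is by coordinate label rather than by axis position.

```python
import numpy
import xarray

# Made-up toy sizes, not the LOWBD2-CORE configuration used above
ntimes, nants, nchan = 2, 3, 2
pols = ["XX", "XY", "YX", "YY"]
data = numpy.zeros([ntimes, nants, nants, nchan, len(pols)], dtype=complex)

toy = xarray.DataArray(
    data,
    dims=["time", "antenna2", "antenna1", "frequency", "polarisation"],
    coords={
        "time": [0.0, 30.0],
        "antenna2": numpy.arange(nants),
        "antenna1": numpy.arange(nants),
        "frequency": [1.0e8, 1.1e8],
        "polarisation": pols,
    },
)

# Scalar labels select a single element and remove that dimension
xx = toy.sel(polarisation="XX", frequency=1.1e8)
print(xx.dims)
```

No axis numbers appear in the selection, so the call keeps working if the dimension order of the underlying array changes.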
Now we create an xarray.Dataset: a dict-like collection of DataArrays that share dimensions, together with a dictionary of attributes.

In [4]:
# The Dataset below uses a row-based layout, so first convert the BlockVisibility
# to a row-based Visibility with one row per (time, baseline, channel) sample.
# This assumes convert_blockvisibility_to_visibility is exported by this RASCIL version.
from rascil.processing_components import convert_blockvisibility_to_visibility
rvis = convert_blockvisibility_to_visibility(vis)

coords = {"time": rvis.time,
          "polarisation": rvis.polarisation_frame.names,
          "spatial": numpy.zeros([3])}

xvis_dict = dict()
xvis_dict["vis"] = xarray.DataArray(rvis.vis, dims=["time", "polarisation"])
xvis_dict["uvw"] = xarray.DataArray(rvis.uvw, dims=["time", "spatial"])
xvis_dict["antenna1"] = xarray.DataArray(rvis.antenna1, dims=["time"])
xvis_dict["antenna2"] = xarray.DataArray(rvis.antenna2, dims=["time"])
xvis_dict["datetime"] = \
    xarray.DataArray(Time(rvis.time / 86400.0, format='mjd', scale='utc').datetime64, dims=["time"])
xvis_dict["weight"] = xarray.DataArray(rvis.weight, dims=["time", "polarisation"])
xvis_dict["imaging_weight"] = xarray.DataArray(rvis.imaging_weight,
                                               dims=["time", "polarisation"])
xvis_dict["flags"] = xarray.DataArray(rvis.flags, dims=["time", "polarisation"])
xvis_dict["frequency"] = xarray.DataArray(rvis.frequency, dims=["time"])
xvis_dict["channel_bandwidth"] = xarray.DataArray(rvis.channel_bandwidth, dims=["time"])
xvis_dict["integration_time"] = xarray.DataArray(rvis.integration_time, dims=["time"])
xvis = xarray.Dataset(xvis_dict, coords=coords)
xvis.attrs['source'] = vis.source

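A minimal sketch of the same Dataset layout on hypothetical toy data (6 rows, one per baseline of 4 antennas). Labelling the spatial axis "u", "v", "w" is our choice for illustration; the cell above uses zeros as the spatial coordinate.

```python
import numpy
import xarray

# Hypothetical row-based toy data: one row per baseline of 4 antennas
antenna1 = numpy.array([0, 0, 0, 1, 1, 2])
antenna2 = numpy.array([1, 2, 3, 2, 3, 3])
nrows = len(antenna1)

toy = xarray.Dataset(
    {
        "vis": (("time", "polarisation"), numpy.ones([nrows, 4], dtype=complex)),
        "uvw": (("time", "spatial"), numpy.zeros([nrows, 3])),
        "antenna1": ("time", antenna1),
        "antenna2": ("time", antenna2),
    },
    # Assumed labels for the spatial axis (the notebook uses zeros)
    coords={"polarisation": ["XX", "XY", "YX", "YY"], "spatial": ["u", "v", "w"]},
    attrs={"source": "unknown"},
)

# All data variables share the "time" dimension of length nrows
print(dict(toy.sizes))
```

Because every variable declares its dimensions by name, any selection along "time" keeps vis, uvw, antenna1 and antenna2 aligned row for row.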

xarray prints an informative summary of most objects.

In [5]:
print(xvis)

<xarray.Dataset>
Dimensions:            (polarisation: 4, spatial: 3, time: 657360)
Coordinates:
  * time               (time) float64 5.085e+09 5.085e+09 ... 5.085e+09
  * polarisation       (polarisation) <U2 'XX' 'XY' 'YX' 'YY'
  * spatial            (spatial) float64 0.0 0.0 0.0
Data variables:
    vis                (time, polarisation) complex128 0j 0j 0j 0j ... 0j 0j 0j
    uvw                (time, spatial) float64 10.69 12.45 17.77 ... 31.07 44.37
    antenna1           (time) int64 0 0 0 0 0 0 0 ... 163 163 163 164 164 164
    antenna2           (time) int64 1 1 1 2 2 2 3 ... 165 165 165 165 165 165
    datetime           (time) datetime64[ns] 2020-01-01T17:30:36.436666595 .....
    weight             (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    imaging_weight     (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    flags              (time, polarisation) int64 0 0 0 0 0 0 0 ... 0 0 0 0 0 0
    frequency          (time) float64 1e+08 1.05e+08 ... 1.05e

Take a positional slice in time and polarisation.

In [6]:
print(xvis.vis[100:110, 0:1])

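Indexing a DataArray with [] is positional, as in numpy. A sketch on a hypothetical toy Dataset comparing that with isel, which applies the same positional slice to every variable at once, so antenna1 stays aligned with the sliced visibilities.

```python
import numpy
import xarray

# Hypothetical toy data: 6 rows x 4 polarisations, values 0..23
toy = xarray.Dataset(
    {
        "vis": (("time", "polarisation"), numpy.arange(24).reshape(6, 4).astype(complex)),
        "antenna1": ("time", numpy.array([0, 0, 0, 1, 1, 2])),
    },
    coords={"polarisation": ["XX", "XY", "YX", "YY"]},
)

# Positional slicing of a single DataArray, as in the cell above
part = toy.vis[2:4, 0:1]
# isel applies the same positional slice to every variable in the Dataset
toy_part = toy.isel(time=slice(2, 4), polarisation=slice(0, 1))
print(toy_part.antenna1.values)
```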
By label

In [None]:
print(xvis.sel({"polarisation":["XX", "YY"]}))

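A small sketch on made-up data of how the form of the label changes the result: a list of labels keeps the polarisation dimension, while a single label removes it.

```python
import numpy
import xarray

# Hypothetical toy data: 2 rows x 4 polarisations, values 0..7
toy = xarray.Dataset(
    {"vis": (("time", "polarisation"), numpy.arange(8.0).reshape(2, 4))},
    coords={"polarisation": ["XX", "XY", "YX", "YY"]},
)

# A list of labels keeps the polarisation dimension (now of length 2)
parallel = toy.sel(polarisation=["XX", "YY"])
# A single label removes the dimension, leaving polarisation as a scalar coordinate
xx_only = toy.sel(polarisation="XX")
print(parallel.vis.values)
```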
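By antenna1. Note that where returns a Dataset of the same shape in which the values that fail the condition are masked (set to NaN); pass drop=True to remove them instead.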

In [None]:
print(xvis.where(xvis.antenna1<10).uvw)

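A sketch on hypothetical toy data contrasting the masked result of where with drop=True.

```python
import numpy
import xarray

# Hypothetical toy data: 6 rows, antenna1 varying slowest
toy = xarray.Dataset(
    {
        "uvw": (("time", "spatial"), numpy.arange(18.0).reshape(6, 3)),
        "antenna1": ("time", numpy.array([0, 0, 0, 1, 1, 2])),
    }
)

# Same shape as toy: rows failing the condition become NaN
masked = toy.where(toy.antenna1 < 1)
# drop=True removes the rows where the condition is False
dropped = toy.where(toy.antenna1 < 1, drop=True)
print(masked.uvw.shape, dropped.uvw.shape)
```

Because NaN is a floating-point value, integer variables such as antenna1 are promoted to float64 in the masked result.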
By uvw distance

In [None]:
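# uvwdist is not stored in the Dataset, so compute it from the uvw components
uvwdist = numpy.sqrt((xvis.uvw ** 2).sum("spatial"))
print(xvis.where(uvwdist < 40.0).uvw)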

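A sketch of the distance selection on hypothetical uvw values chosen so that the lengths are 5, 50 and 10. It assumes "uvw distance" means the full three-dimensional length; numpy.hypot of u and v, as used for uvdist further on, gives the projected uv distance instead.

```python
import numpy
import xarray

# Hypothetical uvw values (metres) with lengths 5, 50 and 10
toy = xarray.Dataset(
    {"uvw": (("time", "spatial"), numpy.array([[3.0, 4.0, 0.0],
                                               [30.0, 40.0, 0.0],
                                               [0.0, 0.0, 10.0]]))}
)

# Euclidean length of each uvw vector, reducing over the spatial dimension
uvwdist = numpy.sqrt((toy.uvw ** 2).sum("spatial"))
short = toy.where(uvwdist < 40.0, drop=True)
print(uvwdist.values)
```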
By time

In [None]:
print(xvis.where(xvis.datetime>numpy.datetime64("2020-01-01T23:00:00")).datetime)

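A sketch on three hypothetical sample times. Besides masking with a datetime64 comparison as above, datetime can be promoted to the dimension coordinate (set_coords, then swap_dims) so that sel can slice it by label; whether that layout suits RASCIL's row-based data is left open here.

```python
import numpy
import xarray

# Three hypothetical sample times
times = numpy.array(["2020-01-01T22:00", "2020-01-01T23:30", "2020-01-02T00:30"],
                    dtype="datetime64[ns]")
toy = xarray.Dataset({"datetime": ("time", times),
                      "antenna1": ("time", numpy.array([0, 1, 2]))})

# Masking with a datetime64 comparison, as in the cell above
late = toy.where(toy.datetime > numpy.datetime64("2020-01-01T23:00:00"), drop=True)

# Alternative: make datetime the dimension coordinate, then slice by label
by_label = toy.set_coords("datetime").swap_dims({"time": "datetime"})
late2 = by_label.sel(datetime=slice("2020-01-01T23:00", None))
print(late2.antenna1.values)
```

Label slicing leaves the other variables unmasked, so antenna1 keeps its integer type in late2.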
Sorting: the visibility is constructed so that antenna1 varies least. Let's try a sort by antenna2.

In [None]:
print(xvis.sortby("antenna2").antenna2)

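A sketch of sortby on hypothetical baselines stored with antenna1 varying slowest. Passing a second key breaks ties in antenna2 by antenna1, so the resulting order is fully determined.

```python
import xarray

# Hypothetical baselines of 4 antennas, antenna1 varying slowest
toy = xarray.Dataset({"antenna1": ("time", [0, 0, 0, 1, 1, 2]),
                      "antenna2": ("time", [1, 2, 3, 2, 3, 3])})

# Sort by antenna2 first, breaking ties by antenna1
by_ant2 = toy.sortby(["antenna2", "antenna1"])
print(by_ant2.antenna1.values, by_ant2.antenna2.values)
```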
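Now create a calculated column with Dataset.assign, for example

    x.assign(temperature_f=lambda x: x.temperature_c * 9 / 5 + 32)

Note that assign returns a new Dataset rather than modifying the original, so the result must be assigned back.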

In [None]:
# assign returns a new Dataset, so keep the result
xvis = xvis.assign(uvdist=numpy.hypot(xvis.uvw[:, 0], xvis.uvw[:, 1]))
print(xvis)

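A sketch on hypothetical uvw values showing that assign leaves the original Dataset untouched, and that it also accepts a callable, which receives the Dataset (the lambda form of the example above).

```python
import numpy
import xarray

# Hypothetical uvw values chosen so that the uv distances are 5 and 10
toy = xarray.Dataset({"uvw": (("time", "spatial"), numpy.array([[3.0, 4.0, 1.0],
                                                                [6.0, 8.0, 2.0]]))})

# assign returns a new Dataset; toy itself is left unchanged
with_uvdist = toy.assign(uvdist=lambda x: numpy.hypot(x.uvw[:, 0], x.uvw[:, 1]))
print(with_uvdist.uvdist.values)
```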
We can sort by uvdist. Note that the order of antenna1 and antenna2 changes.

In [None]:
print(xvis.sortby('uvdist'))

Binning along one data variable: here we group by uvdist in 25 equal-width bins and print only the uvdist range and the number of samples in each bin. The %timeit -n 1 -r 1 magic runs the loop once and reports how long it took.

In [None]:
%timeit -n 1 -r 1 for interval, group in xvis.groupby_bins("uvdist", bins=25): print(interval, group.sizes["time"])

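A sketch of groupby_bins on made-up uvdist values with explicit bin edges, so the count in each bin can be checked by eye. Each iteration yields the bin (a pandas Interval, closed on the right by default) and the sub-Dataset of the rows that fall in it.

```python
import numpy
import xarray

# Hypothetical uvdist values
toy = xarray.Dataset({"uvdist": ("time", numpy.array([1.0, 2.0, 11.0, 12.0, 13.0, 25.0]))})

# Explicit bin edges; an integer such as bins=25 instead requests that many
# equal-width bins spanning the data range
counts = []
for interval, group in toy.groupby_bins("uvdist", bins=[0.0, 10.0, 20.0, 30.0]):
    counts.append(group.sizes["time"])
    print(interval, group.sizes["time"])
```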

Although we declared antenna1 and antenna2 as data variables, we can still select by them once we have set them as index variables. Here set_index combines them into a MultiIndex called baseline, and the indexes attribute lists all the indexes available for selection.

In [None]:
xvis_antenna_selected = xvis.set_index(baseline=("antenna1", "antenna2"))
print(xvis_antenna_selected.indexes)

antenna1 and antenna2 are now levels of a pandas.MultiIndex called baseline, so we select with a tuple that gives the allowed values of antenna1 and of antenna2. Setting drop=True drops coordinate variables that the selection would otherwise leave as scalars; it does not remove flagged data.

In [None]:
print(xvis_antenna_selected.sel(baseline=([1, 2], [6,7,8]), drop=True))

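The tuple-of-lists selection follows pandas MultiIndex semantics: each element lists the allowed values for one level, and an entry matches only if every level matches. A sketch of that behaviour directly in pandas, with made-up baselines, using pandas.MultiIndex.get_locs; we assume, rather than confirm, that xarray's sel resolves the tuple in an equivalent way.

```python
import numpy
import pandas

# Hypothetical baselines; each (antenna1, antenna2) pair is one entry
antenna1 = numpy.array([0, 1, 1, 2, 2])
antenna2 = numpy.array([6, 6, 9, 7, 8])
baseline = pandas.MultiIndex.from_arrays([antenna1, antenna2],
                                         names=["antenna1", "antenna2"])

# Entries with antenna1 in [1, 2] AND antenna2 in [6, 7, 8]
locs = baseline.get_locs([[1, 2], [6, 7, 8]])
print(baseline[locs].tolist())
```

The pair (1, 9) is excluded because its antenna2 value is not in the second list, even though its antenna1 value is in the first.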
We can sort by the MultiIndex baseline

In [None]:
print(xvis_antenna_selected.sortby(['baseline', 'time'], ascending=False))

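Finally, set the visibilities of the selected baselines to 2.0.

In [None]:
# Assigning to the result of sel() only modifies a temporary copy and fails
# silently, so build a mask from antenna1 and antenna2 and use where() instead
mask = xvis.antenna1.isin([1, 2]) & xvis.antenna2.isin([6, 7, 8])
xvis["vis"] = xvis.vis.where(~mask, 2.0)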