### Notebook to explore use of xarray in RASCIL

RASCIL v0.1.19b uses classes and numpy structured arrays as data holders. This simple approach
has been sufficient for development so far but will limit future work.
For example, more flexible processing will need selection based on multiple coordinates.
In this notebook, we investigate the capabilities of xarray:

    http://xarray.pydata.org/

xarray must be installed:

    pip install xarray

We use RASCIL functions to create the Visibility objects.


In [1]:
import xarray
import numpy

from astropy.coordinates import SkyCoord
import astropy.units as u
from astropy.time import Time

from rascil.data_models.polarisation import PolarisationFrame
from rascil.processing_components import create_named_configuration, create_visibility


Create a standard RASCIL Visibility object

In [2]:
lowcore = create_named_configuration('LOWBD2-CORE')
times = (numpy.pi / 43200.0) * numpy.arange(-4*3600, +4*3600.0, 1800)
frequency = numpy.linspace(1.0e8, 1.1e8, 3)
channel_bandwidth = numpy.array([1e7, 1e7, 1e7])
# Define the component and give it some spectral behaviour
f = numpy.array([100.0, 20.0, -10.0, 1.0])
flux = numpy.array([f, 0.8 * f, 0.6 * f])
phasecentre = SkyCoord(ra=+180.0 * u.deg, dec=-35.0 * u.deg, frame='icrs',
                       equinox='J2000')
vis = create_visibility(lowcore, times, frequency,
                        channel_bandwidth=channel_bandwidth,
                        phasecentre=phasecentre,
                        integration_time=30.0,
                        polarisation_frame=PolarisationFrame("linear"),
                        weight=1.0)
print(vis)

Visibility:
	Source: unknown
	Number of visibilities: 657360
	Number of channels: 3
	Frequency: [1.00e+08 1.05e+08 1.10e+08]
	Channel bandwidth: [10000000.]
	Number of polarisations: 4
	Visibility shape: (657360, 4)
	Number flags: 0
	Polarisation Frame: linear
	Phasecentre: <SkyCoord (ICRS): (ra, dec) in deg
    (180., -35.)>
	Configuration: LOWBD2-CORE
	Metadata: None



First construct an xarray.DataArray for the visibility. We name the dimensions and give the coordinates.

In [3]:
xvis_array = xarray.DataArray(vis.vis,
                              dims=["time", "polarisation"],
                              coords={"time": vis.time, "polarisation": vis.polarisation_frame.names})
print(xvis_array)

<xarray.DataArray (time: 657360, polarisation: 4)>
array([[0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       ...,
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]])
Coordinates:
  * time          (time) float64 5.085e+09 5.085e+09 ... 5.085e+09 5.085e+09
  * polarisation  (polarisation) <U2 'XX' 'XY' 'YX' 'YY'
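
The same construction can be tried without RASCIL installed. The following is a minimal sketch on synthetic data: the array contents, the three time stamps and the name `toy_array` are illustrative assumptions, not values taken from the Visibility above.

```python
import numpy
import xarray

# Synthetic stand-in for vis.vis: 6 rows of 4 polarisations, all zero.
# Three time stamps, each repeated for two (hypothetical) baselines.
data = numpy.zeros((6, 4), dtype=complex)
time = numpy.repeat(numpy.array([0.0, 30.0, 60.0]), 2)
pol_names = ["XX", "XY", "YX", "YY"]

toy_array = xarray.DataArray(
    data,
    dims=["time", "polarisation"],
    coords={"time": time, "polarisation": pol_names},
)

# Dimensions can now be addressed by name instead of by axis number.
xx = toy_array.sel(polarisation="XX")
print(toy_array.dims)
print(xx.shape)
```

Selecting a single polarisation label removes that dimension, leaving a one-dimensional array along time.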


Now we can create an xarray.Dataset, which is a dictionary-like container of DataArrays together with a dictionary of attributes.

In [4]:
dims = ("time", "polarisation", "spatial")

coords = {"time": vis.time,
          "polarisation": vis.polarisation_frame.names,
          "spatial": numpy.zeros([3])}

xvis_dict = dict()
xvis_dict["visibility"] = xarray.DataArray(vis.vis, dims=["time", "polarisation"])
xvis_dict["uvw"] = xarray.DataArray(vis.uvw, dims=["time", "spatial"])
xvis_dict["uvwdist"] = xarray.DataArray(vis.uvwdist, dims=["time"])
xvis_dict["antenna1"] = xarray.DataArray(vis.antenna1, dims=["time"])
xvis_dict["antenna2"] = xarray.DataArray(vis.antenna2, dims=["time"])
xvis_dict["datetime"] = \
    xarray.DataArray(Time(vis.time / 86400.0, format='mjd', scale='utc').datetime64, dims=["time"])
xvis_dict["weight"] = xarray.DataArray(vis.weight, dims=["time", "polarisation"])
xvis_dict["imaging_weight"] = xarray.DataArray(vis.imaging_weight,
                                               dims=["time", "polarisation"])
xvis_dict["flags"] = xarray.DataArray(vis.flags, dims=["time", "polarisation"])
xvis_dict["frequency"] = xarray.DataArray(vis.frequency, dims=["time"])
xvis_dict["channel_bandwidth"] = xarray.DataArray(vis.channel_bandwidth, dims=["time"])
xvis_dict["integration_time"] = xarray.DataArray(vis.integration_time, dims=["time"])
xvis = xarray.Dataset(xvis_dict, coords=coords)
xvis.attrs['source'] = vis.source
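
As a smaller, self-contained illustration of the same pattern, the sketch below builds a Dataset from a dictionary of DataArrays that share named dimensions. All values are synthetic, and labelling the spatial axis as "u", "v", "w" is an assumption made here for readability (the cell above uses zeros for that coordinate).

```python
import numpy
import xarray

nrows = 4
variables = {
    "visibility": xarray.DataArray(numpy.zeros((nrows, 4), dtype=complex),
                                   dims=["time", "polarisation"]),
    "uvw": xarray.DataArray(numpy.ones((nrows, 3)), dims=["time", "spatial"]),
    "antenna1": xarray.DataArray(numpy.array([0, 0, 0, 1]), dims=["time"]),
}

# Coordinates are attached once, at the Dataset level, and are shared
# by every variable that uses the corresponding dimension.
coords = {
    "time": numpy.array([0.0, 0.0, 30.0, 30.0]),
    "polarisation": ["XX", "XY", "YX", "YY"],
    "spatial": ["u", "v", "w"],  # hypothetical labels, not from RASCIL
}

toy = xarray.Dataset(variables, coords=coords)
toy.attrs["source"] = "unknown"

# With labelled spatial axes the w term can be picked out by name.
w = toy.uvw.sel(spatial="w")
print(toy.sizes["time"], w.shape)
```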


xarray prints informative summaries of most objects.

In [5]:
print(xvis)

<xarray.Dataset>
Dimensions:            (polarisation: 4, spatial: 3, time: 657360)
Coordinates:
  * time               (time) float64 5.085e+09 5.085e+09 ... 5.085e+09
  * polarisation       (polarisation) <U2 'XX' 'XY' 'YX' 'YY'
  * spatial            (spatial) float64 0.0 0.0 0.0
Data variables:
    visibility         (time, polarisation) complex128 0j 0j 0j 0j ... 0j 0j 0j
    uvw                (time, spatial) float64 10.69 12.45 16.41 ... 31.07 74.17
    uvwdist            (time) float64 16.41 17.23 18.05 ... 67.43 70.8 74.17
    antenna1           (time) int64 0 0 0 0 0 0 0 ... 163 163 163 164 164 164
    antenna2           (time) int64 1 1 1 2 2 2 3 ... 165 165 165 165 165 165
    datetime           (time) datetime64[ns] 2020-01-01T17:30:36.436666595 ... 2020-01-02T00:59:22.717916630
    weight             (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    imaging_weight     (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    flags              (time, polaris

Take a slice in times and polarisation

In [6]:
print(xvis.visibility[100:110, 0:1])

<xarray.DataArray 'visibility' (time: 10, polarisation: 1)>
array([[0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j],
       [0.+0.j]])
Coordinates:
  * time          (time) float64 5.085e+09 5.085e+09 ... 5.085e+09 5.085e+09
  * polarisation  (polarisation) <U2 'XX'
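
Positional slicing with square brackets depends on the order of the dimensions. A sketch of the equivalent call using `isel`, which names the dimensions explicitly, is given below on synthetic data (the array `toy_vis` is an illustrative assumption).

```python
import numpy
import xarray

toy_vis = xarray.DataArray(
    numpy.arange(20).reshape(5, 4),
    dims=["time", "polarisation"],
    coords={"time": numpy.arange(5) * 30.0,
            "polarisation": ["XX", "XY", "YX", "YY"]},
)

# Positional slicing: relies on time being axis 0 and polarisation axis 1.
by_position = toy_vis[1:3, 0:1]

# The same selection by dimension name: independent of axis order.
by_name = toy_vis.isel(time=slice(1, 3), polarisation=slice(0, 1))

print(by_position.equals(by_name))
```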


By label

In [7]:
print(xvis.sel({"polarisation":["XX", "YY"]}))

<xarray.Dataset>
Dimensions:            (polarisation: 2, spatial: 3, time: 657360)
Coordinates:
  * time               (time) float64 5.085e+09 5.085e+09 ... 5.085e+09
  * polarisation       (polarisation) <U2 'XX' 'YY'
  * spatial            (spatial) float64 0.0 0.0 0.0
Data variables:
    visibility         (time, polarisation) complex128 0j 0j 0j 0j ... 0j 0j 0j
    uvw                (time, spatial) float64 10.69 12.45 16.41 ... 31.07 74.17
    uvwdist            (time) float64 16.41 17.23 18.05 ... 67.43 70.8 74.17
    antenna1           (time) int64 0 0 0 0 0 0 0 ... 163 163 163 164 164 164
    antenna2           (time) int64 1 1 1 2 2 2 3 ... 165 165 165 165 165 165
    datetime           (time) datetime64[ns] 2020-01-01T17:30:36.436666595 ... 2020-01-02T00:59:22.717916630
    weight             (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    imaging_weight     (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    flags              (time, polarisation) int

By antenna1. Note that where keeps the full shape of the Dataset and masks the non-matching values (with NaN) rather than removing them.

In [8]:
print(xvis.where(xvis.antenna1<10).uvw)

<xarray.DataArray 'uvw' (time: 657360, spatial: 3)>
array([[10.69008852, 12.4457611 , 16.40655241],
       [11.22459295, 13.06804915, 17.22688003],
       [11.75909738, 13.69033721, 18.04720765],
       ...,
       [        nan,         nan,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,         nan]])
Coordinates:
  * time     (time) float64 5.085e+09 5.085e+09 ... 5.085e+09 5.085e+09
  * spatial  (spatial) float64 0.0 0.0 0.0
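
If the masked rows are not wanted, `where` also accepts `drop=True`. The sketch below contrasts the two forms on a small synthetic Dataset; the variable values are assumptions chosen for illustration.

```python
import numpy
import xarray

toy = xarray.Dataset(
    {"uvwdist": ("time", numpy.array([10.0, 50.0, 20.0, 80.0])),
     "antenna1": ("time", numpy.array([0, 0, 1, 1]))},
    coords={"time": numpy.array([0.0, 1.0, 2.0, 3.0])},
)

# Masking: same length as the input, non-matching rows become NaN.
masked = toy.where(toy.uvwdist < 40.0)

# Dropping: rows that do not match are removed from the time dimension.
dropped = toy.where(toy.uvwdist < 40.0, drop=True)

print(masked.sizes["time"], dropped.sizes["time"])
```

Because NaN has no integer representation, `where` promotes integer variables such as antenna1 to floating point in both cases.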


By uvw distance

In [9]:
print(xvis.where(xvis.uvwdist<40.0).uvw)

<xarray.DataArray 'uvw' (time: 657360, spatial: 3)>
array([[10.69008852, 12.4457611 , 16.40655241],
       [11.22459295, 13.06804915, 17.22688003],
       [11.75909738, 13.69033721, 18.04720765],
       ...,
       [        nan,         nan,         nan],
       [        nan,         nan,         nan],
       [        nan,         nan,         nan]])
Coordinates:
  * time     (time) float64 5.085e+09 5.085e+09 ... 5.085e+09 5.085e+09
  * spatial  (spatial) float64 0.0 0.0 0.0


By time

In [10]:
print(xvis.where(xvis.datetime>numpy.datetime64("2020-01-01T23:00:00")).datetime)

<xarray.DataArray 'datetime' (time: 657360)>
array([                          'NaT',                           'NaT',
                                 'NaT', ...,
       '2020-01-02T00:59:22.717916630', '2020-01-02T00:59:22.717916630',
       '2020-01-02T00:59:22.717916630'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) float64 5.085e+09 5.085e+09 ... 5.085e+09 5.085e+09
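
The datetime variable used here was built with astropy. As a hedged alternative, the sketch below converts MJD seconds to `datetime64` with numpy alone, assuming every day has exactly 86400 seconds (no leap-second handling). The two input values are the first and last time stamps shown in the index listing of this notebook; the names `mjd_epoch` and `toy` are illustrative.

```python
import numpy
import xarray

# First and last time stamps of the visibility, in MJD seconds.
mjd_seconds = numpy.array([5084616636.4366665, 5084643562.7179165])

# MJD zero point; each day is assumed to be exactly 86400 s long.
mjd_epoch = numpy.datetime64("1858-11-17T00:00:00", "ns")
offsets = numpy.round(mjd_seconds * 1e9).astype("int64").astype("timedelta64[ns]")
datetimes = mjd_epoch + offsets

toy = xarray.Dataset({"datetime": ("time", datetimes)},
                     coords={"time": mjd_seconds})

# Keep only samples later than 23:00 on 2020-01-01.
late = toy.where(toy.datetime > numpy.datetime64("2020-01-01T23:00:00"),
                 drop=True)
print(datetimes)
print(late.sizes["time"])
```

The float64 product limits the accuracy to well below a microsecond, so the trailing nanosecond digits should not be relied on.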


Sorting: the visibility is constructed so that antenna1 varies slowest. Let's try a sort by antenna2.

In [11]:
print(xvis.sortby("antenna2").antenna2)

<xarray.DataArray 'antenna2' (time: 657360)>
array([  1,   1,   1, ..., 165, 165, 165])
Coordinates:
  * time     (time) float64 5.085e+09 5.085e+09 ... 5.085e+09 5.085e+09


We can also sort by uvwdist. Note that the order of antenna1 and antenna2 changes accordingly.

In [12]:
print(xvis.sortby('uvwdist'))

<xarray.Dataset>
Dimensions:            (polarisation: 4, spatial: 3, time: 657360)
Coordinates:
  * time               (time) float64 5.085e+09 5.085e+09 ... 5.085e+09
  * polarisation       (polarisation) <U2 'XX' 'XY' 'YX' 'YY'
  * spatial            (spatial) float64 0.0 0.0 0.0
Data variables:
    visibility         (time, polarisation) complex128 0j 0j 0j 0j ... 0j 0j 0j
    uvw                (time, spatial) float64 0.5026 -7.775 ... -16.03 290.2
    uvwdist            (time) float64 7.791 7.845 7.85 ... 289.0 289.5 290.2
    antenna1           (time) int64 45 41 45 65 2 62 ... 153 153 157 153 157 157
    antenna2           (time) int64 62 45 62 85 5 80 ... 159 163 163 163 163 159
    datetime           (time) datetime64[ns] 2020-01-01T17:30:36.436666595 ... 2020-01-02T00:59:22.717916630
    weight             (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    imaging_weight     (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    flags              (time, pola
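
The effect of `sortby` on the other variables is easier to see on a few rows. The following sketch uses three synthetic samples (values assumed for illustration) and shows that every variable along time is reordered together.

```python
import numpy
import xarray

toy = xarray.Dataset(
    {"uvwdist": ("time", numpy.array([30.0, 10.0, 20.0])),
     "antenna1": ("time", numpy.array([0, 0, 1])),
     "antenna2": ("time", numpy.array([1, 2, 2]))},
    coords={"time": numpy.array([0.0, 1.0, 2.0])},
)

# Sort all variables along time by increasing uvwdist.
by_dist = toy.sortby("uvwdist")

# The same, in decreasing order.
by_dist_desc = toy.sortby("uvwdist", ascending=False)

print(by_dist.antenna1.values, by_dist.antenna2.values)
```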

Binning on one data variable: let's try bins in uvwdist. We only print the uvwdist range and the number of samples
in each bin.

In [13]:
%timeit -n 1 -r 1 for result in xvis.groupby_bins("uvwdist", bins=25): print(result[0], result[1].dims['time'])


(7.509, 19.087] 17095
(19.087, 30.384] 28203
(30.384, 41.68] 39165
(41.68, 52.976] 46858
(52.976, 64.272] 52582
(64.272, 75.568] 55662
(75.568, 86.864] 56502
(86.864, 98.16] 55611
(98.16, 109.456] 52740
(109.456, 120.752] 48051
(120.752, 132.048] 42634
(132.048, 143.344] 36774
(143.344, 154.64] 30309
(154.64, 165.936] 24541
(165.936, 177.232] 19332
(177.232, 188.528] 15286
(188.528, 199.825] 11626
(199.825, 211.121] 8735
(211.121, 222.417] 6226
(222.417, 233.713] 4339
(233.713, 245.009] 2691
(245.009, 256.305] 1482
(256.305, 267.601] 639
(267.601, 278.897] 233
(278.897, 290.193] 44
4.34 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
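
When only the counts per bin are needed, the loop over groups can be cross-checked against `numpy.histogram`. The sketch below does this on five synthetic samples with explicit bin edges; the data and the edges are assumptions chosen so that no sample falls on an edge.

```python
import numpy
import xarray

toy = xarray.Dataset(
    {"uvwdist": ("time", numpy.array([10.0, 20.0, 60.0, 70.0, 90.0]))},
    coords={"time": numpy.arange(5) * 30.0},
)
edges = [0.0, 50.0, 100.0]

# Each iteration yields the bin (a pandas Interval) and the matching subset.
counts = {}
for interval, group in toy.groupby_bins("uvwdist", bins=edges):
    counts[(interval.left, interval.right)] = group.sizes["time"]

# Cross-check: histogram of the same values with the same edges.
hist, _ = numpy.histogram(toy.uvwdist.values, bins=edges)
print(counts, hist)
```

The two methods treat samples lying exactly on a bin edge differently (`groupby_bins` intervals are closed on the right by default, `numpy.histogram` bins on the left), so they agree here only because no sample sits on an edge.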


Although antenna1 and antenna2 were declared as data variables, we can still index by them once they have been
set as index variables with set_index. The indexes attribute lists all the available indexes; antenna1 and antenna2
become levels of a MultiIndex called baseline.

In [14]:
xvis_antenna_selected = xvis.set_index(baseline=("antenna1", "antenna2"))
print(xvis_antenna_selected.indexes)

time: Float64Index([5084616636.4366665, 5084616636.4366665, 5084616636.4366665,
                    5084616636.4366665, 5084616636.4366665, 5084616636.4366665,
                    5084616636.4366665, 5084616636.4366665, 5084616636.4366665,
                    5084616636.4366665,
                    ...
                    5084643562.7179165, 5084643562.7179165, 5084643562.7179165,
                    5084643562.7179165, 5084643562.7179165, 5084643562.7179165,
                    5084643562.7179165, 5084643562.7179165, 5084643562.7179165,
                    5084643562.7179165],
                   dtype='float64', name='time', length=657360)
polarisation: Index(['XX', 'XY', 'YX', 'YY'], dtype='object', name='polarisation')
spatial: Float64Index([0.0, 0.0, 0.0], dtype='float64', name='spatial')
baseline: MultiIndex([(  0,   1),
                      (  0,   1),
                      (  0,   1),
                      (  0,   2),
                      (  0,   2),
                      (  0
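
A variant worth comparing is to attach the MultiIndex to the dimension that antenna1 and antenna2 already lie along, instead of introducing a new name. The sketch below assumes a synthetic Dataset whose sample dimension is called `row` (a hypothetical name, not used by RASCIL) and which has no other index on that dimension.

```python
import numpy
import pandas
import xarray

toy = xarray.Dataset(
    {"uvwdist": ("row", numpy.array([16.4, 17.2, 18.0, 25.1]))},
    coords={"antenna1": ("row", numpy.array([0, 0, 0, 1])),
            "antenna2": ("row", numpy.array([1, 2, 3, 2]))},
)

# Combine the two antenna coordinates into one MultiIndex on "row".
indexed = toy.set_index(row=["antenna1", "antenna2"])
index = indexed.indexes["row"]

print(isinstance(index, pandas.MultiIndex), list(index.names))
```

With this layout the data variables and the MultiIndex share the same dimension, so a selection on the index also applies to the data.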

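Since antenna1 and antenna2 are levels of a pandas.MultiIndex called baseline, our selection is made
via a tuple containing the lists of wanted values for antenna1 and antenna2. The drop=True argument asks sel to
drop coordinate variables that the selection reduces to scalars instead of keeping them as scalar coordinates.
Note that in the output only the baseline coordinate is reduced (to 288 entries); the data variables are still
indexed by time.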

In [15]:
print(xvis_antenna_selected.sel(baseline=([1, 2], [6,7,8]), drop=True))

<xarray.Dataset>
Dimensions:            (baseline: 288, polarisation: 4, spatial: 3, time: 657360)
Coordinates:
  * time               (time) float64 5.085e+09 5.085e+09 ... 5.085e+09
  * polarisation       (polarisation) <U2 'XX' 'XY' 'YX' 'YY'
  * spatial            (spatial) float64 0.0 0.0 0.0
  * baseline           (baseline) MultiIndex
  - antenna1           (baseline) int64 1 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2 2
  - antenna2           (baseline) int64 6 6 6 6 6 6 6 6 6 ... 8 8 8 8 8 8 8 8 8
Data variables:
    visibility         (time, polarisation) complex128 0j 0j 0j 0j ... 0j 0j 0j
    uvw                (time, spatial) float64 10.69 12.45 16.41 ... 31.07 74.17
    uvwdist            (time) float64 16.41 17.23 18.05 ... 67.43 70.8 74.17
    datetime           (time) datetime64[ns] 2020-01-01T17:30:36.436666595 ... 2020-01-02T00:59:22.717916630
    weight             (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    imaging_weight     (time, polarisation) float64 1
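
A baseline selection that also subsets the data variables can be written without a MultiIndex, using a boolean mask over the antenna columns. This is a sketch of an alternative technique on synthetic data, not the approach used in the cell above; the antenna lists mirror the ones in that cell.

```python
import numpy
import xarray

toy = xarray.Dataset(
    {"uvwdist": ("time", numpy.array([5.0, 6.0, 7.0, 8.0, 9.0])),
     "antenna1": ("time", numpy.array([0, 1, 1, 2, 3])),
     "antenna2": ("time", numpy.array([6, 6, 9, 8, 7]))},
    coords={"time": numpy.arange(5) * 30.0},
)

# Rows whose antenna1 is in [1, 2] and whose antenna2 is in [6, 7, 8].
mask = (numpy.isin(toy.antenna1.values, [1, 2])
        & numpy.isin(toy.antenna2.values, [6, 7, 8]))

# Integer positions of the matching rows, applied to every variable.
selected = toy.isel(time=numpy.flatnonzero(mask))
print(selected.antenna1.values, selected.antenna2.values)
```

Unlike `where`, this keeps the integer dtype of antenna1 and antenna2 because no values need to be masked.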

We can sort by the MultiIndex baseline

In [16]:
print(xvis_antenna_selected.sortby(['baseline', 'time'], ascending=False))

<xarray.Dataset>
Dimensions:            (baseline: 657360, polarisation: 4, spatial: 3, time: 657360)
Coordinates:
  * time               (time) float64 5.085e+09 5.085e+09 ... 5.085e+09
  * polarisation       (polarisation) <U2 'XX' 'XY' 'YX' 'YY'
  * spatial            (spatial) float64 0.0 0.0 0.0
  * baseline           (baseline) MultiIndex
  - antenna1           (baseline) int64 164 164 164 164 164 164 ... 0 0 0 0 0 0
  - antenna2           (baseline) int64 165 165 165 165 165 165 ... 1 1 1 1 1 1
Data variables:
    visibility         (time, polarisation) complex128 0j 0j 0j 0j ... 0j 0j 0j
    uvw                (time, spatial) float64 67.35 31.07 74.17 ... 12.45 16.41
    uvwdist            (time) float64 74.17 70.8 67.43 ... 18.05 17.23 16.41
    datetime           (time) datetime64[ns] 2020-01-02T00:59:22.717916630 ... 2020-01-01T17:30:36.436666595
    weight             (time, polarisation) float64 1.0 1.0 1.0 ... 1.0 1.0 1.0
    imaging_weight     (time, polarisation) float6