# Selecting Data

This notebook shows how to select data in dysh.
By selecting data, you can narrow down which scans or integrations the calibration routines will operate on.
We call such narrowing down a "selection rule."
You create selection rules through methods of [``GBTFITSLoad``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#module-dysh.fits.gbtfitsload.GBTFITSLoad), which uses an instance of [``Selection``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.util.html#dysh.util.selection.Selection) as an attribute called ``selection``.
You can create multiple selection rules that will be logically ANDed to create a final rule at calibration time.

## Loading Modules
We start by loading the modules we will use in this notebook. 

In [None]:
# These modules are required for the notebook.
import astropy.units as u
from astropy.time import Time
from dysh.log import init_logging
from dysh.fits.gbtfitsload import GBTFITSLoad

# These modules are only used to download the data.
from pathlib import Path
from dysh.util.download import from_url

Set the logging to INFO level.
This is only required in notebooks.

In [None]:
init_logging(2)

## Data Retrieval

Download the example SDFITS data, if necessary.

The code below will download an SDFITS file from [http://www.gb.nrao.edu/dysh/example_data](http://www.gb.nrao.edu/dysh/example_data) and put it in a data directory.

In [None]:
url = "http://www.gb.nrao.edu/dysh/example_data/hi-survey/data/AGBT04A_008_02.raw.acs/AGBT04A_008_02.raw.acs.fits"
savepath = Path.cwd() / "data"
savepath.mkdir(exist_ok=True) # Create the data directory if it does not exist.
filename = from_url(url, savepath)

## Data Loading

Next, we use [``GBTFITSLoad``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#module-dysh.fits.gbtfitsload.GBTFITSLoad) to load the data, and then its [``summary``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#module-dysh.fits.gbtfitsload.GBTFITSLoad.summary) method to inspect its contents.
We add the UTC column to the ``summary``.

In [None]:
sdfits = GBTFITSLoad(filename)
sdfits.summary(add_columns=["UTC"])

## Using Selection

Now we show various ways in which [``GBTFITSLoad.selection``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#dysh.fits.gbtfitsload.GBTFITSLoad.selection) can be used to select data.

### Select by column value

One way of selecting data is by specifying a value for an SDFITS column name. (The column name is case insensitive, but the value is not.)
For example, we can select data which has OBJECT="U8249" or OBJECT="U11017" using the following.

**Note** Selecting values in a list will logically OR those values.  So `object=["U8249","U11017"]` will select scans with either object.  But multiple keywords (columns) will be logically ANDed (see below).

In [None]:
sdfits.select(object=["U8249","U11017"])
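The OR/AND behavior can be illustrated without dysh. The following is a minimal pure-Python sketch of the semantics described above, not dysh's implementation; the `rows` list and the `matches` helper are hypothetical and exist only for illustration.

```python
# Hypothetical rows standing in for SDFITS records; this is not real data.
rows = [
    {"OBJECT": "U8249", "PROC": "OnOff"},
    {"OBJECT": "U11017", "PROC": "OnOff"},
    {"OBJECT": "U8249", "PROC": "Track"},
    {"OBJECT": "U8091", "PROC": "OnOff"},
]


def matches(row, **rule):
    """Return True if the row satisfies every keyword in the rule."""
    for column, wanted in rule.items():
        # Column names are matched case-insensitively; values are not.
        value = row[column.upper()]
        # A list of values is an OR: any one of them may match.
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if value not in allowed:
            return False  # Keywords are ANDed: one failure rejects the row.
    return True


either_object = [r for r in rows if matches(r, object=["U8249", "U11017"])]
both_keywords = [
    r for r in rows if matches(r, object=["U8249", "U11017"], proc="OnOff")
]
print(len(either_object), len(both_keywords))
```

The first rule keeps every row whose object is in the list, while the second also requires the procedure to match, so it keeps fewer rows.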

We can view the contents of the selection using its `show` method.
This displays the selection as a table.
The `# Selected` column gives the number of records (integrations) selected by each selection rule. 
Each time we create a new selection, it is assigned a unique id and tag.

In [None]:
sdfits.selection.show()

We can also specify the tag name to have a more meaningful value. 

In [None]:
sdfits.select(proc="OnOff", tag='proc onoff')

In [None]:
sdfits.selection.show()

### Combining Selections

Once we have multiple selection rules in the ``Selection`` object, we can combine them into a single selection using the ``final`` property. 
This will return a [``pandas.DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [None]:
sdfits.selection.final

In this particular case, we wind up with 520 integrations. This is a selection of objects U8249 or U11017 AND proc OnOff. Because keywords in the same selection rule are logically ANDed, this could also have been accomplished via

`sdfits.select(object=["U8249","U11017"], proc="OnOff")`

(Try it yourself).

You can also see a summary of the selected data, using the ``selected`` argument of [``summary``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#module-dysh.fits.gbtfitsload.GBTFITSLoad.summary):

In [None]:
sdfits.summary(selected=True)

### Remove selection rules
You can remove a selection rule by `id` or `tag`.
Multiple rows with the same tag will all be removed.

In [None]:
sdfits.selection.remove(id=0)
sdfits.selection.show()

To remove all selection rules use [``clear_selection``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#dysh.fits.GBTFITSLoad.clear_selection).
After using it, the ``Selection`` will show no selection rules.

In [None]:
sdfits.clear_selection()
sdfits.selection.show()

### Select by Range

It is also possible to define a selection given a range of values using [``select_range``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.util.html#dysh.util.selection.Selection.select_range).
In this case the selection must be specified using either a list, ``[]``, or a tuple, ``()``, with a start and an end value. Ranges are considered inclusive of both ends.
Lower limits are given by ``(value,None)`` or ``(value,)``.
Upper limits are given by ``(None,value)``, since ``(,value)`` is not valid Python.
For coordinates the default unit is taken to be degrees.
Other units can be explicitly given.
Both ``()`` and ``[]`` are valid for indicating ranges, but only tuples can be used for the lower-limit shorthand ``(value,)``.
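The limit conventions above can be sketched in plain Python. This is only an illustration of the described semantics (inclusive ends, ``None`` or a missing second element meaning "unbounded"), not the code dysh uses; `in_range` is a hypothetical helper and the values are made up.

```python
def in_range(value, limits):
    """Inclusive range test; a missing or None bound means 'unbounded'."""
    # Pad a one-element tuple such as (114,) to (114, None).
    lower, upper = (tuple(limits) + (None, None))[:2]
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# Hypothetical right ascensions and elevations, in degrees.
ra_values = [113.9, 114.0, 120.5]
selected_ra = [ra for ra in ra_values if in_range(ra, (114,))]

el_values = [35.0, 80.0, 80.1]
selected_el = [el for el in el_values if in_range(el, [None, 80])]

print(selected_ra, selected_el)
```

Values exactly on a limit are kept because both ends are inclusive.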

For example, to select only rows where the right ascension is at least 114 degrees:

In [None]:
sdfits.select_range(ra=(114,), tag="RA>=114 deg")
sdfits.selection.show()

(Right Ascension is the FITS CRVAL2 column).

To select rows where the elevation is at most 80 degrees:

In [None]:
sdfits.select_range(elevation=[None,80], tag="EL<=80")
sdfits.selection.show()

(Note that the elevation column is ELEVATIO because the FITS standard only allows 8 characters for column names.)

We can check that the selections were applied properly by inspecting the final result and a subset of its columns. 

In [None]:
sdfits.selection.final[["OBJECT","CRVAL2","ELEVATIO"]]

It is also possible to use units during selection.
For example

In [None]:
sdfits.select_range(dec=[854, 855] * u.arcmin, tag="14.23<=DEC<=14.25")
sdfits.selection.show()
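The degree values in the tag follow from converting the arcminute limits to degrees. A quick check of the arithmetic in plain Python, with no dysh or astropy required:

```python
ARCMIN_PER_DEGREE = 60

# Declination limits from the selection above, in arcminutes.
dec_limits_arcmin = (854, 855)
dec_limits_deg = tuple(a / ARCMIN_PER_DEGREE for a in dec_limits_arcmin)

print([round(d, 2) for d in dec_limits_deg])  # prints [14.23, 14.25]
```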

In [None]:
sdfits.summary(selected=True)

### Select Within a Range

It is also possible to specify the midpoint and a range to make a selection.
In this case we use [``select_within``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.util.html#dysh.util.selection.Selection.select_within) and specify the midpoint and the ± half-width of the range.

For example, to select elevations between 50-10 and 50+10 degrees, we would use

In [None]:
sdfits.select_within(elevation=(50,10), tag="EL=50+/-10")
sdfits.selection.show()

This shows a selection between 40 and 60 degrees of elevation.
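The relation between the (midpoint, half-width) pair and the resulting limits can be written out explicitly. This is a hedged sketch: `within_to_range` is a hypothetical helper, the elevations are made up, and the bounds are assumed to be inclusive as they are for ``select_range``.

```python
def within_to_range(center, half_width):
    """Convert a (center, half-width) pair into (lower, upper) limits."""
    return (center - half_width, center + half_width)


lower, upper = within_to_range(50, 10)

# Hypothetical elevations in degrees; bounds are assumed inclusive here.
elevations = [39.9, 40.0, 55.2, 60.0, 60.1]
selected = [el for el in elevations if lower <= el <= upper]

print((lower, upper), selected)
```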

Now, when you do a [``getps``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#dysh.fits.GBTFITSLoad.getps) it will operate only on the selected data.

(Currently, you can't preselect [``ifnum``](https://dysh.readthedocs.io/en/latest/reference/glossary.html#term-ifnum), [``plnum``](https://dysh.readthedocs.io/en/latest/reference/glossary.html#term-plnum), or [``fdnum``](https://dysh.readthedocs.io/en/latest/reference/glossary.html#term-fdnum); they must be provided as method arguments).

In [None]:
sb = sdfits.getps(ifnum=0, plnum=0, fdnum=0)
sb.timeaverage().plot(ymin=-.1, ymax=.1, xmin=1.404E9, xmax=1.406E9)

### Using Aliases

[``Selection``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.util.html#dysh.util.selection.Selection) knows about certain aliases for column names.
For example, the SDFITS column ELEVATIO can also be selected using ELEVATION.
The aliases are defined in the ``aliases`` attribute of ``Selection``.

In [None]:
sdfits.selection.aliases

It is also possible to add your own aliases.
For example to use target and az as aliases for OBJECT and AZIMUTH we would use

In [None]:
sdfits.selection.alias({'target':'object','az':'azimuth'})
sdfits.selection.aliases

Then you can select using your aliases:

In [None]:
sdfits.select(target="U8249")
sdfits.selection.show()

Notice that this will only affect the aliases for this particular instance of a [``GBTFITSLoad``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#module-dysh.fits.gbtfitsload.GBTFITSLoad).
Any new [``GBTFITSLoad``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.fits.html#module-dysh.fits.gbtfitsload.GBTFITSLoad) objects will not know about these aliases.
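One way to picture per-instance aliases is as a small lookup table owned by each object. The `AliasTable` class below is a hypothetical illustration of that idea, not dysh's ``Selection`` class; only the ELEVATION to ELEVATIO alias is taken from the text above.

```python
class AliasTable:
    """Toy alias lookup; each instance keeps its own dictionary."""

    def __init__(self):
        # Created per instance, so additions do not leak to other tables.
        self.aliases = {"ELEVATION": "ELEVATIO"}

    def alias(self, mapping):
        """Register new aliases, normalizing names to upper case."""
        for name, column in mapping.items():
            self.aliases[name.upper()] = column.upper()

    def resolve(self, name):
        """Return the real column name for an alias, or the name itself."""
        key = name.upper()
        return self.aliases.get(key, key)


first = AliasTable()
first.alias({"target": "object", "az": "azimuth"})
second = AliasTable()  # A new table does not know about 'target'.

print(first.resolve("target"), second.resolve("target"))
```

The first table resolves the alias to the real column name, while the second returns the name unchanged because the alias was never registered on it.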

### Empty Selections

Any selection that results in no data being selected is ignored.
You will get a warning message in this case.

In [None]:
sdfits.selection.select(target='foobar')
sdfits.selection.show()

### Time Selections

UTC time ranges can be selected with [``astropy.time.Time``](https://docs.astropy.org/en/stable/api/astropy.time.Time.html#astropy.time.Time) objects.
This checks against the UTC timestamp column.
For LST, use ``select_range(lst=[number1,number2])``.



In [None]:
# clear the selection for this demonstration
sdfits.clear_selection()

In [None]:
sdfits.select_range(utc=(Time("2004-04-22T05:27:20.12", scale="utc"),
                         Time("2004-04-22T05:48:43.12", scale="utc")),
                    tag="time range")
sdfits.selection.show()

In [None]:
sdfits.selection.final[["SCAN","OBJECT","UTC", "LST","PLNUM","IFNUM","FDNUM"]]

### Channel Selection

You can select a contiguous range of channels using [``select_channel``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.util.html#dysh.util.selection.Selection.select_channel), and the integrations will be trimmed to that channel range during calibration. The final spectrum will have the input channel range.  As with [``select_range``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.util.html#dysh.util.selection.Selection.select_range), channel ranges are inclusive at both ends.

In [None]:
sdfits.select_channel([2000,6000], tag="channels")

In [None]:
sdfits.selection.show()

In [None]:
sb = sdfits.getps(ifnum=0, plnum=1, fdnum=0)
sb.timeaverage().plot(xaxis_unit="channel")

The resulting [``Spectrum``](https://dysh.readthedocs.io/en/latest/reference/modules/dysh.spectra.html#dysh.spectra.spectrum.Spectrum) only has 4001 channels (2000 through 6000, inclusive), as specified by the channel selection.

Note that you can only have one channel selection rule at a time.

In [None]:
try: 
    sdfits.select_channel([60,70])
except Exception as e:
    print(e)