# Selecting Data

This notebook shows how to select data in `dysh`.
We illustrate this using the `Selection` class of `dysh`.
We use this approach to show the various methods available; however, a `Selection` object on its own has no effect on the data itself.
At the end of the notebook we show how the same selections can be accomplished using a `GBTFITSLoad` object, so that the selections made are actually applied to the data.

We start by loading the modules we will use in this notebook.

In [None]:
# These modules are required for the tutorial.
import astropy.units as u
from astropy.time import Time
from dysh.fits.gbtfitsload import GBTFITSLoad
from dysh.util.selection import Selection

# These modules are only used to download the data.
from pathlib import Path
from dysh.util.download import from_url

## Data Retrieval

Download the example SDFITS data, if necessary.

The code below will download an SDFITS file from http://www.gb.nrao.edu/dysh/example_data and put it in a data directory.
The data directory must already exist in the directory this notebook is run from; otherwise the downloaded SDFITS file will itself be named `data`.
The example will work either way, but be aware that you may find a new file named `data` after running it.

In [None]:
url = "http://www.gb.nrao.edu/dysh/example_data/hi-survey/data/AGBT04A_008_02.raw.acs/AGBT04A_008_02.raw.acs.fits"
savepath = Path.cwd() / "data"
filename = from_url(url, savepath)

## Data Loading

Next, we use `GBTFITSLoad` to load the data, and then its `summary` method to inspect its contents.

In [None]:
sdfits = GBTFITSLoad(filename)
sdfits.summary()

## Create a Selection Object for SDFITS Data

We will show how to select data using a `Selection` object.
We start by creating the `Selection` object and putting it into a variable named `selection_object`.

In [None]:
selection_object = Selection(sdfits)

## Using Selection

Now we show various ways in which the `Selection` object can be used to select data.

### Select by Column Names

One way of selecting data is by specifying a value for a column name.
For example, we can select data which has OBJECT="U8249" and polarization number 0 using the following.

In [None]:
selection_object.select(object="U8249", plnum=0)

We can view the contents of the selection using its `show` method.

In [None]:
selection_object.show()

This displays the selection as a table.
In the background, each time we create a new selection, it is assigned an id and a tag.

We can also specify the tag name to have a more meaningful value. In this case we will select both polarizations.

In [None]:
selection_object.select(plnum=[0, 1], tag="plnums")

In [None]:
selection_object.show()

### Combining Selections

Once we have multiple selection rules in our `Selection` object, we can combine them into a single selection using the `final` attribute of `Selection`.
This will return a `pandas.DataFrame`.

In [None]:
selection_object.final

In this particular case, we have 152 rows.
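To picture how several rules can reduce to one set of rows, here is a minimal pure-Python sketch. It assumes the rules are combined with a logical AND, so a row survives only if it satisfies every rule. The toy rows, the `OTHER` object name, and the `matches` helper are hypothetical and are not part of `dysh`.

```python
# Toy stand-in for a few SDFITS rows (illustrative values only).
rows = [
    {"OBJECT": "U8249", "PLNUM": 0},
    {"OBJECT": "U8249", "PLNUM": 1},
    {"OBJECT": "OTHER", "PLNUM": 0},
]

# Two rules, mirroring select(object="U8249", plnum=0) and select(plnum=[0, 1]).
rules = [
    {"OBJECT": ["U8249"], "PLNUM": [0]},
    {"PLNUM": [0, 1]},
]


def matches(row, rule):
    """Return True if the row satisfies every column condition of one rule."""
    return all(row[column] in allowed for column, allowed in rule.items())


# Assumption: the final selection keeps rows that satisfy ALL rules.
final_rows = [row for row in rows if all(matches(row, rule) for rule in rules)]
print(final_rows)
```

Under this assumption, the broader `plnum=[0, 1]` rule does not widen the result, because the first rule has already restricted it to a single polarization.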

### Remove Selections

Selections can be removed by id or by tag.
Multiple rows with the same tag will all be removed.

In [None]:
selection_object.remove(id=0)
selection_object.remove(tag='plnums')
selection_object.show()

To remove all selections, use `Selection.clear`:

In [None]:
selection_object.clear()

In [None]:
selection_object.show()

### Select by Range

It is also possible to define a selection given a range of values.
In this case the selection must be specified using either a list, `[]`, or a tuple, `()`, with a start and an end value.
Lower limits are given by `(value, None)` or `(value,)`.
Upper limits are given by `(None, value)`, since `(, value)` is not valid Python.
For coordinates the default unit is taken to be degrees.
Other units can be explicitly given.
Both `()` and `[]` are valid for specifying ranges, but the one-element lower-limit form `(value,)` must be a tuple.
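The limit conventions above can be sketched in plain Python. The helpers `normalize_range` and `in_range` are hypothetical illustrations, not `dysh` functions, and treating the end points as inclusive is an assumption made only for this sketch.

```python
def normalize_range(bounds):
    """Return (lower, upper) from a 1- or 2-element sequence; None means unbounded."""
    values = tuple(bounds)
    if len(values) == 1:
        # (value,) is shorthand for a lower limit with no upper limit.
        return (values[0], None)
    if len(values) == 2:
        return (values[0], values[1])
    raise ValueError("a range needs one or two values")


def in_range(value, bounds):
    """Check a value against a range (assumption: end points are inclusive)."""
    lower, upper = normalize_range(bounds)
    above = lower is None or value >= lower
    below = upper is None or value <= upper
    return above and below


print(normalize_range((114,)))      # lower limit only
print(normalize_range([None, 80]))  # upper limit only
print(in_range(75.0, [None, 80]))
```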

For example, to select only rows where the right ascension is greater than 114 degrees we would use

In [None]:
selection_object.select_range(ra=(114,))

In [None]:
selection_object.show()

and to select rows where the elevation is below 80 degrees

In [None]:
selection_object.select_range(elevation=[None,80])

In [None]:
selection_object.show()

We can check that the selections were applied properly by inspecting the final result and its "ELEVATIO" column.

It is also possible to use units during selection.
For example

In [None]:
selection_object.select_range(dec=[854, 855] * u.arcmin)
selection_object.show()

Selection keywords are case insensitive, so, for example, using `DeC` is the same as `dec`.
Note also that `elevation` is aliased here to `elevatio` (the actual SDFITS keyword).

In [None]:
selection_object.select_range(eLEVaTIon=[None,80])
selection_object.show()

Notice that the selections with ids 1 and 3 are the same. 
By default, `Selection` will not check for duplicates (this makes it swifter).

### Select Within a Range

It is also possible to specify a midpoint and a half-width to make a selection.
In this case we use `select_within` and specify the central value and the ± range around it.

For example, to select elevations between 50-10 and 50+10 degrees we would use

In [None]:
selection_object.select_within(eleVation=(50,10))
selection_object.show()

This shows a selection between 40 and 60 degrees of elevation.
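The arithmetic behind a midpoint-plus-width selection is simple enough to write out. The helper `within_to_range` below is a hypothetical name used only for illustration; it is not part of `dysh`.

```python
def within_to_range(midpoint, half_width):
    """Convert (midpoint, half_width) into explicit (lower, upper) limits."""
    return (midpoint - half_width, midpoint + half_width)


# Mirrors select_within(elevation=(50, 10)).
lower, upper = within_to_range(50, 10)
print(lower, upper)  # 40 60
```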

### Using Aliases

`Selection` knows about certain aliases for column names.
For example, the SDFITS column ELEVATIO can also be selected using ELEVATION.
The aliases are defined in the `aliases` attribute of `Selection`.

In [None]:
selection_object.aliases

It is also possible to add your own aliases.
For example to use target and az as aliases for OBJECT and AZIMUTH we would use

In [None]:
selection_object.alias({'target':'object','az':'azimuth'})

In [None]:
selection_object.aliases

In [None]:
selection_object.select(target="U8249")
selection_object.show()

Notice that this will only affect the aliases for this particular instance of a `Selection`.
Any new Selection objects will not know about these aliases.

In [None]:
Selection(sdfits).aliases
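The case-insensitive keywords and per-instance aliases described in this section can be pictured with a small lookup. This is a sketch under stated assumptions: `resolve_column` and the dictionaries are hypothetical, and storing aliases as upper-case keys is an illustrative choice rather than a description of how `dysh` stores them.

```python
def resolve_column(keyword, aliases):
    """Upper-case the keyword, then map it through the alias table if present."""
    key = keyword.upper()
    return aliases.get(key, key)


# Hypothetical default alias table shared by new objects.
default_aliases = {"ELEVATION": "ELEVATIO"}

# A per-instance copy, extended with user-defined aliases.
my_aliases = dict(default_aliases)
my_aliases.update({"TARGET": "OBJECT", "AZ": "AZIMUTH"})

print(resolve_column("eLEVaTIon", default_aliases))  # ELEVATIO
print(resolve_column("target", my_aliases))          # OBJECT
print(resolve_column("target", default_aliases))     # TARGET (alias unknown here)
```

Copying the default table before updating it is what keeps user-defined aliases local to one instance in this sketch.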

### Empty Selections

Any selection that results in no data being selected is ignored.
You will get a warning message in this case.

In [None]:
selection_object.select(target='foobar')

### Time Selections

UTC time ranges can be selected with `Time` objects.
This checks against the UTC timestamp column.
For LST, use `select_range(lst=[number1, number2])`.



In [None]:
selection_object.select_range(utc=(Time("2004-04-22T06:08:05", scale="utc"),
                                   Time("2004-04-22T06:08:26", scale="utc")))

In [None]:
selection_object.show()

In [None]:
selection_object.final["UTC"]

### Channel Selection

To select channels there is a special method, `Selection.select_channel`.
Channels can be ranges, individual channels, or combinations thereof.
Note that selecting channels does not down-select rows.

In [None]:
a = [1, 4, (30, 40)]
selection_object.select_channel(a)
selection_object.show()
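To see what a mixed specification such as `[1, 4, (30, 40)]` could expand to, here is a minimal pure-Python sketch. The helper `expand_channels` is hypothetical and not part of `dysh`. Whether range end points are inclusive is an assumption exposed here as the `inclusive` parameter; consult the `dysh` documentation for the actual convention.

```python
def expand_channels(spec, inclusive=True):
    """Expand single channels and (start, stop) pairs into a sorted channel list."""
    channels = set()
    for item in spec:
        if isinstance(item, (list, tuple)):
            start, stop = item
            # Assumption: the stop channel is included when inclusive=True.
            channels.update(range(start, stop + 1 if inclusive else stop))
        else:
            channels.add(item)
    return sorted(channels)


chans = expand_channels([1, 4, (30, 40)])
print(chans[:3], len(chans))
```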

Note that you can only have one channel selection rule at a time.

In [None]:
try: 
    selection_object.select_channel([60,70])
except Exception as e:
    print(e)

### Applying Selections to Your Data

So far we have seen how to create and manage selections.
However, these have been made with a separate `Selection` object.
All of the methods exposed above are also available through the `GBTFITSLoad` object.
For example, to list the selections we'd use `GBTFITSLoad.selection.show()`, to clear the selections `GBTFITSLoad.selection.clear()`, and to select in a range `GBTFITSLoad.select_range()`.

To show the effect we start by using `gettp` with the basic required selection of ifnum, plnum and fdnum.

In [None]:
tp_all = sdfits.gettp(ifnum=0, plnum=0, fdnum=0)

In [None]:
len(tp_all)

That is, all 81 scans were selected.

Now we select something and show the selection.

In [None]:
sdfits.select_range(eLEVaTIon=[None,80])
sdfits.selection.show()

Now repeat the `gettp` call and notice the difference.

In [None]:
tp_selection = sdfits.gettp(ifnum=0, plnum=0, fdnum=0)
len(tp_selection)

Now only 74 scans are selected, the ones that have an elevation below 80 degrees.