# dysh demo   
-----------
## (CAVEAT:   where noted, a few features added after 0.3.0 are used here)
July 11, 2024:     merge "main" into the "get_dysh" branch for this to work as intended

* smoothing
* spectrum indexing
* spectrum writing
* dysh_data simplified file finder

## General

The richness of the Python ecosystem allows for several ways to run Python code. Here are three common ones we've used with **dysh**:

1. JupyterLab/**Notebooks** (this demo)
2. **dysh**: an IPython-based CLI with predefined settings that make working with dysh easy. Close to the GBTIDL experience.
3. Introspective GUI environments, such as **spyder**. Can be useful for some developers.


## Calibrating and Smoothing a Position Switched (PS) observation

This notebook shows how to use `dysh` to calibrate and smooth a PS observation. The example below uses the data from the Position-Switch example. The following dysh commands are the simplest way to get and smooth a spectrum (all function arguments are left out):

      sdf = GBTFITSLoad()          # load an SDFITS file
      sb = sdf.getps()             # get a PS observation into one or more ScanBlocks
      ta = sb.timeaverage()        # timeaverage all ScanBlocks, where allowed!
      tb = ta.smooth()             # smooth (and decimate) the spectrum 
      tb.plot()                    # plot the spectrum                    

or, if you prefer Python method chaining:

       GBTFITSLoad().getps().timeaverage().smooth().plot()
      

### First load the modules we're going to need

In [None]:
import numpy as np
import astropy.units as u
from dysh.fits.gbtfitsload import GBTFITSLoad
from dysh.util.files import dysh_data

### Define "rolled_stats", a helper function

In [None]:
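# show the mean and std of the data and of the differences between neighboring channels
def rolled_stats(data, label='stats:'):
    delta = data[1:] - data[:-1]
    # prints: mean, std, mean of differences, std of differences, and the "rolled RMS ratio";
    # the ratio is close to 1 for independent channels and drops below 1 for correlated ones
    print(label, data.mean(), data.std(), delta.mean(), delta.std(), delta.std()/data.std()/np.sqrt(2))

# to test: for independent Gaussian noise the differences have a sqrt(2) higher RMS,
# so the last number printed should be close to 1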
np.random.seed(123)
rolled_stats(np.random.normal(0,1,10000))

## Single Dish Math

The hot/sky calibration returns a system temperature (a scalar):
$$
 T_{sys} = T_{cal} { { \langle SKY \rangle } \over { \langle HOT - SKY \rangle } } + T_{cal}/2
$$
where the averaging operator (the angle brackets) averages over channels, avoiding the edges of the passband.


After this, comparing the ON and OFF scans gives the astronomical signal:
$$
  T_A = T_{sys}  {   { ON - OFF } \over {OFF} }
$$
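
The two formulas above can be tried out on synthetic data. The following is a minimal numpy-only sketch, not dysh's implementation: the values of `tcal`, `t_sky`, `gain` and `frac_noise`, the helper `noisy()`, and the choice to average over the inner 80% of the band are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nchan = 1024
tcal = 1.5            # K, assumed noise-diode temperature
t_sky = 20.0          # K, assumed system temperature with the diode off
gain = 1.0e6          # arbitrary counts per K
frac_noise = 0.001    # assumed fractional noise per channel

def noisy(level):
    """Hypothetical helper: raw counts for a given temperature level, with noise."""
    return gain * level * (1.0 + frac_noise * rng.normal(size=nchan))

# Reference position with the cal diode off (SKY) and on (HOT)
sky = noisy(t_sky)
hot = noisy(t_sky + tcal)

# Average over the inner 80% of the band to avoid the passband edges (assumed fraction)
inner = slice(nchan // 10, nchan - nchan // 10)
tsys = tcal * sky[inner].mean() / (hot[inner] - sky[inner]).mean() + tcal / 2

# ON and OFF spectra (diode on/off averaged); the ON carries a 0.5 K line in channels 500-519
line = np.zeros(nchan)
line[500:520] = 0.5
off = noisy(t_sky + tcal / 2)
on = noisy(t_sky + tcal / 2 + line)

ta = tsys * (on - off) / off
print("Tsys [K]:", tsys)
print("mean T_A in line [K]:", ta[500:520].mean())
```

With these inputs the recovered system temperature should come out near `t_sky + tcal/2`, and the line should be recovered at close to its input strength.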

## Load the Position-Switch SDFITS file example

The ``dysh_data()`` function accepts ``example=`` or ``test=``; the latter points to a very short example that is included with the GitHub source. For any real work (as in this demo) we use the longer ``example=`` version, which has 151 integrations.


In [None]:
filename = dysh_data(example='getps')      # CAVEAT: new feature
print(filename)

In [None]:
!ls -l

In [None]:
sdfits = GBTFITSLoad(filename)
sdfits.summary(verbose=False)

In [None]:
# the SDFITS index is stored as a pandas DataFrame, so its column names are available via .keys()
sdfits._index.keys()

In [None]:
# check one or more columns
sdfits._index[["DATE-OBS","TCAL","INTNUM","IFNUM","PLNUM","CAL"]]

## Behind the scenes: SDFITS file data storage

In this particular PS case there are 6040 rows (2 x 151 x 5 x 2 x 2) of 32768 channels each. The dimensions, slowest-varying first, are:

1. nscans = 2 : scans 152 and 153 (the ON and OFF, or SIG and REF)
2. ntime = 151 integrations of 1 second each
3. nif = 5 IFs
4. npol = 2 POLs
5. ncal = 2 : CALON/CALOFF
6. nchan = 32768 channels

The run time of this PS case is thus 2 x 151 sec x 2 ~ 10 min (IF and POL are simultaneous). Slewing took about 12.5 sec, judging from the DATE-OBS difference between rows 6020 and 6021.

Thus the spectral data could be written as a 6-dimensional array

      data[nscan][ntime][nif][npol][ncal][nchan]

whereas all other columns are 5-dimensional:

      tcal[nscan][ntime][nif][npol][ncal]
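
To make this layout concrete, here is a small numpy sketch that converts between a flat row number and the five slow indices. It assumes the rows really are stored in exactly this order, with the last index varying fastest; that should be checked against the `sdfits._index` table before relying on it.

```python
import numpy as np

# Dimensions from the list above, slowest-varying first
nscan, ntime, nif, npol, ncal = 2, 151, 5, 2, 2
shape = (nscan, ntime, nif, npol, ncal)

nrows = int(np.prod(shape))
print("rows:", nrows)

# Flat row number -> (scan, time, if, pol, cal) indices, all zero-based
row = 3020
idx = np.unravel_index(row, shape)
print("row", row, "->", tuple(int(i) for i in idx))

# And back: first row of the second integration of the first scan
first_row_of_second_integration = np.ravel_multi_index((0, 1, 0, 0, 0), shape)
print("rows per integration:", int(first_row_of_second_integration))
```

Under this assumed ordering each integration occupies nif x npol x ncal = 20 consecutive rows, and the second scan starts halfway through the table.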


### Plotting the very first RAW spectrum:

In [None]:
sp0 = sdfits.getspec(0)
sp0.plot()

## Get a time-averaged spectrum at the highest resolution

This test data has 151 integrations of 1 second, so time averaging should reduce the noise by roughly sqrt(151), about a factor of 12.
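
As an illustration of what a system-temperature weighted time average does, here is a numpy-only sketch on synthetic noise. It assumes the weights are proportional to 1/Tsys^2 (the usual radiometer-based choice for equal integration time and channel width); whether `weights='tsys'` in dysh uses exactly this form is an assumption here, and the noise scale `0.03` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
nint, nchan = 151, 2048

# One synthetic system temperature per integration, in K
tsys = rng.uniform(18.0, 22.0, nint)

# Noise in each integration scales with its Tsys (fixed integration time and channel width)
noise_per_kelvin = 0.03
spectra = rng.normal(0.0, 1.0, (nint, nchan)) * (noise_per_kelvin * tsys)[:, None]

# Assumed weighting: inverse variance, i.e. proportional to 1 / Tsys^2
weights = 1.0 / tsys**2
avg = np.average(spectra, axis=0, weights=weights)

rms_single = spectra[0].std()
rms_avg = avg.std()
print("single integration rms:", rms_single)
print("time-averaged rms:     ", rms_avg)
print("improvement factor:    ", rms_single / rms_avg)
```

The improvement factor should land near sqrt(151), with some scatter because the individual integrations have different noise levels.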



In [None]:
sb = sdfits.getps(scan=152, ifnum=0, plnum=0)
ta = sb.timeaverage(weights='tsys')
ta.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-100, ymax=500, grid=True)
rolled_stats(ta.flux[21000:28000])
print("expect:0.1802747611297031 K 0.05488477800860452 K (mean and std)")    # regression from previous runs

## Smooth in a few ways

By default smoothing will also decimate the signal, to (roughly) make each channel independent of the next. This assumes the input signal had independent channels. If the input was oversampled by a factor of 2, the smoothed signal will be as well, although you can manually decimate by a different value too, for example by using ``decimate=8``.
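
To see why smoothing and decimation go together, here is a numpy-only sketch that does not call dysh. A plain boxcar stands in for whatever kernel dysh applies, and `rolled_ratio()` is a hypothetical helper that returns only the last number printed by `rolled_stats()`.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 0.055                         # K, roughly the unsmoothed noise seen in this demo
noise = rng.normal(0.0, sigma, 32768)

width = 16
kernel = np.ones(width) / width       # boxcar kernel (assumption: stands in for dysh's kernel)
smoothed = np.convolve(noise, kernel, mode="valid")
decimated = smoothed[::width]         # keep every 16th channel

def rolled_ratio(x):
    """Neighbor-difference rms over (rms * sqrt(2)); about 1 for independent channels."""
    d = x[1:] - x[:-1]
    return d.std() / x.std() / np.sqrt(2)

print("noise reduction:        ", smoothed.std() / sigma)
print("ratio without decimation:", rolled_ratio(smoothed))
print("ratio with decimation:   ", rolled_ratio(decimated))
```

Without decimation the neighboring channels share most of their input samples, so the ratio falls well below 1; keeping only every 16th channel brings it back close to 1 while the noise stays about a factor of 4 lower.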

### Smoothing by 16 channels

Since we smooth with a Gaussian of FWHM 16 channels, the noise should go down by about a factor of 4 (54 mK to 12 mK).


In [None]:
ts1 = ta.smooth('gaussian', 16)
# ts1 = ts1[200:1800]                    # CAVEAT: new feature
ts1.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-100, ymax=500, grid=True)
rolled_stats(ts1.flux[21000//16:28000//16])

### Smoothing more

Now smoothing by 320 channels should result in a noise of 55/sqrt(320) or 3 mK, exactly as measured. The rolled RMS ratio is very close to 1, so neighboring channels are not correlated. If you decimated by 160 instead, you would see this ratio drop. Be sure to adjust the range of channels for any new ``rolled_stats()`` call.

In [None]:
ts2 = ta.smooth('box', 320)
ts2.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-100, ymax=500, grid=True)
rolled_stats(ts2.flux[60:96])

In [None]:
ta.plot(xaxis_unit="km/s", yaxis_unit="mK", ymin=-100, ymax=500)

In [None]:
ts2.plot(xaxis_unit="km/s", yaxis_unit="mK", ymin=-100, ymax=500)

## Baseline Subtraction

Given that we are not interested in the edges, we'll define a baseline from, say, 2000 to 3500 and 4500 to 6000 km/s, and only plot between 2000 and 6000 km/s.
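
The idea of fitting a baseline only to line-free regions can be sketched with plain numpy. This is an illustration on a synthetic 256-channel spectrum, not what `baseline()` does internally; in particular, treating the `exclude` ranges as inclusive channel pairs is an assumption of this sketch.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(7)
chan = np.arange(256)

# Synthetic spectrum in K: constant offset + Gaussian line near channel 130 + noise
offset = 0.18
line = 0.10 * np.exp(-0.5 * ((chan - 130) / 3.0) ** 2)
spec = offset + line + rng.normal(0.0, 0.005, chan.size)

# Channel ranges left out of the fit: both band edges and the line itself
exclude = [(0, 80), (120, 140), (180, 255)]
use = np.ones(chan.size, dtype=bool)
for lo, hi in exclude:
    use[lo:hi + 1] = False            # inclusive ranges (assumption)

degree = 0                            # a constant baseline
coeffs = P.polyfit(chan[use], spec[use], degree)
baseline = P.polyval(chan, coeffs)
residual = spec - baseline

print("fitted offset [K]:", coeffs[0])
print("residual at line peak [K]:", residual[130])
```

Only the channels between the excluded ranges contribute to the fit, so the line does not bias the fitted offset and it survives in the residual.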

In [None]:
# recompute a very smooth spectrum; 32768/128 gives 256 channels
ts2 = ta.smooth('box', 128)
ts2.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-50, ymax=300, grid=True)
#
ts2.baseline(degree=0,model="poly",exclude=[(0,80),(120,140),(180,255)], remove=True)
ts2.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-50, ymax=150, grid=True, xmin=80, xmax=180)
print("baseline_model:",ts2.baseline_model)

### Output a spectrum

Once a Spectrum has been assembled, it can be written out in a number of formats.


In [None]:
# CAVEAT: new feature
ta.write("ngc2415.txt",  format="basic", overwrite=True)
ta.write("ngc2415.fits", format="fits",  overwrite=True)     # SDFITS dialect, ds9 cannot view this
ta.write("ngc2415.ecsv", format="ecsv",  overwrite=True)

## Smoothing the reference ("OFF") scan

Under certain circumstances it can be beneficial to (boxcar) smooth the reference (OFF) signal before the usual
(ON-OFF)/OFF calibration. 

*Technical note*:  if you want to achieve identical results to GBTIDL, the width of the boxcar needs to be odd.
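
The expected gain from smoothing the reference can be estimated with a numpy-only sketch. It uses synthetic ON and OFF spectra with equal, independent noise and a hypothetical `boxcar()` helper; the values of `tsys` and `sigma` are assumptions, and this is not how dysh implements `smoothref`.

```python
import numpy as np

rng = np.random.default_rng(1)
nchan = 32768
tsys = 20.0           # K, assumed
sigma = 0.002         # assumed fractional noise per channel in both ON and OFF

off = 1.0 + sigma * rng.normal(size=nchan)
on = 1.0 + sigma * rng.normal(size=nchan)

def boxcar(x, width):
    """Hypothetical helper: running mean with an odd width, same length as the input."""
    return np.convolve(x, np.ones(width) / width, mode="same")

# Usual calibration: the noise of ON and OFF adds in quadrature
ta_plain = tsys * (on - off) / off

# Smooth the OFF by 31 channels first: its noise contribution nearly vanishes
off_smooth = boxcar(off, 31)
ta_smooth = tsys * (on - off_smooth) / off_smooth

inner = slice(1000, -1000)            # stay away from the convolution edges
rms_plain = ta_plain[inner].std()
rms_smooth = ta_smooth[inner].std()
print("rms without smoothref [K]:", rms_plain)
print("rms with smoothref=31 [K]:", rms_smooth)
print("ratio:", rms_smooth / rms_plain)
```

For pure noise the ratio should approach sqrt((1 + 1/31)/2), about 0.72, which is the most that reference smoothing can gain when the ON and OFF noise are equal.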


In [None]:
sb = sdfits.getps(scan=152, ifnum=0, plnum=0, smoothref=31)
ta = sb.timeaverage(weights='tsys')
ta.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-100, ymax=500, grid=True)
rolled_stats(ta.flux[21000:28000])

We can smooth this spectrum the normal way, as was done a few cells ago; not much difference is visible, except for the noise level.


In [None]:
ts2 = ta.smooth('box', 320)
ts2.plot(xaxis_unit="chan", yaxis_unit="mK", ymin=-100, ymax=500, grid=True)
rolled_stats(ts2.flux[60:95])     # use the smoothed spectrum ts2; ta still has the full 32768 channels

Although the RMS has gone down (53 mK to 40 mK), the rolled RMS ratio has dropped a small amount, from 0.98 to 0.91, because smoothing the reference adds correlation between neighboring channels.

## That one liner...

Caveat: this only works for the short ``test=`` version.

In [None]:
GBTFITSLoad(dysh_data(test='getps')).getps().timeaverage().smooth('box',256).plot()