# Handling VERITAS spectra

After defining a data file format for VERITAS spectra -- https://github.com/chbrandt/veritas/blob/master/data_formatting-v2.5.rst -- we now work with its data and metadata to see how the format behaves in practice.

A basic requirement for any data format is that it can be converted to (and from) FITS without losing information, since FITS is the standard format in astrophysics.

## Testing case: Mkn421

Seven data files with *spectra* from Mkn421 are available to build those SED plots.
The files are in ECSV format as proposed in https://github.com/chbrandt/veritas/blob/master/data_formatting-v2.rst.

In [1]:
%ls

handling_mkn421-Adding interaction to plots.ipynb
handling_mkn421.ipynb
handling_mkn421-Structuring the plot.ipynb
handling_mkn421-Transforming_to_FITS.ipynb
Mkn421_VERITAS_2008_highA.csv
Mkn421_VERITAS_2008_highA.fits
Mkn421_VERITAS_2008_highB.csv
Mkn421_VERITAS_2008_highB.fits
Mkn421_VERITAS_2008_highC.csv
Mkn421_VERITAS_2008_highC.fits
Mkn421_VERITAS_2008_low.csv
Mkn421_VERITAS_2008_low.fits
Mkn421_VERITAS_2008_mid.csv
Mkn421_VERITAS_2008_mid.fits
Mkn421_VERITAS_2008_veryhigh.csv
Mkn421_VERITAS_2008_veryhigh.fits
Mkn421_VERITAS_2008_verylow.csv
Mkn421_VERITAS_2008_verylow.fits
test1.fits


In [2]:
%cat 'Mkn421_VERITAS_2008_highA.csv'

# %ECSV 0.9
# ---
# meta: !!omap
# - OBJECT: Mrk 421
#
# - DESCRIBE:
#    Spectral points for multiwavelength campaign;
#    Observations taken between 2008 January 01 and 2008 June 05;
#    Flux sensitivity 0.8e-10 < flux(E>1TeV) < 1.1e-10
#
# - MJD:
#    START: 54502.46971
#    END: 54622.18955
#
# - ARTICLE:
#    label: Ap.J. 738, 25 (2011)
#    url: http://iopscience.iop.org/0004-637X/738/1/25/
#    arxiv: http://arxiv.org/abs/1106.1210
#    ads: http://adsabs.harvard.edu/abs/2011ApJ...738...25A
#
# - COMMENTS:
#    - Name=Mrk421_2008_highA
#    - z=0.031
#    - LiveTime(h)=1.4
#    - significance=73.0
#
# - SED_TYPE: diff_flux_points
#
# datatype:
# - name: e_ref
#   unit: TeV
#   datatype: float64
# - name: dnde
#   unit: ph / (m2 TeV s)
#   datatype: float64
# - name: dnde_errn
#   unit: ph / (m2 TeV s)
#   datatype: float64
# - name: dnde_errp
#   unit: ph / (m2 TeV s)
#   datatype: float64
#
e_ref dnde       dnde_errn  dnde_errp
0.275

The ECSV format was chosen because it is human-readable and carries its metadata along with the data.
We use the Astropy library to read it.

In [3]:
from astropy.table import Table
filename = 'Mkn421_VERITAS_2008_highA.csv'
tt = Table.read(filename, format='ascii.ecsv')

In [4]:
tt

e_ref,dnde,dnde_errn,dnde_errp
TeV,ph / (m2 s TeV),ph / (m2 s TeV),ph / (m2 s TeV)
float64,float64,float64,float64
0.275,1.702e-05,3.295e-06,3.295e-06
0.34,1.289e-05,1.106e-06,1.106e-06
0.42,8.821e-06,6.072e-07,6.072e-07
0.519,5.777e-06,3.697e-07,3.697e-07
0.642,3.509e-06,2.351e-07,2.351e-07
0.793,2.151e-06,1.525e-07,1.525e-07
0.98,1.302e-06,1.024e-07,1.024e-07
1.212,6.273e-07,6.117e-08,6.117e-08
1.498,3.31e-07,3.853e-08,3.853e-08
1.851,1.661e-07,2.401e-08,2.401e-08


In [5]:
import json
print(json.dumps(tt.meta, indent=4))

{
    "OBJECT": "Mrk 421", 
    "DESCRIBE": "Spectral points for multiwavelength campaign; Observations taken between 2008 January 01 and 2008 June 05; Flux sensitivity 0.8e-10 < flux(E>1TeV) < 1.1e-10", 
    "MJD": {
        "START": 54502.46971, 
        "END": 54622.18955
    }, 
    "ARTICLE": {
        "url": "http://iopscience.iop.org/0004-637X/738/1/25/", 
        "arxiv": "http://arxiv.org/abs/1106.1210", 
        "ads": "http://adsabs.harvard.edu/abs/2011ApJ...738...25A", 
        "label": "Ap.J. 738, 25 (2011)"
    }, 
    "COMMENTS": [
        "Name=Mrk421_2008_highA", 
        "z=0.031", 
        "LiveTime(h)=1.4", 
        "significance=73.0"
    ], 
    "SED_TYPE": "diff_flux_points"
}


### Transform to FITS

Let's write down the current table to FITS. Ideally, every piece of information will be saved.

In [6]:
import os
# Swap the extension for '.fits' (also safe for names containing extra dots)
fileout = os.path.splitext(filename)[0] + '.fits'
tt.write(fileout, format='fits', overwrite=True)



*Well...we clearly see some issues with FITS.*
Problems appear when the header/meta contains (1) nested structures (i.e., dictionaries), which are not written to the file while a warning is emitted, and (2) long keywords (more than 8 characters) paired with long values (more than ~70 characters).

When writing to FITS we therefore have to deal with both issues.
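
Before writing, it can help to list the header entries that are likely to cause trouble. The following is only a minimal sketch and not part of the original workflow: `find_fits_issues` is a hypothetical helper, and the limits of 8 characters for keywords and 70 for values are the rough figures quoted above, not the exact card layout of the FITS standard.

```python
def find_fits_issues(meta, max_key=8, max_value=70):
    """Return {keyword: [issue, ...]} for entries likely to fail in FITS.

    Hypothetical helper; the thresholds are approximations (assumption).
    """
    issues = {}
    for key, value in meta.items():
        found = []
        if isinstance(value, dict):
            # Nested dictionaries are not written to the FITS header
            found.append('nested')
        elif len(str(key)) > max_key and len(str(value)) > max_value:
            # A long keyword cannot be combined with a long value
            found.append('long keyword with long value')
        if found:
            issues[key] = found
    return issues


example_meta = {
    'OBJECT': 'Mrk 421',
    'MJD': {'START': 54502.46971, 'END': 54622.18955},
    'DESCRIPTION': 'x' * 80,  # made-up long value, for illustration only
}
print(find_fits_issues(example_meta))
```

In the notebook the same check could be run as `find_fits_issues(tt.meta)` before calling `tt.write`.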

### Working out the header

To handle long and nested keywords we need some kind of word-shortening and flattening algorithm.

First, notice that the only nested (or container) structures that cause problems are *dictionaries*; *lists*, for instance, do not.
Since *dictionaries* keep the (useful) information at their leaves, flattening such a structure is "as simple as" working out the *keys* that lead to that information.

For instance, let us take the "`MJD`" structure in our original header:
```
MJD
  - START : 54502.46971
  - END   : 54622.18955
```
We can flatten such a structure by merging the *keys*:
```
MJDSTART = 54502.46971
MJDEND   = 54622.18955
```
(The implementation below joins the keys with "`-`" instead, giving `MJD-START` and `MJD-END`.)

To keep the transition from the original file (ECSV) to FITS as clear as possible, we can also record those translations in the header: a symbol (e.g., "`:`") marks where the keys were concatenated, and the original path is stored as the `value` of a new keyword named after the flattened one:
```
SUBS_MJDSTART = MJD:START
```
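
The idea of merging keys while recording the translation can be sketched as follows. This is an illustration under stated assumptions, not the function used later in the notebook: `flatten_with_translation` is a hypothetical name, it handles a single level of nesting only, and it uses the `START`/`END` keys found in the file above.

```python
def flatten_with_translation(meta, mark=':'):
    """Flatten one level of nested dicts and record where each key came from.

    Hypothetical helper (assumption): keys are merged by plain concatenation
    and the original path is stored under 'SUBS_<new key>'.
    """
    flat = {}
    for key, value in meta.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                new_key = str(key) + str(sub_key)
                flat[new_key] = sub_value
                # Remember the original path, e.g. 'MJD:START'
                flat['SUBS_' + new_key] = mark.join([str(key), str(sub_key)])
        else:
            flat[key] = value
    return flat


nested = {'MJD': {'START': 54502.46971, 'END': 54622.18955}}
flat = flatten_with_translation(nested)
print('{} <- {}'.format('MJDSTART', flat['SUBS_MJDSTART']))
```

Note that the `SUBS_` keywords are themselves longer than 8 characters, so they are subject to the keyword-length issue as well.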

The second aspect we have to deal with is long keywords, i.e., keywords longer than 8 characters.
I propose to solve this with the following two-step process:
```
NEW_KEYWORD = KEYWORD
if length( NEW_KEYWORD ) is greater than 8:
    NEW_KEYWORD = remove_vowels_from( NEW_KEYWORD )
if length( NEW_KEYWORD ) is greater than 8:
    NEW_KEYWORD = NEW_KEYWORD[:8]
```
Again, we can add this translation to the table's meta information.
As an example, suppose the header used the keyword `DESCRIPTION` (our file uses the 8-character `DESCRIBE`, which needs no shortening).
`DESCRIPTION` has more than 8 characters (11, actually); we solve that by removing its vowels:
```
DSCRPTN="Spectral points for multiwavelength campaign; Observations taken between 2008 January 01 and 2008 June 05; Flux sensitivity 0.8e-10 < flux(E>1TeV) < 1.1e-10"
SUBS_DSCRPTN=DESCRIPTION
```
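
The two-step shortening can be tried out on a few keywords with a small standalone function. This is a sketch, with `shorten_keyword` as a hypothetical name; it assumes that the truncation step applies to the vowel-less form, as the `shorten_header_keywords` function further down does.

```python
def shorten_keyword(keyword, max_length=8):
    """Shorten `keyword` to at most `max_length` characters.

    Step 1 drops the vowels; step 2 truncates if that was not enough.
    """
    if len(keyword) <= max_length:
        return keyword
    shorter = ''.join(c for c in keyword if c.lower() not in 'aeiou')
    return shorter[:max_length]


for word in ['OBJECT', 'DESCRIPTION', 'ARTICLE-arxiv']:
    print('{} -> {}'.format(word, shorten_keyword(word)))
```

Two different keywords can end up with the same shortened form, so a real implementation would probably need to check for collisions before overwriting an entry.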

In [7]:
def flatten_header(table):
    separator = '-'
    def flatten_dict(key, value_dict):
        # Return a list of (flat_key, value) pairs for every leaf under `key`
        outish = []
        for k, v in value_dict.items():
            if isinstance(v, dict):
                # The recursive call already prefixes the sub-keys with `k`
                outish.extend(flatten_dict(k, v))
            else:
                outish.append((str(k), v))
        return [ (separator.join([ str(key), k ]), v) for k,v in outish ]
    # Iterate over a copy: `table.meta` is modified inside the loop
    for k, v in list(table.meta.items()):
        if isinstance(v, dict):
            del table.meta[k]
            for kn, vn in flatten_dict(k, v):
                table.meta[kn] = vn
    return table.meta

_ = flatten_header(tt)

In [8]:
tt.meta

OrderedDict([('OBJECT', 'Mrk 421'),
             ('DESCRIBE',
              'Spectral points for multiwavelength campaign; Observations taken between 2008 January 01 and 2008 June 05; Flux sensitivity 0.8e-10 < flux(E>1TeV) < 1.1e-10'),
             ('COMMENTS',
              ['Name=Mrk421_2008_highA',
               'z=0.031',
               'LiveTime(h)=1.4',
               'significance=73.0']),
             ('SED_TYPE', 'diff_flux_points'),
             ('MJD-START', 54502.46971),
             ('MJD-END', 54622.18955),
             ('ARTICLE-url', 'http://iopscience.iop.org/0004-637X/738/1/25/'),
             ('ARTICLE-arxiv', 'http://arxiv.org/abs/1106.1210'),
             ('ARTICLE-ads',
              'http://adsabs.harvard.edu/abs/2011ApJ...738...25A'),
             ('ARTICLE-label', 'Ap.J. 738, 25 (2011)')])

In [9]:
def shorten_header_keywords(table):
    def shorten_word(word):
        # Drop the vowels first; truncate to 8 characters if still too long
        new_word = ''.join(c for c in word if c.lower() not in 'aeiou')
        if len(new_word) > 8:
            new_word = new_word[:8]
        return new_word
    to_remove = []
    to_add = []
    for k, v in table.meta.items():
        if len(k) > 8:
            new_k = shorten_word(k)
            to_add.append((new_k, v))
            to_remove.append(k)
            # Keep track of the original keyword
            to_add.append(('SUBS_' + new_k, k))
    for new_k, v in to_add:
        table.meta[new_k] = v
    for old_k in to_remove:
        del table.meta[old_k]
    return table.meta

#_ = shorten_header_keywords(tt)
#tt.meta

In [10]:
tt.write(fileout, format='fits', overwrite=True)



Clearly this is not the most beautiful solution, but we can now write freely to a FITS file.

In [11]:
t_ = Table.read(fileout, format='fits')
t_

e_ref,dnde,dnde_errn,dnde_errp
TeV,ph / (m2 s TeV),ph / (m2 s TeV),ph / (m2 s TeV)
float64,float64,float64,float64
0.275,1.702e-05,3.295e-06,3.295e-06
0.34,1.289e-05,1.106e-06,1.106e-06
0.42,8.821e-06,6.072e-07,6.072e-07
0.519,5.777e-06,3.697e-07,3.697e-07
0.642,3.509e-06,2.351e-07,2.351e-07
0.793,2.151e-06,1.525e-07,1.525e-07
0.98,1.302e-06,1.024e-07,1.024e-07
1.212,6.273e-07,6.117e-08,6.117e-08
1.498,3.31e-07,3.853e-08,3.853e-08
1.851,1.661e-07,2.401e-08,2.401e-08


In [12]:
t_.meta

OrderedDict([('OBJECT', 'Mrk 421'),
             ('DESCRIBE',
              'Spectral points for multiwavelength campaign; Observations taken between 2008 January 01 and 2008 June 05; Flux sensitivity 0.8e-10 < flux(E>1TeV) < 1.1e-10'),
             ('COMMENTS',
              ['Name=Mrk421_2008_highA',
               'z=0.031',
               'LiveTime(h)=1.4',
               'significance=73.0']),
             ('SED_TYPE', 'diff_flux_points'),
             ('MJD-START', 54502.46971),
             ('MJD-END', 54622.18955),
             ('ARTICLE-url', 'http://iopscience.iop.org/0004-637X/738/1/25/'),
             ('ARTICLE-arxiv', 'http://arxiv.org/abs/1106.1210'),
             ('ARTICLE-ads',
              'http://adsabs.harvard.edu/abs/2011ApJ...738...25A'),
             ('ARTICLE-label', 'Ap.J. 738, 25 (2011)')])
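
Rather than comparing the two printouts by eye, the metadata before and after the round trip can be compared programmatically. The following is a minimal sketch: `meta_differences` is a hypothetical helper, and the `EXTNAME` entry in the example is made up to show what an added keyword would look like.

```python
def meta_differences(original, restored):
    """List the keywords whose presence or value differs between two dicts.

    Hypothetical helper (assumption): a plain `!=` comparison is enough for
    the strings, floats and lists stored in these headers.
    """
    differences = []
    for key in sorted(set(original) | set(restored)):
        if key not in restored:
            differences.append((key, 'missing after round trip'))
        elif key not in original:
            differences.append((key, 'added by the round trip'))
        elif original[key] != restored[key]:
            differences.append((key, 'value changed'))
    return differences


before = {'OBJECT': 'Mrk 421', 'MJD-START': 54502.46971}
after = {'OBJECT': 'Mrk 421', 'MJD-START': 54502.46971, 'EXTNAME': 'SED'}
print(meta_differences(before, after))
```

In the notebook the call would be `meta_differences(tt.meta, t_.meta)`; an empty list would indicate that no header information was lost or altered.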

## Transform CSV to FITS

In [13]:
from astropy.table import Table
from glob import glob
for csv_file in glob('*.csv'):
    fits_file = '{}.fits'.format(csv_file[:-4])
    t = Table.read(csv_file, format='ascii.ecsv')
    _ = flatten_header(t)
    t.write(fits_file, format='fits', overwrite=True)

In [14]:
%ls

handling_mkn421-Adding interaction to plots.ipynb
handling_mkn421.ipynb
handling_mkn421-Structuring the plot.ipynb
handling_mkn421-Transforming_to_FITS.ipynb
Mkn421_VERITAS_2008_highA.csv
Mkn421_VERITAS_2008_highA.fits
Mkn421_VERITAS_2008_highB.csv
Mkn421_VERITAS_2008_highB.fits
Mkn421_VERITAS_2008_highC.csv
Mkn421_VERITAS_2008_highC.fits
Mkn421_VERITAS_2008_low.csv
Mkn421_VERITAS_2008_low.fits
Mkn421_VERITAS_2008_mid.csv
Mkn421_VERITAS_2008_mid.fits
Mkn421_VERITAS_2008_veryhigh.csv
Mkn421_VERITAS_2008_veryhigh.fits
Mkn421_VERITAS_2008_verylow.csv
Mkn421_VERITAS_2008_verylow.fits
test1.fits


In [15]:
import json
print(json.dumps(Table.read('Mkn421_VERITAS_2008_highB.fits').meta, indent=4))

{
    "OBJECT": "Mrk 421", 
    "DESCRIBE": "Spectral points for multiwavelength campaign; Observations taken between 2008 January 01 and 2008 June 05; Flux sensitivity ? < flux(E>1TeV) < ?", 
    "COMMENTS": [
        "Name=Mrk421_2008_highB", 
        "z=0.031", 
        "LiveTime(h)=0.6", 
        "significance=47.3"
    ], 
    "SED_TYPE": "diff_flux_points", 
    "MJD-START": 54589.28015, 
    "MJD-END": 54589.3248, 
    "ARTICLE-url": "http://iopscience.iop.org/0004-637X/738/1/25/", 
    "ARTICLE-arxiv": "http://arxiv.org/abs/1106.1210", 
    "ARTICLE-ads": "http://adsabs.harvard.edu/abs/2011ApJ...738...25A", 
    "ARTICLE-label": "Ap.J. 738, 25 (2011)"
}
