## Code and Documentation to Decode Popup Buoy Transmitted/Recorded Data

## Data Structure

Data downloaded directly from a Popup Buoy generates 8 binary files.  A complete description of these files can be found at **[link]**.  Relevant pieces are included in this notebook.

Filenames:
- BOTDAT.TXT
- FILEPOS.TXT
- ICEDAT.TXT
- JPGxxxxx.JPG
- PRODAT.TXT
- SSTDAT.TXT
- SUMMARY.TXT

In [1]:
instfile_dic = {'bot_file': 'sampledata/BOTDAT.TXT',
                'ice_file': 'sampledata/ICEDAT.TXT',
                'pro_file': 'sampledata/PRODAT.TXT',
                'sst_file': 'sampledata/SSTDAT.TXT'}

For each data file, we read in the entire file and convert the binary to hex.  There are multiple record lengths to handle, but the start of each record is denoted by 'FFFF'.  We can split the file string on this marker, but we need to be aware of the 'FFFFF' or 'FFFFFF' possibilities, which occur when the data next to a marker also contains 'F'.
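The ambiguity described above can be illustrated with a minimal sketch that compares a naive split on 'FFFF' with a regular-expression split. It assumes that no record begins with the hex digit 'F', so that in any run of more than four 'F' characters the last four form the record marker. The function `split_records` and the synthetic string are hypothetical, made up for this illustration, and are not part of the notebook's own code.

```python
import re

def split_records(hexstr):
    """Split a hex string into records on the 'FFFF' start-of-record marker.

    Assumption (not confirmed by the buoy documentation): a record never
    begins with 'F', so in a run such as 'FFFFF' or 'FFFFFF' the leading
    extra 'F' characters belong to the end of the previous record.
    """
    # 'FFFF(?!F)' matches four F's that are not followed by another F,
    # i.e. the last four F's of any longer run.
    return [rec for rec in re.split(r'FFFF(?!F)', hexstr) if rec]

# Synthetic example: two records, '00AF' and '0001', each preceded by 'FFFF'.
hexstr = 'FFFF' + '00AF' + 'FFFF' + '0001'

naive = hexstr.split('FFFF')[1:]
print(naive)                    # the trailing 'F' of the first record is misplaced
print(split_records(hexstr))    # both records are recovered intact
```

Unlike a two-stage split on 'FFFFF' followed by 'FFFF', this variant does not need to index away the text before the first 'FFFFF', so records that precede it are kept.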

***Basic Approach***

The two functions below allow a simple read-in of a file for basic analysis and debugging.  The code of consequence, which converts measurements from engineering units to science units, is defined in the PopUpBuoys class in the next section.

In [2]:
def HexView(file):
    """Read an entire binary file and return its contents as an uppercase hex string."""
    with open(file, 'rb') as in_file:
        # read() with no argument returns the whole file, so no loop is needed
        return in_file.read().hex().upper()
        
def HexSplit(hexstr):
    if hexstr.find('FFFFF') == -1:
        print("No FFFFF, proceed to split on FFFF")
        sample_raw = hexstr.split('FFFF')[1:]
    else:
        print('FFFFF found')
        #this puts in the proper line endings but removes a variable 
        #   F from the end of each string.  Add the F string back
        sample_raw = []
        for substr in hexstr.split('FFFFF')[1:]: 
            sample_raw = sample_raw + (substr + 'F').split('FFFF')

        sample_raw[-1] = sample_raw[-1][:-1]
        
    return(sample_raw)

In [3]:
active_file = instfile_dic['bot_file']

hexstr = HexView(active_file)
sample_raw = HexSplit(hexstr)


No FFFFF, proceed to split on FFFF


# Class description and routine code

## Decode sample data for each file type

### Bottom Data (BOTDAT.TXT) / Under Ice Data (ICEDAT.TXT)

This data has two record lengths: 17 and 19 bytes.  Because we split on 'FFFF' to break the file into samples, the initial 2 bytes are no longer in each record, so the record lengths are now 15 and 17 bytes (string lengths of 30 and 34 hex characters).

***MSG Decode Key***
![BotDecodeMsg](decode_images/BotDat_msg_decode.png)
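As a minimal sketch of the record layout, the field offsets used by the decoding code in this notebook can be written as a small table and applied to one record. The names `BOTTOM_FIELDS` and `slice_record` are hypothetical helpers introduced only for this illustration. The field order and widths are taken from the slicing in the `Bottom` method, and the sample record is the first one from the sample BOTDAT.TXT output shown later.

```python
# Field names and widths (in hex characters) that follow the time word,
# taken from the slicing used in this notebook for BOTDAT/ICEDAT records.
BOTTOM_FIELDS = [('pressure_raw', 4), ('topside_temp_raw', 4),
                 ('underside_temp_raw', 4), ('temp_ref_raw', 4),
                 ('par_raw', 4), ('fluor_raw', 4), ('tilt', 2)]

def slice_record(sample, fields=BOTTOM_FIELDS):
    """Return the raw unsigned integer fields of one hex record as a dict."""
    body_len = sum(width for _, width in fields)
    time_len = len(sample) - body_len      # 4 chars (2 bytes) or 8 chars (4 bytes)
    if time_len not in (4, 8):
        raise ValueError('unexpected record length: %d' % len(sample))
    out = {'time_word': int(sample[:time_len], 16)}
    pos = time_len
    for name, width in fields:
        out[name] = int(sample[pos:pos + width], 16)
        pos += width
    return out

raw = slice_record('000049BE298E2960214E1D30005907')
print(raw)
```

Deriving the time-word width from the record length lets one code path handle both the 30- and 34-character records.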

***Engineering to Science Conversions***
![BotDecodeMsg](decode_images/BotDat_msg_decode.png)
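The linear conversions can be sketched as standalone functions. This is a minimal illustration, assuming the calibration constants and formulas used in the `PopUpBuoys` class in this notebook. The function names are hypothetical, and the temperature conversion is left out of the sketch.

```python
# Calibration constants copied from the PopUpBuoys class in this notebook.
PAR_CAL = {'offset': 5458, 'slope': 0.01415}
FLUOR_CAL = {'offset': 40, 'slope': 0.020920502}

def to_signed16(value):
    """Interpret an unsigned 16-bit integer as a two's complement value."""
    return value - 0x10000 if value >= 0x8000 else value

def pressure_bar(raw):
    """Pressure in bar from the raw 16-bit pressure word."""
    return (raw - 16384) * 10 / 32768

def par_umol(raw, cal=PAR_CAL):
    """PAR in umol m-2 s-1 from the raw (unsigned) PAR word."""
    return (to_signed16(raw) - cal['offset']) * cal['slope'] / 0.73

def fluor_ugl(raw, cal=FLUOR_CAL):
    """Fluorescence-derived concentration in ug/L from the raw word."""
    return (to_signed16(raw) - cal['offset']) * cal['slope']

# Raw words from the first sample BOTDAT.TXT record.
print(pressure_bar(0x49BE), par_umol(0x1D30), fluor_ugl(0x0059))
```

Keeping the sign conversion in one helper avoids repeating the `>= 0x8000` test for every field.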


### Profile Data (PRODAT.TXT)

This data has two record lengths: 13 and 15 bytes.  Because we split on 'FFFF' to break the file into samples, the initial 2 bytes are no longer in each record, so the record lengths are now 11 and 13 bytes (string lengths of 22 and 26 hex characters).  This file does not have the bottom temp or the reference temp fields.
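The relationship between record size in bytes and string length in hex characters can be checked with a short sketch. `hex_length` is a hypothetical helper written for this illustration. Each byte becomes two hex characters, and the 2-byte 'FFFF' marker is removed by the split.

```python
def hex_length(record_bytes, marker_bytes=2):
    """Number of hex characters left in a record once the marker is removed."""
    return (record_bytes - marker_bytes) * 2

# Record sizes in bytes (including the 'FFFF' marker) for each file type.
for stream, sizes in {'bottom/ice': (17, 19), 'profile': (13, 15)}.items():
    print(stream, [hex_length(n) for n in sizes])
```

The results (30 and 34 for bottom/ice, 22 and 26 for profile) are the lengths tested in the `Bottom`, `Ice`, and `Profile` methods.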


### Defining the PopUpBuoy Class

In [26]:
import numpy as np
import pandas as pd

class PopUpBuoys(object):
    """Class definitions to read and Process PopUp Buoy Data Streams"""

    ############################################################
    ### The following constants should be placed into a module
    #    so that they can be updated with a config file for each 
    sample_interval = {'bottom':3600,
                       'ice': 3600,
                       'profile': .25,
                       'sst': 3*3600} #seconds
    
    underside_temp_cal = { 'Acoef':0.00121376381803238,
                           'Bcoef':0.000522637158831552,
                           'Ccoef':1.41820495016536E-06 }

    topside_temp_cal = { 'Acoef':0.00121410745269167,
                         'Bcoef':0.000522254475008962,
                         'Ccoef':1.43969129191958E-06}
    
    par_cal = { 'offset': 5458,
                'slope': 0.01415} #offset and slope

    fluor_cal = { 'offset': 40,
                  'slope': 0.020920502} #offset and slope    
    ###########################################################
    active_stream = 'bottom'
    def __init__(self, path):
        self.path = path
        self.instfile_dic = {'bottom': path + '/BOTDAT.TXT',
                             'ice': path + '/ICEDAT.TXT',
                             'profile': path + '/PRODAT.TXT',
                             'sst': path + '/SSTDAT.TXT'}
    
    def LoadCoefs(self, coef_file):
        pass
    
    def HexView(self, sample='', verbose=True):
        '''
        input: reference to proper filepointer, options are keys to the self.instfile_dic dictionary
        '''
        if not sample:
            sample=self.active_stream
        else:
            self.active_stream = sample
            
        file = self.instfile_dic[sample]
        with open(file, 'rb') as in_file:
            # read the whole file at once and convert the bytes to an uppercase hex string
            self.hexstr = in_file.read().hex().upper()

        if verbose:
            return(self.hexstr)

    def HexSplit(self, verbose=True):
        '''
        input: results of HexView (inherits output)
        '''
        if self.hexstr.find('FFFFF') == -1:
            print("No FFFFF, proceed to split on FFFF")
            sample_raw = self.hexstr.split('FFFF')[1:]
        else:
            print('FFFFF found')
            #this puts in the proper line endings but removes a variable 
            #   F from the end of each string.  Add the F string back
            sample_raw = []
            for substr in self.hexstr.split('FFFFF')[1:]: 
                sample_raw = sample_raw + (substr + 'F').split('FFFF')

            sample_raw[-1] = sample_raw[-1][:-1]
        
        self.sample_raw = sample_raw
        
        if verbose:
            return(sample_raw)
    
    def Bottom(self, asPandas=False):
        if not hasattr(self, 'sample_raw'):
            raise RuntimeError("Run PopUpBuoys.HexView and PopUpBuoys.HexSplit First")
        
        data={}

        for sample_num, sample in enumerate(self.sample_raw):
            
            if len(sample) == 30: #2byte timeword
                
                time = int(sample[0:4],16) * self.sample_interval['bottom'] #seconds since 1970-01-01
                
                pressure =  (int(sample[4:8],16) - 16384)* 10 / 32768  #bar
                
                rawtvalue = int(sample[8:12],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                topside_temp = 1 / ( self.topside_temp_cal['Acoef'] + 
                                     self.topside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                     self.topside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15
                                                                        #temp DegC

                rawtvalue = int(sample[12:16],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                underside_temp = 1 / ( self.underside_temp_cal['Acoef'] + 
                                       self.underside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                       self.underside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15                
                                                                        #temp DegC
                
                temp_ref = int(sample[16:20],16)                            #temp ref in ADC
                if temp_ref >= 0x8000:
                    temp_ref -= 0x10000                
                
                rawpvalue = int(sample[20:24],16)
                if rawpvalue >= 0x8000:
                    rawpvalue -= 0x10000                   
                
                par = (rawpvalue - self.par_cal['offset'])* self.par_cal['slope'] / 0.73 #PAR in umolm-2s-1
                
                rawfvalue = int(sample[24:28],16)
                if rawfvalue >= 0x8000:
                    rawfvalue -= 0x10000                   
                
                fluor = (rawfvalue - self.fluor_cal['offset'])* self.fluor_cal['slope']  #concentration in ug/L 
                
                tilt = int(sample[28:30],16) #degrees
                
            elif len(sample) == 34: #4byte timeword
                
                time = int(sample[0:8],16) * self.sample_interval['bottom'] #seconds since 1970-01-01
                
                pressure =  (int(sample[8:12],16) - 16384)* 10 / 32768  #bar
                
                rawtvalue = int(sample[12:16],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                topside_temp = 1 / ( self.topside_temp_cal['Acoef'] + 
                                     self.topside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                     self.topside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15
                                                                        #temp DegC

                rawtvalue = int(sample[16:20],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                underside_temp = 1 / ( self.underside_temp_cal['Acoef'] + 
                                       self.underside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                       self.underside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15                
                                                                        #temp DegC
                
                temp_ref = int(sample[20:24],16)                            #temp ref in ADC
                if temp_ref >= 0x8000:
                    temp_ref -= 0x10000                
                
                rawpvalue = int(sample[24:28],16)
                if rawpvalue >= 0x8000:
                    rawpvalue -= 0x10000                   
                
                par = (rawpvalue - self.par_cal['offset'])* self.par_cal['slope'] / 0.73 #PAR in umolm-2s-1
                
                rawfvalue = int(sample[28:32],16)
                if rawfvalue >= 0x8000:
                    rawfvalue -= 0x10000                   
                
                fluor = (rawfvalue - self.fluor_cal['offset'])* self.fluor_cal['slope']  #concentration in ug/L 
                
                tilt = int(sample[32:34],16) #degrees
                
            else:
                time = pressure = topside_temp = underside_temp = temp_ref = np.nan
                par = fluor = tilt = np.nan
            #save to dictionary
            data[sample_num] = {'time':time,
                                'pressure':pressure,
                                'topside_temp':topside_temp,
                                'underside_temp':underside_temp,
                                'temp_ref':temp_ref,
                                'par':par,
                                'fluor':fluor,
                                'tilt':tilt}        
        
        if asPandas:
            data = pd.DataFrame.from_dict(data,orient='index')
        return(data)
    
    def SST(self, asPandas=False):
        if not hasattr(self, 'sample_raw'):
            raise RuntimeError("Run PopUpBuoys.HexView and PopUpBuoys.HexSplit First")
        
        data={}

    def Profile(self, asPandas=False):
        if not hasattr(self, 'sample_raw'):
            raise RuntimeError("Run PopUpBuoys.HexView and PopUpBuoys.HexSplit First")
        
        data={}

        for sample_num, sample in enumerate(self.sample_raw):
            
            if len(sample) == 22: #2byte timeword
                
                time = int(sample[0:4],16) * self.sample_interval['profile'] #seconds since 1970-01-01
                
                pressure =  (int(sample[4:8],16) - 16384)* 10 / 32768  #bar
                
                rawtvalue = int(sample[8:12],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                topside_temp = 1 / ( self.topside_temp_cal['Acoef'] + 
                                     self.topside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                     self.topside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15
                                                                        #temp DegC              
                
                rawpvalue = int(sample[12:16],16)
                if rawpvalue >= 0x8000:
                    rawpvalue -= 0x10000                   
                
                par = (rawpvalue - self.par_cal['offset'])* self.par_cal['slope'] / 0.73 #PAR in umolm-2s-1
                
                rawfvalue = int(sample[16:20],16)
                if rawfvalue >= 0x8000:
                    rawfvalue -= 0x10000                   
                
                fluor = (rawfvalue - self.fluor_cal['offset'])* self.fluor_cal['slope']  #concentration in ug/L 
                
                tilt = int(sample[20:22],16) #degrees
                
            elif len(sample) == 26: #4byte timeword
                
                time = int(sample[0:8],16) * self.sample_interval['profile'] #seconds since 1970-01-01
                
                pressure =  (int(sample[8:12],16) - 16384)* 10 / 32768  #bar
                
                rawtvalue = int(sample[12:16],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                topside_temp = 1 / ( self.topside_temp_cal['Acoef'] + 
                                     self.topside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                     self.topside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15
                                                                        #temp DegC
              
                
                rawpvalue = int(sample[16:20],16)
                if rawpvalue >= 0x8000:
                    rawpvalue -= 0x10000                   
                
                par = (rawpvalue - self.par_cal['offset'])* self.par_cal['slope'] / 0.73 #PAR in umolm-2s-1
                
                rawfvalue = int(sample[20:24],16)
                if rawfvalue >= 0x8000:
                    rawfvalue -= 0x10000                   
                
                fluor = (rawfvalue - self.fluor_cal['offset'])* self.fluor_cal['slope']  #concentration in ug/L 
                
                tilt = int(sample[24:26],16) #degrees
                
            else:
                time = pressure = topside_temp = np.nan
                par = fluor = tilt = np.nan
            #save to dictionary
            data[sample_num] = {'time':time,
                                'pressure':pressure,
                                'topside_temp':topside_temp,
                                'par':par,
                                'fluor':fluor,
                                'tilt':tilt}        
        
        if asPandas:
            data = pd.DataFrame.from_dict(data,orient='index')
        return(data)
        
    def Ice(self, asPandas=False):
        if not hasattr(self, 'sample_raw'):
            raise RuntimeError("Run PopUpBuoys.HexView and PopUpBuoys.HexSplit First")
        
        data={}

        for sample_num, sample in enumerate(self.sample_raw):
            
            if len(sample) == 30: #2byte timeword
                
                time = int(sample[0:4],16) * self.sample_interval['ice'] #seconds since 1970-01-01
                
                pressure =  (int(sample[4:8],16) - 16384)* 10 / 32768  #bar
                
                rawtvalue = int(sample[8:12],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                topside_temp = 1 / ( self.topside_temp_cal['Acoef'] + 
                                     self.topside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                     self.topside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15
                                                                        #temp DegC

                rawtvalue = int(sample[12:16],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                underside_temp = 1 / ( self.underside_temp_cal['Acoef'] + 
                                       self.underside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                       self.underside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15                
                                                                        #temp DegC
                
                temp_ref = int(sample[16:20],16)                            #temp ref in ADC
                if temp_ref >= 0x8000:
                    temp_ref -= 0x10000                
                
                rawpvalue = int(sample[20:24],16)
                if rawpvalue >= 0x8000:
                    rawpvalue -= 0x10000                   
                
                par = (rawpvalue - self.par_cal['offset'])* self.par_cal['slope'] / 0.73 #PAR in umolm-2s-1
                
                rawfvalue = int(sample[24:28],16)
                if rawfvalue >= 0x8000:
                    rawfvalue -= 0x10000                   
                
                fluor = (rawfvalue - self.fluor_cal['offset'])* self.fluor_cal['slope']  #concentration in ug/L 
                
                tilt = int(sample[28:30],16) #degrees
                
            elif len(sample) == 34: #4byte timeword
                
                time = int(sample[0:8],16) * self.sample_interval['ice'] #seconds since 1970-01-01
                
                pressure =  (int(sample[8:12],16) - 16384)* 10 / 32768  #bar
                
                rawtvalue = int(sample[12:16],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                topside_temp = 1 / ( self.topside_temp_cal['Acoef'] + 
                                     self.topside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                     self.topside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15
                                                                        #temp DegC

                rawtvalue = int(sample[16:20],16)
                if rawtvalue >= 0x8000:
                    rawtvalue -= 0x10000
                underside_temp = 1 / ( self.underside_temp_cal['Acoef'] + 
                                       self.underside_temp_cal['Bcoef']*np.log(rawtvalue) + 
                                       self.underside_temp_cal['Ccoef']*np.log(rawtvalue)**3 ) - 273.15                
                                                                        #temp DegC
                
                temp_ref = int(sample[20:24],16)                            #temp ref in ADC
                if temp_ref >= 0x8000:
                    temp_ref -= 0x10000                
                
                rawpvalue = int(sample[24:28],16)
                if rawpvalue >= 0x8000:
                    rawpvalue -= 0x10000                   
                
                par = (rawpvalue - self.par_cal['offset'])* self.par_cal['slope'] / 0.73 #PAR in umolm-2s-1
                
                rawfvalue = int(sample[28:32],16)
                if rawfvalue >= 0x8000:
                    rawfvalue -= 0x10000                   
                
                fluor = (rawfvalue - self.fluor_cal['offset'])* self.fluor_cal['slope']  #concentration in ug/L 
                
                tilt = int(sample[32:34],16) #degrees
                
            else:
                time = pressure = topside_temp = underside_temp = temp_ref = np.nan
                par = fluor = tilt = np.nan
            #save to dictionary
            data[sample_num] = {'time':time,
                                'pressure':pressure,
                                'topside_temp':topside_temp,
                                'underside_temp':underside_temp,
                                'temp_ref':temp_ref,
                                'par':par,
                                'fluor':fluor,
                                'tilt':tilt}        
        
        if asPandas:
            data = pd.DataFrame.from_dict(data,orient='index')
        return(data)

## Sample Evaluation of routine

Imagine a buoy with ID number 119087.  Instantiate the PopUpBuoys class with the relative (or absolute) path to the location of the downloaded/reconstructed data files.

In [27]:
ID119087 = PopUpBuoys('sampledata')

Call the routine that reads the binary file and converts it to a hex string.  The sample parameter is the name of the data type.

Sample options are:
+ bottom
+ sst
+ profile
+ ice

Passing 'verbose=True' returns the hex string.

In [28]:
ID119087.HexView(sample='bottom',verbose=True)

'FFFF000049BE298E2960214E1D30005907FFFF000149C0296C2944214F1A2C004A07FFFF000249C029502933214F18AC003D07FFFF000349C229452931214F17A0003D07FFFF000449C3294A293B214F15F0004507FFFF000549C429542949214F1558004307FFFF000649C629612957214F1552004207FFFF000749C529682962214F1552003407FFFF000849C72975296F214F1552003707FFFF000949C929842980214F1552003607FFFF000A49C92993298F214F1552002F07FFFF000B49CA29A329A0214F1552004C07FFFF000C49CA29B329B0214F1552003D07FFFF000D49CA29BE29BD214F1552003807FFFF000E49CB29CC29CB214F1552002E07FFFF000F49CC29D729D5214F1552003E07FFFF001049CD29E329E2214F15C0003A07'

In [29]:
ID119087.HexSplit(verbose=True)

No FFFFF, proceed to split on FFFF


['000049BE298E2960214E1D30005907',
 '000149C0296C2944214F1A2C004A07',
 '000249C029502933214F18AC003D07',
 '000349C229452931214F17A0003D07',
 '000449C3294A293B214F15F0004507',
 '000549C429542949214F1558004307',
 '000649C629612957214F1552004207',
 '000749C529682962214F1552003407',
 '000849C72975296F214F1552003707',
 '000949C929842980214F1552003607',
 '000A49C92993298F214F1552002F07',
 '000B49CA29A329A0214F1552004C07',
 '000C49CA29B329B0214F1552003D07',
 '000D49CA29BE29BD214F1552003807',
 '000E49CB29CC29CB214F1552002E07',
 '000F49CC29D729D5214F1552003E07',
 '001049CD29E329E2214F15C0003A07']

In [30]:
bottom_data = ID119087.Bottom(asPandas=True)

In [31]:
bottom_data

| | time | pressure | topside_temp | underside_temp | temp_ref | par | fluor | tilt |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.761108 | -134.342398 | -133.999116 | 8526 | 39.038493 | 1.025105 | 7 |
| 1 | 3600 | 0.761719 | -134.287268 | -133.953589 | 8527 | 24.074384 | 0.711297 | 7 |
| 2 | 7200 | 0.761719 | -134.241714 | -133.925879 | 8527 | 16.631096 | 0.439331 | 7 |
| 3 | 10800 | 0.762329 | -134.223779 | -133.922616 | 8527 | 11.436301 | 0.439331 | 7 |
| 4 | 14400 | 0.762634 | -134.231934 | -133.938925 | 8527 | 3.062603 | 0.606695 | 7 |
| 5 | 18000 | 0.762939 | -134.24823 | -133.961729 | 8527 | 0.116301 | 0.564854 | 7 |
| 6 | 21600 | 0.76355 | -134.269389 | -133.984497 | 8527 | 0.0 | 0.543933 | 7 |
| 7 | 25200 | 0.763245 | -134.280769 | -134.002362 | 8527 | 0.0 | 0.251046 | 7 |
| 8 | 28800 | 0.763855 | -134.301881 | -134.023448 | 8527 | 0.0 | 0.313808 | 7 |
| 9 | 32400 | 0.764465 | -134.326205 | -134.050976 | 8527 | 0.0 | 0.292887 | 7 |


In [32]:
ID119087.HexView(sample='ice',verbose=False)
ID119087.HexSplit(verbose=False)
ice_data = ID119087.Ice(asPandas=True)

No FFFFF, proceed to split on FFFF


In [33]:
ice_data

| | time | pressure | topside_temp | underside_temp | temp_ref | par | fluor | tilt |
|---|---|---|---|---|---|---|---|---|
| 0 | 10800 | 0.038147 | -134.233564 | -134.044504 | 8527 | 328.609521 | 0.857741 | 13 |
| 1 | 14400 | 0.037842 | -134.13378 | -133.973117 | 8527 | 451.036096 | 0.376569 | 14 |
| 2 | 18000 | 0.037842 | -133.862144 | -133.917719 | 8527 | 500.677397 | 0.732218 | 14 |
| 3 | 21600 | 0.037231 | -134.128855 | -133.881763 | 8527 | 187.419658 | 0.690377 | 13 |
| 4 | 25200 | 0.037231 | -134.100917 | -133.870304 | 8527 | 98.972466 | 0.753138 | 13 |
| 5 | 28800 | 0.037231 | -134.076223 | -133.834233 | 8527 | 53.653699 | 0.313808 | 13 |
| 6 | 32400 | 0.037231 | -134.114071 | -133.840798 | 8527 | 27.718493 | 0.606695 | 14 |
| 7 | 36000 | 0.037231 | -134.115714 | -133.849 | 8527 | 1.046712 | 0.334728 | 14 |
| 8 | 39600 | 0.037842 | -134.13378 | -133.858836 | 8527 | 0.0 | 0.292887 | 13 |
| 9 | 43200 | 0.037842 | -134.130497 | -133.86539 | 8527 | 0.0 | 0.543933 | 14 |


In [34]:
ID119087.HexView(sample='profile',verbose=True)
ID119087.HexSplit(verbose=True)
pro_data = ID119087.Profile(asPandas=True)

FFFFF found


In [35]:
pro_data

| | time | pressure | topside_temp | par | fluor | tilt |
|---|---|---|---|---|---|---|
| 0 | 162.50 | 0.374146 | -134.496963 | 34.134452 | 0.271967 | 7 |
| 1 | 168.75 | 0.372620 | -134.498564 | 34.483356 | 0.251046 | 8 |
| 2 | 175.00 | 0.371704 | -134.498564 | 35.316849 | 0.167364 | 11 |
| 3 | 181.25 | 0.367737 | -134.498564 | 36.169726 | 0.271967 | 19 |
| 4 | 187.50 | 0.362549 | -134.498564 | 37.642877 | 0.125523 | 21 |
| 5 | 193.75 | 0.353699 | -134.498564 | 38.631438 | 0.376569 | 20 |
| 6 | 200.00 | 0.345764 | -134.498564 | 39.135411 | 0.774059 | 14 |
| 7 | 206.25 | 0.342407 | -134.498564 | 38.495753 | 0.397490 | 5 |
| 8 | 212.50 | 0.337524 | -134.498564 | 39.368014 | 0.355649 | 9 |
| 9 | 218.75 | 0.327454 | -134.498564 | 40.666712 | 0.271967 | 13 |
