# OPTAA Checker

This notebook is designed to check the OPTAA csv calibration file in a pull request. The process I follow is:
1. Read in the OPTAA csv from the pull request into a pandas dataframe
2. Identify the source file of the calibration coefficients
3. Parse the calibration coefficients directly from the source file
4. Compare the OPTAA csv from the pull request with the csv parsed from the source file

**====================================================================================================================**

The first step is to load relevant packages:

In [1]:
import csv
import re
import os
import shutil
import numpy as np
import pandas as pd

In [2]:
from utils import *

In [3]:
from zipfile import ZipFile
import string

**====================================================================================================================**
Define the directories where the **csv** file to check is stored, and where the **source** file is stored. Make sure to check the following information on your machine via your terminal first:
1. The branch of your local asset-management repository matches the location of the OPTAA cals.
2. Your local asset-management repository has the requisite **csv** file to check
3. You have downloaded the **source** of the csv file

In [4]:
csv_dir = '/home/andrew/Documents/OOI-CGSN/asset-management/calibration/OPTAAD/'
source_dir = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Cal/'
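
Before moving on, it can help to confirm that both directories exist and hold files for the instrument being checked. The sketch below is one way to do that. `find_files` is a hypothetical helper written for this illustration, not part of `utils`.

```python
import os


def find_files(directory, token):
    """Return the sorted names of files in `directory` whose name contains `token`.

    Hypothetical helper for illustration; raises if the directory is missing.
    """
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"Directory not found: {directory}")
    return sorted(name for name in os.listdir(directory) if token in name)
```

For example, `find_files(csv_dir, '00189')` and `find_files(source_dir, '189')` should each return at least one file before the rest of the notebook is run.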

**====================================================================================================================**
### Find & Parse the source file
Now, we want to find the source file of the calibration coefficients, parse the data using the OPTAA parser, and read the data into a pandas dataframe. The key pieces of data needed for the parser are:
1. Instrument UID: This is needed to initialize the OPTAA parser
2. Source file: This is the full path to the source file. Zip files are acceptable input.

In [57]:
source_name = '189'
for file in os.listdir(source_dir):
    if source_name in file:
        source_file = file
        print(source_file)

OPTAA-D_AC-S_SN_189_Calibration_Files_2014-09-18.zip
OPTAA-D_AC-S_SN_189_Calibration_Files_2014-09-26.zip
OPTAA-D_AC-S_SN_189_Calibration_Files_2015-12-29.zip
OPTAA-D_AC-S_SN_189_Calibration_Files_2016-01-07.zip
OPTAA-D_AC-S_SN_189_Calibration_Files_2017-06-14.zip
OPTAA-D_AC-S_SN_189_Calibration_Sheet_2014-09-26.pdf
OPTAA-D_AC-S_SN_189_Pump_Configuration_2014-09-26.pdf


In [58]:
source_file = 'OPTAA-D_AC-S_SN_189_Calibration_Files_2014-09-18.zip'

Initialize the parser:

In [59]:
optaa = OPTAACalibration('CGINS-OPTAAD-00189')

Read in the calibration coefficients:

In [60]:
optaa.load_cal(source_dir+source_file)

Write the csv to a temporary local folder:

In [61]:
temp_directory = os.path.join(os.getcwd(), 'temp')
# Check if the temp directory exists; if it already does, purge and rewrite
if os.path.exists(temp_directory):
    shutil.rmtree(temp_directory)
ensure_dir(temp_directory)

In [62]:
optaa.write_csv(temp_directory)

Write CGINS-OPTAAD-00189__20140918.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp? [y/n]: y


In [63]:
os.listdir(temp_directory)

['CGINS-OPTAAD-00189__20140918__CC_tcarray.ext',
 'CGINS-OPTAAD-00189__20140918.csv',
 'CGINS-OPTAAD-00189__20140918__CC_taarray.ext']

In [64]:
optaa.uid, optaa.serial, optaa.date

('CGINS-OPTAAD-00189', 'ACS-189', '20140918')

**====================================================================================================================**
### Check the data
Now, we have generated local csv and ext files from the data. We can now reload that data into python as a pandas dataframe, which will allow for a direct comparison with the existing data. 

In [65]:
sn = optaa.serial.split('-')[1].zfill(5)
dt = optaa.date

In [66]:
source_csv = pd.read_csv(temp_directory+'/CGINS-OPTAAD-'+sn+'__'+dt+'.csv')
source_csv

Unnamed: 0,serial,name,value,notes
0,ACS-189,CC_acwo,"[-0.570835, -0.453077, -0.372919, -0.320278, -...",Source file: OPTAA-D_AC-S_SN_189_Calibration_...
1,ACS-189,CC_awlngth,"[400.3, 403.7, 407.5, 410.7, 414.4, 418.0, 421...",
2,ACS-189,CC_ccwo,"[-1.86381, -1.792485, -1.693523, -1.593497, -1...",
3,ACS-189,CC_cwlngth,"[398.7, 402.3, 405.7, 409.4, 413.0, 416.6, 420...",
4,ACS-189,CC_taarray,SheetRef:CC_taarray,
5,ACS-189,CC_tbins,"[0.827978, 1.395665, 2.505246, 3.46939, 4.4658...",
6,ACS-189,CC_tcal,20.4,
7,ACS-189,CC_tcarray,SheetRef:CC_tcarray,


In [67]:
source_csv['notes'].iloc[0]

' Source file: OPTAA-D_AC-S_SN_189_Calibration_Files_2014-09-18.zip > acs189.dev'

In [68]:
source_taarray = pd.read_csv(temp_directory+'/CGINS-OPTAAD-'+sn+'__'+dt+'__CC_taarray.ext',header=None)
source_taarray.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,27,28,29,30,31,32,33,34,35,36
0,-0.145675,-0.139658,-0.131296,-0.122736,-0.11413,-0.105052,-0.097488,-0.090617,-0.084078,-0.07766,...,0.011167,0.017979,0.023176,0.026718,0.033652,0.039911,0.044405,0.049212,0.052084,0.054542
1,-0.130657,-0.124367,-0.116078,-0.108109,-0.099944,-0.091969,-0.085414,-0.078769,-0.073256,-0.067303,...,0.010468,0.0159,0.020831,0.024246,0.029703,0.035136,0.039176,0.043623,0.045787,0.047778
2,-0.115754,-0.109857,-0.102219,-0.094906,-0.08744,-0.080098,-0.073841,-0.06805,-0.063249,-0.057849,...,0.008904,0.013776,0.017529,0.020455,0.025082,0.029691,0.032906,0.036224,0.038294,0.040449
3,-0.100815,-0.095226,-0.088166,-0.0816,-0.075339,-0.068584,-0.063055,-0.057592,-0.053611,-0.049222,...,0.007768,0.011231,0.015072,0.017525,0.021398,0.025084,0.027707,0.030838,0.032158,0.033852
4,-0.088646,-0.083307,-0.077044,-0.071352,-0.065216,-0.059279,-0.05448,-0.050036,-0.046067,-0.042329,...,0.006667,0.010319,0.013077,0.014995,0.018565,0.021757,0.023826,0.026319,0.027779,0.028852


In [69]:
source_tcarray = pd.read_csv(temp_directory+'/CGINS-OPTAAD-'+sn+'__'+dt+'__CC_tcarray.ext',header=None)
source_tcarray.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,27,28,29,30,31,32,33,34,35,36
0,-0.027317,-0.023573,-0.020214,-0.015941,-0.013892,-0.01098,-0.008193,-0.005636,-0.003988,-0.004318,...,0.001,0.000875,0.001269,0.001027,0.001344,0.000538,0.001391,0.000828,0.00108,0.000132
1,-0.030619,-0.027079,-0.021762,-0.017794,-0.015727,-0.011312,-0.01026,-0.008006,-0.005649,-0.006861,...,0.001739,0.002469,0.00199,0.002978,0.002759,0.002153,0.002293,0.002899,0.002496,0.002256
2,-0.033051,-0.028219,-0.024642,-0.020817,-0.017666,-0.014359,-0.011927,-0.009388,-0.007416,-0.006961,...,0.002032,0.00221,0.003709,0.00307,0.00351,0.003737,0.003908,0.003479,0.003264,0.002888
3,-0.033812,-0.030512,-0.026011,-0.020884,-0.018855,-0.016178,-0.013916,-0.011805,-0.011284,-0.009873,...,0.001182,0.00215,0.002037,0.002493,0.003295,0.003495,0.003553,0.003214,0.003465,0.003232
4,-0.034155,-0.030486,-0.025406,-0.021658,-0.018499,-0.015,-0.013523,-0.01233,-0.009814,-0.009168,...,0.001424,0.002229,0.002911,0.003954,0.004397,0.003898,0.003644,0.004324,0.003915,0.003504


**====================================================================================================================**
Load the csv from asset management in order to compare

In [70]:
csv_filename = 'CGINS-OPTAAD-00189__20141212.csv'
csv_file = pd.read_csv(csv_dir+csv_filename)

In [71]:
csv_file

Unnamed: 0,serial,name,value,notes
0,ACS-189,CC_acwo,"[-0.570835, -0.453077, -0.372919, -0.320278, -...",
1,ACS-189,CC_awlngth,"[400.3, 403.7, 407.5, 410.7, 414.4, 418.0, 421...",
2,ACS-189,CC_ccwo,"[-1.86381, -1.792485, -1.693523, -1.593497, -1...",
3,ACS-189,CC_cwlngth,"[398.7, 402.3, 405.7, 409.4, 413.0, 416.6, 420...",
4,ACS-189,CC_taarray,SheetRef:CC_taarray,
5,ACS-189,CC_tbins,"[0.827978, 1.395665, 2.505246, 3.46939, 4.4658...",
6,ACS-189,CC_tcal,20.4,
7,ACS-189,CC_tcarray,SheetRef:CC_tcarray,


In [72]:
taarray = pd.read_csv(csv_dir + 'CGINS-OPTAAD-00189__20141212__CC_taarray.ext',header=None)
tcarray = pd.read_csv(csv_dir + 'CGINS-OPTAAD-00189__20141212__CC_tcarray.ext',header=None)

In [73]:
taarray

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,27,28,29,30,31,32,33,34,35,36
0,-0.145675,-0.139658,-0.131296,-0.122736,-0.114130,-0.105052,-0.097488,-0.090617,-0.084078,-0.077660,...,0.011167,0.017979,0.023176,0.026718,0.033652,0.039911,0.044405,0.049212,0.052084,0.054542
1,-0.130657,-0.124367,-0.116078,-0.108109,-0.099944,-0.091969,-0.085414,-0.078769,-0.073256,-0.067303,...,0.010468,0.015900,0.020831,0.024246,0.029703,0.035136,0.039176,0.043623,0.045787,0.047778
2,-0.115754,-0.109857,-0.102219,-0.094906,-0.087440,-0.080098,-0.073841,-0.068050,-0.063249,-0.057849,...,0.008904,0.013776,0.017529,0.020455,0.025082,0.029691,0.032906,0.036224,0.038294,0.040449
3,-0.100815,-0.095226,-0.088166,-0.081600,-0.075339,-0.068584,-0.063055,-0.057592,-0.053611,-0.049222,...,0.007768,0.011231,0.015072,0.017525,0.021398,0.025084,0.027707,0.030838,0.032158,0.033852
4,-0.088646,-0.083307,-0.077044,-0.071352,-0.065216,-0.059279,-0.054480,-0.050036,-0.046067,-0.042329,...,0.006667,0.010319,0.013077,0.014995,0.018565,0.021757,0.023826,0.026319,0.027779,0.028852
5,-0.079889,-0.075072,-0.068984,-0.063615,-0.058377,-0.053163,-0.048639,-0.044622,-0.041001,-0.037521,...,0.005468,0.008671,0.011022,0.013047,0.015839,0.018576,0.020343,0.022642,0.023882,0.024933
6,-0.071903,-0.067338,-0.061939,-0.056964,-0.052108,-0.047252,-0.043055,-0.039325,-0.035917,-0.032661,...,0.005429,0.008379,0.010363,0.012017,0.014244,0.017063,0.018782,0.020749,0.021580,0.022582
7,-0.066123,-0.061880,-0.056628,-0.052153,-0.047605,-0.043151,-0.039475,-0.035820,-0.032892,-0.030051,...,0.004519,0.006912,0.008766,0.010121,0.012479,0.014999,0.016342,0.017944,0.018884,0.019682
8,-0.060377,-0.056353,-0.051642,-0.047493,-0.043129,-0.039190,-0.035606,-0.032226,-0.029770,-0.026865,...,0.003902,0.005775,0.007822,0.008940,0.010997,0.013194,0.014461,0.016119,0.016886,0.017780
9,-0.056018,-0.052233,-0.047700,-0.043785,-0.039701,-0.035850,-0.032652,-0.029610,-0.027207,-0.024614,...,0.003610,0.005417,0.006920,0.008100,0.010003,0.011901,0.013261,0.014640,0.015478,0.016194
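
The `.ext` arrays loaded above can be compared numerically rather than by eye. This is a minimal sketch, assuming both arrays were read with `header=None` as in the cells above. `arrays_match` is a hypothetical helper, and the tolerance is an assumption rather than a project requirement.

```python
import numpy as np
import pandas as pd


def arrays_match(a, b, atol=1e-6):
    """Return True if two DataFrames have the same shape and numerically close values."""
    if a.shape != b.shape:
        return False
    # rtol=0 so that only the absolute tolerance decides the outcome
    return bool(np.allclose(a.to_numpy(dtype=float), b.to_numpy(dtype=float),
                            rtol=0, atol=atol))
```

With the variables defined in this notebook, the check would be `arrays_match(source_taarray, taarray)` and `arrays_match(source_tcarray, tcarray)`.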


In [74]:
csv_file == source_csv

Unnamed: 0,serial,name,value,notes
0,True,True,True,False
1,True,True,False,False
2,True,True,True,False
3,True,True,False,False
4,True,True,True,False
5,True,True,False,False
6,True,True,True,False
7,True,True,True,False
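
The `==` comparison above works on the text stored in the `value` column, so list-valued coefficients can show `False` when the two files format the same numbers differently. A hedged alternative is to parse each entry and compare numerically. In the sketch below, `values_match` and `compare_values` are hypothetical helpers, the tolerance is an assumption, and the entries are assumed to be either JSON-style numbers and lists or plain text such as `SheetRef:CC_taarray`.

```python
import json

import numpy as np
import pandas as pd


def values_match(a, b, atol=1e-6):
    """Compare two `value` entries numerically when both parse as numbers or lists."""
    try:
        x = np.atleast_1d(np.asarray(json.loads(str(a)), dtype=float))
        y = np.atleast_1d(np.asarray(json.loads(str(b)), dtype=float))
    except (ValueError, TypeError):
        # Non-numeric entries such as 'SheetRef:CC_taarray' are compared as text
        return str(a).strip() == str(b).strip()
    return x.shape == y.shape and bool(np.allclose(x, y, rtol=0, atol=atol))


def compare_values(left, right):
    """Return a boolean Series, indexed by coefficient name, of matching values."""
    merged = left.merge(right, on='name', suffixes=('_left', '_right'))
    result = [values_match(a, b)
              for a, b in zip(merged['value_left'], merged['value_right'])]
    return pd.Series(result, index=merged['name'])
```

Here the call would be `compare_values(csv_file, source_csv)`. Any coefficient that still reports `False` differs by more than the tolerance or has a different length.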


In [75]:
for file in os.listdir(csv_dir):
    if '189' in file:
        print(file)

CGINS-OPTAAD-00189__20141212__CC_tcarray.ext
CGINS-OPTAAD-00189__20141212__CC_taarray.ext
CGINS-OPTAAD-00189__20160527__CC_tcarray.ext
CGINS-OPTAAD-00189__20170614__CC_tcarray.ext
CGINS-OPTAAD-00189__20170614.csv
CGINS-OPTAAD-00189__20160527__CC_taarray.ext
CGINS-OPTAAD-00189__20170614__CC_taarray.ext
CGINS-OPTAAD-00189__20160527.csv
CGINS-OPTAAD-00189__20141212.csv
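
The listing shows csv files dated 20141212, 20160527, and 20170614, while the parsed source file is dated 20140918. Pulling the dates out of the filenames makes that kind of mismatch easy to spot. This is a sketch only: `calibration_dates` is a hypothetical helper, and the filename pattern is inferred from the names printed above.

```python
import re

# Pattern inferred from names such as CGINS-OPTAAD-00189__20141212__CC_taarray.ext
CAL_NAME = re.compile(
    r'^(?P<uid>[^_]+)__(?P<date>\d{8})(?:__(?P<array>\w+))?\.(?P<ext>csv|ext)$'
)


def calibration_dates(filenames, uid):
    """Return the sorted calibration dates (YYYYMMDD strings) that have a csv for `uid`."""
    dates = set()
    for name in filenames:
        m = CAL_NAME.match(name)
        if m and m.group('uid') == uid and m.group('ext') == 'csv':
            dates.add(m.group('date'))
    return sorted(dates)
```

With the notebook's variables, `optaa.date in calibration_dates(os.listdir(csv_dir), optaa.uid)` reports whether asset management already holds a csv with the same date as the parsed source file.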


**====================================================================================================================**
# OPTAA Parser
Below is a parser for the OPTAA calibration file. The following methods are available as part of the OPTAACalibration class:
* **OPTAACalibration.load_cal**:
        
         Wrapper function to load all of the calibration coefficients
        
         Args:
            filepath - path to the directory with filename which has the
                calibration coefficients to be parsed and loaded
         Calls:
            open_dev
            parse_dev
            
* **OPTAACalibration.load_qct**:

        Wrapper function to load the calibration coefficients from
        the QCT checkin.
            

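It is used as follows:
1. Initialize the OPTAA class using the **UID** for the OPTAA with the following code: optaa = OPTAACalibration(UID)
2. Load the calibration coefficients from the source file: optaa.load_cal(filepath)
3. Write the csv and ext files to a directory: optaa.write_csv(outpath)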

In [9]:
from zipfile import ZipFile
class OPTAACalibration():
    # Class that stores calibration values for OPTAAs.

    def __init__(self, uid):
        self.serial = None
        self.nbins = None
        self.uid = uid
        self.sigfig = 6
        self.date = []
        self.coefficients = {
            'CC_acwo': [],
            'CC_awlngth': [],
            'CC_ccwo': [],
            'CC_cwlngth': [],
            'CC_taarray': 'SheetRef:CC_taarray',
            'CC_tbins': [],
            'CC_tcal': [],
            'CC_tcarray': 'SheetRef:CC_tcarray'
        }
        self.tcarray = []
        self.taarray = []
        self.notes = {
            'CC_acwo': '',
            'CC_awlngth': '',
            'CC_ccwo': '',
            'CC_cwlngth': '',
            'CC_taarray': '',
            'CC_tbins': '',
            'CC_tcal': '',
            'CC_tcarray': ''
        }

    @property
    def uid(self):
        return self._uid

    @uid.setter
    def uid(self, d):
        r = re.compile('.{5}-.{6}-.{5}')
        if r.match(d) is not None:
            self._uid = d
            serial = d.split('-')[-1].lstrip('0')
            self.serial = 'ACS-' + serial
        else:
            raise Exception(f"The instrument uid {d} is not a valid uid. Please check.")

            
    def load_cal(self, filepath):
        """
        Wrapper function to load all of the calibration coefficients
        
        Args:
            filepath - path to the directory with filename which has the
                calibration coefficients to be parsed and loaded
        Calls:
            open_dev
            parse_dev
        """
        
        data = self.open_dev(filepath)
        
        self.parse_dev(data)
        
        
    def load_qct(self, filepath):
        """
        Wrapper function to load the calibration coefficients from
        the QCT checkin.
        """
        
        data = self.open_dev(filepath)
        
        self.parse_qct(data)
    
    
    def open_dev(self, filepath):
        """
        Function that opens and reads in the calibration file
        for an OPTAA. Zip files are acceptable inputs.
        """
        
        if filepath.endswith('.zip'):
            with ZipFile(filepath) as zfile:
                # Check if OPTAA has the .dev file
                filename = [name for name in zfile.namelist() if name.lower().endswith('.dev')]
                
                # Get and open the latest calibration file
                if len(filename) == 1:
                    data = zfile.read(filename[0]).decode('ascii')
                    self.source_file(filepath, filename[0])
                    
                elif len(filename) > 1:
                    raise FileExistsError(f"Multiple .dev files found in {filepath}.")

                else:
                    raise FileNotFoundError(f"No .dev file found in {filepath}.")
                        
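        elif filepath.lower().endswith('.dev'):
            with open(filepath) as file:
                data = file.read()
            # Pass the file name, not the (closed) file object
            self.source_file(filepath, os.path.basename(filepath))
                
        elif filepath.lower().endswith('.dat'):
            with open(filepath) as file:
                data = file.read()
            # Pass the file name, not the (closed) file object
            self.source_file(filepath, os.path.basename(filepath))
            
        else:
            raise FileNotFoundError(f"Expected a .zip, .dev, or .dat file, got {filepath}.")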
        
        return data


    def source_file(self, filepath, filename):
        """
        Routine which parses out the source file and filename
        where the calibration coefficients are sourced from.
        """
        
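        if filepath.lower().endswith('.zip'):
            # The zip file is the container; filename is the member inside it
            dcn = filepath.split('/')[-1]
        else:
            # For a bare .dev or .dat file, report its parent directory and name
            dcn = filepath.split('/')[-2]
            filename = filepath.split('/')[-1]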
        
        self.source = f'Source file: {dcn} > {filename}'
        

    def parse_dev(self, data):
        """
        Function to parse the .dev file in order to load the
        calibration coefficients for the OPTAA.
        
        Args:
            data - opened .dev file in ascii-format
        """
        
        for line in data.splitlines():
            # Split each line into its data and its trailing comment
            parts = line.split(';')
            # Lines without exactly one ';' hold no coefficient data
            if len(parts) != 2:
                # Find the calibration temperature and date
                if 'tcal' in line.lower():
                    # Strip all punctuation except '/', which is needed for the date
                    line = ''.join(x for x in line if x not in string.punctuation or x == '/')
                    parts = line.split()
                    # Calibration temperature
                    tcal = parts[1].replace('C','')
                    tcal = float(tcal)/10
                    self.coefficients['CC_tcal'] = tcal
                    # Calibration date
                    date = parts[-1].strip(string.punctuation)
                    self.date = pd.to_datetime(date).strftime('%Y%m%d')
        
            else:
                info, comment = parts
                
                if comment.strip().startswith('temperature bins'):
                    tbins = [float(x) for x in info.split()]
                    self.coefficients['CC_tbins'] = tbins
                    
                elif comment.strip().startswith('number'):
                    self.nbins = int(float(info.strip()))
                    
                elif comment.strip().startswith('C'):
                    if self.nbins is None:
                        raise AttributeError(f'Failed to load number of temperature bins.')
                        
                    # Parse out the different calibration coefficients
                    parts = info.split()
                    cwlngth = float(parts[0][1:])
                    awlngth = float(parts[1][1:])
                    ccwo = float(parts[3])
                    acwo = float(parts[4])
                    tcrow = [float(x) for x in parts[5:self.nbins+5]]
                    acrow = [float(x) for x in parts[self.nbins+5:2*self.nbins+5]]
                
                    # Now put the coefficients into the coefficients dictionary
                    self.coefficients['CC_acwo'].append(acwo)
                    self.coefficients['CC_awlngth'].append(awlngth)
                    self.coefficients['CC_ccwo'].append(ccwo)
                    self.coefficients['CC_cwlngth'].append(cwlngth)
                    self.tcarray.append(tcrow)
                    self.taarray.append(acrow)
                    
                    
    def parse_qct(self, data):
        """
        This function is designed to parse the QCT file, which contains the
        calibration data in slightly different format than the .dev file
        
        
        """
        
        for line in data.splitlines():
            if 'WetView' in line:
                _, _, _, date, time = line.split()
                try:
                    date_time = date + ' ' + time
                    self.date = pd.to_datetime(date_time).strftime('%Y%m%d')
                except Exception:
                    # Fall back to treating the fields as an Excel ordinal date
                    date_time = from_excel_ordinal(float(date) + float(time))
                    self.date = pd.to_datetime(date_time).strftime('%Y%m%d')
                continue
                
            parts = line.split(';')
            
            if len(parts) == 2:
                # Separate the data from its trailing comment
                info, comment = parts
                
                if comment.strip().startswith('temperature bins'):
                    tbins = [float(x) for x in info.split()]
                    self.coefficients['CC_tbins'] = tbins
                    
                elif comment.strip().startswith('number'):
                    self.nbins = int(float(info.strip()))
                    
                elif comment.strip().startswith('C'):
                    if self.nbins is None:
                        raise AttributeError(f'Failed to load number of temperature bins.')
                    # Parse out the different calibration coefficients
                    parts = info.split()
                    cwlngth = float(parts[0][1:])
                    awlngth = float(parts[1][1:])
                    ccwo = float(parts[3])
                    acwo = float(parts[4])
                    tcrow = [float(x) for x in parts[5:self.nbins+5]]
                    acrow = [float(x) for x in parts[self.nbins+5:(2*self.nbins)+5]]
                    
                    # Now put the coefficients into the coefficients dictionary
                    self.coefficients['CC_acwo'].append(acwo)
                    self.coefficients['CC_awlngth'].append(awlngth)
                    self.coefficients['CC_ccwo'].append(ccwo)
                    self.coefficients['CC_cwlngth'].append(cwlngth)
                    self.tcarray.append(tcrow)
                    self.taarray.append(acrow)                
    
                        
    def write_csv(self, outpath):
        """
        This function writes the correctly named csv file for the OPTAA,
        along with the CC_tcarray and CC_taarray ext files, to the
        specified directory.

        Args:
            outpath - directory path of where to write the csv file
        Raises:
            ValueError - raised if the OPTAA object's coefficient dictionary
                has not been populated
        Returns:
            None - the csv and ext files are written to outpath after
                the user confirms at the prompt.
        """

        # Run a check that the coefficients have actually been loaded.
        # The dictionary always has eight keys, so test a parsed list instead.
        if not self.coefficients['CC_acwo']:
            raise ValueError('No calibration coefficients have been loaded.')

        # Create a dataframe to write to the csv
        data = {
            'serial': [self.serial]*len(self.coefficients),
            'name': list(self.coefficients.keys()),
            'value': list(self.coefficients.values())
        }
        df = pd.DataFrame().from_dict(data)
      
        # Now merge the coefficients dataframe with the notes
        notes = pd.DataFrame().from_dict({
            'name':list(self.notes.keys()),
            'notes':list(self.notes.values())
        })
        df = df.merge(notes, how='outer', left_on='name', right_on='name')
            
        # Add in the source file
        # Use .loc to avoid chained assignment, which may not modify df
        df.loc[df.index[0], 'notes'] = df['notes'].iloc[0] + ' ' + self.source
        
        # Sort the data by the coefficient name
        df = df.sort_values(by='name')

        # Generate the csv names
        csv_name = self.uid + '__' + self.date + '.csv'
        tca_name = self.uid + '__' + self.date + '__' + 'CC_tcarray.ext'
        taa_name = self.uid + '__' + self.date + '__' + 'CC_taarray.ext'
        
        def write_array(filename, cal_array):
            # newline='' lets the csv module control line endings
            with open(filename, 'w', newline='') as out:
                array_writer = csv.writer(out)
                array_writer.writerows(cal_array)

        # Write the dataframe to a csv file
        check = input(f"Write {csv_name} to {outpath}? [y/n]: ")
        # check = 'y'
        if check.lower().strip() == 'y':
            df.to_csv(outpath+'/'+csv_name, index=False)
            write_array(outpath+'/'+tca_name, self.tcarray)
            write_array(outpath+'/'+taa_name, self.taarray)
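
To make the slicing in `parse_dev` concrete, the sketch below splits a single coefficient row with the same indices. The sample row is synthetic: its offsets and array values are made up for illustration and use only two temperature bins. `split_row` is a hypothetical stand-alone function, and the length check is an addition that the class above does not perform.

```python
def split_row(info, nbins):
    """Split one coefficient row using the same indices as parse_dev."""
    parts = info.split()
    cwlngth = float(parts[0][1:])   # e.g. 'C398.7' -> 398.7
    awlngth = float(parts[1][1:])   # e.g. 'A400.3' -> 400.3
    ccwo = float(parts[3])          # parts[2] is skipped, as in parse_dev
    acwo = float(parts[4])
    tcrow = [float(x) for x in parts[5:nbins + 5]]
    tarow = [float(x) for x in parts[nbins + 5:2 * nbins + 5]]
    # Extra safeguard (not in the class above): each row needs nbins values per array
    if len(tcrow) != nbins or len(tarow) != nbins:
        raise ValueError(f'Expected {nbins} values per array row.')
    return cwlngth, awlngth, ccwo, acwo, tcrow, tarow


# Synthetic row for illustration only: two temperature bins, made-up values
row = 'C398.7 A400.3 1 -1.5 -0.5 -0.02 -0.03 -0.14 -0.13'
print(split_row(row, nbins=2))
```

Each row contributes one entry to `CC_cwlngth`, `CC_awlngth`, `CC_ccwo`, and `CC_acwo`, plus one row each to the `tcarray` and `taarray` tables.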