# OPTAA METADATA REVIEW

This notebook describes the process for checking the calibration csvs located in the OOI-CGSN asset management repository on GitHub. The purpose is to identify errors made when the calibration csvs were entered. The review checks the following information:

1. The calibration date - this information is stored in the filename of the csv
2. Calibration source - identifying all the possible sources of calibration information, and determining which file should supply the calibration info
3. Calibration coeffs - checking the accuracy and precision of the numbers stored in the calibration coefficients
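
As a minimal sketch of the first check, the calibration date can be recovered from a file name with a regular expression. The pattern below is an assumption based on the file names shown later in this notebook (`<UID>__<YYYYMMDD>.csv` and `<UID>__<YYYYMMDD>__<coefficient name>.ext`), and `parse_cal_filename` is a hypothetical helper, not part of `utils`:

```python
import re
from datetime import datetime

# Assumed naming pattern: <UID>__<YYYYMMDD>.csv or <UID>__<YYYYMMDD>__<name>.ext
CAL_NAME = re.compile(
    r'^(?P<uid>[^_]+)__(?P<date>\d{8})(?:__(?P<coeff>\w+))?\.(?:csv|ext)$'
)

def parse_cal_filename(filename):
    """Return (uid, calibration date, coefficient name or None)."""
    match = CAL_NAME.match(filename)
    if match is None:
        raise ValueError(f'Unexpected calibration file name: {filename}')
    # strptime raises ValueError if the eight digits are not a real date
    date = datetime.strptime(match.group('date'), '%Y%m%d').date()
    return match.group('uid'), date, match.group('coeff')
```

The parsed date can then be compared against the calibration date recorded in the source file for the same instrument.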

========================================================================================================================

The first step is to load relevant packages:

In [1]:
import csv
import re
import os
import shutil
import string
from zipfile import ZipFile
import numpy as np
import pandas as pd

In [2]:
from utils import *

**=========================================================================================================================**
Define some useful functions for the metadata review (in future will move to a utilities file):

In [3]:
def get_qct_files(df, qct_directory):
    qct_dict = {}
    uids = list(set(df['UID']))
    for uid in uids:
        df['UID_match'] = df['UID'].apply(lambda x: True if uid in x else False)
        qct_series = df[df['UID_match'] == True]['QCT Testing']
        qct_series = list(qct_series.iloc[0].split('\n'))
        qct_dict.update({uid:qct_series})
    return qct_dict

In [4]:
def get_calibration_files(serial_nums, cal_directory):
    calibration_files = {}
    for uid in serial_nums.keys():
        sn = serial_nums.get(uid)
        sn = str(sn[0])
        files = []
        for file in os.listdir(cal_directory):
            if 'Calibration_File' in file:
                if sn in file:
                    files.append(file)
        calibration_files.update({uid:files})
    return calibration_files

In [5]:
# Load all of the csv files based on their UID
def load_csv_info(csv_dict,filepath):
    """
    Loads the calibration coefficient information contained in asset management
    
    Args:
        csv_dict - a dictionary which associates an instrument UID to the
            calibration csv files in asset management
        filepath - the path to the directory containing the calibration csv files
    Returns:
        csv_cals - a dictionary which associates an instrument UID to a pandas
            dataframe which contains the calibration coefficients. The dataframes
            are indexed by the date of calibration
    """
    
    # Load the calibration data into pandas dataframes, which are then placed into
    # a dictionary by the UID
    csv_cals = {}
    for uid in csv_dict:
        frames = []
        for file in csv_dict[uid]:
            data = pd.read_csv(filepath+file)
            date = file.split('__')[1].split('.')[0]
            data['CAL DATE'] = pd.to_datetime(date)
            frames.append(data)
        # DataFrame.append was removed in pandas 2.0; concatenate instead
        cals = pd.concat(frames) if frames else pd.DataFrame()
        csv_cals.update({uid:cals})
        
    # Pivot the dataframe to be sorted based on calibration date
    for uid in csv_cals:
        csv_cals[uid] = csv_cals[uid].pivot(index=csv_cals[uid]['CAL DATE'], columns='name')['value']
        
    return csv_cals

In [6]:
def splitDataFrameList(df,target_column):
    ''' 
    df = dataframe to split,
    target_column = the column containing the lists of values to split
    returns: a dataframe with each entry for the target column separated, with each element moved into a new row. 
    The values in the other columns are duplicated across the newly divided rows.
    '''
    
    def splitListToRows(row,row_accumulator,target_column):
        split_row = row[target_column]
        for s in split_row:
            new_row = row.to_dict()
            new_row[target_column] = s
            row_accumulator.append(new_row)
            
    new_rows = []
    df.apply(splitListToRows,axis=1,args = (new_rows,target_column))
    new_df = pd.DataFrame(new_rows)
    return new_df

In [7]:
# Now, write a function to copy over the file
def copy_to_local(cal_path):
    """
    Function which copies the files from the cal_path to a locally
    created temp directory
    """
    
    for filepath in cal_path:
        # Create a folder in which to save extracted data
        folder, *ignore = filepath.split('/')[-1].split('.')
        savedir = '/'.join((os.getcwd(),'temp','cal_data',folder))
        # Now make sure that the save directory exists and can be used
        ensure_dir(savedir)
    
        if filepath.endswith('.zip'):
            with ZipFile(filepath,'r') as zfile:
                for file in zfile.namelist():
                    zfile.extract(file,path=savedir)    
        else:
            shutil.copy(filepath, savedir)

**=======================================================================================================================**
Define the directories where the QCT, Pre, and Post deployment document files are stored, where the vendor documents are stored, where asset tracking is located, and where the calibration csvs are located.

In [35]:
doc_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Results/'
cal_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Cal/'
asset_management_directory = '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/'

In [16]:
excel_spreadsheet = '/media/andrew/OS/Users/areed/Documents/Project_Files/Documentation/System/System Notebook/WHOI_Asset_Tracking.xlsx'
sheet_name = 'Sensors'

In [17]:
OPTAA = whoi_asset_tracking(spreadsheet=excel_spreadsheet,sheet_name=sheet_name,instrument_class='OPTAA')

In [26]:
OPTAA[['UID','QCT Testing']]

| | UID | QCT Testing |
|---|---|---|
| 933 | CGINS-OPTAAD-00123 | 3305-00113-00002\n3305-00113-00011\n3305-00113... |
| 934 | CGINS-OPTAAD-00129 | 3305-00113-00005\n3305-00113-00067\n3305-00113... |
| 935 | CGINS-OPTAAD-00130 | 3305-00113-00006\n3305-00113-00044\n3305-00113... |
| 936 | CGINS-OPTAAD-00150 | 3305-00113-00014\n3305-00113-00088 |
| 937 | CGINS-OPTAAD-00151 | 3305-00113-00015\n3305-00113-00083\n3305-00113... |
| 938 | CGINS-OPTAAD-00152 | 3305-00113-00016\n3305-00113-00036\n3305-00113... |
| 939 | CGINS-OPTAAD-00153 | 3305-00113-00017\n3305-00113-00094 3305-0... |
| 940 | CGINS-OPTAAD-00159 | 3305-00113-00018\n3305-00113-00086\n3305-00113... |
| 941 | CGINS-OPTAAD-00165 | 3305-00113-00020\n3305-00113-00072\n3305-00113... |
| 942 | CGINS-OPTAAD-00185 | 3305-00112-00030\n3305-00113-00085\n3305-00113... |


**=======================================================================================================================**
Now, I want to load all the calibration csvs and group them by UID:

In [18]:
uids = sorted( list( set(OPTAA['UID']) ) )

In [19]:
csv_dict = {}
asset_management = os.listdir(asset_management_directory)
for uid in uids:
    files = [file for file in asset_management if uid in file]
    csv_dict.update({uid: sorted(files)})

In [20]:
csv_dict

{'CGINS-OPTAAD-00123': ['CGINS-OPTAAD-00123__20131121.csv',
  'CGINS-OPTAAD-00123__20131121__CC_taarray.ext',
  'CGINS-OPTAAD-00123__20131121__CC_tcarray.ext',
  'CGINS-OPTAAD-00123__20151021.csv',
  'CGINS-OPTAAD-00123__20151021__CC_taarray.ext',
  'CGINS-OPTAAD-00123__20151021__CC_tcarray.ext'],
 'CGINS-OPTAAD-00129': ['CGINS-OPTAAD-00129__20131121.csv',
  'CGINS-OPTAAD-00129__20131121__CC_taarray.ext',
  'CGINS-OPTAAD-00129__20131121__CC_tcarray.ext',
  'CGINS-OPTAAD-00129__20160513.csv',
  'CGINS-OPTAAD-00129__20160513__CC_taarray.ext',
  'CGINS-OPTAAD-00129__20160513__CC_tcarray.ext',
  'CGINS-OPTAAD-00129__20170209.csv',
  'CGINS-OPTAAD-00129__20170209__CC_taarray.ext',
  'CGINS-OPTAAD-00129__20170209__CC_tcarray.ext'],
 'CGINS-OPTAAD-00130': ['CGINS-OPTAAD-00130__20150509.csv',
  'CGINS-OPTAAD-00130__20150509__CC_taarray.ext',
  'CGINS-OPTAAD-00130__20150509__CC_tcarray.ext',
  'CGINS-OPTAAD-00130__20161125.csv',
  'CGINS-OPTAAD-00130__20161125__CC_taarray.ext',
  'CGINS-OPTAAD-

**=======================================================================================================================**
Get the serial numbers of the instruments and match them to the UIDs:

In [23]:
serial_dict = {}
for uid in uids:
    sn = OPTAA[OPTAA['UID'] == uid]['Supplier\nSerial Number']
    serial_dict.update({uid: str(sn.iloc[0])})    

In [24]:
serial_dict

{'CGINS-OPTAAD-00123': '123',
 'CGINS-OPTAAD-00129': '129',
 'CGINS-OPTAAD-00130': '130',
 'CGINS-OPTAAD-00150': '150',
 'CGINS-OPTAAD-00151': '151',
 'CGINS-OPTAAD-00152': '152',
 'CGINS-OPTAAD-00153': '153',
 'CGINS-OPTAAD-00159': '159',
 'CGINS-OPTAAD-00165': '165',
 'CGINS-OPTAAD-00185': '185',
 'CGINS-OPTAAD-00187': '187',
 'CGINS-OPTAAD-00189': '189',
 'CGINS-OPTAAD-00193': '193',
 'CGINS-OPTAAD-00205': '205',
 'CGINS-OPTAAD-00206': '206',
 'CGINS-OPTAAD-00207': '207',
 'CGINS-OPTAAD-00209': '209',
 'CGINS-OPTAAD-00218': '218',
 'CGINS-OPTAAD-00222': '222',
 'CGINS-OPTAAD-00240': '240',
 'CGINS-OPTAAD-00241': '241',
 'CGINS-OPTAAD-00242': '242',
 'CGINS-OPTAAD-00245': '245',
 'CGINS-OPTAAD-00254': '254',
 'CGINS-OPTAAD-00255': '255',
 'CGINS-OPTAAD-00256': '256',
 'CGINS-OPTAAD-00257': '257'}

**=======================================================================================================================**
The OPTAA QCT capture files are stored with the following Document Control Numbers (DCNs): 3305-00113-XXXXX. Most are stored as **.dat** files, which are easy to parse and decode (same formatting as the **.dev** files). However, some are stored as Excel (**.xlsx**) files, which are much trickier to parse.




In [238]:
files = [file for file in os.listdir(doc_directory) if 'A' in file or 'B' in file]
qct_files = []
for file in files:
    if '113' in file:
        qct_files.append(file)
    else:
        pass

In [239]:
qct_dict = {}
for uid in uids:
    # Get the QCT Document numbers from the asset tracking sheet
    OPTAA['UID_match'] = OPTAA['UID'].apply(lambda x: True if uid in x else False)
    qct_series = OPTAA[OPTAA['UID_match'] == True]['QCT Testing']
    qct_series = list(qct_series.iloc[0].split('\n'))
    qct_dict.update({uid:qct_series})
qct_paths = {}
for uid in sorted(qct_dict.keys()):
    paths = []
    for file in qct_dict.get(uid):
        path = generate_file_path(doc_directory, file, ext=['.dat','.xlsx'])
        paths.append(path)
    qct_paths.update({uid: paths})

In [241]:
qct_paths;

**=======================================================================================================================**
The pre-deployment capture files have the following DCN: 3305-00313-XXXXX. However, the OPTAA pre-deployment procedure does not involve capturing any calibration information, so there are no relevant calibration values to test the calibration csvs against. Instead, build the full paths to the calibration csvs in asset management:

In [242]:
csv_paths = {}
for uid in sorted(csv_dict.keys()):
    paths = []
    for file in csv_dict.get(uid):
        path = generate_file_path(asset_management_directory, file, ext=['.csv','.ext'])
        paths.append(path)
    csv_paths.update({uid: paths})

In [243]:
csv_paths

{'CGINS-OPTAAD-00123': ['/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00123__20131121.csv',
  '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00123__20131121__CC_taarray.ext',
  '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00123__20131121__CC_tcarray.ext',
  '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00123__20151021.csv',
  '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00123__20151021__CC_taarray.ext',
  '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00123__20151021__CC_tcarray.ext'],
 'CGINS-OPTAAD-00129': ['/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/OPTAAD/CGINS-OPTAAD-00129__20131121.csv',
  '/home/andrew/Documents/OOI-CGSN/ooi-integration/as

**=======================================================================================================================**
Find and return the calibration files which contain vendor-supplied calibration information. This is achieved by searching the calibration directories and matching serial numbers to UIDs:
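
A plain substring test on the serial number can over-match (for example, `'12'` is a substring of a file name for serial `123`). The sketch below shows one stricter way to match, assuming the vendor file names carry an `SN_<number>` token as in the calibration paths listed later in this notebook. `matches_serial` is a hypothetical helper and is not part of `utils`:

```python
import re

def matches_serial(filename, serial):
    """Return True if `filename` contains `SN_<serial>` as a complete number."""
    # Allow leading zeros and reject a match that is followed by another digit
    pattern = rf'SN_0*{re.escape(str(serial))}(?!\d)'
    return re.search(pattern, filename) is not None
```

For example, `matches_serial('OPTAA-D_AC-S_SN_123_Calibration_Files.zip', '123')` is true, while the same file name tested against serial `'12'` is rejected.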

In [244]:
serial_nums = get_serial_nums(OPTAA, uids)

In [245]:
cal_dict = get_calibration_files(serial_nums, cal_directory)

In [246]:
cal_paths = {}
for uid in sorted(cal_dict.keys()):
    paths = []
    for file in cal_dict.get(uid):
        path = generate_file_path(cal_directory, file, ext=['.zip','.cap', '.txt', '.log'])
        paths.append(path)
    cal_paths.update({uid: paths})

In [248]:
cal_paths;

**=======================================================================================================================**
# Parsing Calibration Coefficients
Above, we have worked through identifying and mapping the QCT files, the asset management csvs, and the vendor calibration files to the individual instruments through their UIDs and serial numbers. The next step is to open the relevant files and parse out the calibration coefficients. This requires a parser for the OPTAA vendor calibration (**.dev**) files.

Start by opening the calibration files and reading the data:

In [289]:
class OPTAACalibration():
    # Class that stores calibration values for OPTAAs.

    def __init__(self, uid):
        self.serial = None
        self.nbins = None
        self.uid = uid
        self.sigfig = 6
        self.date = []
        self.coefficients = {
            'CC_acwo': [],
            'CC_awlngth': [],
            'CC_ccwo': [],
            'CC_cwlngth': [],
            'CC_taarray': 'SheetRef:CC_taarray',
            'CC_tbins': [],
            'CC_tcal': [],
            'CC_tcarray': 'SheetRef:CC_tcarray'
        }
        self.tcarray = []
        self.taarray = []
        self.notes = {
            'CC_acwo': '',
            'CC_awlngth': '',
            'CC_ccwo': '',
            'CC_cwlngth': '',
            'CC_taarray': '',
            'CC_tbins': '',
            'CC_tcal': '',
            'CC_tcarray': ''
        }

    @property
    def uid(self):
        return self._uid

    @uid.setter
    def uid(self, d):
        r = re.compile('.{5}-.{6}-.{5}')
        if r.match(d) is not None:
            self._uid = d
            serial = d.split('-')[-1].lstrip('0')
            self.serial = 'ACS-' + serial
        else:
            raise Exception(f"The instrument uid {d} is not a valid uid. Please check.")

            
    def load_cal(self, filepath):
        """
        Wrapper function to load all of the calibration coefficients
        
        Args:
            filepath - path to the directory with filename which has the
                calibration coefficients to be parsed and loaded
        Calls:
            open_dev
            parse_dev
        """
        
        data = self.open_dev(filepath)
        
        self.parse_dev(data)
    
    
    def open_dev(self, filepath):
        """
        Function that opens and reads in cal file
        information for an OPTAA. Zipfiles are acceptable inputs.
        """
        
        if filepath.endswith('.zip'):
            with ZipFile(filepath) as zfile:
                # Check if OPTAA has the .dev file
                filename = [name for name in zfile.namelist() if name.lower().endswith('.dev')]
                
                # Get and open the latest calibration file
                if len(filename) == 1:
                    data = zfile.read(filename[0]).decode('ascii')
                    self.source_file(filepath, filename[0])
                    
                elif len(filename) > 1:
                    raise FileExistsError(f"Multiple .dev files found in {filepath}.")

                else:
                    raise FileNotFoundError(f"No .dev file found in {filepath}.")
                        
        elif filepath.lower().endswith('.dev'):
            with open(filepath) as file:
                data = file.read()
            # Pass the file name (not the closed file object) to source_file
            self.source_file(filepath, filepath.split('/')[-1])
        else:
            raise FileNotFoundError(f"No .dev file found in {filepath}.")
        
        return data


    def source_file(self, filepath, filename):
        """
        Routine which parses out the source file and filename
        where the calibration coefficients are sourced from.
        """
        
        if filepath.lower().endswith('.dev'):
            dcn = filepath.split('/')[-2]
            filename = filepath.split('/')[-1]
        else:
            dcn = filepath.split('/')[-1]
        
        self.source = f'Source file: {dcn} > {filename}'
        

    def parse_dev(self, data):
        """
        Function to parse the .dev file in order to load the
        calibration coefficients for the OPTAA.
        
        Args:
            data - opened .dev file in ascii-format
        """
        
        for line in data.splitlines():
            # Split the data based on data -> header split
            parts = line.split(';')
            # Lines that do not split into exactly two parts are not coefficient rows
            if len(parts) != 2:
                # Find the calibration temperature and date
                if 'tcal' in line:
                    line = ''.join((x for x in line if x not in [y for y in string.punctuation if y != '/']))
                    parts = line.split()
                    # Calibration temperature
                    tcal = float(parts[1])
                    self.coefficients['CC_tcal'] = tcal
                    # Calibration date
                    date = parts[-1].strip(string.punctuation)
                    self.date = pd.to_datetime(date).strftime('%Y%m%d')
        
            else:
                info, comment = parts
                
                if comment.strip().startswith('temperature bins'):
                    tbins = [float(x) for x in info.split()]
                    self.coefficients['CC_tbins'] = tbins
                    
                elif comment.strip().startswith('number'):
                    self.nbins = int(info)
                    
                elif comment.strip().startswith('C'):
                    if self.nbins is None:
                        raise AttributeError(f'Failed to load number of temperature bins.')
                        
                    # Parse out the different calibration coefficients
                    parts = info.split()
                    cwlngth = float(parts[0][1:])
                    awlngth = float(parts[1][1:])
                    ccwo = float(parts[3])
                    acwo = float(parts[4])
                    tcrow = [float(x) for x in parts[5:self.nbins+5]]
                    acrow = [float(x) for x in parts[self.nbins+5:2*self.nbins+5]]
                
                    # Now put the coefficients into the coefficients dictionary
                    self.coefficients['CC_acwo'].append(acwo)
                    self.coefficients['CC_awlngth'].append(awlngth)
                    self.coefficients['CC_ccwo'].append(ccwo)
                    self.coefficients['CC_cwlngth'].append(cwlngth)
                    self.tcarray.append(tcrow)
                    self.taarray.append(acrow)
    
                        
    def write_csv(self, outpath):
        """
        This function writes the correctly named csv file for the OPTAA to the
        specified directory.

        Args:
            outpath - directory path of where to write the csv file
        Raises:
            ValueError - raised if the OPTAA object's coefficient dictionary
                has not been populated
        Returns:
            self.to_csv - a csv of the calibration coefficients which is
                written to the specified directory from the outpath.
        """

        # Run a check that the coefficients have actually been loaded. The
        # dictionary always holds all of its keys, so test for parsed values.
        if not self.coefficients['CC_acwo']:
            raise ValueError('No calibration coefficients have been loaded.')

        # Create a dataframe to write to the csv
        data = {
            'serial': [self.serial]*len(self.coefficients),
            'name': list(self.coefficients.keys()),
            'value': list(self.coefficients.values())
        }
        df = pd.DataFrame().from_dict(data)
      
        # Now merge the coefficients dataframe with the notes
        notes = pd.DataFrame().from_dict({
            'name':list(self.notes.keys()),
            'notes':list(self.notes.values())
        })
        df = df.merge(notes, how='outer', left_on='name', right_on='name')
            
        # Add in the source file
        df.loc[df.index[0], 'notes'] = df['notes'].iloc[0] + ' ' + self.source
        
        # Sort the data by the coefficient name
        df = df.sort_values(by='name')

        # Generate the csv names
        csv_name = self.uid + '__' + self.date + '.csv'
        tca_name = self.uid + '__' + self.date + '__' + 'CC_tcarray.ext'
        taa_name = self.uid + '__' + self.date + '__' + 'CC_taarray.ext'
        
        def write_array(filename, cal_array):
            with open(filename, 'w') as out:
                array_writer = csv.writer(out)
                array_writer.writerows(cal_array)

        # Write the dataframe to a csv file
        check = input(f"Write {csv_name} to {outpath}? [y/n]: ")
        # check = 'y'
        if check.lower().strip() == 'y':
            df.to_csv(outpath+'/'+csv_name, index=False)
            write_array(outpath+'/'+tca_name, self.tcarray)
            write_array(outpath+'/'+taa_name, self.taarray)

In [210]:
optaa = OPTAACalibration(uid=uid)

In [211]:
optaa.load_cal(cal_paths[uid][0])

In [212]:
optaa.date

'20121128'

In [213]:
optaa.coefficients

{'CC_acwo': [-0.465231,
  -0.274376,
  -0.131399,
  -0.024706,
  0.058295,
  0.126055,
  0.185656,
  0.240417,
  0.289584,
  0.33791,
  0.381069,
  0.423962,
  0.465806,
  0.505779,
  0.544813,
  0.582403,
  0.618499,
  0.653872,
  0.687644,
  0.720226,
  0.751347,
  0.78086,
  0.808414,
  0.834566,
  0.860689,
  0.886948,
  0.913385,
  0.938929,
  0.963533,
  0.987193,
  1.009682,
  1.031291,
  1.052552,
  1.07342,
  1.093032,
  1.110473,
  1.124673,
  1.134672,
  1.140164,
  1.140155,
  1.134724,
  1.12522,
  1.111056,
  1.10325,
  1.102718,
  1.109283,
  1.119172,
  1.130082,
  1.140827,
  1.151058,
  1.160595,
  1.168841,
  1.174887,
  1.177852,
  1.178076,
  1.176677,
  1.17593,
  1.177861,
  1.181059,
  1.18391,
  1.183523,
  1.178057,
  1.165895,
  1.145849,
  1.11669,
  1.076874,
  1.025265,
  0.96053,
  0.880828,
  0.784785,
  0.669853,
  0.535218,
  0.382249,
  0.215282,
  0.04013,
  -0.13566,
  -0.30612,
  -0.460697,
  -0.588048,
  -0.685993,
  -0.75626,
  -0.803541,
  -0.83

In [214]:
optaa.serial

'ACS-123'

In [215]:
temp_directory = '/'.join((os.getcwd(),'temp'))

In [216]:
if os.path.isdir(temp_directory):
    shutil.rmtree(temp_directory)
    ensure_dir(temp_directory)
else:
    ensure_dir(temp_directory)

In [217]:
optaa.write_csv(temp_directory)

Write CGINS-OPTAAD-00123__20121128.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp? [y/n]: y


In [218]:
os.listdir(temp_directory)

['CGINS-OPTAAD-00123__20121128__CC_taarray.ext',
 'CGINS-OPTAAD-00123__20121128__CC_tcarray.ext',
 'CGINS-OPTAAD-00123__20121128.csv']

In [219]:
df_optaa = pd.read_csv(temp_directory+'/'+'CGINS-OPTAAD-00123__20121128.csv')
with open(temp_directory+'/'+'CGINS-OPTAAD-00123__20121128__CC_tcarray.ext') as file:
    tcarray = file.read()
with open(temp_directory+'/'+'CGINS-OPTAAD-00123__20121128__CC_taarray.ext') as file:
    taarray = file.read()

In [220]:
df_optaa

| | serial | name | value | notes |
|---|---|---|---|---|
| 0 | ACS-123 | CC_acwo | [-0.465231, -0.274376, -0.131399, -0.024706, 0... | Source file: OPTAA-D_AC-S_SN_123_Calibration_... |
| 1 | ACS-123 | CC_awlngth | [401.0, 405.1, 409.3, 413.2, 417.4, 422.2, 427... | |
| 2 | ACS-123 | CC_ccwo | [0.449779, 0.538688, 0.617422, 0.682801, 0.742... | |
| 3 | ACS-123 | CC_cwlngth | [400.1, 404.3, 408.4, 412.1, 416.2, 421.0, 425... | |
| 4 | ACS-123 | CC_taarray | SheetRef:CC_taarray | |
| 5 | ACS-123 | CC_tbins | [3.684881, 4.425816, 5.55977, 6.46719, 7.47476... | |
| 6 | ACS-123 | CC_tcal | 19.0 | |
| 7 | ACS-123 | CC_tcarray | SheetRef:CC_tcarray | |


In [221]:
df_optaa['notes'].iloc[0]

' Source file: OPTAA-D_AC-S_SN_123_Calibration_Files.zip > acs123.dev'

In [208]:
optaa.coefficients['CC_tcal'][0]

19.0

**=======================================================================================================================**
# Source Loading of Calibration Coefficients
With an OPTAA Calibration object created, we can now begin parsing the different calibration sources for each OPTAA. We will then compare all of the calibration values from each of the sources, checking for any discrepancies between them.

Below, I plan on going through each of the OPTAA UIDs and parsing the data into csvs. For sources which contain multiple calibrations, I plan on extracting each of the calibrations to a temporary folder using the following structure:

    <local working directory>/<temp>/<source>/data/<calibration file>
    
The separate calibrations will be saved using the standard UFrame naming convention with the following directory structure:

    <local working directory>/<temp>/<source>/<calibration csv>
    
The csvs themselves will also be copied to the temporary folder. This allows the program to look in the same temp directory for every OPTAA check.
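
The layout above can be sketched with `pathlib`. This is an illustration only: `make_source_dirs` is a hypothetical helper, whereas the cells that follow build the same folders with string joins and `ensure_dir` from `utils`:

```python
from pathlib import Path

def make_source_dirs(temp_root, source):
    """Create <temp_root>/<source>/data and return (source_dir, data_dir)."""
    source_dir = Path(temp_root) / source   # holds the UFrame-style csvs
    data_dir = source_dir / 'data'          # holds the extracted source files
    data_dir.mkdir(parents=True, exist_ok=True)
    return source_dir, data_dir
```

Calling it once per source (for example `'csv'` or `'cal'`) keeps every calibration source under the same temp directory.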

In [249]:
uid = uids[0]

In [250]:
temp_directory = '/'.join((os.getcwd(),'temp'))
# Check if the temp directory exists; if it already does, purge it, then (re)create it
if os.path.exists(temp_directory):
    shutil.rmtree(temp_directory)
ensure_dir(temp_directory)

Copy the existing csvs from asset management to the temp directory:

In [251]:
for path in csv_paths[uid]:
    savedir = '/'.join((temp_directory,'csv'))
    ensure_dir(savedir)
    savepath = '/'.join((savedir, path.split('/')[-1]))
    shutil.copyfile(path, savepath)

In [253]:
os.listdir(temp_directory+'/csv')

['CGINS-OPTAAD-00123__20151021__CC_taarray.ext',
 'CGINS-OPTAAD-00123__20151021.csv',
 'CGINS-OPTAAD-00123__20131121.csv',
 'CGINS-OPTAAD-00123__20151021__CC_tcarray.ext',
 'CGINS-OPTAAD-00123__20131121__CC_tcarray.ext',
 'CGINS-OPTAAD-00123__20131121__CC_taarray.ext']

**=======================================================================================================================**
Load the calibration coefficients from the vendor calibration source files. Start by extracting or copying them to the source data folder in the temporary directory.

In [254]:
cal_paths[uid]

['/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Cal/OPTAA-D_AC-S_SN_123_Calibration_Files.zip',
 '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Cal/OPTAA-D_AC-S_SN_123_Calibration_Files_2013-07-17.zip',
 '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Cal/OPTAA-D_AC-S_SN_123_Calibration_Files_2015-07-20.zip',
 '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/OPTAA/OPTAA_Cal/OPTAA-D_AC-S_SN_123_Calibration_Files_2016-09-29.zip']

Extract the calibration zip files to the local temp directory:

In [255]:
for path in cal_paths[uid]:
    with ZipFile(path) as zfile:
        files = [name for name in zfile.namelist() if name.lower().endswith('.dev')]
        for file in files:
            # str.strip('.zip') removes characters, not the suffix; use splitext
            exdir = os.path.splitext(path.split('/')[-1])[0]
            expath = '/'.join((temp_directory,'cal','data',exdir))
            ensure_dir(expath)
            zfile.extract(file,path=expath)

In [260]:
os.listdir(temp_directory+'/cal/data/OPTAA-D_AC-S_SN_123_Calibration_Files_2016-09-29')

['acs123.dev']

Write the vendor calibration files to csvs following the UFrame convention:

In [290]:
savedir = '/'.join((temp_directory,'cal'))
ensure_dir(savedir)
# Now parse the calibration coefficients
for dirpath, dirnames, filenames in os.walk('/'.join((temp_directory,'cal','data'))):
    for file in filenames:
        filepath = os.path.join(dirpath, file)
        # With the filepath for the given calibration retrieved, I can now start an instance of the OPTAA Calibration
        # object and begin parsing the coefficients
        optaa = OPTAACalibration(uid)
        optaa.load_cal(filepath)
        optaa.write_csv(savedir)

Write CGINS-OPTAAD-00123__20160929.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp/cal? [y/n]: y
Write CGINS-OPTAAD-00123__20150701.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp/cal? [y/n]: y
Write CGINS-OPTAAD-00123__20121128.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp/cal? [y/n]: y
Write CGINS-OPTAAD-00123__20130716.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp/cal? [y/n]: y


**=======================================================================================================================**
# Calibration Coefficient Comparison
We have now successfully parsed the calibration files from all the possible sources: the vendor calibration files, the pre-deployment files, and the post-deployment files. Furthermore, we have saved csvs in the UFrame format for all of these calibrations. Now, we want to load those csvs into pandas dataframes, which allow for easy element-by-element comparison of calibration coefficients.
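
As a rough sketch of what an element-by-element check could look like, the function below compares two coefficient lists and reports the positions that disagree. It is written in plain Python rather than pandas to stay self-contained. The relative tolerance of `10**-sigfig` is an assumption (the default of 6 mirrors `sigfig = 6` in `OPTAACalibration`), and `find_mismatches` is a hypothetical name:

```python
import math

def find_mismatches(reference, candidate, sigfig=6):
    """Return the indices where two coefficient lists disagree."""
    if len(reference) != len(candidate):
        raise ValueError(
            f'Length mismatch: {len(reference)} vs {len(candidate)} values'
        )
    # Assumed tolerance: values must agree to roughly `sigfig` significant figures
    rel_tol = 10.0 ** (-sigfig)
    return [
        i for i, (ref, cand) in enumerate(zip(reference, candidate))
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=0.0)
    ]

# First three CC_acwo values from the vendor file parsed above
vendor = [-0.465231, -0.274376, -0.131399]
# Made-up csv values with two digits transposed in the second entry
csv_values = [-0.465231, -0.274367, -0.131399]
print(find_mismatches(vendor, csv_values))
```

An empty list means the two sources agree within the tolerance; any returned index points at a coefficient to inspect by hand.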