# NUTNRB Checker

This notebook is designed to check the NUTNR csv calibration file in a pull request. The process I follow is:
1. Read in the NUTNR csv from the pull request into a pandas dataframe
2. Identify the source file of the calibration coefficients
3. Parse the calibration coefficients directly from the source file
4. Compare the NUTNR csv from the pull request with the csv parsed from the source file

**====================================================================================================================**

The first step is to load relevant packages:

In [2]:
import csv
import re
import os
import shutil
import numpy as np
import pandas as pd

In [3]:
from utils import *

In [4]:
from zipfile import ZipFile
import string

**====================================================================================================================**
Define the directories where the **csv** file to check is stored, and where the **source** file is stored. Make sure to check the following information on your machine via your terminal first:
1. The branch of your local asset-management repository matches the location of the NUTNR cals.
2. Your local asset-management repository has the requisite **csv** file to check
3. You have downloaded the **source** of the csv file
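Part of the checklist above can be automated. The function below is a hypothetical helper (`check_setup` is not part of `utils`). It confirms that the csv to check and its source file exist, and that the csv name follows the `UID__YYYYMMDD.csv` convention used later in this notebook. It cannot verify which git branch is checked out, so that step still has to be done in the terminal.

```python
import os


def check_setup(csv_path, source_path):
    """Return a list of problems found with the csv to check and its source file.

    Hypothetical helper: it only checks that both files exist and that the csv
    name looks like <UID>__<YYYYMMDD>.csv. It does not inspect the git branch.
    """
    problems = []
    if not os.path.isfile(csv_path):
        problems.append(f"csv file not found: {csv_path}")
    if not os.path.isfile(source_path):
        problems.append(f"source file not found: {source_path}")
    # Calibration csv names look like CGINS-NUTNRM-00760__20190802.csv
    name = os.path.basename(csv_path)
    stem, ext = os.path.splitext(name)
    _, sep, date = stem.partition('__')
    if ext != '.csv' or not sep or not (len(date) == 8 and date.isdigit()):
        problems.append(f"unexpected csv name: {name}")
    return problems
```

An empty list means both files were found and the csv name parsed cleanly.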

In [5]:
doc_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/NUTNR/NUTNR_Results/'
cal_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/NUTNR/NUTNR_Cal/'
asset_management_directory = '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/NUTNRB/'
glider_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Platform_Records/Gliders/Instruments/NUTNR-M/'

In [6]:
os.listdir(glider_directory)

['NUTNR-M_SUNA_SN_657_Calibration_Files_2015-07-13.zip',
 'NUTNR-M_SUNA_SN_658_Calibration_Files_2015-07-29.zip',
 'NUTNR-M_SUNA_SN_659_Calibration_Files_2015-07-29.zip',
 'NUTNR-M_SUNA_SN_660_Calibration_Files_2015-07-29.zip',
 'NUTNR-M_SUNA_SN_661_Calibration_Files_2015-07-29.zip',
 'NUTNR-M_SUNA_SN_662_Calibration_Files_2015-07-23.zip',
 'NUTNR-M_SUNA_SN_708_Calibration_Files_2015-09-24.zip',
 'NUTNR-M_SUNA_SN_709_Calibration_Files_2015-09-22.zip',
 'NUTNR-M_SUNA_SN_710_Calibration_Files_2015-09-28.zip',
 'NUTNR-M_SUNA_SN_711_Calibration_Files_2015-09-23.zip',
 'NUTNR-M_SUNA_SN_727_Calibration_Files_2015-10-14.zip',
 'NUTNR-M_SUNA_SN_729_Calibration_Files_2015-10-13.zip',
 'NUTNR-M_SUNA_SN_730_Calibration_Files_2015-10-08.zip',
 '_V']

In [7]:
excel_spreadsheet = '/media/andrew/OS/Users/areed/Documents/Project_Files/Documentation/System/System Notebook/WHOI_Asset_Tracking.xlsx'
sheet_name = 'Sensors'

In [8]:
NUTNR = whoi_asset_tracking(spreadsheet=excel_spreadsheet,sheet_name=sheet_name,instrument_class='NUTNR',series='B')
NUTNR

Unnamed: 0,Instrument Class,Series,Supplier Serial Number,WHOI #,OOI #,UID,Model,CGSN PN,Firmware Version,Supplier,...,QCT Testing,PreDeployment,Post Deployment,Refurbishment/ Repair,DO Number,Date Received,Deployment History,Current Deployment,Instrument Location on Current Deployment,Notes
876,NUTNR,B,239,115084,A00065,CGINS-NUTNRB-00239,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00004\n3305-00108-00048\n3305-00108...,3305-00308-00001,3305-00508-00040,3305-00900-00075\n3305-00900-00144\n3305-00900...,WH-SC12-5-NUTNR-1001,2012-11-13 00:00:00,GI01SUMO-00001\nCP04OSSM-00006,,NSIF,Reading High nitrate levels during calibration...
877,NUTNR,B,240,115085,A00066,CGINS-NUTNRB-00240,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00003\n3305-00108-00029\n3305-00108...,,,3305-00900-00008\n3305-00900-00231\n3305-00900...,WH-SC12-5-NUTNR-1001,2012-11-13 00:00:00,CP01CNSM-00001\nGS01SUMO-00002,,NSIF,"09/2017: Clock issue - resets back to Jan 1, 2..."
878,NUTNR,B,260,115671,A00383,CGINS-NUTNRB-00260,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00010\n3305-00108-00056,,3305-00508-00010,3305-00900-00109\n3305-00900-00317,WH-SC12-05-NUTNR-1004,2013-08-12 00:00:00,GI Spare\nCP1 spare\nGS01SUMO-00001\nCP04OSSM-...,,NSIF,Sent to vedor as part of trade in for new (SUN...
879,NUTNR,B,261,115672,A00384,CGINS-NUTNRB-00261,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00011\n3305-00108-00021\n3305-00108...,3305-00308-00007\n3305-00308-00031\n3305-00308...,3305-00508-00022\n3305-00508-00041,3305-00900-00071\n3305-00900-00173\n3305-00900...,WH-SC12-05-NUTNR-1004,2013-08-12 00:00:00,CP03ISSM-00002\nCP01CNSM-00005\nCP01CNSM-00007,,NSIF,Returned to vendor 6/24/14 (RMA#2014-125)\nSen...
880,NUTNR,B,262,115673,A00385,CGINS-NUTNRB-00262,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00012\n3305-00108-00064,3305-00308-00002,,3305-00900-00064\n3305-00900-00153\n3305-00900...,WH-SC12-05-NUTNR-1004,2013-08-12 00:00:00,GI01SUMO-00001,,,Sent to vedor as part of trade in for new (SUN...
881,NUTNR,B,266,116564,A00880,CGINS-NUTNRB-00266,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00015\n3305-00108-00040\n3305-00108...,3305-00308-00004\n3305-00308-00032,3305-00508-00024,3305-00900-00036\n3305-00900-00173\n3305-00900...,WH-SC12-05-NUTNR-1006,2014-06-26 00:00:00,CP04OSSM-00001\nCP04OSSM-00004\nCP01CNSM-00008,,NSIF,Sent to vedor as part of trade in for new (SUN...
882,NUTNR,B,267,116562,A00878,CGINS-NUTNRB-00267,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00018\n3305-00108-00053,,3305-00508-00009,3305-00900-00109\n3305-00900-00363,WH-SC12-05-NUTNR-1006,2014-06-26 00:00:00,GS01SUMO-00001\nCP01CNSM-00006,,NSIF,Sent to vedor as part of trade in for new (SUN...
883,NUTNR,B,268,116563,A00879,CGINS-NUTNRB-00268,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00017\n3305-00108-00046\n3305-00108...,3305-00308-00003\n3305-00308-00034,3305-00508-00023,3305-00900-00071\n3305-00900-00173\n3305-00900...,WH-SC12-05-NUTNR-1006,2014-06-26 00:00:00,CP03ISSM-00001\nCP03ISSM-00004\nCP03ISSM-00006,,NSIF,4/2016: Scheduled to sample every half hour. S...
884,NUTNR,B,269,116565,A00881,CGINS-NUTNRB-00269,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00016\n3305-00108-00047\n3305-00114...,3305-00308-00005\n3305-00308-00009,3305-00508-00001,3305-00900-00071\n3305-00900-00317,WH-SC12-05-NUTNR-1006,2014-06-26 00:00:00,CP01CNSM-00002\nCP01CNSM-00003\nCP03ISSM-00005,,NSIF,Sent to vedor as part of trade in for new (SUN...
885,NUTNR,B,270,116899,A01142,CGINS-NUTNRB-00270,ISUS,1336-00014-00002,3.2.4,Satlantic,...,3305-00108-00020\n3305-00108-00050,3305-00308-00006\n3305-00308-00008,3305-00508-00038,3305-00900-00071\n3305-00900-00317,WH-SC12-05-NUTNR-1006,2014-09-24 00:00:00,CP3a Spare\nCP04OSSM-00002\nGI01SUMO-00003,,NSIF,Sent to vedor as part of trade in for new (SUN...
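The implementation of `whoi_asset_tracking` lives in `utils` and is not shown here. As a rough sketch of the filtering step only, assuming the sheet has already been loaded (for example with `pd.read_excel(excel_spreadsheet, sheet_name=sheet_name)`) and uses the `Instrument Class` and `Series` columns shown in the output above, the selection could look like this. `filter_asset_tracking` is a hypothetical name, and the real helper may do additional cleanup.

```python
import pandas as pd


def filter_asset_tracking(sheet, instrument_class, series):
    """Return the rows of an asset-tracking sheet for one instrument class and series.

    Hypothetical stand-in for utils.whoi_asset_tracking; it assumes the sheet is
    already a DataFrame with 'Instrument Class' and 'Series' columns.
    """
    mask = (sheet['Instrument Class'] == instrument_class) & (sheet['Series'] == series)
    return sheet[mask]
```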


**====================================================================================================================**
### Find & Parse the source file
Now, we want to find the source file of the calibration coefficients, parse the data using the NUTNR parser, and read the data into a pandas dataframe. The key pieces of data needed for the parser are:
1. Instrument UID: This is needed to initialize the NUTNR parser
2. Source file: This is the full path to the source file. Zip files are acceptable input.

If the predeployment file is not listed in asset tracking, we need to search through all of the predeployment files for possible candidates:

In [None]:
# Keep revision-A documents whose document control numbers contain 308 or 327 (predeployment files)
files = [file for file in os.listdir(doc_directory) if 'A' in file]
pre_files = [file for file in files if '308' in file or '327' in file]

In [None]:
# Map each serial number to the predeployment zip file(s) that contain its .cal file
predeployment = {}
for file in pre_files:
    path = generate_file_path(doc_directory, file, ext=['.zip'])
    with ZipFile(path) as zfile:
        cal_files = [name for name in zfile.namelist() if name.lower().endswith('.cal')]
        if not cal_files:
            # Skip zips without a .cal file instead of filing them under a stale serial number
            continue
        data = zfile.read(cal_files[0]).decode('ascii')
    # The first header line of the .cal file holds the instrument type and serial number
    _, items, *ignore = data.splitlines()[0].split(',')
    inst, sn, *ignore = items.split()
    sn = sn.lstrip('0')
    if inst == 'SUNA':
        sn = 'NTR-' + sn
    predeployment.setdefault(sn, []).append(file)
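The indexing cell above can be sketched as a self-contained function that takes zip files directly, which makes it easy to test without the document directory. It assumes, as the cell above does, that the first line of a `.cal` file looks like `H,<instrument> <serial>,...`. The names `serial_from_cal` and `index_predeployment` are hypothetical.

```python
import io
from zipfile import ZipFile


def serial_from_cal(data):
    """Build the lookup key from the first header line of a NUTNR .cal file.

    Assumes a first line such as 'H,SUNA 0760,...'; SUNA serials get an 'NTR-' prefix.
    """
    _, items, *_ = data.splitlines()[0].split(',')
    inst, sn, *_ = items.split()
    sn = sn.lstrip('0')
    return 'NTR-' + sn if inst == 'SUNA' else sn


def index_predeployment(zip_sources):
    """Map serial number -> list of zip names, skipping zips with no .cal file.

    zip_sources is an iterable of (name, path-or-file-like) pairs.
    """
    index = {}
    for name, source in zip_sources:
        with ZipFile(source) as zfile:
            cal_files = [f for f in zfile.namelist() if f.lower().endswith('.cal')]
            if not cal_files:
                continue
            data = zfile.read(cal_files[0]).decode('ascii')
        index.setdefault(serial_from_cal(data), []).append(name)
    return index
```

Because SUNA keys carry the `NTR-` prefix, a lookup for a SUNA such as SN 708 needs the key `'NTR-708'` rather than `'708'`.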

In [None]:
predeployment

In [None]:
sn = '708'
# SUNA serial numbers are keyed with an 'NTR-' prefix in the predeployment index
file = predeployment.get('NTR-' + sn)
file

In [None]:
pre_path = generate_file_path(doc_directory, file[0], ['.zip'])
pre_path

Initialize the parser:

In [9]:
os.listdir(cal_directory)

['NUTNR-B_ISUS_SN_239_Calibration_Files _2012-11-30.zip',
 'NUTNR-B_ISUS_SN_239_Calibration_Files_2012-10-08.zip',
 'NUTNR-B_ISUS_SN_239_Calibration_Files_2016-02-29.zip',
 'NUTNR-B_ISUS_SN_239_Calibration_Files_2016-09-30.zip',
 'NUTNR-B_ISUS_SN_239_Verification_Checklist_2012-10-08.pdf',
 'NUTNR-B_ISUS_SN_239_Verification_Checklist_2012-11-30.pdf',
 'NUTNR-B_ISUS_SN_239_Verification_Checklist_2016-02-29.pdf',
 'NUTNR-B_ISUS_SN_239_Verification_Checklist_2016-09-30.pdf',
 'NUTNR-B_ISUS_SN_240_Calibration_Files _2012-11-30.zip',
 'NUTNR-B_ISUS_SN_240_Calibration_Files_2012-10-08.zip',
 'NUTNR-B_ISUS_SN_244_Calibration_Files_2015-07-16.zip',
 'NUTNR-B_ISUS_SN_244_Calibration_Files_2016-07-12.zip',
 'NUTNR-B_ISUS_SN_244_Calibration_Files_2017-08-02.zip',
 'NUTNR-B_ISUS_SN_244_Refurb_report_2015-07-16.pdf',
 'NUTNR-B_ISUS_SN_244_Verification_Checklist_2012-11-12.pdf',
 'NUTNR-B_ISUS_SN_244_Verification_Sheets_2015-07-16.pdf',
 'NUTNR-B_ISUS_SN_253_Calibration_Files_2012-12-27.zip',
 'NUTN

In [10]:
for file in os.listdir(doc_directory):
    if '00327-00073' in file:
        print(file)

3305-00327-00073-A.zip


In [53]:
sn = '760'
filepath = os.path.join(doc_directory, '3305-00327-00075-A.zip')

In [54]:
nutnr = NUTNRCalibration('CGINS-NUTNRM-'+sn.zfill(5))

Read in the calibration coefficients:

In [55]:
nutnr.load_cal(filepath)
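`load_cal` fills the coefficient arrays from the `E` lines of the `.cal` file. A minimal standalone sketch of that step, assuming the six-field layout `E, wavelength, ENO3, ESWA, <unused>, DI` that `NUTNRCalibration.parse_cal` unpacks, is shown below. `parse_e_lines` is a hypothetical name, and the sample lines in the test are synthetic (built from the first values printed in this notebook, with a placeholder `0` in the unused field).

```python
def parse_e_lines(data):
    """Collect wavelength, ENO3, ESWA and DI columns from the 'E' lines of a .cal file.

    Assumes each E line has six comma-separated fields ordered
    E, wavelength, ENO3, ESWA, <unused>, DI, as in NUTNRCalibration.parse_cal.
    """
    coeffs = {'CC_wl': [], 'CC_eno3': [], 'CC_eswa': [], 'CC_di': []}
    for line in data.splitlines():
        if not line.startswith('E'):
            continue  # header ('H') lines are handled separately
        _, wl, eno3, eswa, _, di = line.split(',')
        coeffs['CC_wl'].append(float(wl))
        coeffs['CC_eno3'].append(float(eno3))
        coeffs['CC_eswa'].append(float(eswa))
        coeffs['CC_di'].append(float(di))
    return coeffs
```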

In [56]:
nutnr.coefficients

{'CC_cal_temp': '20.06',
 'CC_di': [12.125,
  17.54166667,
  27.16666667,
  35.125,
  35.29166667,
  38.5,
  40.41666667,
  37.875,
  41.5,
  52.125,
  105.16666667,
  328.625,
  1023.45833333,
  2693.45833333,
  5709.95833333,
  9817.16666667,
  13997.95833333,
  17334.04166667,
  19562.08333333,
  20945.5,
  21823.45833333,
  22521.33333333,
  23250.375,
  24109.45833333,
  25204.125,
  26572.58333333,
  28244.20833333,
  30246.58333333,
  32522.75,
  35032.5,
  37705.25,
  40332.70833333,
  42710.75,
  44686.79166667,
  46031.875,
  46537.58333333,
  46357.04166667,
  45388.91666667,
  43829.66666667,
  41946.75,
  39946.66666667,
  38029.33333333,
  36314.79166667,
  34877.66666667,
  33760.0,
  33048.75,
  32646.83333333,
  32628.41666667,
  32923.04166667,
  33563.875,
  34524.20833333,
  35811.0,
  37325.83333333,
  39104.0,
  41120.33333333,
  43251.29166667,
  45488.0,
  47666.08333333,
  49763.875,
  51639.66666667,
  53151.79166667,
  54160.04166667,
  54653.66666667,
  5456

Write the csv to a temporary local folder:

In [57]:
temp_directory = os.path.join(os.getcwd(), 'temp')
# If the temp directory already exists, purge it so no stale csv files remain, then recreate it
if os.path.exists(temp_directory):
    shutil.rmtree(temp_directory)
ensure_dir(temp_directory)

In [59]:
nutnr.write_csv(temp_directory)

Write CGINS-NUTNRM-00760__20190802.csv to /home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/Metadata_Review/temp? [y/n]: y
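For reference, the csv written here has one row per coefficient with `serial`, `name`, `value` and `notes` columns, sorted by name, and is named `<uid>__<YYYYMMDD>.csv`. The sketch below reproduces that layout non-interactively (no `[y/n]` prompt), which is convenient for writing into a `tempfile.TemporaryDirectory` during tests. `write_cal_csv` is a hypothetical helper, not the class's own method.

```python
import os
import tempfile

import pandas as pd


def write_cal_csv(outdir, uid, serial, coefficients, notes, cal_date):
    """Write a long-format calibration csv named <uid>__<cal_date>.csv and return its path.

    Hypothetical, non-interactive sketch of the layout NUTNRCalibration.write_csv produces.
    """
    df = pd.DataFrame({
        'serial': serial,  # scalar is broadcast to every row
        'name': list(coefficients.keys()),
        'value': list(coefficients.values()),
        'notes': [notes.get(name, '') for name in coefficients],
    }).sort_values(by='name')
    path = os.path.join(outdir, f'{uid}__{cal_date}.csv')
    df.to_csv(path, index=False)
    return path
```

Array coefficients are written as their Python list representation, which is why the csv has to be parsed back into floats before comparing.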


In [60]:
(nutnr.uid, nutnr.serial, nutnr.date)

('CGINS-NUTNRM-00760', 'NTR-0760', ['20190802', '20151126'])
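The `.cal` file can carry more than one file creation time, so `nutnr.date` holds several dates and the csv name uses `max(nutnr.date)`. That works because the dates are stored as zero-padded `YYYYMMDD` strings, whose lexicographic order matches chronological order. A small sketch of that step (`latest_cal_date` is a hypothetical name):

```python
import pandas as pd


def latest_cal_date(creation_times):
    """Convert creation-time stamps to YYYYMMDD strings and return the latest one."""
    dates = [pd.to_datetime(t).strftime('%Y%m%d') for t in creation_times]
    # Zero-padded YYYYMMDD strings sort chronologically, so max() picks the latest date.
    # This would NOT hold for formats such as MM/DD/YYYY.
    return max(dates)
```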

In [61]:
nutnr.source

'Source file: 3305-00327-00075-A.zip > 3305-00327-00075-A/SNA0760B.CAL'

**====================================================================================================================**
### Check the data
Now that we have generated a local csv file from the source data, we can reload it into Python as a pandas dataframe, which allows a direct comparison with the existing csv.

In [62]:
def reformat_arrays(array):
    # Values that pandas already parsed as numbers need no string cleanup
    if not isinstance(array, str):
        return float(array)
    # First, strip extraneous characters (quotes and brackets) from the array
    array = array.replace("'","").replace('[','').replace(']','')
    # Next, split the array into a list
    array = array.split(',')
    # Now, eliminate any white space surrounding the individual coeffs
    array = [num.strip() for num in array]
    # Next, convert the coefficients to floats
    array = [float(num) for num in array]
    # If the array has a single element, return it as a scalar
    if len(array) == 1:
        array = array[0]
    return array
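An alternative to the string surgery in `reformat_arrays` is to let Python's `ast.literal_eval` parse the cell, since the array cells are written as Python list literals. This is a swapped-in technique, not what the notebook uses; `parse_value` is a hypothetical name. It keeps the same behaviour of collapsing a one-element array to a scalar, but unlike `reformat_arrays` it will raise on cells such as `'nan'` that are not valid literals.

```python
import ast


def parse_value(x):
    """Turn a csv 'value' cell into a float or a list of floats.

    Alternative to reformat_arrays based on ast.literal_eval; it also accepts
    cells that pandas has already parsed as numbers.
    """
    if not isinstance(x, str):
        return float(x)
    parsed = ast.literal_eval(x.strip())
    if isinstance(parsed, (list, tuple)):
        values = [float(v) for v in parsed]
        # Mirror reformat_arrays: a one-element array collapses to a scalar
        return values[0] if len(values) == 1 else values
    return float(parsed)
```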

In [64]:
# The csv file name uses the most recent calibration date
dt = max(nutnr.date)

In [65]:
source_csv = pd.read_csv(os.path.join(temp_directory, nutnr.uid + '__' + dt + '.csv'))
source_csv['value'] = source_csv['value'].apply(reformat_arrays)
source_csv

| | serial | name | value | notes |
|---|---|---|---|---|
| 0 | NTR-0760 | CC_cal_temp | 20.06 | Source file: 3305-00327-00075-A.zip > 3305-00... |
| 1 | NTR-0760 | CC_di | [12.125, 17.54166667, 27.16666667, 35.125, 35.... | |
| 2 | NTR-0760 | CC_eno3 | [0.00090013, 0.00346387, -0.00188127, -0.00140... | |
| 3 | NTR-0760 | CC_eswa | [0.00907539, 0.00594526, 0.01134671, 0.0056445... | |
| 4 | NTR-0760 | CC_lower_wavelength_limit_for_spectra_fit | 217 | 217 |
| 5 | NTR-0760 | CC_upper_wavelength_limit_for_spectra_fit | 240 | 240 |
| 6 | NTR-0760 | CC_wl | [189.38, 190.17, 190.96, 191.76, 192.55, 193.3... | |


In [66]:
path = '/home/andrew/Documents/OOI-CGSN/asset-management/calibration/NUTNRM/CGINS-NUTNRM-00760__20190802.csv'
path

'/home/andrew/Documents/OOI-CGSN/asset-management/calibration/NUTNRM/CGINS-NUTNRM-00760__20190802.csv'

In [67]:
am_csv = pd.read_csv(path)
am_csv['value'] = am_csv['value'].apply(reformat_arrays)
am_csv

| | serial | name | value | notes |
|---|---|---|---|---|
| 0 | 760 | CC_cal_temp | 20.06 | source_file 3305-00327-00075-A_SN_760_Pre-Depl... |
| 1 | 760 | CC_di | [12.125, 17.54166667, 27.16666667, 35.125, 35.... | |
| 2 | 760 | CC_eno3 | [0.00090013, 0.00346387, -0.00188127, -0.00140... | |
| 3 | 760 | CC_eswa | [0.00907539, 0.00594526, 0.01134671, 0.0056445... | |
| 4 | 760 | CC_lower_wavelength_limit_for_spectra_fit | 217 | Constant |
| 5 | 760 | CC_upper_wavelength_limit_for_spectra_fit | 240 | Constant |
| 6 | 760 | CC_wl | [189.38, 190.17, 190.96, 191.76, 192.55, 193.3... | |


In [None]:
am_csv['notes'].iloc[0]

In [None]:
source_csv['notes'].iloc[0]

In [68]:
source_csv == am_csv

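| | serial | name | value | notes |
|---|---|---|---|---|
| 0 | False | True | True | False |
| 1 | False | True | True | False |
| 2 | False | True | True | False |
| 3 | False | True | True | False |
| 4 | False | True | True | False |
| 5 | False | True | True | False |
| 6 | False | True | True | False |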
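The element-wise `==` above reports `False` in the `notes` column for every row, but not all of those are real differences. The serials are written differently (`NTR-0760` vs `760`) and the note text differs, yet the rows with empty notes also fail, because pandas reads blank cells as NaN and NaN never compares equal to NaN. A minimal sketch of a NaN-aware comparison is below. It assumes both frames have `name`, `value` and `notes` columns, with `value` already converted by `reformat_arrays`. It matches rows on `name` and compares values with `np.allclose`. `compare_cal_frames` is a hypothetical name, and the serial check is left out because the two files format it differently.

```python
import numpy as np
import pandas as pd


def compare_cal_frames(source, am, rtol=1e-9):
    """Return, per coefficient name, whether the two csvs agree on value and notes.

    Assumes 'name', 'value' and 'notes' columns, with 'value' holding floats or
    lists of floats. Missing notes on both sides count as equal.
    """
    merged = source.merge(am, on='name', how='outer', suffixes=('_source', '_am'))
    rows = []
    for _, row in merged.iterrows():
        a = np.atleast_1d(row['value_source'])
        b = np.atleast_1d(row['value_am'])
        try:
            # Require identical shapes so a scalar cannot broadcast against an array
            same_value = a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol))
        except (TypeError, ValueError):
            same_value = False  # e.g. unparsed strings on one side
        na, nb = row['notes_source'], row['notes_am']
        same_notes = bool((pd.isna(na) and pd.isna(nb)) or na == nb)
        rows.append({'name': row['name'], 'value': same_value, 'notes': same_notes})
    return pd.DataFrame(rows)
```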


In [None]:
result = {}
for k,val in enumerate(am_csv['value'].iloc[1]):
    check = source_csv['value'].iloc[1][k] == val
    if not check:
        result.update({k:val})
result

In [None]:
result = {}
for k,val in enumerate(am_csv['value'].iloc[3]):
    check = source_csv['value'].iloc[3][k] == val
    if not check:
        result.update({k:val})
result
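The two loops above compare coefficients with exact `==`, which flags harmless floating-point round-off. They also raise `IndexError` when the source array is shorter and silently skip extra entries when it is longer. A vectorized sketch with NumPy that reports mismatches beyond a tolerance and checks lengths explicitly (`mismatched_indices` is a hypothetical name, and the default tolerance is an assumption to adjust):

```python
import numpy as np


def mismatched_indices(source_values, am_values, rtol=1e-9, atol=0.0):
    """Return {index: am_value} for coefficients that differ beyond the tolerance."""
    a = np.asarray(source_values, dtype=float)
    b = np.asarray(am_values, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"length mismatch: source {a.shape} vs asset management {b.shape}")
    bad = np.flatnonzero(~np.isclose(a, b, rtol=rtol, atol=atol))
    return {int(k): float(b[k]) for k in bad}
```

For example, `mismatched_indices(source_csv['value'].iloc[1], am_csv['value'].iloc[1])` would check the `CC_di` row.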

In [None]:
source_csv['value'].iloc[0] - am_csv['value'].iloc[0]


In [1]:
import re
import pandas as pd
import numpy as np
from zipfile import ZipFile

class NUTNRCalibration():
    """Class that stores calibration values for NUTNRs (ISUS and SUNA)."""

    def __init__(self, uid):
        self.serial = None
        self.uid = uid
        self.coefficients = {
            'CC_cal_temp':[],
            'CC_di':[],
            'CC_eno3':[],
            'CC_eswa':[],
            'CC_lower_wavelength_limit_for_spectra_fit':'217',
            'CC_upper_wavelength_limit_for_spectra_fit':'240',
            'CC_wl':[]
        }
        self.date = []
        self.notes = {
            'CC_cal_temp':'',
            'CC_di':'',
            'CC_eno3':'',
            'CC_eswa':'',
            'CC_lower_wavelength_limit_for_spectra_fit':'217',
            'CC_upper_wavelength_limit_for_spectra_fit':'240',
            'CC_wl':''
        }

    @property
    def uid(self):
        return self._uid

    @uid.setter
    def uid(self, d):
        r = re.compile('.{5}-.{6}-.{5}')
        if r.match(d) is not None:
            self._uid = d
        else:
            raise Exception(f"The instrument uid {d} is not a valid uid. Please check.")
            
    def load_cal(self, filepath):
        """
        Wrapper function to load all of the calibration coefficients
        
        Args:
            filepath - path to the directory with filename which has the
                calibration coefficients to be parsed and loaded
        Calls:
            open_cal
            parse_cal
        """
        
        data = self.open_cal(filepath)
        
        self.parse_cal(data)
    
    
    def open_cal(self, filepath):
        """
        Function that opens and reads in cal file
        information for a NUTNR. Zipfiles are acceptable inputs.
        """
        
        if filepath.lower().endswith('.zip'):
            with ZipFile(filepath) as zfile:
                # Collect the .cal files in the archive, skipping any whose name contains a 'z'
                filenames = [name for name in zfile.namelist()
                             if name.lower().endswith('.cal') and 'z' not in name.lower()]
                if not filenames:
                    raise FileNotFoundError(f"No .cal file found in {filepath}")

                # Open the latest calibration file (the last name in sorted order)
                filename = max(filenames)
                data = zfile.read(filename).decode('ascii')
                self.source_file(filepath, filename)

        elif filepath.lower().endswith('.cal'):
            filename = filepath.split('/')[-1]
            if 'z' in filename.lower():
                raise ValueError(f"{filename} is excluded by the 'z' filename filter")
            with open(filepath) as file:
                data = file.read()
            self.source_file(filepath, filename)

        else:
            raise ValueError(f"{filepath} is neither a .zip nor a .cal file")

        return data
            
        
    def source_file(self, filepath, filename):
        """
        Routine which parses out the source file and filename
        where the calibration coefficients are sourced from.
        """
        
        if filepath.lower().endswith('.cal'):
            dcn = filepath.split('/')[-2]
            filename = filepath.split('/')[-1]
        else:
            dcn = filepath.split('/')[-1]
        
        self.source = f'Source file: {dcn} > {filename}'
        
    
    def parse_cal(self, data):
        
        for k,line in enumerate(data.splitlines()):
            
            if line.startswith('H'):
                _, info, *ignore = line.split(',')
                
                # The first line of the cal file contains the serial number
                if k == 0:
                    _, sn, *ignore = info.split()
                    if 'SUNA' in info:
                        self.serial = 'NTR-' + sn
                    else:
                        self.serial = sn
                    
                
                # File creation time is when the instrument was calibrated.
                # May be multiple times for different cal coeffs
                if 'file creation time' in info.lower():
                    _, _, _, date, time = info.split()
                    date_time = pd.to_datetime(date + ' ' + time).strftime('%Y%m%d')
                    self.date.append(date_time)
                    
                # The temperature at which it was calibrated
                if 't_cal_swa' in info.lower() or 't_cal' in info.lower():
                    _, cal_temp = info.split()
                    self.coefficients['CC_cal_temp'] = cal_temp
                    
            # Now parse the calibration coefficients
            if line.startswith('E'):
                _, wl, eno3, eswa, _, di = line.split(',')
                
                self.coefficients['CC_wl'].append(float(wl))
                self.coefficients['CC_di'].append(float(di))
                self.coefficients['CC_eno3'].append(float(eno3))
                self.coefficients['CC_eswa'].append(float(eswa))
                
                
    def write_csv(self, outpath):
        """
        This function writes the correctly named csv file for the NUTNR to the
        specified directory.

        Args:
            outpath - directory path of where to write the csv file
        Raises:
            ValueError - raised if the calibration coefficients have not
                been loaded from a source file
        Returns:
            None; the csv of the calibration coefficients is written to
                outpath if the user confirms the prompt.
        """

        # Run a check that the coefficients have actually been loaded. The
        # dictionary always has 7 keys, so check the parsed wavelength array.
        if len(self.coefficients['CC_wl']) == 0:
            raise ValueError('No calibration coefficients have been loaded.')

        # Create a dataframe to write to the csv
        data = {
            'serial': [self.serial]*len(self.coefficients),
            'name': list(self.coefficients.keys()),
            'value': list(self.coefficients.values())
        }
        df = pd.DataFrame().from_dict(data)

        # Now merge the coefficients dataframe with the notes
        if len(self.notes) > 0:
            notes = pd.DataFrame().from_dict({
                'name':list(self.notes.keys()),
                'notes':list(self.notes.values())
            })
            df = df.merge(notes, how='outer', left_on='name', right_on='name')
        else:
            df['notes'] = ''
            
        # Add in the source file
        df.loc[0, 'notes'] = (df.loc[0, 'notes'] + ' ' + self.source).strip()
        
        # Sort the data by the coefficient name
        df = df.sort_values(by='name')

        # Generate the csv name
        cal_date = max(self.date)
        csv_name = self.uid + '__' + cal_date + '.csv'

        # Write the dataframe to a csv file
        check = input(f"Write {csv_name} to {outpath}? [y/n]: ")
        # check = 'y'
        if check.lower().strip() == 'y':
            df.to_csv(outpath+'/'+csv_name, index=False)
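The `uid` setter above validates with `re.compile('.{5}-.{6}-.{5}').match(d)`. Since `.` matches any character and `match` only anchors at the start, a string such as `'CGINS-NUTNRM-00760__20190802.csv'` also passes. A stricter check could use `fullmatch` with explicit character classes. The pattern below is an assumption based on the UIDs seen in this notebook (e.g. `CGINS-NUTNRM-00760`, `CGINS-NUTNRB-00239`), not a documented OOI rule, and `is_valid_uid` is a hypothetical name.

```python
import re

# Hypothetical stricter pattern: 5 uppercase letters/digits, a 6-character
# uppercase class/series code, and a zero-padded 5-digit serial number
UID_PATTERN = re.compile(r'[A-Z0-9]{5}-[A-Z0-9]{6}-\d{5}')


def is_valid_uid(uid):
    """Return True if uid matches the stricter pattern over its whole length."""
    return UID_PATTERN.fullmatch(uid) is not None
```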