# NUTNR METADATA REVIEW
This notebook describes the process for checking the calibration CSVs located in the OOI-CGSN asset management repository on GitHub. The purpose is to identify errors made when the calibration CSVs were entered.
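
The sketch below shows the kind of check this review involves. It compares each coefficient entered in a calibration CSV against the value transcribed from the vendor calibration document, and it reports coefficients that differ or that are missing from either source. It assumes scalar coefficients. Array-valued NUTNR coefficients would need element-wise handling. The function name `find_mismatches`, the coefficient names, and the tolerance are hypothetical, and this is not the notebook's own procedure.

```python
import numpy as np
import pandas as pd

def find_mismatches(csv_coeffs, vendor_coeffs, rtol=1e-6):
    """Return the (hypothetical) coefficient names whose CSV value differs from the vendor value.

    Both arguments map coefficient name -> scalar value. Names present in only
    one source are reported as mismatches too.
    """
    csv_s = pd.Series(csv_coeffs, dtype=float)
    vendor_s = pd.Series(vendor_coeffs, dtype=float)
    # Align on coefficient name (outer join); a name missing from one side becomes NaN
    csv_a, vendor_a = csv_s.align(vendor_s)
    # NaN never compares close, so missing coefficients are flagged as well
    close = np.isclose(csv_a.to_numpy(), vendor_a.to_numpy(), rtol=rtol, atol=0.0)
    return sorted(csv_a.index[~close])

csv_coeffs = {'coef_a': 1.0, 'coef_b': 2.5, 'coef_c': 3.0}
vendor_coeffs = {'coef_a': 1.0, 'coef_b': 2.4, 'coef_d': 7.0}
print(find_mismatches(csv_coeffs, vendor_coeffs))  # ['coef_b', 'coef_c', 'coef_d']
```

A relative tolerance avoids false alarms from coefficients that were rounded differently when they were transcribed. Set `rtol=0` to require exact agreement.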

**========================================================================================================================**

The first step is to load relevant packages:

In [1]:
import csv
import re
import os
import numpy as np
import pandas as pd

In [12]:
# Helper functions used later in this notebook (e.g. whoi_asset_tracking, ensure_dir)
from utils import *

**=========================================================================================================================**
Define some useful functions for the metadata review (these will eventually be moved to a utilities file):

In [13]:
def get_qct_files(df, qct_directory):
    """
    Map each instrument UID to the list of QCT document numbers recorded in the
    'QCT Testing' column, which stores them as a newline-separated string.
    qct_directory is currently unused.
    """
    qct_dict = {}
    for uid in set(df['UID']):
        # Select the rows whose UID contains this uid without adding a helper column to df
        match = df[df['UID'].str.contains(uid, regex=False)]
        qct_dict[uid] = match['QCT Testing'].iloc[0].split('\n')
    return qct_dict

In [14]:
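def get_calibration_files(serial_nums, cal_directory):
    """
    Map each instrument UID to the vendor files in cal_directory whose names
    contain 'Calibration_File' and the instrument's serial number.
    """
    calibration_files = {}
    all_files = os.listdir(cal_directory)
    for uid, sn in serial_nums.items():
        # Accept either a bare serial number or a list whose first element is the
        # serial number (indexing a string with [0] would keep only its first character)
        if isinstance(sn, (list, tuple)):
            sn = sn[0]
        sn = str(sn)
        calibration_files[uid] = [f for f in all_files
                                  if 'Calibration_File' in f and sn in f]
    return calibration_files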

In [15]:
# Load all of the calibration CSV files, grouped by UID
def load_csv_info(csv_dict, filepath):
    """
    Loads the calibration coefficient information contained in asset management
    
    Args:
        csv_dict - a dictionary which associates an instrument UID to the
            calibration csv files in asset management
        filepath - the path to the directory containing the calibration csv files
    Returns:
        csv_cals - a dictionary which associates an instrument UID to a pandas
            dataframe which contains the calibration coefficients. The dataframes
            are indexed by the date of calibration
    """
    
    csv_cals = {}
    for uid, files in csv_dict.items():
        frames = []
        for file in files:
            data = pd.read_csv(os.path.join(filepath, file))
            # The calibration date follows the double underscore in the filename
            date = file.split('__')[1].split('.')[0]
            data['CAL DATE'] = pd.to_datetime(date)
            frames.append(data)
        if not frames:
            # No calibration CSVs for this UID: store an empty dataframe
            csv_cals[uid] = pd.DataFrame()
            continue
        # DataFrame.append was removed in pandas 2.0, so concatenate the frames instead
        cals = pd.concat(frames, ignore_index=True)
        # Pivot so each row is a calibration date and each column a coefficient name
        csv_cals[uid] = cals.pivot(index='CAL DATE', columns='name', values='value')
        
    return csv_cals

In [16]:
def splitDataFrameList(df,target_column):
    '''
    df = dataframe to split
    target_column = the column whose values are lists (or other iterables) to split
    returns: a dataframe in which each element of the target column is moved into its own row.
    The values in the other columns are duplicated across the newly divided rows.
    '''
    
    def splitListToRows(row,row_accumulator,target_column):
        split_row = row[target_column]
        for s in split_row:
            new_row = row.to_dict()
            new_row[target_column] = s
            row_accumulator.append(new_row)
            
    new_rows = []
    df.apply(splitListToRows,axis=1,args = (new_rows,target_column))
    new_df = pd.DataFrame(new_rows)
    return new_df
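
`splitDataFrameList` expects the target column to contain lists already. pandas 0.25 and later provides `DataFrame.explode`, which does the same one-row-per-element split. The sketch below pairs `explode` with `Series.str.split` to expand a newline-separated field such as `QCT Testing`. The two-row `DataFrame` is a toy stand-in for the asset tracking sheet, and its values are copied from two of the instruments listed there.

```python
import pandas as pd

# Toy version of the asset tracking sheet: one row per instrument, with the
# QCT document numbers stored as a single newline-separated string
df = pd.DataFrame({
    'UID': ['CGINS-NUTNRB-00260', 'CGINS-NUTNRB-00267'],
    'QCT Testing': ['3305-00108-00010\n3305-00108-00056',
                    '3305-00108-00018\n3305-00108-00053'],
})

# Turn each string into a list, then give every list element its own row;
# the other columns (here, UID) are repeated for each new row
df['QCT Testing'] = df['QCT Testing'].str.split('\n')
exploded = df.explode('QCT Testing').reset_index(drop=True)
print(exploded)
```

The two approaches differ for an empty list. `explode` keeps a single row with `NaN` in the target column, whereas `splitDataFrameList` produces no row for it.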

In [17]:
# Copy calibration files to a local temp directory, extracting any zip archives
import shutil
from zipfile import ZipFile

def copy_to_local(cal_path):
    """
    Copies each file in cal_path (an iterable of file paths) into a locally
    created temp/cal_data/<name> directory under the current working directory.
    Zip archives are extracted rather than copied.
    """
    
    for filepath in cal_path:
        # Name the destination folder after the file, minus its extension(s)
        folder = os.path.basename(filepath).split('.')[0]
        savedir = os.path.join(os.getcwd(), 'temp', 'cal_data', folder)
        # Make sure the save directory exists (ensure_dir is presumably provided by utils)
        ensure_dir(savedir)
    
        if filepath.endswith('.zip'):
            with ZipFile(filepath, 'r') as zfile:
                zfile.extractall(path=savedir)
        else:
            shutil.copy(filepath, savedir)

**====================================================================================================================**
Define the locations of the QCT, pre-deployment, and post-deployment documents; the vendor documents; the asset tracking spreadsheet; and the asset management calibration CSVs.

In [18]:
qct_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/NUTNR/NUTNR_Results/'
cal_directory = '/media/andrew/OS/Users/areed/Documents/Project_Files/Records/Instrument_Records/NUTNR/NUTNR_Cal/'
asset_management_directory = '/home/andrew/Documents/OOI-CGSN/ooi-integration/asset-management/calibration/NUTNRB/'

In [19]:
excel_spreadsheet = '/media/andrew/OS/Users/areed/Documents/Project_Files/Documentation/System/System Notebook/WHOI_Asset_Tracking.xlsx'
sheet_name = 'Sensors'

In [20]:
NUTNR = whoi_asset_tracking(spreadsheet=excel_spreadsheet,sheet_name=sheet_name,instrument_class='NUTNR',series='B')
NUTNR

| Unnamed: 0 | Instrument Class | Series | Supplier Serial Number | WHOI # | OOI # | UID | Model | CGSN PN | Firmware Version | Supplier | ... | QCT Testing | PreDeployment | Post Deployment | Refurbishment/ Repair | DO Number | Date Received | Deployment History | Current Deployment | Instrument Location on Current Deployment | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 876 | NUTNR | B | 239 | 115084 | A00065 | CGINS-NUTNRB-00239 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00004\n3305-00108-00048\n3305-00108... | 3305-00308-00001 | 3305-00508-00040 | 3305-00900-00075\n3305-00900-00144\n3305-00900... | WH-SC12-5-NUTNR-1001 | 2012-11-13 00:00:00 | GI01SUMO-00001\nCP04OSSM-00006 |  | NSIF | Reading High nitrate levels during calibration... |
| 877 | NUTNR | B | 240 | 115085 | A00066 | CGINS-NUTNRB-00240 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00003\n3305-00108-00029\n3305-00108... |  |  | 3305-00900-00008\n3305-00900-00231\n3305-00900... | WH-SC12-5-NUTNR-1001 | 2012-11-13 00:00:00 | CP01CNSM-00001\nGS01SUMO-00002 |  | NSIF | 09/2017: Clock issue - resets back to Jan 1, 2... |
| 878 | NUTNR | B | 260 | 115671 | A00383 | CGINS-NUTNRB-00260 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00010\n3305-00108-00056 |  | 3305-00508-00010 | 3305-00900-00109\n3305-00900-00317 | WH-SC12-05-NUTNR-1004 | 2013-08-12 00:00:00 | GI Spare\nCP1 spare\nGS01SUMO-00001\nCP04OSSM-... |  | NSIF | Sent to vedor as part of trade in for new (SUN... |
| 879 | NUTNR | B | 261 | 115672 | A00384 | CGINS-NUTNRB-00261 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00011\n3305-00108-00021\n3305-00108... | 3305-00308-00007\n3305-00308-00031\n3305-00308... | 3305-00508-00022\n3305-00508-00041 | 3305-00900-00071\n3305-00900-00173\n3305-00900... | WH-SC12-05-NUTNR-1004 | 2013-08-12 00:00:00 | CP03ISSM-00002\nCP01CNSM-00005\nCP01CNSM-00007 |  | NSIF | Returned to vendor 6/24/14 (RMA#2014-125)\nSen... |
| 880 | NUTNR | B | 262 | 115673 | A00385 | CGINS-NUTNRB-00262 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00012\n3305-00108-00064 | 3305-00308-00002 |  | 3305-00900-00064\n3305-00900-00153\n3305-00900... | WH-SC12-05-NUTNR-1004 | 2013-08-12 00:00:00 | GI01SUMO-00001 |  |  | Sent to vedor as part of trade in for new (SUN... |
| 881 | NUTNR | B | 266 | 116564 | A00880 | CGINS-NUTNRB-00266 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00015\n3305-00108-00040\n3305-00108... | 3305-00308-00004\n3305-00308-00032 | 3305-00508-00024 | 3305-00900-00036\n3305-00900-00173\n3305-00900... | WH-SC12-05-NUTNR-1006 | 2014-06-26 00:00:00 | CP04OSSM-00001\nCP04OSSM-00004\nCP01CNSM-00008 |  | NSIF | Sent to vedor as part of trade in for new (SUN... |
| 882 | NUTNR | B | 267 | 116562 | A00878 | CGINS-NUTNRB-00267 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00018\n3305-00108-00053 |  | 3305-00508-00009 | 3305-00900-00109\n3305-00900-00363 | WH-SC12-05-NUTNR-1006 | 2014-06-26 00:00:00 | GS01SUMO-00001\nCP01CNSM-00006 |  | NSIF | Sent to vedor as part of trade in for new (SUN... |
| 883 | NUTNR | B | 268 | 116563 | A00879 | CGINS-NUTNRB-00268 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00017\n3305-00108-00046\n3305-00108... | 3305-00308-00003\n3305-00308-00034 | 3305-00508-00023 | 3305-00900-00071\n3305-00900-00173\n3305-00900... | WH-SC12-05-NUTNR-1006 | 2014-06-26 00:00:00 | CP03ISSM-00001\nCP03ISSM-00004\nCP03ISSM-00006 |  | NSIF | 4/2016: Scheduled to sample every half hour. S... |
| 884 | NUTNR | B | 269 | 116565 | A00881 | CGINS-NUTNRB-00269 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00016\n3305-00108-00047\n3305-00114... | 3305-00308-00005\n3305-00308-00009 | 3305-00508-00001 | 3305-00900-00071\n3305-00900-00317 | WH-SC12-05-NUTNR-1006 | 2014-06-26 00:00:00 | CP01CNSM-00002\nCP01CNSM-00003\nCP03ISSM-00005 |  | NSIF | Sent to vedor as part of trade in for new (SUN... |
| 885 | NUTNR | B | 270 | 116899 | A01142 | CGINS-NUTNRB-00270 | ISUS | 1336-00014-00002 | 3.2.4 | Satlantic | ... | 3305-00108-00020\n3305-00108-00050 | 3305-00308-00006\n3305-00308-00008 | 3305-00508-00038 | 3305-00900-00071\n3305-00900-00317 | WH-SC12-05-NUTNR-1006 | 2014-09-24 00:00:00 | CP3a Spare\nCP04OSSM-00002\nGI01SUMO-00003 |  | NSIF | Sent to vedor as part of trade in for new (SUN... |


**======================================================================================================================**
Next, load all the calibration CSVs and group them by UID:

In [24]:
# Unique instrument UIDs in the asset tracking sheet, in sorted order
uids = sorted(set(NUTNR['UID']))
uids

['CGINS-NUTNRB-00239',
 'CGINS-NUTNRB-00240',
 'CGINS-NUTNRB-00260',
 'CGINS-NUTNRB-00261',
 'CGINS-NUTNRB-00262',
 'CGINS-NUTNRB-00266',
 'CGINS-NUTNRB-00267',
 'CGINS-NUTNRB-00268',
 'CGINS-NUTNRB-00269',
 'CGINS-NUTNRB-00270',
 'CGINS-NUTNRB-00271',
 'CGINS-NUTNRB-00272',
 'CGINS-NUTNRB-00273',
 'CGINS-NUTNRB-00274',
 'CGINS-NUTNRB-00275',
 'CGINS-NUTNRB-00276',
 'CGINS-NUTNRB-00277',
 'CGINS-NUTNRB-00280',
 'CGINS-NUTNRB-00283',
 'CGINS-NUTNRB-00284',
 'CGINS-NUTNRB-00285',
 'CGINS-NUTNRB-01062',
 'CGINS-NUTNRB-01063',
 'CGINS-NUTNRB-01064',
 'CGINS-NUTNRB-01065',
 'CGINS-NUTNRB-01086',
 'CGINS-NUTNRB-01087',
 'CGINS-NUTNRB-01088',
 'CGINS-NUTNRB-01089',
 'CGINS-NUTNRB-01090',
 'CGINS-NUTNRB-01091',
 'CGINS-NUTNRB-01092',
 'CGINS-NUTNRB-01093',
 'CGINS-NUTNRB-01094',
 'CGINS-NUTNRB-01102',
 'CGINS-NUTNRB-01103',
 'CGINS-NUTNRB-01106',
 'CGINS-NUTNRB-01107']

In [30]:
csv_dict = {}
asset_management = os.listdir(asset_management_directory)
for uid in uids:
    # Collect every calibration CSV whose filename contains this UID
    csv_dict[uid] = sorted(file for file in asset_management if uid in file)
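
To see what `load_csv_info` produces, the sketch below writes two small calibration CSVs for one UID into a temporary directory. It then groups them by UID the same way as the cell above and pivots them into one row per calibration date. The sketch assumes the layout that the notebook's code relies on: filenames of the form `<UID>__<YYYYMMDD>.csv`, with `name` and `value` columns. The coefficient names and values are hypothetical.

```python
import os
import tempfile
import pandas as pd

uid = 'CGINS-NUTNRB-00239'
# Hypothetical coefficient rows for two calibrations of the same instrument
cal_rows = {
    '20130101': [('coef_a', 1.0), ('coef_b', 2.0)],
    '20150601': [('coef_a', 1.1), ('coef_b', 2.0)],
}

with tempfile.TemporaryDirectory() as tmp:
    for date, rows in cal_rows.items():
        pd.DataFrame(rows, columns=['name', 'value']).to_csv(
            os.path.join(tmp, f'{uid}__{date}.csv'), index=False)

    # Group filenames by UID, as in the csv_dict cell
    files = sorted(f for f in os.listdir(tmp) if uid in f)

    frames = []
    for file in files:
        data = pd.read_csv(os.path.join(tmp, file))
        # The calibration date sits between the double underscore and the extension
        data['CAL DATE'] = pd.to_datetime(file.split('__')[1].split('.')[0])
        frames.append(data)

# One row per calibration date, one column per coefficient name
cals = pd.concat(frames, ignore_index=True).pivot(
    index='CAL DATE', columns='name', values='value')
print(cals)
```

In this layout, a change between calibrations shows up as a change down a column, which here is `coef_a` moving from 1.0 to 1.1. That makes suspicious entries easier to spot.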

**======================================================================================================================**
Get the serial numbers of the instruments and match them to the UIDs:

In [46]:
serial_dict = {}
for uid in uids:
    # Look up the vendor serial number for this UID and store it as a string
    sn = NUTNR[NUTNR['UID'] == uid]['Supplier\nSerial Number']
    serial_dict[uid] = str(sn.iloc[0])
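
`get_calibration_files` matches serial numbers to vendor filenames with a plain substring test. As a result, a short serial number such as `239` would also match a filename containing `1239`. The sketch below shows a stricter alternative, which is not the notebook's method. It accepts the serial number only when no other digit sits immediately before or after it. The helper name `match_calibration_files` and the filenames are hypothetical. The only convention taken from the notebook is that vendor calibration filenames contain `Calibration_File` and the serial number.

```python
import re

def match_calibration_files(serial_nums, filenames):
    """Map each UID to the calibration filenames that contain its serial number.

    The serial number must not be preceded or followed by another digit,
    so '239' matches 'SN239_...' but not 'SN1239_...'.
    """
    matches = {}
    for uid, sn in serial_nums.items():
        # Lookarounds reject a serial number embedded in a longer run of digits
        pattern = re.compile(r'(?<!\d)' + re.escape(str(sn)) + r'(?!\d)')
        matches[uid] = [f for f in filenames
                        if 'Calibration_File' in f and pattern.search(f)]
    return matches

# Hypothetical directory listing and serial numbers
filenames = ['SN239_Calibration_File_A.txt', 'SN1239_Calibration_File_A.txt',
             'SN240_Calibration_File_A.txt', 'SN239_QCT_Results.txt']
serials = {'CGINS-NUTNRB-00239': '239', 'CGINS-NUTNRB-00240': '240'}
result = match_calibration_files(serials, filenames)
print(result)
```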
