# CGSN Metadata Communications
Author: Andrew Reed

Date: 2019-08-21

Ver: 1.02

This notebook lays out the development process for querying the data fields CGSN needs to fill out the UW Metadata Changes & Communications spreadsheet. The goal is to explore which of the required fields can be obtained through M2M calls to OOINet, starting from the information recorded in CGSN's Metadata Tracking Spreadsheet. Eventually, the exploratory and development process laid out below will be transitioned into an automated function that fills out the requisite information when the relevant scripts and code are executed manually.


In [1]:
import os, shutil, sys, time, re, requests, csv, datetime, pytz
import pandas as pd
import numpy as np
import netCDF4 as nc
import xarray as xr

Set my OOINet username, token, and the base url for querying the system via M2M:

In [2]:
username = 'OOIAPI-C9OSZAQABG1H3U'
token = 'JA48WUQVG7F'

In [3]:
base_url = 'https://ooinet.oceanobservatories.org/api/m2m'
sensor_url = '12576/sensor/inv'
asset_url = '12587/asset'

In [4]:
# Specify some functions to convert timestamps
ntp_epoch = datetime.datetime(1900, 1, 1)
unix_epoch = datetime.datetime(1970, 1, 1)
ntp_delta = (unix_epoch - ntp_epoch).total_seconds()

def ntp_seconds_to_datetime(ntp_seconds):
    return datetime.datetime.utcfromtimestamp(ntp_seconds - ntp_delta).replace(microsecond=0)
  
def convert_ooi_time(ms):
    if ms is None:
        return None
    elif np.isnan(ms):
        return None
    else:
        return datetime.datetime.utcfromtimestamp(ms/1000)
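
As a quick check on the conversions above, here is a minimal sketch of the same two conversions using timezone-aware datetimes. The function names `ooi_ms_to_datetime` and `ntp_to_datetime` are my own for illustration and are not part of the notebook; the sketch assumes, as `convert_ooi_time` does, that OOINet timestamps are milliseconds since the Unix epoch.

```python
import datetime

UTC = datetime.timezone.utc

# Offset in seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01)
NTP_DELTA = (datetime.datetime(1970, 1, 1) - datetime.datetime(1900, 1, 1)).total_seconds()


def ooi_ms_to_datetime(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    # NaN is the only float that is not equal to itself
    if ms is None or ms != ms:
        return None
    return datetime.datetime.fromtimestamp(ms / 1000, tz=UTC)


def ntp_to_datetime(ntp_seconds):
    """Convert seconds since the NTP epoch to an aware UTC datetime."""
    return datetime.datetime.fromtimestamp(ntp_seconds - NTP_DELTA, tz=UTC)


# Illustrative value: 2015-10-21 17:44:00 UTC expressed in milliseconds
print(ooi_ms_to_datetime(1445449440000))
```

Using aware datetimes avoids any ambiguity about whether a printed time is local or UTC, at the cost of differing from the naive datetimes used in the rest of this notebook.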

**====================================================================================================================**
### Metadata Review Tracking Spreadsheet
Load and process the metadata review tracking spreadsheet used by CGSN, eliminating the few edge cases (such as a couple of DOSTAs) that have Bad Calibrations (i.e. calibrations that can't be fixed) and any empty or null rows in the spreadsheet.

In [5]:
metadata_review = pd.read_excel('CGSN Metadata Review.xlsx',sheet_name='Cal Review Log')
metadata_review.dropna(subset=['CLASS-SERIES'], inplace=True)
metadata_review = metadata_review[metadata_review['Original Calibration CSV'] != 'Bad']
metadata_review.head()

Unnamed: 0,CLASS-SERIES,S/N,Cal Date,Original Calibration CSV,Vendor Docs exist,Cal coeff match,Filename correct,In progress?,Duplicate,Notes,Pull request #,Date pull request submitted,Pull request verified primary,Pull request verified secondary,Unnamed: 14
0,PCO2W-B,C0050,2012-08-28 00:00:00,CGINS-PCO2WB-C0050__20150315,Yes,Yes,"No, need to change date","SW, AP",No,changed date (RGT),659.0,2019-03-28 00:00:00,CD,AR,
1,PCO2W-B,C0050,2016-05-20 00:00:00,CGINS-PCO2WB-C0050__20161125,Yes,Yes,"No, need to change date","SW, AP",No,changed date (RGT),659.0,2019-03-28 00:00:00,CD,AR,
2,PCO2W-B,C0051,2012-08-28 00:00:00,CGINS-PCO2WB-C0051__20140910,Yes,Yes,"No, need to change date","SW, AP",No,changed date (RGT),659.0,2019-03-28 00:00:00,CD,AR,
3,PCO2W-B,C0051,2015-12-10 00:00:00,CGINS-PCO2WB-C0051__20160513,Yes,Yes,"No, need to change date","SW, AP",No,changed date (RGT),659.0,2019-03-28 00:00:00,CD,AR,
4,PCO2W-B,C0051,2016-12-21 00:00:00,CGINS-PCO2WB-C0051__20161221,Yes,Yes,Yes,"SW, AP",No,,,,,,


Next, process the metadata review tracking spreadsheet (MRTS) to create the following information:
1. UID
2. New Calibration CSV filename: this is the name a file is given if the original csv filename is found to be incorrect, such as when the wrong calibration date was used. This is important to know because, when querying calibration and deployment data from OOINet via M2M, files are identified by their new csv names. We can build the new CSV filenames from the instrument UID, which is itself built from the instrument class-series and serial number, and the correct/corrected calibration date.
3. Error Classification: this is how identified errors in the calibration csvs are grouped
    * Wrong cal date - if the calibration date in the csv filename was wrong
    * Wrong cal coef - this is if a calibration coefficient in the csv was identified as being incorrect
    * Is missing - if a calibration csv file that should be in asset management is missing
    * Is duplicate - if a calibration csv in asset management is identified as a duplicate of another csv and should be deleted
    * Is good - the calibration date and calibration coefficients were all correct and the file is not a duplicate.
    
The preceding information is generated by a series of simple functions applied to the appropriate dataframe columns.
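
The filename construction described in item 2 can be sketched as follows. This is a simplified illustration, not the notebook's actual functions: `simple_uid` and `cal_filename` are hypothetical names, and the sketch deliberately leaves out the special handling that METBK instruments need.

```python
import datetime


def simple_uid(class_series, serial):
    """Build a UID such as CGINS-CTDBPC-07208 from class-series and serial number.

    Hyphens are dropped from the class-series; for the serial number, only the
    part after the first hyphen is kept and it is zero-padded to 5 characters.
    """
    inst = class_series.replace('-', '')
    sn = str(serial).split('-', 1)[-1].zfill(5)
    return '-'.join(('CGINS', inst, sn))


def cal_filename(uid, cal_date):
    """Build the calibration csv filename <UID>__<YYYYMMDD>.csv."""
    return '{}__{}.csv'.format(uid, cal_date.strftime('%Y%m%d'))


uid = simple_uid('PCO2W-B', 'C0050')
print(cal_filename(uid, datetime.date(2012, 8, 28)))
```

For the first row of the spreadsheet shown above, this yields a name with the 2012-08-28 calibration date, in place of the original `CGINS-PCO2WB-C0050__20150315`.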

In [6]:
def reformat_calDate(x):
    if type(x) is int:
        return pd.to_datetime(str(x))
    else:
        return x

In [7]:
metadata_review['Cal Date'] = metadata_review['Cal Date'].apply(reformat_calDate)

In [8]:
metadata_review[metadata_review['CLASS-SERIES'] == 'CTDBP-C']

Unnamed: 0,CLASS-SERIES,S/N,Cal Date,Original Calibration CSV,Vendor Docs exist,Cal coeff match,Filename correct,In progress?,Duplicate,Notes,Pull request #,Date pull request submitted,Pull request verified primary,Pull request verified secondary,Unnamed: 14
237,CTDBP-C,16-07208,2012-10-25 00:00:00,CGINS-CTDBPC-07208__20121025.csv,Yes,Yes,Yes,,No,updated filename in Vault (RGT)\nQCT exists (3...,,,,,
238,CTDBP-C,16-07208,2015-02-19 00:00:00,CGINS-CTDBPC-07208__20150219.csv,Yes,Yes,Yes,,No,,,,,,
239,CTDBP-C,16-07208,2015-12-12 00:00:00,CGINS-CTDBPC-07208__20151212.csv,Yes,Yes,Yes,,No,updated filename in Vault (RGT),,,,,
240,CTDBP-C,16-07208,2017-06-13 00:00:00,CGINS-CTDBPC-07208__20170613.csv,Yes,Yes,Yes,,No,,,,,,
241,CTDBP-C,16-07208,2018-05-30 00:00:00,CGINS-CTDBPC-07208__20180530.csv,Yes,Yes,Yes,,No,,,,,,
242,CTDBP-C,16-50003,2014-01-30 00:00:00,CGINS-CTDBPC-50003__20140130.csv,Yes,"CC_pa0, CC_ptca1, CC_ptca2, CC_ptempa2",Yes,RGT,No,updated filename in Vault (RGT),73.0,2019-07-31 00:00:00,,,
243,CTDBP-C,16-50003,2015-12-15 00:00:00,CGINS-CTDBPC-50003__20151215.csv,Yes,Yes,Yes,,No,CAL & XMLCON files are in associated cal zip f...,,,,,
244,CTDBP-C,16-50003,2016-12-17 00:00:00,CGINS-CTDBPC-50003__20161217.csv,Yes,Yes,Yes,,No,No CAL or XMLCON; updated filename in Vault (RGT),,,,,
245,CTDBP-C,16-50003,2017-12-12 00:00:00,CGINS-CTDBPC-50003__20171212.csv,Yes,Yes,Yes,,No,,,,,,
246,CTDBP-C,16-50003,2019-01-23 00:00:00,\nCGINS-CTDBPC-50003__20190123.csv,Yes,CC_ptca0,Yes,RGT,No,QCT exists (3305-00102-00209) - csv exists but...,73.0,2019-07-31 00:00:00,,,


In [9]:
def generate_uid(inst, sn, whoi_inst=True):
    """
    Function which takes in instrument class - series and serial number to generate an instrument uid. The exception
    to the rule is the METBK instruments, which are classified as Loggers, and thus are recorded as METLGR
    """
    
    # Clean the names of the class-series
    if '-' in inst:
        inst = inst.replace('-','')
        
    # Clean the serial numbers
    sn = str(sn)
    if '-' in sn:
        ind = sn.index('-')
        sn = sn[ind+1:].zfill(5)
    elif len(sn) < 5:
        sn = sn.zfill(5)
    else:
        pass
    
    # If the instrument is a METBK, have to handle differently
    if 'METBKA' in inst:
        inst = 'METLGR'
        if 'UNKNOWN' in sn:
            sn = sn.split('\n')[-1]
        else:
            sn = sn[3:].zfill(5)   
        
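    # Generate the UID; only WHOI (CGINS) instruments are handled here
    if not whoi_inst:
        raise ValueError('Only WHOI (CGINS) instrument UIDs are supported.')
    uid = '-'.join(('CGINS',inst,sn))
        
    return uid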

In [10]:
metadata_review['UID'] = metadata_review.apply(lambda x: generate_uid(x['CLASS-SERIES'], x['S/N']), axis=1)

In [11]:
def wrong_cal_date(x):
    if type(x) == str:
        if 'no' in x.lower():
            return True
        else:
            return False
    else:
        return False

In [12]:
metadata_review['Wrong Date'] = metadata_review['Filename correct'].apply(wrong_cal_date)

In [13]:
def wrong_cal_coef(x):
    if type(x) == str:
        if 'yes' in x.lower():
            return False
        else:
            return True
    elif np.isnan(x):
        return False
    else:
        return False

In [14]:
metadata_review['Wrong cal'] = metadata_review['Cal coeff match'].apply(wrong_cal_coef)

In [15]:
def is_missing(x):
    if type(x) is str:
        if x.lower() == 'new':
            return True
        else:
            return False
    else:
        return False

In [16]:
metadata_review['Is missing'] = metadata_review['Duplicate'].apply(is_missing)

In [17]:
def is_duplicate(x):
    if type(x) is str:
        if x.lower() == 'yes':
            return True
        else:
            return False
    else:
        return False        

In [18]:
metadata_review['Is duplicate'] = metadata_review['Duplicate'].apply(is_duplicate)

In [19]:
def is_good(x):
    if any(x) == True:
        return False
    else:
        return True

In [20]:
metadata_review['Is good'] = metadata_review[['Wrong Date', 'Wrong cal', 'Is missing', 'Is duplicate']].apply(is_good, axis=1)

Check the classification:

In [21]:
metadata_review[['Wrong Date','Wrong cal','Is missing','Is duplicate','Is good']].head(10)

Unnamed: 0,Wrong Date,Wrong cal,Is missing,Is duplicate,Is good
0,True,False,False,False,False
1,True,False,False,False,False
2,True,False,False,False,False
3,True,False,False,False,False
4,False,False,False,False,True
5,False,False,False,False,True
6,True,False,False,False,False
7,True,False,False,False,False
8,False,False,False,False,True
9,False,False,False,False,True
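
For reference, the five flags can also be derived for a single row in one place. The sketch below is an illustrative consolidation of the rules used in the small functions above, not a replacement for them; `classify_row` is a hypothetical name, and its three arguments stand for the 'Filename correct', 'Cal coeff match', and 'Duplicate' cells of one spreadsheet row.

```python
def classify_row(filename_correct, cal_coeff_match, duplicate):
    """Return the five boolean flags for one row (cells may be str or NaN)."""

    def text(value):
        # Non-string cells (e.g. NaN for an empty cell) are treated as empty text
        return value.lower() if isinstance(value, str) else ''

    flags = {
        'Wrong Date': 'no' in text(filename_correct),
        'Wrong cal': isinstance(cal_coeff_match, str) and 'yes' not in text(cal_coeff_match),
        'Is missing': text(duplicate) == 'new',
        'Is duplicate': text(duplicate) == 'yes',
    }
    # A row is good only if none of the four error flags is set
    flags['Is good'] = not any(flags.values())
    return flags


print(classify_row('No, need to change date', 'Yes', 'No'))
print(classify_row('Yes', 'CC_ptca0', 'No'))
```

Note that the 'Wrong Date' test is a substring match, so any cell text containing the letters "no" would set the flag; that matches the behaviour of `wrong_cal_date` but is worth keeping in mind if new wording appears in the spreadsheet.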


Generate the new csv filenames from the instrument class-series, serial number, and the correct/corrected calibration date:

In [22]:
def print_calDate(x):
    if type(x) == float:
        return None
    elif x == 'U':
        return None
    else:
        return x.strftime('%Y%m%d')

In [23]:
metadata_review['Cal Date'] = metadata_review['Cal Date'].apply(print_calDate)

In [24]:
def new_csv_filename(x):
    og_csv = x['Original Calibration CSV']
    # Only append the extension to actual strings (the cell may be empty/NaN)
    if isinstance(og_csv, str) and not og_csv.endswith('.csv'):
        og_csv = og_csv + '.csv'
        
    if x['Is duplicate']:
        return np.nan
    elif x['Cal Date'] is None:
        return og_csv
    elif x['Wrong Date'] or x['Is missing']:
        new_csv = x['UID'] + '__' + x['Cal Date'] + '.csv'
        if new_csv == og_csv and not x['Is missing']:
            # Flagged as a wrong date, yet the rebuilt name matches the original
            print("Check calibration date for {} for errors.".format(og_csv))
            return None
        else:
            return new_csv
    else:
        return og_csv

In [25]:
metadata_review['New Calibration CSV'] = metadata_review.apply(new_csv_filename, axis=1)

Check calibration date for CGINS-ADCPSL-24661__20170626.csv for errors.
Check calibration date for CGINS-ADCPSL-24664__20170626.csv for errors.
Check calibration date for CGINS-ADCPSJ-22642__20170405.csv for errors.
Check calibration date for CGINS-ADCPSN-23579__20170109.csv for errors.


**====================================================================================================================**
## Metadata Communications Spreadsheet
This section steps through generating the requisite information needed to fill out the UW metadata communication spreadsheet based upon CGSN's metadata review approach. I need to gather the following information for the spreadsheet:
* Array
* Platform
* Node
* Instrument
* RefDes
* Asset UID
* Serial
* Deployment(s)
* Github Change Date - I don't think this is necessary, since it doesn't affect the end user until a release and ingestion to OOINet
* OOI Change Date - Question on 
* CSV file name - this is the filename which is in the system (so changes which have not been pushed to OOI net should not be put on the spreadsheet?)
* Github URL - Is this also necessary, when they can directly call (via M2M) or download the calibration information from the Portal?
* Change type - This is one of my five error classifications from above
* dateRange Start
* dateRange End
* Annotation

### Gather Relevant Data

Starting with my Metadata tracking spreadsheet above, I want to use a series of M2M calls to the OOI API to get the data necessary to fill out the communications spreadsheet. A wrinkle is that _only_ csv files which have been merged, pushed to ooi-integration, and ingested into OOINet can be identified by M2M. That means an instrument-by-instrument approach that follows our metadata-branching system on GitHub is preferable, in order to avoid getting ahead of files with changes not yet ingested into OOINet.

In [26]:
metadata_review['CLASS-SERIES'] = metadata_review['CLASS-SERIES'].apply(lambda x: x.replace('-',''))

#### Start by selecting an instrument class-series, preferably one for which the review has been finished and pushed to ooi-integration.

In [27]:
def get_deployData(uid):
    url = '/'.join((base_url,'12587','asset','deployments',uid+'?editphase=ALL'))
    data = requests.get(url, auth=(username, token)).json()
    df = pd.DataFrame(data)
    df.sort_values(by='deploymentNumber', inplace=True)
    df.reset_index(drop=True, inplace=True)
    return df

In [28]:
def get_calData(uid, deployData):
    """
    This function takes in the instrument uid and a dataframe of the
    deployment information for the uid, and loops through all of the
    instrument deployments to return the calibration data for the
    instrument for each individual deployment.
    """
    
    startTime = deployData['startTime']
    dt = 8.64E10 # offset added to startTime; startTime is in milliseconds, so this is 1000 days (one day is 8.64E7 ms)
    
    # Initialize tuples for non-mutable storage of data
    dataSource = ()
    lastModifiedTimestamp = ()
    instrument = ()
    serialNumber = ()
    
    # Loop over the deployment startTime and get the data
    for t in startTime:
        T1 = convert_ooi_time(t).strftime('%Y-%m-%dT%H:%M:%S.%fZ')
        T2 = convert_ooi_time(t+dt).strftime('%Y-%m-%dT%H:%M:%S.%fZ')
        # Generate the url and get the calibration data for a single deployment
        url = '/'.join((base_url,'12587','asset','cal?uid='+uid+'&beginDT={}&endDT={}'.format(T1,T2)))
        calData = requests.get(url, auth=(username, token)).json()
        # Fill out the data tuples
        instrument = instrument + (calData['description'],)
        serialNumber = serialNumber + (calData['serialNumber'],)
        dataSource = dataSource + (calData['calibration'][0]['calData'][0]['dataSource'],)
        lastModifiedTimestamp = lastModifiedTimestamp + (calData['calibration'][0]['calData'][0]['lastModifiedTimestamp'],)
        
    # Now, put the data tuples into the deploy data dataframe
    deployData['dataSource'] = dataSource
    deployData['lastModifiedTimestamp'] = lastModifiedTimestamp
    deployData['instrument'] = instrument
    deployData['serialNumber'] = serialNumber
    
    # Return the expanded deployment data
    return deployData
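
To make the calibration query easier to inspect, the URL construction can be pulled out into its own function, as in the sketch below. The name `cal_query_url` and the `window_ms` parameter are my own for illustration; the URL layout and time format are copied from `get_calData`, and the sketch assumes deployment start times are in milliseconds since the Unix epoch (as `convert_ooi_time` does). The default window here is one day.

```python
import datetime

BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m'
MS_PER_DAY = 24 * 60 * 60 * 1000  # 8.64E7 milliseconds
TIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def cal_query_url(uid, start_ms, window_ms=MS_PER_DAY):
    """Build the asset calibration query URL for one deployment start time."""
    t1 = datetime.datetime.fromtimestamp(start_ms / 1000, tz=datetime.timezone.utc)
    t2 = t1 + datetime.timedelta(milliseconds=window_ms)
    query = 'cal?uid={}&beginDT={}&endDT={}'.format(
        uid, t1.strftime(TIME_FORMAT), t2.strftime(TIME_FORMAT))
    return '/'.join((BASE_URL, '12587', 'asset', query))


# Illustrative start time: 2015-10-21 17:44:00 UTC in milliseconds
print(cal_query_url('CGINS-CTDBPC-50109', 1445449440000))
```

Printing the URL before sending the request makes it easy to confirm that the begin and end times bracket the intended window.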

In [29]:
def reformat_dataSource(x):
    new = x.replace('_Cal_Info.xlsx','.csv')
    return new

In [30]:
cols = ('Array','Platform','Node','Instrument','RefDes','Asset ID','Serial Number','deployment','gitHub changeDate',
        'OOI changeDate','file','URL','changeType','dateRangeStart','dateRangeEnd','annotation','Wrong Date',
       'Wrong cal','Is missing','Is duplicate','Is good')

In [31]:
name_map = {
    'Array':None,
    'Platform':'subsite',
    'Node':'node',
    'Instrument':'instrument',
    'RefDes':'RefDes',
    'Asset ID':'UID',
    'Serial Number':'serialNumber',
    'deployment':'deploymentNumber',
    'gitHub changeDate':'Pull request #',
    'OOI changeDate':'lastModifiedTimestamp',
    'file':'dataSource',
    'dateRangeStart':'startTime',
    'dateRangeEnd':'endTime',
    'annotation':None,
    'Wrong Date':'Wrong Date',
    'Wrong cal':'Wrong cal',
    'Is missing':'Is missing',
    'Is duplicate':'Is duplicate',
    'Is good':'Is good'
}
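
The way this mapping is meant to be used can be shown on a single record. The sketch below uses a small subset of the mapping and a plain dictionary in place of a dataframe row; the variable names `small_map`, `source_row`, and `out_row` are illustrative only.

```python
# Subset of the mapping: output column -> source field (None means "filled in later")
small_map = {
    'Array': None,
    'Platform': 'subsite',
    'Asset ID': 'UID',
    'file': 'dataSource',
}

# One merged review/deployment record, reduced to the fields used here
source_row = {
    'subsite': 'CP03ISSM',
    'UID': 'CGINS-CTDBPC-50109',
    'dataSource': 'CGINS-CTDBPC-50109__20170718.csv',
}

# Copy each mapped field across; unmapped output columns stay empty for now
out_row = {
    column: (source_row.get(source) if source is not None else None)
    for column, source in small_map.items()
}
print(out_row)
```

Columns mapped to `None`, such as 'Array', are left empty at this stage and derived afterwards from other columns.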

In [32]:
def generate_arrayName(x):
    if 'GA' in x:
        arrayName = 'Global Argentine Basin'
    elif 'GI' in x:
        arrayName = 'Global Irminger Sea'
    elif 'GP' in x:
        arrayName = 'Global Station Papa'
    elif 'GS' in x:
        arrayName = 'Global Southern Ocean'
    elif 'CP' in x:
        arrayName = 'Coastal Pioneer'
    else:
        arrayName = np.nan
    return arrayName

In [33]:
def generate_gitHub_url(x):
    base_url = 'https://github.com/ooi-integration/asset-management/blob/master/calibration'
    inst = x.split('-')[1]
    full_url = '/'.join((base_url,inst,x))
    return full_url

In [34]:
def classify_changeType(x):
    statement = ''
    if x['Is good'] == True:
        return 'No errors found'
    elif x['Is missing'] == True:
        return 'Missing file added'
    elif x['Is duplicate'] == True:
        return 'File deleted'
    else:
        if x['Wrong Date'] == True:
            statement = statement + 'File renamed with correct date '
        if x['Wrong cal'] == True:
            statement = statement + 'Calibration coefficients were modified'
        return statement
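
When both a wrong date and a wrong coefficient are flagged, the two messages above are concatenated directly. A possible variant, sketched below under the assumption that a `'; '` separator is acceptable in the spreadsheet, collects the messages in a list and joins them. `describe_change` is a hypothetical name, and it takes a plain dictionary of the five flags.

```python
def describe_change(flags):
    """Return the change-type text for one row of boolean flags."""
    if flags['Is good']:
        return 'No errors found'
    if flags['Is missing']:
        return 'Missing file added'
    if flags['Is duplicate']:
        return 'File deleted'
    parts = []
    if flags['Wrong Date']:
        parts.append('File renamed with correct date')
    if flags['Wrong cal']:
        parts.append('Calibration coefficients were modified')
    # Join with an explicit separator so combined messages stay readable
    return '; '.join(parts)


both = {'Wrong Date': True, 'Wrong cal': True, 'Is missing': False,
        'Is duplicate': False, 'Is good': False}
print(describe_change(both))
```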

In [35]:
def reformat_comdf(comdf):
    comdf['Array'] = comdf['Platform'].apply(generate_arrayName)
    comdf['OOI changeDate'] = comdf['OOI changeDate'].apply(convert_ooi_time)
    comdf['dateRangeStart'] = comdf['dateRangeStart'].apply(convert_ooi_time)
    comdf['dateRangeEnd'] = comdf['dateRangeEnd'].apply(convert_ooi_time)
    comdf['URL'] = comdf['file'].apply(generate_gitHub_url)
    comdf['changeType'] = comdf[['Wrong Date','Wrong cal','Is missing','Is duplicate','Is good']].apply(classify_changeType, axis=1)
    comdf.drop(columns=['Wrong Date','Wrong cal','Is missing','Is duplicate','Is good'], inplace=True)
    return comdf

In [36]:
print(np.unique(metadata_review['CLASS-SERIES']))

['ADCPSJ' 'ADCPSL' 'ADCPSN' 'ADCPTF' 'ADCPTG' 'CTDBPC' 'CTDBPD' 'CTDBPE'
 'CTDBPF' 'CTDBPP' 'DOSTAD' 'FLORDG' 'FLORTD' 'METBKA' 'NUTNRB' 'OPTAAD'
 'PCO2WB' 'PCO2WC' 'PHSEND' 'PHSENE' 'PHSENF' 'PRESFB' 'PRESFC' 'SPKIRB']


In [37]:
instrument = 'CTDBPC'

In [38]:
metadata_communications = pd.DataFrame(columns=cols).drop(columns=['Wrong Date','Wrong cal','Is missing','Is duplicate','Is good'])
metadata_communications

Unnamed: 0,Array,Platform,Node,Instrument,RefDes,Asset ID,Serial Number,deployment,gitHub changeDate,OOI changeDate,file,URL,changeType,dateRangeStart,dateRangeEnd,annotation


In [39]:
error_df = pd.DataFrame(columns=['uid','step'])
error_df

Unnamed: 0,uid,step


In [40]:
uids = np.unique(metadata_review[metadata_review['CLASS-SERIES'] == instrument]['UID'])
print(uids)

['CGINS-CTDBPC-06841' 'CGINS-CTDBPC-07208' 'CGINS-CTDBPC-50002'
 'CGINS-CTDBPC-50003' 'CGINS-CTDBPC-50056' 'CGINS-CTDBPC-50108'
 'CGINS-CTDBPC-50109']


In [41]:
for uid in uids:
    # Step 1: Get the deployment data
    try:
        deploydf = get_deployData(uid)
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[1]})
        error_df = error_df.append(edf)
        continue
    # Step 2: Get the associated cal data for each deployment
    try:
        deploydf = get_calData(uid, deploydf)
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[2]})
        error_df = error_df.append(edf)
        continue
    # Step 3: Reformat some of the deployment data to match OOI naming conventions
    try:
        deploydf['dataSource'] = deploydf['dataSource'].apply(reformat_dataSource)
        deploydf['RefDes'] = deploydf['subsite'] + '-' + deploydf['node'] + '-' + deploydf['sensor']
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[3]})
        error_df = error_df.append(edf)
        continue
    # Step 4:
    try:
        udf = metadata_review[metadata_review['UID'] == uid]
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[4]})
        error_df = error_df.append(edf)
        continue
    # Step 5:
    try:
        udf = udf.merge(deploydf, left_on='New Calibration CSV', right_on='dataSource')
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[5]})
        error_df = error_df.append(edf)
        continue
    # Step 6: Generate the communications dataframe
    try:
        comdf = pd.DataFrame(columns=cols)
        for i in cols:
            if name_map.get(i) is not None:
                comdf[i] = udf[name_map.get(i)]
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[6]})
        error_df = error_df.append(edf)
        continue
    # Step 7: Reformat many of the fields in the communications dataframe
    try:
        comdf = reformat_comdf(comdf)
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[7]})
        error_df = error_df.append(edf)
        continue
    # Step 8: Append the comdf dataframe to the metadata_communications dataframe
    try:
        metadata_communications = metadata_communications.append(comdf)
    except:
        edf = pd.DataFrame.from_dict({'uid':uid,'step':[8]})
        error_df = error_df.append(edf)
        continue
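
The step-by-step error logging in the loop above follows one repeated pattern: run a step, and on failure record the uid and step number and move on to the next uid. A compact sketch of that pattern is shown below. It is a simplification, not a drop-in replacement: `run_steps`, `fake_fetch`, and `fake_format` are hypothetical, each step here takes only the previous step's result, and failures are collected in a plain list instead of being appended to a dataframe (`DataFrame.append` is no longer available in pandas 2.0 and later).

```python
def run_steps(uid, steps):
    """Run (step_number, function) pairs in order, feeding each result to the next.

    Returns (result, None) on success, or (None, step_number) at the first failure.
    """
    result = uid
    for number, func in steps:
        try:
            result = func(result)
        except Exception:
            return None, number
    return result, None


# Hypothetical stand-ins for the real M2M and formatting steps
def fake_fetch(uid):
    if uid.endswith('X'):
        raise ValueError('no deployments found')
    return {'uid': uid}


def fake_format(data):
    return dict(data, RefDes='CP03ISSM-RID27-03-CTDBPC000')


results, errors = [], []
for uid in ['CGINS-CTDBPC-50109', 'CGINS-CTDBPC-0000X']:
    out, failed_step = run_steps(uid, [(1, fake_fetch), (2, fake_format)])
    if failed_step is None:
        results.append(out)
    else:
        errors.append({'uid': uid, 'step': failed_step})

print(errors)
```

The `errors` list can be turned into a dataframe in one call with `pd.DataFrame(errors)` once the loop has finished.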

In [42]:
comdf

Unnamed: 0,Array,Platform,Node,Instrument,RefDes,Asset ID,Serial Number,deployment,gitHub changeDate,OOI changeDate,file,URL,changeType,dateRangeStart,dateRangeEnd,annotation
0,Coastal Pioneer,CP03ISSM,RID27,CTD Pumped: CTDBP Series C,CP03ISSM-RID27-03-CTDBPC000,CGINS-CTDBPC-50109,16-50109,3,,2019-08-23 21:04:25.941,CGINS-CTDBPC-50109__20170718.csv,https://github.com/ooi-integration/asset-manag...,No errors found,2015-10-21 17:44:00,2016-05-14 13:45:00,
1,Coastal Pioneer,CP03ISSM,RID27,CTD Pumped: CTDBP Series C,CP03ISSM-RID27-03-CTDBPC000,CGINS-CTDBPC-50109,16-50109,5,,2019-08-23 21:04:25.941,CGINS-CTDBPC-50109__20170718.csv,https://github.com/ooi-integration/asset-manag...,No errors found,2016-10-11 13:39:00,2017-06-15 15:56:00,
2,Coastal Pioneer,CP03ISSM,RID27,CTD Pumped: CTDBP Series C,CP03ISSM-RID27-03-CTDBPC000,CGINS-CTDBPC-50109,16-50109,7,,2019-08-23 21:04:25.941,CGINS-CTDBPC-50109__20170718.csv,https://github.com/ooi-integration/asset-manag...,No errors found,2017-11-01 12:22:00,2018-03-28 16:56:00,
3,Coastal Pioneer,CP04OSSM,RID27,CTD Pumped: CTDBP Series C,CP04OSSM-RID27-03-CTDBPC000,CGINS-CTDBPC-50109,16-50109,10,,2019-08-23 21:05:48.276,CGINS-CTDBPC-50109__20180609.csv,https://github.com/ooi-integration/asset-manag...,No errors found,2019-04-05 14:55:00,NaT,


In [None]:
metadata_communications

In [None]:
filename = instrument + '_metadata_communications.csv'
filename

In [None]:
metadata_communications.to_csv(filename)

#### Get the relevant deployment info:

Rename the "data source" to match up with the calibration csv naming convention, and also generate the reference designator for the deployments of the instrument:

Now, I can merge the subselected dataframe above with the metadata review based on the key of 'New Calibration CSV'::'dataSource'. This should provide us with all the data needed to fill the spreadsheet (or at least the part that can be filled out from M2M calls).
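
One way to check this merge is to keep every review row and flag the ones with no matching deployment, since those are likely files that have not yet been ingested into OOINet. The sketch below does this with the `indicator` option of `DataFrame.merge`. The two small dataframes are illustrative; in particular, the third review filename is made up to show an unmatched row.

```python
import pandas as pd

review = pd.DataFrame({
    'UID': ['CGINS-CTDBPC-50109'] * 3,
    'New Calibration CSV': [
        'CGINS-CTDBPC-50109__20170718.csv',
        'CGINS-CTDBPC-50109__20180609.csv',
        'CGINS-CTDBPC-50109__20190801.csv',  # made-up name with no deployment
    ],
})
deployments = pd.DataFrame({
    'deploymentNumber': [7, 10],
    'dataSource': [
        'CGINS-CTDBPC-50109__20170718.csv',
        'CGINS-CTDBPC-50109__20180609.csv',
    ],
})

# A left merge keeps all review rows; the _merge column records where each row matched
merged = review.merge(deployments, how='left', left_on='New Calibration CSV',
                      right_on='dataSource', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched['New Calibration CSV'].tolist())
```

The default inner merge used in the loop silently drops such rows, so a check like this helps explain why a reviewed file is absent from the communications spreadsheet.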

**====================================================================================================================**
### Filling the Metadata Communications Spreadsheet
Next, I want to begin filling the metadata communications spreadsheet. This will start by initializing a dataframe with the relevant columns, followed by a name mapping from the metadata tracking sheet::metadata communication spreadsheet.

In [None]:
comdf = pd.DataFrame(columns=cols)
for i in cols:
    if name_map.get(i) is not None:
        comdf[i] = udf[name_map.get(i)]

In [None]:
os.getcwd()

In [None]:
metadata_communications.to_csv('SPKIRB_metadata_communications.csv')