**SRP 04/21/2021:**

**PURPOSE:** Configure underway and discrete files that were extracted from the database.

In [1]:
import pandas as pd
import numpy as np

In [2]:
underway = pd.read_csv('/mnt/storage/labs/mitchell/projects/nasacms2018/analysis/data/gnatsat_workflow/01a-underway-gnats.csv', na_values=-999)
discrete = pd.read_csv('/mnt/storage/labs/mitchell/projects/nasacms2018/analysis/data/gnatsat_workflow/02a-discrete-gnats.csv', na_values=-999)
xbt = pd.read_csv('/mnt/storage/labs/mitchell/projects/nasacms2018/analysis/data/gnatsat_workflow/03-xbt-gnats.csv', na_values=-999)
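
The `na_values=-999` argument tells `pd.read_csv` to treat the `-999` sentinel as missing data, which is what lets the later all-NaN column checks work. A minimal sketch on made-up inline data (the column values here are illustrative, not taken from the GNATS files):

```python
import io
import pandas as pd

# Two-row toy CSV in which -999 marks a missing value.
csv_text = "Temperature,Salinity\n12.5,-999\n-999,31.2\n"

# Same call pattern as the cell above, reading from memory instead of disk.
toy = pd.read_csv(io.StringIO(csv_text), na_values=-999)

# Each column now holds one real value and one NaN.
print(toy.isna())
```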

### UW Formatting:

* Drop unnecessary columns
* Drop all-NaN columns
* Rename columns
* Calculate and insert bb standard error columns
* Check for cruisename and datetime disagreement
* Sort by datetime
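
One way the cruise name and datetime check could be done is to compare each cruise's first and last timestamps and flag cruises whose time ranges overlap. This is only a sketch of one interpretation of "disagreement". The helper `find_overlapping_cruises` and the `Cruise` column name are hypothetical, and the toy data is made up:

```python
import pandas as pd

def find_overlapping_cruises(df, cruise_col, time_col):
    """Return (cruise_a, cruise_b) pairs whose [first, last] time ranges overlap."""
    # One row per cruise with its earliest and latest timestamp, ordered by start.
    spans = df.groupby(cruise_col)[time_col].agg(['min', 'max']).sort_values('min')
    names = spans.index.tolist()
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Because spans are ordered by start, b starts no earlier than a,
            # so the ranges overlap exactly when b starts before a ends.
            if spans.loc[b, 'min'] <= spans.loc[a, 'max']:
                overlaps.append((a, b))
    return overlaps

# Toy data: cruise B starts before cruise A has finished.
toy = pd.DataFrame({
    'Cruise': ['A', 'A', 'B', 'B', 'C'],
    'UWTime': pd.to_datetime([
        '2018-06-01 10:00', '2018-06-01 18:00',
        '2018-06-01 17:00', '2018-06-02 09:00',
        '2018-07-10 12:00',
    ]),
})
overlaps = find_overlapping_cruises(toy, 'Cruise', 'UWTime')
print(overlaps)
```

An empty list would suggest that every timestamp falls inside a time range belonging to a single cruise.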

In [9]:
uw = underway.copy()
uw_rm_cols = ['CruiseID', 'UWStation', 'Cast']
uw_rename_cols = {'Temperature': 'UWTemperature', 'Salinity': 'UWSalinity', 'SigmaTheta': 'UWSigmaTheta'}
uw_nan_cols = uw.columns[uw.isnull().all()].tolist()  # columns with no data at all

# Drop Columns:
uw.drop(columns=uw_rm_cols + uw_nan_cols, inplace=True)

# Rename Columns:
uw.rename(columns=uw_rename_cols, inplace=True)

# BB standard error columns: StErr = Std / sqrt(numSamples), inserted right after each Std column
for name in ['bbprime', 'bbtot532', 'bbacid']:
    uw.insert(uw.columns.get_loc(f'{name}Std') + 1, f'{name}StErr', uw[f'{name}Std'] / np.sqrt(uw['numSamples']))

# Sort Dataframe by Datetime
uw['UWTime'] = pd.to_datetime(uw['UWTime'])
uw.sort_values(by='UWTime', inplace=True, ignore_index=True)

uw.to_csv('/mnt/storage/labs/mitchell/projects/nasacms2018/analysis/data/gnatsat_workflow/01b-underway-formatted-gnats.csv', index=False)
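
The standard error columns in the cell above follow the usual formula $SE = s / \sqrt{n}$, with the `...Std` column as $s$ and `numSamples` as $n$. The sketch below wraps the same `insert` and `get_loc` pattern in a small helper and runs it on made-up numbers. The helper name `insert_std_err` and the `other` column are hypothetical:

```python
import numpy as np
import pandas as pd

def insert_std_err(df, std_col, n_col, err_col):
    """Insert err_col = std_col / sqrt(n_col) immediately after std_col (in place)."""
    position = df.columns.get_loc(std_col) + 1
    df.insert(position, err_col, df[std_col] / np.sqrt(df[n_col]))
    return df

# Toy frame: standard deviations of 0.2 and 0.3 from 4 and 9 samples.
toy = pd.DataFrame({
    'bbprimeStd': [0.2, 0.3],
    'numSamples': [4, 9],
    'other': [1, 2],
})
insert_std_err(toy, 'bbprimeStd', 'numSamples', 'bbprimeStErr')
print(toy)
```

Rows where `numSamples` is NaN produce a NaN standard error, and rows where it is zero produce `inf`, so it may be worth inspecting those cases before using the new columns.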

### Discrete Formatting:

* Drop unnecessary columns
* Drop all-NaN columns
* Rename columns
* Check for cruisename and datetime disagreement
* Sort by datetime
* Note: the discrete file was taken largely from the StationDataTable in the database, which combines Balch Lab discrete samples with CTD data. GNATS cruises have no CTD data, so every CTD-related column is entirely null and is dropped here.
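
As a small illustration of the all-null column drop described in the note, the sketch below builds a toy frame in which two CTD-style columns are empty. The `Chl` column and all values are made up, and `dropna(axis=1, how='all')` is shown only as an equivalent alternative to building the column list by hand:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'StationTime': ['2018-06-01 10:00', '2018-06-01 12:00'],
    'Chl': [1.2, np.nan],               # partly populated, so it is kept
    'Temperature1': [np.nan, np.nan],   # stands in for an empty CTD column
    'Salinity1': [np.nan, np.nan],      # stands in for an empty CTD column
})

# Columns in which every value is null.
all_nan_cols = toy.columns[toy.isnull().all()].tolist()
trimmed = toy.drop(columns=all_nan_cols)

# Equivalent one-liner when the list of dropped names is not needed.
same = toy.dropna(axis=1, how='all')
print(all_nan_cols)
```

Keeping the explicit list makes it easy to log which columns were removed.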

In [10]:
d = discrete.copy()
d_rm_cols = ['StationNumber', 'BalchSampleNumber', 'Niskin', 'TimeFired', 'Forel-Ule']
d_nan_cols = d.columns[d.isnull().all()].tolist()  # columns with no data at all (e.g. CTD variables)

# Drop Columns:
d.drop(columns=d_rm_cols + d_nan_cols, inplace=True)

# Sort DataFrame by datetime
d['StationTime'] = pd.to_datetime(d['StationTime'])
d.sort_values(by='StationTime', inplace=True, ignore_index=True)

d.to_csv('/mnt/storage/labs/mitchell/projects/nasacms2018/analysis/data/gnatsat_workflow/02b-discrete-formatted-gnats.csv', index=False)
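
If rows that share the same `StationTime` need a deterministic order, a secondary sort key can be passed to `sort_values`. The sketch below uses `StationInfoID` as that key on made-up data, assuming the column is present in the discrete file:

```python
import pandas as pd

toy = pd.DataFrame({
    'StationInfoID': [3, 1, 2],
    'StationTime': ['2018-06-01 12:00', '2018-06-01 12:00', '2018-06-01 08:00'],
})
toy['StationTime'] = pd.to_datetime(toy['StationTime'])

# Primary key: StationTime. Ties are broken by StationInfoID.
toy = toy.sort_values(by=['StationTime', 'StationInfoID'], ignore_index=True)
print(toy)
```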

* UWStation matches d.StationNumber
* Temperature is the UW system temperature, from UWDataTable
* Salinity is from the UW system
* SigmaTheta is from the UW system
* StationInfoID matches the XBT StationInfoID
* Longitude and Latitude are from discrete
* Temperature1 and Temperature2 are from the CTD
* Salinity1 and Salinity2 are from the CTD
* SigT1 and SigT2 are from the CTD
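
The key correspondences noted above (for example, `StationInfoID` appearing in both a station-level frame and the XBT frame) can be sanity-checked by listing key values in one frame that have no counterpart in the other. This is a rough sketch on made-up data. The helper `unmatched_keys` and both toy frames are hypothetical:

```python
import pandas as pd

def unmatched_keys(left, left_key, right, right_key):
    """Return the unique values of left[left_key] that never occur in right[right_key]."""
    mask = ~left[left_key].isin(right[right_key])
    return left.loc[mask, left_key].unique().tolist()

# Toy frames: XBT cast 13 has no matching station record.
station_toy = pd.DataFrame({
    'StationInfoID': [10, 11, 12],
    'Latitude': [43.7, 43.6, 43.5],
})
xbt_toy = pd.DataFrame({
    'StationInfoID': [10, 10, 12, 13],
    'Depth': [1.0, 2.0, 1.0, 1.0],
    'Temperature1': [14.2, 13.9, 15.1, 14.8],
})

missing = unmatched_keys(xbt_toy, 'StationInfoID', station_toy, 'StationInfoID')
print(missing)
```

The same helper takes differently named keys, so it could also compare `UWStation` against `StationNumber` before those columns are dropped.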

In [22]:
xbt.columns

Index(['StationInfoID', 'CastType', 'StationTime', 'Latitude', 'Longitude',
       'Depth', 'Temperature1'],
      dtype='object')