# 01 - Request OOI data

We will import the OOI Irminger Sea CTD Cast and Discrete Water Sample Summary dataset from a comma-separated values (CSV) file on the BCO-DMO ERDDAP server.

More info about the BCO-DMO ERDDAP server can be found at: https://guide.bco-dmo.org/access-and-reuse/erddap

The dataset file is 911407_v1_ooi_irminger_sea_discrete_water_sampling_data.csv

In [1]:
import pandas as pd

In [2]:
# Import OOI data to the workspace using the
# ERDDAP link to the public dataset CSV, 
# and display the first 5 rows of the table.
ooi_irm = pd.read_csv(
    "https://erddap.bco-dmo.org/erddap/files/"
    "bcodmo_dataset_911407_v1/"
    "911407_v1_ooi_irminger_sea_discrete"
    "_water_sampling_data.csv"
)
ooi_irm.head()

Unnamed: 0,Cruise,Station,Target_Asset,Start_Latitude,Start_Longitude,Start_Time,Cast,Cast_Flag,Bottom_Depth_at_Start_Position,CTD_File,...,Discrete_pH_Replicate_Flag,Calculated_Alkalinity,Calculated_DIC,Calculated_pCO2,Calculated_pH,Calculated_CO2aq,Calculated_Bicarb,Calculated_CO3,Calculated_Omega_C,Calculated_Omega_A
0,KN221-04,1,Test Site #1,62.107,-31.381667,2014-09-08T11:39:06.000Z,1,*0000000000000100,,KN22104001.hex,...,,,,,,,,,,
1,KN221-04,1,Test Site #1,62.107,-31.381667,2014-09-08T11:39:06.000Z,1,*0000000000000100,,KN22104001.hex,...,,,,,,,,,,
2,KN221-04,1,Test Site #1,62.107,-31.381667,2014-09-08T11:39:06.000Z,1,*0000000000000100,,KN22104001.hex,...,,,,,,,,,,
3,KN221-04,1,Test Site #1,62.107,-31.381667,2014-09-08T11:39:06.000Z,1,*0000000000000100,,KN22104001.hex,...,,,,,,,,,,
4,KN221-04,1,Test Site #1,62.107,-31.381667,2014-09-08T11:39:06.000Z,1,*0000000000000100,,KN22104001.hex,...,,,,,,,,,,
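
The URL passed to `pd.read_csv` above is written as several adjacent string literals, which Python joins into a single string. A minimal sketch of how the pieces fit together is shown below; the variable names `base`, `dataset_id`, and `filename` are illustrative and do not appear in the original notebook.

```python
# Illustrative names (not from the original notebook) for the URL parts
base = "https://erddap.bco-dmo.org/erddap/files/"
dataset_id = "bcodmo_dataset_911407_v1/"
filename = "911407_v1_ooi_irminger_sea_discrete_water_sampling_data.csv"

# Adjacent string literals inside parentheses are concatenated by Python
url_from_literals = (
    "https://erddap.bco-dmo.org/erddap/files/"
    "bcodmo_dataset_911407_v1/"
    "911407_v1_ooi_irminger_sea_discrete"
    "_water_sampling_data.csv"
)

# Explicit concatenation of the named parts gives the same URL
url_from_parts = base + dataset_id + filename
print(url_from_literals == url_from_parts)
```

Splitting the literal this way only keeps the lines short; no separator is inserted between the pieces, so each fragment must carry its own trailing `/` where one is needed.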


In [3]:
# List the names of the 80 columns in the DataFrame
ooi_irm.columns

Index(['Cruise', 'Station', 'Target_Asset', 'Start_Latitude',
       'Start_Longitude', 'Start_Time', 'Cast', 'Cast_Flag',
       'Bottom_Depth_at_Start_Position', 'CTD_File', 'CTD_File_Flag',
       'Niskin_Bottle_Position', 'Niskin_Flag', 'CTD_Bottle_Closure_Time',
       'CTD_Pressure', 'CTD_Pressure_Flag', 'CTD_Depth', 'CTD_Latitude',
       'CTD_Longitude', 'CTD_Temperature_1', 'CTD_Temperature_1_Flag',
       'CTD_Temperature_2', 'CTD_Temperature_2_Flag', 'CTD_Conductivity_1',
       'CTD_Conductivity_1_Flag', 'CTD_Conductivity_2',
       'CTD_Conductivity_2_Flag', 'CTD_Salinity_1', 'CTD_Salinity_2',
       'CTD_Oxygen', 'CTD_Oxygen_Flag', 'CTD_Oxygen_Saturation',
       'CTD_Fluorescence', 'CTD_Fluorescence_Flag', 'CTD_Beam_Attenuation',
       'CTD_Beam_Transmission', 'CTD_Transmissometer_Flag', 'CTD_pH',
       'CTD_pH_Flag', 'Discrete_Oxygen', 'Discrete_Oxygen_Flag',
       'Discrete_Oxygen_Replicate_Flag', 'Discrete_Chlorophyll',
       'Discrete_Phaeopigment', 'Discrete_Fo_

In [4]:
# Select a subset of parameters from the summary spreadsheet
# using the column names printed above. Parameter definitions
# are listed near the bottom of the dataset page under
# "More Information about this dataset" > "Parameters".

# Note: some entries are shortened strings (e.g. "Fluor", "Oxygen")
# that are matched as substrings of the full column names.
subset_vars = [
    "Cruise", "Target_Asset", "Start_Latitude",
    "Start_Longitude", "Start_Time", "Cast",
    "CTD_Bottle_Closure_Time", "CTD_Latitude",
    "CTD_Longitude", "CTD_Pressure", "CTD_Depth",
    "pH", "Nitrate", "Nutrients", "Fluor",
    "Temperature", "Salinity", "Oxygen",
    "Chlorophyll", "Phosphate"
]

In [5]:

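# Build a list of every column whose name contains one of
# the strings in subset_vars, i.e. every parameter
# representing one of the BGC variables of interest.
var_columns = []
for x in subset_vars:
    # Keep the columns whose name contains the substring x
    var_columns.extend(k for k in ooi_irm.columns if x in k)

# Drop repeated names (a column can match more than one
# substring) while preserving the original order
var_columns = list(dict.fromkeys(var_columns))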
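
The substring matching in the cell above can be sketched on a plain list of names, without loading the dataset. The helper `match_columns` and the shortened `example_columns` list are hypothetical and for illustration only; `Discrete_Phosphate` in particular is an assumed name, since the column listing printed earlier is truncated. The sketch also skips a column that has already been matched, so no name appears twice.

```python
def match_columns(columns, substrings):
    """Return the columns whose name contains any of the substrings.

    Matches are grouped by substring, in the order the substrings
    are given, and each column is kept only once.
    """
    matched = []
    for sub in substrings:
        for col in columns:
            if sub in col and col not in matched:
                matched.append(col)
    return matched


# Hypothetical, shortened list of column names for illustration
example_columns = [
    "Cruise", "Cast", "Cast_Flag", "CTD_pH", "CTD_pH_Flag",
    "CTD_Oxygen", "Discrete_Oxygen", "Discrete_Phosphate",
]

selected = match_columns(example_columns, ["Cast", "pH", "Oxygen"])
print(selected)
```

The `in` test is case-sensitive, so `"pH"` matches `CTD_pH` and `CTD_pH_Flag` but not a name such as `Discrete_Phosphate`. It also matches associated flag columns (for example `Cast_Flag`), which may or may not be wanted in the subset.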

In [6]:
# Save a smaller dataset after indexing the
# dataframe with our list of columns
subset_irm = ooi_irm[var_columns]
subset_irm.to_csv("../data/interim/irminger_sea_subset.csv", index=False)
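
`to_csv` writes the file but does not create missing folders, so the call above assumes `../data/interim/` already exists. A minimal sketch of one way to guard against that is shown below; the helper name `ensure_parent_dir` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if missing; return a Path."""
    out_path = Path(path)
    # parents=True creates intermediate folders; exist_ok=True makes
    # the call a no-op when the directory is already there
    out_path.parent.mkdir(parents=True, exist_ok=True)
    return out_path


# Usage with the notebook's DataFrame (not run here):
# out_path = ensure_parent_dir("../data/interim/irminger_sea_subset.csv")
# subset_irm.to_csv(out_path, index=False)
```

The helper only creates the directory; the CSV file itself is still written by `to_csv`, which accepts a `Path` object as well as a string.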

In [7]:
# Note: the interim file can be used in subsequent notebooks.