# A Quick General Walkthrough from Ship to Shore to FOCI Datacenter

Data from FOCI investigations arrives in one of two general ways:  
__delayed mode__ : downloaded upon retrieval of instrumentation  
__realtime mode__ : transmitted via serial connection or satellite uplink  

Realtime data has historically arrived via:
- Argos (CLS America)
- Iridium SBD packets (often ingested via the EDD/SDIG RUDICS systems)
- TELOS
- direct serial-port connections to instrumentation

We also use the following terminology to describe data at various stages of processing:
- preliminary / near real time (NRT): data with minimal QC and minimal metadata, used for quick looks and initial exploration
- working: an intermediate state, not often hosted, where some QC has been applied but is not complete
- final: vetted data with major QC attributes taken care of

Common Data Types:
- Profile data (CTD) at point locations. Usually collected by cruise
- Discrete sampled profile data (CTD nutrient/oxygen/chlorophyll bottles) at point locations. Usually collected by cruise
- Moored timeseries data at point locations.  Organized first by depth, then collected by mooring site
- Moored profile (prawler).  TimeSeriesProfile.  Organized by mooring site.
- Moored gridded profile (binned gridded ADCP). Organized by mooring site.
- TrajectoryProfile (gliders). Organized by campaign

Common Synthesized Datasets:
- merged continuous profile data (CTD - commonly binned to 1m) with discrete data
- gridded (on a common timestep... usually hourly) moored timeseries data
- gridded (in depth and time?) TimeSeriesProfile data (1m and 1hr?)
- gridded (in depth and time?) TrajectoryProfile data (1m)
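
As an illustration of the hourly gridding mentioned above, here is a minimal sketch using pandas. The function name `grid_to_hourly` and the synthetic 10-minute record are made up for this example, and simple hourly averaging is only one possible choice; the gridding method actually used for EcoFOCI products may differ.

```python
import numpy as np
import pandas as pd


def grid_to_hourly(series: pd.Series) -> pd.Series:
    """Average a time-indexed series onto a common hourly timestep.

    Hypothetical helper for illustration; bins are labelled by the start
    of each hour, and hours with no samples are left as NaN.
    """
    return series.resample("1h").mean()


# Synthetic 10-minute record (illustrative values only, not real data).
times = pd.date_range("2020-01-01 00:00", periods=18, freq="10min")
temperature = pd.Series(np.arange(18, dtype=float), index=times, name="temperature")

hourly = grid_to_hourly(temperature)
print(hourly)
```

Averaging keeps gaps visible as missing values instead of interpolating across them, which is usually the safer default before any QC decisions are made.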

_Standard QC_
- baseline corrections to collocated field characterizations / calibrations
- application of most relevant calibration equations
- removal of spikes and singleton outliers (manual edits to a csv file, or see [EcoFOCIpy_1d_filter_example.ipynb](EcoFOCIpy_1d_filter_example.ipynb) for other options)
- removal of known periods of non-ideal operation (engineering issues, contamination, etc... usually from logs and notes)
- secondary characterization and correction equations
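
For the spike and singleton-outlier step, one simple automated option is a rolling-median check. The sketch below is an assumption for illustration only: it is not taken from the referenced notebook, and the function name `flag_spikes`, the window length, and the `max_dev` threshold are all hypothetical values that would need tuning per instrument.

```python
import pandas as pd


def flag_spikes(series: pd.Series, window: int = 5, max_dev: float = 2.0) -> pd.Series:
    """Return a boolean mask that is True where a sample looks like a spike.

    A sample is flagged when it differs from the centered rolling median
    by more than `max_dev` (in the data's own units). Both parameters are
    illustrative defaults, not recommended settings.
    """
    median = series.rolling(window, center=True, min_periods=1).median()
    return (series - median).abs() > max_dev


# Synthetic record with one obvious singleton outlier (not real data).
temperature = pd.Series([10.0, 10.1, 10.0, 25.0, 10.2, 10.1, 10.0])

spikes = flag_spikes(temperature)
cleaned = temperature.mask(spikes)  # flagged samples become NaN
print(cleaned)
```

Masking to NaN rather than dropping rows keeps the time base intact, so later gridding or merging steps still line up.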

## Data Preprocessing

Some instrument output, such as SBE hex files and ADCP raw files, needs to be translated to a known format (csv, ascii). Some vendors (SBE, Seaguards, ADCP) provide software that can do many of the initial calculations needed to obtain oceanographic data from engineering data.  Where possible, use vendor software or vendor equations, as these are normally well documented (or at least point to a source reference for the equations).

## Data Archiving (EcoFOCI-centric)

1. Primary disk space for all data is on the *EcoRAID* raid server 
    - General data structure:  
    {year}/{operation_type}/
2. Primary RESTful access is via erddap (current machine: akutan)
    - Github repo with erddap dataset descriptions: [https://github.com/NOAA-PMEL/EcoFOCI_AutoAnalysis/](https://github.com/NOAA-PMEL/EcoFOCI_AutoAnalysis/pulls). Each entry can be merged with all other entries within a folder to build a datasets.xml
3. Multiple locations for field meta information and media data
    - ecofoci mariadb database (current machine: akutan)
    - media like images (CNSD: mule/pesto)
    - documentation (Github: ecofoci_xxxx, Google Drive: pmel-foci functional account or member drives)
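
The idea of merging per-dataset entries into a single datasets.xml can be sketched as below. This is a rough illustration under assumptions: the folder layout (one `*.xml` fragment per dataset), the function name `build_datasets_xml`, and the bare header and footer are hypothetical and do not describe the actual repo. A real ERDDAP datasets.xml also carries an XML declaration and top-level settings that are omitted here.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_datasets_xml(fragment_dir, header="<erddapDatasets>", footer="</erddapDatasets>"):
    """Concatenate per-dataset XML fragments into one datasets.xml string.

    Hypothetical helper: fragments are read in sorted filename order and
    wrapped in the root element that ERDDAP expects.
    """
    fragments = sorted(Path(fragment_dir).glob("*.xml"))
    body = [path.read_text(encoding="utf-8").strip() for path in fragments]
    return "\n".join([header, *body, footer]) + "\n"


# Demonstration with two throwaway fragments (made-up dataset IDs).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b_mooring.xml").write_text(
        '<dataset type="EDDTableFromNcFiles" datasetID="demo_b"></dataset>\n',
        encoding="utf-8",
    )
    Path(tmp, "a_ctd.xml").write_text(
        '<dataset type="EDDTableFromNcFiles" datasetID="demo_a"></dataset>\n',
        encoding="utf-8",
    )
    datasets_xml = build_datasets_xml(tmp)

# Parse the result to confirm it is well-formed and list the dataset IDs.
root = ET.fromstring(datasets_xml)
dataset_ids = [dataset.get("datasetID") for dataset in root]
print(dataset_ids)
```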

## Data QC'ing (EcoFOCI-centric)

1. Two workflows (really similar)
    - erddap driven (griddap can go via xarray; tabledap will be via pandas)
    - raw data (often netcdf) driven
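
A minimal sketch of the erddap-driven route for tabledap data is shown below. The server address, dataset ID, and variable names are placeholders invented for the example (the real server URL is not given in this document), and the helper names are hypothetical. The code only builds the request URL; the `read_tabledap` function shows how pandas would read it but is not called here because it needs network access.

```python
from urllib.parse import quote

import pandas as pd


def tabledap_csv_url(server, dataset_id, variables, constraints=()):
    """Build a tabledap .csv request URL.

    Constraints such as 'time>=2020-01-01T00:00:00Z' are percent-encoded
    so that characters like '>' survive the HTTP request.
    """
    query = ",".join(variables)
    for constraint in constraints:
        query += "&" + quote(constraint, safe="=")
    return f"{server}/tabledap/{dataset_id}.csv?{query}"


def read_tabledap(url):
    """Read a tabledap .csv response into a DataFrame (requires network).

    ERDDAP .csv responses carry a units row directly under the column
    names, so that row is skipped.
    """
    return pd.read_csv(url, skiprows=[1])


# Placeholder server and dataset ID, not a real endpoint.
url = tabledap_csv_url(
    "https://example.invalid/erddap",
    "demo_dataset",
    ["time", "temperature"],
    constraints=["time>=2020-01-01T00:00:00Z"],
)
print(url)
```

For griddap datasets the analogous step would be `xarray.open_dataset` pointed at the dataset's griddap URL, which ERDDAP serves over OPeNDAP.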