# SV3 Data Preprocessing 

This notebook walks a user through selecting a campaign and preprocessing all of the raw data into the inputs required to run GARPOS.  

It covers only the steps for processing SV3 data.  

In [None]:
import os
from pathlib import Path

from es_sfgtools.processing.pipeline import DataHandler
from es_sfgtools.utils.archive_pull import load_site_metadata
from es_sfgtools.utils.loggers import BaseLogger


### Confirm required environment variables are set
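The two checks in this section (that `LD_LIBRARY_PATH` is set and that `pdp3` is on the `PATH`) can also be scripted in Python, which is convenient outside a notebook. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper name and is not part of es_sfgtools.

```python
import os
import shutil


def check_environment(env=None):
    """Report the library path and the location of the pdp3 executable.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to exercise without touching the real one.
    """
    env = os.environ if env is None else env
    return {
        # An unset or empty variable is reported as None
        "LD_LIBRARY_PATH": env.get("LD_LIBRARY_PATH") or None,
        # shutil.which returns the full path to pdp3, or None if not found
        "pdp3": shutil.which("pdp3", path=env.get("PATH", "")),
    }


for name, value in check_environment().items():
    print(f"{name}: {value if value else 'NOT FOUND'}")
```

A `NOT FOUND` entry means the corresponding shell check in this section would also come back empty.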

In [None]:
# LD_LIBRARY_PATH must be set correctly for the GO executables that translate novatel to rinex

# Linux
!echo $LD_LIBRARY_PATH

In [None]:
# this confirms PRIDE-PPPAR is in the PATH
!which pdp3

## Step 1. Initial Setup


#### Browse available campaigns from the community archive and select target
- Locate the campaign of interest in https://gage-data.earthscope.org/archive/seafloor, and note the `network`, `station`, and `campaign` names, which will be input in the cell below.  
- Note: the cascadia-gorda raw data is currently hidden (by request) but still usable. The available campaigns are:

|  | GCC1 | NBR1 | NCC1 |
|---|---|---|---|
| **2022** | 2022_A_1065 | 2022_A_1065 | 2022_A_1065 |
| **2023** | 2023_A_1063 | 2023_A_1063 | 2023_A_1063 |
| **2024** | 2024_A_1126 | 2024_A_1126 | 2024_A_1126 |
- In order to use this notebook to process new campaigns, the data must first be submitted and made available from the community archive 
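
Before filling in the survey parameters, it can help to check the chosen station and campaign against the table above. The sketch below hard-codes that table as a dictionary; `AVAILABLE_CAMPAIGNS` and `validate_selection` are hypothetical names used for illustration only and do not query the archive.

```python
# Campaigns listed in the table above for the cascadia-gorda network
AVAILABLE_CAMPAIGNS = {
    "GCC1": ["2022_A_1065", "2023_A_1063", "2024_A_1126"],
    "NBR1": ["2022_A_1065", "2023_A_1063", "2024_A_1126"],
    "NCC1": ["2022_A_1065", "2023_A_1063", "2024_A_1126"],
}


def validate_selection(station, campaign):
    """Return (station, campaign) if the pair appears in the table."""
    if station not in AVAILABLE_CAMPAIGNS:
        raise ValueError(
            f"Unknown station {station!r}; choose from {sorted(AVAILABLE_CAMPAIGNS)}"
        )
    if campaign not in AVAILABLE_CAMPAIGNS[station]:
        raise ValueError(
            f"Campaign {campaign!r} is not listed for {station}; "
            f"choose from {AVAILABLE_CAMPAIGNS[station]}"
        )
    return station, campaign


print(validate_selection("NCC1", "2023_A_1063"))
```

A misspelled campaign name then fails immediately with a readable message instead of surfacing later as an empty catalog.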

In [None]:
# Input survey parameters
network='cascadia-gorda'
site='NCC1'
campaign='2023_A_1063'

# Set data directory path for local environment
data_dir = Path("~/data/sfg").expanduser()
data_dir.mkdir(parents=True, exist_ok=True)

#### USE THE FOLLOWING DEFAULTS UNLESS CHANGES ARE DESIRED ####
data_handler = DataHandler(directory=data_dir)
data_handler.change_working_station(network=network, station=site, campaign=campaign)
BaseLogger.set_dir(data_handler.station_log_dir)

pipeline, config = data_handler.get_pipeline_sv3()


## Step 2. Inventory available data and its location
This step checks the archive and creates an inventory of what is available for the selected campaign.

In [None]:
data_handler.update_catalog_from_archive()

In [None]:
# See which files already exist locally
data_type_counts = data_handler.get_dtype_counts()
print("Local data directory contains the following:")
for data_type, count in data_type_counts.items():
    print(f"    {data_type}: {count}")

## Step 3. Pull data from remote archive

#### Download files if not already present
The available file types depend on whether the data was collected with an SV2 or SV3 waveglider.  
You can download the default file types, or specify a single type to download.

![Alt text](garpos_flow.png)
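
The idea of downloading files only if they are not already present can be illustrated with a small standalone sketch. It is not how `download_data` is implemented (that is not shown in this notebook). `missing_files` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_files(remote_names, local_dir):
    """Return the remote file names that do not yet exist in local_dir."""
    local_dir = Path(local_dir)
    return [name for name in remote_names if not (local_dir / name).is_file()]


# Demo with a temporary directory that already holds one of two files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.raw").touch()
    print(missing_files(["a.raw", "b.raw"], tmp))  # prints ['b.raw']
```

Only the names returned by such a check would need to be fetched, so re-running the download step stays cheap.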

In [None]:
####### Download default file types for SV2 or SV3

data_handler.download_data()

####### OR Download the files by type

# data_handler.download_data(file_type='sonardyne', show_details=False)
# data_handler.download_data(file_type='novatel', show_details=False)
# data_handler.download_data(file_type='master', show_details=False)
# data_handler.download_data(file_type='svpavg', show_details=False)
# data_handler.download_data(file_type='leverarm', show_details=False)
# data_handler.download_data(file_types='dfop00')
# data_handler.download_data(file_types='novatel770')
# data_handler.download_data(file_types='novatel000')

## Step 4. Process raw files and build GARPOS observation input (shotdata)

### 4.1 Read DFOP00 files containing ping/reply sequences, write them to the shotdata tiledb array
Config options: 
- override: bool = Field(False, title="Flag to Override Existing Data")
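
The `override` flag appears in most of the processing steps below. The sketch assumes the usual meaning of such a flag: with `override=False`, existing results are reused, and with `override=True`, they are regenerated. The pipeline's actual behavior is not documented here, and `should_process` is a hypothetical helper.

```python
def should_process(output_exists: bool, override: bool = False) -> bool:
    """Decide whether a processing step needs to run.

    Assumption: a step is skipped only when its output already exists
    and the caller has not asked to override it.
    """
    return override or not output_exists


# The four possible combinations
for exists in (False, True):
    for override in (False, True):
        print(f"exists={exists}, override={override} -> {should_process(exists, override)}")
```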

In [None]:
config.dfop00_config.override=False
pipeline.config = config
pipeline.process_dfop00()

### 4.2 Read the NovAtel range messages and normalize the observations to tiledb
Config options: 
- override: bool = Field(False, title="Flag to Override Existing Data")
- n_processes: int = Field(default_factory=cpu_count, title="Number of Processes to Use")

In [None]:
config.novatel_config.override=False
pipeline.config = config
pipeline.pre_process_novatel()

### 4.3 Generate daily RINEX 2.11 files from the tiledb observations
Config options: 
- override: bool = Field(False, title="Flag to Override Existing Data")
- override_products_download: bool = Field(False, title="Flag to Override Existing Products Download")
- n_processes: int = Field(default_factory=cpu_count, title="Number of Processes to Use")
- settings_path: Optional[Path] = Field("", title="Settings Path")
- time_interval: Optional[int] = Field(1, title="Tile to Rinex Time Interval [h]")
- processing_year: Optional[int] = Field(default=-1,description="Processing year to query tiledb",le=2100)
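
Producing daily files means splitting the observation epochs at day boundaries. The sketch below shows that grouping with the standard library only. It is an illustration under the assumption that epochs are already `datetime` objects in a single time scale, not the pipeline's own tiledb query logic; `group_by_day` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List


def group_by_day(epochs: Iterable[datetime]) -> Dict[date, List[datetime]]:
    """Bin observation epochs by calendar day."""
    days = defaultdict(list)
    for epoch in epochs:
        days[epoch.date()].append(epoch)
    return dict(days)


epochs = [
    datetime(2023, 6, 1, 23, 59, 59),
    datetime(2023, 6, 2, 0, 0, 0),
    datetime(2023, 6, 2, 12, 0, 0),
]
for day, members in sorted(group_by_day(epochs).items()):
    print(day, len(members))
```

Each key would correspond to one daily file, with the listed epochs as its contents.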

In [None]:
config.rinex_config.override=False
pipeline.config = config
pipeline.get_rinex_files()

### 4.4 Process the RINEX files using PRIDE-PPPAR to solve for waveglider positions
Config options: 
- system (str): The GNSS system(s) to use. Default is "GREC23J", which is "GPS/GLONASS/Galileo/BDS/BDS-2/BDS-3/QZSS".
- frequency (list): The GNSS frequencies to use. Default is ["G12", "R12", "E15", "C26", "J12"]. Refer to Table 5-4 in the PRIDE-PPP-AR v.3.0 manual for more options.
- loose_edit (bool): Disable strict editing mode; use this when the quality of high-dynamic data is poor. Default is True.
- cutoff_elevation (int): The elevation cutoff angle in degrees (0-60 degrees). Default is 7.
- start (datetime): The start time used for processing. Default is None.
- end (datetime): The end time used for processing. Default is None.
- interval (float): Processing interval in seconds, from 0.02 to 30. If it is not specified, the interval from the configuration file is used when one is given; otherwise the sampling rate of the observation file is used.
- high_ion (bool): Apply the 2nd-order ionospheric delay correction using CODE's GIM product. When this option is not set, no higher-order ionospheric correction is performed. Default is False.
- tides (str): One or more of "S", "O", "P" (solid, ocean, and polar tides), e.g. "SO" for solid and ocean tides. Default is "SOP", which uses all tides.
- override: bool = Field(False, title="Flag to Override Existing Data")
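
The value ranges listed above can be checked before a long PRIDE-PPPAR run is started. The following sketch mirrors the documented options and defaults in a plain dataclass. `PrideOptionsSketch` is a hypothetical stand-in written for illustration, not the `pride_config` class that es_sfgtools provides, and the validation rules are taken only from the descriptions above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PrideOptionsSketch:
    system: str = "GREC23J"
    frequency: List[str] = field(
        default_factory=lambda: ["G12", "R12", "E15", "C26", "J12"]
    )
    loose_edit: bool = True
    cutoff_elevation: int = 7
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    interval: Optional[float] = None
    high_ion: bool = False
    tides: str = "SOP"
    override: bool = False

    def __post_init__(self):
        # Elevation cutoff is documented as 0-60 degrees
        if not 0 <= self.cutoff_elevation <= 60:
            raise ValueError("cutoff_elevation must be between 0 and 60 degrees")
        # Processing interval is documented as 0.02 s to 30 s
        if self.interval is not None and not 0.02 <= self.interval <= 30:
            raise ValueError("interval must be between 0.02 and 30 seconds")
        # Tides must be one or more of S, O, P
        if not self.tides or set(self.tides) - set("SOP"):
            raise ValueError('tides must contain one or more of "S", "O", "P"')
        # Assumption: when both are given, the window must be non-empty
        if self.start and self.end and self.end <= self.start:
            raise ValueError("end must be later than start")


options = PrideOptionsSketch(cutoff_elevation=10, tides="SO", interval=1.0)
print(options)
```

Out-of-range values raise a `ValueError` at construction time rather than partway through processing.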

In [None]:
config.pride_config.override=False
pipeline.config = config   
pipeline.process_rinex()

### 4.5 Read the PRIDE position results (kin) files and write them to tiledb

In [None]:
pipeline.process_kin()

### 4.6 Merge the PPP position solutions back into the shotdata.  
This step interpolates the 1 Hz position solutions to the time of each ping and reply so that the waveglider can be positioned accurately. It can take some time.

Config options: 
- override: bool = Field(False, title="Flag to Override Existing Data")
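
To make the interpolation concrete, the sketch below linearly interpolates a 3-component position between the two solutions that bracket a ping time. Linear interpolation is an assumption chosen for illustration, since the scheme used by `update_shotdata` is not described here. `interpolate_position` and the sample values are hypothetical.

```python
from bisect import bisect_left


def interpolate_position(times, positions, t):
    """Linearly interpolate a position at time t.

    times: sorted list of epoch times in seconds (e.g. 1 Hz solutions)
    positions: list of (x, y, z) tuples, one per entry in times
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the span of the position solutions")
    i = bisect_left(times, t)  # first index with times[i] >= t
    if times[i] == t:
        return tuple(positions[i])
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)  # fractional distance between the two epochs
    return tuple(p0 + w * (p1 - p0) for p0, p1 in zip(positions[i - 1], positions[i]))


times = [0.0, 1.0, 2.0]
positions = [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0), (2.0, 4.0, 8.0)]
print(interpolate_position(times, positions, 0.25))  # prints (0.25, 0.5, 1.0)
```

Ping and reply times rarely fall exactly on a 1 Hz epoch, which is why each shot needs its own interpolated position.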

In [None]:
config.position_update_config.override=False
pipeline.config = config  
pipeline.update_shotdata()