This repository contains code and data for the paper "A Neural-Mean Vecchia Gaussian Process for Unified Argo Modeling". We implement the methods from two related works, "Locally stationary spatio-temporal interpolation of Argo profiling float data" (MWGP) and "A functional-data approach to the Argo data" (KFD), and compare their performance against our proposed approach (VGP): a one-stop Gaussian process regression based on the Vecchia approximation.
Large data files in this repository are tracked using Git Large File Storage (Git LFS). Downloading the repository as a ZIP file from GitHub will not include the full data. Please clone the repository with Git and ensure Git LFS is installed.
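A ZIP download, or a clone made without Git LFS, leaves small text pointer files in place of the real data. The sketch below is one way to detect that before running any script. It relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The helper names and the default suffix list are assumptions made for illustration, not part of this repository; check `.gitattributes` for the patterns that are actually tracked.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an unfetched Git LFS pointer file."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(root, suffixes=(".mat", ".RData")):
    """List files under `root` with the given suffixes that are still pointers.

    The default suffixes are an assumption about which files are LFS-tracked.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("./data")` from the repository root should return an empty list once `git lfs pull` has completed; any path it returns still needs to be fetched.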
Running MWGP or VGP for the global scenario on a personal laptop is not recommended due to high computational cost.
- One `.mat` file containing the Argo dataset preprocessed by Kuusela and Stein, along with a `.m` file containing the MATLAB code used to extract the three-month subset (January–March). Both are imported from https://github.com/mkuusela/ArgoMappingPaper with minor modifications to save two more variables: `profMonthAggr` and `profModeAggr`. Running the `.m` file requires first downloading `RG_ArgoClim_Temperature_2019.nc` from https://sio-argo.ucsd.edu/RG/RG_ArgoClim_Temperature_2019.nc.gz
- Two `.RData` files, which serve as input data for the scenario studies presented in the paper:
  - `RG_Defined.RData` stores a mean field from the RG Argo Climatology (2004-2018); we use it to determine the grid points at which a mean field is defined.
  - `jan_march_data.RData` is generated by running `process_data.R` and stores the data used to implement the three methods compared in the paper: MWGP, KFD, and VGP.
- `select_subregion.R`: Script used to generate input data for the regional scenario studies.
- One `.txt` file: A fitted mean field shared by Kuusela and Stein, which stores the grid points on which mean fields are defined when implementing MWGP.
- Several `.R` files used to implement Model 2 and Model 5 in "Locally stationary spatio-temporal interpolation of Argo profiling float data". These scripts are primarily based on the MATLAB code available at https://github.com/mkuusela/ArgoMappingPaper.
- `./FitMeanFields/`: The fitted mean fields for the global scenario are saved in `./FitMeanFields/glob`. For regional scenarios, the fitted mean fields are saved in `./FitMeanFields/sub`.
- Several `.R` files prefixed with numbers, indicating the order in which they should be executed when implementing the method from "A functional-data approach to the Argo data". The files are imported from https://github.com/dyarger/argofda with minor modifications to accommodate our 80/20 training–testing setup.
- `./functions/`: R scripts imported from https://github.com/dyarger/argofda, which define functions required for the functional data approach.
- `Preprocessing.R`: Script used to generate CSV files to be saved in `./data`.
- `./pkg`: A Python package that can be installed locally.
- `./experiments`: Contains scripts for applying Vecchia GP regression with a neural mean to Argo data.
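
For readers unfamiliar with the approach, the following is a sketch of the standard Vecchia approximation combined with a parametric mean. It is not taken from this repository's code. The notation is assumed here: $m_\theta$ denotes the neural mean, $K$ the covariance matrix of the observations (including any nugget), and $c(i)$ the conditioning set of observation $i$. How observations are ordered and how conditioning sets are chosen in `./pkg` is not described in this README.

```latex
% Vecchia approximation: replace the joint Gaussian density with a product of
% low-dimensional conditionals, each conditioning on a small set c(i) of
% previously ordered observations, c(i) \subseteq \{1,\dots,i-1\}, |c(i)| \le m.
p(y_1,\dots,y_n) \approx \prod_{i=1}^{n} p\bigl(y_i \mid y_{c(i)}\bigr),
\qquad
y_i \mid y_{c(i)} \sim \mathcal{N}\bigl(\mu_{i \mid c(i)},\, \sigma^2_{i \mid c(i)}\bigr),

% Conditional mean and variance follow from the usual Gaussian conditioning formulas.
\mu_{i \mid c(i)} = m_\theta(x_i)
  + K_{i,c(i)} K_{c(i),c(i)}^{-1} \bigl(y_{c(i)} - m_\theta(x_{c(i)})\bigr),
\qquad
\sigma^2_{i \mid c(i)} = K_{i,i} - K_{i,c(i)} K_{c(i),c(i)}^{-1} K_{c(i),i}.
```

Because each factor involves at most $m$ conditioning points, the cost of evaluating the likelihood grows linearly in $n$ for fixed $m$, rather than cubically as for the exact Gaussian likelihood.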
- `git clone https://github.com/BrowNian6/Argo`
- `git lfs install`
- `git lfs pull`
- `cd ./data/`
- `Rscript select_subregion.R 1`
- `Rscript select_subregion.R 2`
- `cd ./MWGP`
- run `Preprocessing.R` in RStudio after specifying the region of interest on Line 2 of the script
- run `MonthlyMeanF_glob.R` or `MonthlyMeanF_subregion.R` to fit monthly mean fields for the region of interest
- train and test MWGP models under the scenario of interest:
  - `model2_train_glob.R` and `model2_test_glob.R`: Training and testing under the global scenario using Model 2 (MWGP-S).
  - `model5_train_glob.R` and `model5_test_glob.R`: Training and testing under the global scenario using Model 5 (MWGP-ST).
  - `model2_train_subregion.R` and `model2_test_subregion.R`: Training and testing under the regional scenarios using Model 2 (MWGP-S).
  - `model5_train_subregion.R` and `model5_test_subregion.R`: Training and testing under the regional scenarios using Model 5 (MWGP-ST).
- `cd ./KFD`
- run `00_preprocessing.R` in RStudio after specifying the region of interest on Line 2 of the script
- after specifying the region of interest on Line 1 of each script, run the R scripts sequentially, from the files named `01` through `07`
- after specifying the region of interest on Line 1 of `08_load_nugget.R` and `09_predTemp.R`, test the trained KFD model by running `09_predTemp.R`
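
Once the region of interest has been set by hand in each script, the sequential run of the `01` through `07` files can be sketched as below. This is a hypothetical helper, not part of the repository. It assumes the scripts follow a two-digit `NN_name.R` naming pattern (as `00_preprocessing.R` and `09_predTemp.R` do) and that they can run non-interactively through `Rscript`, which has not been verified here.

```python
import re
import subprocess

# Two-digit numeric prefix followed by an underscore, e.g. "03_something.R".
PREFIX = re.compile(r"^(\d{2})_.*\.R$")


def scripts_in_order(names, first=1, last=7):
    """Return script names whose numeric prefix lies in [first, last], sorted."""
    picked = []
    for name in names:
        m = PREFIX.match(name)
        if m and first <= int(m.group(1)) <= last:
            picked.append((int(m.group(1)), name))
    return [name for _, name in sorted(picked)]


def run_all(names, dry_run=True):
    """Print (dry run) or execute each script with Rscript, stopping on failure."""
    for name in scripts_in_order(names):
        cmd = ["Rscript", name]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, `run_all(os.listdir("."))` inside `./KFD` (after `import os`) prints the commands in execution order; passing `dry_run=False` runs them and stops at the first script that exits with an error.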
- `cd ./VGP`
- run `Preprocessing.R` in RStudio
- (optional) create a Python virtual environment:
  - `python3 -m venv venv`
  - `source venv/bin/activate`
- install dependencies:
  - `pip install torch`
  - `pip install scikit-learn`
  - `pip install pandas`
- install the local package:
  - `cd ./VGP/pkg/`
  - `pip install .`
- run an experiment:
  - `cd ./VGP/experiments`
  - `python3 argo.py <domain> <time_span> <depth_level>`
  - `domain` should be one of `["global", "region1", "region2"]`
  - `time_span` should be one of `["Feb", "Jan-Mar"]`
  - `depth_level` should be one of `["10", "300", "1500"]`
  - for example, `python3 argo.py global Feb 10`
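
To run several scenarios in a row, the allowed argument values listed above can be enumerated programmatically. The sketch below only builds and validates command lines. It makes no assumption about what `argo.py` does internally, and the helper names `build_command` and `all_commands` are hypothetical.

```python
from itertools import product

# Allowed values as listed in this README.
DOMAINS = ("global", "region1", "region2")
TIME_SPANS = ("Feb", "Jan-Mar")
DEPTH_LEVELS = ("10", "300", "1500")


def build_command(domain, time_span, depth_level):
    """Validate the three arguments and return the argv list for argo.py."""
    depth_level = str(depth_level)
    if domain not in DOMAINS:
        raise ValueError(f"domain must be one of {DOMAINS}, got {domain!r}")
    if time_span not in TIME_SPANS:
        raise ValueError(f"time_span must be one of {TIME_SPANS}, got {time_span!r}")
    if depth_level not in DEPTH_LEVELS:
        raise ValueError(f"depth_level must be one of {DEPTH_LEVELS}, got {depth_level!r}")
    return ["python3", "argo.py", domain, time_span, depth_level]


def all_commands(domains=DOMAINS):
    """Return one command per (domain, time_span, depth_level) combination."""
    return [
        build_command(d, t, z)
        for d, t, z in product(domains, TIME_SPANS, DEPTH_LEVELS)
    ]
```

Since the global scenario is expensive, `all_commands(domains=("region1", "region2"))` restricts the list to the twelve regional runs; each entry can then be passed to `subprocess.run(cmd, check=True)` from `./VGP/experiments`.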