Cross-validation-for-Geospatial-Data

This repository hosts datasets and code for the paper "Cross-validation for Geospatial Data: A Framework for Estimating Generalization Performance in Geostatistical Problems". We compared the performance of five cross-validation (CV) methods - standard K-Fold CV (KFCV), BLocking CV (BLCV), BuFfered CV (BFCV), Importance-Weighted CV (IWCV) and our proposed Importance-weighted Buffered CV (IBCV) - in various geospatial scenarios.

Datasets

We provide six simulation datasets and 15 real datasets. Use the following abbreviations as [dataset name] on the command line.

Simulation: sim_sd, sim_si, sim_sdcs, sim_sics, sim_sirs, sim_sipcs
HEWA1800: hewa1800_sd, hewa1800_si, hewa1800_sdcs, hewa1800_sics
HEWA1000: hewa1000_sd, hewa1000_si, hewa1000_sdcs, hewa1000_sics
WETA1800: weta1800_sd, weta1800_si, weta1800_sdcs, weta1800_sics
Alaska: alaska
Housing: house_bay, house_latitude

Environment Installation

To run the code, install the dependencies listed in requirements.txt.

pip install -r requirements.txt

Basic usage

To compute the model errors and the error estimates from the five CV methods on a specific dataset:

python run.py --dataset [dataset name]

For example, for the Simulation Scenario SD (sim_sd) dataset:

python run.py --dataset sim_sd

The results are saved to a CSV file automatically.
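To run several datasets in one go, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of the repository: it assumes run.py sits in the current directory and accepts only the --dataset flag shown above, and the names DATASETS, all_dataset_names, build_command and run_all are hypothetical helpers introduced for illustration.

```python
import subprocess
import sys

# Dataset abbreviations copied from the Datasets section of this README.
DATASETS = {
    "Simulation": ["sim_sd", "sim_si", "sim_sdcs", "sim_sics", "sim_sirs", "sim_sipcs"],
    "HEWA1800": ["hewa1800_sd", "hewa1800_si", "hewa1800_sdcs", "hewa1800_sics"],
    "HEWA1000": ["hewa1000_sd", "hewa1000_si", "hewa1000_sdcs", "hewa1000_sics"],
    "WETA1800": ["weta1800_sd", "weta1800_si", "weta1800_sdcs", "weta1800_sics"],
    "Alaska": ["alaska"],
    "Housing": ["house_bay", "house_latitude"],
}


def all_dataset_names():
    """Flatten the groups into one list of valid [dataset name] values."""
    return [name for group in DATASETS.values() for name in group]


def build_command(name):
    """Return the argv list for one run, rejecting unknown dataset names."""
    if name not in all_dataset_names():
        raise ValueError(f"unknown dataset name: {name!r}")
    # sys.executable keeps the run inside the current Python environment.
    return [sys.executable, "run.py", "--dataset", name]


def run_all(names=None):
    """Run run.py once per dataset; stop at the first failing run."""
    for name in names or all_dataset_names():
        subprocess.run(build_command(name), check=True)
```

Calling run_all(DATASETS["Simulation"]) would then launch the six simulation runs one after another, and run_all() with no argument would cover all 21 datasets.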

Options

To run any of the following scripts, first install the dependencies in requirements_extra.txt.

gen_sim: Produces the simulation datasets. Users can generate as many simulations as they want with sim, and can also change the number of sampling points and the sampling strategy.
bcv: Splits the training set into blocks based on their geocoordinates, and then assigns the blocks to folds for cross-validation. Users can tune two hyperparameters: the number of folds with k and the block size with bs.
cramer: Performs the statistical test on the training and test features and reports the test statistic and p-value. Users can set the significance level with alpha.
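The blocking step that bcv describes can be illustrated in a few lines of Python. This is a minimal sketch of the general idea only, not a port of bcv.r: it assumes square blocks of side bs in the units of the coordinates and a random, round-robin assignment of blocks to k folds. The function name block_folds and the seed parameter are hypothetical.

```python
import math
import random


def block_folds(coords, k=5, bs=1.0, seed=0):
    """Return one fold index per point, assigned via square spatial blocks.

    coords is a sequence of (x, y) pairs; bs is the block side length.
    """
    min_x = min(x for x, _ in coords)
    min_y = min(y for _, y in coords)
    # Grid cell (column, row) that each point falls into.
    cells = [
        (math.floor((x - min_x) / bs), math.floor((y - min_y) / bs))
        for x, y in coords
    ]
    blocks = sorted(set(cells))
    if len(blocks) < k:
        raise ValueError("fewer occupied blocks than folds; reduce k or bs")
    # Shuffle the blocks, then deal them out round-robin so that every
    # fold receives a similar number of blocks.
    random.Random(seed).shuffle(blocks)
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    # Points inherit the fold of their block, so a block is never split.
    return [fold_of_block[cell] for cell in cells]
```

For a given fold f, the test indices are the positions where the returned list equals f, and the remaining positions form the training set. Because whole blocks move together, nearby points tend to end up on the same side of the split, which is the point of blocking.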
