Skip to content

Latest commit

 

History

History
112 lines (75 loc) · 5.86 KB

methods.md

File metadata and controls

112 lines (75 loc) · 5.86 KB

Methods

Data Access

Retrieve the [Salvage Database]((https://filelib.wildlife.ca.gov/Public/salvage/)

Source data are freely available from the California Department of Fish and Wildlife Fish Salvage Monitoring Program on a public data server.

The bulk of the data access protocol involves converting the .accdb salvage database file on the remote server to a local set of .csv files named by the tables in the database. We accomplish this in two lines of code by pulling and then running a stable Docker software container that contains a set of bash scripts designed specifically for this task. The specific image used for data access is called accessor, is freely available on Docker Hub, and has default setting configured for the salvage database. Code for the construction of the accessor image is available in its repository.

For accessability and reproducibility, we provide an up-to-date version of the salvage data as .csvs from the "current" (1993 - Present) salvage database file (Salvage_data_FTP.accdb). The data can be downloaded via various methods from the repository, including from the website.

Updates are executed via cron jobs on Github Actions and pushed to GitHub as tagged Releases.


Build A Salvage Container

To use the current image to generate an up-to-date container with data for yourself

  1. Install Docker
    • Specific instructions vary depending on OS
  2. Open up a docker-ready terminal
  3. Download the image
sudo docker pull dapperstats/salvage
  1. Build the container
sudo docker container run -ti --name salvage dapperstats/salvage
  1. Copy the data out from the container
sudo docker cp salvage:/data .

Bring the Data into R

An additional conversion makes the data available in R as a list of data.frames that is directly analogous to the .accdb database of tables.

The reading into R is conducted via functions included in the r_functions.R script in the accessor image and available in the public code repository.


Within a Salvage Container

Building on the list above, you can leverage the r_script.R script included in the image, which sources the r_functions.R script and loads the database in as an R object named database. Docker provides ample access and avenues to run R within the container. For example, the docker exec command opens a full API for running within the top (read/write) layer of the container, allowing an endless supply of R code to be included within a single character input:

  1. Run a bash R script from the command line
sudo docker exec -i salvage R -e "source('scripts/r_script.R'); <additional R code>"

You can copy your own scripts into the image and then run them from that environment:

  1. Copying a script into the image
sudo docker cp path/to/my_r_script.R salvage:/scripts
  1. Running the script in the image
sudo docker exec -i salvage R -e "source('scripts/r_script.R'); source('scripts/my_r_script.R')"

Note that we are running the main r_script.R first still here; that script does not save any files externally, so the R session in 6. is gone when we run 8.. For the sake of simplifying the command line call, it is therefore recommended that expanded uses follow 8. and use my_r_script.R as a hub file that directs all of your specific functions, including files saved out from R. For simplicity, saving out all files into a single folder, e.g. output allows a single docker command for retrieving the results from the top layer of the container:

  1. Copy the output out from the container
sudo docker cp salvage:/output .

Working in an Open R Session

Alternatively, the functions are written in only base R, so they should be reproducibly functional outside the image (in an open session).

Within an instance of R, navigate to where you have read the data out from the container into/where this code repository located and source r_script.R:

  1. Load the functions and read in the data
source("scripts/r_script.R")

The resulting database object is a named list of the database's tables, ready for analyses.


Data Preparation

Data preparation code is in development!

Having brought the data into R as-is, we can now prepare them for summaries and analyses. We use the functions included in the salvage_functions.R R script, which is included within the and salvage docker image, which provides a stable runtime environment for the analyses and output generation (including website rendering).


Continuous Deployment

Updates are executed via cron jobs on Github Actions with a workflow described by the docker_cron.yaml file.