sensorQC is a flexible framework for QAQCing high-frequency data for a continuously evolving catalogue of sensors

Installation

To install the stable version of the sensorQC package with its dependencies:

install.packages("sensorQC", 
    repos = c("http://owi.usgs.gov/R","http://cran.rstudio.com/"),
    dependencies = TRUE)

Or to install the current development version of the package (using the devtools package):

devtools::install_github("USGS-R/sensorQC")

This package is still very much in development, so the API may change at any time.


sensorQC provides QAQC procedures for high-frequency aquatic sensor data. It imports the data and runs the statistical outlier detection techniques that the user specifies.

sensorQC Functions (as of v0.4.0)

| Function | Title |
| --- | --- |
| read | read in a file for sensor data or a config (.yml) file |
| window | window sensor data for processing in chunks |
| plot | plot sensor data |
| flag | create data flags for a sensor |
| clean | remove or replace flagged data points |

example usage

library(sensorQC)
## This information is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The information has not received final approval by the U.S. Geological Survey (USGS) and is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the information. Although this software program has been used by the USGS, no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the program and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.
file <- system.file('extdata', 'test_data.txt', package = 'sensorQC') 
sensor <- read(file, format="wide_burst", date.format="%m/%d/%Y %H:%M")
## number of observations:5100
flag(sensor, 'x == 999999', 'persist(x) > 3', 'is.na(x)')
## object of class "sensor"
##                  times     x
## 1  2013-11-01 00:00:00 48.86
## 2  2013-11-01 00:00:01 49.04
## 3  2013-11-01 00:00:02 49.50
## 4  2013-11-01 00:00:03 48.91
## 5  2013-11-01 00:00:04 48.90
## 6  2013-11-01 00:00:05 48.96
## 7  2013-11-01 00:00:06 48.48
## 8  2013-11-01 00:00:07 48.97
## 9  2013-11-01 00:00:08 48.97
## 10 2013-11-01 00:00:09 48.99
## 11 2013-11-01 00:00:10 48.35
## 12 2013-11-01 00:00:11 48.51
## 13 2013-11-01 00:00:12 49.25
## 14 2013-11-01 00:00:13 48.82
## 15 2013-11-01 00:00:14 49.22
##   ...
## x == 999999 (15 flags)
## persist(x) > 3 (4 flags)
## is.na(x) (0 flags)
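
The rules passed to flag are plain R expressions written as strings and evaluated against the sensor values x. The sketch below illustrates that idea in base R only. It is not the package's implementation: apply_rules and run_length are hypothetical helpers, and reading persist(x) as "the length of the run of identical consecutive values an observation belongs to" is an assumption.

```r
# Hypothetical helper: for each observation, the length of the run of
# identical consecutive values it belongs to (an assumed reading of persist()).
run_length <- function(x) {
  runs <- rle(x)
  rep(runs$lengths, times = runs$lengths)
}

# Hypothetical helper: evaluate each rule string with `x` bound to the data
# and return the indices where the rule is TRUE (NA results are dropped).
apply_rules <- function(x, ...) {
  rules <- c(...)
  env <- list(x = x, persist = run_length)
  flags <- lapply(rules, function(rule) which(eval(parse(text = rule), env)))
  names(flags) <- rules
  flags
}

x <- c(48.9, 999999, 999999, 999999, 999999, 49.1, NA)
res <- apply_rules(x, "x == 999999", "persist(x) > 3", "is.na(x)")
print(res)
```

Here the four repeated 999999 values satisfy both the first and the second rule, and the trailing NA satisfies only the third.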

Use the MAD (median absolute deviation) test, and add w to the function call to apply it within "windows". Note that the sensor must be windowed with window() before w is used:

sensor = window(sensor, type='auto')
flag(sensor, 'x == 999999', 'persist(x) > 3', 'MAD(x,w) > 3', 'MAD(x) > 3')
## object of class "sensor"
##                  times     x
## 1  2013-11-01 00:00:00 48.86
## 2  2013-11-01 00:00:01 49.04
## 3  2013-11-01 00:00:02 49.50
## 4  2013-11-01 00:00:03 48.91
## 5  2013-11-01 00:00:04 48.90
## 6  2013-11-01 00:00:05 48.96
## 7  2013-11-01 00:00:06 48.48
## 8  2013-11-01 00:00:07 48.97
## 9  2013-11-01 00:00:08 48.97
## 10 2013-11-01 00:00:09 48.99
## 11 2013-11-01 00:00:10 48.35
## 12 2013-11-01 00:00:11 48.51
## 13 2013-11-01 00:00:12 49.25
## 14 2013-11-01 00:00:13 48.82
## 15 2013-11-01 00:00:14 49.22
##   ...
## x == 999999 (15 flags)
## persist(x) > 3 (4 flags)
## MAD(x,w) > 3 (129 flags)
## MAD(x) > 3 (91 flags)
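
The MAD(x,w) and MAD(x) rules above flag different numbers of points, presumably because one scores each value against its own window and the other against the whole record. The base R sketch below shows how that difference can arise. It assumes the score is the absolute deviation from the median divided by stats::mad, and that w is a vector of window ids; mad_score is a hypothetical helper, and the package's internals may differ.

```r
# Hypothetical helper: deviation from the median, scaled by the MAD.
# stats::mad() applies the usual 1.4826 consistency constant by default.
mad_score <- function(x) {
  abs(x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
}

# Two windows at different levels; the value 12 is odd only within window 1.
x <- c(10, 10.1, 9.9, 10, 12, 20, 20.2, 19.9, 20.1, 19.8)
w <- rep(1:2, each = 5)  # assumed form of the window ids

global_score   <- mad_score(x)                 # one median for the whole record
windowed_score <- ave(x, w, FUN = mad_score)   # one median per window

which(global_score > 3)
which(windowed_score > 3)
```

Scored globally, the shift between the two levels inflates the MAD and nothing exceeds 3. Scored per window, only the fifth value stands out. A window whose values are mostly identical has a MAD of zero, which makes the score Inf or NaN, so real data would need a guard for that case.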

Use sensorQC with a simple vector of numbers:

flag(c(3,2,4,3,3,4,2,4),'MAD(x) > 3')
## object of class "sensor"
##   x
## 1 3
## 2 2
## 3 4
## 4 3
## 5 3
## 6 4
## 7 2
## 8 4
## 
## MAD(x) > 3 (0 flags)
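
The zero flags reported above can be checked by hand, assuming the score is the absolute deviation from the median divided by the scaled MAD (an assumption about the rule, not a statement of the package's code):

```r
x <- c(3, 2, 4, 3, 3, 4, 2, 4)

# median(x) is 3 and the median absolute deviation is 1, so stats::mad(x)
# returns 1.4826 and no value is more than one MAD away from the median.
score <- abs(x - median(x)) / mad(x)
max(score)
sum(score > 3)
```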

plotting data

Plot the dataset with outliers:

plot(sensor)

Plot the dataset without outliers:

flagged = flag(sensor, 'x == 999999', 'persist(x) > 3', 'MAD(x,w) > 3', 'MAD(x) > 3')
plot(flagged)

cleaning data

The clean function can be used to strip flagged data points from the record or to replace them with other values (such as NA or -9999):

data = c(999999, 1,2,3,4,2,3,4)
sensor = flag(data, 'x > 9999')
clean(sensor)
## object of class "sensor"
##   x
## 1 1
## 2 2
## 3 3
## 4 4
## 5 2
## 6 3
## 7 4
clean(sensor, replace=NA)
## object of class "sensor"
##    x
## 1 NA
## 2  1
## 3  2
## 4  3
## 5  4
## 6  2
## 7  3
## 8  4

If you have multiple flag rules, you can choose which ones to apply by their index:

data = c(999999, 1,2,3,4,2,3,4)
sensor = flag(data, 'x > 9999', 'x == 3')
clean(sensor, which=1)
## object of class "sensor"
##   x
## 1 1
## 2 2
## 3 3
## 4 4
## 5 2
## 6 3
## 7 4
clean(sensor, which=2)
## object of class "sensor"
##        x
## 1 999999
## 2      1
## 3      2
## 4      4
## 5      2
## 6      4
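
The remove, replace, and rule-selection behaviour shown above can be mimicked on a plain vector. This is a minimal sketch rather than the package's clean(): clean_values is a hypothetical helper, and storing the flags as a named list of index vectors (one per rule) is an assumption made for illustration.

```r
# Hypothetical helper: `flags` is a named list of integer index vectors,
# one per rule; `which` selects rules by position.
clean_values <- function(x, flags, which = seq_along(flags), replace) {
  idx <- unique(unlist(flags[which]))
  if (length(idx) == 0) return(x)       # nothing flagged by the chosen rules
  if (missing(replace)) return(x[-idx]) # no replacement given: drop the points
  x[idx] <- replace                     # otherwise overwrite them in place
  x
}

data  <- c(999999, 1, 2, 3, 4, 2, 3, 4)
flags <- list("x > 9999" = which(data > 9999),
              "x == 3"   = which(data == 3))

clean_values(data, flags, which = 1)     # drop only the 999999 value
clean_values(data, flags, which = 2)     # drop only the values equal to 3
clean_values(data, flags, replace = NA)  # all rules, replaced with NA
```

Leaving replace unset drops the flagged points, which shortens the record. Passing replace = NA keeps the original length, which is usually preferable when the values must stay aligned with their timestamps.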

Or flag and clean the data in one step:

clean(data, 'x > 9999', 'persist(x) > 10', 'MAD(x) > 3', replace=NA)
## object of class "sensor"
##    x
## 1 NA
## 2  1
## 3  2
## 4  3
## 5  4
## 6  2
## 7  3
## 8  4

flagging data with a moving window

The MAD(x,w) function can use a rolling window by leveraging the RcppRoll R package.

sensor <- read(file, format="wide_burst", date.format="%m/%d/%Y %H:%M")
## number of observations:5100
sensor = window(sensor, n=300, type='rolling')
flag(sensor, 'x == 999999', 'persist(x) > 3', 'MAD(x,w) > 3', 'MAD(x) > 3')
## Warning in MAD.roller(x, w): MAD.roller function has not been robustly
## tested w/ NAs

## object of class "sensor"
##                  times     x
## 1  2013-11-01 00:00:00 48.86
## 2  2013-11-01 00:00:01 49.04
## 3  2013-11-01 00:00:02 49.50
## 4  2013-11-01 00:00:03 48.91
## 5  2013-11-01 00:00:04 48.90
## 6  2013-11-01 00:00:05 48.96
## 7  2013-11-01 00:00:06 48.48
## 8  2013-11-01 00:00:07 48.97
## 9  2013-11-01 00:00:08 48.97
## 10 2013-11-01 00:00:09 48.99
## 11 2013-11-01 00:00:10 48.35
## 12 2013-11-01 00:00:11 48.51
## 13 2013-11-01 00:00:12 49.25
## 14 2013-11-01 00:00:13 48.82
## 15 2013-11-01 00:00:14 49.22
##   ...
## x == 999999 (15 flags)
## persist(x) > 3 (4 flags)
## MAD(x,w) > 3 (187 flags)
## MAD(x) > 3 (91 flags)
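
To illustrate what a rolling MAD test does, here is a base R sketch that scores each point against the median and MAD of the n values centered on it. It does not use RcppRoll and is not the package's MAD.roller. rolling_mad_score is a hypothetical helper, and the centered alignment, the odd window size, and the NA padding at the edges are all assumptions.

```r
# Hypothetical helper: MAD score of each point relative to a centered
# window of n values (n must be odd). Edge positions are left as NA.
rolling_mad_score <- function(x, n) {
  stopifnot(n %% 2 == 1, n <= length(x))
  half <- (n - 1) / 2
  out <- rep(NA_real_, length(x))
  for (i in seq(half + 1, length(x) - half)) {
    win <- x[(i - half):(i + half)]
    out[i] <- abs(x[i] - median(win)) / mad(win)
  }
  out
}

x <- c(10, 10.1, 9.9, 10.2, 15, 10, 9.8, 10.1, 10)
score <- rolling_mad_score(x, n = 5)
which(score > 3)  # only the spike at position 5 exceeds the threshold
```

The loop is written for clarity rather than speed. For a record of thousands of observations with a window of several hundred, a compiled rolling routine is the more practical choice.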