
🤔🔢 An R package for seeking out suspicious data frame qualities


McCartneyAC/suspicious


suspicious


Based on Quartz's guide to bad data, this package provides functions for checking data for common problems. Examples include 99999 used in place of NA, invalid or faked values (e.g., a ZIP code of 12345 for a record that isn't in Schenectady), and tables that have been cut off (e.g., Apple's Numbers app cuts off after 255 columns, which can result in missing data).
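Two of these checks can be sketched in a few lines of base R. The function names count_sentinels() and looks_truncated() are hypothetical illustrations, not part of the suspicious package's API. The 65,536-row limit (the row cap of legacy Excel .xls files) is an added assumption alongside the 255-column Numbers cutoff described above.

```r
# Hypothetical helpers for illustration only; NOT the suspicious package API.

# Count, per column, how many numeric cells equal a suspected sentinel
# value (e.g. 99999 standing in for NA). Non-numeric columns count as 0.
count_sentinels <- function(df, sentinels = 99999) {
  vapply(df, function(col) {
    if (is.numeric(col)) sum(col %in% sentinels) else 0L
  }, integer(1))
}

# TRUE if the data frame's dimensions exactly hit a known software cutoff,
# which can signal truncated data. The limits are assumptions:
# 255 columns (Apple Numbers) and 65,536 rows (legacy Excel .xls).
looks_truncated <- function(df, col_limits = 255, row_limits = 65536) {
  ncol(df) %in% col_limits || nrow(df) %in% row_limits
}

# Usage
df <- data.frame(income = c(52000, 99999, 61000),
                 zip = c("12345", "90210", "10001"))
count_sentinels(df)   # returns c(income = 1L, zip = 0L)
looks_truncated(df)   # FALSE
```

A dimension that lands exactly on a limit is only a hint. It is worth checking the original source file before concluding that rows or columns were lost.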

Installation

install.packages("devtools")  # skip if devtools is already installed
devtools::install_github("McCartneyAC/suspicious")

An Example

One of the available checks is a visual test, built with ggplot2, of whether a vector of numbers follows Benford's Law. Given a numeric vector, suspect_benford() plots the observed frequency of each leading digit against the frequency Benford's Law predicts. This is sometimes used forensically to uncover forged financial documents, but proceed with caution: it is not a foolproof test for tampered data.
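Under Benford's Law, the expected share of values whose leading digit is d is P(d) = log10(1 + 1/d). About 30.1% of values therefore start with 1 and about 4.6% start with 9. The base-R sketch below computes those expected proportions and the observed leading-digit proportions of a vector, which are the two series such a plot compares. The helper names are hypothetical and are not taken from the package's source. Extracting the digit from scientific-notation formatting is one choice among several. It avoids floating-point slips such as 0.3 / 0.1 evaluating to slightly less than 3.

```r
# Hypothetical helpers for illustration; not the suspicious package's code.

# Expected first-digit proportions under Benford's Law: P(d) = log10(1 + 1/d)
benford_expected <- function() {
  d <- 1:9
  setNames(log10(1 + 1 / d), d)
}

# First significant digit of each finite, non-zero value. Formatting in
# scientific notation (e.g. "3.0000000000e-01") sidesteps rounding errors
# that arise from dividing by powers of ten.
leading_digit <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])
  as.integer(substr(sprintf("%.10e", x), 1, 1))
}

# Observed proportion of values starting with each digit 1-9
observed_freq <- function(x) {
  counts <- tabulate(leading_digit(x), nbins = 9)
  setNames(counts / sum(counts), 1:9)
}

# Usage: observed vs. expected proportions, side by side
x <- c(1234, 0.3, -56, 9, 187, 1.5)
round(cbind(observed = observed_freq(x), expected = benford_expected()), 3)
```

Zeros and non-finite values are dropped because they have no leading digit. Negative values are counted by their absolute value.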

set.seed(42)  # optional: makes the random draws reproducible
dat <- runif(1000, min = 1000, max = 9999)  # 1,000 uniform random values

suspicious::suspect_benford(dat)

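This should produce a plot like the following. Because the values are drawn uniformly, each leading digit appears about equally often, so the observed frequencies depart clearly from those predicted by Benford's Law:

[Figure: Benford's Law Visual Check Example]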
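suspect_benford() is a visual check. Pearson's chi-squared goodness-of-fit test, available in base R as chisq.test(), offers a numeric complement. This test is a swapped-in technique and is not something the package is described as doing. It compares the observed leading-digit counts with Benford's expected proportions. The sketch below uses the same kind of uniform example data, with set.seed() added for reproducibility.

```r
# Swapped-in numeric complement to the visual check; not part of suspicious.
set.seed(42)                                   # reproducible random draws
dat <- runif(1000, min = 1000, max = 9999)     # uniform data, as in the example

# First significant digit, read off scientific notation (all values are >= 1000)
first_digit <- as.integer(substr(sprintf("%.10e", dat), 1, 1))
observed <- tabulate(first_digit, nbins = 9)   # counts of leading digits 1-9

benford_p <- log10(1 + 1 / (1:9))              # Benford's expected proportions

# Pearson's chi-squared goodness-of-fit test: observed counts vs. Benford
result <- chisq.test(observed, p = benford_p)
result$statistic
result$p.value
```

With uniform draws the p-value is vanishingly small. With large samples, however, the test also flags small and harmless departures. Like the plot, it should prompt closer inspection rather than serve as proof of tampering.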
