Facilitates cleaning, exploring and visualising large-ish datasets (hundreds of thousands to millions of observations with tens to hundreds of variables).
These are mostly wrapper and convenience functions for pre-processing (wrangling, exploring, cleaning, etc.) datasets. It assumes you are comfortable with the tidyverse and the basics of data.table.
Install from GitHub:
    install.packages("devtools")
    library(devtools)
    install_github("AntonioJBT/episcout")
This is a basic example of things you can do with episcout:
    library(episcout)

    # A data frame:
    n <- 20
    df <- data.frame(var_id = rep(1:(n / 2), each = 2),
                     var_to_rep = rep(c('Pre', 'Post'), n / 2),
                     x = rnorm(n),
                     y = rbinom(n, 1, 0.50),
                     z = rpois(n, 2)
                     )

    # Print the first few rows and last few rows:
    dim(df)
    epi_head_and_tail(df, rows = 2, cols = 2)
    epi_head_and_tail(df, rows = 2, cols = 2, last_cols = TRUE)

    # Get all duplicates:
    check_dups <- epi_clean_get_dups(df, 'var_id', 1)
    dim(check_dups)
    check_dups

    # Get summary descriptive statistics for a numeric/integer column:
    num_vec <- df$x
    desc_stats <- epi_stats_numeric(num_vec)
    class(desc_stats)
    lapply(desc_stats, class)
    desc_stats

    # And many more functions for cleaning, stats and plotting that do things
    # a bit faster or more conveniently and that I couldn't easily find in
    # other packages.
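For readers wondering what "get all duplicates" means in practice, here is a minimal base R sketch of the idea. It does not call episcout and is not the package's implementation. It assumes that "all duplicates" means every row sharing a `var_id` value with another row, including the first occurrence. The name `all_dups` and the use of `set.seed()` are illustrative choices.

```r
# Base R only; no episcout functions are called here.
set.seed(42)  # make the simulated column reproducible
n <- 20
df <- data.frame(
  var_id = rep(1:(n / 2), each = 2),
  var_to_rep = rep(c("Pre", "Post"), n / 2),
  x = rnorm(n)
)

# duplicated() flags only the second and later occurrences of a value,
# so scan from both ends to also flag the first occurrence.
is_dup <- duplicated(df$var_id) | duplicated(df$var_id, fromLast = TRUE)
all_dups <- df[is_dup, ]

# Every var_id appears twice in this toy data, so all 20 rows are returned.
dim(all_dups)
```

On larger data the same logical index can be reused to inspect or drop the repeated keys before further cleaning.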
Pull requests are welcome!
If you run into any problems or have suggestions, please report them in the issue tracker.
- Version 0.1.1 First release
- Version 0.1.2