/
README.Rmd
81 lines (64 loc) · 2.78 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
# dpurifyr
[![Build Status](https://travis-ci.org/teramonagi/dpurifyr.svg?branch=master)](https://travis-ci.org/teramonagi/dpurifyr)
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/dpurifyr)](http://cran.r-project.org/package=dpurifyr)
## Overview
dpurifyr package gives you a practical way for `data preprocessing` providing a consistent set of verbs that help you solve the most common data preprocessing challenges.
## Installation
You can install from CRAN with:
```{r gh-installation, eval = FALSE}
# Not yet
# install.packages("dpurifyr")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("teramonagi/dpurifyr")
```
## Simple Example
The following example uses dpurifyr to solve a fairly realistic problem: apply different types of data preprocessing (`standard_scale` and `scale_minmax`) to columns (`dplyr::starts_with("Sepal")` and `Petal.Width`) selected by the way which is consistent with other tidyverse packages.
```{r example-1, message=FALSE}
library("dpurifyr")
df <- head(iris)
# Create Data Pre-Processing chain while data preproessing for df is done.
pp <- dpurifyr::scale_standard(df, dplyr::starts_with("Sepal")) %>%
dpurifyr::scale_minmax(Petal.Width)
# PreProcessing object(pp) behave like data.frame object
head(pp)
```
Once you get `preprocessing` object, `preprocessing` object can be applied to the other data.
This means that you can apply the same `preprocessing` with fixed parameter to other data.
```{r example-2, message=FALSE}
# You can apply the same preprocessing to different data.frame
pp <- dpurifyr::scale_standard(head(iris), Sepal.Width, Petal.Length) %>%
dpurifyr::scale_standard(Sepal.Length)
# Using the same parameters( `use_param=TRUE`) estimated in pp object.
pp2 <- dpurifyr::apply(head(iris, 10), pp, TRUE)
pp2
```
You can find more information in [vignettes](https://github.com/teramonagi/dpurifyr/blob/master/vignettes/introduction-to-dpurifyr.Rmd).
## Contribution
- If you encounter a clear bug, please file a minimal reproducible example on [github](https://github.com/teramonagi/dupurifyr/issues)
## Citation
To cite package `dpurifyr` in publications use:
```
Nagi Teramo, Shinichi Takayanagi (2017). dpurifyr: A grammar of data preprocessing. R package version 0.1.0. https://github.com/teramonagi/dpurifyr
```
A BibTeX entry for LaTeX users is
```
@Manual{,
title = {dpurifyr: A grammar of data preprocessing},
author = {Nagi Teramo, Shinichi Takayanagi},
year = {2017},
note = {R package version 0.1.0},
url = {https://github.com/teramonagi/dpurifyr},
}
```