Commit

Merge pull request #577 from SebKrantz/development
Development
SebKrantz committed May 20, 2024
2 parents 827b84f + 4530dcd commit c8fc5af
Showing 49 changed files with 10,468 additions and 679 deletions.
5 changes: 5 additions & 0 deletions .Rbuildignore
@@ -26,3 +26,8 @@ man/figures
_cache$
_snaps
^CITATION\.cff$
^\.DS_Store$
^revdep$
\.orig$
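
The new ignore patterns can be checked against sample paths. This is a minimal base R sketch. It assumes, following "Writing R Extensions", that `R CMD build` treats each `.Rbuildignore` line as a Perl-like regular expression matched case-insensitively against paths relative to the package root. The sample paths are invented for illustration.

```r
# Patterns added to .Rbuildignore in this commit
patterns <- c("^\\.DS_Store$", "^revdep$", "\\.orig$")

# Hypothetical paths relative to the package root (illustration only)
paths <- c(".DS_Store", "revdep", "R/join.R.orig", "R/join.R", "inst/.DS_Store")

# A path counts as ignored if at least one pattern matches it
ignored <- vapply(paths, function(p) {
  any(vapply(patterns, grepl, logical(1), x = p,
             perl = TRUE, ignore.case = TRUE))
}, logical(1))

print(ignored)
```

Under this assumption the `^`-anchored patterns only match entries at the top level of the package, whereas `\.orig$` matches a file with that suffix in any directory.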


4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: collapse
Title: Advanced and Fast Data Transformation
Version: 2.0.14
Date: 2024-05-01
Date: 2024-05-19
Authors@R: c(
person("Sebastian", "Krantz", role = c("aut", "cre"),
email = "sebastian.krantz@graduateinstitute.ch",
@@ -28,7 +28,7 @@ Description: A C/C++ based package for advanced data transformation and
(grouped, weighted) summary statistics, powerful tools to work with nested data,
fast data object conversions, functions for memory efficient R programming, and
helpers to effectively deal with variable labels, attributes, and missing data.
It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf',
It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units',
'plm' (panel-series and data frames), and 'xts'/'zoo'.
URL: https://sebkrantz.github.io/collapse/,
https://github.com/SebKrantz/collapse,
1 change: 1 addition & 0 deletions NAMESPACE
@@ -405,6 +405,7 @@ importFrom("stats", "as.formula", "complete.cases", "cor", "cov", "var", "pt",
export(fncol)
export(fdim)
export(as_numeric_factor)
export(as_integer_factor)
export(as_character_factor)
export(as.numeric_factor)
export(as.character_factor)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
# collapse 2.0.14

* Updated '*collapse* and *sf*' vignette to reflect the recent support for *units* objects, and added a few more examples.

* Fixed a bug in `join()` where a full join silently became a left join if there are no matches between the tables (#574). Thanks @D3SL for reporting.

* Added function `group_by_vars()`: A standard evaluation version of `fgroup_by()` that is slimmer and safer for programming, e.g. `data |> group_by_vars(ind1) |> collapg(custom = list(fmean = ind2, fsum = ind3))`. Or, using *magrittr*:
```r
library(magrittr)
@@ -15,6 +19,8 @@ data %>%
}
```

* Added function `as_integer_factor()` to turn factors/factor columns into integer vectors. `as_numeric_factor()` already exists, but is memory inefficient for most factors where levels can be integers.

* `join()` now internally checks if the rows of the joined datasets match exactly. This check, using `identical(m, seq_row(y))`, is inexpensive, but, if `TRUE`, saves a full subset and deep copy of `y`. Thus `join()` now inherits the intelligence already present in functions like `fsubset()`, `roworder()` and `funique()` - a key for efficient data manipulation is simply doing less.

* In `join()`, if `attr = TRUE`, the `count` option to `fmatch()` is always invoked, so that the attribute attached always has the same form, regardless of `verbose` or `validate` settings.
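
The row-match shortcut described in the `join()` entry above can be sketched in base R. This is an illustration only, not the package's internal code: `match()` stands in for `fmatch()`, `seq_len(nrow(y))` stands in for `seq_row(y)`, and the tables `x` and `y` are made up.

```r
# Two small tables whose rows already line up one-to-one
x <- data.frame(id = 1:4, a = letters[1:4])
y <- data.frame(id = 1:4, b = c(10, 20, 30, 40))

# Match vector: for each row of x, the index of the matching row in y
m <- match(x$id, y$id)

if (identical(m, seq_len(nrow(y)))) {
  # Rows match exactly: reuse y as-is, no subsetting and no deep copy
  y_aligned <- y
} else {
  # General case: reorder / subset y according to the match vector
  y_aligned <- y[m, , drop = FALSE]
}

res <- cbind(x, y_aligned["b"])
print(res)
```

The `identical()` comparison is a single pass over an integer vector, which is why the check is cheap relative to subsetting every column of `y`.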
4 changes: 2 additions & 2 deletions R/global_macros.R
@@ -105,7 +105,7 @@ get_collapse <- function(opts = NULL) if(is.null(opts)) as.list(.op) else if(len
"%r-%", "%r*%", "%r/%", "%r+%", "%rr%", "add_stub", "add_vars",
"add_vars<-", "all_funs", "all_identical", "all_obj_equal", "allNA",
"alloc", "allv", "any_duplicated", "anyv", "as_character_factor",
"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "as.character_factor",
"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "as_integer_factor", "as.character_factor",
"as.factor_GRP", "as.factor_qG", "as.numeric_factor", "atomic_elem",
"atomic_elem<-", "av", "av<-", "B", "BY", "BY.data.frame", "BY.default",
"BY.matrix", "cat_vars", "cat_vars<-", "char_vars", "char_vars<-",
@@ -177,7 +177,7 @@ get_collapse <- function(opts = NULL) if(is.null(opts)) as.list(.op) else if(len
.COLLAPSE_ALL <- sort(unique(c("%-=%", "%!=%", "%!iin%", "%!in%", "%*=%", "%/=%", "%+=%", "%=%", "%==%", "%c-%", "%c*%", "%c/%", "%c+%",
"%cr%", "%iin%", "%r-%", "%r*%", "%r/%", "%r+%", "%rr%", "add_stub", "add_vars", "add_vars<-", "all_funs",
"all_identical", "all_obj_equal", "allNA", "alloc", "allv", "any_duplicated", "anyv", "as_character_factor",
"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "atomic_elem", "atomic_elem<-", "av", "av<-", "B", "BY",
"as_factor_GRP", "as_factor_qG", "as_numeric_factor", "as_integer_factor", "atomic_elem", "atomic_elem<-", "av", "av<-", "B", "BY",
"cat_vars", "cat_vars<-", "char_vars", "char_vars<-", "cinv", "ckmatch", "collap", "collapg", "collapv", "colorder",
"colorderv", "copyAttrib", "copyMostAttrib", "copyv", "D", "dapply", "date_vars", "Date_vars", "date_vars<-",
"Date_vars<-", "descr", "Dlog", "fact_vars", "fact_vars<-", "fbetween", "fcompute", "fcomputev", "fcount",
10 changes: 10 additions & 0 deletions R/small_helper.R
@@ -501,6 +501,16 @@ as_numeric_factor <- function(X, keep.attr = TRUE) {
res
}

as_integer_factor <- function(X, keep.attr = TRUE) {
if(is.atomic(X)) if(keep.attr) return(ffka(X, as.integer)) else
return(as.integer(attr(X, "levels"))[X])
res <- duplAttributes(lapply(unattrib(X),
if(keep.attr) (function(y) if(is.factor(y)) ffka(y, as.integer) else y) else
(function(y) if(is.factor(y)) as.integer(attr(y, "levels"))[y] else y)), X)
if(inherits(X, "data.table")) return(alc(res))
res
}

as_character_factor <- function(X, keep.attr = TRUE) {
if(is.atomic(X)) if(keep.attr) return(ffka(X, tochar)) else
return(as.character.factor(X))
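
The core of the `keep.attr = FALSE` branch above is to convert the levels once and then index them by the factor's integer codes. A minimal base R sketch of that idea follows; it deliberately avoids the internal helpers `ffka()`, `duplAttributes()` and `alc()`, and the example data are invented.

```r
# A factor whose levels are integer-like strings
f <- factor(c("10", "20", "10", "30"))

# Convert the (few) levels once, then index by the factor codes.
# R treats a factor used as an index as its underlying integer codes.
int_vals <- as.integer(attr(f, "levels"))[f]
dbl_vals <- as.numeric(attr(f, "levels"))[f]

typeof(int_vals)  # integer storage: 4 bytes per element
typeof(dbl_vals)  # double storage: 8 bytes per element

# List / data frame case: convert factor columns only, leave the rest alone
df <- data.frame(g = factor(c("1", "2", "2")), v = c(0.5, 1.5, 2.5))
df[] <- lapply(df, function(y) {
  if (is.factor(y)) as.integer(attr(y, "levels"))[y] else y
})
str(df)
```

Because an integer vector needs half the storage of a double vector of the same length, the integer conversion is the cheaper choice whenever the levels are whole numbers, which is the motivation given in the NEWS entry.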
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@
* To facilitate complex data transformation, exploration and computing tasks in R.
* To help make R code fast, flexible, parsimonious and programmer friendly.

It further implements a [class-agnostic approach to R programming](https://sebkrantz.github.io/collapse/articles/collapse_object_handling.html), supporting base R, *tibble*, *grouped_df* (*tidyverse*), *data.table*, *sf*, *pseries*, *pdata.frame* (*plm*), and preserving many others (e.g. *units*, *xts*/*zoo*, *tsibble*).
It further implements a [class-agnostic approach to R programming](https://sebkrantz.github.io/collapse/articles/collapse_object_handling.html), supporting base R, *tibble*, *grouped_df* (*tidyverse*), *data.table*, *sf*, *units*, *pseries*, *pdata.frame* (*plm*), and *xts*/*zoo*.

**Key Features:**

2 changes: 1 addition & 1 deletion _pkgdown.yml
@@ -210,6 +210,7 @@ articles:
contents:
- collapse_documentation
- collapse_for_tidyverse_users
- collapse_and_sf
- collapse_object_handling
- title: Legacy (Pre v1.7)
desc: Vignettes that cover functionality of versions <1.7. These
@@ -219,5 +220,4 @@ articles:
- collapse_and_dplyr
- collapse_and_data.table
- collapse_and_plm
- collapse_and_sf

2 changes: 1 addition & 1 deletion man/collapse-documentation.Rd
@@ -26,7 +26,7 @@ The following table fully summarizes the contents of \emph{\link{collapse}}. The
\link[=fast-data-manipulation]{Fast Data Manipulation} \tab\tab Fast and flexible select, subset, summarise, mutate/transform, sort/reorder, combine, join, reshape, rename and relabel data. Some functions modify by reference and/or allow assignment. In addition a set of (standard evaluation) functions for fast selecting, replacing or adding data frame columns, including shortcuts to select and replace variables by data type.
\tab\tab \code{\link[=fselect]{fselect(<-)}}, \code{\link[=fsubset]{fsubset/ss}}, \code{\link{fsummarise}}, \code{\link{fmutate}}, \code{\link{across}}, \code{\link[=ftransform]{(f/set)transform(v)(<-)}}, \code{\link[=fcompute]{fcompute(v)}}, \code{\link[=roworder]{roworder(v)}}, \code{\link[=colorder]{colorder(v)}}, \code{\link{rowbind}}, \code{\link{join}}, \code{\link{pivot}}, \code{\link[=frename]{(f/set)rename}}, \code{\link[=relabel]{(set)relabel}}, \code{\link[=get_vars]{get_vars(<-)}}, \code{\link[=add_vars]{add_vars(<-)}}, \code{\link[=num_vars]{num_vars(<-)}}, \code{\link[=cat_vars]{cat_vars(<-)}}, \code{\link[=char_vars]{char_vars(<-)}}, \code{\link[=fact_vars]{fact_vars(<-)}}, \code{\link[=logi_vars]{logi_vars(<-)}}, \code{\link[=date_vars]{date_vars(<-)}} \cr \cr \cr
\link[=quick-conversion]{Quick Data Conversion} \tab\tab Quick conversions: data.frame <> data.table <> tibble <> matrix (row- or column-wise) <> list | array > matrix, data.frame, data.table, tibble | vector > factor, matrix, data.frame, data.table, tibble; and converting factors / all factor columns. \tab\tab \code{qDF}, \code{qDT}, \code{qTBL}, \code{qM}, \code{qF}, \code{mrtl}, \code{mctl}, \code{as_numeric_factor}, \code{as_character_factor} \cr \cr \cr
\link[=quick-conversion]{Quick Data Conversion} \tab\tab Quick conversions: data.frame <> data.table <> tibble <> matrix (row- or column-wise) <> list | array > matrix, data.frame, data.table, tibble | vector > factor, matrix, data.frame, data.table, tibble; and converting factors / all factor columns. \tab\tab \code{qDF}, \code{qDT}, \code{qTBL}, \code{qM}, \code{qF}, \code{mrtl}, \code{mctl}, \code{as_numeric_factor}, \code{as_integer_factor}, \code{as_character_factor} \cr \cr \cr
\link[=advanced-aggregation]{Advanced Data Aggregation} \tab\tab Fast and easy (weighted and parallelized) aggregation of multi-type data, with different functions applied to numeric and categorical variables. Custom specifications allow mappings of functions to variables + renaming. \tab\tab \code{collap(v/g)} \cr \cr \cr
2 changes: 1 addition & 1 deletion man/collapse-package.Rd
@@ -12,7 +12,7 @@ Advanced and Fast Data Transformation
\item To help make R code fast, flexible, parsimonious and programmer friendly. % \emph{collapse} is a fast %to facilitate (advanced) data manipulation in R % To achieve the latter,
% collapse provides a broad set.. -> Nah, its not a misc package
}
It is made compatible with the \emph{tidyverse}, \emph{data.table}, \emph{sf} and the \emph{plm} approach to panel data, and non-destructively handles other classes such as \emph{xts}.
It is made compatible with the \emph{tidyverse}, \emph{data.table}, \emph{sf}, \emph{units}, \emph{xts/zoo}, and the \emph{plm} approach to panel data.

}
\section{Getting Started}{
9 changes: 6 additions & 3 deletions man/quick-conversion.Rd
@@ -8,6 +8,7 @@
\alias{mctl}
\alias{mrtl}
\alias{as_numeric_factor}
\alias{as_integer_factor}
\alias{as_character_factor}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{Quick Data Conversion}
@@ -18,7 +19,7 @@ Fast, flexible and precise conversion of common data objects, without method dis
\item \code{qM} converts vectors, higher-dimensional arrays, data frames and suitable lists to matrix.
\item \code{mctl} and \code{mrtl} column- or row-wise convert a matrix to list, data frame or \emph{data.table}. They are used internally by \code{qDF/qDT/qTBL}, \code{\link{dapply}}, \code{\link{BY}}, etc\dots
\item \code{\link{qF}} converts atomic vectors to factor (documented on a separate page).
\item \code{as_numeric_factor} and \code{as_character_factor} convert factors, or all factor columns in a data frame / list, to character or numeric (by converting the levels).
\item \code{as_numeric_factor}, \code{as_integer_factor}, and \code{as_character_factor} convert factors, or all factor columns in a data frame / list, to character or numeric (by converting the levels).
}
}
\usage{
@@ -37,12 +38,13 @@ mrtl(X, names = FALSE, return = "list")
# Converting factors or factor columns

as_numeric_factor(X, keep.attr = TRUE)
as_integer_factor(X, keep.attr = TRUE)
as_character_factor(X, keep.attr = TRUE)

}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{X}{a vector, factor, matrix, higher-dimensional array, data frame or list. \code{mctl} and \code{mrtl} only accept matrices, \code{as_numeric_factor} and \code{as_character_factor} only accept factors, data frames or lists.}
\item{X}{a vector, factor, matrix, higher-dimensional array, data frame or list. \code{mctl} and \code{mrtl} only accept matrices, \code{as_numeric_factor}, \code{as_integer_factor} and \code{as_character_factor} only accept factors, data frames or lists.}
\item{row.names.col}{can be used to add an column saving names or row.names when converting objects to data frame using \code{qDF/qDT/qTBL}. \code{TRUE} will add a column \code{"row.names"}, or you can supply a name e.g. \code{row.names.col = "variable"}. With \code{qM}, the argument has the opposite meaning, and can be used to select one or more columns in a data frame/list which will be used to create the rownames of the matrix e.g. \code{qM(iris, row.names.col = "Species")}. In this case the column(s) can be specified using names, indices, a logical vector or a selector function. See Examples.}
\item{keep.attr}{logical. \code{FALSE} (default) yields a \emph{hard} / \emph{thorough} object conversion: All unnecessary attributes are removed from the object yielding a plain matrix / data.frame / \emph{data.table}. \code{TRUE} yields a \emph{soft} / \emph{minimal} object conversion: Only the attributes 'names', 'row.names', 'dim', 'dimnames' and 'levels' are modified in the conversion. Other attributes are preserved. See also \code{class}.}
\item{class}{if a vector of classes is passed here, the converted object will be assigned these classes. If \code{NULL} is passed, the default classes are assigned: \code{qM} assigns no class, \code{qDF} a class \code{"data.frame"}, and \code{qDT} a class \code{c("data.table", "data.frame")}. If \code{keep.attr = TRUE} and \code{class = NULL} and the object already inherits the default classes, further inherited classes are preserved. See Details and the Example. }
@@ -77,7 +79,8 @@ The default \code{keep.attr = FALSE} ensures \emph{hard} conversions so that all
\code{qM} - returns a matrix\cr
\code{mctl}, \code{mrtl} - return a list, data frame or \emph{data.table} \cr
\code{qF} - returns a factor\cr
\code{as_numeric_factor} - returns X with factors converted to numeric variables\cr
\code{as_numeric_factor} - returns X with factors converted to numeric (double) variables\cr
\code{as_integer_factor} - returns X with factors converted to integer variables\cr
\code{as_character_factor} - returns X with factors converted to character variables
}
% \note{
