`RLibs` is a library of various production tools I use in data processing. It is under construction: many functions are or will be deprecated, and some of them will be moved to other packages (for example, anything useful and related to plotting with `ggplot2` will go to `sciplotr`).
R's default `==` operator performs a strict comparison, which does not work well for floating-point values. Values that R considers unequal can in fact be equal to within machine precision. The standard example is
0.1 + 0.2 == 0.3
## [1] FALSE
There is `RLibs::are_equal_f`, which performs a more or less correct floating-point comparison (with a given tolerance).
library(RLibs, quietly = TRUE, warn.conflicts = FALSE)
are_equal_f(0.1 + 0.2, 0.3)
## [1] TRUE
Several operators are built on top of this function for more comfortable use:
(0.1 + 0.2) %==% 0.3
## [1] TRUE
(0.1 + 0.2) %!=% 0.3
## [1] FALSE
The operators invoke the floating-point method only if one of the operands is of a floating-point type. Type/size stability is enforced by the `vctrs` package.
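
To illustrate the idea of a tolerance-based comparison, here is a minimal base-R sketch. It is not the `RLibs` implementation: the names `approx_equal` and `%~=%`, the default tolerance, and the scaling rule are all assumptions made for this example.

```r
# Hypothetical helper (not part of RLibs): compares two numeric vectors
# up to a tolerance that is scaled by the magnitude of the operands.
approx_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  # Scale is at least 1, so values near zero are compared absolutely
  scale <- pmax(abs(x), abs(y), 1)
  abs(x - y) <= tol * scale
}

# Hypothetical infix wrapper: uses the tolerant comparison only when
# at least one operand is a double, otherwise falls back to `==`.
`%~=%` <- function(x, y) {
  if (is.double(x) || is.double(y)) approx_equal(x, y) else x == y
}

(0.1 + 0.2) == 0.3    # strict comparison
(0.1 + 0.2) %~=% 0.3  # tolerant comparison
1L %~=% 1L            # integers are compared with `==`
```

The scaling by magnitude keeps the check meaningful for large values, where a fixed absolute tolerance would be too strict.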
A set of tools to create clusters to work with the `future` and `furrr` packages.
# Checks cluster status
get_topology()
## [1] 1
# Create 2 workers, each spawning 2 workers (so 4 + 2 in total, max 4 working simultaneously)
plan_cluster(2, 2)
## Cluster: [2, 2]
unlist(furrr::future_map(1:2, ~list(Sys.getpid(), furrr::future_map(1:2, ~Sys.getpid()))))
## [1] 22324 16552 13264 20988 26340 21840
# Switch back to sequential execution
plan_cluster(1)
## Cluster: single process
unlist(furrr::future_map(1:2, ~list(Sys.getpid(), furrr::future_map(1:2, ~Sys.getpid()))))
## [1] 10160 10160 10160 10160 10160 10160
`dplyr` can do various joins, such as `inner_join` and `left_join`. Here is a way to do conditional joins (not really optimized):
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
tbl <- data.frame(Type = c("10-20", "20-30"), L = c(10, 20), U = c(20, 30))
# Subsetting mtcars to reduce output
left_join_cnd(mtcars[c(1:7, 18:20, 28:32),], tbl, .x$mpg >= .y$L, .x$mpg < .y$U) %>% select(Type, L, mpg, U, everything())
## Type L mpg U cyl disp hp drat wt qsec vs am gear carb
## 1 20-30 20 21.0 30 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 20-30 20 21.0 30 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 20-30 20 22.8 30 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 4 20-30 20 21.4 30 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 5 10-20 10 18.7 20 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 6 10-20 10 18.1 20 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 7 10-20 10 14.3 20 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 8 <NA> NA 32.4 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 9 <NA> NA 30.4 NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 10 <NA> NA 33.9 NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 11 <NA> NA 30.4 NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## 12 10-20 10 15.8 20 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## 13 10-20 10 19.7 20 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## 14 10-20 10 15.0 20 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## 15 20-30 20 21.4 30 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Here `...` accepts comma-separated conditions, similar to `dplyr::filter`, where `.x` refers to the lhs table and `.y` refers to the rhs table.
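
The general idea behind a conditional left join can be sketched in base R as follows. This is only an illustration under stated assumptions, not how `left_join_cnd` is implemented: the function name `left_join_cnd_sketch` is hypothetical, and the condition is passed as an ordinary function of two tables instead of unquoted expressions.

```r
# Hypothetical sketch: pair every lhs row with every rhs row, keep the
# pairs that satisfy the condition, and keep unmatched lhs rows with NAs.
left_join_cnd_sketch <- function(x, y, cond) {
  # All combinations of row indices (i for lhs, j for rhs)
  idx <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  keep <- cond(x[idx$i, , drop = FALSE], y[idx$j, , drop = FALSE])
  idx <- idx[!is.na(keep) & keep, ]
  # lhs rows without any match are paired with an NA rhs index
  missing_i <- setdiff(seq_len(nrow(x)), idx$i)
  idx <- rbind(idx, data.frame(i = missing_i,
                               j = rep(NA_integer_, length(missing_i))))
  idx <- idx[order(idx$i, idx$j), ]
  out <- cbind(x[idx$i, , drop = FALSE], y[idx$j, , drop = FALSE])
  rownames(out) <- NULL
  out
}

x <- data.frame(mpg = c(21.0, 18.7, 32.4))
y <- data.frame(Type = c("10-20", "20-30"), L = c(10, 20), U = c(20, 30))
res <- left_join_cnd_sketch(
  x, y,
  function(.x, .y) .x$mpg >= .y$L & .x$mpg < .y$U
)
print(res)
```

The cross product makes the cost proportional to `nrow(x) * nrow(y)`, which is why this kind of join is easy to write but not optimized.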
There are also `*_join_safe` functions, which behave exactly like the `dplyr::*_join` joins, except that the key columns are first converted to common types and a meaningful error message is displayed if the conversion fails. No more casting factors to strings if the levels are different. `vctrs` to the rescue!
- `%vec_in%` invokes `vctrs::vec_in()`,
- `%within%` is a non-inclusive check of whether a vector is in a range,
- `%withini%` is the same check with the boundaries included,
- `fct_get` gets the values of a factor (as in `levels(factor)[factor]`),
- `len` is an S3 generic that invokes `vctrs::vec_size` almost always, except for a few cases where it calls `length`, e.g. for `grid::unit`,
- `cc` is short for `vctrs::vec_c`,
- `vec_rbind_uq` is an unquoted, pipe-friendly version of `vec_rbind` that accepts a list as a parameter, not `...`. It can be used to bind a list of data frames / tibbles, like `tbl_list %>% map(mutate, A = 2 - B) %>% vec_rbind_uq`. It is currently preferred over `dplyr::bind_rows`, which does horrible things to the types.
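
Some of the helpers above are easy to picture with a few lines of base R. The sketch below is an assumption about their behaviour, not the `RLibs` code: `within_open` and `within_closed` are hypothetical names, and the range is assumed to be given as a two-element vector `c(lower, upper)`.

```r
# Hypothetical stand-ins for the range checks described above
within_open <- function(x, range) x > range[1] & x < range[2]      # boundaries excluded
within_closed <- function(x, range) x >= range[1] & x <= range[2]  # boundaries included

x <- c(1, 2, 3, 4, 5)
within_open(x, c(2, 4))
within_closed(x, c(2, 4))

# Values of a factor recovered through its levels, as in levels(factor)[factor]
f <- factor(c("b", "a", "b"))
levels(f)[f]
```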
`lin` does linear inter/extrapolation. It ensures type/size stability.
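
As an illustration of linear inter/extrapolation, here is a small base-R sketch. The function name `lin_sketch` and its signature (query points `x`, node positions `x0`, node values `y0`) are assumptions for this example and may differ from the actual `lin` interface.

```r
# Hypothetical sketch: piecewise-linear interpolation between nodes, with
# the first and last segments extended to extrapolate outside the nodes.
lin_sketch <- function(x, x0, y0) {
  stopifnot(
    length(x0) == length(y0),
    length(x0) >= 2,
    !is.unsorted(x0, strictly = TRUE)
  )
  # Index of the left node of the segment used for each query point;
  # all.inside = TRUE clamps points outside the nodes to the end segments
  i <- findInterval(x, x0, all.inside = TRUE)
  slope <- (y0[i + 1] - y0[i]) / (x0[i + 1] - x0[i])
  y0[i] + slope * (x - x0[i])
}

# One point below the nodes, one between them, one above them
y <- lin_sketch(c(0, 1.5, 4), x0 = c(1, 2, 3), y0 = c(10, 20, 40))
print(y)
```

Unlike `stats::approx`, which does not extend the line beyond the outermost nodes, this sketch reuses the slope of the nearest end segment for points outside the node range.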
`write_fixed` accepts a table, a path, and a format specifier (e.g. a vector of sprintf-like specifiers, one per column), and outputs the table in plain-text format. It is a great way to produce human-readable tables that can be read back by e.g. `read.table` or `readr::read_table` / `read_table2`.
`read/write_smart` accepts a table, a path, and other arguments. It calls different methods based on the extension of the path provided. Currently supported types are `feather` and `fth` for feather data, `rds` for R's rds, and `csv` for comma-separated files; everything else is processed by `write_fixed` / `read_table` as plain text.
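
The two ideas above, per-column `sprintf` formatting and dispatch on the file extension, can be sketched in base R as follows. The names `write_fixed_sketch` and `write_smart_sketch` are hypothetical, the argument order is an assumption, and the feather branch is left out because it needs an extra package.

```r
# Hypothetical sketch: format every column with its own sprintf specifier
# and write the result as space-separated plain text with a header line.
write_fixed_sketch <- function(tbl, path, fmt) {
  stopifnot(length(fmt) == ncol(tbl), nrow(tbl) >= 1)
  cols <- Map(function(col, f) sprintf(f, col), tbl, fmt)
  # Pad column names to the width of the formatted values
  header <- mapply(function(nm, col) formatC(nm, width = nchar(col[1])),
                   names(tbl), cols)
  body <- do.call(paste, c(unname(cols), sep = " "))
  writeLines(c(paste(header, collapse = " "), body), path)
  invisible(tbl)
}

# Hypothetical sketch: pick the writer from the file extension and
# fall back to the plain-text writer for everything else.
write_smart_sketch <- function(tbl, path, ...) {
  switch(tolower(tools::file_ext(path)),
    rds = saveRDS(tbl, path),
    csv = write.csv(tbl, path, row.names = FALSE),
    write_fixed_sketch(tbl, path, ...)
  )
}

tbl <- data.frame(id = 1:3, value = c(1.5, 22.25, 333))

txt_path <- tempfile(fileext = ".dat")
write_smart_sketch(tbl, txt_path, c("%5d", "%10.3f"))
cat(readLines(txt_path), sep = "\n")
back_txt <- read.table(txt_path, header = TRUE)

csv_path <- tempfile(fileext = ".csv")
write_smart_sketch(tbl, csv_path)
back_csv <- read.csv(csv_path)
```

Because the plain-text output is whitespace-separated, it can be read back with `read.table` without any extra arguments besides `header = TRUE`.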