fansi - ANSI Control Sequence Aware String Functions
Counterparts to R string manipulation functions that account for the effects of ANSI text formatting control sequences.
Formatting Strings with Control Sequences
Many terminals will recognize special sequences of characters in strings
and change display behavior as a result. For example, on my terminal the
sequences "\033[3?m" and "\033[4?m", where "?" is a digit in 1-7,
change the foreground and background colors of text respectively:
fansi <- "\033[30m\033[41mF\033[42mA\033[43mN\033[44mS\033[45mI\033[m"
This type of sequence is called an ANSI CSI SGR control sequence. Most
*nix terminals support them, and newer versions of Windows and RStudio
consoles do too. You can check whether your display supports them by
running term_cap_test().
Whether the fansi functions behave as expected depends on many
factors, including how your particular display handles Control
Sequences. See ?fansi for details, particularly if you are getting
unexpected results.
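As a quick visual check of the basic colors, the following base R sketch generates each of the foreground and background sequences described above and prints a sample line for each. The variable names are illustrative only, and what you actually see depends entirely on your display.

```r
# Build the seven basic foreground ("\033[3?m") and background ("\033[4?m")
# SGR sequences, where "?" runs over the digits 1-7
codes <- 1:7
fg <- sprintf("\033[3%dm", codes)
bg <- sprintf("\033[4%dm", codes)

# "\033[m" resets formatting so colors do not bleed into later output
reset <- "\033[m"
writeLines(paste0(fg, "foreground ", codes, reset))
writeLines(paste0(bg, "background ", codes, reset))
```

If the lines print with literal escape characters instead of colors, the display likely does not interpret these sequences.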
Manipulation of Formatted Strings
ANSI control characters and sequences (Control Sequences hereafter) break the relationship between byte/character position in a string and display position. For example, to extract the “ANS” part of our colored “FANSI”, we would need to carefully compute the character positions:
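A minimal base R sketch of that computation, with the positions counted by hand for the fansi string defined earlier (the name naive is illustrative only):

```r
fansi <- "\033[30m\033[41mF\033[42mA\033[43mN\033[44mS\033[45mI\033[m"

# Each color sequence (e.g. "\033[42m") occupies 5 characters, so the sequence
# that colors "A" starts at character 12 and "S" sits at character 29
nchar(fansi)                    # 38 characters for 5 displayed letters
naive <- substr(fansi, 12, 29)
writeLines(naive)
```

The extracted piece starts at the background sequence for "A" and stops right after "S", so it carries neither the leading foreground sequence nor a trailing reset.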
With fansi we can select directly based on display position:
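A sketch of the equivalent call, assuming the fansi package is installed and attached. The exact sequences that substr_ctl emits may differ between versions, so the example checks only the visible text:

```r
library(fansi)

fansi <- "\033[30m\033[41mF\033[42mA\033[43mN\033[44mS\033[45mI\033[m"

# Positions 2 through 4 refer to displayed characters, not raw characters
ans <- substr_ctl(fansi, 2, 4)
writeLines(ans)

# Stripping the Control Sequences confirms which letters were selected
strip_ctl(ans)
```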
If you look closely you’ll notice that the text color for the substr
version is wrong, as the naïve string extraction loses the initial
"\033[30m" that sets the foreground color. Additionally, the
color from the last letter bleeds out into the next line.
fansi provides counterparts to base string functions such as substr,
strwrap, and nchar, named with a _ctl suffix (e.g. substr_ctl).
These are drop-in replacements that behave (almost) identically to the
base counterparts, except for the Control Sequence awareness. There
are also utility functions such as strip_ctl to remove Control
Sequences and has_ctl to detect whether strings contain them.
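The utility functions can be illustrated with a short sketch, assuming fansi is installed. The vector x is a made-up example containing one plain and one formatted string:

```r
library(fansi)

x <- c("plain", "\033[31mred\033[m")

has_ctl(x)     # which elements contain Control Sequences
strip_ctl(x)   # the same strings with the sequences removed
nchar(x)       # base counts the sequence characters too
nchar_ctl(x)   # fansi counts only what is displayed
```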
fansi is written in C, so you should find the performance of the
fansi functions to be close to, though slightly slower than, that of the
corresponding base functions, with the exception that
strwrap_ctl is much faster than strwrap.
Operations involving type = "width" will be slower still. We have
prioritized convenience and safety over raw speed in the C code, but
unless your code is primarily engaged in string manipulation fansi
should be fast enough to avoid attention in benchmarking traces.
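One way to check these claims on your own machine is a rough timing comparison. The sketch below assumes fansi is installed and uses made-up input text; system.time is coarse and results vary by system, so treat the numbers as indicative only.

```r
library(fansi)

# Hypothetical input: one long string of randomly chosen words
words <- c("lorem", "ipsum", "dolor", "sit", "amet", "consectetur")
set.seed(1)
txt <- paste(sample(words, 5000, replace = TRUE), collapse = " ")

# Time the base function and its fansi counterpart on the same input
t.base  <- system.time(w.base  <- strwrap(txt, width = 60))
t.fansi <- system.time(w.fansi <- strwrap_ctl(txt, width = 60))

c(base = t.base[["elapsed"]], fansi = t.fansi[["elapsed"]])
```

For more reliable measurements, a dedicated benchmarking package such as microbenchmark is a better fit than a single system.time call.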
Width Based Substrings
fansi also includes improved versions of some of those functions, such
as substr2_ctl, which allows for width based substrings. To illustrate,
let’s create an emoji string made up of two wide characters:
pizza.grin <- sprintf("\033[46m%s\033[m", strrep("\U1F355\U1F600", 10))
And a colorful background made up of 1-wide characters:
raw <- paste0("\033[45m", strrep("FANSI", 40))
wrapped <- strwrap2_ctl(raw, 41, wrap.always=TRUE)
When we inject the 2-wide emoji into the 1-wide background their widths are accounted for as shown by the result remaining rectangular:
starts <- c(18, 13, 8, 13, 18)
ends <- c(23, 28, 33, 28, 23)
substr2_ctl(wrapped, type='width', starts, ends) <- pizza.grin
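To confirm the result is rectangular without relying on how the terminal draws emoji, the display width of each line can be computed directly. This sketch repeats the steps above so that it runs on its own, and assumes fansi is installed; reported widths depend on the Unicode width data of your R version.

```r
library(fansi)

pizza.grin <- sprintf("\033[46m%s\033[m", strrep("\U1F355\U1F600", 10))
raw <- paste0("\033[45m", strrep("FANSI", 40))
wrapped <- strwrap2_ctl(raw, 41, wrap.always=TRUE)

starts <- c(18, 13, 8, 13, 18)
ends <- c(23, 28, 33, 28, 23)
substr2_ctl(wrapped, type='width', starts, ends) <- pizza.grin
writeLines(wrapped)

# If widths were accounted for, every line reports the same display width
nchar_ctl(wrapped, type='width')
```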
fansi width calculations use heuristics to account for graphemes,
including combining emoji:
emo <- c(
  "\U1F468",
  "\U1F468\U1F3FD",
  "\U1F468\U1F3FD\u200D\U1F9B3",
  "\U1F468\u200D\U1F469\u200D\U1F467\u200D\U1F466"
)
writeLines(
  paste(
    emo,
    paste("base:", nchar(emo, type='width')),
    paste("fansi:", nchar_ctl(emo, type='width'))
  )
)
## 👨 base: 2 fansi: 2
## 👨🏽 base: 4 fansi: 2
## 👨🏽🦳 base: 6 fansi: 2
## 👨👩👧👦 base: 8 fansi: 2
HTML Translation
You can translate ANSI CSI SGR formatted strings into their HTML
equivalents with to_html.
It is possible to set
knitr hooks such that R output that contains
ANSI CSI SGR is automatically converted to the HTML formatted equivalent
and displayed as intended. See the package vignettes for details.
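A minimal sketch of the translation, assuming a version of fansi that provides to_html. The exact markup produced (inline styles, color values) may vary by version, so the example only checks that the Control Sequences have been replaced by span elements:

```r
library(fansi)

fansi <- "\033[30m\033[41mF\033[42mA\033[43mN\033[44mS\033[45mI\033[m"

# Convert the SGR sequences into HTML span elements
html <- to_html(fansi)
writeLines(html)

# No Control Sequences should remain in the translated string
has_ctl(html)
```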
Installation
This package is available on CRAN:
install.packages('fansi')
It has no runtime dependencies.
For the development version use:
f.dl <- tempfile()
f.uz <- tempfile()
github.url <- 'https://github.com/brodieG/fansi/archive/development.zip'
download.file(github.url, f.dl)
unzip(f.dl, exdir=f.uz)
install.packages(file.path(f.uz, 'fansi-development'), repos=NULL, type='source')
# recursive=TRUE is needed for unlink to remove the unzipped directory
unlink(c(f.dl, f.uz), recursive=TRUE)
There is no guarantee that development versions are stable or even working. The master branch typically mirrors CRAN and should be stable.
Related Packages and References
- crayon, the library that started it all.
- ansistrings, which implements similar functionality.
- ECMA-48 - Control Functions For Coded Character Sets, in particular pages 10-12, and 61.
- CCITT Recommendation T.416
- ANSI Escape Code - Wikipedia for a gentler introduction.
Acknowledgments
- R Core for developing and maintaining such a wonderful language.
- CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining the repository, and Uwe Ligges in particular for maintaining Winbuilder.
- Gábor Csárdi for getting me started on the journey into ANSI control sequences, and for many of the ideas on how to process them.
- Jim Hester for covr and, with RStudio, for r-lib/actions.
- Dirk Eddelbuettel and Carl Boettiger for the rocker project, and Gábor Csárdi and the R-consortium for Rhub, without which testing bugs on R-devel and other platforms would be a nightmare.
- Tomas Kalibera for rchk and the accompanying vagrant image, and rcnst to help detect errors in compiled code.
- Winston Chang for the r-debug docker container, in particular because of the valgrind level 2 instrumented version of R.
- George Nachman et al. for iTerm2, a Free terminal emulator that supports truecolor CSI SGR.
- Hadley Wickham and Peter Danenberg for roxygen2.
- Yihui Xie for knitr and J.J. Allaire et al. for rmarkdown, and by extension John MacFarlane for pandoc.
- Gábor Csárdi, the R-consortium, et al. for revdepcheck to simplify reverse dependency checks.
- Olaf Mersmann for microbenchmark, because microseconds matter, and Joshua Ulrich for making it lightweight.
- All open source developers out there that make their work freely available for others to use.
- Github, Codecov, Vagrant, Docker, Ubuntu, Brew for providing infrastructure that greatly simplifies open source development.
- Free Software Foundation for developing the GPL license and promoting the free software movement.