Home
by Henrik Bengtsson.
List of features and modifications I would love to see in R:
- Internal `HASNA(x)` flag indicating whether `x` has missing values (`HASNA=1`) or not (`HASNA=0`), or whether this is unknown (`HASNA=2`). This flag can be set by any function that has scanned `x` for missing values. This would allow functions to skip expensive testing for missing values whenever `HASNA=0`. (Currently it is up to the user to keep track and use `na.rm=FALSE`, iff supported.)
  - Luke is changing the SEXP header for reference counting. Thanks to the need for alignment, we will get some extra bits. We have already decided to use one of those for this purpose. Another bit will track whether a vector is sorted.
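To illustrate the idea behind the `HASNA(x)` flag, here is a minimal user-level sketch that caches the scan result in an attribute. The names `hasna()`, `scan_na()` and `sum_fast()` and the `"hasna"` attribute are assumptions made for this example only; they are not part of R and say nothing about how an internal flag would be implemented.

```r
# Hypothetical user-level emulation of the proposed flag via an attribute.
# 0 = no missing values, 1 = has missing values, 2 = unknown.
hasna <- function(x) {
  flag <- attr(x, "hasna", exact = TRUE)
  if (is.null(flag)) 2L else flag
}

# Scan once and record the result on the object.
scan_na <- function(x) {
  attr(x, "hasna") <- if (anyNA(x)) 1L else 0L
  x
}

# A consumer that only asks for NA removal when the flag says it is needed.
sum_fast <- function(x) {
  if (hasna(x) == 2L) x <- scan_na(x)
  sum(x, na.rm = (hasna(x) == 1L))
}

x <- scan_na(c(1, NA, 3))
y <- scan_na(c(1, 2, 3))
```

An attribute like this is not invalidated when the vector is modified afterwards (e.g. `x[2] <- 5` leaves it stale), which is one reason the flag would have to live inside R itself.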
- Generic support for dimension-aware attributes that are acknowledged whenever the object is subsetted. For vectors we have `names()`, for matrices and data frames we have `rownames()` and `colnames()`, and for arrays and other objects we have `dimnames()`.
  - This is essentially `Biobase::AnnotatedDataFrame` and `S4Vectors::DataFrame`. One interesting direction would be to consider the meta columns as grouping factors and use them to implement pivot-table functionality.
  - Prototype:
> x <- matrix(1:12, ncol=4)
> colnames(x) <- c("A", "B", "C", "D")
> colattr(x, 'gender') <- c("male", "male", "female", "male")
> x
     male male female male
        A    B      C    D
[1,]    1    4      7   10
[2,]    2    5      8   11
[3,]    3    6      9   12
> x[,2:3]
     male female
        B      C
[1,]    4      7
[2,]    5      8
[3,]    6      9
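The prototype can be approximated today with an S3 class whose `[` method subsets the column metadata together with the columns. This is only a sketch: the `annotated_matrix` class, the `"colattr"` attribute and the `colattr()` accessors are hypothetical names, only matrix-style `x[i, j]` indexing is handled, and the result is always kept as a matrix.

```r
# Hypothetical setter: attach a per-column attribute to a matrix.
`colattr<-` <- function(x, name, value) {
  stopifnot(is.matrix(x), length(value) == ncol(x))
  meta <- attr(x, "colattr")
  if (is.null(meta)) meta <- list()
  meta[[name]] <- value
  attr(x, "colattr") <- meta
  class(x) <- unique(c("annotated_matrix", oldClass(x)))
  x
}

# Hypothetical getter.
colattr <- function(x, name) attr(x, "colattr")[[name]]

# Subsetting keeps the column attributes aligned with the selected columns.
`[.annotated_matrix` <- function(x, i, j, ...) {
  meta <- attr(x, "colattr")
  # Resolve 'j' against a named index vector to find the surviving columns
  cols <- stats::setNames(seq_len(ncol(x)), colnames(x))
  if (!missing(j)) cols <- cols[j]
  y <- unclass(x)[i, j, drop = FALSE]
  attr(y, "colattr") <- lapply(meta, function(v) v[cols])
  class(y) <- "annotated_matrix"
  y
}

x <- matrix(1:12, ncol = 4)
colnames(x) <- c("A", "B", "C", "D")
colattr(x, "gender") <- c("male", "male", "female", "male")
y <- x[, 2:3]
colattr(y, "gender")
```

Printing the attribute as an extra header row, as in the prototype output, would additionally need a `print()` method, which is left out here.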
- Add support for `dim(x) <- dims`, where `dims` has one `NA` value, which is then inferred from `length(x)` and `na.omit(dims)`. If incompatible, then an error is given. For example,
> x <- matrix(1:12, ncol=4)
> dim(x)
[1] 3 4
> dim(x) <- c(NA, 3)
> dim(x)
[1] 4 3
Comment: The `R.utils::dimNA()` function implements this.
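The inference rule is simple enough to sketch in a few lines. The helper name `infer_dims()` is an assumption for this example; it is not the `R.utils` implementation.

```r
# Hypothetical helper: resolve a single NA in 'dims' from length(x).
infer_dims <- function(x, dims) {
  unknown <- which(is.na(dims))
  if (length(unknown) == 0L) return(dims)
  if (length(unknown) > 1L) stop("at most one dimension may be NA")
  known <- prod(dims[-unknown])
  if (known == 0 || length(x) %% known != 0) {
    stop("length(x) is not compatible with the specified dimensions")
  }
  dims[unknown] <- length(x) %/% known
  dims
}

x <- matrix(1:12, ncol = 4)
dim(x) <- infer_dims(x, c(NA, 3))
dim(x)
```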
- Explicitly specify an argument as "missing". For instance, calling `foo(x=missing())` should resolve `missing(x)` as `TRUE`. Comment: See matrixStats discussion.
- `value <- sandbox(...)`, which analogously to `evalq(local(...))` evaluates an R expression but without leaving any side effects and preserving all options, environments, connections, sinks, graphics devices, etc. The effect should be the same as evaluating the expression in a separate R process (after importing global variables and loading packages) and returning the value to the calling R process.
- `source(..., args=...)` - pass / override command-line arguments when calling `source()`.
- `rscript(..., args=...)` - run an R script (with command-line arguments) in a separate process (via `system()`). Should (optionally?) preserve the same setup (e.g. `.libPaths()`, `options()`, ...) as the calling R session.
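A bare-bones version of the `rscript()` idea can be sketched with `system2()`. The function below is hypothetical and only forwards command-line arguments; carrying over `.libPaths()` and `options()` from the calling session is not attempted.

```r
# Hypothetical helper; 'rscript' is not a base R function.
rscript <- function(file, args = character()) {
  bin <- file.path(R.home("bin"), "Rscript")
  # Quote the path and the arguments so that spaces survive the shell
  system2(bin, args = c(shQuote(file), shQuote(args)), stdout = TRUE)
}

# A throwaway script that echoes its command-line arguments, one per line
script <- tempfile(fileext = ".R")
writeLines('cat(commandArgs(trailingOnly = TRUE), sep = "\\n")', script)

out <- rscript(script, args = c("alpha", "beta"))
```

Here `out` holds the lines the child process wrote to standard output.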
- Support for one-sided plot limits, e.g. `plot(5:10, xlim=c(0,+Inf))` where `xlim[2]` is inferred from data, cf. `xlim=NULL`.
- Standardized graphics device settings and API. For instance, we have `ps.options()` but no `png.options()`. For some devices we can set the default width and height, whereas for others the defaults are hardwired to the arguments of the device function. Comment: The `R.devices` package tries to work around this.
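Until one-sided limits such as `xlim=c(0,+Inf)` are supported natively, the behaviour can be approximated by resolving the open ends before calling `plot()`. The helper `resolve_lim()` is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: replace non-finite entries of 'lim' by the data range.
resolve_lim <- function(lim, values) {
  rng <- range(values, finite = TRUE)
  if (is.null(lim)) return(rng)      # same as leaving the limit unspecified
  open <- !is.finite(lim)            # TRUE for Inf, -Inf and NA
  lim[open] <- rng[open]
  lim
}

y <- 5:10
x <- seq_along(y)                    # plot(5:10) uses the index as x
xlim <- resolve_lim(c(0, +Inf), x)
```

The result can then be passed on as `plot(x, y, xlim = xlim)`.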
- Atomic writing to file to avoid incomplete/corrupt files being written. This can be achieved by writing to a temporary file/directory and then renaming it when writing/saving is complete. This can be made optional, e.g. `saveRDS(x, file="foo.rds", atomic=TRUE)`.
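The write-then-rename approach can be sketched as a small wrapper. `save_rds_atomic()` is a hypothetical name; `saveRDS()` itself has no `atomic` argument, and whether the rename is truly atomic depends on the operating system and file system.

```r
# Hypothetical wrapper around saveRDS(); not part of base R.
save_rds_atomic <- function(object, file, ...) {
  # Create the temporary file next to the target, so that the rename
  # does not have to cross file systems
  tmp <- tempfile(pattern = paste0(basename(file), "."),
                  tmpdir = dirname(file), fileext = ".tmp")
  # Clean up the temporary file if anything below fails
  on.exit(if (file.exists(tmp)) file.remove(tmp), add = TRUE)
  saveRDS(object, file = tmp, ...)
  if (!file.rename(tmp, file)) {
    stop("failed to rename ", tmp, " to ", file)
  }
  invisible(file)
}

target <- tempfile(fileext = ".rds")
save_rds_atomic(1:3, target)
readRDS(target)
```

A reader of `target` then sees either the previous complete file or the new complete file, never a partially written one.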
- A simple class for files, e.g. `pathname <- p("R/zzz.R")` and `pathnames <- p("R/000.R", "R/zzz.R")`. Moreover, for instance, `pathnames <- dir("R/")` should effectively return `pathnames <- p(dir("R/"))`.
- A simple class for regular expressions, e.g. `gsub(re("^[a-z]+"), x)`. Also fixed expressions, e.g. `gsub(fe("(abc)"), x)`. This could allow for things such as using `x[re("a.a")]` to get the subset `x[c("aba", "aea")]`.
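A rough sketch of what such pattern classes could look like. The constructors `re()` and `fe()` follow the names proposed above, while the `pick()` generic is a hypothetical stand-in for `x[re("a.a")]`, since overriding `[` for plain vectors is not possible from user code.

```r
# Hypothetical constructors; 're' and 'fe' are not base R functions.
re <- function(pattern) structure(pattern, class = "re")
fe <- function(pattern) structure(pattern, class = "fe")

# Select elements of a named vector whose names match the pattern.
# Dispatch is on the class of 'pattern', not on 'x'.
pick <- function(x, pattern) UseMethod("pick", pattern)
pick.re <- function(x, pattern) {
  x[grepl(unclass(pattern), names(x))]
}
pick.fe <- function(x, pattern) {
  x[grepl(unclass(pattern), names(x), fixed = TRUE)]
}

x <- c(aba = 1, abc = 2, aea = 3)
by_regex <- pick(x, re("a.a"))   # names matching the regular expression
by_fixed <- pick(x, fe("a.a"))   # names containing the literal string "a.a"
```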
- Support URLs in addition to local files when calling `R -f` or `Rscript`, e.g. `Rscript http://callr.org/install#MASS`.
- Package scripts via `Rscript R.rsp::rfile`, which calls script `rfile.R` in `system.file("exec", package="R.rsp")` iff it exists. Similarly for `R CMD`, e.g. `R CMD R.rsp::rfile`. Also, if the package is not explicitly specified, the `exec` directory of all packages should be scanned (only for `R CMD`), e.g. `R CMD rfile`. See also R-devel thread `R CMD <custom>`?
- `R CMD check --flavor=<flavor>`: Instead of hard-coded tests as in `R CMD check --as-cran`, support for custom test suites, which themselves could be R packages, e.g. `R CMD check --flavor=CRAN` (R package `check.CRAN`) and `R CMD check --flavor=Bioconductor` (`check.Bioconductor`). In the bigger picture, this will separate R core and CRAN.
- `Rscript -p <n> foo.R` (or `--processes=<n>`) for specifying that a (maximum of) `<n>` cores may be used including the main process. This would set option `mc.cores` to `<n>-1`, cf. `help('options')`. As an alternative, environment variable `R_PROCESSES` can be set. The default is `<n> = 1`. See also R-devel thread 'SUGGESTION: Environment variable R_MAX_MC_CORES for maximum number of cores'.
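The environment-variable variant of the `Rscript -p <n>` proposal can be emulated at the top of a script or in a startup file. This is a sketch only: R does not consult `R_PROCESSES` itself, and `configure_cores()` is a hypothetical name.

```r
# Hypothetical startup logic following the proposal:
# <n> processes in total, so mc.cores is set to <n> - 1.
configure_cores <- function(default = 1L) {
  n <- suppressWarnings(as.integer(Sys.getenv("R_PROCESSES", unset = NA)))
  # Fall back to the default if the variable is unset or not a positive integer
  if (is.na(n) || n < 1L) n <- default
  options(mc.cores = n - 1L)
  invisible(n)
}

Sys.setenv(R_PROCESSES = "4")
n <- configure_cores()
getOption("mc.cores")
```

With the proposed default of one process this yields `mc.cores = 0`, which follows the proposal literally: no additional processes besides the main one.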
- [Issue #3] Use `An exceptional error occurred that R could not recover from. The R session is now aborting ...` instead of just `aborting ...`, because from the latter it is not always clear where that message comes from, i.e. it could have been output by something else.
- The system-library directory should be read-only after installing R and/or not accept installation of non-base packages. If additional packages are installed there, an end-user is forced to have those packages on their library path. It is better to install any additional site-wide packages in a site-wide library, cf. `.Library.site` and `R_LIBS_SITE`. This way the user can choose to include the site-wide library/libraries or not.
- One package library per repository, e.g. `~/R/library/3.1/CRAN/`, `~/R/library/3.1/Bioconductor/`, and `~/R/library/3.1/R-Forge/`. This way it is easy to include/exclude complete sets of packages. `install.packages()` should install packages to the corresponding directory, cf. how `update.packages()` updates packages where they live (I think).
- Repository metadata that provides information about a repository. This can be provided as a DCF file `REPOSITORY` in the root of the repository URL, e.g. `http://cran.r-project.org/REPOSITORY` and `http://www.bioconductor.org/packages/release/bioc/REPOSITORY`. The content of `REPOSITORY` could be:
Repository: BioCsoft_3.1
Title: Bioconductor release Software repository
Depends: R (>= 3.2.0)
Description: R package repository for Bioconductor release 3.1 branch.
Maintainer: Bioconductor Webmaster <webmaster@bioconductor.org>
URL: http://www.bioconductor.org/packages/release/bioc
SeeAlso: http://www.bioconductor.org/about/mirrors/mirror-how-to/
IsMirror: TRUE
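Since the proposed `REPOSITORY` file uses the same DCF format as `DESCRIPTION`, it could be read with `read.dcf()`. The sketch below parses a local copy of some of the example fields above; no repository serves such a file today, so the temporary file stands in for what would otherwise be downloaded.

```r
# Write a few of the example fields to a temporary file
# (a stand-in for the proposed REPOSITORY file).
repo_file <- tempfile()
writeLines(c(
  "Repository: BioCsoft_3.1",
  "Title: Bioconductor release Software repository",
  "Depends: R (>= 3.2.0)",
  "URL: http://www.bioconductor.org/packages/release/bioc",
  "IsMirror: TRUE"
), repo_file)

# read.dcf() returns a character matrix with one row per record
meta <- read.dcf(repo_file)
repo_name <- meta[1, "Repository"]
is_mirror <- as.logical(meta[1, "IsMirror"])
```

All values come back as character strings, so fields such as `IsMirror` or the version constraint in `Depends` need to be interpreted by the caller.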