
Native input specification

aL3xa edited this page Feb 14, 2013 · 17 revisions

DISCLAIMER

This document is intended as a developers' reference, and for curious readers who'd like an early-bird preview of the new input specification and of rapport's course of development. Feel free to rant if you think that something we're working on is pure rubbish.


Rationale: old inputs suck because they differ from native R classes. Instead of rolling our own inventions, why not embrace the (implicit) conventions of the R ecosystem and define inputs according to them? Writing templates should be straightforward and rapport should make you warm and fuzzy inside.

Introduction

The native input specification leverages the classes of R objects, so it does not rely on custom conventions when defining template inputs. The following specification describes the current state of the implementation along with the conventions used in it. Unlike the old input specification, which relied on a rather cumbersome custom syntax, the new input specification is 100% pure YAML, and as such should be more intuitive and native to R.

Inputs

Inputs can be divided into two categories:

  • dataset inputs (non-standalone) - inputs that match a named element of the object provided in the data formal argument of the rapport call. This is usually a data.frame object, but it can be any object that allows subsetting by name, i.e. that has named elements (a named list will do). Dataset inputs cannot have a (default) value, so they always require user input.
  • standalone inputs - inputs that don't depend on the object passed in data. They accept the object provided in the value attribute of the template definition, and users can provide their own as well by passing an object of an appropriate class in the rapport call.
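The difference between the two categories can be sketched in base R. This is only an illustration, not rapport's actual code: resolve_input is a hypothetical helper, and it assumes that a dataset input is mapped by passing the element's name as a character string.

```r
# Hypothetical helper (not part of rapport): resolve what an input evaluates to.
# `input` is a list mirroring the YAML definition, `supplied` is whatever the
# user passed for that input in the call (NULL if nothing was passed).
resolve_input <- function(input, data = NULL, supplied = NULL) {
  if (isTRUE(input[["standalone"]])) {
    # Standalone: a user-supplied object wins, otherwise fall back to `value`.
    if (is.null(supplied)) input[["value"]] else supplied
  } else {
    # Dataset input: there is no default, so the user must map it.
    if (is.null(supplied)) {
      stop("dataset input '", input[["name"]], "' requires user input")
    }
    # Assumption: `supplied` is the name of an element in `data`.
    if (!supplied %in% names(data)) {
      stop("'", supplied, "' is not a named element of data")
    }
    data[[supplied]]
  }
}

df <- data.frame(nwhours = c(1, 2, 3))
hours <- resolve_input(list(name = "x"), data = df, supplied = "nwhours")
flag  <- resolve_input(list(name = "l", standalone = TRUE, value = TRUE))
```

Because the dataset branch only uses names() and [[, the same sketch works for a named list as well as a data.frame.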

General input options

The following options are available for all inputs:

  • name (character string) - input name. Cannot be blank, as mapped inputs will be assigned to that name in the evaluation environment. Custom naming conventions (compatible with R ones) are applied - see ?rapport:::guess.input.name for details. (TODO: change this to a user-friendly man page, e.g. "Rapport naming conventions" or something similar.)
  • label (character string) - input label. Can be blank, but it's useful to have something in there in order to get pretty output in a report, e.g. in plots and tables (Number of hours is far more descriptive than nwhours). Defaults to an empty string.
  • description (character string) - input description. It can be blank, but sometimes it's convenient to have a lengthy description of the provided input. Defaults to an empty string.
  • class (character string) - defines the class of an input (d'uh). It can be omitted (defaults to any), but most of the time you'll find it useful to fine-tune your inputs. Currently supported classes are: any (default), character, complex, factor, integer, logical, numeric, and raw.
  • required (logical value) - whether the input is required (defaults to FALSE). If TRUE, the input must match a named element of the object provided in the data argument (dataset inputs), or the user has to provide an object of an appropriate class (standalone inputs).
  • standalone (logical value) - whether the input is independent of the object provided in data. Defaults to FALSE, i.e. a dataset input.
  • length - provides a set of rules that restrict the input's length. Defaults to exactly: 1. It accepts several forms, e.g.
    • if omitted or NULL, it defaults to exactly: 1
    • if an integer value is provided, the input must contain exactly that many elements:
length: 10

is equivalent to (and will be stored internally as):

length:
  exactly: 10
    • from and to attributes can be passed to define a range that the input's length must fall into:
length:
  from: 1
  to: 10

Either from or to can be omitted, and sane defaults will be set implicitly: an omitted to defaults to Inf, and an omitted from defaults to 1. For instance:

length:
  from: 1

is identical to:

length:
  from: 1
  to: Inf

or:

length:
  to: 10

is equivalent to:

length:
  from: 1
  to: 10
  • value (a vector of the input's class) - default value. Only available for standalone inputs, and it must match the class and length of the given input. NULL is also allowed if required is FALSE. Additional checks are performed based on the input class. See Class-specific options for details.
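The length rules above can be sketched as a small normalisation step followed by a check. The function names normalize_length and length_ok are hypothetical and only mirror the rules as described here; rapport's internal representation may differ.

```r
# Hypothetical sketch: bring every accepted form of `length` to a canonical
# list, following the defaults described above.
normalize_length <- function(spec = NULL) {
  # Omitted or NULL -> exactly: 1
  if (is.null(spec)) return(list(exactly = 1L))
  # A bare integer N -> exactly: N
  if (!is.list(spec)) return(list(exactly = as.integer(spec)))
  if (!is.null(spec[["exactly"]])) {
    return(list(exactly = as.integer(spec[["exactly"]])))
  }
  # Range form: a missing `from` becomes 1, a missing `to` becomes Inf.
  list(
    from = if (is.null(spec[["from"]])) 1 else spec[["from"]],
    to   = if (is.null(spec[["to"]])) Inf else spec[["to"]]
  )
}

# Check the length of `x` against a (possibly abbreviated) specification.
length_ok <- function(x, spec = NULL) {
  rule <- normalize_length(spec)
  n <- length(x)
  if (!is.null(rule[["exactly"]])) {
    n == rule[["exactly"]]
  } else {
    n >= rule[["from"]] && n <= rule[["to"]]
  }
}

str(normalize_length(10))              # exactly: 10
str(normalize_length(list(from = 1)))  # from: 1, to: Inf
str(normalize_length(list(to = 10)))   # from: 1, to: 10
```

Normalising first keeps the check itself trivial, and the same canonical form could be reused for attributes that accept the same options as length.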

Example

An "ordinary" input with no class-specific options (see below) should look pretty much like this:

- name: l
  label: Logical input
  description: Nothing special about this, really. Just an ordinary logical input...
  standalone: TRUE
  value: TRUE
  length: 1
  required: FALSE

This defines a standalone logical input with TRUE as the default value. It accepts exactly one logical value and is not required.

Class-specific options

character

  • regexp (character value) - a regular expression that all values in the input should match. regexp is omitted by default, and the check is performed only if the attribute is non-NULL.
  • nchar - sets restrictions on the number of characters. Accepts the same options as the length attribute, only this time the number of characters is checked. nchar is omitted from the input definition by default. Checks are performed only if the attribute is properly defined.
  • matchable (logical value) - if set to TRUE, the value attribute is mandatory. It (value) should hold a character vector that will be passed to the choices argument of match.arg, while the user-provided value for the input is passed to arg. To demystify, if c is a matchable character input:
- name: c
  label: Character input
  length:
    min: 2
  matchable: TRUE
  value:
    - un
    - deux
    - trois
    - quatre
    - cinq

and you're about to issue rapport("mytemplate", some_data, c = c("u", "d")), matchable will evaluate match.arg(c("u", "d"), c("un", "deux", "trois", "quatre", "cinq"), several.ok = TRUE). Note that several.ok is set to TRUE because the length attribute requires at least 2 values to be matched. We will guess this one for you depending on the length attribute value. If the number of matched values falls outside the range [min, max], or differs from length$exactly, an error will be issued.
The matchable attribute works on dataset inputs as well (unlike non-matchable inputs). In that case, the named element is passed to the arg argument of match.arg.
If user input is omitted, we'll take the options from the value attribute and assign them as a vector to the input name in the rapport evaluation environment, according to the lower limit set in the length attribute. For instance, if the user didn't map a value to the input, e.g. rapport("mytemplate", some_data), the character vector c("un", "deux") will be assigned to the symbol c. One could set length$min or length$exactly to 3, and c("un", "deux", "trois") will be assigned to the symbol c instead, etc.
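The matching behaviour described above can be tried directly in base R. The wrapper match_input below is a hypothetical sketch, not rapport's implementation; in particular, deriving several.ok from the upper length limit is an assumption about how the guess could work.

```r
choices <- c("un", "deux", "trois", "quatre", "cinq")

# The call from the text: partial matching against the declared choices.
match.arg(c("u", "d"), choices, several.ok = TRUE)
# [1] "un"   "deux"

# Hypothetical wrapper: `lower` and `upper` stand for the input's length limits.
match_input <- function(arg = NULL, choices, lower = 1, upper = Inf) {
  # No user input: take as many choices as the lower limit requires.
  if (is.null(arg)) return(head(choices, lower))
  # Assumption: several values are acceptable whenever more than one is allowed.
  matched <- match.arg(arg, choices, several.ok = upper > 1)
  if (length(matched) < lower || length(matched) > upper) {
    stop("number of matched values is outside the allowed length")
  }
  matched
}

match_input(c("u", "d"), choices, lower = 2)  # user-supplied, partially typed
match_input(NULL, choices, lower = 2)         # default: first two choices
match_input(NULL, choices, lower = 3)         # default: first three choices
```

Note that match.arg with several.ok = TRUE silently drops elements that match nothing as long as at least one element matches, which is why the sketch checks the number of matched values afterwards.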

numeric, integer

  • limit (a named list with min and max attributes) - checks whether the values of numeric/integer inputs fall into the range defined by min and max. Both min and max should be length-one vectors of an appropriate class. limit is omitted by default, and checks are performed only if the limit attribute is provided.
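A minimal sketch of such a range check follows. check_limit is a hypothetical name, and two details are assumptions rather than documented behaviour: the bounds are treated as inclusive, and a missing min or max is treated as unbounded.

```r
# Hypothetical sketch of the limit check for numeric/integer inputs.
check_limit <- function(x, limit = NULL) {
  # No limit attribute: nothing to check.
  if (is.null(limit)) return(TRUE)
  # Assumption: a missing bound means "unbounded" on that side.
  lower <- if (is.null(limit[["min"]])) -Inf else limit[["min"]]
  upper <- if (is.null(limit[["max"]])) Inf else limit[["max"]]
  # Both bounds must be length-one vectors, as stated above.
  stopifnot(length(lower) == 1, length(upper) == 1)
  # Assumption: bounds are inclusive.
  all(x >= lower & x <= upper)
}

check_limit(c(1, 5, 10), list(min = 1, max = 10))  # TRUE
check_limit(11, list(min = 1, max = 10))           # FALSE
check_limit(5)                                     # TRUE, no limit defined
```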

factor

  • nlevels (an integer or a named list) - defines the number of levels a given factor is allowed to have. See the description of length or nchar for details. The attribute is omitted by default, and checks are performed only if it is non-NULL.
  • matchable (a logical value) - see character for details. The only difference is that in the case of factors, factor levels are matched instead.
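For factors, both options operate on the levels rather than on the values. The sketch below uses a hypothetical check_nlevels helper and assumes nlevels accepts the same integer or from/to forms as length; the example factor is made up for illustration.

```r
# Hypothetical sketch: check the number of levels of a factor.
check_nlevels <- function(f, spec = NULL) {
  # Attribute omitted: no check is performed.
  if (is.null(spec)) return(TRUE)
  n <- nlevels(f)
  # A bare integer means "exactly that many levels".
  if (!is.list(spec)) return(n == spec)
  # Range form, with the same implicit defaults as `length`.
  from <- if (is.null(spec[["from"]])) 1 else spec[["from"]]
  to   <- if (is.null(spec[["to"]])) Inf else spec[["to"]]
  n >= from && n <= to
}

f <- factor(c("low", "high", "medium", "low"))

check_nlevels(f, 3)                       # exactly three levels
check_nlevels(f, list(from = 2, to = 5))  # between two and five levels

# With matchable, user input is matched against the levels, not the values.
match.arg("med", levels(f))
# [1] "medium"
```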
