data.frame-based API for model and predict functions
drsimonj Add: error messages that indicate twidlr in use
A concern voiced by users was that they may forget whether twidlr is being used or not, and which functions are available etc. Without writing new model names, this commit adds specific error messages that alert the user if they provide arguments that are not compatible with the twidlr function. The error messages explicitly mention that the function comes from twidlr
Latest commit e026dc0 Jun 6, 2017

twidlr: consistent data.frame and formula API for models


twidlr is an R package that exposes a consistent API for model functions and their corresponding predict methods such that they are specified as:

fit <- model(data, formula, ...)
predict(fit, data, ...)

Here, "data" is a required data.frame (or an object that can be coerced to one) and "formula" is a formula (or a string that can be coerced to one) describing the model to be fitted.
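The coercion rules above can be sketched in base R. The helper name `check_model_args` is hypothetical and is not part of twidlr; it only shows one way a data-first function might validate its two required arguments before handing them to an existing model function.

```r
# Hypothetical helper (not twidlr's code): coerce and validate the two
# required arguments of a data-first model call.
check_model_args <- function(data, formula) {
  # "data" is required and must be a data.frame or coercible to one
  if (missing(data)) stop("`data` is a required argument")
  data <- as.data.frame(data)

  # "formula" may be a formula object or a string such as "hp ~ wt"
  if (is.character(formula)) formula <- stats::as.formula(formula)
  if (!inherits(formula, "formula")) stop("`formula` must be a formula or a string")

  list(data = data, formula = formula)
}

# A numeric matrix and a string formula are both accepted
args <- check_model_args(as.matrix(mtcars), "hp ~ mpg * wt")
fit <- stats::lm(args$formula, data = args$data)
```

twidlr itself may perform these checks differently; the sketch only illustrates the "data first, formula second" contract.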

twidlr gets its name from the "twiddle" (the tilde, ~) used in R formulas.


twidlr is available to install from GitHub by running:

# install.packages("devtools")
devtools::install_github("drsimonj/twidlr")


library(twidlr) exposes model functions you're already familiar with, but arranged so that they accept a data.frame first, a formula second, and any additional arguments after that. A robust predict() method is also provided.

For example, a typical linear model would be lm(hp ~ mpg * wt, mtcars, ...). Once twidlr is loaded, the same model would be run via lm(mtcars, hp ~ mpg * wt, ...).
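To make the argument swap concrete, here is a minimal sketch of a data-first wrapper around stats::lm. The name `lm_df` is hypothetical (twidlr masks `lm` itself); the point is that putting data first changes only the call signature, not the fitted model, and lets the data be piped in. The native pipe `|>` assumes R 4.1 or later.

```r
# Hypothetical data-first wrapper around stats::lm (a sketch, not twidlr's code)
lm_df <- function(data, formula, ...) {
  stats::lm(formula, data = data, ...)
}

standard   <- lm(hp ~ mpg * wt, mtcars)     # formula first, as in stats
data_first <- lm_df(mtcars, hp ~ mpg * wt)  # data first, as in twidlr

# Because data comes first, the call also composes with a pipe (R >= 4.1)
piped <- mtcars |> lm_df(hp ~ mpg * wt)

# Same coefficients regardless of argument order
all.equal(coef(standard), coef(data_first))
```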


Modelling in R is messy! Some models take formulas and data frames while others require matrices and vectors. The same can be said of corresponding predict() methods, which can also be impure, returning unexpected or inconsistent results.
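kmeans is one example of this: the stats package defines no predict() method for kmeans objects, so new observations cannot be assigned to clusters without extra code. The sketch below assigns each row to its nearest centre by squared Euclidean distance. `predict_kmeans` is a hypothetical name, and twidlr's own kmeans predict method may be implemented differently.

```r
# Hypothetical helper: assign each row of new data to its nearest k-means centre.
predict_kmeans <- function(fit, data) {
  # Use only the columns the model was fitted on, in the same order
  x <- as.matrix(data[, colnames(fit$centers), drop = FALSE])
  # Squared Euclidean distance from every row (rows) to every centre (columns)
  d <- sapply(seq_len(nrow(fit$centers)), function(k) {
    rowSums(sweep(x, 2, fit$centers[k, ])^2)
  })
  if (is.null(dim(d))) d <- matrix(d, nrow = 1)  # single-row input
  # Column index of the smallest distance = cluster label
  max.col(-d, ties.method = "first")
}

set.seed(1)
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
fit  <- kmeans(iris[, vars], centers = 3)
pred <- predict_kmeans(fit, iris)
```

Requiring "data" here means the prediction is always made for an explicit set of observations rather than silently falling back to the training data.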

twidlr seeks to overcome these problems by providing:

  • Consistent API for model functions and their corresponding predict methods (helping to improve the generality of tidy modelling packages like piplearner)
  • Pure and available predictions: predict() is available for all models (including unsupervised algorithms like kmeans), and "data" is a required argument
  • Tidyverse philosophy by working with data frames and being pipeable such as mtcars %>% lm(hp ~ wt)
  • Formula operators where they are meaningful but were not originally supported. For example, you can select specific variables or add terms such as interactions and dummy-coded variables with syntax like glmnet(iris, Sepal.Width ~ Petal.Width * Petal.Length + Species). Formulas written as strings can always be used too!
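Matrix-only functions such as glmnet::glmnet(x, y) can be given formula support by expanding the formula into a design matrix first. The sketch below uses base R's model.frame(), model.matrix() and model.response(); the helper name `formula_to_xy` is hypothetical, and dropping the intercept column assumes the downstream function fits its own intercept (as glmnet does by default).

```r
# Hypothetical helper: turn a data.frame + formula into the x/y pair that
# matrix-based model functions expect.
formula_to_xy <- function(data, formula) {
  if (is.character(formula)) formula <- stats::as.formula(formula)
  mf <- stats::model.frame(formula, data = data)
  # Expands interactions (*) and dummy-codes factors such as Species
  x <- stats::model.matrix(formula, mf)
  # Drop the intercept column; the model function is assumed to add its own
  x <- x[, colnames(x) != "(Intercept)", drop = FALSE]
  y <- stats::model.response(mf)
  list(x = x, y = y)
}

xy <- formula_to_xy(iris, "Sepal.Width ~ Petal.Width * Petal.Length + Species")
```

With glmnet installed, the result could then be passed as glmnet::glmnet(xy$x, xy$y).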

twidlr models

Model functions exposed by twidlr:

Package        Functions
e1071          naiveBayes, svm
gamlss         gamlss
glmnet         cv.glmnet, glmnet
lme4           glmer, lmer
quantreg       crq, nlrq, rq, rqss
randomForest   randomForest
rpart          rpart
stats          aov, factanal, glm, kmeans, lm, prcomp, t.test (exposed as ttest)
xgboost        xgboost


For conventions and best practices when contributing to twidlr, please see