leadr

The goal of leadr is to streamline model organization in data science projects and Kaggle competitions. The main function, leadr::board, takes a caret model and automatically builds a personal leaderboard for the entire project.

This leaderboard allows you to easily sort models by metric (accuracy, RMSE, etc.) and ensures that you never lose track of a good model during interactive analysis. Check out my blog post for some background.

Installation

The package is not currently available on CRAN. You can install the development version with:

# install.packages("devtools")
devtools::install_github("tmastny/leadr")

Getting Started

Let's say you want to build a classifier for the iris data set. We start by initializing an R project with this directory:

.
└── iris.Rproj

Then we fit our first model.

library(caret)
model <- train(Species ~ ., data = iris, method = 'glmnet')

Before leadr, we might create the script glmnet_1.R to record the model, save the train object as a .RDS file, and keep track of the accuracy in a spreadsheet.

With leadr, we only need to do the following:

leadr::board(model)

## # A tibble: 1 x 13
##    rank    id dir     model  metric score public method   num group index 
##   <dbl>  <id> <chr>   <chr>  <chr>  <dbl>  <dbl> <chr>  <dbl> <dbl> <list>
## 1    1.     1 models… glmnet Accur… 0.964     NA boot     25.    1. <list…
## # ... with 2 more variables: tune <list>, seeds <list>

board creates a personal leaderboard for your project that ranks and sorts your models by their metric score. The leaderboard tibble has all the information needed to recreate and document any model.
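Because the leaderboard is an ordinary tibble, it can be filtered and sorted with base R. The sketch below uses a stand-in data frame with a few of the columns shown above. Only the first row mirrors the output above; the other two rows are invented purely for illustration and are not real results.

```r
# Stand-in for the tibble returned by leadr::board().
# Rows 2 and 3 are made-up values for illustration only.
leaderboard <- data.frame(
  id     = c(1, 2, 3),
  model  = c("glmnet", "rpart", "glmnet"),
  metric = c("Accuracy", "Accuracy", "Kappa"),
  score  = c(0.964, 0.930, 0.910),
  stringsAsFactors = FALSE
)

# Keep a single metric so the scores are comparable,
# then sort so the highest score comes first.
accuracy_only <- leaderboard[leaderboard$metric == "Accuracy", ]
ranked <- accuracy_only[order(accuracy_only$score, decreasing = TRUE), ]
print(ranked)
```

For an error metric such as RMSE, where lower is better, the same call with decreasing = FALSE would put the best model first.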

board also modifies the project directory:

.
├── iris.Rproj
├── leadrboard.RDS
└── models
    └── initial
        └── model1.RDS

By default, board saves the leaderboard tibble as a .RDS file at the project root and creates a directory models. Within models, each caret model is saved in a subdirectory and numbered in the order it was run.
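Since both artifacts are plain .RDS files, they can be read back with base R's readRDS in a later session. This is a minimal sketch that assumes the default layout shown above; load_saved is a hypothetical helper written for this example, not a leadr function.

```r
# Hypothetical helper (not part of leadr): read the saved leaderboard and
# the first model from the default locations shown in the directory tree.
load_saved <- function(root = ".") {
  board_path <- file.path(root, "leadrboard.RDS")
  model_path <- file.path(root, "models", "initial", "model1.RDS")

  # Return NULL for anything that has not been written yet.
  read_if_present <- function(path) {
    if (file.exists(path)) readRDS(path) else NULL
  }

  list(
    board = read_if_present(board_path),
    model = read_if_present(model_path)
  )
}

# Run from the project root; both elements are NULL if board() has not run yet.
saved <- load_saved()
```

Returning NULL rather than raising an error keeps the helper usable in a fresh project where nothing has been saved.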

Interactive

In the previous example, we did everything from the command line and leadr took care of the organization and documentation. leadr also benefits interactive use in other ways. For example, it uses pillar and crayon to programmatically color its output.

Vignettes

For a full description of the features, check out my vignettes hosted here: https://tmastny.github.io/leadr/

Introduction: walkthrough of the basic workflow of leadr
Ensembles: overview of the tools that leadr provides to make ensemble models
