Modelling with Tidymodels and Parsnip
A Tidy Approach to a Classification Problem
22 June 2019
I recently completed the Business Analysis With R online course, which focuses on applied data and business science with R and introduced me to a couple of new modelling concepts and approaches. One that especially caught my attention is parsnip and its attempt to implement a unified modelling and analysis interface (similar to Python's scikit-learn) for seamlessly accessing several modelling platforms in R.
parsnip is the brainchild of RStudio's Max Kuhn (of caret fame) and Davis Vaughan, and forms part of tidymodels, a growing ensemble of tools for exploring and iterating on modelling tasks that shares a common philosophy (and a few libraries) with the tidyverse.
Although many of these packages are still at different stages of development, I decided to take tidymodels "for a spin", so to speak, and build and run a "tidy" modelling workflow to tackle a classification problem. My aim is to show how easy it is to fit a simple logistic regression with R's glm and then switch to a cross-validated random forest using the ranger engine by changing only a few lines of code.
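As a rough illustration of what "changing only a few lines" can look like, here is a minimal sketch of parsnip's model specification functions. It is not the code from the article: it assumes the parsnip, ranger and magrittr packages are installed, and it uses the built-in mtcars data (with the transmission column `am` recast as a factor) and the formula `am ~ mpg + wt` purely as hypothetical stand-ins for the real classification data. Cross-validation is left out here to keep the focus on the engine swap.

```r
library(parsnip)
library(magrittr)  # provides the %>% pipe

# Hypothetical stand-in data: classify transmission type from mtcars
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("automatic", "manual"))

# Logistic regression specification, estimated with base R's glm
logit_spec <- logistic_reg(mode = "classification") %>%
  set_engine("glm")

# Random forest specification, estimated with ranger.
# Only the model function and the engine name differ from the spec above.
rf_spec <- rand_forest(mode = "classification", trees = 500) %>%
  set_engine("ranger")

# Fitting and prediction use exactly the same calls for both models
logit_fit <- fit(logit_spec, am ~ mpg + wt, data = cars_df)
rf_fit    <- fit(rf_spec,    am ~ mpg + wt, data = cars_df)

# predict() returns a tibble with a .pred_class column in both cases
logit_pred <- predict(logit_fit, new_data = cars_df)
rf_pred    <- predict(rf_fit,    new_data = cars_df)
```

Because both specifications go through the same `fit()` and `predict()` calls and return predictions in the same tidy format, moving from glm to ranger touches only the specification lines.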
For this post in particular I'm focusing on four different libraries from the tidymodels suite:
rsample for data sampling and cross-validation,
recipes for data preprocessing,
parsnip for model set up and estimation, and
yardstick for model assessment.
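A condensed sketch of how these four packages can fit together follows. Again, this is not the workflow from the article: it borrows mtcars as hypothetical stand-in data, the helper names `make_recipe()` and `fit_and_predict()` are hypothetical, and it assumes rsample, recipes, parsnip, yardstick, ranger and magrittr are installed. The cross-validation loop is written by hand with rsample's `analysis()` and `assessment()` so that nothing beyond these libraries is needed.

```r
library(rsample)
library(recipes)
library(parsnip)
library(yardstick)
library(magrittr)  # provides the %>% pipe

set.seed(42)

# Hypothetical stand-in data: transmission type as a factor outcome
cars_df <- mtcars
cars_df$am <- factor(cars_df$am, levels = c(0, 1),
                     labels = c("automatic", "manual"))

# rsample: stratified train/test split, then 5-fold CV on the training set
split <- initial_split(cars_df, prop = 0.75, strata = am)
train <- training(split)
test  <- testing(split)
folds <- vfold_cv(train, v = 5, strata = am)

# recipes: centre and scale all predictors; prep() estimates the
# means and standard deviations from the data it is given
make_recipe <- function(data) {
  recipe(am ~ ., data = data) %>%
    step_normalize(all_predictors()) %>%
    prep(training = data)
}

# parsnip: random forest specification using the ranger engine
rf_spec <- rand_forest(mode = "classification", trees = 500) %>%
  set_engine("ranger")

# Preprocess, fit on train_data, and predict classes for new_data
fit_and_predict <- function(train_data, new_data) {
  rec   <- make_recipe(train_data)
  model <- fit(rf_spec, am ~ ., data = bake(rec, new_data = train_data))
  preds <- predict(model, new_data = bake(rec, new_data = new_data))
  data.frame(truth = new_data$am, estimate = preds$.pred_class)
}

# yardstick: accuracy on each fold's assessment set
cv_acc <- sapply(folds$splits, function(s) {
  res <- fit_and_predict(analysis(s), assessment(s))
  accuracy(res, truth = truth, estimate = estimate)$.estimate
})

# Final fit on the full training set, assessed once on the held-out test set
test_res <- fit_and_predict(train, test)
test_acc <- accuracy(test_res, truth = truth, estimate = estimate)
```

Re-estimating the recipe inside each fold keeps the normalisation statistics from leaking assessment data into training. A random forest does not actually need scaled predictors; `step_normalize()` is there only to show where preprocessing slots into the workflow.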
You can find the final article on my website.
I've also published the article on Towards Data Science.