A talk with Robert Krzyzanowski on how R is being used in production by one of the fastest-growing online consumer lenders in the world.
Syberia is a framework designed to productionize the R programming language, for both the development and deployment of machine learning models. By incorporating R into a single end-to-end machine learning process, the team at Avant is able to rapidly improve the company's underwriting, fraud detection, collections activity, and lead conversion analysis.
Rob is the original creator of Syberia and the current director of data engineering at Avant, an online consumer lender that launched in 2012 and crossed the $1B threshold in 2015. Rob's previous experience includes studying pure mathematics at the University of Illinois at Chicago, as well as a B.S. in math, computer science, and physics. Conditional on his spare time, the likelihood is high you'll find Rob playing chess and drinking black mango tea!
Syberia is the development framework for R.
Syberia was developed at Avant to serve as the framework for developing and deploying machine learning models. The goal was to develop a modeling engine capable of solving academic, research, and business problems that require the use of statistical methods.
The modeling grammar, currently the main significant engine built in Syberia, is a framework for building, debugging, testing, and deploying classifiers developed in R. It provides an opinionated unified framework for fast iteration on model development and deployment. It has modularity and testability built in as a design assumption, is founded on convention-over-configuration, and aims to solve the problems of classifier-specific data preparation and classifier-specific modeling parameters.
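To make the idea of classifier-specific data preparation more concrete, here is a minimal base-R sketch. It does not use Syberia's or Mungebits2's actual API; `make_imputer` and `impute_sepal` are hypothetical names introduced only for illustration, and a plain linear model stands in for a classifier. The point it tries to show is that a preparation step learns its parameters during training and replays them unchanged at prediction time.

```r
# Hypothetical helper (not part of Syberia): a preparation step that
# remembers what it learned on the training data.
make_imputer <- function(column) {
  fill <- NULL  # set on the first (training) call
  function(data) {
    if (is.null(fill)) {
      # Training run: learn the column mean from the data we were given.
      fill <<- mean(data[[column]], na.rm = TRUE)
    }
    # Both runs: replace missing values with the stored training mean.
    data[[column]][is.na(data[[column]])] <- fill
    data
  }
}

impute_sepal <- make_imputer("Sepal.Length")

# Training: knock out two values, then learn and apply the imputation.
train <- iris[1:100, ]
train$Sepal.Length[c(3, 7)] <- NA
train <- impute_sepal(train)

# A simple linear model stands in for the classifier here.
fit <- lm(Sepal.Width ~ Sepal.Length, data = train)

# Prediction: the same step reuses the mean learned during training.
new_data <- iris[101:105, ]
new_data$Sepal.Length[2] <- NA
new_data <- impute_sepal(new_data)
scores <- predict(fit, newdata = new_data)
```

Keeping the learned value inside the preparation step means the model and its data preparation can be shipped together, which is one way to read the "classifier-specific data preparation" goal described above.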
The more general vision for Syberia is still in progress but aims to bring unity to the currently disparate realms of the R development ecosystem. In the author's view, R is syntactic sugar around LISP, which enables arbitrary computation; Syberia is an attempt to support this conjecture by allowing the construction of arbitrary software projects within the R programming language, so that R can finally outgrow its long-standing misperception as merely a statistical tool.
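The LISP comparison can be illustrated with a few lines of base R. This sketch is independent of Syberia and only uses standard language features (`quote`, `as.list`, `eval`); the variable names are arbitrary. It shows that R expressions are data structures that can be inspected, rewritten, and then evaluated.

```r
# Code is data: quote() captures an expression without evaluating it.
expr <- quote(x + y * 2)

# A call behaves like a list: the function name followed by its arguments.
parts <- as.list(expr)
fn_name <- as.character(parts[[1]])   # "+"

# Operators are ordinary functions and can be called in prefix form.
prefix_sum <- `+`(1, 2)               # 3

# Rewrite the expression by swapping the function it calls, then evaluate
# it with x and y supplied from a list.
expr[[1]] <- as.name("-")
result <- eval(expr, list(x = 10, y = 3))   # 10 - 3 * 2 = 4
```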
The timeline for future engines and information about how to contribute is listed at the Syberia roadmap.
To get started right away, try out the minimal example Syberia project:
# Run this from your command line terminal.
git clone git@github.com:syberia/example.sy.git && cd example.sy && R
This will open an R console (installing dependencies for the first time may take a while; for troubleshooting see the troubleshooting guide). You can then type:
run("example1")
model$predict(iris[1:5, ]) # The first five scores from a trained classifier.
# [1] 5.005686 4.757667 4.773923 4.890092 5.055138
For more detailed instructions, see the installation guide.
(Example generated using ttygif.)
Syberia relies on the following supplemental packages:
Name | Status |
---|---|
Mungebits2 | |
Stagerunner | |
Tundra | |
Director | |
Additional packages used internally at Avant in conjunction with Syberia modeling projects include batchman, bettertrace, cachemeifyoucan, dokk, lockbox, microserver, objectdiff, Ramd, rocco, s3mpi, testthatsomemore, and treeskeleton.
Syberia is currently released with the following engines.
Name | Status | Description |
---|---|---|
Base | | The base engine that defines routes and controllers. |
Modeling | | The modeling engine for deploying structured learning problems. |
Example | | The hello world of modeling projects. Can be used for new projects. |
Examples | | Some examples from Kaggle and other sources in Syberia. |
To run the tests for the Syberia package, you will have to check out its git submodules.
git submodule update --init --recursive
This will pull in inst/engines/base.sy from the base engine.
This project is licensed under the MIT License:
Copyright (c) 2014-2017 Robert Krzyzanowski
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Syberia was originally created at Avant by Robert Krzyzanowski, rob@syberia.io. Additional contributors include Peter Hurford, Kirill Sevastyanenko, Tong Lu, Abel Castillo, and others.