Scalable R for Machine Learning


What is R4ML?

R4ML is a scalable, hybrid approach to machine learning and statistics using R, Apache SystemML, and Apache Spark.

R4ML Key Features

  • R4ML is an open source R package from IBM, downloadable via git
  • Built on top of SparkR and Apache SystemML (so it supports features from both)
  • Acts as an R bridge between SparkR and Apache SystemML
  • Provides a collection of canned algorithms
  • Provides the ability to create custom ML algorithms
  • Provides both SparkR and Apache SystemML functionality
  • Offers APIs that are friendlier to the R user

R4ML Architecture

R4ML Simple Architecture

How to install

Quick install (run from R console):

# Download Apache Spark 2.1.0 (Note: Java must be installed)
# NOTE: the download URL is blank in this copy of the README; supply it before running.
download.file("", "~/spark-2.1.0-bin-hadoop2.7.tgz")
# Extract into the home directory so every path below agrees, whatever getwd() is
system("tar -xvf ~/spark-2.1.0-bin-hadoop2.7.tgz -C ~")
spark_home <- path.expand("~/spark-2.1.0-bin-hadoop2.7")
Sys.setenv("SPARK_HOME" = spark_home)

# Add the library path for SparkR
.libPaths(c(.libPaths(), file.path(spark_home, "R", "lib")))

# Install R4ML dependencies (the repos URL is also blank in this copy)
install.packages(c("uuid", "R6"), repos = "")

# Download and install R4ML (download URL blank in this copy)
download.file("", "~/R4ML_0.8.0.tar.gz")
install.packages("~/R4ML_0.8.0.tar.gz", repos = NULL, type = "source")

# Load dependencies and use R4ML
library("SparkR", lib.loc = file.path(spark_home, "R", "lib"))
library("R4ML")  # attach R4ML so that r4ml.session() is available
r4ml.session(sparkHome = spark_home)

More detailed instructions can be found here.
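If the quick install fails, it can help to check which prerequisites are actually in place before starting a session. The following is a minimal sketch using only base R; `check_r4ml_prereqs` is a hypothetical helper written for this illustration, not part of R4ML or SparkR, and it assumes SparkR sits under `R/lib/SparkR` inside the Spark directory, as it does in the paths used above.

```r
# Hypothetical helper (not part of R4ML): report which install prerequisites are present.
check_r4ml_prereqs <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  # Assumed location of the SparkR package inside a Spark binary distribution
  sparkr_dir <- file.path(spark_home, "R", "lib", "SparkR")
  c(
    java_on_path   = nzchar(Sys.which("java")[[1]]),              # Java is required by Spark
    spark_home_set = nzchar(spark_home),                          # SPARK_HOME is non-empty
    spark_home_dir = nzchar(spark_home) && dir.exists(spark_home),
    sparkr_present = nzchar(spark_home) && dir.exists(sparkr_dir),
    r4ml_installed = requireNamespace("R4ML", quietly = TRUE)     # R4ML package can be loaded
  )
}

# Print a named logical vector; any FALSE entry points at the step to revisit.
print(check_r4ml_prereqs())
```

Calling the helper with no arguments reads `SPARK_HOME` from the environment; a specific directory can be passed instead, for example `check_r4ml_prereqs("~/spark-2.1.0-bin-hadoop2.7")` after expanding the path with `path.expand()`.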

How to Use R4ML

Once R4ML is installed, you can use it for scalable machine learning and data analysis. See the section on R4ML Examples.

R4ML Documentation

After following the instructions in 'How to install', you can point your browser to


For example, if you installed R4ML in /home/data-scientist/codait, open a web browser and enter the following in the address bar