Syllabus

Ali Zaidi edited this page Oct 10, 2016 · 2 revisions

Microsoft R for Data Science Workshop Syllabus

Training Overview

The Microsoft Data Science Team invites you to an in-depth 3-day workshop on using Microsoft R for Data Science. In these sessions, you’ll gain hands-on experience conducting scalable data analysis with Microsoft R Server. You will learn the fundamentals of R and understand how Microsoft R Server addresses the major scalability and operationalization challenges associated with open source R.

Prerequisites

There are a few things you will need in order to take full advantage of the course:

  • An Azure subscription
  • A terminal emulator with OpenSSH or bash, e.g., PuTTY or Cygwin/MobaXterm
    • I use MobaXterm and the Ubuntu Bash Subsystem within Windows
  • An R IDE. Some reasonable choices:
    • RStudio
    • Visual Studio 2015 with RTVS (Community Edition is sufficient)
    • Jupyter/JupyterLab with IRkernel
  • Microsoft R Server 8.0.3 or later

I will assume you have already taken the following courses, or have the background provided by these courses:

I will not assume any background knowledge about Microsoft R Server (MRS), but for those who are eager, you can find an online video series about MRS here:

  1. Course Website
  2. Video Lectures on Channel 9
  3. Lab Exercises

A useful overview and comparison of MRS and MRO is available here.

Training Objectives

The course is designed to help analysts add Microsoft R Server (MRS) to their data science toolbox and integrate it with other tools in Azure and the Cortana Intelligence Suite. After completion, participants will be able to:

  • Explore and visualize data with R
  • Manipulate data that is too large to fit into memory with MRS
  • Train and test statistical models with high performance parallel external memory algorithms
  • Access data stored in Azure Blob Storage using MRS
  • Deploy models as AzureML web services

Syllabus and Timeline

Course Modules

Each training module guides you through a logical progression of hands-on tasks, each phrased as an action. Each day is broken up into modules of one to four hours, in which you will learn and complete labs on your own. Some material that is out of scope for hands-on labs will instead be demonstrated in instructor-led labs. Participants will receive a copy of the lab material to try on their own, but are not required to run the analysis during the training time. The modules are grouped into the general agenda below. Specific modules may spill across sessions depending on audience engagement.

Part I - Functional-Object Based Computing with R

Day One - Morning Session

  • Overview of the R Project and CRAN
  • Exploring the Microsoft R Data Stack
  • Functional Programming for Data Manipulation with the dplyr package

Day One - Afternoon Session

  • Understanding dplyr's semantics and the magrittr pipe
  • Data Visualization and Exploratory Data Analysis
  • Using the broom package for Modeling and Summarization

Part II - Breaking the Memory Barrier with RevoScaleR

Day Two - Morning Session

  • Overview of the Microsoft R Data Ecosystem
  • Modeling and Scoring with High-Performance ScaleR Algorithms
  • Data Manipulation with the dplyrXdf Package

Day Two - Afternoon Session

  • Summarizing Data with RevoScaleR
  • Performance Considerations with RevoScaleR
  • Parallel and Distributed Computing with Microsoft R Server
  • Deploying R and ScaleR algorithms to Azure with the AzureML package

Part III - Microsoft R Server with Spark

Day Three - Morning Session

  • Overview of the Apache Spark Project
  • Ingesting Data into Azure Blob Storage
  • Creating Spark DataFrames and Spark Contexts
  • Manipulating HDFS data with the sparklyr package

Day Three - Afternoon Session

  • Creating Distributed eXternal DataFrames in HDFS
  • Preparing Data for Modeling with Microsoft R Server
  • Training Statistical Models with Microsoft R Server and the Spark Compute Context
  • Scoring and Deploying Models
  • Performance Considerations on Hadoop