Workshop (6 hours): preprocessing, cross-validation, lasso, decision trees, random forest, xgboost, superlearner ensembles


Machine Learning in R

This is the repository for D-Lab’s Introduction to Machine Learning in R workshop. View the associated slides here.

RStudio Binder: Binder

Content outline

  • Background on machine learning
    • Classification vs regression
    • Performance metrics
  • Data preprocessing
    • Missing data
    • Train/test splits
  • Algorithm walkthroughs
    • Lasso
    • Decision trees
    • Random forests
    • Gradient boosted machines
    • SuperLearner ensembling
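
As a rough illustration of how the preprocessing topics above fit together, here is a minimal base R sketch of a train/test split, median imputation of a missing predictor, and a regression performance metric (RMSE). The dataset (`airquality`, built into R), the seed, the 70/30 split, and the linear model are all assumptions chosen for illustration; they are not taken from the workshop notebooks, which may use different data and packages.

```r
# Illustrative sketch only: dataset, seed, split ratio, and model are
# assumptions, not the workshop's own choices.
set.seed(1)

data <- airquality  # built-in dataset; Ozone and Solar.R contain NAs

# Drop rows where the outcome (Ozone) is missing; the outcome is not imputed.
data <- data[!is.na(data$Ozone), ]

# Train/test split: 70% of rows for training, the rest held out for testing.
n <- nrow(data)
train_idx <- sample(n, size = floor(0.7 * n))
train <- data[train_idx, ]
test <- data[-train_idx, ]

# Missing data: impute Solar.R with the median computed on the training set
# only, so no information from the test set leaks into preprocessing.
solar_median <- median(train$Solar.R, na.rm = TRUE)
train$Solar.R[is.na(train$Solar.R)] <- solar_median
test$Solar.R[is.na(test$Solar.R)] <- solar_median

# Regression: the outcome is continuous, so fit a linear model.
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = train)
pred <- predict(fit, newdata = test)

# Performance metric: root mean squared error on the held-out test set.
rmse <- sqrt(mean((test$Ozone - pred)^2))
print(rmse)
```

Computing the imputation value from the training rows alone keeps the test set untouched until evaluation, which is the point of holding it out in the first place.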

Getting started

Please follow the notes in participant-instructions.md.

Assumed participant background

We assume that participants have familiarity with:

  • basic R syntax
  • statistical concepts such as mean and standard deviation

Technology requirements

Please bring a laptop with the following:

Resources

Browse resources listed on the D-Lab Machine Learning Working Group repository. Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey!

Slideshow

The slides were made using xaringan, which is a wrapper for remark.js. Check out Chapter 7 of R Markdown: The Definitive Guide if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on Decision Trees, Bagging, and Random Forests - with an example implementation in R.
