See the Fall 2020 tidymodels update!

https://github.com/dlab-berkeley/Machine-Learning-with-tidymodels

Machine Learning in R

This is the repository for D-Lab’s Introduction to Machine Learning in R workshop. View the associated slides here.

RStudio Binder:

Content outline

- Background on machine learning
  - Classification vs regression
  - Performance metrics
- Data preprocessing
  - Missing data
  - Train/test splits
- Algorithm walkthroughs
  - Lasso
  - Decision trees
  - Random forests
  - Gradient boosted machines
  - SuperLearner ensembling
  - Principal component analysis
  - Hierarchical agglomerative clustering
- Challenge questions

Getting started

Please follow the notes in participant-instructions.md.

HAVE FUN! :^)

The seven algorithm R Markdown files (lasso, decision tree, random forest, xgboost, SuperLearner, PCA, and clustering) are designed to function in a standalone manner.

After installing and loading the packages in 01-overview.Rmd, run all the code in 02-preprocessing.Rmd to preprocess the data. Then open any one of the seven algorithm R Markdown files and choose "Run All" to see the results and visualizations!
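For those working outside RStudio, the same order of operations can be approximated from the R console. The snippet below is only a sketch, not the workshop's own procedure: it assumes the working directory is the repository root, that the knitr package and the packages from 01-overview.Rmd are already installed, and that the algorithm files can pick up whatever 02-preprocessing.Rmd produces in the same session. The helper name `run_all_chunks` is hypothetical.

```r
# Sketch only: extract the R code from an R Markdown file and run it,
# roughly what "Run All" does in RStudio.
# `run_all_chunks` is a hypothetical helper, not part of the workshop materials.
run_all_chunks <- function(rmd) {
  script <- tempfile(fileext = ".R")
  knitr::purl(rmd, output = script, quiet = TRUE)  # write the chunks to a plain R script
  source(script)                                   # evaluate them in the global environment
  invisible(script)
}

# Preprocessing first, then any one algorithm file (lasso chosen as an example).
workflow_files <- c("02-preprocessing.Rmd", "03-lasso.Rmd")

ran <- character(0)
for (rmd in workflow_files) {
  if (!file.exists(rmd)) {
    message("Skipping (not found in working directory): ", rmd)
    next
  }
  run_all_chunks(rmd)
  ran <- c(ran, rmd)
}
```

Any of the other algorithm files can be swapped in for 03-lasso.Rmd; plots will appear in the active graphics device rather than inline.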

Assumed participant background

We assume that participants have familiarity with:

- Basic R syntax
- Statistical concepts such as mean and standard deviation

Technology requirements

Please bring a laptop with the following:

- R version 3.5 or greater
- RStudio integrated development environment (IDE) is highly recommended but not required.
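A quick way to confirm the R version requirement before the workshop is to compare the running version against the minimum. This is a minimal sketch using base R only; the variable names are illustrative.

```r
# Compare the running R version with the workshop minimum (3.5).
min_version <- "3.5.0"
meets_requirement <- getRversion() >= min_version

if (meets_requirement) {
  message("R ", getRversion(), " meets the minimum version ", min_version, ".")
} else {
  warning("R ", getRversion(), " is older than ", min_version, "; please upgrade.")
}
```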

Resources

Browse resources listed on the D-Lab Machine Learning Working Group repository. Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey!

Slideshow

The slides were made using xaringan, which is a wrapper for remark.js. Check out Chapter 7 of R Markdown: The Definitive Guide if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on Decision Trees, Bagging, and Random Forests - with an example implementation in R.

dlab-berkeley/Machine-Learning-in-R