Various forays into the data science of beer.
clusterfun
compile
docs
hashtag_beer
helpers
img
img_old
present
run_it
scratchpad
writeup
.Rprofile
.gitattributes
.gitignore
README.md
beer_data_science.Rproj

Overview

This is a preliminary, strictly for-fun foray into beer data. Pairs well with most session IPAs.

You can find the main report in compile/compile.md, read an interview about the project, or flip through the presentation slides for a talk given at RLadies Chicago. (The code behind those slides is in the present directory.)


The overarching question I went into the analysis with was: how well do beer styles actually describe the characteristics of beers within each style? In other words, do natural clusters in beer align well with style boundaries?

I set about answering this with a mix of clustering (k-means) and classification (neural net and random forest) methods. If you want to play around with clustering interactively, there's a small Shiny app that'll let you do that.
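To make the clustering question concrete, here is a minimal sketch in base R. It is not the code from the report: the data is synthetic, and the style names, column names (`abv`, `ibu`), and the choice of three clusters are assumptions made for illustration. It runs k-means on scaled features and then cross-tabulates the resulting clusters against the style labels, which is one simple way to see whether clusters line up with styles.

```r
# Toy illustration only: synthetic data, not the BreweryDB data.
set.seed(42)

n_per_style <- 50
beers <- data.frame(
  # Hypothetical style labels
  style = rep(c("style_a", "style_b", "style_c"), each = n_per_style),
  # Made-up alcohol content and bitterness values for each style
  abv = c(rnorm(n_per_style, mean = 4.5, sd = 0.4),
          rnorm(n_per_style, mean = 6.5, sd = 0.5),
          rnorm(n_per_style, mean = 9.0, sd = 0.6)),
  ibu = c(rnorm(n_per_style, mean = 20, sd = 4),
          rnorm(n_per_style, mean = 60, sd = 6),
          rnorm(n_per_style, mean = 35, sd = 5))
)

# Scale the features so neither one dominates the distance calculation
features <- scale(beers[, c("abv", "ibu")])

# k-means with several random starts; centers = 3 is an assumption
fit <- kmeans(features, centers = 3, nstart = 25)
beers$cluster <- fit$cluster

# Rows are styles, columns are clusters
style_by_cluster <- table(style = beers$style, cluster = beers$cluster)
print(style_by_cluster)
```

If styles described beers perfectly, each row of the table would have all of its counts in a single column. Counts spread across several columns suggest that the style overlaps with its neighbors.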


Structure


Reproduce it

All beer data was grabbed from the BreweryDB API, converted from JSON into a dataframe, and dumped into a MySQL database. To grab the data yourself, create an API key on BreweryDB and run the run_it.R script inside the run_it directory. For a quicker but less up-to-date solution (the BreweryDB database is updated pretty frequently), you might consider stashing the data in a CSV or .feather file.
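As a rough sketch of the CSV option, the snippet below writes a dataframe to disk and reads it back with base R. The `beer_df` object, its columns, and the file name are placeholders invented for this example; in practice the dataframe would come from the API pull. The feather package's `write_feather()` and `read_feather()` follow the same write-then-read pattern if you prefer that format.

```r
# `beer_df` is a stand-in for the dataframe built from the API response;
# the columns and values here are made up for illustration.
beer_df <- data.frame(
  id = c("b1", "b2"),
  name = c("Example Ale", "Example Stout"),
  abv = c(5.2, 7.8),
  stringsAsFactors = FALSE
)

# Hypothetical cache location; a temp directory keeps the example tidy
csv_path <- file.path(tempdir(), "beer_data.csv")

# Write once after pulling the data...
write.csv(beer_df, csv_path, row.names = FALSE)

# ...then read the cached copy in later sessions instead of hitting the API
cached <- read.csv(csv_path, stringsAsFactors = FALSE)
str(cached)
```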

This analysis deals mainly with beer and its constituent components like ingredients (hops, malts) and other characteristics like bitterness and alcohol content. However, you can easily construct your own function for grabbing other things like breweries, glassware, locations, etc. by running the function generator in helpers/construct_funcs.R.

Any and all feedback is more than welcome. Cheers!