analyse wfmu playlists
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
www
.RData
.Rhistory
.gitattributes
.gitignore
DJKey.RData
README.md
WFMU_analyzeShiny.r
allDJArtists.RData
analyze.r
analyzeCommon.rmd
analyze_playlists.rmd
arcDiagram.R
artistTokens.RData
artistfreq.txt
arts_data.zip
arts_data_test.csv
arts_data_train.csv
city_pop.rdata
clean_playlists.r
deaths.rdata
djDocs.RData
djSimilarity.RData
dj_key.csv
djdtm.RData
djs_and_artists.csv
djtdm_all.RData
djtdm_off.RData
djtdm_on.RData
docMatrix.RData
docMatrix.csv
docTermMatrix.RData
forcegitoverwrite.cmd
kenstuff.r
playlist_of_death.rmd
playlists.Rdata
playlists_full.Rdata
playlists_raw.RData
playlisturls.Rdata
precalc_similiarity.r
readme.txt
rock_deaths 2.rmd
rock_deaths.rmd
scrape_oneoffs.r
scrape_playlists.R
scrape_playlists.rmd
spellcheker.r
wfmu supplement 1.rmd
wfmu supplement 2.rmd
wfmu.Rproj

README.md

title output author
My Science Project: A Statistical Examination of 33 Years of Playlists WFMU.org.
html_notebook
code_folding
hide
Arthur Steinmetz

Free form radio, WFMU.ORG maintains a huge trove of past playlists from many DJ's radio shows. More recently, web-only programming has been added to this. This dataset offers lots of opportunities for analysis. I scraped all the playlists I could from the web site and started asking questions.

The scraping and data-cleaning process was the most time consumer part of the exercise. Playlist tables are not in a consistent format and my HTML skills are rudimentary. Further, the DJ's are inconsistent in how they enter artist names. There are 12 different ways Andy Breckman can misspell "Bruce Springsteen" I take a stab at fixing some of the glaring errors. Additionally, many artist names are variant due to collaborators with "featuring," "and the," "with," etc in the name. I condense the first two words of every artist name into a token and drop the rest. Also, many DJs have signature songs to open and/or close their shows. Including these skews the play count for songs. I have chosen to strip those out, or try to. In a few cases the air date is clearly wrong. I strip those shows out as well. The result is an reasonably accurate but incomplete record of all the playlists available at WFMU.ORG as of July 2017. The code used for scraping and cleaning is in scrape_playlists.r and clean_playlists.r. The notebook with the analysis is analyze_playlists.rmd. playlists.RData has the cleaned playlist data used by analyze_playlists.rmd. playlists.zip (contains playlists.csv) is available for your convenience.

In the future, I plan to make an interactive tool to query the data set.

Thanks to station manager, Ken Freedman, for giving me permission to scrape the site.