-
Notifications
You must be signed in to change notification settings - Fork 10
/
Lyrical.Rpres
66 lines (50 loc) · 1.53 KB
/
Lyrical.Rpres
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
Computational investigations of Dav's favorite pop
==================================================
author: Dav Clark
date: Dec 10, 2014
About this presentation
=======================
Everything in this slide is executed dynamically when I generate the slides. So, you can see exactly what code was used to generate *everything*. Except this (why?):
```{r, eval=FALSE}
source('install-packages.R')
```
The formatting is handled by [Rmarkdown](http://rmarkdown.rstudio.com/) (the format for links is not obvious!).
The Data
========
I downloaded some song lyrics (usng python)
- Jay Z
- Billy Joel
- Miley Cyrus
- Macklemmore
They are in text files in the `lyrics` directory. And I wrote a script to load them in and clean them up:
```{r}
source('load-data.R')
```
What's in the corpus?
=====================
Let's check the first two lines of the first document. We'll need to load the tm package here to make the objects fully functional.
```{r}
library(tm)
lyrics.corpus[[1]][1:7]
```
Compute term frequencies!
=========================
```{r}
lyrics.dtm <- TermDocumentMatrix(lyrics.corpus, control=list(minDocFreq = 2))
print(lyrics.dtm)
```
That's a lot of terms...
Grab the most frequent terms
============================
```{r}
library(slam)
freq.terms <- findFreqTerms(lyrics.dtm, lowfreq=100)
# Note - this is the row_sums method from slam!
freq.counts <- row_sums(lyrics.dtm[freq.terms,])
```
Word Cloud!
===========
```{r, fig.align='center', fig.fig.cap='Words in Lyrics'}
library(wordcloud)
wordcloud(names(freq.counts), freq.counts)
```