-
Notifications
You must be signed in to change notification settings - Fork 1
/
Reshaping.Rmd
316 lines (208 loc) · 11 KB
/
Reshaping.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
---
title: "Shaping humdrum data"
author: "Nathaniel Condit-Schultz"
date: "July 2022"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Shaping humdrum data}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
wrap: sentence
---
```{r, include = FALSE, message=FALSE, echo = FALSE }
source('vignette_header.R')
```
Welcome to "Shaping humdrum data"!
This article explains various steps you can, and often need, to take to prepare humdrum datasets for analysis, using `r hm`.
One of the great strengths of the [humdrum syntax](HumdrumSyntax.html "Intro to the humdrum syntax") is its flexibility---there are lots of ways you can structure your data to conveniently *represent* musical information.
However, when it comes time analyze data, we typically want to "reshape" our data into a particular format that is ideal for *analysis*.
The key idea is this:
given our data *and* our particular research question, we must determine/decide what constitutes a *single data observation*.
We want each data observation, or data point, to correspond to one row in our humdrum table.
Depending on how your humdrum data files are organized, this might not be the case when you first load your data.
In order to shape our data, there are three steps we might need to do to our data:
1. Removing information we don't want.
2. Splitting apart information that is bundled together.
3. Pasting together or aligning information that is currently separated.
Consider the following small humdrum file, which is bundled with `r hm`.
```{r}
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum')
example
```
This file contains (at least) **seven** different pieces of information!
There are two voices, alto and soprano; Each voice has its rhythm *and* pitch information encoded in its `**kern` spine, plus lyrical information in the next `**silbe` spine.
In addition, the `**harm` spine indicates the harmony accompanying *both* the vocal parts.
Now, depending on what sorts of analyses we want to perform, we need to decide what combinations of these seven information streams consitute a "data observation."
# Filtering Data
The first step is often to simply remove data we don't need.
In *this* article, we'll show you the most common, basic, ways you might filter your data.
For more details about other `r hm` filtering functionality, check out
the [data filtering](Filtering.html "Filtering humdrum data article") article.
## Indexing
For example, *if* we are studying tonality, we might simply want to ignore the lyric data.
These easiest way to do this is to index out spines we don't want, either using numeric indices
```{r}
example[[ , c(1,3,5)]]
```
or (probably better) by exclusive interpretation:
```{r}
example[[ , c('**kern', '**harm')]]
```
## Parsing Token
In this file, the `**kern` spines in our example file include rhythmic data (`**recip`) *and* pitch data.
*If* we are "just" studying tonality, we could extract the pitch information from the `Token` field, and save it to a new field.
For example, we can use the `kern()` function to extract the pitch information from `Token`, and put it in a new field, which we'll call `Pitch`.
```{r}
example |>
mutate(Pitch = kern(Token)) -> example
example
```
Having now isolated the pitch information, we can analyze it.
(Of course, the rhythm information is still present, in the original `Token` field.)
## Other filtering
Of course, there are many other filtering options you might want to do, depending on your research goals.
Perhaps you only want to study pieces/passages in a flat keys, or only works from a particular time period, etc.
A common example would be limiting our rhythmic analyses to music in 4/4 time, and perhaps ignoring pickup notes.
The `TimeSignature` field indicates the time signature, and the `Bar` field can be used as an indicator of pickups at the beginning of pieces, as they are marked `Bar == 0`.
So for example (using the Bach chorale data):
```{r}
chorales <- readHumdrum(humdrumRroot, 'HumdrumData/BachChorales/.*krn')
chorales |>
filter(Bar > 0 & TimeSignature == 'M4/4') -> chorales
chorales
```
# Splitting/Separating Data
Humdrum data often packs multiple pieces of information into compact, concise, readable tokens.
The classic example, or course, is `**kern` itself which often includes rhythm, pitch, phrasing, beaming, and pitch ornamentation information!
These tokens are great for reading/writing, but not for analyzing, so we typically want to separate the information we do want.
### Isolating Pitch and Rhythm
As we've seen, the `**kern` spines in our example file include rhythmic data (`**recip`) *and* pitch data.
In some cases, we might want to access *both* pieces of information, but separately.
We can separate them by applying different functions to the `Token` field, and saving the output to new fields.
For example,
```{r, echo = -1}
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum')
example |>
mutate(Rhythm = recip(Token),
Pitch = kern(Token)) -> example
example
```
Hey, the printout kind of looks the same...but if you look at the bottom you'll see that there are now separte `Pitch` and `Rhythm` fields.
Since both fields are automatically selected by `mutate()`, we are seeing them both.
But we could, for example, select one or the other field,
```{r}
example |> select(Pitch)
example |> select(Rhythm)
```
or run commands using them.
For example, we could tabulate pitches wherever the rhythmic duration is a quarter note:
```{r}
example |>
filter(Rhythm == '4') |>
count(Pitch)
```
This is only possible because we separated the information which was originally "pasted" together in the `**kern` tokens.
## Rend
What if actually want to move the rhythm and pitch information to separate spines---i.e., into their rows in the humdrum table?
Use `rend()`:
```{r}
example |>
rend(Rhythm, Pitch)
```
# Cleave (Pasting/Aligning)
The next step might be to align/combine information that is currently separated.
In many `r hm` datasets, we have multiple pieces of information spread across multiple spines, or in some cases, across spine paths or stops.
If, given our research question, we need to think of multiple pieces of information as describing a *single data point*, we'll need to reshape the data.
For example, in our `example` file the `**silbe` (lyric) spines associate each syllable with exactly one note in the adjacent `**kern` spines.
Currently, by default, each `**kern` token *and* each `**silbe` token are in their own, separate row of the humdrum table.
We want them in the *same* row.
In the humdrum syntax view, this means moving the `**silbe` data into the same location as the `**kern` tokens.
## Cleaving Spines
To take separate spines---like `**kern` and `**silbe`---and paste them together, we use `cleave()`; as in "to cleave together."
Simply run cleave, and tell it which spines to cleave together.
In our `example` file, the 1st and 3rd spines are `**kern` while the 2nd and 4th spines are `**silbe`, so we can tell `cleave` to cleave together `1:2` and `3:4`:
In our example, we want to align the notes in the `**kern` spines with the syllables in the `**silbe` spine.
We can do this directly using `cleave()`: use the `fold` argument to indicate which spine to fold, and the `onto` argument to indicate which spine to move it onto.
```{r}
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum')
example |>
cleave(1:2, 3:4)
```
It worked! The 1st and 3rd spines have dissappeared, with their content now in the 1st and 3rd spines.
But notice that the cleave put some data into a new field, which it has called `Spine2|4`.
We can improve this name using the `newFields` argument:
```{r}
example |>
cleave(1:2, 3:4, newFields = 'Silbe') -> example
example
```
But, wait, where did the `**kern` and `**sible` data go exactly?
Well, since the `**kern` spines were the first spines we indicated to `cleave()`, that data stays in the original `Token` field.
The other spines---which in this case are the `**sible` data---are put into the new field(s).
See:
```{r}
example |> select(Token)
example |> select(Silbe)
```
Notice that the fifth spine, which wasn't part of the cleave, is also left untouched in the `Token` field.
------------------------------------------------------------------------
In some datasets, there might be different numbers of `**kern`/`**silbe` spines in different files within the dataset.
In these cases, it will be much smarter to pass `cleave()` the names of the exclusive intepretations we want to cleave.
We should get the same result we got manually before:
```{r}
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum')
example |>
cleave(c('kern', 'silbe'))
```
----
What if we want to study the note combinations that are formed between the two `**kern` spines?
As they are, this would be hard because the tokens from each spine are all in their own rows.
However, we could `cleave()` together the two kern spines!
(We'll also isolate *just* the `**kern` spines for this.)
```{r}
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum')
example |>
index2( , '**kern') |>
kern(simple = TRUE) |>
cleave(c(1, 2)) |>
count()
```
### Multi-cleaving
What about that `**harm` spine in our data.
Unlike the `**silbe` spines, the `**harm` spine is not "paired up" with the `**kern` spines.
Rather, the `**harm` actually describes what is happening in the entire *record* of data.
The chords indicated in this spine are associated with the pitches in both, or either, of the `**kern` spines.
Luckily, `cleave()` will handle this:
```{r}
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum')
example |>
cleave(c('kern', 'harm'))
```
Data from the `**harm` spine is copied *twice* into a new field "on top of" *both* `**kern` spines!
## Cleaving Stops and Paths
Though spines are the most common structure in humdrum data that you might need to "cleave" together, we can also cleave other structures.
Of course, it depends on what questions you are trying to ask of your data!
### Multi-Stops
Consider this example, with multi-stop chords in a `**kern` spine:
```{r}
example_stops <- readHumdrum(humdrumRroot, 'examples/Reshaping_example2_stops.hum')
example_stops
```
By default, `r hm` treats each token (note) in *each* stop as a separate data observation, with its own row in the humdrum table.
If we are studying harmony, we might want to align those stops "on top" of each other, in different fields.
But, we can cleave the stops together, putting them each into their own field!
```{r}
example_stops |>
cleave(Stop = 1:3)
```
The first stop (`Stop == 1`) is left in the `Token` field, but we get new `Stop2` and `Stop3` fields holding the other stops!
### Spine Paths
When working with spine paths, it is often less obvious how we should interpret different paths in terms of data observations, but if you want to cleave them, you can.
Don't forget that `Path` is numbered starting from `0`.
```{r}
example_paths <- readHumdrum(humdrumRroot, 'examples/Reshaping_example3_paths.hum')
example_paths |>
cleave(Path = 0:1)
```