---
title: "Contextualizing humdrum data"
author: "Nathaniel Condit-Schultz"
date: "July 2022"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Contextualizing humdrum data}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: sentence
---
```{r, include = FALSE, message=FALSE, echo = FALSE }
source('vignette_header.R')
```
Welcome to "Contextualizing humdrum data"!
This article explains how `r hm` can be used to contextualize musical data.
When analyzing musical data, we often treat each and every data token as a separate, independent "data point."
However, in many cases, we want to consider data points *in context*---what other data points are nearby in the data?
Since the [humdrum syntax](HumdrumSyntax.html) encodes data in temporal order, the "context" usually means either "what is happening before or after this data point?" or "what else is happening at the same time as this data point?"
`r Hm` provides a number of ways of analyzing data "in context."
This article, like all of our articles, closely parallels information in `r hm`'s detailed code documentation, which can be found in the "[Reference](https://humdrumr.ccml.gtcmt.gatech.edu/reference/index.html#reading-and-writing#manipulating-humdrum-data "HumdrumR function reference, Manipulating humdrum data")" section of the `r hm` [homepage](https://humdrumR.ccml.gtcmt.gatech.edu).
You can also find this information within R, once `r hm` is loaded, using `?context`, `?group_by`, or `?withinHumdrum`.
# Groupby
The most conventionally "R-style" way to look at data context is using R/`r hm`'s various "group by" options.
This functionality is described elsewhere, for example in the [Working With Data](WorkingWithData.html) article, and in the `within.humdrumR()` man page.
Group-by functionality isn't necessarily connected to temporal context: you can, for instance, group together all the data from each instrument in an ensemble, across all songs in a dataset.
That kind of grouping is useful, but the context it provides is not temporal.
If you want to use `group_by()` to get more temporal context, here are a few good options:
### Group by Record
All `r hm` data has a `Record` field, indicating data points that occur at the same time.
Using `group_by()` we can perform calculations grouped by record.
Let's load our trusty Bach-chorale data:
```{r message=FALSE}
chorales <- readHumdrum(humdrumRroot, 'HumdrumData/BachChorales/.*krn')
```
Let's count how many new note onsets (*not* rests) occur on each record of the data, and tabulate the results.
We'll look for tokens that *don't* contain an `r` (for rest), using a regular-expression match `%~% 'r'` and the negating bang (`!`).
```{r, echo = 6:9}
chorales |>
group_by(Piece, Record) |>
with(sum(!Token %~% 'r')) |>
count() -> counts
chorales |>
group_by(Piece, Record) |>
with(sum(!Token %~% 'r')) |>
count()
```
Most records (`r table(counts)['4']` out of `r sum(counts)`) have new notes in all four voices---no surprise for choral music---with *one* onset being the next most common case (`r table(counts)['1']` records).
Note that, when we called `group_by()`, we included `Piece` *and* `Record`.
Why? Because each piece in the data set repeats the same record numbers---you wouldn't want to count the 17th record from the 3rd chorale together with the 17th record from the 7th chorale, for example.
When using `group_by()`, you'll almost always want to have `Piece` as a grouping factor.
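The effect of leaving `Piece` out can be sketched in base R with a made-up data frame.
The `toy` data and its values are hypothetical (they are not taken from the chorales), and `tapply()` merely stands in for `group_by()`:

```{r}
# Hypothetical toy data: two pieces that reuse the same record numbers.
toy <- data.frame(
  Piece  = c(1, 1, 1, 2, 2, 2),
  Record = c(1, 1, 2, 1, 2, 2),
  Token  = c('4c', '4r', '4d', '4e', '4r', '4f')
)

# TRUE where a token is a note onset (contains no 'r').
toy$Onset <- !grepl('r', toy$Token)

# Grouping by Record alone mixes the two pieces together.
by_record <- tapply(toy$Onset, toy$Record, sum)

# Grouping by Piece and Record keeps each piece's records separate.
by_both <- tapply(toy$Onset, list(toy$Piece, toy$Record), sum)

by_record
by_both
```

Grouped by `Record` alone, each record number appears to have two onsets; grouped by `Piece` and `Record`, every record has exactly one.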
### Group by Bar
Most humdrum data includes bar lines, indicated by `=`.
When `r hm` reads data (`?readHumdrum`), it looks at these barlines, counts them in each file, and puts that count into a field called `Bar` (`?fields`).
You can then use this field to group data by bar.
Remember that if your data set has no `=` tokens, the `Bar` field will be nothing but useless `0`s.
What if we wanted to know what the lowest note in each bar of the music is?
Let's first extract `semits()` data, so it is easy to get the lowest value:
```{r}
chorales |>
mutate(Semits = semits(Token),
Kern = kern(Token)) -> chorales
```
We can now group by bar (within pieces, once again!), find the lowest note in each bar (using `which.min()`), and tabulate the results.
```{r, echo = 6:9}
chorales |>
group_by(Piece, Bar) |>
with(Kern[which.min(Semits)]) |>
count() |> table() -> lownotes
chorales |>
group_by(Piece, Bar) |>
with(Kern[which.min(Semits)]) |>
count()
```
The highest lowest-note-in-bar is `r names(lownotes)[max(which(lownotes > 0))]`.
The most common lowest-note-in-bar is `r names(lownotes)[which.max(lownotes)]`.
---
For further analysis, we might want to save the lowest-note into a new field, so we'll use `mutate()`.
By default, the `mutate()` function will [recycle][recycling] any scalar (length one) result throughout its group,
which is usually what we want:
```{r}
chorales |>
group_by(Piece, Bar) |>
mutate(LowNote = min(Semits)) |>
ungroup() -> chorales
chorales
```
(Note that we use `ungroup()` before overwriting `chorales`, so that grouping doesn't affect our future analyses.)
We could then, for example, subtract the bar's lowest note from each note:
```{r}
chorales |>
mutate(Semits - LowNote)
```
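The recycling behavior can be imitated in base R with `ave()`, which applies a function within groups and repeats the result across each group's rows.
This is only an analogy on hypothetical values (the `toy` data frame is invented), not a description of how `mutate()` works internally:

```{r}
# Hypothetical toy data: semitone values in two bars of one piece.
toy <- data.frame(
  Bar    = c(1, 1, 1, 2, 2),
  Semits = c(0, 4, 7, -5, 2)
)

# ave() computes min() per bar and recycles the scalar result
# across every row of that bar.
toy$LowNote <- ave(toy$Semits, toy$Bar, FUN = min)

# Height of each note above its bar's lowest note.
toy$AboveLow <- toy$Semits - toy$LowNote
toy
```

Every row in bar 1 gets a `LowNote` of `0` and every row in bar 2 gets `-5`, so the subtraction lines up row by row.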
### Group by Beat
Another useful contextual group is the beat.
There will be no automatic "beat" field in our data, so we'll need to make one---obviously, we need to have data with rhythmic information encoded (like `**kern` or `**recip`) to do this!
We can use the `timecount()` function to count the beats in our data.
We could count quarter-notes by setting `unit = '4'`; alternatively, if our data includes time-signature interpretations (like `*M3/4`) we could use the `TimeSignature` field to get the tactus of the meter.
Let's try the latter and save the result to a new field, which we'll call `Beat` (or anything else you want).
```{r}
chorales |> select(Token) |>
mutate(Beat = timecount(Token, unit = tactus(TimeSignature))) -> chorales
```
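Conceptually, counting beats amounts to accumulating durations and asking which beat each onset falls in.
The base-R sketch below illustrates that idea on hypothetical durations; numbering beats from 1 and measuring durations in whole-note units are assumptions of the sketch, and it is not meant to reproduce `timecount()` itself:

```{r}
# Hypothetical durations in whole-note units:
# quarter, eighth, eighth, half, quarter.
durations <- c(0.25, 0.125, 0.125, 0.5, 0.25)
unit <- 0.25  # count in quarter notes

# Onset of each note: the total duration of everything before it.
onsets <- cumsum(c(0, head(durations, -1)))

# Index of the beat in which each note begins, counting from 1.
beat <- floor(onsets / unit) + 1
beat
```

The two eighth notes share beat 2, and no note begins on beat 4 because the half note sustains through it.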
We can now group by beat (and piece, of course).
Maybe we want to know the range of notes in each beat:
```{r}
chorales |>
group_by(Piece, Beat) |>
with(diff(range(Semits))) |>
draw(xlab = 'Pitch range within beat')
```
> Note: If your data includes spine paths, you'll want to set `mutate(timecount(Token), expandPaths = TRUE, ...)`.
> `timecount()` (and similar functions, like `timeline()`) won't work correctly without paths expanded (`?expandPaths`).
# Contextual Windows
When you use `group_by()`, your data is *exhaustively* partitioned into *non-overlapping* groups; the groups are also not necessarily ordered---we have to explicitly use (temporally) ordered groups like `group_by(Piece, Record)` if we want temporal context.
In contrast, the `context()` function, when used in combination with `with()` or `mutate()`, gives us much more flexible control over the context we want for our data.
`context()` takes an input vector and treats it as an ordered sequence of information.
It then identifies arbitrary contextual windows in the data, based on your criteria.
The windows created by `context()` are always sequentially ordered, aren't necessarily exhaustive (some data might not fall in any window), and can *overlap* (some data falls in multiple windows).
We use `context()` by indicating when to "open" (begin) and "close" (end) contextual windows, using the `open` and `close` arguments.
`context()` can use many criteria to open/close windows.
In the following sections, we lay out just a few examples; we'll use our chorale data again, as well as the built-in Beethoven variation data:
```{r message=FALSE}
chorales <- readHumdrum(humdrumRroot, 'HumdrumData/BachChorales/.*krn')
beethoven <- readHumdrum(humdrumRroot, 'HumdrumData/BeethovenVariations/.*krn')
```
For most examples, we'll use `within()` and run the command `paste(Token, collapse = '|')`.
This will cause all the tokens within a contextual window to collapse to a single string, separated by `|`.
This is the simplest/fastest way to *see* what `context()` is doing!
## Regular windows
In some cases, we simply want to progress through data and open/close windows at regular locations.
We can do this using the `hop()` function.
(`hop()` is a `r hm` function that is very similar to R's base `seq()` function; however, `hop()` has some special features and, more importantly, gets special treatment from `context()`.)
Maybe we want to open a four-note window on every note:
```{r}
chorales |>
context(hop(), open + 3) |>
within(paste(Token, collapse = '|'))
```
Cool, we've got overlapping four-note windows, running through each spine!
How did we do this?
### Regular Open
The first argument to `context()` is the `open` argument;
we give `open` a set of indices (natural numbers) at which to open windows in the data.
`hop()` simply generates a sequence of numbers "along" the input data---by default, the "hop size" is `1`, but we can change that.
For example:
```{r}
hop(letters, by = 1)
hop(letters, by = 2)
```
When we use `hop()` inside `context()`, it automatically knows which input vector(s) (data fields) to hop along.
So now check this out:
```{r}
chorales |>
context(hop(2), open + 3) |>
within(paste(Token, collapse = '|'))
```
By saying `hop(2)`, our windows only open on every *other* index.
If we don't want our four-note windows to overlap at all, we could set `hop(4)`:
```{r}
chorales |>
context(hop(4), open + 3) |>
within(paste(Token, collapse = '|'))
```
Note that the `by` argument can be a vector of hop-sizes, which will cause `hop()` to generate a sequence of irregular hops!
You can also use `hop()`'s `from` and `to` arguments to control when the first/last windows occur, etc.
When using `to`, you can refer to another special variable---`end`---which `context()` will interpret as the last index.
So you could, for example, say `hop(from = 5, to = end - 5)`.
### Fixed Close
So `hop()` is telling `context()` when to open windows; how are we telling it when to close windows?
The second argument to `context()` is the `close` argument.
Like `open`, the `close` argument should be a set of natural-number indices.
However, the cool thing is that the `close` argument can *refer to* the `open` argument.
So rather than manually figuring out what index would go with each `open` (don't try, because the multiple spines etc. make it confusing), we simply say `open + 3`.
If we want five-note windows, we can say `open + 4`.
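The relationship between the `open` and `close` indices can be pictured with plain base R.
The sketch below works on a single hypothetical vector (no spines or pieces), and dropping windows that would run past the end is an assumption of the sketch rather than a statement about `context()`:

```{r}
x <- letters[1:10]

# Open a window at every index; close it three steps later.
open  <- seq_along(x)
close <- open + 3

# Drop windows that would run past the end of the data
# (an assumption of this sketch about incomplete windows).
keep  <- close <= length(x)
open  <- open[keep]
close <- close[keep]

# Collapse each window into a single string.
windows <- mapply(function(o, cl) paste(x[o:cl], collapse = '|'), open, close)
windows
```

Ten elements yield seven overlapping four-element windows, from `a|b|c|d` to `g|h|i|j`.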
----
What if we want the window to simply close before the next opening?
We can do this by having the `close` argument refer to the `nextopen` variable.
So instead of saying `open + 3` we say `nextopen - 1`!
```{r}
chorales |>
context(hop(4), nextopen - 1) |>
within(paste(Token, collapse = '|'))
```
Now we'll get exhaustive windows no matter where we open the windows.
For example, we can change the hop size and still get exhaustive windows!
```{r}
chorales |>
context(hop(6), nextopen - 1) |>
within(paste(Token, collapse = '|'))
```
We could also close windows at `nextopen - 2` or `nextopen + 1`---whatever we want!
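Why `nextopen - 1` always yields exhaustive windows can be seen in a small base-R sketch.
The vector is hypothetical, and closing the final window at the last index is a choice made for the sketch; it is not a claim about how `context()` treats a window with no following open:

```{r}
x <- letters[1:10]

# Open a window at every fourth index: 1, 5, 9.
open <- seq(1, length(x), by = 4)

# Each window closes just before the next one opens; the final window
# has no "next open", so this sketch closes it at the last index.
nextopen <- c(open[-1], length(x) + 1)
close    <- nextopen - 1

windows <- mapply(function(o, cl) paste(x[o:cl], collapse = '|'), open, close)
windows
```

Every element lands in exactly one window, whatever hop size is chosen.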
### Flip it around
We don't have to define `open` first and have `close` refer to `open`; we can also do the opposite!
```{r}
chorales |>
context(close - 3, hop(4, from = 4)) |>
within(paste(Token, collapse = '|'))
```
We've got the exact same result by telling the window closes to hop along regularly (every four indices) and having the `open` argument refer to the closing indices (`close - 3`).
The `open` argument can also refer to the `prevclose` (previous close).
Notice that our output windows are still "placed" to align with the opening---we can set `alignLeft = FALSE` in our call to `within.humdrumR()`, if we want the output to align with the close:
```{r}
chorales |>
context(close - 3, hop(4, from = 4)) |>
within(paste(Token, collapse = '|'),
alignLeft = FALSE)
```
----
Note that these regular windows we are creating are examples of N-grams.
`r Hm` also defines another approach to defining N-grams which will generally be faster than using `context()`---this alternative approach is described in the last section of this article.
## Irregular Windows
The regular windows we created in the previous section are useful, but `context()` can do a *lot* more.
You can tell `context()` to open, or close, windows based on arbitrary criteria!
For example, let's say you want to open a window any time the leading tone occurs, and stay open until the next tonic.
To do this, let's get the `solfa()` data into a `Solfa` field:
```{r}
chorales |>
solfa(Token, simple = TRUE) -> chorales
```
Alright, the easy thing to do here is to give `context()`'s `open`/`close` arguments `character` strings, which are matched as regular expressions against the active field:
```{r}
chorales |>
select(Solfa) |>
context('ti', 'do') |>
with(paste(Solfa, collapse = '|'))
```
Pretty cool!
But wait, something seems odd; one of the outputs is `"ti|do|fa|re|so|so|fa|so|la|re|so|fa|mi|re|di|re|ti|do"`.
Why doesn't this window close when it hits that first "do"?
This has to do with `context()`'s treatment of overlapping windows, which is controlled by the `overlap` argument.
By default, `overlap = 'paired'`, which means `context()` attempts to pair each open with the next *unused* close.
The reason we don't close on the first "do" in `"ti|do|fa|re|so|so|fa|so|la|re|so|fa|mi|re|di|re|ti|do"` is that this "do" was *already* the close of the previous window.
For this analysis, we might want to try `overlap = 'none'`: with this argument, a new window will only open after the current window is closed.
```{r}
chorales |>
select(Solfa) |>
context('ti', 'do', overlap = 'none') |>
with(paste(Solfa, collapse = '|'))
```
Another option would be to allow multiple windows to close at the same place (i.e., on the same "do").
This can be achieved with the setting `overlap = 'edge'`:
```{r}
chorales |>
select(Solfa) |>
context('ti', 'do', overlap = 'edge') |>
with(paste(Solfa, collapse = '|'))
```
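The difference between these three settings can be imitated on a tiny vector in base R.
The loops below follow only the prose descriptions given above, on hypothetical data; they are a rough sketch and may differ from what `context()` actually does in less simple cases:

```{r}
# Hypothetical scale degrees: two leading tones, then two tonics.
solfa  <- c('ti', 'ti', 'do', 're', 'do')
opens  <- which(solfa == 'ti')   # indices 1 and 2
closes <- which(solfa == 'do')   # indices 3 and 5

# 'paired'-like: each open takes the next close that is still unused.
paired <- integer(0)
unused <- closes
for (o in opens) {
  hit    <- unused[unused > o][1]
  paired <- c(paired, hit)
  unused <- setdiff(unused, hit)
}

# 'edge'-like: each open takes the next close, even if it is already used.
edge <- vapply(opens, function(o) closes[closes > o][1], integer(1))

# 'none'-like: an open is ignored while an earlier window is still open.
none_open  <- integer(0)
none_close <- integer(0)
last_close <- 0L
for (o in opens) {
  if (isTRUE(o > last_close)) {
    last_close <- closes[closes > o][1]
    none_open  <- c(none_open, o)
    none_close <- c(none_close, last_close)
  }
}

rbind(open = opens, paired = paired, edge = edge)
rbind(open = none_open, close = none_close)
```

In this sketch the second "ti" runs on to the second "do" under the paired rule, shares the first "do" under the edge rule, and opens no window at all under the none rule.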
----
If this is a lot to wrap your head around, you are not the only one!
There are many ways to control how contextual windows are defined, and it's often difficult to decide what we want in a particular analysis.
You can read the `context()` documentation for some more examples, or simply play around!
The following sections lay out a few more examples, just to illustrate some possibilities.
### More criteria
What if we want our windows to close on the tonic (do), but only on long durations?
Let's extract duration information into a new field:
```{r}
chorales |>
select(Token) |>
duration(Token) -> chorales
```
We could now do something like this:
```{r}
chorales |>
select(Solfa) |>
context('ti',
Solfa == 'do' & Duration >= .5,
overlap = 'edge') |>
with(paste(Solfa, collapse = '|'))
```
Notice that, because our `close` expression is more complicated now, I had to explicitly say `Solfa == 'do'` instead of using the shortcut of just providing a single string.
### Open until next/previous
We can use what we learned above about the `nextopen` and `prevclose` variables to make windows open/close at matches.
For example, we could have windows close every time there is a fermata (`";"` token in `**kern`) in the data, but *open* again immediately after each fermata:
```{r}
chorales |>
select(Token) |>
context(prevclose + 1, ';') |>
with(paste(Token, collapse = '|'))
```
There is an issue here: the first fermata in the data doesn't get paired with anything, because there is no "previous close" before it.
This will happen whenever you use `nextopen` or `prevclose`!
You can fix this by explicitly adding an opening window at `1`:
```{r}
chorales |>
select(Token) |>
context(1 | prevclose + 1, ';') |>
with(paste(Token, collapse = '|'))
```
By using the `|` (or) operator in `context()`, we are saying open a window at `1` *or* at `prevclose + 1`.
When working with `nextopen` you might want to use the special `end` variable (only available inside `context()`), which is the last index in the input vector.
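The index arithmetic behind `1 | prevclose + 1` can be written out by hand in base R.
The tokens here are hypothetical, and the sketch only mirrors the idea of opening at index 1 and after every earlier close:

```{r}
# Hypothetical tokens; ';' marks a fermata.
tokens <- c('4c', '4d', '2e;', '4f', '4g', '2a;', '4b')

# Windows close wherever a token carries a fermata.
close <- which(grepl(';', tokens, fixed = TRUE))   # 3 6

# Windows open at index 1, and immediately after each earlier close.
open <- c(1, head(close, -1) + 1)                  # 1 4

phrases <- mapply(function(o, cl) paste(tokens[o:cl], collapse = '|'), open, close)
phrases
```

The final `4b` follows the last fermata, so it falls outside every window: windows defined this way need not be exhaustive.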
### Semi-fixed windows
In some cases, you might want windows to open (or close) at a fixed interval, but close (or open) based on something irregular.
We can do this easily by combining what we've already learned.
For example, we could open a window on every fourth index, but close only when we see a fermata.
We'll want to use `overlap = 'edge'` again.
```{r}
chorales |>
select(Token) |>
context(hop(4), ';', overlap = 'edge') |>
with(paste(Token, collapse = '|'))
```
### Slurs
A common case of contextual information in musical scores is the slur, which is used to indicate articulation (e.g., bowing) and phrasing information.
In `**kern`, slurs are indicated with parentheses: `(` opens a slur and `)` closes it.
To see some examples, let's look at our `beethoven` dataset, which we loaded above.
We will start by removing multi-stops (which would make this much more complicated) and extracting only the `**kern` data.
```{r}
beethoven |>
filter(Exclusive == 'kern' & Stop == 1) |>
removeEmptySpines() |>
removeEmptyStops() -> beethoven
```
We can see parentheses used to indicate slurs in the piano parts.
Let's say we want to get the length of all these slurred groups:
```{r}
beethoven |>
context('(', ')') |>
with(length(Token)) |>
count()
```
Most of the slurs are only 2, 3, or 4 notes.
But there is one that is 13! I wonder where that is?
```{r}
beethoven |>
context('(', ')') |>
mutate(SlurLength = length(Token)) |>
uncontext() |>
group_by(File, Bar, Spine) |>
select(Token) |>
filter(any(SlurLength == 13))
```
There it is!
---
Ok, what if we want to collapse our slurred notes together, like we've been doing throughout this article?
```{r}
beethoven |>
context('(', ')') |>
within(paste(Token, collapse = '|'))
```
That worked...but we lost all the *unslurred* notes.
We can recover these tokens when we call `uncontext()`.
Normally, `uncontext()` just removes contextual windows from your data, which doesn't actually change any data fields.
However, if you provide a `complement` argument, which must refer to an existing field, that "complement" field will be filled into the currently selected field, wherever no contextual window was defined.
(This behavior is similar to the `complement` argument of `unfilter()`.)
```{r}
beethoven |>
context('(', ')') |>
within(paste(Token, collapse = '|')) |>
uncontext(complement = 'Token')
```
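The idea of filling from a complement can be pictured with two parallel vectors in base R.
All values below are hypothetical and written out by hand, including where the collapsed strings are placed; the sketch shows only the fill-in step, not how `uncontext()` is implemented:

```{r}
# Hypothetical original tokens.
tokens <- c('4c', '(4d', '4e)', '4f', '(4g', '4a)')

# Collapsed windows placed at each window's opening index; NA elsewhere.
collapsed <- c(NA, '(4d|4e)', NA, NA, '(4g|4a)', NA)

# Which positions fall inside some window.
in_window <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

# Outside every window, fall back on the complement (the original tokens).
filled <- ifelse(in_window, collapsed, tokens)
filled
```

The unslurred `4c` and `4f` reappear, while positions inside a window keep whatever the windowed expression produced.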
### Nested windows
In some cases, we might have contextual windows "nested" inside each other.
For example, slurs in sheet music might overlap to represent fine gradations in articulation.
The `context()` function can handle nested windows by setting `overlap = 'nested'`.
Here is an example file we can experiment with:
```{r}
nested <- readHumdrum(humdrumRroot, 'examples/Phrases.krn')
nested
```
We've got nested slurs. Let's try `context(..., overlap = 'nested')`:
```{r}
nested |>
context('(', ')', overlap = 'nested') |>
with(paste(Token, collapse = '|'))
```
Very good! We get all our windows, including the nested ones.
(Look how the result differs if you set `overlap = 'paired'`.)
But what if we only want the topmost or bottommost slurs?
Use the `depth` argument: `depth` should be one or more non-zero integers, indicating how deeply nested you want your windows to be.
`depth = 1` would be the "top" (unnested) layer, `2` the next-most nested, etc.
You can also use negative numbers to start from the most nested and work backwards: `-1` is the most deeply nested layer, `-2` the second-most deeply nested, etc.
Finally, you can specify more than one depth by making `depth` a vector, like `depth = c(1,2)`.
```{r}
nested |>
context('(', ')', overlap = 'nested', depth = 1) |>
with(paste(Token, collapse = '|'))
nested |>
context('(', ')', overlap = 'nested', depth = 2) |>
with(paste(Token, collapse = '|'))
nested |>
context('(', ')', overlap = 'nested', depth = 2:3) |>
with(paste(Token, collapse = '|'))
nested |>
context('(', ')', overlap = 'nested', depth = -1) |>
with(paste(Token, collapse = '|'))
```
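One way to picture the `depth` numbering is as a stack: each `(` pushes an open index, and each `)` pops the most recent one.
The base-R sketch below applies that idea to hypothetical tokens; it illustrates the numbering described above and is not a description of how `context()` is implemented:

```{r}
# Hypothetical tokens with nested slurs.
tokens <- c('(4c', '(4d', '4e)', '4f', '(4g', '4a)', '4b)')

opens  <- grepl('(', tokens, fixed = TRUE)
closes <- grepl(')', tokens, fixed = TRUE)

stack <- integer(0)   # indices of windows that are currently open
pairs <- NULL         # one row per completed window
for (i in seq_along(tokens)) {
  if (opens[i]) stack <- c(stack, i)          # push
  if (closes[i]) {
    pairs <- rbind(pairs,
                   c(open = tail(stack, 1), close = i, depth = length(stack)))
    stack <- head(stack, -1)                  # pop
  }
}
pairs

# Keep only the top (unnested) layer.
pairs[pairs[, 'depth'] == 1, ]
```

The two inner slurs (indices 2 to 3 and 5 to 6) sit at depth 2, and the outer slur spanning indices 1 to 7 sits at depth 1.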
# N-grams
In the previous section, we saw that the `context()` function can be used to create n-grams (and so much more).
`r Hm` also offers a different, lag-based, approach to doing n-gram analyses.
The lag-based approach is more fully vectorized than `context()`, which makes it extremely fast, but also less general purpose.
Depending on what you are doing with your n-grams, `context()` may be the only approach that works: basically, if you want to apply an expression separately to each and every n-gram, you need to use `context()`.
The idea of lag-based n-grams can be demonstrated quite simply using the `letters` vector (built in to R) and the `lag(n)` command.
The `lag(n)` command "shifts" a vector over by `n` indices:
```{r}
cbind(letters, lag(letters), lag(letters, n = 2))
```
What happened here? We give the `cbind()` function three separate arguments: 1) the normal `letters` vector; 2) `letters` lagged by 1; 3) `letters` lagged by 2.
These three arguments are bound together into a three-column `matrix`.
We can do the same thing with `paste()`:
```{r}
paste(letters, lag(letters), lag(letters, n = 2))
```
We made three-grams!
This approach, if used with fully-vectorized functions, will be extremely fast, even for large datasets.
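The same idea can be written without any package at all.
In the sketch below, `shift()` is a hypothetical helper defined on the spot (it is not an `r hm` function), and shifting to the right with `NA` padding is an assumption about direction made for illustration:

```{r}
# Hypothetical helper: shift x to the right by n, padding the front with NA.
shift <- function(x, n) c(rep(NA, n), x[seq_len(length(x) - n)])

x <- c('c', 'd', 'e', 'f', 'g')

# Paste each element to its two predecessors.
trigrams <- paste(x, shift(x, 1), shift(x, 2), sep = '|')
trigrams

# The first two n-grams are incomplete, because their lagged values are NA.
complete <- !is.na(shift(x, 2))
trigrams[complete]
```

Because `paste()` turns missing values into the string `"NA"`, incomplete n-grams at the edges have to be filtered out explicitly.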
## Lag within humdrumR
The [with/dplyr][?withHumdrum] functions allow you to create lagged vectors in a special, concise way.
Let's work again with the chorales, using just simple `kern()` data:
```{r message=FALSE}
chorales <- readHumdrum(humdrumRroot, 'HumdrumData/BachChorales/.*krn')
chorales |>
kern(simple = TRUE) -> chorales
```
When using `with()`/`mutate()`/etc., instead of writing `lag(x, n = 1)`, we can write `x[lag = 1]`.
We can then paste a field (like `Kern`) to itself *lagged*, like this:
```{r}
chorales |>
within(paste(Kern, Kern[lag = 1], sep = '|'))
```
We can use negative or positive lags, depending on how we want the n-grams to line up:
```{r}
chorales |>
within(paste(Kern, Kern[lag = -1], sep = '|'))
```
An important point! `with()`/`mutate()`/etc. will automatically group lagged data by `list(File, Spine, Path)`, so the n-grams won't cross
from the end of one file/spine to the beginning of the next, etc.
----
`paste()` isn't the only vectorized function we might want to apply to lagged data.
Another common example would be `count()`:
```{r}
chorales |>
with(count(Kern, Kern[lag = -1]))
```
We get a transition matrix!
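A transition matrix of this kind is just a two-way tabulation of each element against its neighbor.
As a rough base-R analogue on a hypothetical melody (using `table()` rather than `count()`, and pairing each note with the one that follows it):

```{r}
# Hypothetical melody.
x <- c('c', 'd', 'e', 'd', 'c', 'd')

# Pair each note (rows) with the note that follows it (columns).
transitions <- table(from = head(x, -1), to = tail(x, -1))
transitions
```

Six notes give five transitions; here `c` is followed by `d` twice, and every other observed transition occurs once.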
## Larger N
For functions that accept unlimited arguments (`...`), like `paste()` and `count()`, you can easily extend the principle to create longer n-grams:
```{r}
chorales |>
within(paste(Kern, Kern[lag = -1], Kern[lag = -2], sep = '|'))
```
But there's an even better way! Simply give that `lag` argument a *vector* of lags!
In fact, `lag = 0` spits out the unlagged vector, so you can do it all in a single index command:
```{r}
chorales |>
within(paste(Kern[lag = 0:-2], sep = '|'))
```
Let's create 10-grams, and see what the most frequent 10-grams are:
```{r}
chorales |>
with(paste(Kern[lag = 0:-9], sep = '|')) |>
table() |>
sort() |>
tail(n = 10)
```
That's not what we want!
When you do lagged n-grams, the first and last n-grams get "padded" with `NA` values.
We can use the R function `grep(invert = TRUE, value = TRUE)` to get rid of these:
```{r}
chorales |>
with(paste(Kern[lag = 0:-9], sep = '|')) |>
grep(pattern = 'NA', invert = TRUE, value = TRUE) |>
table() |>
sort() |>
tail(n = 10)
```
That still doesn't seem right, does it?
Actually, it is right: in these 10 chorales, there are no 10-gram pitch patterns that occur more than once!
Let's try a 5-gram instead:
```{r}
chorales |>
with(paste(Kern[lag = 0:-4], sep = '|')) |>
grep(pattern = 'NA', invert = TRUE, value = TRUE) |>
table() |>
sort() |>
tail(n = 10)
```
Now we see a couple of n-grams (like `d|e|d|c|b`) that occur more often.