vignettes/GettingStarted.Rmd

---
title: "Getting started with humdrumR"
author: "Nathaniel Condit-Schultz"
date:   "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with humdrumR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
source('vignette_header.R')

```


Welcome to "Getting started with `r hm`"!
This article provides a quick introduction to the basics of `r hm`, getting you started loading humdrum data and performing (very) simple analyses of humdrum data.
Before you continue, make sure `r hm` is installed: [how to install humdrumR](https://humdrumr.ccml.gtcmt.gatech.edu/#installing-humdrum_mathbbr "Installing humdrumR").  
Once it's installed, you can open an R session and load the library using the command `library(humdrumR)`---now you are ready to rock!

This article, like all of our articles, closely parallels information in `r hm`'s detailed code documentation, which can be found in the "[Reference](https://humdrumr.ccml.gtcmt.gatech.edu/reference/index.html "HumdrumR function reference")" section of the `r hm` [homepage](humdrumR.ccml.gtcmt.gatech.edu).
Once `r hm` is installed and loaded, the code documentation can also be accessed directly within an R session by using the `?` command, like `?humdrumR`,
Anywhere in one of our articles where you see a named variable followed by parentheses, like `kern()` or `recip()`, you can call `?kern` or `?recip` to see the corresponding documentation.


<!--
This tutorial is aimed at novice or even non-programmers---if you are already a strong coder and want to move things a long a little faster, checkout the [humdrumR for coders](IntroForCoders.html "humdrumR for coders") tutorial instead.

-->


```{r, echo = FALSE}
library(humdrumR)
humdrumR(syntaxHighlight = FALSE)

```

# Quick Start

Let's just dive right in!

To illustrate how `r hm` works, we'll need some [humdrum data](HumdrumSyntax.html "The Humdrum Syntax") to work with.
Fortunately, `r hm` comes packaged with a small number of humdrum data files just for you to play around with.
These files are stored in the directory where your computer installed `r hm`, in a subdirectory called "`HumdrumData`".
You can move your R session to this directory using R's "set working directory" command: `setwd(humdrumRroot)`.
Once you're in the humdrumR directory, you can use the base R `dir` function to see what humdrum data is available to you.

```{r}
library(humdrumR)

setwd(humdrumRroot)

dir('HumdrumData')

```


It looks like there are `r humdrumR:::num2print(length(dir('HumdrumData')))` directories of humdrum data available to you.
Using `dir` again, we can look inside one: let's start with the "`BachChorales`" directory.

```{r}

dir('HumdrumData/BachChorales')

```


There are `r humdrumR:::num2print(length(dir('HumdrumData/BachChorales')))` files in the directory, named "`chor001.krn`", "`chor002.krn`", etc.
These are humdrum plain-text files, representing ten chorales by J.S. Bach; each file contains four spines (columns) of `**kern` data, representing musical pitch and rhythm (among other things).
Take a minute to find the files in your computer's finder/explorer and open them up with a simple text editor.
One of the core philosophies of `r hm` is that we maintain a direct, transparent relationship with our symbolic data---so always take the time to look at your data! 
You can also do this within Rstudio's "Files" pane---in fact, Rstudio will make things extra easy for you because you can (within the Files pane) click "More" > "Go To Working Directory" to quickly find the files.


## Reading humdrum data

Now that we've found some humdrum data to look at, let's read it into `r hm`.
We can do this using `r hm`'s `readHumdrum()` command.
Try this:

```{r}

readHumdrum('HumdrumData/BachChorales/chor001.krn') -> chor1

```

This command does two things:

1. The `readHumdrum()` function will read the "chor001.krn" file into R and create a `r hm` data object from it.
2. This new object will be saved to a variable called `chor1`. (The name 'chor1' is just a name I chose---you are welcome to give it a different name if you want.)

Once we've created our `chor1` object (or whatever you chose to call it), we can take a quick look at what it is by just typing its name on the command line and pressing enter:

```{r}

chor1

```

(In R, when you enter something on the command line, R "prints" it out for you to read.)
The print-out you see shows you the name of the file, the contents of the file, and some stuff about "Data fields" that you will learn about in our [next article](DataFields.html "HumdrumR data fields").

Cool! Still, looking at a single humdrum file is not really that exciting. 
The whole point of using computers is that they allow us to work with large amounts of data.
Luckily, `r hm` makes this very easy.
Check out this next command:

```{r}

readHumdrum('HumdrumData/BachChorales/chor0') -> chorales

```

Instead of writing `'chor001.krn'`, I wrote `'chor0'`.
When we feed the string `'chor0'` to `readHumdrum()`, it won't just look for a file called "`chor0`"; it will read *any* file in that directory whose name *contains* the sub-string `"chor0"`---which in this case is all ten files!
Try printing the new `chorales` object to see how it is different:

```{r}

chorales

```

Wow! We've now got a "humdrumR corpus of `r humdrumR:::num2print(length(chorales)) ` pieces"---and that's nothing: `readHumdrum()` will work just as well reading hundreds or thousands of files!
Notice that when you print a `r hm` object, `r hm` shows you the beginning of the first file and the end of the last file, as well as telling you how many files there are in total.


---

> `readHumdrum()` has a number of other cool options which you can read about in more detail in our [humdrumR read/write tutorial](ReadWrite.html "Reading and writing data with humdrumR").


## Counting Things

Once we have some data loaded, the next thing a good computational musicologist does is start counting!
To count the contents of data, we can use the `count()` function.

```{r}
chorales |> 
  count() 

```

That's quite a mess! What have we done?
When we pass our `chorales` data  to `count()`, it counted all the unique data tokens (ignoring non-data tokens, like barline and interpretations) in the data.
There are a *lot* of unique tokens in this data, so it's not super helpful.
Maybe we'd like to look at just the twenty most common tokens in the `chorales`?
To this, we can pass the count through to two base R functions, `sort()` and `tail()`:


```{r echo=-1}
chorales |> count() |> sort() |> head(n = 1) -> most

chorales |> 
  count() |>
  sort() |>
  head(n = 20)

```

Ah, that's more promising! We see that the most common token is a quarter-note E4 (`4e`), which occurs `r humdrumR:::num2print(most$n)` times.

### Separating pitch and rhythm

To make our tallies more useful, we might want to count only the pitch or rhythm part of the `**kern` data.
To do this, we need to to be able to extract the pitch/rhythm from the original `**kern` tokens, which we can do that using `r hm`'s suite of [pitch](https://computational-cognitive-musicology-lab.github.io/humdrumR/reference/pitchFunctions.html) and [rhythm](https://computational-cognitive-musicology-lab.github.io/humdrumR/reference/rhythmFunctions.html) functions.
For example, let's try the `pitch()` function:


```{r}
chorales |> 
  pitch()

```

The `pitch()` function takes the original `**kern` tokens, reads the pitch part of each token,
and translates it to [scientific pitch](https://en.wikipedia.org/wiki/Scientific_pitch) notation.
Let's pass *that* to count:

```{r}
chorales |>
  pitch() |>
  count()


```

Pretty cool, but still quite a big table.
Maybe we'd like to ignore octave information for now?
Luckily, `r hm`'s [pitch functions](https://computational-cognitive-musicology-lab.github.io/humdrumR/reference/pitchFunctions.html) have a "`simple`" argument, which can be used to ask for only *simple* pitch information (no octave).

```{r}
chorales |>
  pitch(simple = TRUE) |>
  count() 

```

We can make plot of our nice simple-pitch table, using `r hm`'s `draw()` function:


```{r}
chorales |>
  pitch(simple = TRUE) |>
  count() |>
  draw()

```


Instead of pitch, we could do the same sort of counting of rhythm information, for example, using the `notehead()` function:

```{r}

chorales |>
  notehead() |>
  count() |>
  draw()

```

## Filtering

Sometimes, we might only want to look at a subset of our data.
For example, maybe we only want to count notes sung by the soprano.
In the Bach chorale data we are working with, the soprano voice is always in the fourth spine (column).
We can use the `filter()` function to indicate a subset we'd like to study:

```{r}

chorales |> 
  pitch(simple = TRUE) |> 
  filter(Spine == 4) |>
  count()

```

Let's try something even cooler.
Notice that, in the chorale data, there are tandem interpretations that look like `*G:` and `*E:`.
These are indications of the key.
Anytime you read humdrum data that has these key interpretations, `r hm` will read them into a "field" called `Key`.
We could, for example, count all the notes sung when the key is G major like this:


```{r}

chorales |>
  filter(Key == 'G:') |>
  pitch(simple = TRUE) |>
  count()

```

Guess what? There are a bunch more "fields" hidden in your `r hm` data object that you can use...and you can make your own!
Check out our next article, on `r hm`'s data [fields](DataFields.html "HumdrumR data fields"), to learn more.


# What next?

You've gotten started, but there is much more to learn!
To keep learning check out the other articles on the [humdrumR homepage](https://humdrumR.ccml.gtcmt.gatech.edu).
If you want to continue along the path we've started here, the next articles to check out are probably 
[HumdrumR data fields](DataFields.html "HumdrumR data fields"),
[Getting to know your data](Summary.html "Getting to know your data"),
[Filtering humdrum data](Filtering.html "Filtering humdrum data"), and 
[Working with humdrum data](WorkingWithData.html "Working with humdrum data").
Since most musicological analysis involves pitch or rhythm, you'll probably want to learn about
relevant ideas from the [Pitch and tonality](PitchAndTonality.html "Pitch and tonality in humdrumR")
and [Rhythm and meter](RhythmAndMeter.html "Rhythm and meter in humdrumR") articles.

If the humdrum data you are working with is complex---e.g., including multiple different exclusive interpretations,
spine paths, or multi-stops---you'll probably find you need to check out the [Shaping humdrum data](Reshaping.html "Shaping humdrum data") article, which will give you tools to deal with with more complex humdrum data sets.