# P0. Introduction to Music as Data (Practical)

Eamonn Bell \<eamonn.bell@durham.ac.uk\>  
Department of Computer Science  
Durham University

Guest lecture for Master of Data Science (Digital Humanities) students  
Digital Humanities: Practice and Theory (Heslin)  
9a-11a, 1p-2p; November 28, 2021

## Instructions

Everyone should complete P0.T1 and attempt at least one of P0.T2.a, P0.T2.b, or P0.T2.c.

At about 1:40 p.m. I'll invite some of you to share your visualisation with the class. To do this, please join the Zoom room (with your microphone muted or, ideally, without joining the audio part of the call!) and share your screen.

If you are happy to do so, please download and email your attempts to me in the .ipynb format at <eamonn.bell@durham.ac.uk>. They will not be marked but they will help me improve this material for future courses on the same topic.

If you need to install packages from PyPI or from GitHub in this notebook, you can do so by adding a cell with the following format:

```
!pip install PACKAGE_NAME
```

### P0.T1. Create your own visualisation of your own musical/sonic production

- Record your own audio file of you
    - speaking, or
    - humming, or
    - singing, or
    - clapping, or
    - playing a musical instrument
- Process it in whatever way you like using simple arithmetic operations or more complicated `numpy` functions
    - reduce it in volume, or
    - reverse it, or
    - clip it, or
    - perform a rolling average on it, or
    - trim it, or
    - produce a spectrogram of it
- Extract some features or segmentations using the onset and beat detection implementations in `librosa`
    - Optionally, sonify these features and visualise the effect of this sonification, through the addition of further subplots.
- Use `matplotlib` subplots to connect your visual representations of the original recording and your processed/analysed version of it, in order to tell a story about the relationship between them.
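
As a starting point, the processing options listed above can each be written in a line or two of `numpy`. The sketch below is only one possible reading of them. It uses a synthesised sine tone in place of a real recording, and the function names, the sample rate `SR`, and all default values are my own assumptions rather than anything fixed by the task.

```python
import numpy as np

SR = 22050  # assumed sample rate in Hz; substitute the rate of your own recording


def make_test_tone(freq=440.0, seconds=1.0, sr=SR):
    """Synthesise a sine tone to stand in for a real recording."""
    t = np.arange(int(seconds * sr)) / sr
    return 0.8 * np.sin(2 * np.pi * freq * t)


def reduce_volume(y, gain=0.5):
    """Scale every sample by a constant gain."""
    return y * gain


def reverse(y):
    """Play the signal backwards."""
    return y[::-1]


def clip(y, limit=0.3):
    """Hard-limit samples to the range [-limit, limit]."""
    return np.clip(y, -limit, limit)


def rolling_average(y, window=5):
    """Smooth the signal with a moving average of `window` samples."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")


def trim(y, start_s, end_s, sr=SR):
    """Keep only the samples between two times given in seconds."""
    return y[int(start_s * sr):int(end_s * sr)]


y = make_test_tone()
processed = {
    "quieter": reduce_volume(y),
    "reversed": reverse(y),
    "clipped": clip(y),
    "smoothed": rolling_average(y),
    "trimmed": trim(y, 0.25, 0.75),
}
for name, signal in processed.items():
    print(name, len(signal), round(float(np.max(np.abs(signal))), 3))
```

Each array in `processed` can then be drawn in its own subplot next to the original `y`, which is one way to begin telling the story the task asks for.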

### P0.T2.a. @DLEveryFriday

Every Friday since January 29, 2021, David Lynch has posted a video message on Twitter celebrating the arrival of the last day of the working week.

I've noticed that his performance has recently become more and more high-pitched and drawn out. We might not think about this as a musical performance, but speech has structure in time and in pitch that can be explored using the tools of music information retrieval.

To better understand whether this hypothesis could be true, use some combination of visualisation and/or analysis with `librosa` (and any other tools you might find useful).

You'll need to download these videos before you can analyse them. A Python-based tool called `gallery-dl` is useful for downloading large amounts of web media. So is `youtube-dl` (or its more recent fork, `yt-dlp`).

```
!gallery-dl -d data/ https://twitter.com/dleveryfriday?lang=en
```

```
data/twitter/DLEveryFriday/1461732628244176906_1.mp4
data/twitter/DLEveryFriday/1459186241417760770_1.mp4
data/twitter/DLEveryFriday/1456641480030932998_1.mp4
data/twitter/DLEveryFriday/1454107332938457100_1.mp4
data/twitter/DLEveryFriday/1451563100214280216_1.mp4
data/twitter/DLEveryFriday/1449023895240159236_1.mp4
data/twitter/DLEveryFriday/1446485013915181070_1.mp4
data/twitter/DLEveryFriday/1443951478998278146_1.mp4
data/twitter/DLEveryFriday/1441407461856927750_1.mp4
data/twitter/DLEveryFriday/1438873255184642049_1.mp4
data/twitter/DLEveryFriday/1436343844539355136_1.mp4
data/twitter/DLEveryFriday/1433808185362632709_1.mp4
data/twitter/DLEveryFriday/1431271550024622085_1.mp4
data/twitter/DLEveryFriday/1428727328784408576_1.mp4
data/twitter/DLEveryFriday/1426196444936122370_1.mp4
data/twitter/DLEveryFriday/142366
```

Even though these are video files, you can still open them using `librosa.load`.

You will also need to keep a record of when each of these videos was originally created. The filename is the `tweet_id` (status ID), not a timestamp, so you'll have to get the creation times from the file system, either through something like the built-in `os` module ([documentation here](https://docs.python.org/3/library/os.html)) or through clever use of shell invocations and [the way that Jupyter Notebook stores cell outputs](https://stackoverflow.com/questions/27952428/programmatically-get-current-ipython-notebook-cell-output).
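
An alternative to reading times from the file system is to decode them from the status ID itself. This is a sketch under one assumption that is not stated above: that these tweet IDs are Twitter "snowflake" IDs, whose upper bits hold a count of milliseconds since Twitter's custom epoch. The helper names `tweet_id_from_filename` and `tweet_time` are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: snowflake IDs count milliseconds from this offset (Twitter's epoch).
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_from_filename(path):
    """Extract the status ID from a name like '1461732628244176906_1.mp4'."""
    return int(Path(path).stem.split("_")[0])


def tweet_time(tweet_id):
    """Decode the UTC posting time embedded in a snowflake ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS  # drop the 22 low-order non-time bits
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


path = "data/twitter/DLEveryFriday/1461732628244176906_1.mp4"
posted = tweet_time(tweet_id_from_filename(path))
print(posted.date(), posted.strftime("%A"))
```

If the decoded weekday for each file comes out as a Friday, that is a reasonable sanity check that the assumption holds for this account.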
 
Load one or two of these into a spectrogram and think about what is involved in detecting the moment where Lynch starts saying "Friday". Then, if you can locate this moment in one recording, what might be involved in finding similar segments in the other recordings?

There is a task called voice activity detection (VAD), which you might like to explore. [This might set you in the right direction](https://github.com/wiseman/py-webrtcvad/blob/master/example.py).
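
Before reaching for a full VAD library, it can help to see how far a simple energy threshold gets you. The sketch below is a crude stand-in and not the WebRTC algorithm used in the linked example. It marks frames whose root-mean-square energy exceeds a threshold, and the frame size, hop size and threshold are arbitrary assumptions that would need tuning on real speech.

```python
import numpy as np


def frame_energy(y, frame_length=1024, hop_length=512):
    """Root-mean-square energy of each overlapping frame."""
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    return np.array([
        np.sqrt(np.mean(y[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])


def active_regions(y, sr, threshold=0.05, frame_length=1024, hop_length=512):
    """Return (start_s, end_s) pairs where frame energy exceeds the threshold."""
    active = frame_energy(y, frame_length, hop_length) > threshold
    # Pad with False so every region has both a rising and a falling edge.
    padded = np.concatenate([[False], active, [False]])
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[0::2], edges[1::2]
    return [(s * hop_length / sr, e * hop_length / sr) for s, e in zip(starts, ends)]


# Synthetic check: one second of silence, one of tone, one of silence.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)])
regions = active_regions(y, sr)
print(regions)
```

On real recordings, background noise and music will defeat a fixed threshold quite quickly, which is part of why dedicated VAD models exist.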

Once you have found the correct segment in each of the recordings, what are the next steps you might take? Examine the pitch-tracking functions in `librosa` (for example `librosa.pyin` and `librosa.piptrack`) for ideas.
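
To get a feel for what a pitch tracker does, here is a minimal autocorrelation-based estimate of the fundamental frequency of a single short frame. It is a deliberately simplified technique and not what `librosa` implements. The function name `estimate_f0` and the search range of 50 to 500 Hz are assumptions chosen for illustration.

```python
import numpy as np


def estimate_f0(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate one fundamental frequency (Hz) for a short frame of audio."""
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Only search lags that correspond to frequencies between fmin and fmax.
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(acf) - 1)
    best_lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return sr / best_lag


# Synthetic check with a 220 Hz sine tone.
sr = 22050
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 220 * t)
f0 = estimate_f0(frame, sr)
print(round(float(f0), 1))
```

Applying an estimator like this frame by frame across each "Friday" segment would give a pitch contour per week, and summary statistics of those contours (maximum, median, duration) could then be plotted against posting date.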

This is not a straightforward task and I don't expect you to complete it, but I am curious to hear from you how you might begin to address the problem.

### P0.T2.b. Write your own onset detector

Returning to the section "Detecting time-series features" in the Lecture notebook, consider what is involved in writing your own onset detector that works on time-domain representations of sound.

Take the results of `librosa.onset.onset_detect` when applied to the `data/clicks.wav` file as your reference or source of ground truth.

Write your own function (either wrapping `scipy.signal.find_peaks` or any other peak detection package you might find online, or writing your own from scratch) that returns an array of time values, one per onset.
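
One from-scratch possibility, sketched here under my own assumptions, is to treat an onset as the first moment the absolute amplitude rises above a threshold after a quiet stretch. The threshold and minimum gap are arbitrary values, and the synthetic clicks below stand in for `data/clicks.wav`, which is not loaded here.

```python
import numpy as np


def detect_onsets(y, sr, threshold=0.2, min_gap_s=0.05):
    """Return onset times in seconds, one per burst above the threshold."""
    above = np.abs(y) > threshold
    # Indices where the signal crosses from below to above the threshold.
    rising = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    if above[0]:
        rising = np.concatenate([[0], rising])
    onsets = []
    for idx in rising:
        t = idx / sr
        # Ignore crossings that follow the previous accepted onset too closely.
        if not onsets or t - onsets[-1] >= min_gap_s:
            onsets.append(t)
    return np.array(onsets)


# Synthetic clicks: short decaying 1 kHz bursts at 0.5 s, 1.0 s and 1.5 s.
sr = 8000
y = np.zeros(2 * sr)
n = np.arange(80)
burst = np.exp(-n / 20) * np.cos(2 * np.pi * 1000 * n / sr)
for click_time in (0.5, 1.0, 1.5):
    start = int(click_time * sr)
    y[start:start + len(burst)] = burst

detected = detect_onsets(y, sr)
print(detected)
```

The minimum gap matters because an oscillating burst crosses the threshold several times, and without it each crossing would be reported as a separate onset.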

Design a scoring function that will rate the success of your onset detector, by comparing its results to the reference/ground truth. Some ideas:

- the scoring function might be as simple as counting the number of onsets detected and comparing them to each other
- you will have to allow for some tolerance (
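
One way to combine counting with a tolerance window is to match each detected onset to at most one reference onset and then report precision, recall and their harmonic mean. The sketch below uses a simple greedy matching over sorted times. The function name `score_onsets` and the 50 ms default tolerance are assumptions, not values taken from the task.

```python
import numpy as np


def score_onsets(reference, estimated, tolerance_s=0.05):
    """Precision, recall and F-measure using greedy one-to-one matching."""
    reference = np.sort(np.asarray(reference, dtype=float))
    estimated = np.sort(np.asarray(estimated, dtype=float))
    matched = 0
    i = j = 0
    while i < len(reference) and j < len(estimated):
        diff = estimated[j] - reference[i]
        if abs(diff) <= tolerance_s:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early to match anything: false positive
        else:
            i += 1  # reference onset was never matched: false negative
    precision = matched / len(estimated) if len(estimated) else 0.0
    recall = matched / len(reference) if len(reference) else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f_measure


reference = [0.5, 1.0, 1.5]
estimated = [0.51, 1.2, 1.49]
precision, recall, f_measure = score_onsets(reference, estimated)
print(precision, recall, f_measure)
```

Reporting precision and recall separately shows whether a detector tends to over-detect or to miss onsets, which a single count comparison would hide.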

### P0.T2.c. Text analysis

In this exercise you will work with a messier dataset about music: two dumps of Reddit comments (dating from January 2017) from subreddits corresponding to two rather different genres of media, /r/asmr and /r/vaporwave.

- [data/reddit-comments/asmr2017-01.zip](http://www.columbia.edu/~epb2125/techniques/data/reddit-comments/asmr2017-01.zip)
- [data/reddit-comments/vaporwave2017-01.zip](http://www.columbia.edu/~epb2125/techniques/data/reddit-comments/vaporwave2017-01.zip)

Exploring these datasets in whatever way you like with the help of Python, characterize three differences and three similarities between these two online communities, at least as they are represented in these comments.

You might find the packages `TextBlob` and/or `spaCy` useful.
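
Before installing anything, a first pass is possible with the standard library alone. The sketch below counts words and ranks those that are over-represented in one community relative to the other using a smoothed frequency ratio. It assumes you have already extracted the comment bodies into lists of strings, since the layout of the dump files is not described here. The toy comments are invented placeholders and are not drawn from the datasets.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")  # crude tokeniser: runs of letters and apostrophes


def word_counts(comments):
    """Count lower-cased word tokens across an iterable of comment strings."""
    counts = Counter()
    for body in comments:
        counts.update(TOKEN.findall(body.lower()))
    return counts


def distinctive_terms(counts_a, counts_b, n=5):
    """Words over-represented in A relative to B, by add-one smoothed ratio."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    ratio = {
        w: ((counts_a[w] + 1) / (total_a + len(vocab)))
        / ((counts_b[w] + 1) / (total_b + len(vocab)))
        for w in vocab
    }
    return sorted(ratio, key=ratio.get, reverse=True)[:n]


# Invented placeholder comments, for illustration only.
asmr_toy = ["the whisper was so relaxing", "relaxing tapping sounds"]
vaporwave_toy = ["the album art was so aesthetic", "aesthetic synth sounds"]

asmr_counts = word_counts(asmr_toy)
vaporwave_counts = word_counts(vaporwave_toy)
print(distinctive_terms(asmr_counts, vaporwave_counts, n=1))
print(distinctive_terms(vaporwave_counts, asmr_counts, n=1))
```

On the real dumps you would probably want to remove very common words first, and words shared with similar frequency by both communities are a natural place to look for the similarities the task asks about.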

## Resources

### Useful Python packages

- `librosa`
- `pretty_midi`
- `essentia`
- `music21`
- `madmom`
- `amen`

### Non-Python tools

- essentia
- Tone.js
- Humdrum toolkit

### Learning more

- A good place to start to get a quick sense for what is possible is [musicinformationretrieval.com](https://musicinformationretrieval.com).
- Meinard Müller, _Fundamentals of Music Processing: Using Python and Jupyter Notebooks_. 2nd edition, 495 p., hardcover. Springer, 2021.
    - Excellent [online Jupyter notebooks](https://www.audiolabs-erlangen.de/resources/MIR/FMP/C0/C0.html) support this text
- The `librosa` documentation is full of [useful and non-trivial examples](https://librosa.org/librosa_gallery/), but more than that, its code is very readable. It is useful for understanding how classic techniques in the field should be implemented.
- [Audio Signal Processing for Music Applications](https://www.coursera.org/learn/audio-signal-processing), a MOOC on Coursera.org by Xavier Serra and Julius O. Smith III, is quite good too.

### More extensive lists of resources

- https://github.com/ciconia/awesome-music