# The Nintendo Entertainment System Music Database

"The Nintendo Entertainment Music Database", which I will be abbreviaiting to NES-MDB from here on out, is a dataset created by Chris Donahue, Huanru Henry Mao, and Julian McAuley. It contains thousands of songs found within Nintendo Entertainment System video games, with each song containing a musical score for four instrument voices and expressive attributes for the dynamics and timbre of each voice. It is this portion of the NES-MDB that sets it apart from other music databases, as unlike others which typically only contain MIDI files, the NES-MDB contains all of the information necessary to render *exact* acoustic performances of the original compositions.

## Intended Use

The purpose of this database is to encourage a new wave of polyphonic music generation that emphasizes expressive performance attributes. While many other datasets offer similar music compilations, the expressive attributes in this dataset, along with its four distinct instrument voices, give a clear view of how music was composed for the NES. Along with the dataset, the authors created a tool that renders computer-generated compositions as NES-style audio, so researchers can hear their generated music as it would sound through the NES synthesizer. A tool like this lets researchers explore the finer details of automated music generation for the NES in greater depth.

## The Score Formats

Within the database, there are a total of six different musical formats, each offering a different look into the structure of the NES music library. The paper focuses on three of them: the blended score, the separated score, and the expressive score. I will explain these in more detail below, as they are the most useful for researchers. Alongside these, MIDI files are included for each track, which give a familiar format to anyone who wishes to view the music from the NES. The NES language modeling (NLM) format is also included. It contains a timestamped list of instructions that control the synthesizer state machine inside the NES. The final format in the database is the video game music (VGM) format. All of the other formats were derived from it, as it is the raw sound file that contains the code for the music. Although the paper does not discuss these last three formats in depth, they undoubtedly have a wide range of uses for studying general music composition and raw video game music files.

## Blended Score Format

Much of the prior research on polyphonic music composition uses the blended score format, making it a standard benchmark for sequential models. Because the format does not preserve the individual instrument voices, however, it is not ideal for NES music. Nevertheless, it is still useful here as a baseline for comparing algorithms.

A model of the blended score factorizes the probability of a composition $c$ one timestep at a time:

$$P(c) = P(B_1) \cdot P(B_2 \mid B_1) \cdots P(B_T \mid B_{t<T})$$

* $B$ is a binary matrix of size $N \times T$, and $B_t$ is its column at timestep $t$
* $N$ is the number of possible note values (88 for the NES)
* $T$ is the number of timesteps in the composition
* $B[n, t] = 1$ if any voice is playing note $n$ at timestep $t$, and $0$ otherwise

Below is a depiction of the blended score format for the song *Ending Theme* in the NES game *Abadox*.
![Blended Score](https://raw.githubusercontent.com/chrisdonahue/nesmdb/master/static/score_blended.png)
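
To make the definition of *B* concrete, here is a minimal sketch in plain Python that builds the binary matrix from a list of note events. The `(midi_note, start, end)` event format, the `blended_score` helper, and the choice of MIDI note 21 as the lowest of the 88 note values are all assumptions made for illustration; they are not taken from the dataset or its tooling.

```python
N = 88       # number of possible note values, as stated above
LOWEST = 21  # assumption: row 0 of B corresponds to MIDI note 21


def blended_score(events, num_steps):
    """Build the N x T binary matrix B from (midi_note, start, end) events.

    `events` is a hypothetical input format; `end` is exclusive.
    """
    B = [[0] * num_steps for _ in range(N)]
    for midi_note, start, end in events:
        n = midi_note - LOWEST
        for t in range(start, end):
            B[n][t] = 1
    return B


# Two overlapping notes: the blend records which notes sound,
# but not which voice played them.
events = [(60, 0, 2), (64, 1, 3)]
B = blended_score(events, num_steps=4)

# List the active MIDI notes at each timestep.
active = [[n + LOWEST for n in range(N) if B[n][t]] for t in range(4)]
print(active)  # [[60], [60, 64], [64], []]
```

Each column of `B` plays the role of one $B_t$ in the factorization: a sequential model predicts the next column from all of the earlier ones.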

## Separated Score Format

Unlike the blended format, which merges all voices into a single piano roll, the separated score format encodes a monophonic sequence for each instrument voice. Because the hardware of the NES APU limits the music to four monophonic instrument voices, this format is an ideal way to view what each voice plays and when.

A model of the separated score factorizes the probability of a composition $c$ over every voice and timestep:

$$P(c) = \prod_{t=1}^{T} \prod_{v=1}^{V} P\left(S_{v,t} \mid S_{v,\hat{t} \neq t},\; S_{\hat{v} \neq v,\, \forall \hat{t}}\right)$$

* $S$ is a matrix of size $V \times T$
* $V$ is the number of instrument voices
* $T$ is the number of timesteps in the composition
* $S[v, t] = n$ represents the note $n$ played by the voice $v$ at timestep $t$

Below is a depiction of the separated score format for the song *Ending Theme* in the NES game *Abadox*.
![Separated Score](https://raw.githubusercontent.com/chrisdonahue/nesmdb/master/static/score_separated.png)
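
The relationship between the two formats can be sketched in a few lines of plain Python: collapsing a separated score *S* into a blended score *B* discards the voice assignment. The `blend` helper is hypothetical, and the encoding (0 for silence, otherwise a MIDI note number starting at 21) is an assumption for illustration. The sketch also uses only two melodic voices and ignores the noise voice.

```python
def blend(S, lowest=21, num_notes=88):
    """Collapse a V x T separated score into an N x T blended score.

    Assumes 0 means silence and any other value is a MIDI note
    number in [lowest, lowest + num_notes).
    """
    T = len(S[0])
    B = [[0] * T for _ in range(num_notes)]
    for voice in S:
        for t, note in enumerate(voice):
            if note:
                B[note - lowest][t] = 1
    return B


# The same notes, assigned to the two voices in different ways.
S1 = [[60, 60, 0], [64, 0, 67]]
S2 = [[64, 60, 67], [60, 0, 0]]

print(S1 == S2)                # False
print(blend(S1) == blend(S2))  # True
```

Two different separated scores map to the same blended score, which is why the blended format cannot recover which voice played which note.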

## Expressive Score Format

The final format on display is where we get to view the expressive attributes that make this dataset special. The expressive score format builds on the separated score format, adding the dynamics (velocity) and timbre of each note. It is with this format that the creators of the database hope to bring about a revolution in polyphonic music generation.

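A model of the expressive score factorizes the probability of a piece of music $m$ into a composition and its expressive attributes:

$$P(m) = P(e \mid c) \cdot P(c)$$

* $P(e \mid c)$ is the probability of the expressive attributes $e$ given the composition $c$
* $P(c)$ is the probability of the composition, taken from the separated score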

Below is a depiction of the expressive score format for the song *Ending Theme* in the NES game *Abadox*.
![Expressive Score](https://raw.githubusercontent.com/chrisdonahue/nesmdb/master/static/score_expressive.png)
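
The split into a composition *c* and expressive attributes *e* can be sketched on a toy array in plain Python. This assumes a time-first layout of timesteps by voices by three attributes, with the attributes ordered as note, velocity, and timbre. That ordering, the `split_expressive` helper, and the values below are assumptions for illustration only.

```python
def split_expressive(exprsco):
    """Split a T x V x 3 expressive score into notes and attributes.

    Assumes each voice entry is ordered (note, velocity, timbre).
    """
    notes = [[voice[0] for voice in step] for step in exprsco]
    attrs = [[(voice[1], voice[2]) for voice in step] for step in exprsco]
    return notes, attrs


# Two timesteps, four voices; all values are made up.
exprsco = [
    [(60, 12, 2), (64, 8, 1), (48, 0, 0), (0, 0, 0)],
    [(60, 10, 2), (0, 0, 0), (48, 0, 0), (3, 15, 1)],
]

notes, attrs = split_expressive(exprsco)
print(notes)        # [[60, 64, 48, 0], [60, 0, 48, 3]]
print(attrs[0][0])  # (12, 2)
```

Here `notes` corresponds to the composition $c$ (a separated score stored time-first) and `attrs` to the expressive attributes $e$ that a model of $P(e \mid c)$ would predict.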

## Use in Other Papers

One paper that I will highlight that uses this dataset is "Towards Automatic Instrumentation By Learning to Separate Parts in Symbolic Multitrack Music," by Hao-Wen Dong et al. The goal of this project is to study the feasibility of automatically and dynamically assigning instrument voices to the notes of a solo music performance. The NES-MDB is one of four datasets the authors use to examine the effectiveness of the models proposed in the paper. The paper goes into great detail about how they adapt the NES-MDB to fit their models, and the ultimate outcome is that their models outperform various baselines. This paper is a great example of the potential the NES-MDB has to aid a wide variety of music research. Even though this is not how the creators of the NES-MDB had imagined their dataset would be used, the flexibility and numerous formats provided within the dataset allowed Hao-Wen Dong and his team to use it to better understand part separation in multitrack music.

## Look at the Code

Due to the unique nature of the NES-MDB, there isn't an easy way to view all of the information available within the dataset in something like an Excel document. If someone wished to look at the expressive score format for *Mega Man 2*'s track *Title*, they would have to execute code like the following:

```python
import pickle

with open('225_MegaMan2_01_02Title.exprsco.pkl', 'rb') as f:
  rate, nsamps, exprsco = pickle.load(f)

print('Temporal discretization rate: {}'.format(rate)) # Will be 24.0
print('Length of original VGM: {}'.format(nsamps / 44100.))
print('Piano roll shape: {}'.format(exprsco.shape))
```

Output:

```
Temporal discretization rate: 24.0
Length of original VGM: 42.58047619047619
Piano roll shape: (1021, 4, 3)
```
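
As a quick sanity check on these numbers, the sketch below compares the duration implied by the piano roll with the reported VGM length. The values are copied from the output above; the variable names are my own.

```python
rate = 24.0                      # timesteps per second
num_steps = 1021                 # first dimension of the piano roll
vgm_seconds = 42.58047619047619  # reported length of the original VGM

# Duration covered by the piano roll at 24 timesteps per second.
score_seconds = num_steps / rate
print(round(score_seconds, 3))  # 42.542

# The two lengths agree to within a single timestep.
print(abs(vgm_seconds - score_seconds) < 1 / rate)  # True
```

The remaining two dimensions of the shape are the four instrument voices and the three values stored for each voice at each timestep.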


In order to render your generated music through an emulation of the NES synthesizer, the creators of the NES-MDB created their own package titled `nesmdb`. This package also features the functionality to convert between the formats available in the dataset, as well as turning any format into a *.wav* file. Here is what someone would run if they wished to convert the expressive format of *Mega Man 2*'s track *Title* to a WAV file:

Perhaps the coolest thing about the `nesmdb` package is its capability to emulate the NES APU from within Python. Donahue and his team needed to build this functionality in order to extract the information needed from the VGM files so that they could build all of the other formats. There is a large portion of code that went into this, but here is a look at the code used to extract the information about the pulse wave voices from the music.
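
As a rough illustration of what the pulse voice parameters describe (this is not the `nesmdb` code itself), here is a minimal sketch of a pulse wave generator. It relies on two facts about the NES APU: a pulse voice's frequency is the CPU clock divided by 16 times its 11-bit timer value plus one, and each pulse voice has four duty cycle settings. The function names, the naive sampling approach, and the NTSC clock value are my own choices for the sketch.

```python
CPU_CLOCK_HZ = 1789773                 # NTSC NES CPU clock
DUTY_CYCLES = [0.125, 0.25, 0.5, 0.75]  # the four pulse duty settings


def pulse_frequency(timer):
    """Frequency in Hz for an 11-bit pulse timer value."""
    return CPU_CLOCK_HZ / (16 * (timer + 1))


def pulse_samples(timer, duty, volume, num_samples, sample_rate=44100):
    """Naive pulse wave: `volume` while high, 0 while low.

    This is not band-limited and is not a faithful APU emulation.
    """
    freq = pulse_frequency(timer)
    out = []
    for i in range(num_samples):
        phase = (i * freq / sample_rate) % 1.0
        out.append(volume if phase < DUTY_CYCLES[duty] else 0)
    return out


# A timer value of 253 lands close to A4 (440 Hz).
print(round(pulse_frequency(253), 1))  # 440.4

samples = pulse_samples(timer=253, duty=2, volume=15, num_samples=100)
print(samples[0], samples[-1])  # 15 0
```

The timer sets the pitch, the duty cycle shapes the timbre, and the volume sets the dynamics, which is how a pulse voice's hardware state lines up with the note, timbre, and velocity attributes of the expressive score.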

## Citations

Chris Donahue, Huanru Henry Mao, and Julian McAuley, "The NES Music Database: A Multi-Instrumental Dataset with Expressive Performance Attributes," 19th International Society for Music Information Retrieval Conference, Paris, France, 2018.

Hao-Wen Dong, Chris Donahue, Taylor Berg-Kirkpatrick, and Julian McAuley, "Towards Automatic Instrumentation By Learning to Separate Parts in Symbolic Multitrack Music," 22nd International Society for Music Information Retrieval Conference, Online, 2021.