Fabian C. Moss edited this page Oct 15, 2021 · 52 revisions

Welcome to the Musical Corpora Register!

The purpose of this wiki is to collect links to published musical corpora, together with brief explanations. Hopefully it is useful to students and researchers who study music. The corpora are not yet listed in any particular order. Everybody is welcome to contribute!

RS200 Pop / Rock corpus of harmonic labels

  • http://rockcorpus.midside.com
  • by Trevor deClercq and David Temperley
  • first published in 2011
  • corpus of harmonic labels for Pop / Rock songs in standard roman numeral notation
  • planned to grow to all 500 songs on Rolling Stone magazine's "500 Greatest Songs of All Time" list

iReal Jazz chord sequences

Charlie Parker Omnibook

McGill Billboard Project

SUPRA (Stanford University Piano Roll Archive)

ASAP Dataset (Aligned Scores and Performances)

  • https://github.com/fosfrancesco/asap-dataset
  • MIDI and audio performances temporally matched to sheet music (in MusicXML and MIDI)
  • Including beat, downbeat, time signature, and key signature annotations
  • 1068 MIDI performances, 520 audio performances (from MAESTRO), aligned to 222 pieces

MAESTRO (by Magenta)

TAVERN

  • http://u.osu.edu/tavern/
  • by Johanna Devaney, Claire Arthur, Nathaniel Condit-Schultz, and Kirsten Nisula
  • theme and variation encodings, annotated with roman numerals
  • themes and variations for piano by Mozart and Beethoven, divided into 1060 phrases

Beatles Corpus

Yale Classical Archives Corpus

ELVIS project

  • https://elvisproject.ca
  • part of SIMSSA, the Single Interface for Music Score Searching and Analysis project
  • 2852 Pieces and 3358 Movements by 164 Composers
  • symbolic data in formats such as MEI, MusicXML, MIDI, and others

RAMEAU

Band-in-a-Box Jazz standards

  • Band-in-a-Box files available at http://bhs.minor9.com/
  • converted by Keunwoo Choi, George Fazekas, and Mark Sandler into a single .txt file for the research presented in this article
  • chords of Jazz standards with time information in beats

Weimar Jazz Database

RWC (Real World Computing) Music Database

GTTM Database

  • http://gttm.jp/gttm/database/
  • by Masatoshi Hamanaka, Keiji Hirata, and Satoshi Tojo
  • 300 eight-bar phrases of monophonic melodies from Western classical music
  • XML format

Kostka-Payne Corpus

  • http://davidtemperley.com/kp-stats/
  • by David Temperley
  • corpus consisting of 46 chord-analyzed excerpts in the workbook accompanying the theory textbook Tonal Harmony by Stefan Kostka and Dorothy Payne

Dutch Folk Song Database (The Meertens Tune Collections / MTC-ANN)

Essen Associative Code and Folksong Database

Finnish Folk Song Database

Annotated jazz chord progression corpus

Verovio Humdrum Viewer Online Repertories

Kern Scores Music Collection

MuseData

Henrik Norbeck's ABC Tunes

  • http://www.norbeck.nu/abc/
  • by Henrik Norbeck, Stockholm, Sweden.
  • A free online tune book of mostly Irish and Swedish traditional music
  • Sheet music and lyrics for more than 2800 tunes in ABC format
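
Since the tunes are distributed as ABC text, a first look at a file needs nothing beyond the standard library. The following is a minimal sketch, assuming the common ABC convention that each tune starts with header fields such as `X:`, `T:`, `M:`, and `K:`, and that the `K:` field closes the header. The sample tune and the function name `parse_abc_tune` are made up for illustration and are not taken from the tune book.

```python
import re

# Hypothetical sample tune, written only for this illustration.
SAMPLE_ABC = """X:1
T:Example Reel
R:reel
M:4/4
L:1/8
K:D
|:d2fd Adfd|e2ge Bege:|
"""

# A header field is a single letter, a colon, and a value (e.g. "T:Example Reel").
HEADER_FIELD = re.compile(r"^([A-Za-z]):\s*(.*)$")


def parse_abc_tune(text):
    """Split one ABC tune into header fields and body (music) lines."""
    headers = {}
    body = []
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        match = HEADER_FIELD.match(line)
        if match and not in_body:
            key, value = match.groups()
            headers.setdefault(key, []).append(value)
            if key == "K":  # assumption: the key field ends the tune header
                in_body = True
        else:
            body.append(line)
    return headers, body


headers, body = parse_abc_tune(SAMPLE_ABC)
print(headers["T"], headers["M"], headers["K"], len(body))
```

For real analysis work, a dedicated ABC parser is likely the better choice; this sketch only shows how little is needed to pull titles, meters, and keys out of a tune file.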

Collection of World Music Corpora

Harmonic analysis of Joseph Haydn's "Sun Quartets"

Analyses of the Algomus group

  • http://www.algomus.fr/data/
  • sonata form structure and cadences (2000+ labels) of 32 Mozart string quartet movements
  • S/CS/CS2 patterns, cadences, pedals (1000+ labels) of 24 Bach fugues and 12 Shostakovich fugues (op. 87, 1952)

The Digital Mozart Score Viewer

Digital Edition of Mozart Piano Sonatas

Beethoven Piano Sonatas with Functional Harmony (BPS-FH)

Digital Edition of Beethoven Piano Sonatas

Beethoven-Werkstatt

Josquin Research Project

  • Jesse Rodin, Craig Sapp, Clare Bokulich
  • https://josquin.stanford.edu, github
  • ca. 1200 movements from ca. 1420–1520
  • collected in Humdrum (on GH), available in many other formats
  • CC-BY-SA 4.0 (derivatives must be published under a similar license)
  • web interface for analytic queries

Tasso in Music Project

JKU Pattern Development Database

kunstderfuge.com

  • 19,300 MIDI files in total, 17,500 in "XL Zip Archive"
  • requires "academic subscription"
  • website info

Million Song Dataset

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

  • The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Sample audio can, however, be fetched from services like 7digital, using code provided by the project.
  • The Million Song Dataset is also a cluster of complementary datasets contributed by the community:
    • SecondHandSongs dataset -> cover songs
    • musiXmatch dataset -> lyrics
    • Last.fm dataset -> song-level tags and similarity
    • Taste Profile subset -> user data
    • thisismyjam-to-MSD mapping -> more user data
    • tagtraum genre annotations -> genre labels
    • Top MAGD dataset -> more genre labels
  • Link

The Lakh MIDI Dataset

Art song vocal lines

  • Collection of vocal lines from songs by 19th-century French and German composers
  • .krn format
  • by Leigh Van Handel
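
Files in .krn format are plain text, so simple statistics can be gathered without special tooling. The sketch below assumes the usual Humdrum conventions (lines starting with `!` are comments or reference records, lines starting with `*` are interpretations, lines starting with `=` are barlines, and `.` is a null token) and assumes that the first tab-separated column holds the vocal line. The sample data and the function name `summarize_kern` are hypothetical. Tied notes are counted once per token, which overcounts sounding notes.

```python
# Hypothetical sample, written only for this illustration.
SAMPLE_KERN = """!!!COM: Hypothetical Composer
**kern
*clefG2
*M4/4
=1
4c
4d
4e
4r
=2
2g
2cc
==
*-
"""


def summarize_kern(text):
    """Count note tokens, rest tokens, and barlines in the first spine."""
    counts = {"notes": 0, "rests": 0, "barlines": 0}
    for line in text.splitlines():
        # Skip blank lines, comments/reference records, and interpretations.
        if not line or line.startswith(("!", "*")):
            continue
        if line.startswith("="):
            counts["barlines"] += 1
            continue
        token = line.split("\t")[0]  # assumption: first column is the vocal line
        if token == ".":  # null token: nothing new happens in this spine
            continue
        if "r" in token:  # rests are marked with "r" in **kern
            counts["rests"] += 1
        else:
            counts["notes"] += 1
    return counts


print(summarize_kern(SAMPLE_KERN))
```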

ScoresOfScores - Lieder Encoding Project

OpenScore

Choral Wiki

  • https://www.cpdl.org/wiki/
  • Sheet music of choral music in various engraving formats
  • community of music lovers, especially for Baroque and Renaissance

music21 Corpus

Nottingham dataset, cleaned version

Neuma

CMME (Computerized Mensural Music Editing)

  • large collections of 16th-century scores
  • available on GitHub
  • with a tool to translate to MusicXML
  • http://www.cmme.org/

Links

SymbTr (Turkish Maqam; symbolic)

Johann Crueger Cantional Settings

The Sessions
