# Semester Project 2: Chord Profiles

The DCML has created a large corpus of digital score engraving files annotated by human experts with harmonic analyses. The analyses were expressed using a machine-readable version of the common Roman numeral syntax. While some of the subcollections of these annotations have been analysed (e.g., those for Beethoven's string quartets or Mozart's piano sonatas), so far the chord labels have not been compared to the actual notes making up the score segments that they describe. Therefore, the goal of this project was to create aggregated tone distributions ("chord profiles") over the different chord labels.

In [1]:
# Hiding code cells
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<b>The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.</b>''')

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Corpus-Overview" data-toc-modified-id="Corpus-Overview-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Corpus Overview</a></span><ul class="toc-item"><li><span><a href="#Relationship-between-types-(chords)-and-tokens-(chord-occurrences)" data-toc-modified-id="Relationship-between-types-(chords)-and-tokens-(chord-occurrences)-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Relationship between types (chords) and tokens (chord occurrences)</a></span></li><li><span><a href="#Most-frequent-chords-in-major-and-minor-respectively" data-toc-modified-id="Most-frequent-chords-in-major-and-minor-respectively-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Most frequent chords in major and minor respectively</a></span></li></ul></li><li><span><a href="#Computing-Chord-Profiles" data-toc-modified-id="Computing-Chord-Profiles-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Computing Chord Profiles</a></span></li><li><span><a href="#Chord-Profiles-of-Chromatic-Pitch-Classes" data-toc-modified-id="Chord-Profiles-of-Chromatic-Pitch-Classes-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Chord Profiles of Chromatic Pitch Classes</a></span><ul class="toc-item"><li><span><a href="#Major" data-toc-modified-id="Major-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Major</a></span></li><li><span><a href="#Minor" data-toc-modified-id="Minor-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Minor</a></span></li></ul></li><li><span><a href="#Chord-profiles-of-tonal-pitch-classes" data-toc-modified-id="Chord-profiles-of-tonal-pitch-classes-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Chord profiles of tonal pitch classes</a></span><ul class="toc-item"><li><span><a href="#Rank-1" data-toc-modified-id="Rank-1-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Rank 1</a></span></li><li><span><a href="#Rank-2:-V" data-toc-modified-id="Rank-2:-V-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Rank 
2: <code>V</code></a></span></li><li><span><a href="#Rank-3:-I6/i6" data-toc-modified-id="Rank-3:-I6/i6-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Rank 3: <code>I6</code>/<code>i6</code></a></span></li><li><span><a href="#Rank-4:-V7" data-toc-modified-id="Rank-4:-V7-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Rank 4: <code>V7</code></a></span></li><li><span><a href="#Rank-5:-IV/iv" data-toc-modified-id="Rank-5:-IV/iv-4.5"><span class="toc-item-num">4.5&nbsp;&nbsp;</span>Rank 5: <code>IV</code>/<code>iv</code></a></span></li><li><span><a href="#Rank-7:-vi/VI" data-toc-modified-id="Rank-7:-vi/VI-4.6"><span class="toc-item-num">4.6&nbsp;&nbsp;</span>Rank 7: <code>vi</code>/<code>VI</code></a></span></li><li><span><a href="#Rank-9:-V(64)" data-toc-modified-id="Rank-9:-V(64)-4.7"><span class="toc-item-num">4.7&nbsp;&nbsp;</span>Rank 9: <code>V(64)</code></a></span></li></ul></li></ul></div>

In [2]:
# loading libraries
%load_ext autoreload
%autoreload 2
import sys, os
sys.path.append(os.path.abspath('../../../Code'))
from helpers import *
from plot_helpers import *
from plotly.subplots import make_subplots
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 500)

In [3]:
# Helper functions
def color_chord_tones(df, chord_tones):
    k = len(df.columns)
    df = df.reindex(sort_tpcs(df.index, start=chord_tones[0]))
    return df.style.apply(lambda S: ['background-color: yellow']*k if S.name in chord_tones else ['']*k, axis=1)

def distinguish_chord_tones(chord_segment, chord):
    """Calculate statistics for the tonal pitch classes contained in a note list.
    
    Parameters
    ----------
    chord_segment : pd.DataFrame
        Note list representing one or several chords.
    chord : str
        Chord label that the note list represents.
    """
    chord_tones = chord2tpcs(chord)
            
    mask = chord_segment.gracenote.isna()
    chord_notes = chord_segment[mask]
    all_notes = chord_notes.tpc.astype(int).unique()
    res = {}
    total_duration = chord_notes.duration.sum()
    if total_duration == 0:
        return pd.Series(dtype=object)
    for tpc in all_notes:
        sel = chord_notes[(chord_notes.tpc == tpc)]
        tot = len(sel)
        res[('count', tpc)] = tot
        res[('duration_frac', tpc)] = float(sel.duration.sum() / total_duration)
        res[('duration_mean', tpc)] = sel.duration.mean()
        onbeat = (sel.subbeat == 0) & ~sel.overlapping.isin([0, -1])
        res[('onbeat', tpc)] = len(sel[onbeat]) / tot
        #res[('offbeat', tpc)] = round(len(sel[~onbeat]) / tot, 2)
    res = pd.Series(res).unstack().T
    k = len(res.columns)
    res = res.reindex(sort_tpcs(res.index, start=chord_tones[0]))
    #res = res.style.apply(lambda S: ['background-color: yellow']*k if S.name in chord_tones else ['']*k, axis=1)
    return res

def divide_maj_min(cl, k=30):
    cl_min = cl[cl.localminor]
    cl_maj = cl[~cl.localminor]
    tokens = len(cl)
    print(f"{len(cl_maj)/tokens:.2%} in major, {len(cl_min)/tokens:.2%} in minor")
    cum_min = cumulative_fraction(cl_min.chord)
    cum_maj = cumulative_fraction(cl_maj.chord)
    cum_both = cum_maj.reset_index().rename(columns={'index': 'major', 'x': 'maj_count', 'y': 'maj_cum'}).join(
        cum_min.reset_index().rename(columns={'index': 'minor', 'x': 'min_count', 'y': 'min_cum'}))
    cum_both.index = cum_both.index + 1
    display(cum_both.head(k))
    return cl_maj, cl_min

def get_tone_distribution(chord, minor=False, sort_tpc=True):
    try:
        cn = min_notes.loc[chord] if minor else maj_notes.loc[chord]
    except KeyError:  # chord label does not occur in the selected mode
        return pd.DataFrame()
    complete = distinguish_chord_tones(cn, chord)
    dis = {'average': complete}
    for sc, ids in sc_ids.items():  # Series.iteritems() was removed in pandas 2.0
        df = distinguish_chord_tones(cn.loc[ids], chord)
        dis[sc] = df
    df = pd.concat(dis.values(), keys=dis.keys(), axis=1)
    if sort_tpc:
        chord_tones = chord2tpcs(chord)
        df = df.reindex(sort_tpcs(df.index, start=chord_tones[0]))
    return df

def notes_by_type(cn, cl, slic=SL[:]):
    chords = cl.chord.value_counts().iloc[slic].index.to_list()
    return cl[cl.chord.isin(chords)].groupby('chord').apply(lambda cl: chords_by_id(cn, cl)), chords

def plot_non_chord_tones(chord, minor=False, only_chord_tones=True, norm=True, return_figs=False):
    chord_tones = chord2tpcs(chord, minor=minor)
    dis = get_tone_distribution(chord, minor)
    df = dis.loc[:, idx[:, 'duration_frac']].droplevel(1, axis=1)
    if only_chord_tones:
        ct = df[['average'] + [k for k in sc_order.keys() if k in df.columns]].loc[chord_tones].T
    else:
        # without merging non-chord-tones
        # ct = df[['average'] + [k for k in sc_order.keys() if k in df.columns]].loc[df.sum(axis=1).sort_values(ascending=False).index].T
        ct = df[['average'] + [k for k in sc_order.keys() if k in df.columns]].rename(lambda x: x if x in chord_tones else 'other').groupby(level=0).sum().T
    if norm:
        ct = ct.div(ct.sum(axis=1), axis='index')
    agg = df[['average'] + [k for k in sc_order.keys() if k in df.columns]].loc[[i for i in df.index if i not in chord_tones]]
    ct_names = tpc2name(chord_tones)
    fig1 = df.rename(tpc2name).iplot('bar', title=f"Chord tones {ct_names} and non-chord tones of all {chord} chords in {'minor' if minor else 'major'}", 
                                     xTitle='Tonal Pitch Class', 
                                     yTitle='Fraction', 
                                     asFigure=True)
    fig2 = ct.rename(columns=tpc2name).iplot('bar', barmode='stack', 
                                             title=f"{'Normalised d' if norm else 'D'}istribution of chord tones {ct_names} within all {chord} chords in {'minor' if minor else 'major'}", 
                                             yTitle='Fraction', 
                                             asFigure=True)
    fig3 = agg.sum().iplot('bar', title=f"Fraction of non-chord tones in all {chord} chords in {'minor' if minor else 'major'}", 
                           yTitle='Fraction', 
                           asFigure=True)
    if return_figs:
        return fig1, fig2, fig3
    else:
        fig1.show()
        fig2.show()
        fig3.show()

def summarize_corpus(S, k=20):
    cum = cumulative_fraction(S).reset_index()
    cum.index = cum.index + 1
    types = len(cum)
    tokens = cum.x.sum()
    print(f"{types} types, {tokens} tokens, TTR={types/tokens:.1%}")
    display(cum.rename(columns={'index': 'chord label', 'x': 'chord counts', 'y': 'cumulative fraction'}).head(k))
    
    fig = make_subplots(specs=[[{"secondary_y": True,}]])
    ix = cum.index
    fig.add_trace(
        go.Scatter(x=ix, y=cum.x, name="Absolute count", mode='markers', marker=dict(size=2)),
        secondary_y=False,
    )
    fig.add_trace(
        go.Scatter(x=ix, y=cum.y, name="Cumulative fraction", mode='markers', marker=dict(size=2)),
        secondary_y=True,
    )
    fig.update_xaxes(title_text="Rank", range=(-50, 3800), zeroline=False, gridcolor='lightgrey')
    fig.update_yaxes(title_text="Chord counts", secondary_y=False, type='log', gridcolor='grey', zeroline=True, dtick=1, range=(-0.1, 4.40))
    fig.update_yaxes(title_text="Fraction", secondary_y=True, gridcolor='lightgrey', zeroline=False, dtick=0.1, range=(-0.023,1.099))
    fig.update_layout(legend=dict(orientation='h'))
    fig.show()
    
def summarize_ivs(chord_segment, bass=None, exclude_bass=False):
    """Calculate statistics for the intervals contained in a note list.
    
    Parameters
    ----------
    chord_segment : pd.DataFrame
        Note list representing one or several chords.
    bass : str or int, optional
        If you don't specify the tonal pitch class of the bass note, the first note is taken as bass.
    exclude_bass : bool, optional
        Pass True to exclude the bass note (interval P1) from the stats.
    """
    if bass is None:
        bass_tpc = chord_segment.tpc.iloc[0]
    else:
        bass_tpc = name2tpc(bass) if isinstance(bass, str) else bass
            
    mask = (chord_segment.tpc != bass_tpc) & chord_segment.gracenote.isna() if exclude_bass else chord_segment.gracenote.isna()
    chord_notes = chord_segment[mask].copy()
    chord_notes['intervals'] = tpc2iv(chord_notes.tpc - bass_tpc)
    intervals = sort_intervals(set(chord_notes.intervals.values))
    res = {}
    total_duration = chord_notes.duration.sum()
    if len(intervals) == 0 or total_duration == 0:
        return res
    for iv in intervals:
        sel = chord_notes[(chord_notes.intervals == iv)]
        tot = len(sel)
        res[('count', iv)] = tot
        res[('duration_frac', iv)] = round(float(sel.duration.sum() / total_duration), 2)
        res[('duration_mean', iv)] = frac(sel.duration.mean())
        onbeat = (sel.subbeat == 0) & ~sel.overlapping.isin([0, -1])
        res[('onbeat', iv)] = round(len(sel[onbeat]) / tot, 2)
        #res[('offbeat', iv)] = round(len(sel[~onbeat]) / tot, 2)
    return pd.Series(res).sort_index(level=0, sort_remaining=False)

In [4]:
correct_ids = read_dump('../correct_chord_tone_ids.tsv') # For every chord label the IDs of all chord and all non-chord tones
ccl = read_dump('../correct_chord_list.tsv') # chord IDs and the respective chord features
ccnt = read_dump('../correct_chord_notes.tsv', index_col=[0,1,2,3]) # note list with attributed chord IDs and with 
                                                                    #all segments transposed to C (maj/min) 
ct = ccnt.loc[correct_ids.cn.sum()]  # chord tones
nct = ccnt.loc[correct_ids.ncn.sum()]# non-chord tones

In [5]:
# Subcorpora
rena = {'mscx': 'Beethoven: Quartets',  
        'Beethoven-Sonatas': 'Beethoven: Piano Sonatas',
        'Chopin-Mazurkas': 'Chopin: Mazurkas',
        'Corelli': 'Corelli: Trio Sonatas',
        "Couperin-L'art de toucher": "Couperin: L'art de toucher le clavecin" ,
        'Debussy-Suite_bergamasque': 'Debussy: Suite Bergamasque',
        'Dvorak-Silhouettes': 'Dvořák: Silhouettes',
        'English Suites': 'Bach: English Suites', 
        'French Suites': 'Bach: French Suites', 
        'Libro_6': 'Gesualdo: Madrigals',              #Gesualdo
        'Grieg-Lyrical_Pieces': 'Grieg: Lyrical Pieces',
        'Kozeluh-Sonatas': 'Koželuch: Piano Sonatas',
        'Liszt_Années': 'Liszt: Années de Pélérinage',
        'Medtner-Märchen': 'Medtner: Tales',
        'Mendelssohn - String Quartets': 'Mendelssohn: Quartets',
        'Monteverdi-Madrigals': 'Monteverdi: Madrigals',
        'Harmonic Annotations': 'Mozart: Piano Sonatas', #Mozart sonatas
        'Ravel': 'Ravel: Miroirs',
        'Schubert_Winterreise': 'Schubert: Winterreise',
        'Schütz - Kleine geistliche Konzerte': 'Schütz: Kleine Geistliche Konzerte',
        'Schumann-Kinderszenen': 'Schumann: Kinderszenen',
        'schumann_liederkreis': 'Schumann: Liederkreis',
        'Sweelinck': 'Sweelinck: Fantasia crommatica',
        'Tchaikovsky_Seasons': 'Tchaikovsky: The Seasons',
        'Wagner': 'Wagner: Ouvertures',}
fl = pd.read_csv('../selected_files.tsv', sep='\t', index_col=[0])
sc_ids = fl.groupby('subcorpus').apply(lambda df: df.index)
sc_ids.rename(index=rena, inplace=True)
scs = sc_ids.index.to_list()

In [6]:
sc_order = {'Sweelinck: Fantasia crommatica': (1562, 1621),
 'Monteverdi: Madrigals': (1587, 1651),
 'Gesualdo: Madrigals': (1611, 1611),
 'Schütz: Kleine Geistliche Konzerte': (1636, 1639),
 'Corelli: Trio Sonatas': (1681, 1694),
 'Bach: English Suites': (1713, 1714),
 "Couperin: L'art de toucher le clavecin": (1722, 1722),
 'Bach: French Suites': (1722, 1725),
 'WFBach-Sonatas': (1745, 1760),
 'Mozart: Piano Sonatas': (1774, 1789),
 'Koželuch: Piano Sonatas': (1780, 1806),
 'Pleyel-Quartets': (1782, 1783),
 'Beethoven: Piano Sonatas': (1793, 1822),
 'Beethoven: Quartets': (1798, 1826),
 'Chopin: Mazurkas': (1825, 1849),
 'Schubert: Winterreise': (1827, 1828),
 'Mendelssohn: Quartets': (1827, 1847),
 'Schumann: Kinderszenen': (1838, 1838),
 'Schumann: Liederkreis': (1840, 1840),
 'Liszt: Années de Pélérinage': (1846, 1882),
 'Wagner: Ouvertures': (1859, 1867),
 'Grieg: Lyrical Pieces': (1867, 1901),
 'Dvořák: Silhouettes': (1870, 1879),
 'Tchaikovsky: The Seasons': (1876, 1876),
 'Debussy: Suite Bergamasque': (1890, 1905),
 'Ravel: Miroirs': (1901, 1905),
 'Medtner: Tales': (1904, 1925)}

## Corpus Overview

The corpus from which the chord profiles were computed consists of 26 work groups (subcorpora) containing 634 pieces and 980,176 notes. The following plot shows the span of composition dates of each subcorpus; every time span is placed on the y-axis according to the number of pieces it covers. The dotted red line shows, for every year, the summed number of pieces of all subcorpora whose span includes that year.

In [7]:
pdata = pd.Series(sc_order).rename('span').to_frame().join(fl.subcorpus.value_counts().rename(index=rena), how='inner')
alle = {i:0 for i in range(min(pdata.span.min()),max(pdata.span.max())+1)} #dictionary to count
traces = []

for i,r in pdata.iterrows(): 
    fro,to = r.span
    for j in range(fro,to+1):
        alle[j] += r.subcorpus
    t = go.Scatter(
    x=[fro,to],
    y=[r.subcorpus,r.subcorpus],
    name = i,)
    traces.append(t)
  
all_x = list(alle.keys())
all_y = list(alle.values())
traces.append(go.Scatter(x=all_x,y=all_y,name="Sum",line = dict(color = ('rgb(205, 12, 24)'),width = 1,dash = 'dot')))


layout = dict(STD_LAYOUT,
              xaxis = dict(title = 'composition dates spans'),
              yaxis = dict(title = 'number of annotated pieces'),
              )

fig = dict(data=traces,layout=layout)
go.Figure(fig)

### Relationship between types (chords) and tokens (chord occurrences)

In [8]:
summarize_corpus(ccl.chord)

3587 types, 141708 tokens, TTR=2.5%


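| rank | chord label | chord counts | cumulative fraction |
|---:|:---|---:|---:|
| 1 | I | 14139 | 0.099776 |
| 2 | V | 10383 | 0.173046 |
| 3 | i | 7989 | 0.229422 |
| 4 | V7 | 7819 | 0.284599 |
| 5 | I6 | 5659 | 0.324534 |
| 6 | IV | 4058 | 0.35317 |
| 7 | V(64) | 3178 | 0.375596 |
| 8 | i6 | 3031 | 0.396985 |
| 9 | V6 | 2789 | 0.416667 |
| 10 | V65 | 2444 | 0.433913 |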


The plot reveals that only 1328 chord types (37.0%) occur more than three times in the entire corpus and that only 295 chord types (8.2%) account for 90% of all tokens. This distribution shows that meaningful chord profiles can be abstracted only for a small fraction of all chord types.

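As a minimal sketch of how such figures can be derived, the following stdlib-only example computes the type-token ratio (TTR) and the cumulative fraction of a ranked list of labels. The toy `labels` list and the function name `rank_table` are illustrative assumptions; the notebook's own `cumulative_fraction` helper is not shown here and may behave differently.

```python
from collections import Counter
from itertools import accumulate

def rank_table(labels):
    """Return (label, count, cumulative fraction) rows, most frequent first."""
    counts = Counter(labels).most_common()
    total = sum(c for _, c in counts)
    running = accumulate(c for _, c in counts)
    return [(lab, c, cum / total) for (lab, c), cum in zip(counts, running)]

# Toy data (hypothetical), not taken from the corpus.
labels = ['I', 'V', 'I', 'V7', 'I', 'IV', 'V', 'I6']
table = rank_table(labels)
types, tokens = len(table), len(labels)
ttr = types / tokens
print(f"{types} types, {tokens} tokens, TTR={ttr:.1%}")

# The ratio reported for the corpus follows from the printed counts.
corpus_ttr = 3587 / 141708
print(f"corpus TTR={corpus_ttr:.1%}")
```

A low TTR such as the one reported here means that a few types are reused very often, which is why the cumulative fraction rises steeply over the first ranks.
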
### Most frequent chords in major and minor respectively

In [9]:
ccl_maj, ccl_min = divide_maj_min(ccl, 15)

57.85% in major, 42.15% in minor


| rank | major | maj_count | maj_cum | minor | min_count | min_cum |
|---:|:---|---:|---:|:---|---:|---:|
| 1 | I | 13395 | 0.163392 | i | 7731 | 0.129439 |
| 2 | V | 5933 | 0.235762 | V | 4450 | 0.203945 |
| 3 | I6 | 5466 | 0.302436 | i6 | 2931 | 0.253018 |
| 4 | V7 | 5331 | 0.367463 | V7 | 2488 | 0.294674 |
| 5 | IV | 3496 | 0.410107 | iv | 2037 | 0.328779 |
| 6 | ii6 | 2261 | 0.437687 | III | 1416 | 0.352487 |
| 7 | vi | 2215 | 0.464705 | VI | 1378 | 0.375559 |
| 8 | ii | 2047 | 0.489674 | iv6 | 1154 | 0.39488 |
| 9 | V(64) | 2037 | 0.514522 | V(64) | 1141 | 0.413984 |
| 10 | V6 | 1826 | 0.536795 | v | 1123 | 0.432786 |


This table shows the chord occurrences after splitting the corpus into major and minor segments. A comparison of the two distributions shows

* that ranks 1-5, 7 and 9 correspond to each other in both modes,
* that minor segments tend to evade into the relative major (`III` and `VII`) as well as into the parallel major (`I`),
* that, unlike in major, no chord type with root `II` occurs among the 15 chord types that account for 50% of all minor segments, nor do the typical chords over scale degree `2`, namely `V43` and `#viio6`,
* that, generally speaking, the chord vocabulary is more diverse in minor (in major, only 9 chord types account for 51.5% of the segments).

## Computing Chord Profiles
The computation of chord profiles involved the following processing steps:

1. The annotated MuseScore files were each converted into a list of notes and a list of chord labels. These lists included the relevant features, such as pitch, onset and duration for every note, and local key, root and inversion for every chord.
1. The positions of the chord labels were used to extract the corresponding segments from the note lists. To obtain correct tone durations for every segment, notes overlapping a segment boundary had to be split and represented in both segments.
1. In the resulting segmented note list, all notes were transposed so that they represent their respective chord label with respect to C major or C minor.
1. This transposition made it possible to aggregate the durations of all pitches occurring under the same chord label in the same mode and to plot their exact durational fractions over the whole corpus.
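
The splitting of overlapping notes in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the code that produced the data: the function name `split_note`, the input format (onsets and durations as fractions, chord label positions as a sorted list of onsets) and the treatment of the last segment as open-ended are all hypothetical.

```python
from fractions import Fraction as F

def split_note(onset, duration, boundaries):
    """Cut one note into pieces so that no piece crosses a segment boundary.

    boundaries: sorted onsets of the chord labels (hypothetical input format).
    Returns a list of (segment_index, onset, duration) tuples. Any part of
    the note that sounds before the first boundary is dropped.
    """
    end = onset + duration
    # Use the note's end as the closing cut point of the last segment,
    # which is treated as open-ended in this sketch.
    cuts = list(boundaries) + [max(end, boundaries[-1])]
    pieces = []
    for i in range(len(boundaries)):
        seg_start, seg_end = cuts[i], cuts[i + 1]
        start, stop = max(onset, seg_start), min(end, seg_end)
        if start < stop:  # the note sounds within this segment
            pieces.append((i, start, stop - start))
    return pieces

# A note starting at 3/4 and lasting 1/2 crosses the boundary at 1.
boundaries = [F(0), F(1), F(2)]
pieces = split_note(F(3, 4), F(1, 2), boundaries)
print(pieces)
```

Splitting in this way keeps the total duration of the note unchanged while attributing each part to the chord label under which it actually sounds.
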

In the following, the resulting chord profiles, that is, proportions of tone durations, will be shown for the most frequent chord types and with two different representations of pitch. Section 3 plots chord profiles over chromatic pitch classes where 0 is always the local key's tonic C. Section 4 plots chord profiles over tonal pitch classes, differentiating between different tone semantics and comparing the distributions of the various subcorpora.
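
The difference between the two pitch representations can be illustrated with a short sketch. It assumes that tonal pitch classes are encoded as positions on the line of fifths with C = 0 (F = -1, G = 1, and so on); this encoding and the function names below are assumptions for illustration and are not taken from the notebook's `helpers` module.

```python
def tpc2pc(tpc):
    """Map a line-of-fifths position (C = 0, G = 1, F = -1) to a chromatic pitch class."""
    return (7 * tpc) % 12

def tpc2name_sketch(tpc):
    """Spell a line-of-fifths position as a note name, e.g. 6 -> 'F#', -5 -> 'Db'."""
    step = 'FCGDAEB'[(tpc + 1) % 7]
    accidentals = (tpc + 1) // 7
    return step + ('#' * accidentals if accidentals > 0 else 'b' * -accidentals)

def transpose_to_c(tpcs, tonic_tpc):
    """Shift tonal pitch classes so that the local tonic lands on C (= 0)."""
    return [t - tonic_tpc for t in tpcs]

# G# (8) and Ab (-4) are different tonal pitch classes but the same chromatic one.
print(tpc2name_sketch(8), tpc2name_sketch(-4), tpc2pc(8), tpc2pc(-4))

# A tonic triad in E major (E, G#, B) transposed to C major (C, E, G).
triad_in_c = transpose_to_c([4, 8, 5], tonic_tpc=4)
print([tpc2name_sketch(t) for t in triad_in_c])
```

Under this encoding, transposition is a simple subtraction, and enharmonic spellings remain distinct until they are explicitly collapsed into chromatic pitch classes.
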

## Chord Profiles of Chromatic Pitch Classes

In [10]:
min_notes, min_chords = notes_by_type(ccnt, ccl_min, SL[:])
maj_notes, maj_chords = notes_by_type(ccnt, ccl_maj, SL[:])

In [11]:
def plot_chord_profiles(chords, minor=False):
    for c in chords: # don't ask for too much!
        if minor:
            bag_of_notes(min_notes.loc[c], tpc='pc')[['duration_n']].iplot('bar', title=f"Chord profile for all {c} chords in minor", xTitle='Chromatic pitch class', yTitle='Fraction over all chord tokens')
        else:
            bag_of_notes(maj_notes.loc[c], tpc='pc')[['duration_n']].iplot('bar', title=f"Chord profile for all {c} chords in major", xTitle='Chromatic pitch class', yTitle='Fraction over all chord tokens')

### Major

In [12]:
plot_chord_profiles(['I', 'I6', 'ii', 'ii6', 'IV', 'IV6', 'V', 'V6', 'V(64)', 'V7', 'V65', 'V43', 'V2', 'viio6'])

### Minor

In [13]:
plot_chord_profiles(['i', 'i6', 'i64', 'III', 'iv', 'iv6', 'V', 'V6', 'V(64)', 'V7', 'V65'], True)

The chord profiles show that the respective chord tones are clearly distinguishable as large spikes, whereas non-chord tones have a relatively short overall duration, with an apparent preference for those belonging to the diatonic scale (this can be verified only by looking at tonal pitch classes, see section 4). Comparing the profiles of root positions with those of their inversions shows that the bass note is always the most prominent pitch, except in `V65` chords, for which the root is more prominent. Interestingly, this exception holds for both major and minor. One explanation could be that the presence of the root is a decisive factor for annotating a chord as `V65` rather than `viio` or `#viio`. If this were the case, however, we would expect the same for `V43` chords, which instead feature the bass note more prominently than the root. A better explanation could be that `V65` occurs more frequently within pedal point sections over `V`.

## Chord profiles of tonal pitch classes
This section compares the chord profiles of the various subcorpora. Three plots are shown for each chord type of occurrence ranks 1-5, 7, and 9, for which the correspondence between major and minor has been noted above:

1. The first plot overlays the chord profiles of the subcorpora. Here, rather than merging enharmonic equivalents, the tonal pitch classes are retained and the distributions are shown for the different subcorpora. This is meant to reveal differences between them in the use of the various non-chord tones. The pitches are ordered by their occurrence on the piano, starting from the chord's bass note.
1. The second plot enables comparison of the chord tone distributions of the subcorpora.
1. The third plot shows the durational fraction of all non-chord tones for the respective chord type.

**Hint**: In order to easily compare the values, switch the hovermode of the interactive plots from `Show closest data on hover` to `Compare data on hover`.
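
The quantities behind the second and third plot can be sketched in a few lines. The example below is a simplified illustration with hypothetical toy data: notes are given as (tonal pitch class, duration) pairs with C = 0 on the line of fifths, and the function names `duration_fractions` and `split_profile` are assumptions, not part of the notebook.

```python
from collections import defaultdict
from fractions import Fraction as F

def duration_fractions(notes):
    """Sum durations per tonal pitch class and normalise them to fractions of the total."""
    sums = defaultdict(F)
    for tpc, duration in notes:
        sums[tpc] += duration
    total = sum(sums.values())
    return {tpc: d / total for tpc, d in sums.items()}

def split_profile(profile, chord_tones):
    """Return (normalised chord-tone distribution, non-chord-tone fraction)."""
    ct_total = sum(f for tpc, f in profile.items() if tpc in chord_tones)
    ct = {tpc: profile.get(tpc, F(0)) / ct_total for tpc in chord_tones}
    return ct, 1 - ct_total

# Toy segment under a C major triad (C = 0, G = 1, E = 4) with a passing D (= 2).
notes = [(0, F(1, 2)), (4, F(1, 4)), (1, F(1, 8)), (2, F(1, 8))]
profile = duration_fractions(notes)
ct, nct_fraction = split_profile(profile, chord_tones=[0, 1, 4])
print(ct, nct_fraction)
```

Normalising the chord tones separately makes their relative weights comparable between subcorpora even when the share of non-chord tones differs.
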

### Rank 1

In [14]:
plot_non_chord_tones('I')

In [15]:
plot_non_chord_tones('i', minor=True)

### Rank 2: `V`

In [16]:
plot_non_chord_tones('V')

In [17]:
plot_non_chord_tones('V', minor=True)

### Rank 3: `I6`/`i6`

In [18]:
plot_non_chord_tones('I6')

In [19]:
plot_non_chord_tones('i6', minor=True)

### Rank 4: `V7`

In [20]:
plot_non_chord_tones('V7')

In [21]:
plot_non_chord_tones('V7', minor=True)

### Rank 5: `IV`/`iv`

In [22]:
plot_non_chord_tones('IV')

In [23]:
plot_non_chord_tones('iv', minor=True)

### Rank 7: `vi`/`VI`

In [24]:
plot_non_chord_tones('vi')

In [25]:
plot_non_chord_tones('VI', minor=True)

### Rank 9: `V(64)`

In [26]:
plot_non_chord_tones('V(64)')

In [27]:
plot_non_chord_tones('V(64)', minor=True)

## Summary
Several observations can be made from the previous plots. With the tone profiles expressed as distributions of durational fractions over tonal (rather than chromatic) pitch classes, the earlier observation still holds that diatonic neighbour notes play a bigger role than chromatically altered ones. In every tone profile plot, individual spikes indicate a particular subcorpus's preference for certain tones in the usage of a given chord, for chord tones as well as for non-chord tones. For the time being, the question has to remain open whether these spikes indicate particular ways of using chords or whether the differences are coincidental data artifacts. The same is true for the proportions of chord tones, although there seems to be a general trend that in the Schütz and Corelli subcorpora, bass notes have the largest durational proportion. This could be related to the fact that they are the only two subcorpora which can be labelled as "thorough bass music" in the sense that they are based on a basso continuo voice. The subcorpora that frequently show the highest proportion of non-chord tones, often over 20%, include Chopin's *Mazurkas*, Liszt's *Années de Pélérinage*, Dvořák's *Silhouettes* and Medtner's *Tales*. However, the comparisons of non-chord tone fractions between the subcorpora show a high variance across the different chords. In a next step, the statistics should be repeated in a more robust form, e.g. by using bootstrapping to compute confidence intervals. Such an approach could afford more rigorous claims as to
* which subcorpora show higher preferences for using non-chord tones,
* which subcorpora feature which types of non-chord tones (diatonic vs. chromatic, preference for certain intervals),
* whether certain chord-tone distributions are characteristic for a given subcorpus,
* whether different musical eras show a preference for chordal bass or chordal root.
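
As a rough sketch of the bootstrapping approach suggested above, the following example computes a percentile bootstrap confidence interval for a mean using only the standard library. The input values are hypothetical per-piece non-chord-tone fractions invented for illustration, and the function name `bootstrap_ci` and its parameters are assumptions rather than an existing implementation.

```python
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    n = len(values)
    # Resample with replacement and collect the mean of every resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical per-piece non-chord-tone fractions for one chord type.
nct_fractions = [0.12, 0.18, 0.22, 0.15, 0.25, 0.19, 0.11, 0.21]
low, high = bootstrap_ci(nct_fractions)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

If the intervals of two subcorpora computed in this way do not overlap, that would give some support to the claim that the difference is not a coincidental artifact, although resampling whole pieces rather than individual notes or chords is itself a design choice that would need to be justified.
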