`python3 -m pip install -U pandas plotly nbformat networkx`

`pip install "https://github.com/DCMLab/wavescapes/archive/refs/heads/johannes.zip"`

In [None]:
%reload_ext autoreload
%autoreload 2
import numpy as np

from etl import get_dfts, get_pickled_magnitude_phase_matrices, get_metadata, get_most_resonant, get_pcms, get_pcvs, test_dict_keys, \
  get_correlations, make_feature_vectors, get_metric, get_most_resonant_penta_dia

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

from utils import get_coeff

from wavescapes.color import circular_hue

## Settings

In [None]:
DEBUSSY_REPO = '..'
DATA_FOLDER = '~/DATA/debussy_figures/data'
DATA_FOLDER = './data'
EXAMPLE_FNAME = 'l000_etude'
LONG_FORMAT = False

## Loading metadata
Metadata for all pieces contained in the dataset.

In [None]:
metadata = get_metadata(DEBUSSY_REPO)
metadata.columns

#### Columns for ordinal plots

Creating a column `years_ordinal` that replaces each piece's `year` with its rank among the distinct years present in the dataset, i.e. an ordinal scale over the years in which Debussy composed.

Also creating a column `years_periods` in which the years are grouped into three periods.

Periods:
- 1880-1892
- 1893-1912
- 1913-1917

Source: The Cambridge Companion to Debussy (the period boundaries are not consistent across all sources).

In [None]:
years_ordinal = {val:idx for idx, val in enumerate(np.sort(metadata.year.unique()))}
metadata['years_ordinal'] = metadata.year.apply(lambda x: years_ordinal[x])

In [None]:
def year_to_period(year):
    """Map a year to period 0 (1880-1892), 1 (1893-1912) or 2 (1913-1917)."""
    if year < 1893:
        return 0
    if year < 1913:
        return 1
    return 2

# Pieces without a year fall into the first period (filled with 1880).
# Using a function avoids a KeyError when 1880 is not among the dataset's years.
metadata['years_periods'] = metadata.year.fillna(1880.0).apply(year_to_period)
metadata.years_ordinal.head(1),metadata.years_periods.head(1) 

The column `year` contains composition years, computed as the midpoint between the beginning and the end of the composition span.

In [None]:
metadata.year.head(10)

Series `median_recording` contains median recording times in seconds, retrieved from the Spotify API.

In [None]:
metadata.median_recording.head(10)

Columns mirroring a piece's activity are currently:
* `qb_per_minute`: the pieces' lengths (expressed as 'qb' = quarterbeats) normalized by the median recording times; a proxy for the tempo
* `sounding_notes_per_minute`: the summed length of all notes normalized by the piece's duration (in minutes)
* `sounding_notes_per_qb`: the summed length of all notes normalized by the piece's length (in qb)
Other measures of activity could be, for example, 'onsets per beat/second' or 'distinct pitch classes per beat/second'.
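The three ratios above can be illustrated on made-up numbers. This is only a sketch: the column `summed_note_length_qb` and all values are hypothetical, and the way the dataset actually computes these columns may differ.

```python
import pandas as pd

# Toy values only; 'summed_note_length_qb' is a hypothetical column name.
toy = pd.DataFrame(
    {
        "length_qb": [240.0, 120.0],              # piece length in quarterbeats
        "median_recording": [180.0, 60.0],        # median recording time in seconds
        "summed_note_length_qb": [720.0, 240.0],  # summed duration of all notes in qb
    },
    index=["piece_a", "piece_b"],
)

minutes = toy["median_recording"] / 60  # seconds -> minutes
toy["qb_per_minute"] = toy["length_qb"] / minutes
toy["sounding_notes_per_minute"] = toy["summed_note_length_qb"] / minutes
toy["sounding_notes_per_qb"] = toy["summed_note_length_qb"] / toy["length_qb"]
print(toy)
```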

## Loading Pitch Class Vectors (PCVs)
An `{fname -> pd.DataFrame}` dictionary where each `(Nx12)` DataFrame contains the absolute durations (expressed in quarter notes) of the 12 chromatic pitch classes for the `N` slices of length = 1 quarter note that make up the piece `fname`. The IntervalIndex reflects each slice's position in the piece. Set `pandas=False` to retrieve NumPy arrays without the IntervalIndex and column names.

In [None]:
pcvs = get_pcvs(DEBUSSY_REPO, pandas=True)
test_dict_keys(pcvs, metadata)
pcvs[EXAMPLE_FNAME].head(5)

## Loading Pitch Class Matrices
An `{fname -> np.array}` dictionary where each `(NxNx12)` array contains the aggregated PCVs for all segments that make up a piece. The square matrices contain values only in the upper right triangle; the lower left triangle beneath the diagonal is filled with zeros. The values are arranged such that row 0 corresponds to the original PCVs, row 1 to the aggregated PCVs for all segments of length = 2 quarter notes, etc. To get the segment reaching from slice 3 to slice 5 (inclusive), i.e. length 3, the coordinates are `(2, 5)` (think x = 'length - 1' and y = index of the last slice included). In the upper left 3x3 submatrix, for example, the three entries of the first row (each a PCV of size 12) are the first three PCVs shown above; the second row holds one 0-PCV as padding followed by the sums of two adjacent PCVs; and the third row holds two 0-PCVs followed by the sum of the first three PCVs.

In [None]:
pcms = get_pcms(DEBUSSY_REPO, long=LONG_FORMAT)
test_dict_keys(pcms, metadata)
pcms[EXAMPLE_FNAME].shape
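To make the `(length - 1, last slice)` layout concrete, here is a minimal sketch that builds such a matrix from a toy set of PCVs. The helper `build_pcm` is hypothetical and is not the implementation behind `get_pcms`; it only follows the layout described above.

```python
import numpy as np

def build_pcm(pcvs):
    """Hypothetical helper: aggregate (N, 12) PCVs into an (N, N, 12) matrix
    where entry (length - 1, last) is the sum of the PCVs of that segment."""
    pcvs = np.asarray(pcvs, dtype=float)
    n, d = pcvs.shape
    # Prefix sums with a leading zero row, so any segment sum is one subtraction.
    csum = np.vstack([np.zeros((1, d)), np.cumsum(pcvs, axis=0)])
    pcm = np.zeros((n, n, d))
    for length_minus_1 in range(n):
        for last in range(length_minus_1, n):  # upper right triangle only
            first = last - length_minus_1
            pcm[length_minus_1, last] = csum[last + 1] - csum[first]
    return pcm

# Four toy slices sounding C, E, G, C for one quarter note each.
toy_pcvs = np.eye(12)[[0, 4, 7, 0]]
toy_pcm = build_pcm(toy_pcvs)
print(toy_pcm.shape)   # one PCV per (length - 1, last slice) pair
print(toy_pcm[2, 2])   # segment covering slices 0..2, i.e. C + E + G
```

The prefix-sum approach keeps each segment sum cheap, although the double loop is still quadratic in the number of slices.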

## Loading Discrete Fourier Transforms
`{fname -> np.array}` containing `(NxNx7)` complex matrices. For instance, here's the first element, a size 7 complex vector with DFT coefficients 0 through 6:

In [None]:
dfts = get_dfts(DEBUSSY_REPO, long=LONG_FORMAT)
test_dict_keys(dfts, metadata)
dfts[EXAMPLE_FNAME].shape

You can view the 7 complex numbers as magnitude-phase pairs:

In [None]:
get_coeff(dfts[EXAMPLE_FNAME], 0, 0)

or even as strings where the numbers are rounded and angles are shown in degrees:

In [None]:
get_coeff(dfts[EXAMPLE_FNAME], 0, 0, deg=True)
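For orientation, the relation between a single PCV and its seven coefficients can be sketched with NumPy's FFT. This is an illustration on a toy PCV, not the code behind `get_dfts`; the phase convention used by the `wavescapes` library may differ from `np.angle`.

```python
import numpy as np

# C major triad as a pitch-class vector (C, E, G sounding for 1 quarter note each).
pcv = np.zeros(12)
pcv[[0, 4, 7]] = 1.0

full_dft = np.fft.fft(pcv)
coeffs = full_dft[:7]              # coefficients 0..6; 7..11 mirror 5..1 for real input
magnitudes = np.abs(coeffs)
phases = np.angle(coeffs)          # radians
phases_deg = np.degrees(phases)

for k, (mag, deg) in enumerate(zip(magnitudes, phases_deg)):
    print(f"coefficient {k}: magnitude={mag:.2f}, phase={deg:.1f} deg")
```

Coefficient 0 is simply the summed duration of the slice, which is why it is the natural candidate for normalizing the other six.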

## Loading magnitude-phase matrices
`{fname -> np.array}` where each of the `(NxNx6x2)` matrices contains the 6 relevant DFT coefficients converted into magnitude-phase pairs where the magnitudes have undergone at least one normalization, i.e. are all within [0,1]. The files have been pre-computed and are loaded from g-zipped pickled matrices.

The parameter `norm_params` can be one or several `(how, indulge)` pairs where `indulge` is a boolean and `how ∈ {'0c', 'post_norm', 'max_weighted', 'max'}`.

In [None]:
norm_params = ('0c', True)
mag_phase_mx_dict = get_pickled_magnitude_phase_matrices(DATA_FOLDER, norm_params=norm_params, long=LONG_FORMAT)
test_dict_keys(mag_phase_mx_dict, metadata)
mag_phase_mx_dict[EXAMPLE_FNAME].shape


In [None]:
mpm = mag_phase_mx_dict[EXAMPLE_FNAME]
colors = circular_hue(mpm[...,1,:], output_rgba=True, ignore_phase=True)
colors.shape

Note that the phases (2nd column) are the same as those we inspected above via `get_coeff()`, whereas the magnitudes are now normalized by the first (now absent) coefficient 0.

In [None]:
mag_phase_mx_dict[EXAMPLE_FNAME][0]
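One reading of the normalization by coefficient 0 is sketched below. The helper `normalize_by_coeff0` is hypothetical; it does not model the `indulge` flag or the other `how` options, and the pickled matrices may have been computed differently.

```python
import numpy as np

def normalize_by_coeff0(dft_vector):
    """Hypothetical helper: (6, 2) magnitude-phase pairs of coefficients 1..6,
    with magnitudes divided by the magnitude of coefficient 0."""
    dft_vector = np.asarray(dft_vector, dtype=complex)
    zeroth = np.abs(dft_vector[0])
    mags = np.abs(dft_vector[1:7])
    if zeroth > 0:
        mags = mags / zeroth
    else:
        mags = np.zeros_like(mags)  # silent slice: nothing to normalize
    phases = np.angle(dft_vector[1:7])
    return np.column_stack((mags, phases))

pcv = np.zeros(12)
pcv[[0, 4, 7]] = 1.0                      # toy C major triad
mag_phase = normalize_by_coeff0(np.fft.fft(pcv)[:7])
print(mag_phase.shape)
print(mag_phase[:, 0])                    # normalized magnitudes of coefficients 1..6
```

Because the durations in a PCV are non-negative, no coefficient can exceed coefficient 0 in magnitude, so the normalized values stay within [0, 1].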

## Loading most resonant DFT coefficients
This cell depends on the previously loaded magnitude-phase matrices, i.e. a conscious choice of a normalization method has been made above.

`get_most_resonant` returns three `{fname -> np.array}` dictionaries where for each piece, the three `(NxN)` matrices correspond to

1. the index between 0 and 5 of the most resonant of the six DFT coefficients 1 through 6
2. its magnitude
3. the inverse entropy of the 6 magnitudes

In [None]:
max_coeffs, max_mags, inv_entropies = get_most_resonant(mag_phase_mx_dict)
np.column_stack((max_coeffs[EXAMPLE_FNAME][:3],
                 max_mags[EXAMPLE_FNAME][:3],
                 inv_entropies[EXAMPLE_FNAME][:3]))
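The three outputs can be sketched for a single magnitude-phase matrix as follows. The helper `most_resonant_sketch` is hypothetical, and the definition of inverse entropy used here (one minus the Shannon entropy of the magnitudes, normalized to sum to one and divided by log 6) is an assumption; `get_most_resonant` may define it differently.

```python
import numpy as np

def most_resonant_sketch(mag_phase_mx):
    """Hypothetical helper for one (..., 6, 2) magnitude-phase array."""
    mags = np.asarray(mag_phase_mx, dtype=float)[..., 0]   # drop the phases
    max_coeff = mags.argmax(axis=-1)                       # index 0..5
    max_mag = mags.max(axis=-1)
    total = mags.sum(axis=-1, keepdims=True)
    # Turn the six magnitudes into a distribution; all-zero entries stay zero.
    probs = np.divide(mags, total, out=np.zeros_like(mags), where=total > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        logp = np.where(probs > 0, np.log(probs), 0.0)
    entropy = -(probs * logp).sum(axis=-1)
    # Assumed definition: 1 = one coefficient dominates, 0 = all six are equal.
    # Note that all-zero entries (e.g. the empty lower triangle) also yield 1.
    inv_entropy = 1 - entropy / np.log(mags.shape[-1])
    return max_coeff, max_mag, inv_entropy

toy = np.zeros((1, 2, 6, 2))
toy[0, 0, 4, 0] = 1.0      # only coefficient 5 (index 4) resonates
toy[0, 1, :, 0] = 0.5      # all six coefficients equally strong
coeff, mag, inv_h = most_resonant_sketch(toy)
print(coeff, mag, inv_h)
```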

## Loading major, minor, and tritone correlations

This cell loads pickled matrices. To re-compute correlations from pitch-class matrices, use `get_maj_min_coeffs()` for major and minor correlations and `get_ttms()` for tritone-ness matrices.

In [None]:
correl_dict = get_correlations(DATA_FOLDER, long=LONG_FORMAT)
test_dict_keys(correl_dict, metadata)
correl_dict[EXAMPLE_FNAME].shape

## Loading pickled 9-fold vectors

The function is a shortcut for
* loading a particular kind of pickled normalized magnitude-phase-matrices
* loading pickled tritone, major, and minor coefficients
* concatenating them along the last axis

In [None]:
norm_params = ('0c', True)
ninefold_dict = make_feature_vectors(DATA_FOLDER, norm_params=norm_params, long=LONG_FORMAT)
test_dict_keys(ninefold_dict, metadata)
ninefold_dict[EXAMPLE_FNAME].shape
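The concatenation step can be pictured on random stand-in arrays. Everything here is an assumption for illustration: that only the six magnitudes (not the phases) enter the vector, that the correlations come as three values per segment, and the order of the nine columns. `make_feature_vectors` may differ on each of these points.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
toy_mag_phase = rng.random((n, n, 6, 2))   # stand-in for one magnitude-phase matrix
toy_correl = rng.random((n, n, 3))         # stand-in for three correlation values per segment

# Keep the six magnitudes and append the three correlation values.
ninefold = np.concatenate((toy_mag_phase[..., 0], toy_correl), axis=-1)
print(ninefold.shape)
```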

### Separating pentatonic from diatonic

In [None]:
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

ground_truth_train = pd.read_csv('full_groundtruth_train.csv')
penta_dia = ground_truth_train[ground_truth_train['structure'].isin(['penta', 'majmin'])]

X_cols = ['coeff1', 'coeff2', 'coeff3', 'coeff4', 'coeff5', 'coeff6', 'major', 'minor', 'tritone']
X_train, X_test, y_train, y_test = train_test_split(
    penta_dia[X_cols], penta_dia['diatonic'], test_size=0.33, random_state=42
    )

clf = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
print(clf.score(X_test, y_test))

max_coeffs_penta, max_mags_penta, inv_entropies_penta = get_most_resonant_penta_dia(mag_phase_mx_dict, ninefold_dict, clf)
np.column_stack((max_coeffs_penta[EXAMPLE_FNAME][:3],
max_mags_penta[EXAMPLE_FNAME][:3],
inv_entropies_penta[EXAMPLE_FNAME][:3]))

# Metrics

In this section, a dataframe containing all metrics is compiled. Optional plots and tests can be done by adjusting the parameters of the wrapper function `get_metric` that can be found in `etl.py`. 

In [None]:
metadata_metrics = metadata.copy()
#metadata_metrics = pd.read_csv('metrics.csv').set_index('fname')


## Center of mass

Computing the center of mass of each coefficient for all the pieces. Uses `mag_phase_mx_dict` as input and outputs the vertical center of mass as a fraction of the height of the wavescape.

In [None]:
cols = [f"center_of_mass_{i}" for i in range(1,7)]
metadata_metrics = get_metric('center_of_mass', metadata_metrics, 
                              mag_phase_mx_dict=mag_phase_mx_dict, 
                              cols=cols, store_matrix=True, 
                              show_plot=True, save_name='center_of_mass', title='Center of Mass')
metadata_metrics.head(1)
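As a rough sketch of the idea, the vertical center of mass of one coefficient can be computed from its `(NxN)` magnitude matrix. The helper below is hypothetical and assumes the wide (square) format with the row index as hierarchical height, scaled so that the bottom row is 0 and the top row is 1; the `center_of_mass` metric in `etl.py` may use a different scaling.

```python
import numpy as np

def vertical_center_of_mass(mags):
    """Hypothetical helper: magnitude-weighted mean height as a fraction of
    the wavescape's height, for one (N, N) upper-triangular matrix."""
    mags = np.asarray(mags, dtype=float)
    n = mags.shape[0]
    heights = np.arange(n) / max(n - 1, 1)   # row 0 = bottom, row N-1 = top
    row_weights = mags.sum(axis=1)           # total magnitude per height level
    total = row_weights.sum()
    if total == 0:
        return float("nan")                  # no mass, no center
    return float((row_weights * heights).sum() / total)

toy_mags = np.triu(np.ones((3, 3)))          # uniform magnitude on every node
print(vertical_center_of_mass(toy_mags))
```

With uniform magnitudes the result lies below 0.5 because a wavescape has more nodes at the bottom than at the top.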

In [None]:
# trying out some options of the function
# 1 unified plot
metadata_metrics = get_metric('center_of_mass', metadata_metrics, 
                              mag_phase_mx_dict=mag_phase_mx_dict,
                              cols=cols, store_matrix=True, 
                              show_plot=True, save_name='center_of_mass', title='Center of Mass',
                              unified=True)


In [None]:
# 2 using ordinal years
metadata_metrics = get_metric('center_of_mass', metadata_metrics,
                              mag_phase_mx_dict=mag_phase_mx_dict,
                              cols=cols, store_matrix=True,
                              show_plot=True, save_name='center_of_mass', title='Center of Mass',
                              ordinal=True, ordinal_col='years_ordinal')

In [None]:
# 3 boxplot version with ordinal column
metadata_metrics = get_metric('center_of_mass', metadata_metrics, 
                              mag_phase_mx_dict=mag_phase_mx_dict,
                              cols=cols, store_matrix=True, 
                              show_plot=True, save_name='center_of_mass', title='Center of Mass', 
                              boxplot=True, ordinal=True, ordinal_col='years_periods')

In [None]:
# 4. testing option
metadata_metrics = get_metric('center_of_mass', metadata_metrics,
                              mag_phase_mx_dict=mag_phase_mx_dict, cols=cols,
                              store_matrix=True, testing=True)

# Mean Resonance

Computing the mean resonance of each coefficient for all the pieces. Uses `mag_phase_mx_dict` as input and outputs, for each coefficient, the mean magnitude over the wavescape.

In [None]:
cols = [f"mean_resonances_{i}" for i in range(1,7)]

metadata_metrics = get_metric('mean_resonance', metadata_metrics, 
                              mag_phase_mx_dict=mag_phase_mx_dict,
                              cols=cols, store_matrix=True, 
                              show_plot=True, save_name='mean_resonances', title='Mean Resonance')
metadata_metrics.head(1)
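A minimal sketch of such a mean, assuming the wide format where only the upper right triangle holds nodes. The helper `mean_resonance_sketch` is hypothetical; the `mean_resonance` metric in `etl.py` may average differently.

```python
import numpy as np

def mean_resonance_sketch(mag_phase_mx):
    """Hypothetical helper: mean magnitude per coefficient over the nodes
    of one (N, N, 6, 2) magnitude-phase matrix."""
    mags = np.asarray(mag_phase_mx, dtype=float)[..., 0]   # (N, N, 6)
    rows, cols = np.triu_indices(mags.shape[0])            # actual nodes only
    return mags[rows, cols].mean(axis=0)                   # one value per coefficient

toy = np.zeros((3, 3, 6, 2))
rows, cols = np.triu_indices(3)
toy[rows, cols, 0, 0] = 0.5     # coefficient 1 has magnitude 0.5 on every node
means = mean_resonance_sketch(toy)
print(means)
```

Restricting the mean to the upper triangle keeps the zero padding below the diagonal from diluting the result.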

In [None]:
# per period ordinal plot
metadata_metrics = get_metric('mean_resonance', metadata_metrics, 
                              mag_phase_mx_dict=mag_phase_mx_dict,
                              cols=cols, store_matrix=True, 
                              show_plot=False, testing=True, save_name='mean_resonance_per_period', title='Mean Resonance per Period', boxplot=True,
                              ordinal=True, ordinal_col='years_periods')

# Center of Mass 
#### only on most resonant coefficients

In [None]:
metadata_metrics = metadata.copy()

In [None]:
cols = [f"center_of_mass_{i}" for i in range(1,7)]
metadata_metrics = get_metric('center_of_mass_2', metadata_metrics, 
                              max_coeffs=max_coeffs,
                              max_mags=max_mags,
                              cols=cols, store_matrix=True, testing=True, unified=True,
                              show_plot=True, save_name='center_of_mass', title='Center of Mass')
metadata_metrics.head(1)

# Moment of Inertia

Moment of inertia of coefficient $n$ in the summary wavescape: $I(n)=1/N \sum_{i \in S(n)} w_i y_i^2$, where $N$ is the total number of nodes in the wavescape, $S(n)$ is the set of the indices of the nodes in the summary wavescape that are attributed to coefficient $n$ (i.e., where coefficient $n$ is the most prominent among the six), $w_i$ is the weight (opacity) of the $i$-th node in the summary wavescape, and $y_i$ is the vertical coordinate of the $i$-th node in the summary wavescape.


In [None]:
cols = [f"moments_of_inertia_{i}" for i in range(1,7)]
print(len(cols))
metadata_metrics = get_metric('moment_of_inertia', metadata_metrics, 
                              max_coeffs=max_coeffs,
                              max_mags=max_mags,
                              cols=cols, store_matrix=True,
                              testing=True, 
                              show_plot=True, save_name='moments_of_inertia', title='Moments of Inertia', unified=True)
metadata_metrics.head(1)
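The formula can be sketched for one piece as follows. The helper is hypothetical and makes two assumptions that the `moment_of_inertia` metric may not share: the weights $w_i$ are the maximum magnitudes, and $y_i$ is the row index scaled to [0, 1].

```python
import numpy as np

def moments_of_inertia_sketch(max_coeff, max_mag, n_coeffs=6):
    """Hypothetical helper: I(n) = 1/N * sum over nodes in S(n) of w_i * y_i**2."""
    max_coeff = np.asarray(max_coeff)
    max_mag = np.asarray(max_mag, dtype=float)
    n = max_coeff.shape[0]
    rows, cols = np.triu_indices(n)          # the N nodes of the wavescape
    labels = max_coeff[rows, cols]           # most resonant coefficient per node
    weights = max_mag[rows, cols]            # w_i (assumed: its magnitude)
    y = rows / max(n - 1, 1)                 # y_i (assumed: height in [0, 1])
    n_nodes = len(rows)
    return np.array([
        (weights[labels == k] * y[labels == k] ** 2).sum() / n_nodes
        for k in range(n_coeffs)
    ])

toy_coeff = np.array([[0, 0], [0, 3]])       # lower left entry is padding
toy_mag = np.array([[1.0, 1.0], [0.0, 0.5]])
inertia = moments_of_inertia_sketch(toy_coeff, toy_mag)
print(inertia)
```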

# Prevalence of each coefficient

Prevalence of coefficient $n$ in a piece: $W(n)=1/N \sum_{i \in S(n)} 1 = |S(n)|/N$, where $N$ is the total number of nodes in the wavescape and $S(n)$ is the set of the indices of the nodes in the summary wavescape that are attributed to coefficient $n$ (i.e., where coefficient $n$ is the most prominent among the six). In other words, $W(n)$ is the share of nodes attributed to coefficient $n$.

In [None]:
cols = [f"percentage_resonances_{i}" for i in range(1,7)]

metadata_metrics = get_metric('percentage_resonance', metadata_metrics, 
                              max_coeffs=max_coeffs,
                              cols=cols, store_matrix=True, testing=True,
                              show_plot=True, save_name='percentage_resonance', title='Percentage Resonance', unified=True)
metadata_metrics.head(1)
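Read as the share of nodes attributed to each coefficient, the prevalence can be sketched with a simple count. The helper is hypothetical and assumes the wide format; the `percentage_resonance` metric may be implemented differently.

```python
import numpy as np

def prevalence_sketch(max_coeff, n_coeffs=6):
    """Hypothetical helper: fraction of nodes whose most resonant
    coefficient is n, for n = 0..5."""
    max_coeff = np.asarray(max_coeff)
    rows, cols = np.triu_indices(max_coeff.shape[0])   # nodes only, no padding
    labels = max_coeff[rows, cols].astype(int)
    return np.bincount(labels, minlength=n_coeffs) / len(labels)

# Values below the diagonal are padding and are ignored.
toy_coeff = np.array([[0, 0, 1],
                      [0, 1, 1],
                      [0, 0, 5]])
shares = prevalence_sketch(toy_coeff)
print(shares)
```

The six shares sum to one, so they can be compared directly across pieces of different lengths.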

In [None]:
# metadata_metrics = get_metric('percentage_resonance', metadata_metrics, 
#                               max_coeffs=max_coeffs,
#                               cols=cols, store_matrix=True, 
#                               show_plot=True, save_name='percentage_resonance_periods', title='Percentage Resonance (Periods)',  boxplot=True,
#                               ordinal=True, ordinal_col='years_periods')

In order to account for the certainty that a given coefficient is actually the most resonant one, we weight the previous metric by inverse entropy as follows: $W(n)=1/N \sum_{i \in S(n)} w_i$ where $N$ is the total number of nodes in the wavescape, $S(n)$ is the set of the indices of the nodes in the summary wavescape that are attributed to coefficient $n$ (i.e., where coefficient $n$ is the most prominent among the six), and $w_i$ is the weight (opacity) of the $i$-th node in the summary wavescape, in this case the inverse entropy of node $i$.



In [None]:
cols = [f"percentage_resonances_entropy_{i}" for i in range(1,7)]

metadata_metrics = get_metric('percentage_resonance_entropy', metadata_metrics, 
                              max_coeffs=max_coeffs,
                              inv_entropies=inv_entropies,
                              cols=cols, store_matrix=True, 
                              testing=True,
                              show_plot=True, save_name='percentage_resonance_entropy', title='Percentage Resonance (entropy)', unified=True)
metadata_metrics.head(1)
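The weighted variant replaces the count by a sum of per-node weights. A minimal sketch, assuming the weights are the inverse entropies passed to `get_metric` above; the helper name is hypothetical.

```python
import numpy as np

def weighted_prevalence_sketch(max_coeff, node_weights, n_coeffs=6):
    """Hypothetical helper: W(n) = 1/N * sum of w_i over the nodes in S(n)."""
    max_coeff = np.asarray(max_coeff)
    node_weights = np.asarray(node_weights, dtype=float)
    rows, cols = np.triu_indices(max_coeff.shape[0])
    labels = max_coeff[rows, cols].astype(int)
    weights = node_weights[rows, cols]
    return np.bincount(labels, weights=weights, minlength=n_coeffs) / len(labels)

toy_coeff = np.array([[0, 1], [0, 1]])
toy_inv_entropy = np.array([[1.0, 0.5], [0.0, 0.25]])   # toy certainty per node
weighted = weighted_prevalence_sketch(toy_coeff, toy_inv_entropy)
print(weighted)
```

Nodes where no single coefficient stands out contribute little, so a coefficient only scores high if it is both frequent and clearly dominant.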

In [None]:
# metadata_metrics = get_metric('percentage_resonance_entropy', metadata_metrics, 
#                               max_coeffs=max_coeffs,
#                               inv_entropies=inv_entropies,
#                               cols=cols, store_matrix=True, 
#                               show_plot=True, save_name='percentage_resonance_entropy_period', title='Percentage Resonance (entropy period)',  boxplot=True,
#                               ordinal=True, ordinal_col='years_periods')

In [None]:
#metadata_metrics.to_csv('results/results.csv')

In [None]:
import pandas as pd

metadata = pd.read_csv('results/metrics_melted (1).csv')
metadata.head()
metadata['value_com']


# Measure Theoretic Entropy

Measure-theoretic entropy: Let $A=\{A_1,\dots,A_k\}$ be a (finite) partition of a probability space $(X,\mathcal{P}(X),\mu)$; the entropy of the partition $A$ is defined as $H(A)= - \sum_{i} \mu(A_i) \log \mu(A_i)$. We can take $X$ as the support of the wavescape, $A$ as the set of the connected regions in the unified wavescape, and $\mu(Y)=\mathrm{area}(Y)/\mathrm{area}(X)$ for any subset $Y$ of the wavescape.


In [None]:
# takes quite long
cols = 'partition_entropy'
### add interaction year length
metadata_metrics = get_metric('partition_entropy', metadata_metrics, 
                              max_coeffs=max_coeffs,
                              cols=cols, store_matrix=True, scatter=True, testing=True,
                              show_plot=True, save_name='partition_entropy', title='Partition Entropy', unified=True)
metadata_metrics.head(1)
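The definition can be sketched with a flood fill over the matrix of most resonant coefficients. The helper is hypothetical and not the `partition_entropy` implementation: it measures area as a node count and treats two nodes as connected when they are horizontal or vertical neighbours in the matrix, which may not match adjacency in the plotted triangle.

```python
from collections import deque

import numpy as np

def partition_entropy_sketch(max_coeff):
    """Hypothetical helper: H(A) over connected same-coefficient regions."""
    max_coeff = np.asarray(max_coeff)
    n = max_coeff.shape[0]
    # Mark the padding below the diagonal as visited so it never forms a region.
    seen = ~np.triu(np.ones((n, n), dtype=bool))
    areas = []
    for r in range(n):
        for c in range(r, n):
            if seen[r, c]:
                continue
            label = max_coeff[r, c]
            seen[r, c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n and 0 <= nx < n and not seen[ny, nx]
                            and max_coeff[ny, nx] == label):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    mu = np.array(areas) / sum(areas)         # mu(A_i) = area share of region i
    return float(np.sum(mu * np.log(1 / mu)))  # equals -sum(mu * log(mu))

toy_coeff = np.array([[0, 1], [0, 1]])        # regions of 1 and 2 nodes
print(partition_entropy_sketch(toy_coeff))
```

The nested loops and the flood fill visit every node once, but in pure Python this is slow for long pieces, consistent with the remark that the metric takes quite long.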

In [None]:
import pandas as pd
metadata_metrics_ = metadata_metrics.reset_index()
metadata_metrics_['fname'] = metadata_metrics_['index']
all_cols = [col for col in list(metadata_metrics_.columns) if col not in ['fname', 'length_qb', 'year', 'last_mc']]
metadata_metrics_ = pd.melt(metadata_metrics_, id_vars=['fname', 'length_qb', 'year', 'last_mc'], value_vars=all_cols)    
metadata_metrics_.head()  

# Decreasing magnitude in height

The inverse coherence is the slope of the regression line fitted to the mean magnitude of the most resonant coefficient in the summary wavescape as a function of hierarchical height, from the bottom of the wavescape to the top.

In [None]:
cols = 'inverse_coherence'
metadata_metrics = get_metric('inverse_coherence', metadata_metrics, 
                              max_mags=max_mags,
                              cols=cols, store_matrix=True, 
                              show_plot=True, save_name='inverse_coherence', title='Inverse Coherence', unified=True, scatter=True)
metadata_metrics.head(1)

In [None]:
metadata_metrics.head(1)

In [None]:
metadata_metrics.to_csv('results/results.csv')

In [None]:
metadata_metrics.sort_values('inverse_coherence').tail()

In [None]:
max_mag = max_mags['l123-08_preludes_ondine']
#max_coeff = max_coeffs['l108_morceau']
np.polyfit((max_mag.shape[1] - np.arange(max_mag.shape[1]))/max_mag.shape[1], np.mean(max_mag, axis=0), 1)[0]

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
ax = sns.regplot(x=(max_mag.shape[1] - np.arange(max_mag.shape[1]))/max_mag.shape[1], y=np.mean(max_mag, axis=0), ci=False)
ax.set_title('Regression line. Example: Ondine')
ax.set_xlabel('hierarchical height')
ax.set_ylabel('mean maximum magnitude')

plt.tight_layout()
plt.savefig('figures/coherence.png')

plt.show()


In [None]:
metadata_metrics = get_metric('inverse_coherence', metadata_metrics, 
                              max_mags=max_mags,
                              cols=cols, store_matrix=True, 
                              show_plot=False, testing=True)

Storing the final metrics for future use:

In [None]:
metadata_metrics.reset_index().to_csv('normalized_coherence.csv')

In [None]:
metadata_metrics.reset_index().to_csv('metrics_new.csv')