Style Classification Analysis

This repository builds a series of classifiers that predict whether melodies from the Essen corpus originate from China or Europe, using numeric features from the Python package melody-features as predictors. It reproduces the manuscript’s confusion-matrix figures, runs exploratory factor analysis and a factor-based logistic model in R, and benchmarks logistic regression for each feature-extraction source (IDyOM, jSymbolic, etc.).

Run everything from the repo root unless noted otherwise.

Prerequisites

| Requirement | Notes |
| --- | --- |
| Python 3.10+ | Check with `python3 --version`. |
| R | Check with `Rscript --version`. Install from CRAN. |
| `melody-features` | Installed via `requirements.txt`. It ships the Essen corpus used to resolve melody paths from basename lists. |

One-time setup

1. Clone and enter the repo

git clone https://github.com/dmwhyatt/Style-Classification-Analysis.git
cd Style-Classification-Analysis

2. Python environment

python3 -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -r requirements.txt

3. R packages (for factor_logistic.R only)

Rscript -e 'install.packages(c("tidyverse", "psych", "jsonlite"), repos="https://cloud.r-project.org")'

ggplot2 is included in tidyverse.

4. melody-features

Features are extracted with the melody-features package defaults. Some inputs may be skipped (for example, unsupported or polyphonic files); this is expected behaviour.

Dataset

Two files list melody basenames (no paths):

| File | Role |
| --- | --- |
| `usable_china.txt` | One basename per line → pool for China. |
| `usable_europa.txt` | One basename per line → pool for Europe. |

logistic.py uses every China basename and draws a random subset of Europe (random.seed(42), at most 2200 melodies).
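
The selection rule above can be sketched roughly as follows. This is an illustration rather than the code in logistic.py: the helper names `read_basenames` and `select_melodies` are hypothetical, and drawing the subset with `sample` from a seeded generator is an assumption about how the script does it.

```python
import random
from pathlib import Path

MAX_EUROPE = 2200  # cap on the Europe subset, as stated above


def read_basenames(path):
    """Return the non-empty lines of a basename list file (hypothetical helper)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def select_melodies(china, europe, max_europe=MAX_EUROPE, seed=42):
    """Keep every China basename and draw at most `max_europe` Europe basenames."""
    # A local generator keeps the sketch free of global state; whether the
    # script draws with sample() or another call is an assumption here.
    rng = random.Random(seed)
    k = min(max_europe, len(europe))
    return list(china), rng.sample(list(europe), k)
```

Called as `select_melodies(read_basenames("usable_china.txt"), read_basenames("usable_europa.txt"))`, the sketch returns the same Europe subset on every run because the seed is fixed.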

Full pipeline

With .venv activated:

python logistic.py
python xgbclassifer.py
Rscript factor_logistic.R
python factor_logistic_plot_confusion.py
python comparison.py

| Step | Command | What it does |
| --- | --- | --- |
| 1 | `python logistic.py` | Builds `essen_china_europe_features.csv` on first run (this can take a long time because of the IDyOM runs). Uses the same stratified train/test split and CV as the other scripts. Writes Figure 1 (`confusion_matrix.pdf`) plus coefficient and permutation-importance artefacts. |
| 2 | `python xgbclassifer.py` | Needs the features CSV from step 1. Uses the same split and features as step 1. Writes Figure 2 (`xgb_confusion_matrix_test.pdf`). |
| 3 | `Rscript factor_logistic.R` | EFA on the same numeric features (9 factors, promax, parallel analysis). Writes Figure 3 (`factor_eigenvalues_elbow.pdf`), factor GLM output, and the CSVs consumed by step 4. |
| 4 | `python factor_logistic_plot_confusion.py` | Reads the prediction CSVs written by step 3. Writes Figure 4 (`factor_logistic_confusion_matrix_test.pdf`). |
| 5 | `python comparison.py` | Needs the features CSV from step 1. Builds or loads `source_to_csv_columns_with_novel.json`, trains one logistic model per implementation source plus an all-features baseline, and writes the comparison CSV/TeX/PDF and `coefficients/*.csv`. |
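
Because later steps depend on files written by earlier ones, it can help to run the five commands through a small wrapper that stops at the first failure. The sketch below is not part of the repository; `run_pipeline` and its `run` parameter are hypothetical names, and it assumes `python` and `Rscript` are on the PATH with .venv activated.

```python
import subprocess

# The five commands from the table above, in order.
STEPS = [
    ["python", "logistic.py"],
    ["python", "xgbclassifer.py"],
    ["Rscript", "factor_logistic.R"],
    ["python", "factor_logistic_plot_confusion.py"],
    ["python", "comparison.py"],
]


def run_pipeline(steps=STEPS, run=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for cmd in steps:
        result = run(cmd)
        if result.returncode != 0:
            break  # later steps need the outputs of earlier ones
        completed += 1
    return completed
```

Calling `run_pipeline()` from the repo root returns 5 when every step succeeds. The `run` argument exists only so the wrapper can be exercised without launching the real scripts.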

First run of Step 1 can take a long time. Later runs load essen_china_europe_features.csv and skip re-extraction unless you delete that file.
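
The caching behaviour described above follows a common load-or-build pattern. A minimal sketch, assuming the cache is a plain CSV with a header row; `load_or_build` and the `build_rows` callable are hypothetical stand-ins for whatever logistic.py does internally:

```python
import csv
from pathlib import Path


def load_or_build(cache_path, build_rows):
    """Return (rows, was_cached).

    `build_rows` is a hypothetical callable that performs the slow feature
    extraction and returns a non-empty list of dicts, one per melody.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: skip extraction. Values come back as strings.
        with cache_path.open(newline="") as fh:
            return list(csv.DictReader(fh)), True

    rows = build_rows()
    if not rows:
        raise ValueError("extraction produced no rows; nothing to cache")
    with cache_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows, False
```

Deleting the cache file is the only way to trigger a rebuild in this sketch, which mirrors the instruction to delete essen_china_europe_features.csv.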

Figures for `main.tex`

| Figure | Output file | Produced by |
| --- | --- | --- |
| 1 | `confusion_matrix.pdf` | `python logistic.py` |
| 2 | `xgb_confusion_matrix_test.pdf` | `python xgbclassifer.py` |
| 3 | `factor_eigenvalues_elbow.pdf` | `Rscript factor_logistic.R` |
| 4 | `factor_logistic_confusion_matrix_test.pdf` | `python factor_logistic_plot_confusion.py` (run after `Rscript factor_logistic.R`) |
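
Before compiling `main.tex`, a quick check that all four PDFs exist can save a failed build. The sketch below assumes the figures are written to the directory passed as `root` (the repo root by default), which the table does not state; `missing_figures` is a hypothetical helper.

```python
from pathlib import Path

# Figure number -> output file, copied from the table above.
FIGURES = {
    1: "confusion_matrix.pdf",
    2: "xgb_confusion_matrix_test.pdf",
    3: "factor_eigenvalues_elbow.pdf",
    4: "factor_logistic_confusion_matrix_test.pdf",
}


def missing_figures(root="."):
    """Return the expected figure files not found under `root`, in figure order."""
    root = Path(root)
    return [name for _, name in sorted(FIGURES.items()) if not (root / name).is_file()]
```

An empty list from `missing_figures()` means every figure listed in the table is present.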

Factor network webapp

Rscript factor_logistic.R also writes a self-contained 3D interactive visualization of the eight-factor solution to docs/:

| File | Role |
| --- | --- |
| `docs/index.html` | Three.js / `3d-force-graph` viewer. |
| `docs/network_data.js` | Nodes (factors + variables with \|loading\| > 0.3) and links. |
| `docs/network_data.json` | Same data as a portable JSON sidecar. |
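
To make the |loading| > 0.3 rule concrete, the following sketch builds a nodes-and-links structure from a loading matrix. It is written in Python for illustration, whereas the real files are produced by factor_logistic.R. The field names (`id`, `type`, `source`, `target`, `loading`) are assumptions and may not match the actual schema of `network_data.json`.

```python
THRESHOLD = 0.3  # |loading| cutoff stated in the table above


def build_network(loadings, threshold=THRESHOLD):
    """Build a nodes/links dict from {variable: {factor: loading}}.

    Every factor becomes a node. A variable becomes a node only if at
    least one of its loadings exceeds the threshold in absolute value.
    """
    factors = sorted({fac for row in loadings.values() for fac in row})
    links = [
        {"source": var, "target": fac, "loading": val}
        for var, row in loadings.items()
        for fac, val in row.items()
        if abs(val) > threshold
    ]
    kept_vars = sorted({link["source"] for link in links})
    nodes = [{"id": fac, "type": "factor"} for fac in factors]
    nodes += [{"id": var, "type": "variable"} for var in kept_vars]
    return {"nodes": nodes, "links": links}
```

With a toy input such as `{"var_a": {"F1": 0.62, "F2": 0.10}, "var_c": {"F1": 0.05, "F2": 0.20}}` (made-up names and values), `var_a` is linked to `F1` and `var_c` is dropped because none of its loadings passes the cutoff.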

Melody examples

python build_melody_examples.py populates docs/melody_examples/ with a piano-roll PNG and a synthesized WAV for the 3 highest and 3 lowest-scoring melodies for every feature node and every factor node in the network. Clicking any node then displays these examples.

Features are ranked by their value in essen_china_europe_features.csv.
Factors are ranked by the regression factor scores in factor_scores_for_logreg.csv (produced by factor_logistic.R).
All of this is precomputed to make the webapp performant.
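
The ranking step can be sketched as below. `extreme_examples` is a hypothetical helper, not a function from build_melody_examples.py, and how the script breaks ties is not stated in this README.

```python
def extreme_examples(scores, n=3):
    """Return (highest, lowest) melody names for one feature or factor.

    `scores` maps melody basename -> numeric value. The highest list is in
    descending order of value and the lowest list in ascending order.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    highest = ranked[:n]
    lowest = ranked[-n:][::-1]  # last n of the descending list, lowest first
    return highest, lowest
```

If a node has fewer than `2 * n` scored melodies, the two lists in this sketch overlap.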

Reproducibility

Random seeds: 42 is fixed in the Python scripts (train_test_split, CV folds, Europe subsample, XGBoost, etc.) and in factor_logistic.R (set.seed(42)).
Same train/test rows: keep test_size=0.2 and the seeds unchanged so that logistic.py, comparison.py, and xgbclassifer.py evaluate on identical rows.
Invalidate the feature cache: delete essen_china_europe_features.csv to force re-extraction (e.g. after changing usable_*.txt or upgrading melody-features in a way that affects columns).
