## Melodic Feature Set

This notebook illustrates how to use the Melodic Feature Set. The feature set is accessed through the top-level function `get_all_features`, which computes a wide range of features for every melody found in the input directory and writes all melodies and their features to a single .csv file.

In [1]:
import os
import shutil

from melodic_feature_set.features import get_all_features

# To process the whole Essen corpus instead:
# from melodic_feature_set.corpora import essen_corpus
# get_all_features(essen_corpus, "output.csv")

# For a quick demonstration, copy the first 10 Essen MIDI files into a temporary directory
essen_dir = "/workspaces/melodic_feature_set/src/melodic_feature_set/corpora/Essen_Corpus"
files = sorted(f for f in os.listdir(essen_dir) if f.endswith(".mid"))[:10]
temp_dir = "/tmp/essen_first_10"
os.makedirs(temp_dir, exist_ok=True)
for f in files:
    shutil.copy(os.path.join(essen_dir, f), temp_dir)

# Compute every feature for these 10 melodies and write the results to a CSV file
get_all_features(temp_dir, "essen_first_10_features.csv")

10:57:35 - melodic_feature_set - INFO - Starting feature extraction job...
10:57:35 - melodic_feature_set - INFO - Generating corpus statistics from: /src/melodic_feature_set/corpora/Essen_Corpus
10:57:35 - melodic_feature_set - INFO - Corpus statistics file will be at: Essen_Corpus_corpus_stats.json
10:57:35 - melodic_feature_set - INFO - Corpus statistics file not found. Generating a new one...
10:57:35 - melodic_feature_set - INFO - Found 8472 MIDI files
Loading MIDI files using 2 cores: 100%|██████████| 8472/8472 [00:15<00:00, 531.55it/s]
10:57:51 - melodic_feature_set - INFO - Processing 8470 valid melodies
Computing n-grams using 2 cores: 100%|██████████| 8470/8470 [00:48<00:00, 174.47it/s]
10:58:48 - melodic_feature_set - INFO - Corpus statistics saved and loaded successfully.
10:58:48 - melodic_feature_set - INFO - Corpus size: 8470 melodies
10:58:48 - melodic_feature_set - INFO - N-gram lengths: [1, 6]
10:58:48 - melodic_feature_set - INFO - Corpus statistics generated.

** Putting Test dataset files in experiment history folder. **


10:58:50 - melodic_feature_set - INFO - Setting experiment parameters...
10:58:50 - melodic_feature_set - INFO - Running IDyOM analysis...


** Putting Pretraining dataset files in experiment history folder. **
** running lisp script **
To load "clsql":
  Load 1 ASDF system:
    clsql
; Loading "clsql"

To load "idyom":
  Load 1 ASDF system:
    idyom
; Loading "idyom"
................

Inserting 10 compositions into database: dataset 66081325105850.
| Progress: -----------------------------------------------|
Inserting 8472 compositions into database: dataset 99081325105850.
| Progress: -----------------------------------------------|
Written resampling set to /root/idyom/data/resampling/66081325105850-1.resample.
Written PPM* model to /root/idyom/data/models/cpint-cpintfref_99081325105850_66081325105850-1:1_melody.ppm.


10:59:50 - melodic_feature_set - INFO - Analysis complete!


 
** Finished! **


10:59:51 - melodic_feature_set - INFO - IDyOM processing completed successfully! Output: IDyOM_default_pitch_Results.dat
10:59:51 - melodic_feature_set - INFO - Cleaning up temporary directory: /tmp/idyom_key_cs8rzq2k
10:59:51 - melodic_feature_set - INFO - Starting parallel processing...
10:59:51 - melodic_feature_set - INFO - Using 2 CPU cores


⠦ Processing melodies...

10:59:51 - melodic_feature_set - INFO - Processing complete
10:59:51 - melodic_feature_set - INFO - Total processing time: 0.88 seconds
10:59:51 - melodic_feature_set - INFO - Results written to essen_first_10_features.csv
10:59:51 - melodic_feature_set - INFO - Timing Statistics (average milliseconds per melody):
10:59:51 - melodic_feature_set - INFO - pitch          :     6.36ms
10:59:51 - melodic_feature_set - INFO - interval       :     1.96ms
10:59:51 - melodic_feature_set - INFO - contour        :     0.78ms
10:59:51 - melodic_feature_set - INFO - duration       :     2.85ms
10:59:51 - melodic_feature_set - INFO - tonality       :     3.04ms
10:59:51 - melodic_feature_set - INFO - narmour        :     0.07ms
10:59:51 - melodic_feature_set - INFO - melodic_movement:     0.45ms
10:59:51 - melodic_feature_set - INFO - mtype          :    25.20ms
10:59:51 - melodic_feature_set - INFO - corpus         :    29.74ms
10:59:51 - melodic_feature_set - INFO - total          :    70.47ms
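
The timing statistics at the end of the log show where the per-melody time goes. As a quick check, the sketch below parses those lines (copied verbatim into a string) and reports each feature group's share of the total. With these numbers the `mtype` and `corpus` groups account for about 78% of the 70.47 ms per melody, and the group averages sum to 70.45 ms, which matches the reported total up to rounding. The regex is an assumption based on the log format shown here; it is not part of the library.

```python
import re

# Timing lines copied from the log output above (average ms per melody)
log_text = """\
10:59:51 - melodic_feature_set - INFO - pitch          :     6.36ms
10:59:51 - melodic_feature_set - INFO - interval       :     1.96ms
10:59:51 - melodic_feature_set - INFO - contour        :     0.78ms
10:59:51 - melodic_feature_set - INFO - duration       :     2.85ms
10:59:51 - melodic_feature_set - INFO - tonality       :     3.04ms
10:59:51 - melodic_feature_set - INFO - narmour        :     0.07ms
10:59:51 - melodic_feature_set - INFO - melodic_movement:     0.45ms
10:59:51 - melodic_feature_set - INFO - mtype          :    25.20ms
10:59:51 - melodic_feature_set - INFO - corpus         :    29.74ms
10:59:51 - melodic_feature_set - INFO - total          :    70.47ms
"""

# Matches "INFO - <group> : <number>ms" as printed in the log above
TIMING_RE = re.compile(r"INFO - (\w+)\s*:\s*([\d.]+)ms")


def parse_timings(text):
    """Return {feature_group: milliseconds} for every timing line in text."""
    return {name: float(ms) for name, ms in TIMING_RE.findall(text)}


timings = parse_timings(log_text)
total = timings.pop("total")

# Print groups from slowest to fastest with their share of the total
for name, ms in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<17}{ms:7.2f} ms  {100 * ms / total:5.1f}%")
```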






In [None]:
import pandas as pd

# Load the feature table written above and display its first five rows
df = pd.read_csv('essen_first_10_features.csv')
df.head()
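
Beyond `df.head()`, it can help to check the size of the table and whether any features came out missing for some melodies. The helper below is a small sketch, not part of the library; `summarise_features` is a hypothetical name, and it assumes only that the output is an ordinary CSV file with one header row.

```python
import pandas as pd


def summarise_features(csv_path):
    """Report the size of a feature table and which columns contain missing values."""
    df = pd.read_csv(csv_path)
    missing = df.isna().sum()  # count of NaN values per column
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "columns_with_missing": missing[missing > 0].to_dict(),
    }
```

For example, `summarise_features('essen_first_10_features.csv')` would report the row and column counts of the table produced above, plus any columns with gaps.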

The feature set has a few customisable aspects that change the behaviour of some of the feature calculations. Customising them is optional: sensible defaults, based on values recommended in the literature, are supplied. For users who want more control over the behaviour of FANTASTIC and IDyOM, however, the `Config` dataclass is provided, together with `IDyOMConfig` and `FantasticConfig` for the toolbox-specific settings:

In [None]:
# Import the Config dataclasses
from melodic_feature_set.features import Config, IDyOMConfig, FantasticConfig

Once we import these dataclasses, we can begin to customise our configuration:

In [None]:
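# Initialise the config object with the relevant parameters
config = Config(
    corpus="../corpora/Essen_Corpus",  # shared by IDyOM and FANTASTIC unless overridden
    idyom={
        # Each IDyOM configuration is stored under a name of your choosing
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],                # predict chromatic pitch
            source_viewpoints=[("cpint", "cpintfref")],  # linked viewpoint: pitch interval and interval from tonic
            ppm_order=1,
            models=":both",                              # combine IDyOM's long- and short-term models
        )
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
    ),
)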
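
The effect of `max_ngram_order` is easiest to see on a toy sequence. Assuming it caps the n-gram length used by the FANTASTIC n-gram (m-type) features, `max_ngram_order=2` would restrict them to unigrams and bigrams. The sketch below counts all n-grams up to a given order in a hypothetical interval sequence. `ngram_counts` is an illustrative helper, not a library function, and FANTASTIC itself builds its m-types from classified pitch-interval and duration-ratio tokens rather than raw intervals.

```python
from collections import Counter


def ngram_counts(symbols, max_order):
    """Count every contiguous n-gram of length 1..max_order in a symbol sequence."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(symbols) - n + 1):
            counts[tuple(symbols[i:i + n])] += 1
    return counts


# Hypothetical pitch-interval sequence (semitones), for illustration only
intervals = [2, 2, -4, 2, 2]
counts = ngram_counts(intervals, max_order=2)
# Unigrams (2,) and (-4,); bigrams (2, 2), (2, -4), (-4, 2); no trigrams
```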

The `corpus` parameter can be set at two levels. If you want to use the same corpus for both FANTASTIC and IDyOM, you need only set it at the top level of `Config()`; there is then no need to supply it to `IDyOMConfig` or `FantasticConfig`.

If you want a different corpus for each toolbox, supply `corpus` to `IDyOMConfig` and/or `FantasticConfig`; a corpus set there overrides the one given at the top level of `Config`.

In [None]:
# Initialise the config object with different corpora
different_corpus_config = Config(
    corpus="../corpora/Essen_Corpus",  # overridden by both toolbox-level corpora below
    idyom={
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],
            source_viewpoints=[("cpint", "cpintfref")],
            ppm_order=1,
            models=":both",
            corpus="../corpora/Trad_Flute_Dataset_Midi",
        )
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
        corpus="../corpora/Essen_Corpus",
    ),
)
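
The override rule described above amounts to a simple fallback: a toolbox-level `corpus` wins, otherwise the top-level value is used. The sketch below expresses that rule with hypothetical stand-in dataclasses (`TopLevelSettings`, `ToolboxSettings`) and a hypothetical `effective_corpus` function. They illustrate the documented behaviour only; they are not the library's `Config` classes or their internals.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins, NOT the library's Config / IDyOMConfig / FantasticConfig
@dataclass
class ToolboxSettings:
    corpus: Optional[str] = None


@dataclass
class TopLevelSettings:
    corpus: Optional[str] = None
    idyom: dict = field(default_factory=dict)
    fantastic: ToolboxSettings = field(default_factory=ToolboxSettings)


def effective_corpus(top: TopLevelSettings, sub: ToolboxSettings) -> Optional[str]:
    """A toolbox-level corpus wins; otherwise fall back to the top-level one."""
    return sub.corpus if sub.corpus is not None else top.corpus


cfg = TopLevelSettings(
    corpus="../corpora/Essen_Corpus",
    idyom={"pitch": ToolboxSettings(corpus="../corpora/Trad_Flute_Dataset_Midi")},
    fantastic=ToolboxSettings(),  # no corpus given: inherits the top-level value
)

# Resolve the corpus each toolbox would actually use
resolved = {name: effective_corpus(cfg, s) for name, s in cfg.idyom.items()}
resolved["fantastic"] = effective_corpus(cfg, cfg.fantastic)
```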

We can also supply multiple IDyOM configurations, each under its own name, which lets us compute information content from different viewpoints or corpora in a single run of the feature set. This can be done as follows:

In [None]:
multi_idyom_config = Config(
    corpus="../corpora/Essen_Corpus",
    idyom={
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],
            source_viewpoints=[("cpint", "cpintfref")],
            ppm_order=1,
            models=":both",
            corpus="../corpora/Trad_Flute_Dataset_Midi",
        ),
        "rhythm": IDyOMConfig(
            target_viewpoints=["onset"],
            source_viewpoints=["ioi"],
            ppm_order=1,
            models=":both",
            corpus="../corpora/Trad_Flute_Dataset_Midi",
        ),
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
        corpus="../corpora/Essen_Corpus",
    ),
)
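
For intuition about what these IDyOM features measure: the information content of an event is its negative log probability given the preceding context, $h(e_i) = -\log_2 p(e_i \mid e_1, \ldots, e_{i-1})$, so less expected events carry more bits. IDyOM estimates these probabilities with PPM-based models; the sketch below swaps in a much simpler add-one-smoothed bigram model (one event of context). It trains that model separately for two hypothetical viewpoint sequences keyed `"pitch"` and `"rhythm"`, mirroring the keys of the `idyom` dictionary above, and reports per-note information content. The toy data and helper names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count symbol-to-symbol transitions across a list of training sequences."""
    counts = defaultdict(Counter)
    alphabet = set()
    for seq in corpus:
        alphabet.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, alphabet


def information_content(seq, counts, alphabet):
    """IC in bits of each event after the first (the first has no context here)."""
    k = len(alphabet | set(seq))  # alphabet size for add-one smoothing
    ics = []
    for prev, nxt in zip(seq, seq[1:]):
        total = sum(counts[prev].values())
        p = (counts[prev][nxt] + 1) / (total + k)
        ics.append(-math.log2(p))
    return ics


# Hypothetical toy data: semitone intervals for "pitch", inter-onset intervals for "rhythm"
viewpoints = {
    "pitch": {"train": [[2, 2, -4, 2], [2, -2, 2, 2]], "test": [2, 2, -2]},
    "rhythm": {"train": [[1, 1, 2], [1, 0.5, 0.5, 1]], "test": [1, 1, 0.5]},
}

results = {}
for name, data in viewpoints.items():
    counts, alphabet = train_bigram(data["train"])
    results[name] = information_content(data["test"], counts, alphabet)
```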

In [None]:
# Now we can get the different IDyOM features along with everything else
get_all_features("../corpora/Trad_Flute_Dataset_Midi", "output2.csv", config=multi_idyom_config)

In [None]:
df = pd.read_csv('output2.csv')
df.head()
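
To see which columns the second run added compared with the first, for example outputs from the extra `"rhythm"` IDyOM configuration, a small helper can compare the two CSV headers. This is a sketch with a hypothetical helper name; the exact column names depend on the library and are not assumed here. The two runs used different input melodies, which does not matter for a header comparison.

```python
import pandas as pd


def extra_columns(base_csv, other_csv):
    """Return, sorted, the column names present in other_csv but not in base_csv."""
    base_cols = pd.read_csv(base_csv, nrows=0).columns    # header only
    other_cols = pd.read_csv(other_csv, nrows=0).columns  # header only
    return sorted(set(other_cols) - set(base_cols))
```

Calling `extra_columns('essen_first_10_features.csv', 'output2.csv')` would list the columns that appear only in the second output.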