## Melody Features

This notebook illustrates how to use the Melodic Feature Set. The feature set is accessed through the top-level function `get_all_features`, which computes a wide range of features for every melody in the input (here, the first ten files of the Essen corpus) and writes a single .csv file containing all melodies and their features.

In [None]:
from melody_features import get_all_features
from melody_features.corpus import get_corpus_files

first_ten_essen = get_corpus_files('essen', max_files=10)

get_all_features(first_ten_essen, 'output1.csv')


In [None]:
import pandas as pd

# Read and display first 5 rows
df = pd.read_csv('output1.csv')
df.head()

A few aspects of the feature set can be customised to change how some features are calculated. You don't have to customise anything, because the defaults are sensible values recommended in the literature. However, for users who want more control over the behaviour of FANTASTIC and IDyOM, the `Config` dataclass is provided:

In [None]:
# Import the Config dataclasses
from melody_features.features import Config, IDyOMConfig, FantasticConfig

Once we import these dataclasses, we can begin to customise our configuration:

In [None]:
# Initialise the config object with the relevant parameters
from melody_features import essen_corpus
config = Config(
    corpus=essen_corpus,
    idyom={"pitch": IDyOMConfig(
        target_viewpoints=["cpitch"],
        source_viewpoints=[("cpint", "cpintfref")],
        ppm_order=1,
        models=":both"
    )},
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5
    )
)

The `corpus` parameter can be set at different levels. If you want to use the same corpus for both FANTASTIC and IDyOM, you only need to set it at the top level of `Config()`; there is no need to supply it to `IDyOMConfig` or `FantasticConfig`.

If you want to use a different corpus for each toolbox, set `corpus` in `IDyOMConfig` and `FantasticConfig` directly: a value supplied there overrides whatever is supplied at the top level of `Config`.

In [None]:
# Initialise the config object with different corpora
different_corpus_config = Config(
    corpus=None,  # no shared corpus; the toolbox-level settings below take precedence
    idyom={"pitch": IDyOMConfig(
        target_viewpoints=["cpitch"],
        source_viewpoints=[("cpint", "cpintfref")],
        ppm_order=1,
        models=":both",
        corpus=None
    )},
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
        corpus=essen_corpus
    )
)

get_all_features(first_ten_essen, "output2.csv", config=different_corpus_config)
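
To make the precedence rule concrete, here is a small stand-alone model of it. The `ToolboxSettings` and `TopLevelSettings` dataclasses and the `resolve_corpus` and `effective_corpora` helpers are hypothetical and not part of `melody_features`. They only mirror the rule described above, and they assume that a toolbox-level `corpus` of `None` means "not set" and therefore falls back to the top-level value. The library may treat an explicit `None` differently, so check its documentation if that distinction matters. The string `"essen"` stands in for the `essen_corpus` object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-ins for Config / IDyOMConfig / FantasticConfig.
# They model only the corpus precedence rule, nothing else.


@dataclass
class ToolboxSettings:
    corpus: Optional[str] = None  # assumption: None means "not set"


@dataclass
class TopLevelSettings:
    corpus: Optional[str] = None
    idyom: Dict[str, ToolboxSettings] = field(default_factory=dict)
    fantastic: ToolboxSettings = field(default_factory=ToolboxSettings)


def resolve_corpus(top_level: Optional[str], toolbox_level: Optional[str]) -> Optional[str]:
    """A toolbox-level corpus, if given, overrides the top-level one."""
    return toolbox_level if toolbox_level is not None else top_level


def effective_corpora(cfg: TopLevelSettings) -> Dict[str, Optional[str]]:
    """Map each named IDyOM configuration and FANTASTIC to the corpus it would use."""
    resolved = {
        f"idyom:{name}": resolve_corpus(cfg.corpus, settings.corpus)
        for name, settings in cfg.idyom.items()
    }
    resolved["fantastic"] = resolve_corpus(cfg.corpus, cfg.fantastic.corpus)
    return resolved


# Mirrors the first Config: one shared corpus, set at the top level only.
shared = TopLevelSettings(corpus="essen", idyom={"pitch": ToolboxSettings()})
print(effective_corpora(shared))  # {'idyom:pitch': 'essen', 'fantastic': 'essen'}

# Mirrors different_corpus_config: only FANTASTIC is given a corpus.
split = TopLevelSettings(
    corpus=None,
    idyom={"pitch": ToolboxSettings(corpus=None)},
    fantastic=ToolboxSettings(corpus="essen"),
)
print(effective_corpora(split))  # {'idyom:pitch': None, 'fantastic': 'essen'}
```

Keeping the rule in one small function (`resolve_corpus`) means the same fallback applies to every named IDyOM configuration as well as to FANTASTIC.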

We can also supply multiple IDyOM configurations, allowing us to compute information content using different 'viewpoints' or corpora in one run of the feature set. This can be achieved like so:

In [None]:
multi_idyom_config = Config(
    corpus=essen_corpus,
    idyom={
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],
            source_viewpoints=[("cpint", "cpintfref")],
            ppm_order=1,
            models=":both",
            corpus=essen_corpus
        ),
        "rhythm": IDyOMConfig(
            target_viewpoints=["onset"],
            source_viewpoints=["ioi"],
            ppm_order=1,
            models=":both",
            corpus=None
        )
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
        corpus=None
    )
)
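
The viewpoint names come from IDyOM. As we understand IDyOM's definitions, `cpitch` is chromatic pitch (a MIDI note number), `cpint` is the chromatic interval from the previous note, `cpintfref` is the chromatic interval from a reference pitch (the tonic) taken modulo 12, `onset` is the onset time, and `ioi` is the inter-onset interval. Grouping `cpint` and `cpintfref` in a tuple presumably requests a linked viewpoint that pairs the two values. The toy sketch below computes these derived values for a made-up five-note melody. The melody, the helper functions, and the choice of C as the tonic are illustrative assumptions; IDyOM derives viewpoints from its own event representation, not from plain lists.

```python
# Toy melody (hypothetical data): MIDI pitches and onset times in beats.
cpitch = [60, 62, 64, 65, 67]  # C4 D4 E4 F4 G4
onset = [0.0, 1.0, 2.0, 2.5, 3.0]
tonic = 60  # assumed reference pitch (C)


def cpint(pitches):
    """Chromatic interval from the previous note; undefined for the first note."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def cpintfref(pitches, reference):
    """Chromatic interval above the reference pitch, reduced modulo 12."""
    return [(p - reference) % 12 for p in pitches]


def ioi(onsets):
    """Time since the previous onset; undefined for the first note."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


intervals = cpint(cpitch)
from_tonic = cpintfref(cpitch, tonic)
# A linked cpint x cpintfref viewpoint pairs the two values note by note.
# cpint only exists from the second note onwards, so align cpintfref with it.
linked = list(zip(intervals, from_tonic[1:]))

print(intervals)   # [2, 2, 1, 2]
print(from_tonic)  # [0, 2, 4, 5, 7]
print(ioi(onset))  # [1.0, 1.0, 0.5, 0.5]
print(linked)      # [(2, 2), (2, 4), (1, 5), (2, 7)]
```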

In [None]:
# Now we can get the different IDyOM features along with everything else
get_all_features(first_ten_essen, "output3.csv", config=multi_idyom_config)

In [None]:
df = pd.read_csv('output3.csv')
df.head()

As well as skipping corpus-dependent features, we can skip IDyOM entirely. IDyOM can be time-consuming to run, so this is worthwhile if you don't intend to use its output:

In [None]:
get_all_features(first_ten_essen, "output3_no_idyom.csv", skip_idyom=True)
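
To see which columns a full run adds, you can compare the header rows of the two output files. The `read_header` and `header_difference` helpers below are a hypothetical sketch that uses only Python's standard `csv` module. Note that `output3.csv` was produced with `multi_idyom_config` while `output3_no_idyom.csv` used the default configuration, so the extra columns are most likely, but not guaranteed to be, the IDyOM ones.

```python
import csv
from typing import List


def read_header(path: str) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def header_difference(full: List[str], reduced: List[str]) -> List[str]:
    """Columns present in `full` but not in `reduced`, in their original order."""
    reduced_set = set(reduced)
    return [column for column in full if column not in reduced_set]
```

For example, `header_difference(read_header("output3.csv"), read_header("output3_no_idyom.csv"))` lists the columns that appear only in the run with IDyOM enabled.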