## Melodic Feature Set

This notebook illustrates how to use the Melodic Feature Set. The feature set is accessed through the top-level function `get_all_features`, which computes a wide range of features for every melody found in the input directory and writes a single .csv file containing all melodies and their features.

In [2]:
from melodic_feature_set.features import get_all_features

get_all_features("../corpora/Trad_Flute_Dataset_Midi", "output.csv")

10:36:10 - melodic_feature_set - INFO - Starting feature extraction job...
10:36:10 - melodic_feature_set - INFO - Generating corpus statistics from: /Users/davidwhyatt/feature_set/corpora/Essen_Corpus
10:36:10 - melodic_feature_set - INFO - Corpus statistics file will be at: Essen_Corpus_corpus_stats.json
10:36:10 - melodic_feature_set - INFO - Existing corpus statistics file found.
10:36:10 - melodic_feature_set - INFO - Corpus statistics loaded successfully.
10:36:10 - melodic_feature_set - INFO - Processing 20 melodies
10:36:10 - melodic_feature_set - INFO - Running IDyOM analysis for 'default_pitch' with corpus: /Users/davidwhyatt/feature_set/corpora/Essen_Corpus
10:36:10 - melodic_feature_set - INFO - Creating temporary MIDI files with detected key signatures for IDyOM processing...
10:36:10 - melodic_feature_set - INFO - Processing 20 MIDI files for key signature detection...
10:36:10 - melodic_feature_set - INFO - Successfully created 20 files in temporary directory
10:36:10 - 

** Putting Test dataset files in experiment history folder. **


10:36:15 - melodic_feature_set - INFO - Setting experiment parameters...
10:36:15 - melodic_feature_set - INFO - Running IDyOM analysis...


** Putting Pretraining dataset files in experiment history folder. **
** running lisp script **
To load "clsql":
  Load 1 ASDF system:
    clsql
; Loading "clsql"
.
To load "idyom":
  Load 1 ASDF system:
    idyom
; Loading "idyom"
..............

Inserting 20 compositions into database: dataset 66081125103615.
| Progress: -----------------------------------------------|
Inserting 8472 compositions into database: dataset 99081125103615.
| Progress: -----------------------------------------------|
Written resampling set to /Users/davidwhyatt/idyom/data/resampling/66081125103615-1.resample.
Written PPM* model to /Users/davidwhyatt/idyom/data/models/cpint-cpintfref_99081125103615_66081125103615-1:1_melody.ppm.


10:37:16 - melodic_feature_set - INFO - Analysis complete!


 
** Finished! **


10:37:17 - melodic_feature_set - INFO - IDyOM processing completed successfully! Output: IDyOM_default_pitch_Results.dat
10:37:17 - melodic_feature_set - INFO - Cleaning up temporary directory: /var/folders/kd/t7g208g97fgcw1v9q3__l3b40000gn/T/idyom_key_gbs8xgz2
10:37:17 - melodic_feature_set - INFO - Starting parallel processing...
10:37:17 - melodic_feature_set - INFO - Using 10 CPU cores


⠙ Processing melodies...

10:37:17 - melodic_feature_set - INFO - Processing complete
10:37:17 - melodic_feature_set - INFO - Total processing time: 0.23 seconds
10:37:17 - melodic_feature_set - INFO - Results written to output.csv
10:37:17 - melodic_feature_set - INFO - Timing Statistics (average milliseconds per melody):
10:37:17 - melodic_feature_set - INFO - pitch          :     0.94ms
10:37:17 - melodic_feature_set - INFO - interval       :     0.40ms
10:37:17 - melodic_feature_set - INFO - contour        :     0.20ms
10:37:17 - melodic_feature_set - INFO - duration       :     0.61ms
10:37:17 - melodic_feature_set - INFO - tonality       :     0.89ms
10:37:17 - melodic_feature_set - INFO - narmour        :     0.01ms
10:37:17 - melodic_feature_set - INFO - melodic_movement:     0.09ms
10:37:17 - melodic_feature_set - INFO - mtype          :     1.92ms
10:37:17 - melodic_feature_set - INFO - corpus         :     2.68ms
10:37:17 - melodic_feature_set - INFO - total          :     7.75ms






In [None]:
import pandas as pd

# Read and display first 5 rows
df = pd.read_csv('output.csv')
df.head()

A few aspects of the feature set can be customised to change how some features are calculated. Customisation is optional, because the defaults are sensible values recommended in the literature. For users who want more control over the behaviour of FANTASTIC and IDyOM, the `Config` dataclass is provided:

In [None]:
# Import the Config dataclasses
from melodic_feature_set.features import Config, IDyOMConfig, FantasticConfig

Once we import these dataclasses, we can begin to customise our configuration:

In [10]:
# Initialise the config object with the relevant parameters
config = Config(
    corpus="../corpora/Essen_Corpus",
    idyom={
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],
            source_viewpoints=[("cpint", "cpintfref")],
            ppm_order=1,
            models=":both",
        )
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
    ),
)

The `corpus` parameter can be set at different levels. To use the same corpus for both FANTASTIC and IDyOM, set it only at the top level of `Config()`; there is then no need to supply it to `IDyOMConfig` or `FantasticConfig`.

To use a different corpus for each toolbox, set `corpus` in `IDyOMConfig` or `FantasticConfig`; a value set there overrides whatever is supplied at the top level of `Config`.

In [11]:
# Initialise the config object with different corpora
different_corpus_config = Config(
    corpus="../corpora/Essen_Corpus",  # will be overridden
    idyom={
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],
            source_viewpoints=[("cpint", "cpintfref")],
            ppm_order=1,
            models=":both",
            corpus="../corpora/Trad_Flute_Dataset_Midi",
        )
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
        corpus="../corpora/Essen_Corpus",
    ),
)
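
The precedence rule described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's implementation: `ToolboxSketch` and `resolve_corpus` are hypothetical names that do not exist in `melodic_feature_set`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolboxSketch:
    """Hypothetical stand-in for IDyOMConfig / FantasticConfig (not the real class)."""
    corpus: Optional[str] = None


def resolve_corpus(top_level: Optional[str], toolbox: ToolboxSketch) -> Optional[str]:
    """Return the toolbox-level corpus if one is set, otherwise the top-level corpus."""
    return toolbox.corpus if toolbox.corpus is not None else top_level


top = "../corpora/Essen_Corpus"

# No toolbox-level corpus: the top-level value is inherited.
inherited = resolve_corpus(top, ToolboxSketch())

# Toolbox-level corpus supplied: it takes precedence over the top level.
overridden = resolve_corpus(top, ToolboxSketch(corpus="../corpora/Trad_Flute_Dataset_Midi"))

print(inherited)
print(overridden)
```

Under this reading, the top-level `corpus` acts purely as a fallback for any toolbox that does not name its own.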

We can also supply multiple IDyOM configurations, allowing us to compute information content using different 'viewpoints' or corpora in a single run of the feature set. Each configuration is added as a named entry in the `idyom` dictionary:

In [12]:
multi_idyom_config = Config(
    corpus="../corpora/Essen_Corpus",
    idyom={
        "pitch": IDyOMConfig(
            target_viewpoints=["cpitch"],
            source_viewpoints=[("cpint", "cpintfref")],
            ppm_order=1,
            models=":both",
            corpus="../corpora/Trad_Flute_Dataset_Midi",
        ),
        "rhythm": IDyOMConfig(
            target_viewpoints=["onset"],
            source_viewpoints=["ioi"],
            ppm_order=1,
            models=":both",
            corpus="../corpora/Trad_Flute_Dataset_Midi",
        ),
    },
    fantastic=FantasticConfig(
        max_ngram_order=2,
        phrase_gap=1.5,
        corpus="../corpora/Essen_Corpus",
    ),
)
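
Judging from the log output and column names shown in this notebook, the keys of the `idyom` dictionary appear to be reused in the names of the IDyOM results files and of the resulting feature columns. The sketch below encodes that observed pattern only; `expected_idyom_names` is a hypothetical helper, not part of the package, and the naming scheme is an assumption inferred from this run rather than a documented guarantee.

```python
def expected_idyom_names(config_name: str) -> dict:
    """Build the names observed in this notebook's output for one idyom dictionary key."""
    return {
        # Pattern seen in the log lines, e.g. "Output: IDyOM_pitch_Results.dat"
        "results_file": f"IDyOM_{config_name}_Results.dat",
        # Pattern seen in the CSV header, e.g. "idyom_pitch_features.mean_information_content"
        "feature_column": f"idyom_{config_name}_features.mean_information_content",
    }


for name in ("pitch", "rhythm"):
    print(name, expected_idyom_names(name))
```

Choosing short, descriptive keys therefore makes the resulting columns easier to find later.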

In [13]:
# Now we can get the different IDyOM features along with everything else
get_all_features("../corpora/Trad_Flute_Dataset_Midi", "output2.csv", config=multi_idyom_config)

10:40:23 - melodic_feature_set - INFO - Starting feature extraction job...
10:40:23 - melodic_feature_set - INFO - Generating corpus statistics from: ../corpora/Essen_Corpus
10:40:23 - melodic_feature_set - INFO - Corpus statistics file will be at: Essen_Corpus_corpus_stats.json
10:40:23 - melodic_feature_set - INFO - Existing corpus statistics file found.
10:40:23 - melodic_feature_set - INFO - Corpus statistics loaded successfully.
10:40:23 - melodic_feature_set - INFO - Processing 20 melodies
10:40:23 - melodic_feature_set - INFO - Running IDyOM analysis for 'pitch' with corpus: ../corpora/Trad_Flute_Dataset_Midi
10:40:23 - melodic_feature_set - INFO - Creating temporary MIDI files with detected key signatures for IDyOM processing...
10:40:23 - melodic_feature_set - INFO - Processing 20 MIDI files for key signature detection...
10:40:23 - melodic_feature_set - INFO - Successfully created 20 files in temporary directory
10:40:23 - melodic_feature_set - INFO - Starting IDyOM...
10:40:

** Putting Test dataset files in experiment history folder. **
** Putting Pretraining dataset files in experiment history folder. **
** running lisp script **
To load "clsql":
  Load 1 ASDF system:
    clsql
; Loading "clsql"
.
To load "idyom":
  Load 1 ASDF system:
    idyom
; Loading "idyom"
..............

Inserting 20 compositions into database: dataset 66081125104023.
| Progress: -----------------------------------------------|
Inserting 20 compositions into database: dataset 99081125104023.
| Progress: -----------------------------------------------|
Written resampling set to /Users/davidwhyatt/idyom/data/resampling/66081125104023-1.resample.
Written PPM* model to /Users/davidwhyatt/idyom/data/models/cpint-cpintfref_99081125104023_66081125104023-1:1_melody.ppm.


10:40:45 - melodic_feature_set - INFO - Analysis complete!
10:40:45 - melodic_feature_set - INFO - IDyOM processing completed successfully! Output: IDyOM_pitch_Results.dat
10:40:45 - melodic_feature_set - INFO - Cleaning up temporary directory: /var/folders/kd/t7g208g97fgcw1v9q3__l3b40000gn/T/idyom_key_3k35dxqi
10:40:45 - melodic_feature_set - INFO - Running IDyOM analysis for 'rhythm' with corpus: ../corpora/Trad_Flute_Dataset_Midi
10:40:45 - melodic_feature_set - INFO - Creating temporary MIDI files with detected key signatures for IDyOM processing...
10:40:45 - melodic_feature_set - INFO - Processing 20 MIDI files for key signature detection...
10:40:45 - melodic_feature_set - INFO - Successfully created 20 files in temporary directory
10:40:45 - melodic_feature_set - INFO - Starting IDyOM...
10:40:45 - melodic_feature_set - INFO - Found 20 MIDI files in pretraining directory: ../corpora/Trad_Flute_Dataset_Midi
10:40:45 - melodic_feature_set - INFO - Found 20 MIDI files in input dir

 
** Finished! **
** Putting Test dataset files in experiment history folder. **
** Putting Pretraining dataset files in experiment history folder. **
** running lisp script **
To load "clsql":
  Load 1 ASDF system:
    clsql
; Loading "clsql"
.
To load "idyom":
  Load 1 ASDF system:
    idyom
; Loading "idyom"
..............

Inserting 20 compositions into database: dataset 66081125104045.
| Progress: -----------------------------------------------|
Inserting 20 compositions into database: dataset 99081125104045.
| Progress: -----------------------------------------------|
Written resampling set to /Users/davidwhyatt/idyom/data/resampling/66081125104045-1.resample.
Written PPM* model to /Users/davidwhyatt/idyom/data/models/ioi_99081125104045_66081125104045-1:1_melody.ppm.


10:41:04 - melodic_feature_set - INFO - Analysis complete!
10:41:04 - melodic_feature_set - INFO - IDyOM processing completed successfully! Output: IDyOM_rhythm_Results.dat
10:41:04 - melodic_feature_set - INFO - Cleaning up temporary directory: /var/folders/kd/t7g208g97fgcw1v9q3__l3b40000gn/T/idyom_key_di9boiom
10:41:04 - melodic_feature_set - INFO - Starting parallel processing...
10:41:04 - melodic_feature_set - INFO - Using 10 CPU cores


 
** Finished! **
⠙ Processing melodies...

10:41:05 - melodic_feature_set - INFO - Processing complete
10:41:05 - melodic_feature_set - INFO - Total processing time: 0.23 seconds
10:41:05 - melodic_feature_set - INFO - Results written to output2.csv
10:41:05 - melodic_feature_set - INFO - Timing Statistics (average milliseconds per melody):
10:41:05 - melodic_feature_set - INFO - pitch          :     0.90ms
10:41:05 - melodic_feature_set - INFO - interval       :     0.40ms
10:41:05 - melodic_feature_set - INFO - contour        :     0.19ms
10:41:05 - melodic_feature_set - INFO - duration       :     0.61ms
10:41:05 - melodic_feature_set - INFO - tonality       :     0.97ms
10:41:05 - melodic_feature_set - INFO - narmour        :     0.01ms
10:41:05 - melodic_feature_set - INFO - melodic_movement:     0.09ms
10:41:05 - melodic_feature_set - INFO - mtype          :     1.76ms
10:41:05 - melodic_feature_set - INFO - corpus         :     1.94ms
10:41:05 - melodic_feature_set - INFO - total          :     6.86ms






In [14]:
df = pd.read_csv('output2.csv')
df.head()

Unnamed: 0,melody_num,melody_id,pitch_features.pitch_range,pitch_features.pitch_standard_deviation,pitch_features.pitch_entropy,pitch_features.pcdist1,pitch_features.basic_pitch_histogram,pitch_features.mean_pitch,pitch_features.most_common_pitch,pitch_features.number_of_pitches,...,mtype_features.mean_productivity,corpus_features.tfdf_spearman,corpus_features.tfdf_kendall,corpus_features.mean_log_tfdf,corpus_features.norm_log_dist,corpus_features.max_log_df,corpus_features.min_log_df,corpus_features.mean_log_df,idyom_pitch_features.mean_information_content,idyom_rhythm_features.mean_information_content
0,1,allemande_fifth_fragment.midi,31,6.373207,4.122412,"{0.0: 0.13131313131313133, 1.0: 0.020202020202...","{'62.00-63.29': 1, '63.29-64.58': 4, '64.58-65...",76.363636,81,24,...,0.464286,0.467493,0.418718,0.008151,0.077266,8.998755,5.808142,7.554577,4.843572,0.139864
1,2,allemande_first_fragment.midi,26,5.538038,3.896842,"{0.0: 0.1640625, 2.0: 0.1171875, 4.0: 0.179687...","{'62.00-63.24': 1, '63.24-64.48': 4, '64.48-65...",74.632812,76,21,...,0.40678,0.778952,0.584685,0.010012,0.077126,8.998755,2.772589,7.334746,4.20457,0.113777
2,3,allemande_fourth_fragment.midi,21,4.814213,3.549268,"{0.0: 0.10416666666666666, 1.0: 0.020833333333...","{'62.00-63.40': 4, '63.40-64.80': 1, '64.80-66...",73.895833,74,15,...,0.054054,0.610884,0.422577,0.009245,0.103267,8.998755,5.808142,7.937652,3.840379,0.134007
3,4,allemande_second_fragment.midi,21,4.90544,3.883993,"{0.0: 0.08547008547008547, 1.0: 0.008547008547...","{'63.00-64.11': 2, '64.11-65.21': 0, '65.21-66...",75.067961,76,19,...,0.567164,0.091308,0.064223,0.009246,0.056269,8.998755,2.772589,7.663928,4.830534,0.353821
4,5,allemande_third_fragment.midi,20,4.728223,3.842208,"{0.0: 0.07500000000000001, 1.0: 0.025, 2.0: 0....","{'63.00-64.00': 2, '64.00-65.00': 2, '65.00-66...",72.7625,76,20,...,0.304348,0.763269,0.585323,0.009778,0.08414,8.998755,5.808142,7.749487,4.301585,0.146339
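
The column names in the output follow a `family.feature` pattern (for example `pitch_features.pitch_range`), which makes it straightforward to pull out one family of features at a time. The following is a minimal sketch in plain Python; `group_columns_by_family` is a hypothetical helper written for this notebook, and the column list is a hand-picked subset of the header shown above.

```python
from collections import defaultdict


def group_columns_by_family(columns):
    """Group column names by the text before the first dot; skip columns without a dot."""
    families = defaultdict(list)
    for column in columns:
        if "." in column:
            family, _feature = column.split(".", 1)
            families[family].append(column)
    return dict(families)


# A subset of the column names from output2.csv, copied from the header above.
columns = [
    "melody_num",
    "melody_id",
    "pitch_features.pitch_range",
    "pitch_features.pitch_entropy",
    "corpus_features.mean_log_df",
    "idyom_pitch_features.mean_information_content",
    "idyom_rhythm_features.mean_information_content",
]

families = group_columns_by_family(columns)
for family, members in families.items():
    print(family, members)
```

With the real DataFrame, the same helper could be called as `group_columns_by_family(df.columns)`, and one family selected with `df[families["pitch_features"]]`.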
