# CRIM Intervals:  Presentation Types

### Note: Still Under Development!

#### Import Music Files

* If you are exploring pieces from CRIM, importing simply involves providing the CRIM URL of the MEI file:  
    * **`piece = importScore('https://crimproject.org/mei/CRIM_Model_0008.mei')`**

* But you can also use the Notebook with any MEI, MusicXML, or MIDI file of your own. When you run the Notebooks on Jupyter Hub, you will find a folder called **`Music_Files`**.  Upload the file there, then provide the path to that file: 
    * **`piece = importScore('Music_Files/My_File_Name.mei')`**.  
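Since `importScore` is shown above taking either a URL or a local path, a small helper can catch a mistyped file name before the import runs. The sketch below is only an illustration: `resolve_score_source` is a hypothetical name, not part of CRIM Intervals, and it assumes local files live in the `Music_Files` folder mentioned above.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_score_source(source, music_dir="Music_Files"):
    """Return a URL unchanged, or the checked path of a file inside music_dir."""
    # anything with an http(s) scheme is treated as a remote score
    if urlparse(source).scheme in ("http", "https"):
        return source
    # otherwise treat it as a file name inside the local music folder
    path = Path(music_dir) / source
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path.as_posix()
```

The returned string can then be handed to the import call, for example `piece = importScore(resolve_score_source('My_File_Name.mei'))`.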

#### Save outputs as CSV or Excel

* The Jupyter Hub version of these Notebooks also provides a folder called **`saved_csv`**.  You can save **csv** files of any data frame there with this command: 
    * **`notebook_data_frame_name.to_csv('saved_csv/your_file_title.csv')`**.
* If you prefer **Excel** documents (which are better for anything with a complex set of columns or a hierarchical index), use **ExcelWriter**.  First create the writer, giving it the path of the new file:
    * **`writer = pd.ExcelWriter('saved_csv/file_name.xlsx', engine='xlsxwriter')`**
* Now convert your dataframe to Excel:
    * **`frame_name.to_excel(writer, sheet_name='Sheet1')`**
* And finally save the new file to the folder here in the Notebook (older versions of pandas used `writer.save()`, which was removed in pandas 2.0):
    * **`writer.close()`**

Paste the following code into a new cell and update `frame_name` and `file_name`:

`writer = pd.ExcelWriter('saved_csv/file_name.xlsx', engine='xlsxwriter')` <br>
`frame_name.to_excel(writer, sheet_name='Sheet1')` <br>
`writer.close()` <br>


## A. Import Intervals and Other Code

* The first step is to import all the code required for the Notebook
* **`arrow/run`** or **`Shift + Enter`** in the following cell:

In [45]:
import intervals
from intervals import * 
import intervals.visualizations as viz
import pandas as pd
import re
import altair as alt 
from ipywidgets import interact
from pandas.io.json import json_normalize
from pyvis.network import Network
from IPython.display import display
import requests
import os
import numpy

# Folder for saved outputs; change 'saved_csv' to your preferred folder.
MYDIR = ("saved_csv")
CHECK_FOLDER = os.path.isdir(MYDIR)

# If folder doesn't exist, then create it.
if not CHECK_FOLDER:
    os.makedirs(MYDIR)
    print("created folder : ", MYDIR)

else:
    print(MYDIR, "folder already exists.")
    

        

saved_csv folder already exists.
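The folder check above can also be written with `pathlib`, which creates the folder only when it is missing. This is a minimal sketch rather than a replacement for the cell above; `ensure_folder` is a hypothetical helper name.

```python
from pathlib import Path


def ensure_folder(name="saved_csv"):
    """Create the folder if needed; return its path and whether it was newly created."""
    folder = Path(name)
    already_there = folder.is_dir()
    # exist_ok=True makes this safe to run more than once
    folder.mkdir(parents=True, exist_ok=True)
    return folder, not already_there
```

Calling `ensure_folder()` twice in a row returns `True` for the second value the first time (if the folder was missing) and `False` the second time.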


In [46]:
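# New version to include in main_objs code
def find_entry_int_distance(coordinates, piece: intervals.main_objs.ImportedPiece):
    """Return the directed interval names between the first notes of successive entries.

    coordinates is a list of (offset, voice) tuples, one per entry, in order of entry.
    """
    # look up the note sounding at each (offset, voice) coordinate
    all_tones = piece.getNoteRest()
    tone_list = [all_tones.loc[item] for item in coordinates]

    # `note` and `interval` are assumed to be the music21 modules made available by the star import above
    noteObjects = [note.Note(tone) for tone in tone_list]
    _ints = [interval.Interval(noteObjects[i], noteObjects[i + 1]) for i in range(len(noteObjects) - 1)]

    # keep only the directed name of each interval, e.g. 'P-8'
    return [_int.directedName for _int in _ints]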
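To make the idea of an entry interval concrete without loading a score, here is a minimal pure-Python sketch that computes only the directed interval *number* (so `-8` rather than `P-8`) between successive entry notes. It is not the music21 calculation used in the function above: it ignores interval quality and accidentals, and the note names are invented for illustration.

```python
STEP_NAMES = "CDEFGAB"


def diatonic_position(name):
    """Count diatonic steps above C0 for a natural note name such as 'G4'."""
    letter, octave = name[0].upper(), int(name[-1])
    return octave * 7 + STEP_NAMES.index(letter)


def directed_interval_number(first, second):
    """Directed generic interval: 1 for a unison, -8 for an octave down."""
    steps = diatonic_position(second) - diatonic_position(first)
    return steps + 1 if steps >= 0 else steps - 1


def entry_interval_numbers(entry_notes):
    """Intervals between each entry note and the next one."""
    return [directed_interval_number(a, b) for a, b in zip(entry_notes, entry_notes[1:])]


# hypothetical first notes of four entries, in order of entry
print(entry_interval_numbers(["G4", "G3", "G3", "G2"]))  # [-8, 1, -8]
```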
        
        


## B. Import a Piece and Run the Imitation Classifier

In [47]:
piece = importScore('https://crimproject.org/mei/CRIM_Model_0008.mei')
points = piece.getPoints()


Memoized piece detected.
Memoized piece detected.
Finding close matches...
137 melodic intervals had more than 3 exact or close matches.



### View the Results with Each Voice Entry as a Row

* The full results present each voice entry as a single row, including many details about the voice, offset, total duration, and relationship to others in the same "sub-group" of entries, which in turn form the basis of the Predicted Type.
* The system also offers `close` as well as `exact` matches:  two columns show the differences between these, which are helpful in the case of flexed entries.
* See the interactive version of this engine below for options to change the many settings that determine the results.

Run `points.head()` below to see the first five rows of the result.

In [48]:
points.head()

Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,end_measure,end_beat,start_offset,end_offset,note_durations,ema,ema_url,sum_durs,group_number,prev_entry_off,next_entry_off,is_first,is_last,last_off_diff,next_off_diff,parallel,forward_gapped,back_gapped,singleton,split_group,combined_group,sub_group_id,predicted_type,entry_number,start
0,"(4, 1, 2, 2)","(4, 1, 2, 2)",Ave Maria,[Superius],3,3.0,0.0,20.0,"(4.0, 8.0, 4.0, 4.0, 4.0)","1-3/1/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,24.0,142,472.0,16.0,True,False,472.0,16.0,False,False,False,False,False,True,293.0,PEN,1,1/1.0
5,"(1, 2, 2, -3)","(1, 2, 2, -3)",Ave Maria,[Superius],4,1.0,4.0,24.0,"(8.0, 4.0, 4.0, 4.0, 8.0)","1-4/1/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,28.0,87,804.0,4.0,True,False,800.0,0.0,False,False,False,False,False,True,188.0,PEN,1,1/3.0
2,"(4, 1, 2, 2)","(4, 1, 2, 2)",Ave Maria,Altus,5,3.0,16.0,36.0,"(4.0, 8.0, 4.0, 4.0, 4.0)","3-5/2/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,24.0,142,0.0,32.0,False,False,16.0,16.0,False,False,False,False,False,False,293.0,PEN,2,3/1.0
13,"(1, 2, 2, -3)","(1, 2, 2, -3)",Ave Maria,Altus,6,1.0,20.0,40.0,"(8.0, 4.0, 4.0, 4.0, 8.0)","3-6/2/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,28.0,87,4.0,20.0,False,False,16.0,0.0,False,False,False,False,False,False,188.0,PEN,2,3/3.0
3,"(4, 1, 2, 2)","(4, 1, 2, 2)",Ave Maria,Tenor,7,3.0,32.0,52.0,"(4.0, 8.0, 4.0, 4.0, 4.0)","5-7/3/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,24.0,142,16.0,48.0,False,False,16.0,16.0,False,False,False,False,False,False,293.0,PEN,3,5/1.0


#### B.2 Finding Time and Entry Interval Details

* Here we run an additional set of filters that show each Presentation Type as a **single row** in a dataframe.

    * This is temporary code, but please run it.  The results are shown in a subsequent cell.

In [49]:

df3 = points

# make lists of voices and offsets and put them in dfs to be merged

voice_groups = df3.groupby('sub_group_id')['part'].apply(list)
voice_groups_df = voice_groups.reset_index(name = 'voice_list')

offset_groups = df3.groupby('sub_group_id')['start_offset'].apply(list)
offset_groups_df = offset_groups.reset_index(name = 'offset_list')

meas_beat_groups = df3.groupby('sub_group_id')['start'].apply(list)
meas_beat_groups_df = meas_beat_groups.reset_index(name = 'measure_beat_list')


# keep just the first entry in each point
df3 = df3[df3["entry_number"] == 1]

# merge the dfs of offsets and voices into the main list of points
# zip the lists of offsets and voices together as tuples
# find the offset = time intervals

df5 = pd.merge(df3, voice_groups_df, how='inner', on='sub_group_id')

df6 = pd.merge(df5, offset_groups_df, how='inner', on='sub_group_id')

df7 = pd.merge(df6, meas_beat_groups_df, how='inner', on='sub_group_id')

df7["time_intervals"] = df7['offset_list'].apply(lambda x: numpy.diff(x))

df8 = df7[["piece_title", "pattern_matched", "predicted_type", 'start_offset', "measure_beat_list", "offset_list", "time_intervals", "voice_list"]].copy()

df8['tone_coordinates'] = df8.apply(lambda row: list(zip(row.offset_list, row.voice_list)), axis='columns')

# apply the function to find the entry intervals
# remove columns no longer needed

df8['entry_intervals'] = df8['tone_coordinates'].apply(lambda x: find_entry_int_distance(x, piece))
classified_with_intervals =  df8.drop(columns=['offset_list', 'tone_coordinates'])
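The grouping steps in the cell above can be tried on a small table without importing a piece. The sketch below uses invented rows (the column names follow the `points` dataframe, the values are hypothetical) to show how one row per entry becomes one row per point, with lists of voices and offsets and the time intervals between entries.

```python
import numpy as np
import pandas as pd

# hypothetical entries: two points of imitation, identified by sub_group_id
entries = pd.DataFrame({
    "sub_group_id": [293.0, 293.0, 293.0, 188.0, 188.0],
    "part": ["Superius", "Altus", "Tenor", "Superius", "Altus"],
    "start_offset": [0.0, 16.0, 32.0, 4.0, 20.0],
    "entry_number": [1, 2, 3, 1, 2],
})

# collect the voices and offsets of each point into lists
voices = entries.groupby("sub_group_id")["part"].apply(list).reset_index(name="voice_list")
offsets = entries.groupby("sub_group_id")["start_offset"].apply(list).reset_index(name="offset_list")

# keep the first entry of each point, then attach the lists
first_entries = entries[entries["entry_number"] == 1]
one_row_per_point = first_entries.merge(voices, on="sub_group_id").merge(offsets, on="sub_group_id")

# differences between successive offsets are the time intervals between entries
one_row_per_point["time_intervals"] = one_row_per_point["offset_list"].apply(
    lambda x: np.diff(x).tolist()
)
print(one_row_per_point[["sub_group_id", "voice_list", "time_intervals"]])
```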

#### B.2.1. View the Results of Classified with Time and Entry Intervals

* The results now group each "point" as a single row in a dataframe.  The rows include information about:
    * `pattern_matched` = the soggetto
    * `predicted_type` = the presentation type
    * `voice_list` = the voices involved, in order of entry
    * `measure_beat_list` = where each entry begins
    * `time_intervals` = the time (in offsets) between successive entries
    * `entry_intervals` = the melodic interval between the first notes of successive entries (for example `P-8` or `P1`)
    * other information needed only for these calculations


In [50]:
classified_with_intervals = df8.iloc[:, [0, 1, 2, 7, 4, 6, 9, 3, 5, 8]]
classified_with_intervals.head()

Unnamed: 0,piece_title,pattern_matched,predicted_type,voice_list,measure_beat_list,time_intervals,entry_intervals,start_offset,offset_list,tone_coordinates
0,Ave Maria,"(4, 1, 2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[1/1.0, 3/1.0, 5/1.0, 7/1.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",0.0,"[0.0, 16.0, 32.0, 48.0]","[(0.0, [Superius]), (16.0, Altus), (32.0, Teno..."
1,Ave Maria,"(1, 2, 2, -3)",PEN,"[[Superius], Altus, Tenor, Bassus]","[1/3.0, 3/3.0, 5/3.0, 7/3.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",4.0,"[4.0, 20.0, 36.0, 52.0]","[(4.0, [Superius]), (20.0, Altus), (36.0, Teno..."
2,Ave Maria,"(-2, -2, -2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[8/1.0, 10/1.0, 12/1.0, 14/1.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",56.0,"[56.0, 72.0, 88.0, 104.0]","[(56.0, [Superius]), (72.0, Altus), (88.0, Ten..."
3,Ave Maria,"(-2, -2, 2, -2)",PEN,"[[Superius], Altus, Tenor]","[8/4.0, 10/4.0, 12/4.0]","[16.0, 16.0]","[P-8, P1]",62.0,"[62.0, 78.0, 94.0]","[(62.0, [Superius]), (78.0, Altus), (94.0, Ten..."
4,Ave Maria,"(2, -2, 4, -2)",PEN,"[[Superius], Altus, Tenor]","[9/3.0, 11/3.0, 13/3.0]","[16.0, 16.0]","[P-8, P1]",68.0,"[68.0, 84.0, 100.0]","[(68.0, [Superius]), (84.0, Altus), (100.0, Te..."


#### B.2.2 We can abbreviate this still further, dropping columns not needed by analysts

To save the abbreviated view as CSV (after running the cell below):

    title = classified_with_intervals_brief['piece_title'][0]
    fn = 'saved_csv/' + title + '_Brief_Points.csv'
    classified_with_intervals_brief.to_csv(fn)

In [51]:
classified_with_intervals_brief = df8.iloc[:, [0, 1, 2, 7, 4, 6, 9, 3]]
classified_with_intervals_brief.head()


Unnamed: 0,piece_title,pattern_matched,predicted_type,voice_list,measure_beat_list,time_intervals,entry_intervals,start_offset
0,Ave Maria,"(4, 1, 2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[1/1.0, 3/1.0, 5/1.0, 7/1.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",0.0
1,Ave Maria,"(1, 2, 2, -3)",PEN,"[[Superius], Altus, Tenor, Bassus]","[1/3.0, 3/3.0, 5/3.0, 7/3.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",4.0
2,Ave Maria,"(-2, -2, -2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[8/1.0, 10/1.0, 12/1.0, 14/1.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",56.0
3,Ave Maria,"(-2, -2, 2, -2)",PEN,"[[Superius], Altus, Tenor]","[8/4.0, 10/4.0, 12/4.0]","[16.0, 16.0]","[P-8, P1]",62.0
4,Ave Maria,"(2, -2, 4, -2)",PEN,"[[Superius], Altus, Tenor]","[9/3.0, 11/3.0, 13/3.0]","[16.0, 16.0]","[P-8, P1]",68.0


In [28]:
# Check the total number of rows (one row per point of imitation)
len(classified_with_intervals_brief)

105

In [42]:
# Save to CSV
title = classified_with_intervals_brief['piece_title'][0]
fn = 'saved_csv/' + title + '_Brief_Points.csv'
classified_with_intervals_brief.to_csv(fn)
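Piece titles can contain spaces or characters that are awkward in file names. As a hedged alternative to joining the strings by hand, the following sketch builds the path with `pathlib` and replaces anything other than letters, digits, underscores, and hyphens. `csv_path_for` is a hypothetical helper, and note that it produces `Ave_Maria_...` rather than the `Ave Maria_...` name created by the cell above.

```python
import re
from pathlib import Path


def csv_path_for(title, suffix, folder="saved_csv"):
    """Build folder/<safe title>_<suffix>.csv from a piece title."""
    # collapse runs of unsafe characters into a single underscore
    safe_title = re.sub(r"[^\w\-]+", "_", title.strip()).strip("_")
    return Path(folder) / f"{safe_title}_{suffix}.csv"


print(csv_path_for("Ave Maria", "Brief_Points").as_posix())
# saved_csv/Ave_Maria_Brief_Points.csv
```

The result can be passed straight to `to_csv`, for example `classified_with_intervals_brief.to_csv(csv_path_for(title, 'Brief_Points'))`.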


### C. Filter the Points of Imitation According to the Nearby Cadences

* This method assumes that you have already created the 'abbreviated view' seen in the previous block:  `classified_with_intervals_brief`
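* If the point begins within 8 beats (offsets) of the end of any cadence, keep it.  Otherwise, omit it.  The first point in the piece is always kept.
* The threshold around the cadence is adjustable by changing the two values of `8` in these lines:  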

        ln = item - 8
        un = item + 8

In [52]:
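# NB: `classified_with_intervals_brief` above was built from CRIM_Model_0008;
# use the same piece here if the cadences and points should match.
piece = importScore('https://crimproject.org/mei/CRIM_Model_0009.mei')
cads = piece.classifyCadences()
# points = piece.getPoints()

# round the cadence offsets so they can be compared with whole-number start offsets
cads_index_rounded = [round(x) for x in cads.index.to_list()]

# collect every offset within the threshold around each cadence
final_filter_list = []
for item in cads_index_rounded:
    ln = item - 8
    un = item + 8
    for x in range(ln, un):
        final_filter_list.append(x)

filtered_pts = classified_with_intervals_brief.loc[classified_with_intervals_brief['start_offset'].isin(final_filter_list)]

# always keep the first point of the piece
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
first_event = classified_with_intervals_brief.iloc[[0]]
filtered_pts = pd.concat([filtered_pts, first_event])
filtered_pts.sort_index(inplace=True)
filtered_pts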



Memoized piece detected.


Unnamed: 0,piece_title,pattern_matched,predicted_type,voice_list,measure_beat_list,time_intervals,entry_intervals,start_offset
0,Ave Maria,"(4, 1, 2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[1/1.0, 3/1.0, 5/1.0, 7/1.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",0.0
2,Ave Maria,"(-2, -2, -2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[8/1.0, 10/1.0, 12/1.0, 14/1.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",56.0
3,Ave Maria,"(-2, -2, 2, -2)",PEN,"[[Superius], Altus, Tenor]","[8/4.0, 10/4.0, 12/4.0]","[16.0, 16.0]","[P-8, P1]",62.0
4,Ave Maria,"(2, -2, 4, -2)",PEN,"[[Superius], Altus, Tenor]","[9/3.0, 11/3.0, 13/3.0]","[16.0, 16.0]","[P-8, P1]",68.0
8,Ave Maria,"(1, 1, 2, 2)",PEN,"[[Superius], Altus, Tenor, Bassus]","[16/3.0, 18/3.0, 20/3.0, 22/3.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",124.0
9,Ave Maria,"(1, 2, 2, -3)",PEN,"[[Superius], Altus, Tenor, Bassus]","[17/2.0, 19/2.0, 21/2.0, 23/2.0]","[16.0, 16.0, 16.0]","[P-8, P1, P-8]",130.0
11,Ave Maria,"(-2, 2, 2, 2)",Fuga,"[Altus, Altus, Bassus]","[24/1.0, 28/1.5, 29/1.0]","[33.0, 7.0]","[P4, P-11]",184.0
12,Ave Maria,"(2, 2, 2, 2)",Fuga,"[Altus, Altus]","[24/3.0, 25/1.0]",[4.0],[M2],188.0
14,Ave Maria,"(-2, -2, -2, 2)",Fuga,"[Altus, [Superius]]","[27/4.5, 28/4.0]",[7.0],[M7],215.0
15,Ave Maria,"(-2, -2, 2, 2)",Fuga,"[Altus, Tenor]","[28/1.0, 28/3.5]",[5.0],[M-2],216.0
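The cell above builds a list of whole-number offsets and tests membership in it, so a point whose `start_offset` is fractional would not match. A minimal alternative sketch compares the numbers directly. It is not the notebook's method: `near_cadence` is a hypothetical function, the cadence offsets are not rounded, and the values below are invented.

```python
def near_cadence(start_offset, cadence_offsets, before=8, after=8):
    """True if start_offset lies in [cadence - before, cadence + after) for any cadence."""
    return any(c - before <= start_offset < c + after for c in cadence_offsets)


# hypothetical cadence offsets and point start offsets
cadences = [60.0, 128.0]
starts = [0.0, 56.0, 62.5, 68.0, 124.0, 184.0]
kept = [s for s in starts if near_cadence(s, cadences)]
print(kept)  # [56.0, 62.5, 124.0]
```

The half-open window mirrors `range(ln, un)` in the cell above, which includes `item - 8` but stops before `item + 8`.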


#### C.2 Save Presentation Types Filtered by Cadence:

    title = filtered_pts['piece_title'][0]
    fn = 'saved_csv/' + title + '_Brief_Points_Cadence_Filter.csv'
    filtered_pts.to_csv(fn)

In [43]:
title = filtered_pts['piece_title'][0]
fn = 'saved_csv/' + title + '_Brief_Points_Cadence_Filter.csv'
filtered_pts.to_csv(fn)

### D. Get Presentation Types with Interact

* Here we use the same `interactive` feature found in some of our other tools.  There are many options!
    * **interval_type** = `generic` means diatonic; `semitone` means chromatic
    * **vector_size** = the number of intervals in the soggetto (similar to ngram size in previous methods)
    * **match_type** = `exact` means soggetti must be exactly the same melodic intervals; `close` means they can differ by the threshold
    * **close_distance** = if you select `close` above, this represents the total intervallic distance among the soggetti.  The comparison is made on a side-by-side basis:  a soggetto of `4, 1, -2` would differ from a soggetto of `3, 1, -2` by a total of 1.  Note that diatonic vs chromatic representations of this will give rather different results, since `-2` vs `2` in diatonic space is **not** a difference of a fourth, but just a third.
    * **min_exact_matches** or **min_close_matches** = the minimum number of soggetti required.  This is the minimum across the entire piece, not in the point of imitation (since some entries will be filtered out if they are too short, or too far from others)
    * **duration_type** =  `real` uses the actual durations; `incremental` samples only at the chosen time scale (thus:  every semibreve, etc).  The numbers represent offsets: 1 is breve, 2 is semi-breve, etc.
    * **min_sum_durations** and **max_sum_durations** = the total (in offsets) of a given soggetto before it is included as a match.  This helps to filter out very short or very long soggetti
    * Other limits are best left as their default values, but relate to the length (in offsets) before the system assumes that the next entries are part of a new point of imitation (we would not assume, for instance, that a single fuga had a gap of 10 bars between entries)
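The side-by-side comparison described for **close_distance** can be sketched in a few lines. This is an illustration of the description above using plain arithmetic on the interval numbers, not the library's own code, and the function names are hypothetical.

```python
def soggetto_distance(a, b):
    """Sum of the absolute differences between two soggetti, position by position."""
    if len(a) != len(b):
        raise ValueError("soggetti must have the same number of intervals")
    return sum(abs(x - y) for x, y in zip(a, b))


def is_close_match(a, b, close_distance=1):
    """True if the total difference does not exceed the chosen threshold."""
    return soggetto_distance(a, b) <= close_distance


print(soggetto_distance((4, 1, -2), (3, 1, -2)))  # 1
print(is_close_match((4, 1, -2), (3, 1, -2)))     # True
```

With plain subtraction, `-2` against `2` counts as 4, which is the caveat about diatonic numbering raised in the list above.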


In [53]:
@interact
def get_points(interval_type=["generic", "semitone"], vector_size=[5, 2, 3, 4, 6, 7], match_type=["close", "exact"], close_distance=[1, 2, 3, 4, 5, 6], min_exact_matches=[3, 2, 4, 5, 6], min_close_matches=[3, 2, 4, 5, 6], duration_type=['real', 'incremental'], increment_size=[4, 2, 1], min_sum_durations=[10, 5, 20], max_sum_durations=[30, 50, 15], forward_gap_limit=[40, 20, 10], backward_gap_limit=[40, 20, 10], offset_difference_limit=[500, 100, 50]):
    points = piece.getPoints(duration_type=duration_type, interval_type=interval_type, match_type=match_type, min_exact_matches=min_exact_matches, min_close_matches=min_close_matches, close_distance=close_distance, vector_size=vector_size, increment_size=increment_size, forward_gap_limit=forward_gap_limit, backward_gap_limit=backward_gap_limit, min_sum_durations=min_sum_durations, max_sum_durations=max_sum_durations, offset_difference_limit=offset_difference_limit)
    return points


### E. Batch Search for Presentation Types and Save CSV

* List the URL or path of each piece in quotation marks, separated by commas.
* Note that for the moment this will produce the full results as shown for `getPoints` above (one row per voice entry, not condensed into a single row per point).

**Now access the results:**  

* First:  `list_of_dfs[0]`
* Second: `list_of_dfs[1]`

**Or view them all:**

    for result in list_of_dfs:
        print(result)

**Or save all as CSV:**

    for result in list_of_dfs:
        title = result["piece_title"][0]
        # print(title)
        fn = 'saved_csv/' + title + '_Presentation_Types.csv'
        result.to_csv(fn)


In [29]:
list_of_pieces = ['https://crimproject.org/mei/CRIM_Mass_0014_3.mei',
                  'https://crimproject.org/mei/CRIM_Model_0009.mei']
corpus = CorpusBase(list_of_pieces)
func = ImportedPiece.getPoints  # <- NB there are no parentheses here
list_of_dfs = corpus.batch(func)

for result in list_of_dfs:
    title = result["piece_title"][0]
    # print(title)
    fn = 'saved_csv/' + title + '_Presentation_Types.csv'
    result.to_csv(fn)

Downloading remote score...
Successfully imported https://crimproject.org/mei/CRIM_Mass_0014_3.mei
Memoized piece detected.
Memoized piece detected.
Finding close matches...
223 melodic intervals had more than 3 exact or close matches.

Memoized piece detected.
Finding close matches...
60 melodic intervals had more than 3 exact or close matches.
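The note about leaving off the parentheses in `func = ImportedPiece.getPoints` may be easier to see with a toy example. Everything below is hypothetical (`ToyPiece`, `count_entries`, and this `batch` function are stand-ins, not the CRIM Intervals classes): the point is only that the method itself is passed along and called later, once per piece.

```python
class ToyPiece:
    """Hypothetical stand-in for an imported piece."""

    def __init__(self, title, entries):
        self.title = title
        self.entries = entries

    def count_entries(self):
        return len(self.entries)


def batch(pieces, func):
    """Apply func to every piece and collect the results in order."""
    return [func(piece) for piece in pieces]


toy_corpus = [ToyPiece("First", ["S", "A", "T"]), ToyPiece("Second", ["S", "A"])]

func = ToyPiece.count_entries      # the method itself: no parentheses
print(batch(toy_corpus, func))     # [3, 2]
```

Writing `ToyPiece.count_entries()` with parentheses would try to call the method immediately, with no piece to work on, instead of handing it to `batch`.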

