## Search with Options

- Piece or Corpus
- Actual or Incremental Durations
- Chromatic or Diatonic
- Exact or Close
- Classify

***


In [23]:
from crim_intervals import *
import pandas as pd
import ast
import matplotlib
from itertools import tee, combinations
import numpy as np

### Short Corpus

In [24]:
# work_list = ['CRIM_Mass_0002_1.mei',
#  'CRIM_Mass_0002_2.mei',
#  'CRIM_Mass_0002_3.mei',
#  'CRIM_Mass_0002_4.mei',
#  'CRIM_Mass_0002_5.mei',
# 'CRIM_Model_0001.mei']

work_list = ['CRIM_Mass_0005_1.mei']

## Load File and Correct the MEI Metadata

In [25]:
work_list = [el.replace("CRIM_", "https://crimproject.org/mei/MEI_4.0/CRIM_") for el in work_list]
corpus = CorpusBase(work_list)

import xml.etree.ElementTree as ET
import requests

MEINSURI = 'http://www.music-encoding.org/ns/mei'
MEINS = '{%s}' % MEINSURI

for i, path in enumerate(work_list):
    
    try:
        if path[0] == '/':
            mei_doc = ET.parse(path)
        else:
            mei_doc = ET.fromstring(requests.get(path).text)

        # Find the title in the MEI file and update the Music21 Score metadata
        title = mei_doc.find('mei:meiHead//mei:titleStmt/mei:title', namespaces={"mei": MEINSURI}).text
        print(path, title)
        corpus.scores[i].metadata.title = title
    except Exception:
        # Skip files whose title cannot be read (network error, missing element, etc.)
        continue

for s in corpus.scores:
    print(s.metadata.title)

Requesting file from https://crimproject.org/mei/MEI_4.0/CRIM_Mass_0005_1.mei...
Successfully imported.
https://crimproject.org/mei/MEI_4.0/CRIM_Mass_0005_1.mei Missa Ave Maria: Kyrie
Missa Ave Maria: Kyrie
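
The title lookup above relies on a namespaced path search with `ElementTree`. Here is a minimal, self-contained sketch of the same idea. The XML string is a made-up stand-in for a real MEI header, and `read_title` is a hypothetical helper name, not part of `crim_intervals`.

```python
import xml.etree.ElementTree as ET

# Same namespace URI as MEINSURI in the cell above
MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# A tiny, invented MEI header used only for illustration
sample_mei = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Sample Title</title>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""


def read_title(xml_text):
    """Return the first title found in the MEI header, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("mei:meiHead//mei:titleStmt/mei:title", namespaces=MEI_NS)
    return node.text if node is not None else None


print(read_title(sample_mei))
```

Checking for `None` before reading `.text` makes a missing title explicit instead of relying on an exception.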



## Select Actual or Incremental Durations

#### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic durations are displayed in the results, but they are **not** factored into the matching calculation.
- **Actual Durations** calculates the intervals between the notes as written. Set with `vectors = IntervalBase(corpus.note_list)`
- **Incremental Offset** calculates the intervals using a **fixed offset between notes**, no matter their actual duration. Use this to ignore passing tones or other ornaments. The offsets are expressed in multiples of the quarter note (an offset of 1 samples at every quarter note, an offset of 2 at every half note, and so on). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`
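
To make the fixed-offset idea concrete, here is a rough sketch in plain Python. It is **not** the `crim_intervals` implementation: the note representation `(onset, duration, pitch)` and the function name `sample_at_fixed_offsets` are assumptions made for illustration only.

```python
def sample_at_fixed_offsets(notes, step):
    """Return (time, pitch) for the note sounding at each multiple of `step`.

    notes: list of (onset, duration, pitch), measured in quarter notes.
    step:  sampling interval in quarter notes (1 = quarter, 2 = half, ...).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if not notes:
        return []
    end = max(onset + duration for onset, duration, _ in notes)
    sampled = []
    t = 0.0
    while t < end:
        for onset, duration, pitch in notes:
            if onset <= t < onset + duration:
                sampled.append((t, pitch))
                break
        t += step
    return sampled


# Hypothetical melody: the A4 is a short passing tone between G4 and B4
melody = [(0.0, 1.5, "G4"), (1.5, 0.5, "A4"), (2.0, 2.0, "B4"), (4.0, 2.0, "C5")]
print(sample_at_fixed_offsets(melody, 2))
```

Sampling at the half note lands on G4, B4 and C5, so the passing A4 never enters the interval calculation.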

In [27]:
vectors = IntervalBase(corpus.note_list)
#vectors = IntervalBase(corpus.note_list_incremental_offset(2))

***

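## Select Generic or Semitone Scale

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)`

- Use `vectors.generic_intervals` for diatonic (generic) intervals or `vectors.semitone_intervals` for chromatic (semitone) intervals.

- The **number** in this command represents the **minimum number of vectors to find**. Five vectors span six notes.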
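
As a rough illustration of how a list of intervals can be cut into patterns of a given length, here is a sliding-window sketch. It assumes each pattern is simply a run of `n` consecutive intervals; `windows` is a hypothetical helper, not the `into_patterns` implementation.

```python
def windows(intervals, n):
    """Return every run of n consecutive intervals as a tuple."""
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


# Seven intervals (eight notes) yield three overlapping five-interval patterns
intervals = [4, 1, 2, 2, -3, -2, -2]
for pattern in windows(intervals, 5):
    print(pattern)
```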


In [28]:
patterns = into_patterns([vectors.generic_intervals], 5)
#patterns = into_patterns([vectors.semitone_intervals], 4)

***

## Select Exact Matches Here, or Close Below
#### (Comment or uncomment lines to choose between screen preview and CSV output)

- **Exact** matches require every interval in the pattern to be identical: `find_exact_matches(patterns, 2)`. As noted above, durations are displayed but not compared.
- The **number** in this command represents the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.
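
The effect of the minimum-match number can be sketched by grouping identical patterns and keeping only the groups that are large enough. This is an illustration under assumptions, not the library code: `group_exact` is a hypothetical name, and treating the number as an inclusive minimum (`>=`) is a guess.

```python
from collections import defaultdict


def group_exact(patterns, min_matches):
    """Map each pattern to the positions where it occurs, keeping frequent ones."""
    groups = defaultdict(list)
    for position, pattern in enumerate(patterns):
        groups[tuple(pattern)].append(position)
    # Assumption: a pattern is reported when it occurs at least min_matches times
    return {p: pos for p, pos in groups.items() if len(pos) >= min_matches}


sample_patterns = [
    (4, 1, 2, 2, -3),
    (1, 2, 2, -3, -2),
    (4, 1, 2, 2, -3),
    (2, 2, -3, -2, -2),
    (4, 1, 2, 2, -3),
]
print(group_exact(sample_patterns, 2))
```

Raising the number keeps only the most common soggetti; lowering it lets rarer ones through.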

In [33]:
exact_matches = find_exact_matches(patterns, 2)
output_exact = export_pandas(exact_matches)
pd.DataFrame(output_exact).head()
output_exact
# output_exact["pattern_generating_match"] = output_exact["pattern_generating_match"].apply(tuple)

# results = pd.DataFrame(output_exact)
# results["pattern_generating_match"] = results["pattern_generating_match"].apply(tuple)
# results.head(10)
# export_to_csv(exact_matches)
output_exact.to_csv("Mass_0005_1_Exact_4.csv")

Finding exact matches...
49 melodic intervals had more than 2 exact matches.



***

### Select Close Matches Here
#### (Comment out the 'for item iteration' in order to skip screen preview)

- **Close** matches allow for melodic variation (see more below). `find_close_matches(patterns, 2, 1)`
- The **first number** in this command is the **minimum number of melodies** needed before reporting
- The **second number** is the **threshold of similarity** needed in order to find a match.
- A lower number requires the patterns to be very similar; a higher number admits less similar patterns.

##### More about Close Matches  
- The **threshold for close matches** is the **second number** passed to the function.
- Two patterns are selected and compared *vector by vector, position by position*.
- The *differences between corresponding vectors are summed*.
- If that sum is **below the threshold specified**, we consider the **two patterns closely matched**.
- The format of the call is `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.
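
The comparison described above can be sketched as follows. Two details are assumptions on my part rather than confirmed library behavior: the differences are taken as absolute values, and a sum equal to the threshold counts as a match. The function names are hypothetical.

```python
def pattern_distance(a, b):
    """Sum of absolute differences between corresponding vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def is_close(a, b, threshold):
    """True if two equal-length patterns differ by no more than the threshold."""
    return len(a) == len(b) and pattern_distance(a, b) <= threshold


base = (4, 1, 2, 2, -3)
near = (4, 1, 2, 2, -2)   # differs by 1 in the last vector
far = (4, -2, 2, 2, -3)   # differs by 3 in the second vector

print(pattern_distance(base, near), is_close(base, near, 1))
print(pattern_distance(base, far), is_close(base, far, 1))
```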

In [17]:
close_matches = find_close_matches(patterns, 4, 1)
#for item in close_matches:
   #item.print_close_matches()
    #return pd.DataFrame(close_matches)

output_close = export_pandas(close_matches)
# Convert list-valued patterns to tuples so they can be used as groupby keys
output_close["pattern_generating_match"] = output_close["pattern_generating_match"].apply(tuple)

results = pd.DataFrame(output_close)
results.head()

results.to_csv("Model_0001_Close_3_1.csv")
# export_to_csv(close_matches)

Finding close matches...
65 melodic intervals had more than 4 exact or close matches.



***

## Classify Patterns Here 
#### Note: this step depends on the choice of Close or Exact above, so pass the matching variable below.
#### Enable the `export_to_csv` line to export from within the Notebook (you must answer "Y" and provide a filename)

#### Note: the classifier can only find patterns with at least three entries. Two-voice fugas are not detected at the moment, although support could be added.

In [164]:
%%capture
#cm = classify_matches(close_matches, 1)
cm = classify_matches(exact_matches, 1)
#pd.DataFrame(classified_matches)
output_cm = export_pandas(cm)

## For CSV export, use the following (and follow prompts for file name)
#export_to_csv(cm)

cm

In [163]:
def classified_matches_to_pandas(matches):
    
    soggetti_matches = []
    
    for i, cm in enumerate(matches):
        
        for j, soggetti in enumerate(cm.matches):
            
            soggetti_matches.append({
                "piece": soggetti.first_note.metadata.title,
                "type": cm.type,
                "part": soggetti.first_note.part.strip("[] "),
                "bar": soggetti.first_note.note.measureNumber,
                "offset": soggetti.first_note.note.offset,
                "entry_number": j + 1,
                "pattern": tuple(cm.pattern),
                "match_number": i + 1
            })
    return pd.DataFrame(soggetti_matches)
    

In [138]:
df = classified_matches_to_pandas(cm)
pd.set_option('display.max_rows', 50)
df_sorted = df.sort_values("offset")
df_sorted

Unnamed: 0,piece,type,part,bar,offset,entry_number,pattern,match_number
0,Ave Maria,periodic_entry,Superius,1,0.0,1,"(4, 1, 2, 2, -3)",1
1,Ave Maria,periodic_entry,Altus,3,16.0,2,"(4, 1, 2, 2, -3)",1
2,Ave Maria,periodic_entry,Tenor,5,32.0,3,"(4, 1, 2, 2, -3)",1
3,Ave Maria,periodic_entry,Bassus,7,48.0,4,"(4, 1, 2, 2, -3)",1
12,Ave Maria,fuga,Altus,33,256.0,1,"(1, 2, -2, -2, -2)",4
...,...,...,...,...,...,...,...,...
71,Ave Maria,imitative duo,Tenor,124,1048.0,4,"(-2, -2, -2, 1, 3)",19
75,Ave Maria,imitative duo,Tenor,124,1054.0,4,"(-2, -2, 1, 3, -2)",20
79,Ave Maria,imitative duo,Tenor,124,1055.0,4,"(-2, 1, 3, -2, -2)",21
83,Ave Maria,imitative duo,Tenor,125,1056.0,4,"(1, 3, -2, -2, -2)",22


In [139]:
#df["pattern"] = df["pattern"].apply(tuple)

wide_df = df.pivot_table(index=["match_number", "piece", "type", "pattern"],
            columns="entry_number",
            values=["part", "offset"],
            aggfunc=lambda x: x)

wide_df.columns = [f"{a}_{b}" for a, b in wide_df.columns]
wide_df = wide_df.sort_values("offset_1")
wide_df = wide_df.head().reset_index()
wide_df

Unnamed: 0,match_number,piece,type,pattern,offset_1,offset_2,offset_3,offset_4,part_1,part_2,part_3,part_4
0,1,Ave Maria,periodic_entry,"(4, 1, 2, 2, -3)",0.0,16.0,32.0,48.0,Superius,Altus,Tenor,Bassus
1,4,Ave Maria,fuga,"(1, 2, -2, -2, -2)",256.0,288.0,320.0,320.0,Altus,Bassus,Superius,Tenor
2,5,Ave Maria,imitative duo,"(4, -2, 2, 2, -3)",428.0,432.0,468.0,472.0,Superius,Altus,Tenor,Bassus
3,6,Ave Maria,imitative duo,"(-2, 2, 2, -3, -2)",432.0,436.0,472.0,476.0,Superius,Altus,Tenor,Bassus
4,2,Ave Maria,imitative duo,"(2, 2, -3, -2, -2)",438.0,442.0,478.0,482.0,Superius,Altus,Tenor,Bassus


## Improved Classifier 2021

### Filter for Overall Length of Presentation Type

In [7]:
# This function filters for the length of the Presentation Type in the Classifier

def limit_offset_size(array, limit):
    under_limit = np.cumsum(array) <= limit
    return array[: sum(under_limit)]
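
A worked example may help show what the length filter does. The sketch below restates the same idea with the standard library only (the notebook's own function uses `np.cumsum`); `limit_offsets` is a hypothetical name, and the sample lists are taken from the outputs shown in this notebook.

```python
from itertools import accumulate


def limit_offsets(diffs, limit):
    """Keep the leading offset differences whose running total stays within limit."""
    running_totals = list(accumulate(diffs))
    keep = sum(1 for total in running_totals if total <= limit)
    return diffs[:keep]


# Three close entries followed by a distant restatement 836 quarter notes later
print(limit_offsets([16.0, 16.0, 16.0, 836.0], 64))
# All entries fall within the 64-quarter-note window, so nothing is dropped
print(limit_offsets([32.0, 32.0, 0.0], 64))
```

Dropping the distant entry is what lets a group such as `[16.0, 16.0, 16.0, 836.0]` be treated as three evenly spaced entries rather than an irregular set.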


#### Functions to work with output of the original classifier

In [168]:
# the following configured to work with output of FG Classifier

import numpy as np

def lists_to_tuples(el):
    if isinstance(el, list):
        return tuple(el)
    else:
        return el

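def classify_offsets(offset_difference_list):
    """
    Classify a list of offset differences between successive entries.

    The list is first limited to a total span of 64 quarter notes. The result
    is a string made of a label and the limited list: "PEN" if all differences
    are equal (and there is more than one), "ID" if the list has odd length and
    every other difference is equal, "Fuga" for any other non-empty list, and
    "Singleton" for an empty list.
    """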
    offset_difference_list = limit_offset_size(offset_difference_list, 64)
    
    alt_list = offset_difference_list[::2]
    
    
    if len(set(offset_difference_list)) == 1 and len(offset_difference_list) > 1:
        return f"PEN {offset_difference_list}"
    elif (len(offset_difference_list) %2 != 0) and (len(set(alt_list)) == 1):
        return f"ID {offset_difference_list}"
    elif len(offset_difference_list) >= 1:
        return f"Fuga {offset_difference_list}"
    else: 
        return f"Singleton {offset_difference_list}"


def get_offset_difference_list(group):
    group = group.sort_values("offset")
    group["next_offset"] = group.offset.shift(-1)
    offset_difference_list = (group.next_offset - group.offset).dropna().tolist()
    return offset_difference_list
   
def predict_type(group):
    offset_differences = get_offset_difference_list(group)
    group["predicted_type"] = classify_offsets(offset_differences)
    return group
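
The rules in `classify_offsets` can be traced by hand on a few entry offsets taken from the classifier output above. The sketch below re-states the rules in plain Python, without pandas and without the length limit, so it is an illustration rather than a replacement. The function names are hypothetical; PEN and ID presumably stand for periodic entry and imitative duo.

```python
def offset_differences(offsets):
    """Differences between successive entry offsets, after sorting."""
    ordered = sorted(offsets)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def classify(diffs):
    """Apply the same rule order as classify_offsets, returning only the label."""
    if len(set(diffs)) == 1 and len(diffs) > 1:
        return "PEN"
    if len(diffs) % 2 != 0 and len(set(diffs[::2])) == 1:
        return "ID"
    if len(diffs) >= 1:
        return "Fuga"
    return "Singleton"


examples = {
    "evenly spaced": [0.0, 16.0, 32.0, 48.0],
    "irregular": [256.0, 288.0, 320.0, 320.0],
    "paired": [428.0, 432.0, 468.0, 472.0],
    "single entry": [100.0],
}
for name, offsets in examples.items():
    diffs = offset_differences(offsets)
    print(name, diffs, classify(diffs))
```

One consequence of the rule order is worth knowing: a pattern with only two entries produces a single difference, which fails the PEN test but passes the ID test.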


### Call Function to work with Output of Original Classifier


In [169]:

df = df_sorted.applymap(lists_to_tuples).groupby("pattern").apply(predict_type)[:20]
df


Unnamed: 0,piece,type,part,bar,offset,entry_number,pattern,match_number,predicted_type
0,Ave Maria,periodic_entry,Superius,1,0.0,1,"(4, 1, 2, 2, -3)",1,"PEN [16.0, 16.0, 16.0]"
1,Ave Maria,periodic_entry,Altus,3,16.0,2,"(4, 1, 2, 2, -3)",1,"PEN [16.0, 16.0, 16.0]"
2,Ave Maria,periodic_entry,Tenor,5,32.0,3,"(4, 1, 2, 2, -3)",1,"PEN [16.0, 16.0, 16.0]"
3,Ave Maria,periodic_entry,Bassus,7,48.0,4,"(4, 1, 2, 2, -3)",1,"PEN [16.0, 16.0, 16.0]"
12,Ave Maria,fuga,Altus,33,256.0,1,"(1, 2, -2, -2, -2)",4,"Fuga [32.0, 32.0, 0.0]"
13,Ave Maria,fuga,Bassus,37,288.0,2,"(1, 2, -2, -2, -2)",4,"Fuga [32.0, 32.0, 0.0]"
14,Ave Maria,fuga,Superius,41,320.0,3,"(1, 2, -2, -2, -2)",4,"Fuga [32.0, 32.0, 0.0]"
15,Ave Maria,fuga,Tenor,41,320.0,4,"(1, 2, -2, -2, -2)",4,"Fuga [32.0, 32.0, 0.0]"
16,Ave Maria,imitative duo,Superius,54,428.0,1,"(4, -2, 2, 2, -3)",5,"ID [4.0, 36.0, 4.0]"
17,Ave Maria,imitative duo,Altus,55,432.0,2,"(4, -2, 2, 2, -3)",5,"ID [4.0, 36.0, 4.0]"


### Function to work with output of Close or Exact Matches

In [12]:
# THIS IS DEV COPY for use with CLOSE/EXACT Matches

import numpy as np

def lists_to_tuples_a(el):
    if isinstance(el, list):
        return tuple(el)
    else:
        return el

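def classify_offsets_a(offset_difference_list):
    """
    Classify a list of offset differences between successive entries.

    Same rules as classify_offsets, but returns a (label, limited list) pair
    instead of a single string so the two values can go into separate columns.
    """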
    offset_difference_list = limit_offset_size(offset_difference_list, 64)
    
    alt_list = offset_difference_list[::2]
    
    
    if len(set(offset_difference_list)) == 1 and len(offset_difference_list) > 1:
        return "PEN", offset_difference_list
    elif (len(offset_difference_list) %2 != 0) and (len(set(alt_list)) == 1):
        return "ID", offset_difference_list
    elif len(offset_difference_list) >= 1:
        return "Fuga", offset_difference_list
    else: 
        return "Singleton", offset_difference_list

    
    # if len(set(offset_difference_list)) == 1 and len(offset_difference_list) > 1:
    #     return f"PEN {offset_difference_list}"
    # elif (len(offset_difference_list) %2 != 0) and (len(set(alt_list)) == 1):
    #     return f"ID {offset_difference_list}"
    # else:
    #      return f"Fuga {offset_difference_list}"

def get_offset_difference_list_a(group):
    group = group.sort_values("start_offset")
    group["next_offset"] = group.start_offset.shift(-1)
    offset_difference_list = (group.next_offset - group.start_offset).dropna().tolist()
    return offset_difference_list
   
# def predict_type_a(group):
#     offset_differences = get_offset_difference_list_a(group)
#     group["predicted_type"] = classify_offsets_a(offset_differences)
#     return group

def predict_type_a(group):
    offset_differences = get_offset_difference_list_a(group)
    # group["predicted_type"] = classify_offsets_a(offset_differences)
    predicted_type, offsets = classify_offsets_a(offset_differences)

    group["predicted_type"] = [predicted_type for i in range(len(group))]
    group["offset_diffs"] = [offsets for i in range(len(group))]
    group["entry_number"] = [i + 1 for i in range(len(group))]

    return group

### Call Function to Classify with Close/Exact Matches
#### Use `output_close` or `output_exact`

In [13]:
dff = output_exact.applymap(lists_to_tuples_a).groupby("pattern_generating_match").apply(predict_type_a)[:50]
# dff[["pattern_generating_match", "start_offset", "predicted_type"]]
dff


Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,start_beat,end_measure,end_beat,start_offset,end_offset,note_durations,ema,ema_url,predicted_type,offset_diffs,entry_number
0,"(4, 1, 2, 2, -3)","(4, 1, 2, 2, -3)",Ave Maria,[Superius],1,1.0,4,1.0,0.0,24.0,"(4.0, 8.0, 4.0, 4.0, 4.0, 8.0)","1-4/1/@1.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0, 16.0]",1
1,"(4, 1, 2, 2, -3)","(4, 1, 2, 2, -3)",Ave Maria,[Superius],105,3.0,107,3.5,884.0,910.0,"(4.0, 8.0, 4.0, 4.0, 6.0, 2.0)","105-107/1/@3.0-end,@start-end,@start-3.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0, 16.0]",2
2,"(4, 1, 2, 2, -3)","(4, 1, 2, 2, -3)",Ave Maria,Altus,3,1.0,6,1.0,16.0,40.0,"(4.0, 8.0, 4.0, 4.0, 4.0, 8.0)","3-6/2/@1.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0, 16.0]",3
3,"(4, 1, 2, 2, -3)","(4, 1, 2, 2, -3)",Ave Maria,Tenor,5,1.0,8,1.0,32.0,56.0,"(4.0, 8.0, 4.0, 4.0, 4.0, 8.0)","5-8/3/@1.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0, 16.0]",4
4,"(4, 1, 2, 2, -3)","(4, 1, 2, 2, -3)",Ave Maria,Bassus,7,1.0,10,1.0,48.0,72.0,"(4.0, 8.0, 4.0, 4.0, 4.0, 8.0)","7-10/4/@1.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0, 16.0]",5
5,"(-2, -2, -2, 2, -2)","(-2, -2, -2, 2, -2)",Ave Maria,[Superius],8,1.0,10,2.0,56.0,74.0,"(6.0, 2.0, 4.0, 4.0, 2.0, 2.0)","8-10/1/@1.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0]",1
6,"(-2, -2, -2, 2, -2)","(-2, -2, -2, 2, -2)",Ave Maria,Altus,10,1.0,12,2.5,72.0,91.0,"(6.0, 2.0, 4.0, 4.0, 3.0, 1.0)","10-12/2/@1.0-end,@start-end,@start-2.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0]",2
7,"(-2, -2, -2, 2, -2)","(-2, -2, -2, 2, -2)",Ave Maria,Tenor,12,1.0,14,2.0,88.0,106.0,"(6.0, 2.0, 4.0, 4.0, 2.0, 2.0)","12-14/3/@1.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0]",3
8,"(-2, -2, -2, 2, -2)","(-2, -2, -2, 2, -2)",Ave Maria,Bassus,139,4.0,141,1.0,1174.0,1184.0,"(3.0, 1.0, 1.0, 1.0, 4.0, 8.0)","139-141/4/@4.0-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0]",4
9,"(1, 1, 2, 2, -3)","(1, 1, 2, 2, -3)",Ave Maria,[Superius],16,3.0,19,1.0,124.0,144.0,"(6.0, 2.0, 4.0, 4.0, 4.0, 3.0)","16-19/1/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,PEN,"[16.0, 16.0, 16.0]",1



In [14]:
wide_df = dff.pivot_table(index=["piece_title", "pattern_matched", "predicted_type"],
            columns=["part"],
            values=["start_offset"],
            )

wide_df.columns = [f"{a}_{b}" for a, b in wide_df.columns]
wide_df = wide_df.sort_values("start_offset_[Superius]")
wide_df = wide_df.reset_index()

wide_df.head(50)

Unnamed: 0,piece_title,pattern_matched,predicted_type,start_offset_Altus,start_offset_Bassus,start_offset_Tenor,start_offset_[Superius]
0,Ave Maria,"(-2, -2, -2, 2, -2)",PEN,72.0,1174.0,88.0,56.0
1,Ave Maria,"(1, 1, 2, 2, -3)",PEN,140.0,172.0,156.0,124.0
2,Ave Maria,"(2, 2, 1, 2, -2)",ID,754.666667,280.0,280.0,248.0
3,Ave Maria,"(1, 2, -2, 1, -2)",ID,624.0,648.0,288.0,256.0
4,Ave Maria,"(2, -2, 1, -2, 2)",Fuga,694.666667,652.0,292.0,260.0
5,Ave Maria,"(2, 2, -3, -2, -2)",Singleton,442.0,482.0,478.0,285.0
6,Ave Maria,"(-2, -2, -2, -2, -2)",ID,,,216.0,324.666667
7,Ave Maria,"(4, 1, 2, 2, -3)",PEN,16.0,48.0,32.0,442.0
8,Ave Maria,"(1, 2, -2, -2, -2)",Fuga,,,,645.333333
9,Ave Maria,"(1, 1, 2, -2, -2)",ID,768.0,,772.0,731.0


## Durational Ratios

#### This Function Calculates the Ratios of the Durations in each Match

In [140]:
# Pairs each duration with the one that follows it, then computes the ratio of each pair

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def get_ratios(input_list):
    ratio_pairs = []
    for a, b in pairwise(input_list):
        ratio_pairs.append(b / a)
    return ratio_pairs
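
Here is a short worked example of the ratio calculation, using two duration lists that appear in the results for the pattern `(4, 1, 2, 2, -3)`. It uses `zip` instead of `tee` to form the consecutive pairs, which is an equivalent formulation for lists; `duration_ratios` is a hypothetical name.

```python
def duration_ratios(durations):
    """Ratio of each duration to the one before it."""
    return [later / earlier for earlier, later in zip(durations, durations[1:])]


opening = [4.0, 8.0, 4.0, 4.0, 4.0, 8.0]
restatement = [4.0, 8.0, 4.0, 4.0, 6.0, 2.0]

print(duration_ratios(opening))      # [2.0, 0.5, 1.0, 1.0, 2.0]
print(duration_ratios(restatement))
```

Because ratios are relative, a soggetto stated in doubled note values would yield the same list as the original.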


#### Now call the function to operate on the RESULTS file from earlier

In [141]:
# calculates 'duration ratios' for each soggetto, then adds this to the DF

results["duration_ratios"] = results.note_durations.apply(get_ratios)
short_results = results.drop(columns=["ema_url", "ema"])
short_results.head()

Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,start_beat,end_measure,end_beat,start_offset,end_offset,note_durations,duration_ratios
0,"(4, 1, 2, 2, -3)","[4, 1, 2, 2, -3]",Ave Maria,[Superius],1,1.0,4,1.0,0.0,24.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]"
1,"(4, 1, 2, 2, -3)","[4, 1, 2, 2, -3]",Ave Maria,[Superius],105,3.0,107,3.5,884.0,910.0,"[4.0, 8.0, 4.0, 4.0, 6.0, 2.0]","[2.0, 0.5, 1.0, 1.5, 0.3333333333333333]"
2,"(4, 1, 2, 2, -3)","[4, 1, 2, 2, -3]",Ave Maria,Altus,3,1.0,6,1.0,16.0,40.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]"
3,"(4, 1, 2, 2, -3)","[4, 1, 2, 2, -3]",Ave Maria,Tenor,5,1.0,8,1.0,32.0,56.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]"
4,"(4, 1, 2, 2, -3)","[4, 1, 2, 2, -3]",Ave Maria,Bassus,7,1.0,10,1.0,48.0,72.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]"


In [161]:
df_s = df[["piece_title", "pattern_generating_match", "part", "start_measure", "start_offset", "predicted_type"]]
df_s.head(20)

Unnamed: 0,piece_title,pattern_generating_match,part,start_measure,start_offset,predicted_type
0,Ave Maria,"(4, 1, 2, 2, -3)",[Superius],1,0.0,"Fuga [16.0, 16.0, 16.0, 836.0]"
1,Ave Maria,"(4, 1, 2, 2, -3)",[Superius],105,884.0,"Fuga [16.0, 16.0, 16.0, 836.0]"
2,Ave Maria,"(4, 1, 2, 2, -3)",Altus,3,16.0,"Fuga [16.0, 16.0, 16.0, 836.0]"
3,Ave Maria,"(4, 1, 2, 2, -3)",Tenor,5,32.0,"Fuga [16.0, 16.0, 16.0, 836.0]"
4,Ave Maria,"(4, 1, 2, 2, -3)",Bassus,7,48.0,"Fuga [16.0, 16.0, 16.0, 836.0]"
5,Ave Maria,"(-2, -2, -2, 2, -2)",[Superius],8,56.0,"Fuga [16.0, 16.0, 634.0]"
6,Ave Maria,"(-2, -2, -2, 2, -2)",Altus,10,72.0,"Fuga [16.0, 16.0, 634.0]"
7,Ave Maria,"(-2, -2, -2, 2, -2)",Altus,132,1109.0,"Fuga [16.0, 16.0, 634.0]"
8,Ave Maria,"(-2, -2, -2, 2, -2)",Tenor,12,88.0,"Fuga [16.0, 16.0, 634.0]"
9,Ave Maria,"(-2, -2, -2, 2, -2)",Bassus,91,722.0,"Fuga [16.0, 16.0, 634.0]"


## Group by the Pattern Generating Match
- Each match in a group has its own list of durations and duration ratios
- We compare the ratios of every pair of matches in the group to measure how much they differ
- `list(combinations(...))` takes care of building the pairs, using the index of our dataframe `results`
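
The pairwise comparison can be tried on a small scale without a dataframe. In this sketch the dictionary of entries is invented for illustration (the ratio lists themselves come from the results above), and `ratio_distance` is a hypothetical stand-in for the comparison step.

```python
from itertools import combinations


def ratio_distance(ratios_1, ratios_2):
    """Sum of absolute differences between two lists of duration ratios."""
    return sum(abs(a - b) for a, b in zip(ratios_1, ratios_2))


entries = {
    "Superius m.1": [2.0, 0.5, 1.0, 1.0, 2.0],
    "Superius m.105": [2.0, 0.5, 1.0, 1.5, 1 / 3],
    "Altus m.3": [2.0, 0.5, 1.0, 1.0, 2.0],
}

# combinations() yields each unordered pair exactly once
distances = {
    (name_1, name_2): ratio_distance(r_1, r_2)
    for (name_1, r_1), (name_2, r_2) in combinations(entries.items(), 2)
}
for pair, distance in distances.items():
    print(pair, round(distance, 3))
```

A distance of zero means the two statements share the same rhythmic profile; larger values indicate more rhythmic variation.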

In [149]:
def compare_ratios(ratios_1, ratios_2):
    
    # Element-wise absolute differences between the two ratio lists,
    # built with zip() and list comprehensions, then summed
    diffs = [i - j for i, j in zip(ratios_1, ratios_2)] 
    abs_diffs = [abs(ele) for ele in diffs] 
    sum_diffs = sum(abs_diffs)

    return sum_diffs

#results["Pattern_Generating_Match"] = results["Pattern_Generating_Match"].apply(tuple) 

def get_ratio_distances(results, pattern_col, output_cols):
    
    matches = []

    for name, group in results.groupby(pattern_col):

        ratio_pairs = list(combinations(group.index.values, 2))

        for a, b in ratio_pairs:
            
            a_match = results.loc[a]
            b_match = results.loc[b]
            
            sum_diffs = compare_ratios(a_match.duration_ratios, b_match.duration_ratios)
            
            match_dict = {
                "pattern": name,
                "sum_diffs": sum_diffs
            }
            
            for col in output_cols:
                match_dict.update({
                    f"match_1_{col}": a_match[col],
                    f"match_2_{col}": b_match[col]
                })
                
            matches.append(match_dict)
            
    return pd.DataFrame(matches)

### Now Run the Function to get the 'edit distances' for the durations of matching patterns

In [151]:
ratio_distances = get_ratio_distances(results, "pattern_generating_match", ["piece_title", "part", "start_measure", "start_offset"])
ratio_distances

Unnamed: 0,pattern,sum_diffs,match_1_piece_title,match_2_piece_title,match_1_part,match_2_part,match_1_start_measure,match_2_start_measure,match_1_start_offset,match_2_start_offset
0,"(-5, 1, 2, 1, -5)",0.000000,Ave Maria,Ave Maria,Bassus,Bassus,94,98,756.0,804.0
1,"(-3, -2, -2, -2, -2)",5.583333,Ave Maria,Ave Maria,[Superius],[Superius],18,19,140.0,144.0
2,"(-3, -2, -2, -2, -2)",7.583333,Ave Maria,Ave Maria,[Superius],[Superius],18,20,140.0,158.0
3,"(-3, -2, -2, -2, -2)",5.416667,Ave Maria,Ave Maria,[Superius],[Superius],18,85,140.0,672.0
4,"(-3, -2, -2, -2, -2)",3.750000,Ave Maria,Ave Maria,[Superius],Altus,18,131,140.0,1104.0
...,...,...,...,...,...,...,...,...,...,...
4082,"(6, -2, -2, -2, -2)",5.416667,Ave Maria,Ave Maria,[Superius],Bassus,20,90,156.0,716.0
4083,"(6, -2, -2, -2, -2)",6.583333,Ave Maria,Ave Maria,Altus,Bassus,27,88,211.0,700.0
4084,"(6, -2, -2, -2, -2)",6.416667,Ave Maria,Ave Maria,Altus,Bassus,27,90,211.0,716.0
4085,"(6, -2, -2, -2, -2)",0.833333,Ave Maria,Ave Maria,Bassus,Bassus,88,90,700.0,716.0


### And FILTER the results according to any threshold we like

In [152]:
ratios_filtered = ratio_distances[ratio_distances.sum_diffs <= 1]
ratios_filtered

Unnamed: 0,pattern,sum_diffs,match_1_piece_title,match_2_piece_title,match_1_part,match_2_part,match_1_start_measure,match_2_start_measure,match_1_start_offset,match_2_start_offset
0,"(-5, 1, 2, 1, -5)",0.000000,Ave Maria,Ave Maria,Bassus,Bassus,94,98,756.0,804.0
16,"(-3, -2, -2, 1, -2)",0.000000,Ave Maria,Ave Maria,Altus,Bassus,57,62,448.0,488.0
17,"(-3, -2, -2, 1, 1)",0.000000,Ave Maria,Ave Maria,[Superius],Tenor,56,61,444.0,484.0
31,"(-3, 1, 2, -3, 1)",0.000000,Ave Maria,Ave Maria,[Superius],Altus,73,70,582.0,558.0
34,"(-3, 1, 2, -3, 2)",0.000000,Ave Maria,Ave Maria,[Superius],Altus,73,70,582.0,558.0
...,...,...,...,...,...,...,...,...,...,...
4067,"(5, -2, -2, -2, -2)",0.833333,Ave Maria,Ave Maria,Bassus,Bassus,88,90,700.0,716.0
4069,"(5, -2, -2, 2, 2)",0.000000,Ave Maria,Ave Maria,[Superius],Tenor,10,14,74.0,106.0
4077,"(5, -2, -2, 2, 2)",0.000000,Ave Maria,Ave Maria,Bassus,Bassus,116,124,988.0,1052.0
4079,"(5, 1, 2, -2, 1)",0.000000,Ave Maria,Ave Maria,Altus,Bassus,78,81,622.0,646.0


### Now Group the Duration-Filter Results by the Pattern (which shows us very closely related soggetti in sets)

In [153]:
grouped = ratios_filtered.groupby("pattern")
len(grouped['pattern'].nunique())

152

In [None]:
ratios_filtered.to_csv("filtered_sample_pair.csv")

### Greedy Soggetti
* Groups by piece and voice part
* Within each voice, creates shifted columns that hold the next match's intervals and durations alongside the current ones
* Slices each tuple: drops the first element of the current match and the last element of the next match
* If the two remainders are identical, the matches overlap and we can merge the two soggetti
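
The overlap test at the heart of the greedy step can be shown on two interval tuples. This is a simplified sketch: the notebook also compares durations and measure numbers, which are left out here, and `can_merge` and `merge` are hypothetical helper names.

```python
def can_merge(current, following):
    """True if `following` starts one step later and overlaps `current`."""
    return tuple(current[1:]) == tuple(following[:-1])


def merge(current, following):
    """Extend `current` with the one new element contributed by `following`."""
    return tuple(current) + tuple(following[-1:])


current = (4, 1, 2, 2, -3)
following = (1, 2, 2, -3, -2)
unrelated = (-2, -2, -2, 2, -2)

if can_merge(current, following):
    print(merge(current, following))
print(can_merge(current, unrelated))
```

Repeating the merge along a chain of overlapping matches would grow a five-interval soggetto into the longest melody the matches support.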

In [154]:
results["pattern_matched"] = results["pattern_matched"].apply(tuple)
# results_s = results.drop(columns=["ema_url", "ema", "duration_ratios", "pattern_generating_match"])
# # #results_s["group_number"] = group_number
# # results_grouped = results_s.groupby(by=["piece_title", "part"])
# # results_grouped.sort_values("start_measure")
# # results_grouped.head()
# results_shifted = results_s.groupby(["part"]).shift(1)
# results_shifted.head()


In [155]:
# Function to group by piece and part, then add shifted columns to hold the Greedy data
# The 'df: pd.DataFrame' annotation is a type hint: it states the type expected for that argument

def add_shifted_cols(df: pd.DataFrame,
                    group_cols: list,
                    shift_cols: list,
                    shift_periods=-1,
                    shifted_prefix="next"
                    ) -> pd.DataFrame:
    
    
    df = df.copy()

    df_shifted = df.groupby(group_cols).shift(shift_periods)

    df[[ f"{shifted_prefix}_{c}" for c in shift_cols]] = df_shifted[shift_cols]

    return df

In [156]:
# Sequence here helps us deal with tuples in the data, slicing as needed the various lists of vectors and durations

from typing import Sequence

def add_subsequence_cols(df: pd.DataFrame, this_sequence_cols: Sequence, next_sequence_cols: Sequence) -> pd.DataFrame:

    df = df.copy()

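    # Current match: drop the first element of each sequence
    for col in this_sequence_cols:
        df[f"{col}_short"] = df[col].dropna().apply(lambda x: x[1:])

    # Next match: drop the last element of each sequence
    for col in next_sequence_cols:
        df[f"{col}_short"] = df[col].dropna().apply(lambda x: x[:-1])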
          

    return df



In [157]:
# here we call the function to add the cols

df_shifted = add_shifted_cols(results,
                    group_cols=["piece_title", "part"],
                    shift_cols=["start_measure", "end_measure", "note_durations", "pattern_matched"])

df_brief = df_shifted.sort_values("start_measure").drop(columns=["ema", "ema_url", "pattern_generating_match"])
df_brief

Unnamed: 0,pattern_matched,piece_title,part,start_measure,start_beat,end_measure,end_beat,start_offset,end_offset,note_durations,duration_ratios,next_start_measure,next_end_measure,next_note_durations,next_pattern_matched
0,"(4, 1, 2, 2, -3)",Ave Maria,[Superius],1,1.0,4,1.0,0.0,24.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]",105.0,107.0,"[4.0, 8.0, 4.0, 4.0, 6.0, 2.0]","(4, 1, 2, 2, -3)"
2,"(4, 1, 2, 2, -3)",Ave Maria,Altus,3,1.0,6,1.0,16.0,40.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]",10.0,12.0,"[6.0, 2.0, 4.0, 4.0, 3.0, 1.0]","(-2, -2, -2, 2, -2)"
3,"(4, 1, 2, 2, -3)",Ave Maria,Tenor,5,1.0,8,1.0,32.0,56.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]",12.0,14.0,"[6.0, 2.0, 4.0, 4.0, 2.0, 2.0]","(-2, -2, -2, 2, -2)"
4,"(4, 1, 2, 2, -3)",Ave Maria,Bassus,7,1.0,10,1.0,48.0,72.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]","[2.0, 0.5, 1.0, 1.0, 2.0]",91.0,92.0,"[4.0, 1.0, 1.0, 2.0, 2.0, 4.0]","(-2, -2, -2, 3, -2)"
1032,"(-2, -2, -2, 2, -2)",Ave Maria,[Superius],8,1.0,10,2.0,56.0,74.0,"[6.0, 2.0, 4.0, 4.0, 2.0, 2.0]","[0.3333333333333333, 2.0, 1.0, 0.5, 1.0]",134.0,137.0,"[4.0, 4.0, 6.0, 6.0, 6.0, 6.0]","(-2, -2, 2, -2, -2)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
617,"(-2, -2, -2, 2, 1)",Ave Maria,[Superius],148,3.0,152,3.0,1236.0,1284.0,"[4.0, 4.0, 8.0, 16.0, 16.0, 16.0]","[1.0, 2.0, 2.0, 1.0, 1.0]",32.0,34.0,"[4.0, 4.0, 4.0, 2.0, 3.0, 1.0]","(2, 2, 1, 2, -2)"
817,"(-2, 2, 2, -2, 1)",Ave Maria,Tenor,148,1.0,152,5.0,1240.0,1288.0,"[4.0, 4.0, 8.0, 16.0, 16.0, 16.0]","[1.0, 2.0, 2.0, 1.0, 1.0]",37.0,39.0,"[2.0, 3.0, 1.0, 4.0, 2.0, 8.0]","(2, -2, 1, -2, 2)"
321,"(-2, -2, -2, 2, 1)",Ave Maria,[Superius],148,3.0,152,3.0,1236.0,1284.0,"[4.0, 4.0, 8.0, 16.0, 16.0, 16.0]","[1.0, 2.0, 2.0, 1.0, 1.0]",51.0,52.0,"[1.0, 1.0, 1.0, 2.0, 3.0, 1.0]","(-2, -2, 2, 2, -2)"
1070,"(-2, 2, 2, -2, 1)",Ave Maria,Tenor,148,1.0,152,5.0,1240.0,1288.0,"[4.0, 4.0, 8.0, 16.0, 16.0, 16.0]","[1.0, 2.0, 2.0, 1.0, 1.0]",44.0,46.0,"[2.0, 4.0, 4.0, 4.0, 4.0, 4.0]","(2, 2, 2, -3, 2)"


In [158]:
# And here we call the function to slice and enter the data

df_brief = add_subsequence_cols(df_shifted, 
    this_sequence_cols=["pattern_matched", "note_durations"], 
    next_sequence_cols=["next_pattern_matched", "next_note_durations"]
    )
df_brief.drop(columns=["ema", "ema_url", "pattern_generating_match", "duration_ratios"])

Unnamed: 0,pattern_matched,piece_title,part,start_measure,start_beat,end_measure,end_beat,start_offset,end_offset,note_durations,next_start_measure,next_end_measure,next_note_durations,next_pattern_matched,pattern_matched_short,note_durations_short,next_pattern_matched_short,next_note_durations_short
0,"(4, 1, 2, 2, -3)",Ave Maria,[Superius],1,1.0,4,1.0,0.0,24.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]",105.0,107.0,"[4.0, 8.0, 4.0, 4.0, 6.0, 2.0]","(4, 1, 2, 2, -3)","(1, 2, 2, -3)","[8.0, 4.0, 4.0, 4.0, 8.0]","(4, 1, 2, 2)","[4.0, 8.0, 4.0, 4.0, 6.0]"
1,"(4, 1, 2, 2, -3)",Ave Maria,[Superius],105,3.0,107,3.5,884.0,910.0,"[4.0, 8.0, 4.0, 4.0, 6.0, 2.0]",8.0,10.0,"[6.0, 2.0, 4.0, 4.0, 2.0, 2.0]","(-2, -2, -2, 2, -2)","(1, 2, 2, -3)","[8.0, 4.0, 4.0, 6.0, 2.0]","(-2, -2, -2, 2)","[6.0, 2.0, 4.0, 4.0, 2.0]"
2,"(4, 1, 2, 2, -3)",Ave Maria,Altus,3,1.0,6,1.0,16.0,40.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]",10.0,12.0,"[6.0, 2.0, 4.0, 4.0, 3.0, 1.0]","(-2, -2, -2, 2, -2)","(1, 2, 2, -3)","[8.0, 4.0, 4.0, 4.0, 8.0]","(-2, -2, -2, 2)","[6.0, 2.0, 4.0, 4.0, 3.0]"
3,"(4, 1, 2, 2, -3)",Ave Maria,Tenor,5,1.0,8,1.0,32.0,56.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]",12.0,14.0,"[6.0, 2.0, 4.0, 4.0, 2.0, 2.0]","(-2, -2, -2, 2, -2)","(1, 2, 2, -3)","[8.0, 4.0, 4.0, 4.0, 8.0]","(-2, -2, -2, 2)","[6.0, 2.0, 4.0, 4.0, 2.0]"
4,"(4, 1, 2, 2, -3)",Ave Maria,Bassus,7,1.0,10,1.0,48.0,72.0,"[4.0, 8.0, 4.0, 4.0, 4.0, 8.0]",91.0,92.0,"[4.0, 1.0, 1.0, 2.0, 2.0, 4.0]","(-2, -2, -2, 3, -2)","(1, 2, 2, -3)","[8.0, 4.0, 4.0, 4.0, 8.0]","(-2, -2, -2, 3)","[4.0, 1.0, 1.0, 2.0, 2.0]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1091,"(2, 2, -3, 1, -2)",Ave Maria,[Superius],106,3.0,108,3.0,896.0,920.0,"[4.0, 4.0, 6.0, 2.0, 8.0, 4.0]",,,,,"(2, -3, 1, -2)","[4.0, 6.0, 2.0, 8.0, 4.0]",,
1092,"(2, 2, -3, 3, -2)",Ave Maria,Tenor,21,3.0,23,3.5,164.0,181.0,"[4.0, 4.0, 4.0, 2.0, 3.0, 1.0]",81.0,83.0,"[2.0, 2.0, 4.0, 2.0, 4.0, 2.0]","(1, 2, -3, 2, -2)","(2, -3, 3, -2)","[4.0, 4.0, 2.0, 3.0, 1.0]","(1, 2, -3, 2)","[2.0, 2.0, 4.0, 2.0, 4.0]"
1093,"(1, 2, -3, 2, -2)",Ave Maria,Tenor,81,3.0,83,2.0,644.0,658.0,"[2.0, 2.0, 4.0, 2.0, 4.0, 2.0]",,,,,"(2, -3, 2, -2)","[2.0, 4.0, 2.0, 4.0, 2.0]",,
1094,"(2, 2, -4, 2, -2)",Ave Maria,Bassus,131,3.5,133,1.0,1109.0,1120.0,"[1.0, 2.0, 2.0, 2.0, 4.0, 8.0]",138.0,140.0,"[1.0, 2.0, 2.0, 2.0, 3.0, 1.0]","(2, 2, -3, 2, -2)","(2, -4, 2, -2)","[2.0, 2.0, 2.0, 4.0, 8.0]","(2, 2, -3, 2)","[1.0, 2.0, 2.0, 2.0, 3.0]"


In [None]:
# Now filter the previous results to make sure the melodic and rhythmic vectors match for each sequence

df_filter_durs = df_brief[df_brief["note_durations_short"] == df_brief["next_note_durations_short"]]
df_filter_patts = df_filter_durs[df_filter_durs["pattern_matched_short"] == df_filter_durs["next_pattern_matched_short"]]
df_filter_patts.count()

In [None]:
# This is just a way to inspect a single voice

df_filter_one_voice = df_filter_patts[df_filter_patts["part"].str.contains('Contratenor')]
df_filter_one_voice.head()

In [None]:
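# Make sure there are no NaN values

# Add a new column to check for matching end measures + 1 (we subtract 1 from the next end measure, then check for ==)
# Combine the results: these are the soggetti that need to get Greedy

df_filter_patts = df_filter_patts.copy()  # work on a copy, not a slice of df_brief
# fillna returns a new Series, so the result must be assigned back
df_filter_patts['next_end_measure'] = df_filter_patts['next_end_measure'].fillna(0)
df_filter_patts['next_end_measure_minus'] = df_filter_patts['next_end_measure'] - 1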
df_test_1 = df_filter_patts[df_filter_patts["end_measure"] == df_filter_patts["next_end_measure"]]
df_test_2 = df_filter_patts[df_filter_patts["end_measure"] == df_filter_patts["next_end_measure_minus"]]
df_test_combined = pd.concat([df_test_1, df_test_2])
df_test_combined
