# About CRIM Intervals

- See more at https://github.com/HCDigitalScholarship/intervals/blob/master/README.md

## How to Select Pieces

- For local file(s), use: `corpus = CorpusBase(['/Users/rfreedma/MEI/CRIM_Intervals_Tests/Brumel_Complete.mei'])`
- For remote file, use: `corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Model_0008.mei', 'https://crimproject.org/mei/CRIM_Mass_0005_5.mei'])`


## Notes about the Various Parameters

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)`. The **number** in this command represents the **minimum number of vectors to find**. Five vectors span six notes.

### Chromatic vs Diatonic 
- **Chromatic** uses `into_patterns([vectors.semitone_intervals], 5)`
- **Diatonic** uses `into_patterns([vectors.generic_intervals], 5)`
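
To make the distinction concrete, here is a minimal sketch that computes both kinds of interval for a short melody. It does not call CRIM Intervals; the helper functions and the note-name encoding (`#` for sharp, `-` for flat) are assumptions made for illustration. The generic intervals follow the convention that appears in the results tables in this notebook, where a repeated note is assumed to be 1, a step up 2, and a step down -2.

```python
# Illustrative only: this is not the CRIM Intervals implementation.
STEP_INDEX = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def parse(note):
    """Split a name such as 'F#4' into (letter, accidental offset, octave)."""
    letter = note[0]
    accidental = note[1:-1]
    octave = int(note[-1])
    offset = accidental.count("#") - accidental.count("-")
    return letter, offset, octave

def midi(note):
    """Chromatic position: accidentals matter."""
    letter, offset, octave = parse(note)
    return 12 * (octave + 1) + SEMITONES[letter] + offset

def diatonic(note):
    """Diatonic position: accidentals are ignored."""
    letter, _, octave = parse(note)
    return 7 * octave + STEP_INDEX[letter]

def semitone_intervals(notes):
    return [midi(b) - midi(a) for a, b in zip(notes, notes[1:])]

def generic_intervals(notes):
    result = []
    for a, b in zip(notes, notes[1:]):
        steps = diatonic(b) - diatonic(a)
        # Unison counts as 1, a step up as 2, a step down as -2
        result.append(steps + 1 if steps >= 0 else steps - 1)
    return result

melody = ["G4", "G4", "A4", "G4", "F#4", "G4"]
print(semitone_intervals(melody))  # [0, 2, -2, -1, 1]
print(generic_intervals(melody))   # [1, 2, -2, -2, 2]
```

The chromatic view distinguishes the whole step G-A (2) from the half step F#-G (1), while the diatonic view reports both as a second (2).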

### Exact vs Close  
- **Exact** is exact in *all* ways: `find_exact_matches(patterns, 2)`. The **number** in this command represents the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.
- **Close** matches allow for melodic variation (see more below): `find_close_matches(patterns, 2, 1)`. The **first number** in this command is the minimum number of melodies needed before reporting; the **second number** is the threshold needed in order to find a match. Lower number = very similar; higher number = less similar.

### More about Close Matches  
- The **threshold for close matches** is determined by the third argument passed to the method (the second of the two numbers). We select two patterns, then compare *each vector in each pattern successively*. The "differences" between each vector are summed. If that value is below the threshold specified, we consider the two patterns closely matched.
- The format of the method call is  `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.
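
The comparison described above can be sketched in a few lines of plain Python. This is not the library's code: `pattern_distance` and `is_close_match` are hypothetical names, and treating a distance equal to the threshold as a match is an assumption (the sample output in this notebook, run with a threshold of 1, pairs the first two patterns below, which differ by exactly 1).

```python
def pattern_distance(pattern_a, pattern_b):
    """Sum of the absolute differences between corresponding intervals."""
    return sum(abs(a - b) for a, b in zip(pattern_a, pattern_b))

def is_close_match(pattern_a, pattern_b, threshold):
    # Assumption: a distance equal to the threshold still counts as a match.
    return pattern_distance(pattern_a, pattern_b) <= threshold

a = [2, 1, 2, -2, -2, -2]
b = [1, 1, 2, -2, -2, -2]
c = [2, 2, 2, -3, 2, 2]

print(pattern_distance(a, b))   # 1
print(is_close_match(a, b, 1))  # True
print(pattern_distance(a, c))   # 10
print(is_close_match(a, c, 1))  # False
```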

### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic variation/duration is displayed, but **not** factored into the calculation of matching.
- **Incremental Offset** calculates the intervals using a fixed offset between notes, no matter their actual duration.  Use this to ignore passing tones or other ornaments.  The offsets are expressed in multiples of the quarter note (Offset = 1 samples at quarter note; Offset = 2 at half note, etc). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`
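
As a rough illustration of fixed-offset sampling, the sketch below reads off whichever pitch is sounding at regular intervals. It is a simplified model, not what `note_list_incremental_offset` actually does internally: the function name, the `(onset, duration, midi_pitch)` tuples, and the handling of held notes are all assumptions.

```python
def sample_at_fixed_offset(notes, step):
    """Return the pitch sounding at 0, step, 2*step, ... (in quarter notes).

    notes: list of (onset, duration, midi_pitch) tuples, sorted by onset.
    """
    end = max(onset + duration for onset, duration, _ in notes)
    sampled = []
    position = 0.0
    while position < end:
        for onset, duration, pitch in notes:
            if onset <= position < onset + duration:
                sampled.append(pitch)
                break
        position += step
    return sampled

# Half note G, quarter notes A and B (B acting as a passing tone), half note C
voice = [(0, 2, 67), (2, 1, 69), (3, 1, 71), (4, 2, 72)]

print(sample_at_fixed_offset(voice, 1))  # [67, 67, 69, 71, 72, 72]
print(sample_at_fixed_offset(voice, 2))  # [67, 69, 72]
```

With an offset of 2 the B on the weak quarter is skipped, which is the sense in which a larger offset ignores passing tones.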

### Dataframe Preview vs Full Results in Browser

- Comment out the first of these for preview only.  Include for full results

    - `#pd.set_option("display.max_rows", None, "display.max_columns", None)`
    - `return pd.DataFrame(match_data)`
    
### To Set CSV Output

- Specify file name here `pd.Series(match_data).to_csv("Match_data_10_13.csv")`

***

## Search with Options

- Piece or Corpus
- Actual or Incremental Durations
- Chromatic or Diatonic
- Exact or Close
- Classify

***


### Load CRIM Intervals and Pandas

In [1]:
from crim_intervals import *
import pandas as pd
import ast
import matplotlib


### Selected Piece or Corpus 

A note about local files: use `/content/your_file_name.mei` for any local file you've uploaded.

Otherwise, provide a URL.

To load CRIM files directly from Git:

### The Complete Corpus

In [None]:
work_list = ['CRIM_Mass_0001_1.mei',
 'CRIM_Mass_0001_2.mei',
 'CRIM_Mass_0001_3.mei',
 'CRIM_Mass_0001_4.mei',
 'CRIM_Mass_0001_5.mei',
 'CRIM_Mass_0002_1.mei',
 'CRIM_Mass_0002_2.mei',
 'CRIM_Mass_0002_3.mei',
 'CRIM_Mass_0002_4.mei',
 'CRIM_Mass_0002_5.mei',
 'CRIM_Mass_0003_1.mei',
 'CRIM_Mass_0003_2.mei',
 'CRIM_Mass_0003_3.mei',
 'CRIM_Mass_0003_4.mei',
 'CRIM_Mass_0003_5.mei',
 'CRIM_Mass_0004_1.mei',
 'CRIM_Mass_0004_2.mei',
 'CRIM_Mass_0004_3.mei',
 'CRIM_Mass_0004_4.mei',
 'CRIM_Mass_0004_5.mei',
 'CRIM_Mass_0005_1.mei',
 'CRIM_Mass_0005_2.mei',
 'CRIM_Mass_0005_3.mei',
 'CRIM_Mass_0005_4.mei',
 'CRIM_Mass_0005_5.mei',
 'CRIM_Mass_0006_1.mei',
 'CRIM_Mass_0006_2.mei',
 'CRIM_Mass_0006_3.mei',
 'CRIM_Mass_0006_4.mei',
 'CRIM_Mass_0006_5.mei',
 'CRIM_Mass_0007_1.mei',
 'CRIM_Mass_0007_2.mei',
 'CRIM_Mass_0007_3.mei',
 'CRIM_Mass_0007_4.mei',
 'CRIM_Mass_0007_5.mei',
 'CRIM_Mass_0008_1.mei',
 'CRIM_Mass_0008_2.mei',
 'CRIM_Mass_0008_3.mei',
 'CRIM_Mass_0008_4.mei',
 'CRIM_Mass_0008_5.mei',
 'CRIM_Mass_0009_1.mei',
 'CRIM_Mass_0009_2.mei',
 'CRIM_Mass_0009_3.mei',
 'CRIM_Mass_0009_4.mei',
 'CRIM_Mass_0009_5.mei',
 'CRIM_Mass_0010_1.mei',
 'CRIM_Mass_0010_2.mei',
 'CRIM_Mass_0010_3.mei',
 'CRIM_Mass_0010_4.mei',
 'CRIM_Mass_0010_5.mei',
 'CRIM_Mass_0011_1.mei',
 'CRIM_Mass_0011_2.mei',
 'CRIM_Mass_0011_3.mei',
 'CRIM_Mass_0011_4.mei',
 'CRIM_Mass_0011_5.mei',
 'CRIM_Mass_0012_1.mei',
 'CRIM_Mass_0012_2.mei',
 'CRIM_Mass_0012_3.mei',
 'CRIM_Mass_0012_4.mei',
 'CRIM_Mass_0012_5.mei',
 'CRIM_Mass_0013_1.mei',
 'CRIM_Mass_0013_2.mei',
 'CRIM_Mass_0013_3.mei',
 'CRIM_Mass_0013_4.mei',
 'CRIM_Mass_0013_5.mei',
 'CRIM_Mass_0014_1.mei',
 'CRIM_Mass_0014_2.mei',
 'CRIM_Mass_0014_3.mei',
 'CRIM_Mass_0014_4.mei',
 'CRIM_Mass_0014_5.mei',
 'CRIM_Mass_0015_1.mei',
 'CRIM_Mass_0015_2.mei',
 'CRIM_Mass_0015_3.mei',
 'CRIM_Mass_0015_4.mei',
 'CRIM_Mass_0015_5.mei',
 'CRIM_Mass_0016_1.mei',
 'CRIM_Mass_0016_2.mei',
 'CRIM_Mass_0016_3.mei',
 'CRIM_Mass_0016_4.mei',
 'CRIM_Mass_0016_5.mei',
 'CRIM_Mass_0017_1.mei',
 'CRIM_Mass_0017_2.mei',
 'CRIM_Mass_0017_3.mei',
 'CRIM_Mass_0017_4.mei',
 'CRIM_Mass_0017_5.mei',
 'CRIM_Mass_0018_1.mei',
 'CRIM_Mass_0018_2.mei',
 'CRIM_Mass_0018_3.mei',
 'CRIM_Mass_0018_4.mei',
 'CRIM_Mass_0018_5.mei',
 'CRIM_Mass_0019_1.mei',
 'CRIM_Mass_0019_2.mei',
 'CRIM_Mass_0019_3.mei',
 'CRIM_Mass_0019_4.mei',
 'CRIM_Mass_0019_5.mei',
 'CRIM_Mass_0020_1.mei',
 'CRIM_Mass_0020_2.mei',
 'CRIM_Mass_0020_3.mei',
 'CRIM_Mass_0020_4.mei',
 'CRIM_Mass_0020_5.mei',
 'CRIM_Mass_0021_1.mei',
 'CRIM_Mass_0021_2.mei',
 'CRIM_Mass_0021_3.mei',
 'CRIM_Mass_0021_4.mei',
 'CRIM_Mass_0021_5.mei',
 'CRIM_Mass_0022_2.mei',
 'CRIM_Model_0001.mei',
 'CRIM_Model_0008.mei',
 'CRIM_Model_0009.mei',
 'CRIM_Model_0010.mei',
 'CRIM_Model_0011.mei',
 'CRIM_Model_0012.mei',
 'CRIM_Model_0013.mei',
 'CRIM_Model_0014.mei',
 'CRIM_Model_0015.mei',
 'CRIM_Model_0016.mei',
 'CRIM_Model_0017.mei',
 'CRIM_Model_0019.mei',
 'CRIM_Model_0020.mei',
 'CRIM_Model_0021.mei',
 'CRIM_Model_0023.mei',
 'CRIM_Model_0025.mei',
 'CRIM_Model_0026.mei',
]

### Sample Pair of Mass+Model

In [2]:
work_list = ['CRIM_Model_0008.mei']

In [3]:
work_list = [el.replace("CRIM_", "https://crimproject.org/mei/MEI_4.0/CRIM_") for el in work_list]

In [None]:
#corpus = CorpusBase(['/Users/rfreedma/MEI/A_MEI_Tests/Sandrin_Doulce_RF_12_15_20.mei'])
#url = 'https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/'
#corpus_files = ['https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/CRIM_Model_0001.mei']
#corpus = CorpusBase(corpus_files)

In [4]:
corpus = CorpusBase(work_list)

Requesting file from https://crimproject.org/mei/MEI_4.0/CRIM_Model_0008.mei...
Successfully imported.


# Correct the MEI Metadata

Just as an experiment, we load the MEI files directly in order to adjust metadata. 

This routine fixes the title information.

These changes would be implemented in https://github.com/HCDigitalScholarship/intervals/blob/master/main_objs.py

In [5]:
import xml.etree.ElementTree as ET
import requests

MEINSURI = 'http://www.music-encoding.org/ns/mei'
MEINS = '{%s}' % MEINSURI

for i, path in enumerate(work_list):

    try:
        # Local paths are parsed from disk; anything else is fetched as a URL
        if path[0] == '/':
            mei_doc = ET.parse(path)
        else:
            mei_doc = ET.fromstring(requests.get(path).text)

        # Find the title from the MEI file and update the Music21 Score metadata
        title = mei_doc.find('mei:meiHead//mei:titleStmt/mei:title', namespaces={"mei": MEINSURI}).text
        print(path, title)
        corpus.scores[i].metadata.title = title
    except Exception as error:
        # Skip files that cannot be read or that lack a title, but say so
        print(path, "skipped:", error)

https://crimproject.org/mei/MEI_4.0/CRIM_Model_0008.mei Ave Maria
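
The namespace-aware lookup can be tried without downloading anything. The sketch below runs the same `find` expression against a hypothetical, heavily trimmed MEI header written inline for this example; real CRIM files contain far more metadata.

```python
import xml.etree.ElementTree as ET

MEI_NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# Hypothetical, minimal MEI header used only for this illustration
SAMPLE_MEI = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Ave Maria</title>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""

root = ET.fromstring(SAMPLE_MEI)

# '//' lets titleStmt sit at any depth below meiHead
title_element = root.find("mei:meiHead//mei:titleStmt/mei:title", namespaces=MEI_NS)

# find() returns None when nothing matches, so guard before reading .text
title = title_element.text if title_element is not None else None
print(title)  # Ave Maria
```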


In [6]:
for s in corpus.scores:
    print(s.metadata.title)

Ave Maria


***

### Select Actual or Incremental Durations

#### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic variation/duration is displayed, but **not** factored into the calculation of matching.
- **Incremental Offset** calculates the intervals using a **fixed offset between notes**, no matter their actual duration.  Use this to ignore passing tones or other ornaments.  The offsets are expressed in multiples of the quarter note (Offset = 1 samples at quarter note; Offset = 2 at half note, etc). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`

In [7]:
vectors = IntervalBase(corpus.note_list)
#vectors = IntervalBase(corpus.note_list_incremental_offset(2))

***

### Select Generic or Semitone:

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)` 

- The **number** in this command represents the **minimum number of vectors to find**. 5 vectors is 6 notes.
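
The relationship between vectors and notes can be checked with a small sliding-window sketch. It is only an approximation of what `into_patterns` does: `sliding_patterns` is a hypothetical helper, and it cuts fixed-length windows, whereas the library describes its number as a minimum.

```python
def sliding_patterns(intervals, length):
    """Return every run of `length` consecutive intervals."""
    return [intervals[i:i + length] for i in range(len(intervals) - length + 1)]

# Seven intervals, which come from a melody of eight notes
intervals = [1, 2, -2, 1, -2, 2, 2]

windows = sliding_patterns(intervals, 5)
print(len(windows))         # 3
print(windows[0])           # [1, 2, -2, 1, -2]
print(len(windows[0]) + 1)  # 6 notes are needed to produce 5 intervals
```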


In [8]:
patterns = into_patterns([vectors.generic_intervals], 6)
#patterns = into_patterns([vectors.semitone_intervals], 4)

***

### Select Exact Matches Here
#### (Use comment feature to select screen preview or CSV output) 

- **Exact** is exact in *all* ways `find_exact_matches(patterns, 2)` 
- The **number** in this command represents the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.
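
A minimal sketch of this kind of filtering, outside the library, groups identical patterns and keeps only the groups that are large enough. The function name and the location labels are hypothetical, and reporting groups with *at least* the minimum number of members is an assumption; the text above does not say whether the library's comparison is inclusive.

```python
from collections import defaultdict

def exact_match_groups(located_patterns, min_matches):
    """Group identical patterns; keep groups with at least `min_matches` members.

    located_patterns: list of (pattern, location) pairs.
    """
    groups = defaultdict(list)
    for pattern, location in located_patterns:
        # Tuples are hashable, so they can serve as dictionary keys
        groups[tuple(pattern)].append(location)
    return {p: locs for p, locs in groups.items() if len(locs) >= min_matches}

found = [
    ([1, 2, -2, 1, -2, 2], "Superius m. 33"),
    ([1, 2, -2, 1, -2, 2], "Altus m. 79"),
    ([1, 2, -2, 1, -2, 2], "Tenor m. 37"),
    ([2, 2, 1, 2, -2, -2], "Bassus m. 36"),
]

reported = exact_match_groups(found, 3)
print(reported)
# {(1, 2, -2, 1, -2, 2): ['Superius m. 33', 'Altus m. 79', 'Tenor m. 37']}
```

Raising the minimum isolates the most common soggetti; lowering it to 2 would also admit patterns that occur only twice.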

In [10]:
exact_matches = find_exact_matches(patterns, 3)
# Use this for exact screen preview
#for item in exact_matches:
    #item.print_exact_matches()

output_exact = export_pandas(exact_matches)
pd.DataFrame(output_exact).head()
results = pd.DataFrame(output_exact)
results
#export_to_csv(exact_matches)

Finding exact matches...
24 melodic intervals had more than 3 exact matches.



Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,end_measure,note_durations,ema,ema_url
0,"[1, 2, -2, 1, -2, 2]","[1, 2, -2, 1, -2, 2]",Ave Maria,[Superius],33,35,"[4.0, 2.0, 3.0, 1.0, 4.0, 2.0, 8.0]","33-35/1/@1.0-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
1,"[1, 2, -2, 1, -2, 2]","[1, 2, -2, 1, -2, 2]",Ave Maria,Altus,79,81,"[4.0, 2.0, 3.0, 1.0, 4.0, 2.0, 8.0]","79-81/2/@1.0-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
2,"[1, 2, -2, 1, -2, 2]","[1, 2, -2, 1, -2, 2]",Ave Maria,Tenor,37,39,"[4.0, 2.0, 3.0, 1.0, 4.0, 2.0, 8.0]","37-39/3/@1.0-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
3,"[1, 2, -2, 1, -2, 2]","[1, 2, -2, 1, -2, 2]",Ave Maria,Bassus,82,84,"[4.0, 2.0, 3.0, 1.0, 4.0, 2.0, 8.0]","82-84/4/@1.0-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
4,"[1, 1, 2, -2, -2, -2]","[1, 1, 2, -2, -2, -2]",Ave Maria,[Superius],40,43,"[4.0, 4.0, 4.0, 6.0, 2.0, 4.0, 4.0]","40-43/1/@3.0-end,@start-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
...,...,...,...,...,...,...,...,...,...
108,"[2, 2, 1, 2, -2, -2]","[2, 2, 1, 2, -2, -2]",Ave Maria,Bassus,36,38,"[4.0, 4.0, 4.0, 2.0, 4.0, 2.0, 4.0]","36-38/4/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
109,"[2, 1, 2, -2, -2, -2]","[2, 1, 2, -2, -2, -2]",Ave Maria,Altus,32,35,"[4.0, 4.0, 2.0, 4.0, 2.0, 4.0, 8.0]","32-35/2/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
110,"[2, 1, 2, -2, -2, -2]","[2, 1, 2, -2, -2, -2]",Ave Maria,Altus,116,117,"[2.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0]","116-117/2/@3.0-end,@start-4.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...
111,"[2, 1, 2, -2, -2, -2]","[2, 1, 2, -2, -2, -2]",Ave Maria,Altus,124,125,"[2.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0]","124-125/2/@3.0-end,@start-4.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...


In [55]:
output_exact.shape

(1532, 9)

#### Check the first column (pattern_generating_match) to confirm data type

In [21]:
type(output_exact.pattern_generating_match.iloc[0])

list

#### Change "list" type to tuple

In [11]:
output_exact["pattern_generating_match"] = output_exact["pattern_generating_match"].apply(tuple)
results["pattern_generating_match"] = results["pattern_generating_match"].apply(tuple)
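
The conversion matters because Python lists are mutable and therefore not hashable, while grouping and counting by value depend on hashing. The standard-library sketch below shows the difference without involving pandas; the variable names are hypothetical.

```python
from collections import Counter

patterns_as_lists = [
    [1, 1, 2, -2, -2, -2],
    [1, 1, 2, -2, -2, -2],
    [2, 1, 2, -2, -2, -2],
]

try:
    Counter(patterns_as_lists)
    list_counting_worked = True
except TypeError:
    # Raised because a list cannot be used as a dictionary key
    list_counting_worked = False

# Tuples hold the same values but are immutable, so they can be counted
counts = Counter(tuple(p) for p in patterns_as_lists)

print(list_counting_worked)           # False
print(counts[(1, 1, 2, -2, -2, -2)])  # 2
print(len(counts))                    # 2 distinct patterns
```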

#### How many "unique" patterns in the set? (CRIM Intervals reports that above)

In [12]:
output_exact.pattern_generating_match.apply(str).nunique()

24

#### Group by the Pattern Generating Match and Check Distribution of Results

- Report Top Ten Results (can adjust)

In [13]:
pattern_inventory = pd.DataFrame(output_exact.groupby("pattern_generating_match").size().sort_values(ascending=False)[:10])
pattern_inventory

Unnamed: 0_level_0,0
pattern_generating_match,Unnamed: 1_level_1
"(1, 1, 2, -2, -2, -2)",9
"(1, 2, -2, -2, -2, -2)",7
"(2, -3, 2, -2, -2, -2)",7
"(-3, 2, 2, 2, -3, 2)",6
"(2, 2, -3, 2, 2, 2)",6
"(2, 2, 2, -3, 2, 2)",6
"(4, -2, 2, 2, -3, -2)",4
"(-2, -2, -2, -2, 2, 2)",4
"(-2, -2, -2, 1, 3, -2)",4
"(-2, -2, 1, 3, -2, -2)",4


***

### Select Close Matches Here
#### (Comment out the 'for item iteration' in order to skip screen preview)

- **Close** matches allow for melodic variation (see more below). `find_close_matches(patterns, 2, 1)`
- The **first number** in this command is the **minimum number of melodies** needed before reporting
- The **second number** is the **threshold of similarity** needed in order to find a match. 
- Lower number = very similar; higher number = less similar

##### More about Close Matches  
- The **threshold for close matches** is determined by the **second number** called in the method. 
- We select two patterns, then compare *each vector in each pattern successively*. 
- The *differences between each vector are summed*. 
- If that value is **below the threshold specified**, we consider the **two patterns closely matched**.
- The format of the method call is  `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.

In [14]:
close_matches = find_close_matches(patterns, 2, 1)
#for item in close_matches:
   #item.print_close_matches()
    #return pd.DataFrame(close_matches)

output_close = export_pandas(close_matches)
results = pd.DataFrame(output_close)
results.head()
#export_to_csv(close_matches)

Finding close matches...
81 melodic intervals had more than 2 exact or close matches.



Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,end_measure,note_durations,ema,ema_url
0,"[-2, -2, -2, 2, -2, 4]","[-2, -2, -2, 2, -2, 4]",Ave Maria,[Superius],8,10,"[6.0, 2.0, 4.0, 4.0, 2.0, 2.0, 6.0]","8-10/1/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
1,"[-2, -2, -2, 2, -2, 4]","[-2, -2, -2, 2, -2, 4]",Ave Maria,Altus,10,12,"[6.0, 2.0, 4.0, 4.0, 3.0, 1.0, 6.0]","10-12/2/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
2,"[-2, -2, -2, 2, -2, 4]","[-2, -2, -2, 2, -2, 4]",Ave Maria,Tenor,12,14,"[6.0, 2.0, 4.0, 4.0, 2.0, 2.0, 6.0]","12-14/3/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
3,"[-2, -2, 2, -2, 4, -2]","[-2, -2, 2, -2, 4, -2]",Ave Maria,[Superius],8,11,"[2.0, 4.0, 4.0, 2.0, 2.0, 6.0, 1.0]","8-11/1/@4.0-end,@start-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
4,"[-2, -2, 2, -2, 4, -2]","[-2, -2, 2, -2, 4, -2]",Ave Maria,Altus,10,13,"[2.0, 4.0, 4.0, 3.0, 1.0, 6.0, 1.0]","10-13/2/@4.0-end,@start-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...


#### Also see ways to check shape and type, and correct data types above for Exact Matches.
- Report Top Ten results (can adjust)




In [17]:

output_close["pattern_generating_match"] = output_close["pattern_generating_match"].apply(tuple)

pattern_inventory = pd.DataFrame(output_close.groupby("pattern_generating_match").size().sort_values(ascending=False)[:10])
pattern_inventory

Unnamed: 0_level_0,0
pattern_generating_match,Unnamed: 1_level_1
"(1, 1, 2, -2, -2, -2)",13
"(2, 1, 2, -2, -2, -2)",13
"(1, 3, -2, -2, -2, -2)",13
"(1, 2, -2, -2, -2, -2)",12
"(2, 2, -2, -2, -2, -2)",10
"(2, 2, 2, -3, 2, 2)",9
"(-2, -2, -2, -2, 2, 2)",9
"(2, 2, 2, -2, 2, 2)",9
"(2, -2, 2, -3, 2, -2)",8
"(-2, -2, -2, 2, 3, -2)",8


In [18]:
output_close.pattern_generating_match.apply(str).nunique()

81

***

### Classify Patterns Here 
#### Note:  depends on choice of Close or Exact above!  Must choose appropriate one below!
#### No Pandas Preview, but CSV Export OK
#### Scroll to bottom of output to name and save CSV file

In [50]:
# Run the classifier once and keep the result (swap in exact_matches if preferred)
classified_matches = classify_matches(close_matches, 2)
#classified_matches = classify_matches(exact_matches, 2)
#pd.DataFrame(classified_matches)
output = export_pandas(classified_matches)
#pd.DataFrame(output).head()

## For CSV export, use the following (and follow prompts for file name)

#export_to_csv(classified_matches)

periodic entry:
Pattern: [2, 1, 2, -2, -2, -2], Locations in entry: 
- Measure 32 in voice 2
- Measure 36 in voice 4
- Measure 40 in voice 1
fuga:
Pattern: [2, 1, 2, -2, -2, -2], Locations in entry: 
- Measure 36 in voice 4
- Measure 40 in voice 1
- Measure 40 in voice 3
fuga:
Pattern: [1, 1, 2, -2, -2, -2], Locations in entry: 
- Measure 40 in voice 1
- Measure 40 in voice 3
- Measure 94 in voice 2
- Measure 98 in voice 2
fuga:
Pattern: [1, 1, 2, -2, -2, -2], Locations in entry: 
- Measure 94 in voice 2
- Measure 98 in voice 2
- Measure 111 in voice 1
- Measure 114 in voice 3
fuga:
Pattern: [1, 2, -2, -2, -2, -2], Locations in entry: 
- Measure 41 in voice 1
- Measure 41 in voice 3
- Measure 94 in voice 2
- Measure 98 in voice 2
fuga:
Pattern: [-3, 2, 2, 2, -3, 2], Locations in entry: 
- Measure 44 in voice 1
- Measure 44 in voice 3
- Measure 44 in voice 4
- Measure 45 in voice 3
fuga:
Pattern: [-3, 2, 2, 2, -3, 2], Locations in entry: 
- Measure 44 in voice 4
- Measure 45 in voice 3


In [48]:
output_cm

Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,end_measure,note_durations,ema,ema_url
0,"[2, 1, 2, -2, -2, -2]","[2, 1, 2, -2, -2, -2]",Ave Maria,Altus,32,35,"[4.0, 4.0, 2.0, 4.0, 2.0, 4.0, 8.0]","32-35/2/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
1,"[2, 1, 2, -2, -2, -2]","[2, 1, 2, -2, -2, -2]",Ave Maria,Bassus,36,39,"[4.0, 4.0, 2.0, 4.0, 2.0, 4.0, 8.0]","36-39/4/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
2,"[2, 1, 2, -2, -2, -2]","[1, 1, 2, -2, -2, -2]",Ave Maria,[Superius],40,43,"[4.0, 4.0, 4.0, 6.0, 2.0, 4.0, 4.0]","40-43/1/@3.0-end,@start-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
3,"[2, 1, 2, -2, -2, -2]","[2, 1, 2, -2, -2, -2]",Ave Maria,Bassus,36,39,"[4.0, 4.0, 2.0, 4.0, 2.0, 4.0, 8.0]","36-39/4/@3.0-end,@start-end,@start-end,@start-1.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
4,"[2, 1, 2, -2, -2, -2]","[1, 1, 2, -2, -2, -2]",Ave Maria,[Superius],40,43,"[4.0, 4.0, 4.0, 6.0, 2.0, 4.0, 4.0]","40-43/1/@3.0-end,@start-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
...,...,...,...,...,...,...,...,...,...
180,"[2, -2, 2, -3, 2, -3]","[2, -3, 2, -3, 2, -2]",Ave Maria,Altus,66,69,"[4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 1.0]","66-69/2/@3.0-end,@start-end,@start-end,@start-2.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...
181,"[2, -3, 2, -3, 2, -2]","[2, -3, 2, -3, 2, -2]",Ave Maria,[Superius],64,67,"[4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 1.0]","64-67/1/@3.0-end,@start-end,@start-end,@start-2.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...
182,"[2, -3, 2, -3, 2, -2]","[2, -3, 2, -3, 2, -2]",Ave Maria,Altus,66,69,"[4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 1.0]","66-69/2/@3.0-end,@start-end,@start-end,@start-2.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...
183,"[2, -3, 2, -3, 2, -2]","[2, -3, 2, -3, 2, -2]",Ave Maria,Tenor,67,70,"[4.0, 4.0, 4.0, 4.0, 4.0, 3.0, 1.0]","67-70/3/@3.0-end,@start-end,@start-end,@start-2.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...


In [32]:
def classified_pandas(matches):
    classified_data = []
    for classified_match in matches:
        classified_dict = {
            #"Piece": classified_match.first_note.metadata.title,
            "Pattern_Generating_Match": classified_match.pattern,
            "Classification_Type": classified_match.type,
        }
        # Possible extension: record the part and measure of each soggetto
        #for soggetti in classified_match.matches:
            #classified_data.append(soggetti.first_note.part)
            #classified_data.append(soggetti.first_note.note.measureNumber)
        # Append inside the loop so every match is kept, not only the last one
        classified_data.append(classified_dict)
    return pd.DataFrame(classified_data)

In [34]:
classified_data

NameError: name 'classified_data' is not defined

## Read CSV of Classified Matches

- Update file name to match the output of previous cells for Classifier

In [3]:
results = pd.read_csv('Sandrin_Classified.csv')
results.rename(columns=
                   {'Pattern Generating Match': 'Pattern_Generating_Match', 
                    'Pattern matched':'Pattern_Matched',
                    'Classification Type': 'Classification_Type',
                    'Piece Title': 'Piece_Title',
                    'First Note Measure Number': 'Start_Measure',
                    'Last Note Measure Number': 'Stop_Measure',
                    'Note Durations': 'Note_Durations'
                   },
                    inplace=True)
results.head()

ParserError: Error tokenizing data. C error: Expected 12 fields in line 92, saw 16


## Now Durations Filter 
 - Works with RESULTS files (not with Classified Patterns)

In [63]:
# converts strings to numbers and replaces these in the dataframe

results['note_durations'] = results['note_durations'].apply(ast.literal_eval)

durations = results['note_durations']

ValueError: malformed node or string: [4.0, 6.0, 2.0, 2.0, 2.0, 2.0, 3.0]
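
The `ValueError` above is what `ast.literal_eval` raises when it is handed a value that is already a Python list rather than a string. The conversion is only needed when the column holds text, for example after reading the results back from a CSV file. A minimal sketch of a guard, with `to_list` as a hypothetical helper name:

```python
import ast

def to_list(value):
    """Parse a string such as '[4.0, 2.0]'; pass real lists through unchanged."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

from_csv = "[4.0, 6.0, 2.0]"   # what a CSV round trip would give
in_memory = [4.0, 6.0, 2.0]    # what an in-memory dataframe already holds

print(to_list(from_csv))   # [4.0, 6.0, 2.0]
print(to_list(in_memory))  # [4.0, 6.0, 2.0]
```

Applied to the dataframe, this would read `results['note_durations'] = results['note_durations'].apply(to_list)`.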

In [64]:
results.head()

Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,end_measure,note_durations,ema,ema_url
0,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Superius,1,3,"[4.0, 6.0, 2.0, 2.0, 2.0, 2.0, 3.0]","1-3/1/@1.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
1,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Superius,10,12,"[4.0, 6.0, 2.0, 2.0, 2.0, 4.0, 2.0]","10-12/1/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
2,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Superius,31,33,"[2.0, 6.0, 2.0, 2.0, 4.0, 2.0, 4.0]","31-33/1/@2.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
3,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Contratenor,34,36,"[2.0, 6.0, 2.0, 2.0, 2.0, 3.0, 1.0]","34-36/2/@4.0-end,@start-end,@start-4.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...
4,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,PrimusTenor,5,7,"[4.0, 6.0, 2.0, 2.0, 2.0, 2.0, 3.0]","5-7/3/@1.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...


This Function Calculates the Ratios of the Durations in each Match

In [65]:
# makes pairs of consecutive durations and computes their ratios
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def get_ratios(input_list):
    # ratio of each duration to the one before it
    ratio_pairs = []
    for a, b in pairwise(input_list):
        ratio_pairs.append(b / a)
    return ratio_pairs
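
As a worked example, the sketch below applies the same idea to the durations of the first row in the results preview above. The functions are restated in compact form so the block runs on its own.

```python
from itertools import tee

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def get_ratios(durations):
    # Each ratio compares a duration with the one just before it
    return [b / a for a, b in pairwise(durations)]

durations = [4.0, 6.0, 2.0, 2.0, 2.0, 2.0, 3.0]
ratios = get_ratios(durations)
print(ratios)  # [1.5, 0.3333333333333333, 1.0, 1.0, 1.0, 1.5]
```

Seven durations give six ratios. Because only the proportions are kept, the same rhythm written in doubled or halved note values produces the same list.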




Now call the function to operate on the RESULTS file from earlier

In [67]:
# calculates 'duration ratios' for each soggetto, then adds this to the DF

results["duration_ratios"] = results.note_durations.apply(get_ratios)
results.head(4)

Unnamed: 0,pattern_generating_match,pattern_matched,piece_title,part,start_measure,end_measure,note_durations,ema,ema_url,duration_ratios
0,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Superius,1,3,"[4.0, 6.0, 2.0, 2.0, 2.0, 2.0, 3.0]","1-3/1/@1.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[1.5, 0.3333333333333333, 1.0, 1.0, 1.0, 1.5]"
1,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Superius,10,12,"[4.0, 6.0, 2.0, 2.0, 2.0, 4.0, 2.0]","10-12/1/@1.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[1.5, 0.3333333333333333, 1.0, 1.0, 2.0, 0.5]"
2,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Superius,31,33,"[2.0, 6.0, 2.0, 2.0, 4.0, 2.0, 4.0]","31-33/1/@2.0-end,@start-end,@start-3.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[3.0, 0.3333333333333333, 1.0, 2.0, 0.5, 2.0]"
3,"(5, -2, 2, 3, -2, -2)","[5, -2, 2, 3, -2, -2]",Veni speciosam,Contratenor,34,36,"[2.0, 6.0, 2.0, 2.0, 2.0, 3.0, 1.0]","34-36/2/@4.0-end,@start-end,@start-4.5",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[3.0, 0.3333333333333333, 1.0, 1.0, 1.5, 0.333..."


### Here we group the rows in the DF by the Pattern Generating Match
- Each has its own string of durations, and duration ratios
- and then we compare the ratios to get the differences
- the "list(combinations)" method takes care of building the pairs, using data from our dataframe 'results'

In [72]:
from itertools import combinations

def compare_ratios(ratios_1, ratios_2):
    
    ## element-wise difference of the two lists 
    # using zip() + list comprehension 
    diffs = [i - j for i, j in zip(ratios_1, ratios_2)] 
    abs_diffs = [abs(ele) for ele in diffs] 
    sum_diffs = sum(abs_diffs)

    return sum_diffs

#results["Pattern_Generating_Match"] = results["Pattern_Generating_Match"].apply(tuple) 

def get_ratio_distances(results, pattern_col, output_cols):
    
    matches = []

    for name, group in results.groupby(pattern_col):

        ratio_pairs = list(combinations(group.index.values, 2))

        for a, b in ratio_pairs:
            
            a_match = results.loc[a]
            b_match = results.loc[b]
            
            sum_diffs = compare_ratios(a_match.duration_ratios, b_match.duration_ratios)
            
            match_dict = {
                "pattern": name,
                "sum_diffs": sum_diffs
            }
            
            for col in output_cols:
                match_dict.update({
                    f"match_1_{col}": a_match[col],
                    f"match_2_{col}": b_match[col]
                })
                
            matches.append(match_dict)
            
    return pd.DataFrame(matches)
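
The pairing and distance steps can be seen in isolation with plain dictionaries instead of a dataframe. The duration ratios below correspond to three rows of the results preview above (Superius m. 1, Superius m. 10, and PrimusTenor m. 5); the location labels and the dictionary layout are assumptions made for this sketch.

```python
from itertools import combinations

def compare_ratios(ratios_1, ratios_2):
    """Sum of absolute differences between two lists of duration ratios."""
    return sum(abs(i - j) for i, j in zip(ratios_1, ratios_2))

# Three occurrences of the same melodic pattern, each with its duration ratios
occurrences = {
    "Superius m. 1": [1.5, 1 / 3, 1.0, 1.0, 1.0, 1.5],
    "Superius m. 10": [1.5, 1 / 3, 1.0, 1.0, 2.0, 0.5],
    "PrimusTenor m. 5": [1.5, 1 / 3, 1.0, 1.0, 1.0, 1.5],
}

# combinations() yields each unordered pair exactly once
distances = {
    (a, b): compare_ratios(occurrences[a], occurrences[b])
    for a, b in combinations(occurrences, 2)
}

for pair, distance in distances.items():
    print(pair, distance)
```

A distance of 0.0 means the two occurrences share the same rhythmic proportions; here that holds for the Superius at m. 1 and the PrimusTenor at m. 5, while the Superius at m. 10 differs from both by 2.0.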

Now Run the Function to get the 'edit distances' for the durations of matching patterns

In [75]:
ratio_distances = get_ratio_distances(results, "pattern_generating_match", ["piece_title", "part", "start_measure", "end_measure"])
ratio_distances.head()

Unnamed: 0,pattern,sum_diffs,match_1_piece_title,match_2_piece_title,match_1_part,match_2_part,match_1_start_measure,match_2_start_measure,match_1_end_measure,match_2_end_measure
0,"(-5, 2, 2, 2, 2, 2)",3.666667,Veni speciosam,Missa Vidi speciosam: Gloria,SecundusTenor,[II.Tenor],107,106,108,108
1,"(-5, 2, 2, 2, 2, 2)",1.666667,Veni speciosam,Missa Vidi speciosam: Credo,SecundusTenor,[II.Tenor],107,27,108,28
2,"(-5, 2, 2, 2, 2, 2)",1.666667,Veni speciosam,Missa Vidi speciosam: Sanctus,SecundusTenor,I.Tenor,107,64,108,65
3,"(-5, 2, 2, 2, 2, 2)",0.25,Veni speciosam,Missa Vidi speciosam: Sanctus,SecundusTenor,[II.Tenor],107,57,108,58
4,"(-5, 2, 2, 2, 2, 2)",3.416667,Veni speciosam,Missa Vidi speciosam: Sanctus,SecundusTenor,[Bassus],107,54,108,56


And FILTER the results according to any threshold we like

In [77]:
ratios_filtered = ratio_distances[ratio_distances.sum_diffs <= 1]
ratios_filtered

Unnamed: 0,pattern,sum_diffs,match_1_piece_title,match_2_piece_title,match_1_part,match_2_part,match_1_start_measure,match_2_start_measure,match_1_end_measure,match_2_end_measure
3,"(-5, 2, 2, 2, 2, 2)",0.250000,Veni speciosam,Missa Vidi speciosam: Sanctus,SecundusTenor,[II.Tenor],107,57,108,58
24,"(-3, -2, 2, 2, 2, 2)",1.000000,Veni speciosam,Missa Vidi speciosam: Gloria,Bassus,[II.Tenor],9,46,11,47
26,"(-3, -2, 2, 2, 2, 2)",1.000000,Veni speciosam,Missa Vidi speciosam: Credo,Bassus,[Contratenor],9,161,11,163
28,"(-3, -2, 2, 2, 2, 2)",0.000000,Missa Vidi speciosam: Gloria,Missa Vidi speciosam: Credo,[II.Tenor],[Contratenor],46,161,47,163
35,"(-3, 2, 2, -2, -2, -2)",0.666667,Veni speciosam,Veni speciosam,Contratenor,Bassus,67,85,67,86
...,...,...,...,...,...,...,...,...,...,...
12986,"(5, -2, 2, 3, -2, -2)",0.500000,Missa Vidi speciosam: Credo,Missa Vidi speciosam: Agnus Dei,[II.Tenor],I.Tenor,9,1,11,3
12988,"(5, -2, 2, 3, -2, -2)",0.000000,Missa Vidi speciosam: Sanctus,Missa Vidi speciosam: Sanctus,[Superius],I.Tenor,10,1,12,3
12990,"(5, -2, 2, 3, -2, -2)",0.000000,Missa Vidi speciosam: Sanctus,Missa Vidi speciosam: Agnus Dei,[Superius],I.Tenor,10,1,12,3
12993,"(5, -2, 2, 3, -2, -2)",0.000000,Missa Vidi speciosam: Sanctus,Missa Vidi speciosam: Agnus Dei,I.Tenor,I.Tenor,1,1,3,3


Now Group the Duration-Filter Results by the Pattern (which shows us very closely related soggetti in sets)

In [78]:
grouped = ratios_filtered.groupby("pattern")
grouped.head()

Unnamed: 0,pattern,sum_diffs,match_1_piece_title,match_2_piece_title,match_1_part,match_2_part,match_1_start_measure,match_2_start_measure,match_1_end_measure,match_2_end_measure
3,"(-5, 2, 2, 2, 2, 2)",0.250000,Veni speciosam,Missa Vidi speciosam: Sanctus,SecundusTenor,[II.Tenor],107,57,108,58
24,"(-3, -2, 2, 2, 2, 2)",1.000000,Veni speciosam,Missa Vidi speciosam: Gloria,Bassus,[II.Tenor],9,46,11,47
26,"(-3, -2, 2, 2, 2, 2)",1.000000,Veni speciosam,Missa Vidi speciosam: Credo,Bassus,[Contratenor],9,161,11,163
28,"(-3, -2, 2, 2, 2, 2)",0.000000,Missa Vidi speciosam: Gloria,Missa Vidi speciosam: Credo,[II.Tenor],[Contratenor],46,161,47,163
35,"(-3, 2, 2, -2, -2, -2)",0.666667,Veni speciosam,Veni speciosam,Contratenor,Bassus,67,85,67,86
...,...,...,...,...,...,...,...,...,...,...
12676,"(5, -2, 2, 3, -2, -2)",0.000000,Veni speciosam,Veni speciosam,Superius,PrimusTenor,1,5,3,7
12679,"(5, -2, 2, 3, -2, -2)",0.000000,Veni speciosam,Missa Vidi speciosam: Kyrie,Superius,[Superius],1,1,3,3
12680,"(5, -2, 2, 3, -2, -2)",0.000000,Veni speciosam,Missa Vidi speciosam: Kyrie,Superius,I.Tenor,1,4,3,6
12684,"(5, -2, 2, 3, -2, -2)",0.000000,Veni speciosam,Missa Vidi speciosam: Gloria,Superius,[Superius],1,9,3,11


In [None]:
ratios_filtered.to_csv("filtered_sample_pair.csv")

In [46]:
output_exact["duration_ratios"] = output_exact.note_durations.apply(get_ratios)

In [None]:
# ratio_distances already holds one row per compared pair of matches
ratio_distances.head(5)
# could SORT by these!
ratio_distances[ratio_distances.sum_diffs == 0]