# About Crim Intervals 

- See more at https://github.com/HCDigitalScholarship/intervals/blob/master/README.md

## How to Select Pieces

- For local file(s), use: `corpus = CorpusBase(['/Users/rfreedma/MEI/CRIM_Intervals_Tests/Brumel_Complete.mei'])`
- For remote files, use: `corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Model_0008.mei', 'https://crimproject.org/mei/CRIM_Mass_0005_5.mei'])`


## Notes about the Various Parameters

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)`. The **number** in this command represents the **minimum number of vectors to find**. 5 vectors is 6 notes.

### Chromatic vs Diatonic 
- **Chromatic** uses `into_patterns([vectors.semitone_intervals], 5)`
- **Diatonic** uses `into_patterns([vectors.generic_intervals], 5)`
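
To make the difference concrete, here is a small self-contained sketch (not part of `crim_intervals`) that computes both kinds of interval for a hypothetical five-note melody. It assumes the generic numbering seen in the sample output later in this notebook, where a step up is `2` and a fifth up is `5`. The helper names and the melody are invented for illustration.

```python
# Illustrative helpers only; these are NOT crim_intervals functions.
STEP_INDEX = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def diatonic_position(name, octave):
    """Count of diatonic steps above C0 (accidentals ignored)."""
    return octave * 7 + STEP_INDEX[name]


def midi_number(name, octave):
    """MIDI note number for a natural note (C4 = 60)."""
    return (octave + 1) * 12 + SEMITONES[name]


def generic_interval(a, b):
    """Diatonic interval with 1-based numbering: step up = 2, step down = -2."""
    diff = diatonic_position(*b) - diatonic_position(*a)
    return diff + 1 if diff >= 0 else diff - 1


def semitone_interval(a, b):
    """Chromatic interval in semitones."""
    return midi_number(*b) - midi_number(*a)


# Hypothetical melody: G4, D5, C5, D5, F5 (5 notes, so 4 vectors)
melody = [("G", 4), ("D", 5), ("C", 5), ("D", 5), ("F", 5)]
pairs = list(zip(melody, melody[1:]))
generic = [generic_interval(a, b) for a, b in pairs]
semitone = [semitone_interval(a, b) for a, b in pairs]

print(generic)   # [5, -2, 2, 3]
print(semitone)  # [7, -2, 2, 3]
```

Note that the two lists differ only where the diatonic interval number and the semitone count diverge (here, the opening fifth).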

### Exact vs Close  
- **Exact** is exact in *all* ways: `find_exact_matches(patterns, 2)`. The **number** in this command represents the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.
- **Close** matches allow for melodic variation (see more below): `find_close_matches(patterns, 2, 1)`. The **first number** in this command is the minimum number of melodies needed before reporting; the **second number** is the threshold needed in order to find a match. Lower number = very similar; higher number = less similar.

### More about Close Matches  
- The **threshold for close matches** is determined by the third argument passed to the method (the second number). We select two patterns, then compare *each vector in each pattern successively*. The differences between corresponding vectors are summed. If that value is below the threshold specified, we consider the two patterns closely matched.
- The format of the method call is  `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.
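
The comparison described above can be sketched in a few lines of plain Python. This is an illustration of the idea, not the code inside `find_close_matches`; the function names are hypothetical, and treating a sum *equal* to the threshold as a match is an assumption made here.

```python
def pattern_distance(pattern_a, pattern_b):
    """Sum of absolute differences between corresponding vectors."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must contain the same number of vectors")
    return sum(abs(a - b) for a, b in zip(pattern_a, pattern_b))


def is_close_match(pattern_a, pattern_b, threshold):
    # Assumption: a distance equal to the threshold still counts as a match.
    return pattern_distance(pattern_a, pattern_b) <= threshold


reference = [5, -2, 2, 3]
variant = [5, -2, 2, 2]   # only the last interval differs, by 1
distant = [4, -2, 3, 2]   # three intervals differ by 1 each

print(pattern_distance(reference, variant))   # 1
print(pattern_distance(reference, distant))   # 3
print(is_close_match(reference, variant, 1))  # True
print(is_close_match(reference, distant, 1))  # False
```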

### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic variation/duration is displayed, but **not** factored into the calculation of matching.
- **Incremental Offset** calculates the intervals using a fixed offset between notes, no matter their actual duration.  Use this to ignore passing tones or other ornaments.  The offsets are expressed in multiples of the quarter note (Offset = 1 samples at quarter note; Offset = 2 at half note, etc). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`
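
The idea of sampling at a fixed offset can be illustrated without the library. The sketch below only approximates the concept and is not how `note_list_incremental_offset` is implemented; the `(onset, duration, pitch)` tuples, the function name, and the melody are assumptions.

```python
def sample_at_fixed_offset(notes, step):
    """Return the pitch sounding at offsets 0, step, 2 * step, ...

    notes: list of (onset, duration, midi_pitch), measured in quarter notes.
    """
    end = max(onset + duration for onset, duration, _ in notes)
    sampled = []
    i = 0
    while i * step < end:
        t = i * step
        for onset, duration, pitch in notes:
            if onset <= t < onset + duration:
                sampled.append(pitch)
                break
        i += 1
    return sampled


# G (half note), A (quarter), B (quarter, a passing tone), C (half note)
melody = [(0, 2, 67), (2, 1, 69), (3, 1, 71), (4, 2, 72)]

sampled = sample_at_fixed_offset(melody, 2)  # sample every half note
intervals = [b - a for a, b in zip(sampled, sampled[1:])]

print(sampled)    # [67, 69, 72]
print(intervals)  # [2, 3]
```

With an offset of 2 the passing tone B (pitch 71), which starts on an unsampled quarter, drops out of the interval list.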

### Dataframe Preview vs Full Results in Browser

- Leave the first of these lines commented out for a preview only. Uncomment it for full results.

    - `#pd.set_option("display.max_rows", None, "display.max_columns", None)`
    - `return pd.DataFrame(match_data)`
    
### To Set CSV Output

- Specify file name here `pd.Series(match_data).to_csv("Match_data_10_13.csv")`

***

## Search with Options

- Piece or Corpus
- Actual or Incremental Durations
- Chromatic or Diatonic
- Exact or Close
- Classify

***


### Load Crim Intervals and Pandas

In [2]:
from crim_intervals import *
import pandas as pd
import ast
from itertools import tee, combinations
import matplotlib

***

### Selected Piece or Corpus 

A note about local files: use `/content/your_file_name.mei` for any local file you've uploaded. Otherwise, provide a URL.

To load CRIM files directly from GitHub:

In [3]:
#corpus = CorpusBase(['/Users/rfreedma/MEI/A_MEI_Tests/Sandrin_Doulce_RF_12_15_20.mei'])
#url = 'https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/'
corpus_files = ['https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/CRIM_Model_0001.mei']
corpus = CorpusBase(corpus_files)

Requesting file from https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/CRIM_Model_0001.mei...
Successfully imported.


# Correct the MEI Metadata

Just as an experiment, we load the MEI files directly in order to adjust metadata. 

This routine fixes the title information.

These changes would be implemented in https://github.com/HCDigitalScholarship/intervals/blob/master/main_objs.py

In [4]:
import xml.etree.ElementTree as ET
import requests

MEINSURI = 'http://www.music-encoding.org/ns/mei'
MEINS = '{%s}' % MEINSURI

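for i, path in enumerate(corpus_files):
  # Local paths are parsed from disk; anything else is fetched as a URL
  if path[0] == '/':
    mei_root = ET.parse(path).getroot()
  else:
    mei_root = ET.fromstring(requests.get(path).content)
  # Find the title from the MEI file and update the Music21 Score metadata
  title_el = mei_root.find(f'{MEINS}meiHead//{MEINS}titleStmt/{MEINS}title')
  # Skip files without a title element instead of raising AttributeError
  if title_el is not None:
    corpus.scores[i].metadata.title = title_el.text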
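
The title lookup can be tried in isolation on a tiny inline document. The sketch below uses only the standard library; the sample MEI fragment and the `extract_title` helper are made up for illustration, and real CRIM files contain far more metadata.

```python
import xml.etree.ElementTree as ET

MEI_NS = "http://www.music-encoding.org/ns/mei"

# Minimal, hypothetical MEI header used only for this demonstration
SAMPLE_MEI = f"""<mei xmlns="{MEI_NS}">
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Veni speciosam</title>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""


def extract_title(mei_text):
    """Return the first title under meiHead, or None if there is none."""
    root = ET.fromstring(mei_text)
    namespaces = {"mei": MEI_NS}
    title_el = root.find("mei:meiHead//mei:titleStmt/mei:title", namespaces)
    if title_el is None or title_el.text is None:
        return None
    return title_el.text.strip()


print(extract_title(SAMPLE_MEI))  # Veni speciosam
```

Passing a prefix map to `find` is an alternative to spelling out the `{namespace}` form in every path step.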

***

### Select Actual or Incremental Durations

#### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic variation/duration is displayed, but **not** factored into the calculation of matching.
- **Incremental Offset** calculates the intervals using a **fixed offset between notes**, no matter their actual duration.  Use this to ignore passing tones or other ornaments.  The offsets are expressed in multiples of the quarter note (Offset = 1 samples at quarter note; Offset = 2 at half note, etc). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`

In [6]:
vectors = IntervalBase(corpus.note_list)
#vectors = IntervalBase(corpus.note_list_incremental_offset(2))

***

### Select Generic or Semitone:

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)` 

- The **number** in this command represents the **minimum number of vectors to find**. 5 vectors is 6 notes.
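
As a rough picture of what a pattern of a given length means, the sketch below slides a window over a list of intervals. It assumes that patterns start at every position in the melody, which may not match how `into_patterns` works internally; `sliding_patterns` is a hypothetical helper.

```python
def sliding_patterns(intervals, length):
    """Every run of `length` consecutive intervals (length + 1 notes)."""
    return [intervals[i:i + length] for i in range(len(intervals) - length + 1)]


intervals = [5, -2, 2, 3, -2, -2]  # 6 intervals, i.e. 7 notes
windows = sliding_patterns(intervals, 4)

for window in windows:
    print(window)
# [5, -2, 2, 3]
# [-2, 2, 3, -2]
# [2, 3, -2, -2]
```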


In [7]:
patterns = into_patterns([vectors.generic_intervals], 4)
#patterns = into_patterns([vectors.semitone_intervals], 4)

***

### Select Exact Matches Here
#### (Use comment feature to select screen preview or CSV output) 

- **Exact** is exact in *all* ways `find_exact_matches(patterns, 2)` 
- The **number** in this command represents the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.
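
The filtering step can be pictured as grouping identical interval patterns and keeping only the groups that are large enough. This is a sketch of the idea, not the library's implementation: `group_exact` is a hypothetical helper, the sample locations are invented, and whether the library uses "at least" or "more than" the given number is not confirmed here (the sketch uses "at least").

```python
from collections import defaultdict


def group_exact(patterns, min_matches):
    """Group (location, vector) pairs by identical vector; keep big groups."""
    groups = defaultdict(list)
    for location, vector in patterns:
        groups[tuple(vector)].append(location)
    return {vec: locs for vec, locs in groups.items() if len(locs) >= min_matches}


# Invented sample data for illustration
found = [
    ("Superius m.1", [5, -2, 2, 3]),
    ("Superius m.10", [5, -2, 2, 3]),
    ("PrimusTenor m.5", [5, -2, 2, 3]),
    ("Bassus m.20", [2, 2, -3, 2]),
]

common = group_exact(found, 2)
print(common)
# {(5, -2, 2, 3): ['Superius m.1', 'Superius m.10', 'PrimusTenor m.5']}
```

Raising the number keeps only the most frequently repeated soggetti.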

In [8]:
exact_matches = find_exact_matches(patterns, 2)
# Use this for exact screen preview
#for item in exact_matches:
    #item.print_exact_matches()

output_exact = export_pandas(exact_matches)
pd.DataFrame(output_exact).head()
#results = pd.DataFrame(output_exact)
export_to_csv(exact_matches) 

Finding exact matches...
106 melodic intervals had more than 2 exact matches.

This method will create a csv file in your current working directory. Continue? (y/n): y
Enter a name for your csv file (.csv will be appended): results_Model_0001_exact
CSV created in your current working directory.


In [9]:
output_exact.dtypes

pattern_generating_match    object
pattern_matched             object
piece_title                 object
part                        object
start_measure                int64
end_measure                  int64
note_durations              object
ema                         object
ema_url                     object
dtype: object

***

### Select Close Matches Here
#### (Comment out the 'for item iteration' in order to skip screen preview)

- **Close** matches allow for melodic variation (see more below). `find_close_matches(patterns, 2, 1)`
- The **first number** in this command is the **minimum number of melodies** needed before reporting
- The **second number** is the **threshold of similarity** needed in order to find a match. 
- Lower number = very similar; higher number = less similar

##### More about Close Matches  
- The **threshold for close matches** is determined by the **second number** called in the method. 
- We select two patterns, then compare *each vector in each pattern successively*. 
- The *differences between each vector are summed*. 
- If that value is **below the threshold specified**, we consider the **two patterns closely matched**.
- The format of the method call is  `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.

In [None]:
close_matches = find_close_matches(patterns, 2, 1)
#for item in close_matches:
   #item.print_close_matches()
    #return pd.DataFrame(close_matches)

output_close = export_pandas(close_matches)
results = pd.DataFrame(output_close)
results.head()
#export_to_csv(close_matches)

***

### Classify Patterns Here 
#### Note:  depends on choice of Close or Exact above!  Must choose appropriate one below!
#### No Pandas Preview, but CSV Export OK
#### Scroll to bottom of output to name and save CSV file

In [None]:
# classify_matches was originally called twice here; one call is enough
classified_matches = classify_matches(close_matches, 2)
#classified_matches = classify_matches(exact_matches, 2)
#pd.DataFrame(classified_matches)
output = export_pandas(classified_matches)
#pd.DataFrame(output).head()

## For CSV export, use the following (and follow prompts for file name)

export_to_csv(classified_matches)

## Read CSV of Classified Matches

- Update file name to match the output of previous cells for Classifier

In [None]:
pd.read_csv('Sandrin_Soggetti.csv', usecols=['Pattern Generating Match', 'Classification Type', 'Soggetti 1 Part', 'Soggetti 1 Measure', 'Soggetti 2 Part', 'Soggetti 2 Measure', 'Soggetti 3 Part', 'Soggetti 3 Measure', 'Soggetti 4 Part', 'Soggetti 4 Measure'])


# Now use Duration Ratio Matching

## Read CSV of Match Output

In [10]:
results = pd.read_csv('results_Model_0001_exact.csv')
results.rename(columns=
                   {'Pattern Generating Match': 'Pattern_Generating_Match', 
                    'Pattern matched':'Pattern_Matched',
                    'Piece Title': 'Piece_Title',
                    'First Note Measure Number': 'First_Note_Measure_Number',
                   'Last Note Measure Number': 'Last_Note_Measure_Number',
                    'Note Durations': 'Note_Durations'
                   },
                    inplace=True)
results.head()

Unnamed: 0,Pattern_Generating_Match,Pattern_Matched,Piece_Title,Part,First_Note_Measure_Number,Last_Note_Measure_Number,Note_Durations,EMA,EMA url
0,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Superius,1,2,"[4.0, 6.0, 2.0, 2.0, 2.0]","1-2/1/@1.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
1,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Superius,10,11,"[4.0, 6.0, 2.0, 2.0, 2.0]","10-11/1/@1.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
2,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Superius,31,32,"[2.0, 6.0, 2.0, 2.0, 4.0]","31-32/1/@2.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
3,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Contratenor,34,36,"[2.0, 6.0, 2.0, 2.0, 2.0]","34-36/2/@4.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...
4,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,PrimusTenor,5,6,"[4.0, 6.0, 2.0, 2.0, 2.0]","5-6/3/@1.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...


In [11]:
# parses each Note_Durations string (e.g. '[4.0, 6.0]') into a list of floats and stores it back in the dataframe

results['Note_Durations'] = results['Note_Durations'].apply(ast.literal_eval)

durations = results['Note_Durations']
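
The reason for `ast.literal_eval` is that a list written to CSV comes back as plain text. The standard-library sketch below reproduces that round trip with an in-memory buffer; it stands in for the pandas `to_csv`/`read_csv` steps and is only meant to show the type change.

```python
import ast
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Note_Durations"])
# The list is stored as its string form, quoted because it contains commas
writer.writerow([[4.0, 6.0, 2.0, 2.0, 2.0]])

buffer.seek(0)
rows = list(csv.reader(buffer))
raw = rows[1][0]                 # still a string after reading back
parsed = ast.literal_eval(raw)   # now a real list of floats

print(type(raw).__name__, raw)          # str [4.0, 6.0, 2.0, 2.0, 2.0]
print(type(parsed).__name__, parsed)    # list [4.0, 6.0, 2.0, 2.0, 2.0]
print(parsed[1] / parsed[0])            # 1.5
```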

In [12]:
# pairs up consecutive durations and computes the ratio of each pair

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def get_ratios(input_list):
    ratio_pairs = []
    for a, b in pairwise(input_list):
        ratio_pairs.append(b / a)
    return ratio_pairs
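
A quick check of what the ratios capture, written as a standalone sketch: the helper below is an equivalent `zip`-based restatement of `get_ratios` (renamed `duration_ratios` to avoid clashing with the cell above), applied to the first row of durations in the results table. The doubled durations are an invented example.

```python
def duration_ratios(durations):
    """Ratio of each duration to the one before it."""
    return [b / a for a, b in zip(durations, durations[1:])]


original = [4.0, 6.0, 2.0, 2.0, 2.0]
doubled = [d * 2 for d in original]  # same rhythm in doubled note values

print(duration_ratios(original))  # [1.5, 0.3333333333333333, 1.0, 1.0]
print(duration_ratios(doubled) == duration_ratios(original))  # True
```

Because only the proportions between neighbouring notes are kept, a soggetto stated in longer or shorter note values produces the same list of ratios.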




In [13]:
# calculates 'duration ratios' for each soggetto, then adds this to the DF

results["Duration_Ratios"] = results.Note_Durations.apply(get_ratios)
results.head(4)

Unnamed: 0,Pattern_Generating_Match,Pattern_Matched,Piece_Title,Part,First_Note_Measure_Number,Last_Note_Measure_Number,Note_Durations,EMA,EMA url,Duration_Ratios
0,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Superius,1,2,"[4.0, 6.0, 2.0, 2.0, 2.0]","1-2/1/@1.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[1.5, 0.3333333333333333, 1.0, 1.0]"
1,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Superius,10,11,"[4.0, 6.0, 2.0, 2.0, 2.0]","10-11/1/@1.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[1.5, 0.3333333333333333, 1.0, 1.0]"
2,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Superius,31,32,"[2.0, 6.0, 2.0, 2.0, 4.0]","31-32/1/@2.0-end,@start-4.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[3.0, 0.3333333333333333, 1.0, 2.0]"
3,"[5, -2, 2, 3]","[5, -2, 2, 3]",Veni speciosam,Contratenor,34,36,"[2.0, 6.0, 2.0, 2.0, 2.0]","34-36/2/@4.0-end,@start-end,@start-2.0",https://ema.crimproject.org/https%3A%2F%2Fcrim...,"[3.0, 0.3333333333333333, 1.0, 1.0]"


In [29]:
# Here we group the rows in the DF by the Pattern Generating Match
# Each has its own string of durations, and duration ratios
# and then we compare the ratios to get the differences
# the "list(combinations)" method takes care of building the pairs, using data from our dataframe 'results'

def compare_ratios_2 (ratios_1, ratios_2):
    
    ## element-wise differences between the two ratio lists
    # using zip() + list comprehension 
    diffs = [i - j for i, j in zip(ratios_1, ratios_2)] 
    abs_diffs = [abs(ele) for ele in diffs] 
    sum_diffs = sum(abs_diffs)

    return sum_diffs

matches_2 = []

for name, group in results.groupby("Pattern_Generating_Match"):
    
    # group.index holds row labels, so rows are looked up with .loc rather than .iloc
    ratio_pairs = list(combinations(group.index.values, 2))
    
    for a,b in ratio_pairs:
        row_a = results.loc[a]
        row_b = results.loc[b]
        sum_diffs = compare_ratios_2(row_a.Duration_Ratios, row_b.Duration_Ratios)
        
        matches_2.append(
            {
                "pattern": name,
                #"ratio_1": row_a.Duration_Ratios,
                #"ratio_2": row_b.Duration_Ratios,
                "sum_diffs": sum_diffs,
                "match_1_title": row_a.Piece_Title,
                "match_1_part": row_a.Part,
                "match_1_measure": row_a.First_Note_Measure_Number,
                "match_2_title": row_b.Piece_Title,
                "match_2_part": row_b.Part,
                "match_2_measure": row_b.First_Note_Measure_Number,
                #"match_1_index": a,
                #"match_2_index": b,
                "match_1_ema": row_a.EMA,
                "match_2_ema": row_b.EMA,
                
            }
        )
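
The pairing logic can be tried on a few rows without pandas. The sketch below copies the duration ratios of three occurrences of the pattern `[5, -2, 2, 3]` from the results table shown above and compares every pair; the dictionary layout and the `ratio_distance` name are assumptions made for the illustration.

```python
from itertools import combinations


def ratio_distance(ratios_1, ratios_2):
    """Sum of absolute differences between corresponding duration ratios."""
    return sum(abs(i - j) for i, j in zip(ratios_1, ratios_2))


# Duration ratios keyed by part and first measure
occurrences = {
    "Superius m.1": [1.5, 1 / 3, 1.0, 1.0],
    "Superius m.10": [1.5, 1 / 3, 1.0, 1.0],
    "Superius m.31": [3.0, 1 / 3, 1.0, 2.0],
}

distances = {}
for (name_a, ratios_a), (name_b, ratios_b) in combinations(occurrences.items(), 2):
    distances[(name_a, name_b)] = ratio_distance(ratios_a, ratios_b)

for pair, distance in distances.items():
    print(pair, distance)
# ('Superius m.1', 'Superius m.10') 0.0
# ('Superius m.1', 'Superius m.31') 2.5
# ('Superius m.10', 'Superius m.31') 2.5
```

A distance of 0.0 means the two statements share both their intervals and their rhythmic proportions.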

In [30]:
matches_df=pd.DataFrame(matches_2)
matches_df.head(5)
# could SORT by these, e.g. matches_df.sort_values("sum_diffs")
# keep only the pairs whose duration ratios are identical
matches_df[matches_df.sum_diffs == 0]

Unnamed: 0,pattern,sum_diffs,match_1_title,match_1_part,match_1_measure,match_2_title,match_2_part,match_2_measure,match_1_ema,match_2_ema
13,"[-2, -2, -2, -2]",0.0,Veni speciosam,Superius,4,Veni speciosam,Superius,120,"4-5/1/@2.0-end,@start-2.5","120-121/1/@4.0-end,@start-4.5"
89,"[-2, -2, -2, -2]",0.0,Veni speciosam,Superius,51,Veni speciosam,Superius,75,"51-52/1/@4.0-end,@start-3.0","75-76/1/@2.0-end,@start-1.0"
117,"[-2, -2, -2, -2]",0.0,Veni speciosam,Superius,51,Veni speciosam,SecundusTenor,80,"51-52/1/@4.0-end,@start-3.0","80-81/4/@4.0-end,@start-3.0"
122,"[-2, -2, -2, -2]",0.0,Veni speciosam,Superius,51,Veni speciosam,SecundusTenor,113,"51-52/1/@4.0-end,@start-3.0","113-114/4/@2.0-end,@start-1.0"
156,"[-2, -2, -2, -2]",0.0,Veni speciosam,Superius,74,Veni speciosam,SecundusTenor,80,"74-75/1/@3.0-end,@start-4.0","80-81/4/@1.0-end,@start-2.0"
...,...,...,...,...,...,...,...,...,...,...
10486,"[5, -2, -2, -2]",0.0,Veni speciosam,PrimusTenor,112,Veni speciosam,Bassus,97,"112-113/3/@2.0-end,@start-1.0","97-98/5/@2.0-end,@start-1.0"
10496,"[5, -2, 2, 3]",0.0,Veni speciosam,Superius,1,Veni speciosam,Superius,10,"1-2/1/@1.0-end,@start-4.0","10-11/1/@1.0-end,@start-4.0"
10499,"[5, -2, 2, 3]",0.0,Veni speciosam,Superius,1,Veni speciosam,PrimusTenor,5,"1-2/1/@1.0-end,@start-4.0","5-6/3/@1.0-end,@start-4.0"
10504,"[5, -2, 2, 3]",0.0,Veni speciosam,Superius,10,Veni speciosam,PrimusTenor,5,"10-11/1/@1.0-end,@start-4.0","5-6/3/@1.0-end,@start-4.0"
