# About CRIM Intervals

- See more at https://github.com/HCDigitalScholarship/intervals/blob/master/README.md

## How to Select Pieces

- For local files, use: `corpus = CorpusBase(['/Users/rfreedma/MEI/CRIM_Intervals_Tests/Brumel_Complete.mei'])`
- For remote files, use: `corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Model_0008.mei', 'https://crimproject.org/mei/CRIM_Mass_0005_5.mei'])`


## Notes about the Various Parameters

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)`. The **number** in this command is the **minimum number of vectors to find**. 5 vectors span 6 notes.

### Chromatic vs Diatonic 
- **Chromatic** uses `into_patterns([vectors.semitone_intervals], 5)`
- **Diatonic** uses `into_patterns([vectors.generic_intervals], 5)`
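
To make the two interval types and the pattern length concrete, here is a minimal sketch in plain Python. It does not call CRIM Intervals: the helper names (`semitone_intervals`, `generic_intervals`, `windows`) and the toy melody are assumptions made for illustration, and the diatonic counting (1 = unison, 2 = step up, -2 = step down) is assumed from the pattern values shown in the output later in this notebook.

```python
# Toy melody C4 D4 E4 F4 G4 A4 as (diatonic step number, MIDI pitch).
# Both encodings are hypothetical stand-ins for what a library would derive from MEI.
melody = [(0, 60), (1, 62), (2, 64), (3, 65), (4, 67), (5, 69)]

def semitone_intervals(notes):
    """Chromatic distance between successive notes, in semitones."""
    return [b[1] - a[1] for a, b in zip(notes, notes[1:])]

def generic_intervals(notes):
    """Diatonic distance with musical counting: 1 = unison, 2 = step up, -2 = step down."""
    result = []
    for a, b in zip(notes, notes[1:]):
        steps = b[0] - a[0]
        result.append(steps + 1 if steps >= 0 else steps - 1)
    return result

def windows(vectors, size):
    """Every run of `size` successive vectors."""
    return [vectors[i:i + size] for i in range(len(vectors) - size + 1)]

chromatic = semitone_intervals(melody)  # [2, 2, 1, 2, 2]
diatonic = generic_intervals(melody)    # [2, 2, 2, 2, 2]
print(windows(diatonic, 5))             # one window of 5 vectors covers all 6 notes
```

The chromatic list distinguishes the half step E-F from the whole steps around it, while the diatonic list treats every step alike, which is why diatonic searches tend to group more melodies together.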

### Exact vs Close  
- **Exact** is exact in *all* ways: `find_exact_matches(patterns, 2)`. The **number** in this command is the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.
- **Close** matches allow for melodic variation (see more below): `find_close_matches(patterns, 2, 1)`. The **first number** in this command is the minimum number of melodies needed before reporting; the **second number** is the threshold needed in order to find a match. Lower number = very similar; higher number = less similar.
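
The filtering role of the minimum-match number can be sketched as follows. This is an illustration in plain Python, not the library's implementation: the `group_exact` helper and the sample data are hypothetical, and treating the number as "at least this many occurrences" is an assumption.

```python
from collections import defaultdict

# Hypothetical patterns: (interval vector, voice, starting measure).
patterns = [
    ((2, 2, -3, -2), "Superius", 4),
    ((2, 2, -3, -2), "Bassus", 4),
    ((1, 1, 2, 1), "Tenor", 28),
    ((2, 2, -3, -2), "Tenor", 53),
]

def group_exact(patterns, min_matches):
    """Group identical vectors and keep groups with at least `min_matches` members."""
    groups = defaultdict(list)
    for vector, voice, measure in patterns:
        groups[vector].append((voice, measure))
    return {v: locs for v, locs in groups.items() if len(locs) >= min_matches}

exact = group_exact(patterns, 2)
print(exact)
```

Raising the minimum keeps only the most frequently repeated soggetti; lowering it also reports the rarer ones.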

### More about Close Matches  
- The **threshold for close matches** is the third argument passed to the method (the second of the two numbers). We select two patterns, then compare *each vector in each pattern successively*. The differences between corresponding vectors are summed. If that sum does not exceed the threshold specified, we consider the two patterns closely matched.
- The format of the method call is `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.
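
The comparison described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `pattern_distance` and `is_close_match` are hypothetical names, the differences are taken as absolute values, and a distance equal to the threshold is treated as a match (which agrees with the sample output in this notebook, where a threshold of 1 pairs `[1, -2, -2, -2]` with `[2, -2, -2, -2]`).

```python
def pattern_distance(a, b):
    """Sum of absolute differences between vectors at the same position."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of vectors")
    return sum(abs(x - y) for x, y in zip(a, b))

def is_close_match(a, b, threshold):
    # Assumption: a distance equal to the threshold still counts as a match.
    return pattern_distance(a, b) <= threshold

reference = [1, -2, -2, -2]
print(pattern_distance(reference, [2, -2, -2, -2]))   # 1
print(is_close_match(reference, [2, -2, -2, -2], 1))  # True
print(is_close_match(reference, [3, -2, -2, -3], 1))  # False (distance 3)
```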

### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic variation/duration is displayed, but **not** factored into the calculation of matching.
- **Incremental Offset** calculates the intervals using a fixed offset between notes, no matter their actual duration. Use this to ignore passing tones or other ornaments. The offsets are expressed in multiples of the quarter note (Offset = 1 samples at every quarter note; Offset = 2 at every half note, etc.). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`
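
The idea of sampling at a fixed offset can be sketched in plain Python. This is only an illustration of the concept: the `sample_at_fixed_offset` helper and the note tuples are hypothetical, and the library may treat rests, ties, and repeated pitches differently.

```python
# Hypothetical melody as (start offset in quarter notes, duration, MIDI pitch).
notes = [(0.0, 2.0, 60), (2.0, 1.0, 62), (3.0, 1.0, 64), (4.0, 4.0, 65)]

def sample_at_fixed_offset(notes, step):
    """Return the pitch sounding at offsets 0, step, 2*step, ... until the melody ends."""
    end = max(start + dur for start, dur, _ in notes)
    sampled = []
    offset = 0.0
    while offset < end:
        for start, dur, pitch in notes:
            if start <= offset < start + dur:
                sampled.append(pitch)
                break
        offset += step
    return sampled

print(sample_at_fixed_offset(notes, 1))  # [60, 60, 62, 64, 65, 65, 65, 65]
print(sample_at_fixed_offset(notes, 2))  # [60, 62, 65, 65]
```

With a step of 2 the quarter note at offset 3 is never sampled, which is how a fixed offset skips short ornamental notes that fall between sampling points.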

### Dataframe Preview vs Full Results in Browser

- Comment out the first of these lines for a preview only; leave it active to show full results.

    - `#pd.set_option("display.max_rows", None, "display.max_columns", None)`
    - `return pd.DataFrame(match_data)`
    
### To Set CSV Output

- Specify file name here `pd.Series(match_data).to_csv("Match_data_10_13.csv")`

***

## Search with Options

- Piece or Corpus
- Actual or Incremental Durations
- Chromatic or Diatonic
- Exact or Close
- Classify

***


### Load CRIM Intervals and Pandas

In [1]:
from crim_intervals import *
import pandas as pd

***

### Select Piece or Corpus

#### A Note About Local Files

- Use `/content/your_file_name.mei` for any local file you've uploaded.
- Otherwise, provide a URL.
- To load CRIM files directly from Git, use the commented-out `url` prefix in the cell below.

In [2]:
#corpus = CorpusBase(['/Users/rfreedma/MEI/A_MEI_Tests/Sandrin_Doulce_RF_12_15_20.mei'])
#url = 'https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/'
corpus_files = ['/Users/rfreedma/MEI/A_MEI_Tests/Sandrin_Doulce_RF_12_15_20.mei_msg.mei']
corpus = CorpusBase(corpus_files)

Requesting file from /Users/rfreedma/MEI/A_MEI_Tests/Sandrin_Doulce_RF_12_15_20.mei_msg.mei...
Successfully imported.


### Correct the MEI Metadata

As an experiment, we load the MEI files directly in order to adjust the metadata. This routine fixes the title information: it reads the title from each MEI header and copies it to the corresponding score.

These changes would be implemented in https://github.com/HCDigitalScholarship/intervals/blob/master/main_objs.py

In [3]:
import xml.etree.ElementTree as ET
import requests

MEINSURI = 'http://www.music-encoding.org/ns/mei'
MEINS = '{%s}' % MEINSURI

for i, path in enumerate(corpus_files):
    # Remote files start with a URL scheme; everything else is read from disk
    if path.startswith(('http://', 'https://')):
        mei_root = ET.fromstring(requests.get(path).content)
    else:
        mei_root = ET.parse(path).getroot()
    # Find the title in the MEI header and update the music21 Score metadata
    title = mei_root.find(f'{MEINS}meiHead//{MEINS}titleStmt/{MEINS}title')
    if title is not None and title.text:
        corpus.scores[i].metadata.title = title.text.strip()
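
The title lookup can be tried in isolation, without a corpus or network access. The sketch below uses only the standard library; the `SAMPLE_MEI` string is a made-up, minimal header, and `read_title` is a hypothetical helper that returns `None` when no title is present instead of raising an error.

```python
import xml.etree.ElementTree as ET

MEINS = '{http://www.music-encoding.org/ns/mei}'

# A tiny, made-up MEI header used only for this example.
SAMPLE_MEI = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Doulce memoire</title>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""

EMPTY_MEI = '<mei xmlns="http://www.music-encoding.org/ns/mei"/>'

def read_title(root):
    """Return the stripped title text, or None if no title element is present."""
    node = root.find(f'{MEINS}meiHead//{MEINS}titleStmt/{MEINS}title')
    if node is None or node.text is None:
        return None
    return node.text.strip()

print(read_title(ET.fromstring(SAMPLE_MEI)))  # Doulce memoire
print(read_title(ET.fromstring(EMPTY_MEI)))   # None
```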

***

### Select Actual or Incremental Durations

#### About Rhythmic Durations

- For `find_close_matches` and `find_exact_matches`, rhythmic variation/duration is displayed, but **not** factored into the calculation of matching.
- **Incremental Offset** calculates the intervals using a **fixed offset between notes**, no matter their actual duration. Use this to ignore passing tones or other ornaments. The offsets are expressed in multiples of the quarter note (Offset = 1 samples at every quarter note; Offset = 2 at every half note, etc.). Set with `vectors = IntervalBase(corpus.note_list_incremental_offset(2))`

In [4]:
vectors = IntervalBase(corpus.note_list)
#vectors = IntervalBase(corpus.note_list_incremental_offset(2))

***

### Select Generic or Semitone:

- **Length of the Soggetto**: `into_patterns([vectors.semitone_intervals], 5)` 

- The **number** in this command represents the **minimum number of vectors to find**. 5 vectors is 6 notes.


In [5]:
patterns = into_patterns([vectors.generic_intervals], 4)
#patterns = into_patterns([vectors.semitone_intervals], 4)

***

### Select Exact Matches Here
#### (Use comment feature to select screen preview or CSV output) 

- **Exact** is exact in *all* ways: `find_exact_matches(patterns, 2)`
- The **number** in this command is the **minimum number of matching melodies needed before reporting**. This allows us to filter for common or uncommon soggetti.

In [None]:
exact_matches = find_exact_matches(patterns, 2)
# Uncomment the loop below for a screen preview of exact matches
#for item in exact_matches:
#    item.print_exact_matches()

output = export_pandas(exact_matches)
pd.DataFrame(output).head()
#export_to_csv(exact_matches) 

***

### Select Close Matches Here
#### (Uncomment the `for item` loop to add a screen preview)

- **Close** matches allow for melodic variation (see more below): `find_close_matches(patterns, 2, 1)`
- The **first number** in this command is the **minimum number of melodies** needed before reporting.
- The **second number** is the **threshold of similarity** needed in order to find a match.
- Lower number = very similar; higher number = less similar.

##### More about Close Matches  
- The **threshold for close matches** is the **second number** (the third argument) passed to the method.
- We select two patterns, then compare *each vector in each pattern successively*.
- The *differences between corresponding vectors are summed*.
- If that sum **does not exceed the threshold specified**, we consider the **two patterns closely matched**.
- The format of the method call is `find_close_matches(the array you get from into_patterns, minimum matches needed to be displayed, threshold for close match)`.

In [8]:
close_matches = find_close_matches(patterns, 2, 1)
# Uncomment the loop below for a screen preview of close matches
#for item in close_matches:
#    item.print_close_matches()

output = export_pandas(close_matches)
pd.DataFrame(output).head()
#export_to_csv(close_matches)

Finding close matches...
226 melodic intervals had more than 2 exact or close matches.



| Unnamed: 0 | pattern_generating_match | pattern_matched | piece_title | part | start_measure | end_measure | note_durations | ema | ema_url |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [1, -2, -2, -2] | [1, -2, -2, -2] | Doulce memoire | [Superius] | 1 | 3 | [8.0, 2.0, 2.0, 4.0, 4.0] | 1-3/1/@1.0-end,@start-end,@start-1.0 | File must be a crim url to have a valid EMA url |
| 1 | [1, -2, -2, -2] | [2, -2, -2, -2] | Doulce memoire | [Contratenor] | 7 | 8 | [2.0, 3.0, 1.0, 4.0, 2.0] | 7-8/2/@3.0-end,@start-8.0 | File must be a crim url to have a valid EMA url |
| 2 | [1, -2, -2, -2] | [1, -2, -2, -2] | Doulce memoire | [Contratenor] | 14 | 15 | [2.0, 3.0, 0.5, 0.5, 2.0] | 14-15/2/@2.0-end,@start-5.0 | File must be a crim url to have a valid EMA url |
| 3 | [1, -2, -2, -2] | [2, -2, -2, -2] | Doulce memoire | [Contratenor] | 22 | 23 | [4.0, 3.0, 1.0, 2.0, 2.0] | 22-23/2/@1.0-end,@start-6.0 | File must be a crim url to have a valid EMA url |
| 4 | [1, -2, -2, -2] | [2, -2, -2, -2] | Doulce memoire | [Contratenor] | 27 | 28 | [2.0, 3.0, 1.0, 1.0, 1.0] | 27-28/2/@6.0-end,@start-1.5 | File must be a crim url to have a valid EMA url |


***

### Classify Patterns Here 
- Note: this step depends on the choice of Close or Exact above. Pass the matching variable (`close_matches` or `exact_matches`) below.
- No Pandas preview, but CSV export works.
- Scroll to the bottom of the output to name and save the CSV file.

In [None]:
# Pass close_matches or exact_matches, matching the choice made above
classified_matches = classify_matches(close_matches, 2)
#classified_matches = classify_matches(exact_matches, 2)
#pd.DataFrame(classified_matches)
output = export_pandas(classified_matches)
#pd.DataFrame(output).head()

# For CSV export, use the following (and follow the prompts for the file name)

export_to_csv(classified_matches)

periodic entry:
Pattern: [2, 2, -3, -2], Locations in entry: 
- Measure 4 in voice 1
- Measure 4 in voice 4
- Measure 53 in voice 3
- Measure 54 in voice 1
imitative duo:
Pattern: [-2, -2, 4, -2], Locations in entry: 
- Measure 28 in voice 1
- Measure 31 in voice 1
- Measure 36 in voice 1
- Measure 39 in voice 1
fuga:
Pattern: [1, 1, 1, 1], Locations in entry: 
- Measure 11 in voice 2
- Measure 17 in voice 1
- Measure 18 in voice 4
- Measure 19 in voice 3
fuga:
Pattern: [1, 2, 2, 1], Locations in entry: 
- Measure 18 in voice 4
- Measure 19 in voice 3
- Measure 28 in voice 4
- Measure 30 in voice 4
imitative duo:
Pattern: [1, 1, 2, 1], Locations in entry: 
- Measure 28 in voice 4
- Measure 30 in voice 4
- Measure 36 in voice 4
- Measure 38 in voice 4
fuga:
Pattern: [1, 1, 2, 1], Locations in entry: 
- Measure 30 in voice 4
- Measure 36 in voice 4
- Measure 38 in voice 4
fuga:
Pattern: [2, 2, 1, -3], Locations in entry: 
- Measure 5 in voice 2
- Measure 7 in voice 1
- Measure 12 in voic


## Read CSV of Classified Matches

- Update the file name to match the CSV saved by the classifier cell above.

In [10]:
pd.read_csv('Sandrin_Soggetti.csv', usecols=['Pattern Generating Match', 'Classification Type', 'Soggetti 1 Part', 'Soggetti 1 Measure', 'Soggetti 2 Part', 'Soggetti 2 Measure', 'Soggetti 3 Part', 'Soggetti 3 Measure', 'Soggetti 4 Part', 'Soggetti 4 Measure'])


ValueError: Usecols do not match columns, columns expected but not found: ['Soggetti 4 Measure', 'Soggetti 2 Part', 'Soggetti 3 Part', 'Classification Type', 'Soggetti 4 Part', 'Soggetti 3 Measure', 'Soggetti 2 Measure', 'Soggetti 1 Part', 'Soggetti 1 Measure']
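
The `ValueError` above means that most of the requested column names are not present in the CSV header. One way to diagnose this, sketched below, is to read only the header row first and compare it with the wanted names. The CSV text and its column names here are made up for the example and say nothing about what the classifier actually exports; the pandas calls (`read_csv` with `nrows=0`, and a callable for `usecols`) are standard.

```python
import io
import pandas as pd

# Stand-in for the exported CSV; these column names are invented for the example.
csv_text = 'Pattern Generating Match,Classification,Voice\n"[1, 1, 2, 1]",fuga,4\n'

wanted = ['Pattern Generating Match', 'Classification Type', 'Voice']

# Read only the header row to see which columns the file really has.
available = pd.read_csv(io.StringIO(csv_text), nrows=0).columns.tolist()
missing = [name for name in wanted if name not in available]
print("Missing columns:", missing)

# A callable passed to usecols keeps matching columns and ignores the rest,
# so absent names do not raise ValueError.
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name in wanted)
print(df.columns.tolist())
```

Printing the available names first makes it easy to correct the `usecols` list to whatever headers the export really contains.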