# Cadences in a Corpus



## A. Import Intervals and Other Code

* See the Corpus Methods notebook for details on the various options for local and remote files


In [7]:
import intervals
from intervals import * 
from intervals import main_objs
import intervals.visualizations as viz
import pandas as pd
import re
import altair as alt 
from ipywidgets import interact
from pandas import json_normalize
from pyvis.network import Network
from IPython.display import display
import requests
import os
import glob

MYDIR = "saved_csv"
CHECK_FOLDER = os.path.isdir(MYDIR)

# If folder doesn't exist, then create it.
if not CHECK_FOLDER:
    os.makedirs(MYDIR)
    print("created folder : ", MYDIR)
else:
    print(MYDIR, "folder already exists.")
    
MUSDIR = "Music_Files"
CHECK_FOLDER = os.path.isdir(MUSDIR)

# If folder doesn't exist, then create it.
if not CHECK_FOLDER:
    os.makedirs(MUSDIR)
    print("created folder : ", MUSDIR)
else:
    print(MUSDIR, "folder already exists.")

saved_csv folder already exists.
Music_Files folder already exists.
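
The two folder checks above follow the same pattern, so they could be folded into one helper. The following is a minimal sketch, not part of the original notebook: `ensure_folder` is a hypothetical name, and it relies only on `os.makedirs(..., exist_ok=True)` from the standard library, which creates the folder when it is missing and does nothing otherwise.

```python
import os
import tempfile


def ensure_folder(path):
    """Create `path` if needed; return True only when it was newly created.

    Hypothetical helper, not part of the intervals library.
    """
    already_there = os.path.isdir(path)
    # exist_ok=True makes the call safe to repeat on an existing folder
    os.makedirs(path, exist_ok=True)
    return not already_there


# Demonstrated in a temporary directory so nothing is written to the project
with tempfile.TemporaryDirectory() as tmp:
    for name in ("saved_csv", "Music_Files"):
        target = os.path.join(tmp, name)
        created = ensure_folder(target)
        print(name, "created" if created else "already exists")
```

In the notebook itself the helper would simply be called with `MYDIR` and `MUSDIR`.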


## B. Importing Corpus

One method:

* The pieces are provided as a **list**, within square brackets and separated by commas.
* In this case, each item is the complete URL of a piece.
* The bracketed list is then passed inside the parentheses of `CorpusBase()`.
* For example:

>`corpus = CorpusBase(['https://crimproject.org/mei/CRIM_Mass_0006_1.mei', 'https://crimproject.org/mei/CRIM_Mass_0006_2.mei', 'https://crimproject.org/mei/CRIM_Mass_0006_3.mei'])`
    
Another method:

* Simply load **all** the pieces in the local `Music_Files/` folder:

> `piece_list = glob.glob('Music_Files/*')`<br>
> `piece_list`
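
Because `glob.glob('Music_Files/*')` returns every entry in the folder, in no guaranteed order, it can help to keep only score files and sort them. A minimal sketch under assumptions: `list_music_files` is a hypothetical helper, and the default extensions are only a guess based on the two files used in this notebook (`.mei` and `.musicxml`); adjust them to the formats your installation accepts.

```python
import glob
import os
import tempfile


def list_music_files(folder, extensions=(".mei", ".musicxml")):
    """Return a sorted list of paths in `folder` ending in one of `extensions`.

    Hypothetical helper; the default extensions are an assumption.
    """
    paths = glob.glob(os.path.join(folder, "*"))
    # str.endswith accepts a tuple; lower() makes the match case-insensitive
    return sorted(p for p in paths if p.lower().endswith(tuple(extensions)))


# Demonstration with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ("CRIM_Model_0008.mei", "A_Senfl_ave_from_midi.musicxml", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    demo_list = [os.path.basename(p) for p in list_music_files(tmp)]

print(demo_list)
```

The resulting list can be passed to `CorpusBase()` in the same way as `piece_list`.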

Read the documentation:  `print(CorpusBase.batch.__doc__)`


### Full CRIM Corpus

* Here we omit various monophonic pieces and a few others for which there are errors.

>`piece_list = []`<br>
    
> `raw_prefix = "https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/"`<br>
    
> `URL = "https://api.github.com/repos/CRIM-Project/CRIM-online/git/trees/990f5eb3ff1e9623711514d6609da4076257816c"`<br>
> `piece_json = requests.get(URL).json()`

* A list of files to exclude:

> `exclude_list = ['CRIM_Model_0003.mei', 'CRIM_Model_0004.mei', 'CRIM_Model_0005.mei', 'CRIM_Model_0006.mei', 'CRIM_Model_0007.mei','CRIM_Model_0022.mei', 'CRIM_Model_0028.mei', 'CRIM_Model_0035.mei', 'CRIM_Mass_0029_4.mei', 'CRIM_Mass_0049_2.mei','CRIM_Mass_0049_5.mei']`

* The following pattern ensures that we don't try to analyze the 'Mass head only' files, which have no musical content (the dot is escaped so that it matches only a literal period):

>`pattern = r'CRIM_Mass_([0-9]{4})\.mei'`

* Now build the list of file URLs, skipping the Mass head files and the excluded pieces:

>`for p in piece_json["tree"]:
    p_name = p["path"]
    if re.search(pattern, p_name):
        pass
    elif p_name in exclude_list:
        pass
    else:
        piece_list.append(raw_prefix + p["path"])`

In [14]:
# This pulls ALL pieces from the CRIM repository on GitHub.
# Note that we exclude various monophonic pieces (which have no contrapuntal cadences)
# and a few pieces that throw errors for reasons we do not yet understand.
piece_list = []
raw_prefix = "https://raw.githubusercontent.com/CRIM-Project/CRIM-online/master/crim/static/mei/MEI_4.0/"
URL = "https://api.github.com/repos/CRIM-Project/CRIM-online/git/trees/990f5eb3ff1e9623711514d6609da4076257816c"
piece_json = requests.get(URL).json()

# list of files to exclude
exclude_list = ['CRIM_Model_0003.mei', 'CRIM_Model_0004.mei', 'CRIM_Model_0005.mei',
                'CRIM_Model_0006.mei', 'CRIM_Model_0007.mei', 'CRIM_Model_0022.mei',
                'CRIM_Model_0028.mei', 'CRIM_Model_0035.mei', 'CRIM_Mass_0029_4.mei',
                'CRIM_Mass_0049_2.mei', 'CRIM_Mass_0049_5.mei']

# this ensures that we don't try to analyze the 'Mass head only' files, which have no musical content
# (raw string with an escaped dot, so that '.' matches only a literal period)
pattern = r'CRIM_Mass_([0-9]{4})\.mei'

# build the list of raw file URLs, skipping Mass head files and excluded pieces
for p in piece_json["tree"]:
    p_name = p["path"]
    if re.search(pattern, p_name) or p_name in exclude_list:
        continue
    piece_list.append(raw_prefix + p_name)
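
The filtering logic can be checked without a network request. The sketch below applies the same two tests (the 'Mass head only' pattern and the exclusion list) to a small hand-made list of file names. The names are illustrative: `CRIM_Mass_0006.mei` stands in for a Mass head file, and `filter_paths` is a hypothetical helper. The dot in the pattern is escaped here so that it matches only a literal period.

```python
import re

RAW_PREFIX = ("https://raw.githubusercontent.com/CRIM-Project/CRIM-online/"
              "master/crim/static/mei/MEI_4.0/")

# Matches Mass head files such as CRIM_Mass_0006.mei, but not individual
# movements such as CRIM_Mass_0006_1.mei
MASS_HEAD = re.compile(r"CRIM_Mass_[0-9]{4}\.mei")


def filter_paths(paths, exclude, prefix=RAW_PREFIX):
    """Return full URLs for every path that is neither a Mass head file
    nor listed in `exclude` (hypothetical helper)."""
    return [prefix + p for p in paths
            if not MASS_HEAD.search(p) and p not in exclude]


sample_paths = [
    "CRIM_Mass_0006.mei",     # Mass head only: skipped by the pattern
    "CRIM_Mass_0006_1.mei",   # a movement: kept
    "CRIM_Model_0003.mei",    # on the exclusion list: skipped
    "CRIM_Model_0008.mei",    # kept
]
kept = filter_paths(sample_paths, exclude={"CRIM_Model_0003.mei"})
for url in kept:
    print(url)
```

Passing the exclusions as a set keeps each membership test constant-time, which matters little for eleven names but costs nothing.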

In [9]:
# Make a list of all the pieces in the Music_Files folder and load them
# (glob.glob already returns a list, so no loop is needed)
piece_list = glob.glob('Music_Files/*')
corpus = CorpusBase(piece_list)

Previously imported piece detected.
Previously imported piece detected.


In [10]:
piece_list

['Music_Files/A_Senfl_ave_from_midi.musicxml',
 'Music_Files/CRIM_Model_0008.mei']

In [11]:
corpus = CorpusBase(piece_list)

Previously imported piece detected.
Previously imported piece detected.


In [12]:
corpus.batch(func=ImportedPiece.cadences, verbose=True)


Running cadences analysis on 2 pieces:
	1: Ave, maria
	2: Ave Maria


[                       CadType  LeadingTones  CVFs Low RelLow Tone RelTone  \
 76.0      Evaded Clausula Vera           1.0    tC  C4     P8    C      P8   
 252.0            Clausula Vera           1.0    CT  G3     P5    G      P5   
 264.0                Authentic           1.0  CSTB  C3     P1    C      P8   
 336.0            Clausula Vera           1.0    CT  F3     P4    C      P8   
 344.0            Clausula Vera           1.0    CT  C3     P1    C      P8   
 348.0         Evaded Authentic           1.0    Cb  D3     M2    F      P4   
 356.0     Evaded Clausula Vera           1.0    tC  F3     P4    F      P4   
 364.0     Evaded Clausula Vera           1.0    tC  C3     P1    C      P8   
 400.0                Authentic           1.0  CTtB  F3     P4    F      P4   
 424.0            Clausula Vera           1.0    CT  G2    -P4    G      P5   
 456.0            Clausula Vera           1.0    CT  C3     P1    G      P5   
 544.0            Clausula Vera           1.0   tCT 

### C.1. Find the Cadences in the Corpus

* Sample code (remember to omit the `()` after `cadences`, since `batch` takes the function itself rather than its result):

>`func = ImportedPiece.cadences`<br>
>`list_of_dfs = corpus.batch(func=func, metadata=True)`<br>
>`combined_df = pd.concat(list_of_dfs, ignore_index=False)`

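* Suggested reorganization of columns in the output (the empty `Validation` and `Comments` columns are added first so that they can be selected):

>`combined_df['Validation'] = ""`<br>
>`combined_df['Comments'] = ""`<br>
>`col_list = ['Composer', 'Title', 'Measure', 'Beat', 'CadType', 'Tone', 'CVFs', 'LeadingTones', 'Low', 'RelLow', 'RelTone', 'Progress', 'SinceLast', 'ToNext', 'Validation', 'Comments']`<br>
>`combined_df = combined_df[col_list]`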

In [13]:
func = ImportedPiece.cadences
list_of_dfs = corpus.batch(func=func, metadata=True)
combined_df = pd.concat(list_of_dfs, ignore_index=False)


combined_df['Validation'] = ""
combined_df['Comments'] = ""
col_list = ['Composer', 'Title', 'Measure', 'Beat', 'CadType', 'Tone','CVFs',
                'LeadingTones', 'Low','RelLow','RelTone',
                'Progress','SinceLast','ToNext', 'Validation', 'Comments']
combined_df = combined_df[col_list]
# combined_df.to_csv("full_corpus_results.csv")
combined_df

Unnamed: 0,Composer,Title,Measure,Beat,CadType,Tone,CVFs,LeadingTones,Low,RelLow,RelTone,Progress,SinceLast,ToNext,Validation,Comments
76.0,Ludwig Senfl,"Ave, maria",10,3.0,Evaded Clausula Vera,C,tC,1.0,C4,P8,P8,0.027066,76.0,176.0,,
252.0,Ludwig Senfl,"Ave, maria",32,3.0,Clausula Vera,G,CT,1.0,G3,P5,P5,0.089744,176.0,12.0,,
264.0,Ludwig Senfl,"Ave, maria",34,1.0,Authentic,C,CSTB,1.0,C3,P1,P8,0.094017,12.0,72.0,,
336.0,Ludwig Senfl,"Ave, maria",43,1.0,Clausula Vera,C,CT,1.0,F3,P4,P8,0.119658,72.0,8.0,,
344.0,Ludwig Senfl,"Ave, maria",44,1.0,Clausula Vera,C,CT,1.0,C3,P1,P8,0.122507,8.0,4.0,,
348.0,Ludwig Senfl,"Ave, maria",44,3.0,Evaded Authentic,F,Cb,1.0,D3,M2,P4,0.123932,4.0,8.0,,
356.0,Ludwig Senfl,"Ave, maria",45,3.0,Evaded Clausula Vera,F,tC,1.0,F3,P4,P4,0.126781,8.0,8.0,,
364.0,Ludwig Senfl,"Ave, maria",46,3.0,Evaded Clausula Vera,C,tC,1.0,C3,P1,P8,0.12963,8.0,36.0,,
400.0,Ludwig Senfl,"Ave, maria",51,1.0,Authentic,F,CTtB,1.0,F3,P4,P4,0.14245,36.0,24.0,,
424.0,Ludwig Senfl,"Ave, maria",54,1.0,Clausula Vera,G,CT,1.0,G2,-P4,P5,0.150997,24.0,32.0,,


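### C.2. Summary by Type, Tone, etc.

* Here you can report an inventory of cadences by **type** or **tone**. Evaded cadences are counted as types of their own (for example, `Evaded Clausula Vera`):

> `combined_df['CadType'].value_counts().to_frame()`<br>
> `combined_df['Tone'].value_counts().to_frame()`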




In [6]:
combined_df['CadType'].value_counts().to_frame()


Unnamed: 0,CadType
Authentic,7
Evaded Clausula Vera,6
Clausula Vera,5
Abandoned Clausula Vera,1


* Or, various groupings, for example by tone, cadence type, and cadential voice functions (`CVFs`):


>`combined_df.groupby(['Tone', 'CadType', 'CVFs']).size().reset_index(name='counts')`

In [9]:

grouped_types = combined_df.groupby(['Tone', 'CadType', 'CVFs']).size().reset_index(name='counts')
grouped_types

Unnamed: 0,Tone,CadType,CVFs,counts
0,C,Authentic,CB,2
1,C,Clausula Vera,CT,4
2,C,Evaded Clausula Vera,tC,1
3,G,Abandoned Clausula Vera,zC,1
4,G,Authentic,CB,2
5,G,Authentic,tCB,3
6,G,Clausula Vera,CT,1
7,G,Evaded Clausula Vera,tC,5
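
To see what these tallies compute, the same kind of count can be reproduced with the standard library. The sketch below is not the notebook's method: it substitutes `collections.Counter` for `value_counts()` and `groupby(...).size()`, and it uses only the ten Senfl rows shown in the cadence table in C.1, reduced to the `Tone` and `CadType` columns. Its counts therefore differ from the outputs above, which come from separate runs.

```python
from collections import Counter

# (Tone, CadType) pairs copied from the ten rows of the cadence table in C.1
rows = [
    ("C", "Evaded Clausula Vera"),
    ("G", "Clausula Vera"),
    ("C", "Authentic"),
    ("C", "Clausula Vera"),
    ("C", "Clausula Vera"),
    ("F", "Evaded Authentic"),
    ("F", "Evaded Clausula Vera"),
    ("C", "Evaded Clausula Vera"),
    ("F", "Authentic"),
    ("G", "Clausula Vera"),
]

# Counterpart of combined_df['CadType'].value_counts()
type_counts = Counter(cad_type for _tone, cad_type in rows)
for cad_type, n in type_counts.most_common():
    print(cad_type, n)

# Counterpart of combined_df.groupby(['Tone', 'CadType']).size()
grouped_counts = Counter(rows)
for (tone, cad_type), n in sorted(grouped_counts.items()):
    print(tone, cad_type, n)
```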


In [32]:
combined_df.to_csv("CRIM_Corpus_Cadences.csv")
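
The first cell creates a `saved_csv` folder, but the call above writes to the current working directory. If the export should land in that folder instead, the path can be built with `os.path.join`. A small sketch: `csv_path` is a hypothetical helper, and the date suffix is an optional assumption for keeping successive exports apart.

```python
import os
from datetime import date


def csv_path(name, folder="saved_csv", stamp=None):
    """Build a path like saved_csv/<name>_<YYYY-MM-DD>.csv (hypothetical helper)."""
    # Default to today's date in ISO format when no stamp is supplied
    stamp = stamp or date.today().isoformat()
    return os.path.join(folder, f"{name}_{stamp}.csv")


out_file = csv_path("CRIM_Corpus_Cadences")
print(out_file)
# In the notebook this would then be used as: combined_df.to_csv(out_file)
```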