# Using Large Language Models to Analyze Music as MEI and JSON

## Goals

In this notebook, we use LangChain and OpenAI to evaluate a Large Language Model's (LLM's) ability to analyze music. We explore the LLM's capabilities through each of the following tasks:

* **Get Tabular Data from MEI (Music Encoding Initiative).** We already know it can do this to some extent, returning tables of all the notes in each voice part. Can it also include octave information? Durations? We might then ask it to calculate things, like the range of a piece or the distribution of notes. See the later ideas about finding ‘pieces with similar distributions to this one’.
* **Work with Metadata in the MEI Files.** Tell us which pieces are most similar in terms of sources, editors, and composers. The result could be a structured output, such as a database of pieces built from their metadata.
* **Get Text from MEI and Recreate It.** Lyrics are also elements in MEI (and MusicXML). In Renaissance music they are distributed among different vocal lines, and there are lots of word repetitions. Could we recover the text and then use the LLM to put it into shape (as a rhymed poem, or something like that)? Could the LLM use the placement of rests as a way to understand the form?
* **Guess the Key.** Could it guess the key of a piece from the distribution or proportion of pitches? What would it need to know from us in order to do this? Could it tell us which pieces are the least tonal, or least easy to rank by key? Can it tell us if a piece changes key along the way? (Hint: without using the key signature directly.)
* **Rank Pieces by Difficulty.** Look for things like frequent changes of direction, wide leaps, or difficult rhythms. The Bartók pieces are in fact pedagogical (and even in order).
* **Extract Editorial Feature Data from Files.** This involves both editorial and musical feature data. The CRIM files (and lots of other MEI files of early music) include things like musica ficta: editorial accidentals applied according to various rules of counterpoint. Could the tool tell us where these appear, and which composers and editors make use of them? Could it also produce structured data for this, like a report or table?
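
To make the key-guessing question concrete, here is a minimal baseline sketch that estimates a key from a pitch-class histogram alone. It is an illustration, not the method used later in this notebook: it assumes the Krumhansl-Kessler major and minor profiles and a plain Pearson correlation, and the names `pearson` and `estimate_key` are hypothetical. The `bach_772` counts are pitch-class totals summed from the tool-backed table for Invention No. 1 shown later in this notebook.

```python
from math import sqrt

# Krumhansl-Kessler key profiles (index 0 = tonic); an assumed reference, not part of the notebook.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "E-", "E", "F", "F#", "G", "A-", "A", "B-", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_key(pc_counts):
    """pc_counts: 12 counts indexed by pitch class (0 = C). Returns (tonic, mode, correlation)."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that pitch class `tonic` receives the tonic weight.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(pc_counts, rotated)
            if best is None or score > best[0]:
                best = (score, NAMES[tonic], mode)
    return best[1], best[2], best[0]


# Pitch-class totals (C, C#, D, ..., B) summed from the Invention No. 1 table.
bach_772 = [63, 4, 71, 0, 68, 47, 15, 66, 7, 62, 8, 52]
print(estimate_key(bach_772))
```

A low best correlation, or several keys with nearly equal scores, could serve as a rough signal for the "least tonal" pieces; any such threshold would need to be chosen empirically.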

During these tests, we use a sample of nine MEI files:

* Invention No. 1 in C major - Bach, Johann Sebastian
* Invention No. 7 in E minor - Bach, Johann Sebastian
* Invention No. 15 in B minor - Bach, Johann Sebastian
* Mikrokosmos No. 22: Imitation and Counterpoint - Bartók, Béla
* Mikrokosmos No. 31: Little Dance in Canon Form - Bartók, Béla
* Mikrokosmos No. 104: Wandering through the Keys - Bartók, Béla
* Go ye my canzonettes - Morley, Thomas
* Leave now mine eyes - Morley, Thomas
* Flora wilt thou - Morley, Thomas

For the LLM with tools, we feed the LLM a basic dictionary with the title, composer, file path, and number of parts. It can then use the file path (after some manual parsing) to run music21.
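
The "manual parsing" step can be sketched as a round trip: the summary dictionaries are written into the prompt as text, and a tool later turns that text back into dictionaries. The sketch below mirrors the `Piece: <name>` layout used by `retrieve` and `parse_context` in this notebook; the helper names `to_context` and `from_context` and the file paths are hypothetical placeholders.

```python
import ast
import re

# Hypothetical summary entries; the paths are placeholders, not real files.
summaries = {
    "Bach_BWV_0772": {
        "title": "Invention No. 1 in C major",
        "composer": "Bach, Johann Sebastian",
        "mei_path": "MEI Sample/Bach_BWV_0772.mei",
        "num_parts": 2,
    },
    "Bach_BWV_0778": {
        "title": "Invention No. 7 in E minor",
        "composer": "Bach, Johann Sebastian",
        "mei_path": "MEI Sample/Bach_BWV_0778.mei",
        "num_parts": 2,
    },
}


def to_context(pieces: dict) -> str:
    """Serialize each piece as 'Piece: <name>' followed by the dict's repr."""
    return "\n".join(f"Piece: {name}\n{data}" for name, data in pieces.items())


def from_context(text: str) -> dict:
    """Recover {name: dict} from the text produced by to_context."""
    pattern = r"Piece:\s*(\S+)\n(.*?)(?=\nPiece:|\Z)"
    return {
        name: ast.literal_eval(body.strip())
        for name, body in re.findall(pattern, text, re.DOTALL)
    }


context = to_context(summaries)
print(context)
print(from_context(context) == summaries)
```

Using `ast.literal_eval` rather than `eval` means only Python literals are accepted, which matters because the text passes through the model before it is parsed.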

For the LLM without tools, we saved extended summary information as a JSON file. This JSON includes basic metadata, all of the pitches and intervals for each part, and the key.
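
The JSON file itself is not reproduced in this notebook, so the following is only a sketch of what such a summary might look like. The field names (`parts`, `pitches`, `intervals`), the helper names, and the short pitch list are assumptions for illustration; intervals are expressed in semitones between successive notes, and pitch names follow the music21 spelling seen in the tables below (`-` for flat, `#` for sharp).

```python
import json

STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def to_midi(name: str) -> int:
    """Convert a music21-style name such as 'C#4' or 'B-3' to a MIDI number."""
    step, rest = name[0], name[1:]
    alter = 0
    # Consume any accidentals: '#' raises by a semitone, '-' lowers by one.
    while rest and rest[0] in "#-":
        alter += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter


def summarize_part(pitches):
    """Return the pitch list together with successive intervals in semitones."""
    midi = [to_midi(p) for p in pitches]
    return {
        "pitches": pitches,
        "intervals": [b - a for a, b in zip(midi, midi[1:])],
    }


# Illustrative data only; the real file would hold every note of every part.
summary = {
    "title": "Invention No. 1 in C major",
    "composer": "Bach, Johann Sebastian",
    "parts": {
        "upper": summarize_part(["C4", "D4", "E4", "F4", "D4", "E4", "C4", "G4"]),
    },
}
print(json.dumps(summary, indent=2))
```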

All of the LLM music analysis in this experiment uses gpt-4o by OpenAI; gpt-4o-mini is used only to pick the relevant scores when none are named in the query.

## Setup Code

### Imports

In [3]:
# Standard library imports
import ast
import getpass
import io
import json
import os
import re
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

# Third-party imports
import pandas as pd
from music21 import converter, metadata, note

from typing import List, Optional, Union
from typing_extensions import TypedDict

from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import init_chat_model
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.tools import tool

from langchain_community.document_loaders import TextLoader
from langchain_experimental.tools.python.tool import PythonREPLTool

from langgraph.graph import START, StateGraph

In [8]:
# Extract titles from all MEI files in the specified directory
mei_dir = Path(r"C:\Users\charl\Documents\VSCode\Encoding Music\Music Analysis\MEI Sample")
mei_titles = []
mei_composers = []

for file in mei_dir.glob("*.mei"):
    try:
        tree = ET.parse(file)
        root = tree.getroot()
        ns = {'mei': 'http://www.music-encoding.org/ns/mei'}
        # Try <workList>/<work>/<title>
        title = root.find('.//mei:workList/mei:work/mei:title', ns)
        if title is None:
            # fallback to <fileDesc>/<titleStmt>/<title>
            title = root.find('.//mei:fileDesc/mei:titleStmt/mei:title', ns)
        title_text = title.text.strip() if title is not None else "Unknown Title"

        # Try <workList>/<work>/<composer>
        composer = root.find('.//mei:workList/mei:work/mei:composer', ns)
        if composer is None:
            # fallback to <persName role="composer">
            composer = root.find('.//mei:persName[@role="composer"]', ns)
        composer_text = composer.text.strip() if composer is not None else "Unknown Composer"
    except Exception:
        # Parsing failed or an element had no text: use placeholders for this file
        title_text, composer_text = "Unknown Title", "Unknown Composer"

    # Append exactly once per file so the two lists stay aligned
    mei_titles.append(title_text)
    mei_composers.append(composer_text)

for title, composer in zip(mei_titles, mei_composers):
    print(f"{title} - {composer}")

Invention No. 1 in C major - Bach, Johann Sebastian
Invention No. 7 in E minor - Bach, Johann Sebastian
Invention No. 15 in B minor - Bach, Johann Sebastian
Mikrokosmos No. 22: Imitation and Counterpoint - Bartók, Béla
Mikrokosmos No. 31: Little Dance in Canon Form - Bartók, Béla
Mikrokosmos No. 104: Wandering through the Keys - Bartók, Béla
Go ye my canzonettes - Morley, Thomas
Leave now mine eyes - Morley, Thomas
Flora wilt thou - Morley, Thomas
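
The title and composer lookups above follow the same pattern twice: try one namespaced path, then fall back to another. A compact way to express that pattern is sketched below on a made-up MEI header, so it runs without the sample files. The helper name `first_text` and the sample XML are assumptions for illustration; the XPath expressions are the ones used in the cell above.

```python
import xml.etree.ElementTree as ET

NS = {"mei": "http://www.music-encoding.org/ns/mei"}

# Made-up header with no <workList>, so both lookups exercise their fallback path.
SAMPLE = """<mei xmlns="http://www.music-encoding.org/ns/mei">
  <meiHead>
    <fileDesc>
      <titleStmt>
        <title>Example Title</title>
        <respStmt><persName role="composer">Example, Composer</persName></respStmt>
      </titleStmt>
    </fileDesc>
  </meiHead>
</mei>"""


def first_text(root, paths, default):
    """Return the stripped text of the first path that matches a non-empty element."""
    for path in paths:
        element = root.find(path, NS)
        if element is not None and element.text and element.text.strip():
            return element.text.strip()
    return default


root = ET.fromstring(SAMPLE)
title = first_text(
    root,
    [".//mei:workList/mei:work/mei:title", ".//mei:fileDesc/mei:titleStmt/mei:title"],
    "Unknown Title",
)
composer = first_text(
    root,
    [".//mei:workList/mei:work/mei:composer", './/mei:persName[@role="composer"]'],
    "Unknown Composer",
)
print(f"{title} - {composer}")
```

Checking `element.text` before calling `.strip()` also covers elements that exist but are empty, which would otherwise raise an `AttributeError`.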


### Loading Docs / Setting up Summary Dictionary

In [30]:
# Load the 9 sample documents as XML


def loadXML(filepath: str):
    loader = TextLoader(filepath)
    pages = loader.load()
    doc: Document = Document(page_content="", metadata=pages[0].metadata)
    for page in pages:
        doc.page_content += page.page_content
    return doc


directory = Path(r"C:\Users\charl\Documents\VSCode\Encoding Music\Music Analysis\MEI Sample")
file_paths = [str(file) for file in directory.iterdir() if file.is_file()]
docs: List[Document] = []
for file in file_paths:
    docs.append(loadXML(file))

In [31]:
# Convert to a music21 dict

def extract_title_composer_from_mei(filepath):
    try:
        tree = ET.parse(filepath)
        root = tree.getroot()
        ns = {'mei': 'http://www.music-encoding.org/ns/mei'}

        # Try <workList>/<work>/<title>
        title = root.find('.//mei:workList/mei:work/mei:title', ns)
        if title is None:
            # fallback to <fileDesc>/<titleStmt>/<title>
            title = root.find('.//mei:fileDesc/mei:titleStmt/mei:title', ns)
        title_text = title.text.strip() if title is not None else "Unknown Title"

        # Try <workList>/<work>/<composer>
        composer = root.find('.//mei:workList/mei:work/mei:composer', ns)
        if composer is None:
            # fallback to <persName role="composer">
            composer = root.find('.//mei:persName[@role="composer"]', ns)
        composer_text = composer.text.strip() if composer is not None else "Unknown Composer"

        return title_text, composer_text

    except Exception as e:
        return "Unknown Title", "Unknown Composer"

def musicxml_to_summary_dict(filepath):
    """Builds the short summary dict (title, composer, path, part count) for one file."""
    try:
        score = converter.parse(filepath)
    except Exception as e:
        return {"error": f"Failed to parse file: {e}"}

    # Title and composer are read from the MEI header rather than from music21's metadata
    title, composer = extract_title_composer_from_mei(filepath)

    return {
        "title": title,
        "composer": composer,
        "mei_path": filepath,
        "num_parts": len(score.parts),
    }


directory = Path(r"C:\Users\charl\Documents\VSCode\Encoding Music\Music Analysis\MEI Sample")
summaries = {}

for file in directory.glob("*.mei"):
    print(f"Processing {file.name}...")
    summary = musicxml_to_summary_dict(str(file))
    summaries[file.stem] = summary

Processing Bach_BWV_0772.mei...
Processing Bach_BWV_0778.mei...
Processing Bach_BWV_0786.mei...
Processing Bartok_Mikrokosmos_022.mei...
Processing Bartok_Mikrokosmos_031.mei...
Processing Bartok_Mikrokosmos_104.mei...
Processing Morley_1595_01_Go_ye_my_canzonettes.mei...
Processing Morley_1595_07_Leave_now_mine_eyes.mei...
Processing Morley_1595_12_Flora_wilt_thou.mei...


### Tools

In [32]:
# Helper functions for tools
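def parse_context(full_context: str) -> dict:
    """Parses 'Piece: <name>' blocks, each followed by a dict literal, into {name: dict}."""
    # `ast` and `re` are already imported at the top of the notebook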

    # If the string includes 'Context:', isolate just the context part
    if "Context:" in full_context:
        full_context = full_context.split("Context:", 1)[1].strip()

    pattern = r"Piece:\s*(\S+)\n(.*?)(?=\nPiece:|\Z)"
    matches = re.findall(pattern, full_context, re.DOTALL)

    context_dict = {}
    for piece_name, dict_str in matches:
        try:
            context_dict[piece_name.strip()] = ast.literal_eval(dict_str.strip())
        except Exception as e:
            raise ValueError(f"Failed to parse piece '{piece_name}': {e}")

    return context_dict

def count_notes_for_piece(filepath: str, pitch_with_octave: bool) -> Counter:
    score = converter.parse(filepath)
    notes = [
        n.nameWithOctave if pitch_with_octave else n.name
        for n in score.recurse().notes
        if isinstance(n, note.Note)
    ]
    return Counter(notes)

def extract_mei_metadata(filepath: str) -> dict:

    """
    Extracts detailed metadata from an MEI file, including:
    - Title
    - Composer
    - Editors (mei, xml, analyst)
    - Publication date
    - Application used to encode
    - Availability statement
    - Work title
    """
    ns = {'mei': 'http://www.music-encoding.org/ns/mei'}
    tree = ET.parse(filepath)
    root = tree.getroot()
    
    metadata = {
        'title': None,
        'composer': None,
        'mei_editors': [],
        'xml_editors': [],
        'analysts': [],
        'publication_date': None,
        'availability': None,
        'application': None,
        'work_title': None
    }

    # Title
    title = root.find('.//mei:titleStmt/mei:title', ns)
    if title is not None:
        metadata['title'] = title.text.strip()

    # People with roles
    people = root.findall('.//mei:titleStmt/mei:respStmt/mei:persName', ns)
    for person in people:
        role = person.attrib.get('role', '').lower()
        name = person.text.strip()
        if role == 'composer':
            metadata['composer'] = name
        elif role == 'mei_editor':
            metadata['mei_editors'].append(name)
        elif role == 'xml_editor':
            metadata['xml_editors'].append(name)
        elif role == 'analyst':
            metadata['analysts'].append(name)

    # Publication date
    date = root.find('.//mei:pubStmt/mei:date', ns)
    if date is not None:
        metadata['publication_date'] = date.attrib.get('isodate', None)

    # Availability / copyright
    availability = root.find('.//mei:pubStmt/mei:availability', ns)
    if availability is not None:
        metadata['availability'] = availability.text.strip()

    # Application name
    app = root.find('.//mei:appInfo/mei:application/mei:name', ns)
    if app is not None:
        metadata['application'] = app.text.strip()

    # Work title
    work_title = root.find('.//mei:workList/mei:work/mei:title', ns)
    if work_title is not None:
        metadata['work_title'] = work_title.text.strip()

    return metadata

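def get_key_signature(filepath: str) -> str:
    """Returns music21's estimated key (e.g. 'C major').

    Despite the function name, `analyze('key')` infers the key from the pitch
    content; it does not read the notated key signature.
    """
    score = converter.parse(filepath)
    key = score.analyze('key')
    return str(key)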

def get_time_signature(filepath: str) -> str:
    """Returns the first time signature of the piece."""
    score = converter.parse(filepath)
    ts = score.recurse().getElementsByClass('TimeSignature')[0]
    return str(ts)

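def get_pitch_histogram(filepath: str) -> dict:
    """Returns a histogram of pitches in the score as a {pitch name: count} dict."""
    # Count pitches directly so the result is a plain dict; `score.plot` draws a figure instead.
    return dict(count_notes_for_piece(filepath, pitch_with_octave=True))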


In [41]:
# Tools!

@tool
def tool_count_notes(full_context: str, pitch_with_octave: bool = True) -> str:
    """
    Returns pitch count tables grouped by piece, using the 'mei_path' field in the score dictionary.
    """
    try:
        full_data = parse_context(full_context)
        output_sections = []
        log_lines = []

        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if not filepath:
                output_sections.append(f"Piece: {piece_name}\nError: 'mei_path' not found.\n")
                continue

            counts = count_notes_for_piece(filepath, pitch_with_octave)
            if not counts:
                output_sections.append(f"Piece: {piece_name}\nNo notes found in the score.\n")
                continue

            df = pd.DataFrame(counts.items(), columns=["Pitch", "Count"])
            df = df.sort_values("Pitch").reset_index(drop=True)

            section = f"Piece: {piece_name}\n{df.to_string(index=False)}\n"
            output_sections.append(section)

            log_lines.append(f"Processed {piece_name} - MEI path: {filepath}")

        with open("debug_output.txt", "a") as f:
            f.write("Parsed context keys: {}\n".format(', '.join(full_data.keys())))
            for line in log_lines:
                f.write(line + "\n")

        return "\n".join(output_sections).strip()

    except Exception as e:
        return f"count_notes failed: {e}"
    
@tool
def tool_get_full_metadata(full_context: str):
    """
    Extracts detailed metadata from an MEI file, including:
    - Title
    - Composer
    - Editors (mei, xml, analyst)
    - Publication date
    - Application used to encode
    - Availability statement
    - Work title
    """
    try:
        full_data = parse_context(full_context)
        filepaths = []
        metadata_summaries = []
        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if filepath:
                filepaths.append(filepath)
            else:
                metadata_summaries.append({"error": f"No 'mei_path' for {piece_name}"})
        
        for filepath in filepaths:
            metadata_summaries.append(
                extract_mei_metadata(filepath)
            )
        return metadata_summaries

    except Exception as e:
        return f"get_full_metadata failed: {e}"
    
@tool
def tool_get_key_signature(full_context: str):
    """
    Returns the key signature of each piece in the full context.
    """
    try:
        full_data = parse_context(full_context)
        output_sections = []

        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if not filepath:
                output_sections.append(f"Piece: {piece_name}\nError: 'mei_path' not found.\n")
                continue

            key_signature = get_key_signature(filepath)
            output_sections.append(f"Piece: {piece_name}\nKey Signature: {key_signature}\n")

        return "\n".join(output_sections).strip()

    except Exception as e:
        return f"get_key_signature failed: {e}"
    
@tool
def tool_get_time_signature(full_context: str):
    """
    Returns the time signature of each piece in the full context.
    """
    try:
        full_data = parse_context(full_context)
        output_sections = []

        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if not filepath:
                output_sections.append(f"Piece: {piece_name}\nError: 'mei_path' not found.\n")
                continue

            time_signature = get_time_signature(filepath)
            output_sections.append(f"Piece: {piece_name}\nTime Signature: {time_signature}\n")

        return "\n".join(output_sections).strip()

    except Exception as e:
        return f"get_time_signature failed: {e}"

@tool
def tool_get_pitch_histogram(full_context: str):
    """
    Returns a histogram of pitches for each piece in the full context.
    """
    try:
        full_data = parse_context(full_context)
        output_sections = []

        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if not filepath:
                output_sections.append(f"Piece: {piece_name}\nError: 'mei_path' not found.\n")
                continue

            histogram = get_pitch_histogram(filepath)
            output_sections.append(f"Piece: {piece_name}\nPitch Histogram: {histogram}\n")

        return "\n".join(output_sections).strip()

    except Exception as e:
        return f"get_pitch_histogram failed: {e}"
    
@tool 
def get_notes(full_context: str):
    """
    Returns a list of notes for each piece in the full context.
    """
    try:
        full_data = parse_context(full_context)
        output_sections = []

        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if not filepath:
                output_sections.append(f"Piece: {piece_name}\nError: 'mei_path' not found.\n")
                continue

            score = converter.parse(filepath)
            notes = [n.nameWithOctave for n in score.recurse().notes if isinstance(n, note.Note)]
            output_sections.append(f"Piece: {piece_name}\nNotes: {', '.join(notes)}\n")

        return "\n".join(output_sections).strip()

    except Exception as e:
        return f"get_notes failed: {e}"
    
@tool
def get_lyrics(full_context: str):
    """
    Returns the lyrics for each piece in the full context.
    """
    try:
        full_data = parse_context(full_context)
        output_sections = []

        for piece_name, data in full_data.items():
            filepath = data.get("mei_path")
            if not filepath:
                output_sections.append(f"Piece: {piece_name}\nError: 'mei_path' not found.\n")
                continue

            score = converter.parse(filepath)
            lyrics = []
            for n in score.recurse().notes:
                if isinstance(n, note.Note) and n.lyrics:
                    lyrics.append(n.lyrics[0].text)

            output_sections.append(f"Piece: {piece_name}\nLyrics: {', '.join(lyrics)}\n")

        return "\n".join(output_sections).strip()

    except Exception as e:
        return f"get_lyrics failed: {e}"

# AI Setup
tools = [
    tool_count_notes,
    tool_get_full_metadata,
    tool_get_key_signature,
    tool_get_time_signature,
    tool_get_pitch_histogram,
    get_notes,
    get_lyrics,
    PythonREPLTool()
]


### Setting up LLM and LangGraph

In [45]:

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

llm = init_chat_model("gpt-4o", model_provider="openai")
llm_mini = init_chat_model("gpt-4o-mini", model_provider="openai")

agent = initialize_agent(
    tools=tools,             # your local tool functions like count_notes
    llm=llm,                 # your ChatOpenAI or init_chat_model(...)
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=False,
)
# Detailed data for w/o tools
with open("music_summaries.json", "r") as f:
    all_scores = json.load(f)

# Main template for the chat prompt
template = ChatPromptTemplate([
    ("system", """You are an expert on music analysis. 
You are analyzing two-part scores using tools. 
You are going to be given various scores as `context`. When a tool calls for `full_context`,
input all of the text from the human question and context.
Only use the information provided in `data`. If you have uncertainty, express it. If you use a tool and it gives you an exception or error, include the exception or error in your response."""),
    
    ("human", "Question: {question}\n\nContext:\n{context}")
])

# Secondary template for choosing relevant score names
secondary_template = ChatPromptTemplate([
    ("system", "Your job is to choose the relevant score names from the provided context based on the question. The score names are the keys in the full context dictionary. For example, 'Bach_BWV_0772' is a score name. You are to return each relevant score name with a comma between each name and no spaces. If you see no relevant score names, return an empty string."),
    ("human", "Question: {question}\n\nContext:\n{context}\n\nPlease return the relevant score names as a comma-separated list without spaces.")
])

class State(TypedDict):
    question: str
    context: dict
    answer: str
    # TypedDict does not support default values; run_graph always supplies these two keys.
    score_name: Optional[list[str]]
    use_tools: Optional[bool]

def llm_filter(state: State):
    """
    If no score_name list was supplied, ask gpt‑4o‑mini to guess the relevant pieces.
    """
    # ── 1. Maybe fill in score_name ────────────────────────────────
    if not state["score_name"]:
        prompt = secondary_template.invoke({
            "question": state["question"],
            "context": state["context"],
        })

        resp = llm_mini.invoke(prompt)          # ← AIMessage
        text = (
            resp["output"]                      # dict path (rare)
            if isinstance(resp, dict)
            else resp.content                   # AIMessage path (common)
        )
        names = [n.strip() for n in text.split(",") if n.strip()]
        state["score_name"] = names or None

    # ── 2. Fallback when nothing found ─────────────────────────────
    if not state["score_name"]:
        # Forces the tool path, even if the caller passed use_tools=False
        state["use_tools"] = True

    # ── 3. MUST return the keys we changed ─────────────────────────
    return {
        "score_name": state["score_name"],
        "use_tools":  state["use_tools"],
    }


def retrieve(state: State):
    print(f"Scores chosen: {state['score_name']}")
    data_source = summaries if state["use_tools"] else all_scores
    selected_scores = state["score_name"]

    if selected_scores:
        pieces = ((name, data_source[name]) for name in selected_scores if name in data_source)
    else:
        pieces = data_source.items()

    context = "\n".join(f"Piece: {name}\n{data}" for name, data in pieces)
    state["context"] = context
    return {"context": context}


def ask_llm(state: State):
    if state["use_tools"]:
        return ask_llm_tools(state)
    else:
        return ask_llm_no_tools(state)

def ask_llm_tools(state: State):
    prompt = template.invoke(
        {"question": state["question"], "context": state["context"]}
    )

    # OpenAIFunctions agent usually returns {"output": "..."}
    response = agent.invoke({"input": prompt})
    answer_text = (
        response["output"] if isinstance(response, dict) else response
    )

    # Return a partial‑state update
    return {"answer": answer_text}


def ask_llm_no_tools(state: State):
    prompt = template.invoke(
        {"question": state["question"], "context": state["context"]}
    )
    response = llm.invoke(prompt)
    # llm.invoke returns an AIMessage (not a dict), so callers read the text via `.content`
    answer_text = (
        response["output"] if isinstance(response, dict) else response
    )
    return {"answer": answer_text}


graph_builder = StateGraph(State).add_sequence([llm_filter, retrieve, ask_llm])
graph_builder.add_edge(START, "llm_filter")
graph = graph_builder.compile()

def run_graph(question: str, score_names: Optional[Union[str, list[str]]] = None, use_tools: Optional[bool] = True):
    if isinstance(score_names, str):
        score_names = [score_names]

    return graph.invoke({
        "question": question,
        "score_name": score_names,
        "use_tools": use_tools,
        "context": {},
        "answer": ""
    })

## Process Graph

```mermaid
graph TD
A[Submit Question, Optional Piece Names, Use tools default yes] --> B{Has piece name?}
B -->|NO| C[LLM picks relevant documents]
C --> D[Retrieve documents by filtering the dictionary]
B -->|YES| D
D --> E{Use Tools?}
E -->|NO| F[use detailed json data]
F --> G[Ask LLM]
E -->|YES| H[Use summary dict text]
H --> I[Convert text to dict object]
I --> J[Use tools]
J --> K[Pull full MEI/XML from dict filepath]
K --> L[Use full MEI with Music21 or CRIM tools]
L --> M[Return response]
G --> M

```
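
The branching in the diagram can also be written as plain Python, which makes the two decision points easy to test without calling a model. This is a sketch under stated assumptions, not the LangGraph implementation: `route` and `no_match_picker` are hypothetical names, the picker stands in for the gpt-4o-mini call, and the dictionaries are empty placeholders.

```python
from typing import Callable, Optional


def route(
    question: str,
    score_names: Optional[list],
    use_tools: bool,
    pick_scores: Callable[[str, list], list],
    summaries: dict,
    all_scores: dict,
) -> dict:
    """Mirror the flowchart: pick pieces, choose a data source, report the path taken."""
    # No piece name supplied: ask the (stubbed) picker for relevant score names.
    if not score_names:
        score_names = pick_scores(question, list(summaries)) or None

    # Same fallback as llm_filter: nothing selected means the tool path is used.
    if not score_names:
        use_tools = True

    # The tool path reads the short summaries; the other path reads the detailed JSON data.
    source = summaries if use_tools else all_scores
    wanted = score_names if score_names else list(source)
    pieces = [name for name in wanted if name in source]
    return {"path": "tools" if use_tools else "json", "pieces": pieces}


def no_match_picker(question, names):
    """Stub picker that never finds a relevant score."""
    return []


short = {"Bach_BWV_0772": {}, "Bach_BWV_0778": {}}
detailed = {"Bach_BWV_0772": {}, "Bach_BWV_0778": {}}
print(route("Count the notes", ["Bach_BWV_0772"], False, no_match_picker, short, detailed))
print(route("Compare all pieces", None, False, no_match_picker, short, detailed))
```

The second call shows a branch that is not drawn in the diagram: with `use_tools=False` and no selected pieces, the code as written still takes the tool path.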

## Queries

### Tabular Data: LLM with tools vs LLM vs Correct response

LLM With Tools:

In [None]:
result_state = run_graph(
    "Return tabular data on the following piece of music. That is, return a table of how many of each notes there is throughout the piece.",
    "Bach_BWV_0772",
    use_tools=True,
)

print(result_state["answer"]) 

Scores chosen: ['Bach_BWV_0772']
Here is the tabular data for the piece "Invention No. 1 in C major" by Johann Sebastian Bach:

| Pitch | Count |
|-------|-------|
| A2    | 3     |
| A3    | 21    |
| A4    | 28    |
| A5    | 10    |
| B-3   | 4     |
| B-4   | 4     |
| B2    | 4     |
| B3    | 19    |
| B4    | 26    |
| B5    | 3     |
| C#4   | 1     |
| C#5   | 3     |
| C2    | 1     |
| C3    | 6     |
| C4    | 25    |
| C5    | 29    |
| C6    | 2     |
| D2    | 1     |
| D3    | 13    |
| D4    | 25    |
| D5    | 32    |
| E3    | 16    |
| E4    | 25    |
| E5    | 27    |
| F#3   | 8     |
| F#4   | 7     |
| F3    | 6     |
| F4    | 17    |
| F5    | 24    |
| G#3   | 2     |
| G#4   | 4     |
| G#5   | 1     |
| G2    | 4     |
| G3    | 18    |
| G4    | 24    |
| G5    | 20    |


LLM Without Tools:

In [8]:
result_state = run_graph(
    "Return tabular data on the following piece of music. That is, return a table of how many of each notes there is throughout the piece. Sort by note name (all the A's first, this the B's, etc.)",
    "Bach_BWV_0772",
    use_tools=False,
)

print(result_state["answer"].content) 

Scores chosen: ['Bach_BWV_0772']
Based on the data provided, the pitches from both parts can be tabulated as follows, sorted by note name:

| Note | Frequency |
|------|-----------|
| A2   | 2         |
| A3   | 12        |
| A4   | 15        |
| A5   | 4         |
| B-3  | 5         |
| B-4  | 3         |
| B2   | 3         |
| B3   | 19        |
| B4   | 16        |
| B5   | 2         |
| C2   | 1         |
| C3   | 11        |
| C4   | 18        |
| C5   | 18        |
| C6   | 2         |
| C#4  | 1         |
| C#5  | 2         |
| D2   | 2         |
| D3   | 12        |
| D4   | 20        |
| D5   | 18        |
| E3   | 13        |
| E4   | 19        |
| E5   | 12        |
| F3   | 4         |
| F4   | 14        |
| F5   | 11        |
| F#3  | 6         |
| F#4  | 5         |
| G2   | 2         |
| G3   | 15        |
| G4   | 22        |
| G5   | 12        |
| G#3  | 2         |
| G#4  | 3         |
| G#5  | 1         |

This table summarizes the frequency of each note across the t

Correct Response:

In [None]:
def count_notes(filepath, pitch_with_octave=True):
    """
    Count the number of times each pitch appears in a score.
    
    Args:
        filepath (str): Path to the MEI/MusicXML file.
        pitch_with_octave (bool): If True, include octave (e.g., 'C4'); else use pitch class (e.g., 'C').
    
    Returns:
        pd.DataFrame: A table showing pitch and count.
    """
    score = converter.parse(filepath)
    all_notes = []

    for n in score.recurse().notes:
        if isinstance(n, note.Note):
            if pitch_with_octave:
                all_notes.append(n.nameWithOctave)
            else:
                all_notes.append(n.name)

    counts = Counter(all_notes)
    df = pd.DataFrame(counts.items(), columns=["Pitch", "Count"])
    df = df.sort_values("Pitch").reset_index(drop=True)
    return df

count_notes(r"C:\Users\charl\Documents\VSCode\Encoding Music\Music Analysis\MEI Sample\Bach_BWV_0772.mei")

|   | Pitch | Count |
|---|-------|-------|
| 0 | A2    | 3     |
| 1 | A3    | 21    |
| 2 | A4    | 28    |
| 3 | A5    | 10    |
| 4 | B-3   | 4     |
| 5 | B-4   | 4     |
| 6 | B2    | 4     |
| 7 | B3    | 19    |
| 8 | B4    | 26    |
| 9 | B5    | 3     |
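
To compare the three results without reading the tables by eye, the markdown output can be parsed and checked against the reference counts. The sketch below uses only the first four rows of each table above; `parse_table` and `mismatches` are hypothetical helper names, and the parser assumes simple two-column pipe tables like the ones the model returned.

```python
def parse_table(markdown: str) -> dict:
    """Read a two-column pipe table into {pitch: count}, skipping header and rule rows."""
    rows = {}
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[1].isdigit():
            continue
        rows[cells[0]] = int(cells[1])
    return rows


def mismatches(candidate: dict, reference: dict) -> dict:
    """Return {pitch: (candidate count, reference count)} wherever the two disagree."""
    return {
        pitch: (candidate.get(pitch), count)
        for pitch, count in reference.items()
        if candidate.get(pitch) != count
    }


with_tools = """
| Pitch | Count |
|-------|-------|
| A2    | 3     |
| A3    | 21    |
| A4    | 28    |
| A5    | 10    |
"""

without_tools = """
| Note | Frequency |
|------|-----------|
| A2   | 2         |
| A3   | 12        |
| A4   | 15        |
| A5   | 4         |
"""

# First four rows of the reference output from count_notes.
reference = {"A2": 3, "A3": 21, "A4": 28, "A5": 10}

print("with tools:", mismatches(parse_table(with_tools), reference))
print("without tools:", mismatches(parse_table(without_tools), reference))
```

On these rows the tool-backed table agrees with the reference, while every count in the table produced without tools is too low.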


### Metadata: LLM with tools vs LLM vs Correct response

LLM With Tools:

In [36]:
result_state = run_graph(
    "Comparing all of the pieces, tell me about the metadata. That is, which ones are most similar in terms of sources, editors, composers, etc."
)

print(result_state["answer"]) 

Scores chosen: None
The pieces can be grouped in terms of similarity in metadata as follows:

### Bach Inventions
- **Pieces**: 
  - *Invention No. 1 in C major (BWV 772)*
  - *Invention No. 7 in E minor (BWV 778)*
  - *Invention No. 15 in B minor (BWV 786)*
- **Composer**: Bach, Johann Sebastian
- **MEI Editors**: Richard Freedman
- **XML Editors**: Tobias Schölkopf
- **Analysts**: Varied (This Student, Rowan Shigeno, Holden Starkey)
- **Publication Date**: 2024-11-19
- **Application**: MEI Soup Updater 2024
- **Availability**: Governed by the copyright laws of Germany; requires prior written consent for utilization beyond copyright scope.

### Bartók Mikrokosmos
- **Pieces**: 
  - *Mikrokosmos No. 22: Imitation and Counterpoint*
  - *Mikrokosmos No. 31: Little Dance in Canon Form*
  - *Mikrokosmos No. 104: Wandering through the Keys*
- **Composer**: Bartók, Béla
- **MEI Editors**: Richard Freedman
- **XML Editors**: Pedro Ramoneda
- **Analysts**: Varied (Kylie McCombs, Maika Kogawara

LLM Without Tools:

In [37]:
result_state = run_graph(
    "Comparing all of the pieces, tell me about the metadata. That is, which ones are most similar in terms of sources, editors, composers, etc.",
    use_tools=False,
)

print(result_state["answer"]) 

Scores chosen: None
Here is an analysis of the metadata for the given pieces, highlighting similarities based on sources, editors, composers, etc.:

### Composer
- **Bach, Johann Sebastian**: Composed the Inventions (BWV 772, BWV 778, BWV 786).
- **Bartók, Béla**: Composed the Mikrokosmos pieces (Nos. 22, 31, 104).
- **Morley, Thomas**: Composed the canzonettes (Go ye my canzonettes, Leave now mine eyes, Flora wilt thou).

### Editors
- **MEI Editor**: Richard Freedman worked on all pieces; common to all the collections (Bach, Bartók, Morley).
- **XML Editors**:
  - Tobias Schölkopf edited Bach's works.
  - Pedro Ramoneda worked on Bartók's works.
  - André Vierendeels edited Morley's pieces.

### Analysts
- Bach's pieces have various analysts: This Student, Rowan Shigeno, and Starkey, Holden.
- Bartók's pieces: Kylie McCombs, Maika Kogawara, and Starkey, Holden.
- Morley's pieces: Rohan Sarma, Kylie McCombs, and Starkey, Holden.

### Publication Date
- All pieces apparently have a pub

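Both do a great job here! This is to be expected, as the model is simply pulling metadata with a tool or reading it from the dictionary fed to it. Note, however, that `llm_filter` sets `use_tools` to `True` whenever no score names are selected, so the second run above (which printed `Scores chosen: None`) may also have gone through the tool path rather than the JSON data.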

### Lyrics: LLM with tools vs LLM vs Correct response

LLM With Tools:

In [43]:
result_state = run_graph(
    "Find the lyrics in this piece of music.",
    "Morley_1595_01_Go_ye_my_canzonettes",
    use_tools=True,
)

print(result_state["answer"]) 

Scores chosen: ['Morley_1595_01_Go_ye_my_canzonettes']
The lyrics of the piece "Go ye my canzonettes" by Thomas Morley are as follows:

```
Goe yee my Canzonets, to my deer darling,
goe yee my Canzonets, to my deer darling,
goe yee my Canzonets, to my deer darling,
to my deer darling,
and with your gentle daintie sweet accentings,
desire hir to vouchsafe these my lamentings,
desire hir to vouchsafe these my lamentings,
and with a crownet, of hir rayes supernall,
t'adorne your locks and make your name eternal,
t'adorne your locks and make your name eternal,
and with a crownet, of hir rayes supernall,
t'adorne your locks and make your name eternal,
t'adorne yt locks and make your name eternall.
Goe yee my Canzonets, to my deer darling,
deer darling,
goe yee my Canzonets, to my deer darling,
to my deer darling,
and with your gentle daintie sweet accentings,
desire hir to vouchsafe these my lamentings,
desire hir to vouchsafe these my lamentings,
and with a crownet, of hir rayes spernall,


In [46]:
result_state = run_graph(
    "Find the lyrics for Morley's 'Go ye my canzonettes'.",
    use_tools=True,
)

print(result_state["answer"]) 

Scores chosen: None
Here are the lyrics for Thomas Morley's "Go ye my canzonettes":

```
Goe, yee my Canzonets, to my deer darling,
Goe, yee my Canzonets, to my deer darling,
Goe, yee my Canzonets, to my deer darling,
to my deer darling, and with your gentlr dain-tie sweet accentings,
desire hir to vouch-safe these my lamentings,
desire hir to vouch-safe these my lamentings,
and with a crownet, of hir rayes supernall,
t'adore your locks and make your name eternal,
t'adore your locks and make yout name eternall,
and with a crownet of hir rayes supernall,
t'adore your locks and make your name eternal,
t'adore yout locks and make your name eternall.

Goe, yee my Canzonets, to my deer darling,
deer darling, goe, yee my Canzonets, to my deer darling,
to my deer darling, and with your gentle daintie sweet accentings,
desire hir to vouch-safe these my lamentings,
desire hir to vouch-safe these my lamentings,
and with a crownet, of hir rayes spernall,
t'adore your locks and make your name eter
```

It's interesting that the first call corrected one instance of "eternall" to "eternal".

LLM Without Tools:

In [49]:
result = loadXML(r"C:/Users/charl/Documents/VSCode/Encoding Music/Music Analysis/MEI Sample/Morley_1595_01_Go_ye_my_canzonettes.mei").page_content

answer = llm.invoke(
    f"""Question: Find the lyrics for Morley's 'Go ye my canzonettes'. 
    Context: 
    {result}
    
    """
)

print(answer.content)

The lyrics for Thomas Morley's "Go ye my canzonettes" are embedded within the provided XML data surrounding the `<verse>` and `<syl>` tags. To extract the lyrics, we can sequentially pull out these tags and assemble the text, which is structured as individual syllables. Here's a transcription of the lyrics based on the given data:

```
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer
Go ye my canzonets to my deer

Darling, go ye my darling, go ye my darling
Go ye, my darling
With your gentle title
With your gentle daintie
Sweet accenings desir hir
To vouche safe there my laments
To vouche safe there my laments
To vouche safe there my laments
To vouche safe there my laments
And with a crowne of her rayes supernall
Dorne your locks, make your name
Eternal
To adorn your n
```

Correct Response:

In [57]:
from music21 import converter, note

score = converter.parse(r"C:\Users\charl\Documents\VSCode\Encoding Music\Music Analysis\MEI Sample\Morley_1595_01_Go_ye_my_canzonettes.mei")
lyrics = []

# Collect the lyric syllables attached to every note, in score traversal order
for n in score.recurse().notes:
    if isinstance(n, note.Note) and n.lyrics:
        for lyric in n.lyrics:
            if lyric.text:
                # Strip surrounding whitespace and embedded line breaks
                cleaned = lyric.text.strip().replace("\n", "").replace("\r", "")
                lyrics.append(cleaned)

# Join the syllables, tidy hyphens and spacing, then break lines at commas
formatted = " ".join(lyrics)
formatted = formatted.replace(" --", "-").replace("--", "-")
formatted = " ".join(formatted.split())
formatted = formatted.replace(",","\n")
formatted = formatted.replace(" - - ","")
print("Lyrics:\n", formatted)

Lyrics:
 Goe yee my Canzonets to my deer darling
 goe yee my Canzonets to my deer darling
 goe yee my Canzonets to my deer darling
 to my deer darling
 and with your gentlr daintie sweet accentings
 desire hir to vouchsafe these my lamentings
 desire hir to vouchsafe these my lamentings
 and with a crownet
 of hir rayes supernall
 t'adorne your locks and make your name eternal
 t'adorne your locks and make yout name eternall
 and with a crownet of hir rayes supernall
 t'adorne your locks and make your name eternal
 t'adorne yout locks and make your name eternall. Goe yee my Canzonets to my deer darling
 deer darling
 goe yee my Canzonets to my deer darling
 to my deer darling
 and with your gentle daintie sweet accentings
 desire hir to vouchsafe these my lamentings
 desire hir to vouchsafe these my lamentings
 and with a crownet
 of hir rayes spernall
 t'adorne your locks and make your name eternall t'adorne your looks and make your name eternall
 and with a crownet
 of hir rayes supe
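
The three outputs above are compared by eye. A rough way to put a number on the comparison is a word-level similarity ratio, sketched below with the standard library's `difflib`. This was not part of the original evaluation. The helper names `words` and `lyric_similarity` are hypothetical, and the three strings are just the first line of each output above.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lowercase the text and keep only word tokens (apostrophes are kept)."""
    return re.findall(r"[a-z']+", text.lower())


def lyric_similarity(reference: str, candidate: str) -> float:
    """Return a 0..1 word-level similarity ratio between two lyric strings."""
    matcher = SequenceMatcher(None, words(reference), words(candidate), autojunk=False)
    return matcher.ratio()


# First line of each output above: music21 reference, LLM with tools, LLM without tools.
reference = "Goe yee my Canzonets to my deer darling"
with_tools = "Goe yee my Canzonets, to my deer darling,"
without_tools = "Go ye my canzonets to my deer"

print(lyric_similarity(reference, with_tools))
print(lyric_similarity(reference, without_tools))
```

Because punctuation and case are ignored, the tool-assisted line scores as identical to the reference, while the modernized spelling ("Go ye") and the dropped word in the no-tools line lower its score. Applied to the full texts, the same ratio would also penalize the repeated and invented lines in the no-tools output.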

### Guessing the Key: LLM with tools vs LLM vs Correct response

LLM With Tools:

In [64]:
result_state = run_graph(
    "Guess the key of the piece. Explain why you chose the key.",
    "Bach_BWV_0778",
    use_tools=True,
)

print(result_state["answer"]) 

Scores chosen: ['Bach_BWV_0778']
The key of the piece "Invention No. 7 in E minor" by Johann Sebastian Bach is E minor. This is confirmed both by the title of the piece itself and the key signature extracted from the MEI file, which indicates E minor. The E minor key is consistent with its baroque context and Bach's typical use of tonality.


In [65]:
result_state = run_graph(
    "Guess the key of each piece. Explain why you chose each key. Return a table of piece names and keys.",
    use_tools=True,
)

print(result_state["answer"]) 

Scores chosen: None
Here is a table of piece names and their corresponding key signatures:

| Piece                                   | Key Signature |
|-----------------------------------------|---------------|
| Bach_BWV_0772                           | C major       |
| Bach_BWV_0778                           | E minor       |
| Bach_BWV_0786                           | B minor       |
| Bartok_Mikrokosmos_022                  | A minor       |
| Bartok_Mikrokosmos_031                  | D minor       |
| Bartok_Mikrokosmos_104                  | D major       |
| Morley_1595_01_Go_ye_my_canzonettes     | F major       |
| Morley_1595_07_Leave_now_mine_eyes      | G minor       |
| Morley_1595_12_Flora_wilt_thou          | G major       |

The key signatures for each piece were determined by analyzing the MEI files and extracting the key signature data. This information aligns with the titles of the Bach inventions, which often indicate the key directly, and the descriptive nature o

LLM Without Tools:

In [66]:
result_state = run_graph(
    "Guess the key of the piece. Explain why you chose the key.",
    "Bach_BWV_0778",
    use_tools=False,
)

print(result_state["answer"].content) 

Scores chosen: ['Bach_BWV_0778']
The piece "Invention No. 7 in E minor" by Johann Sebastian Bach is most likely in the key of E minor, based on several indicators:

1. **Title:** The title itself suggests the key "E minor."

2. **Analyzed Key:** The data indicates that the analyzed key is E minor for both parts of the piece.

3. **Pitch Content and Key Signatures:** The presence of pitches like B4, F#4, and G4, along with a key signature of one sharp, aligns with the notes and key signature typically associated with E minor.

4. **Cadences and Harmonic Progression:** The frequent presence of E, G, B, D#, C# notes acts as a basis for minor harmonic progression signaling towards a tonal center of E minor.

The combination of these elements substantiates the likelihood that the piece is in E minor.


In [68]:
result_state = run_graph(
    "Guess the key of each piece. Explain why you chose each key. Return a table of piece names and keys.",
    use_tools=False,
)

print(result_state["answer"]) 

Scores chosen: None
Here is a table of piece names along with their corresponding keys:

| Piece Name                                     | Key         |
|------------------------------------------------|-------------|
| Bach_BWV_0772 - Invention No. 1 in C major     | C major     |
| Bach_BWV_0778 - Invention No. 7 in E minor     | e minor     |
| Bach_BWV_0786 - Invention No. 15 in B minor    | b minor     |
| Bartok_Mikrokosmos_022 - Imitation and Counterpoint | a minor     |
| Bartok_Mikrokosmos_031 - Little Dance in Canon Form  | d minor     |
| Bartok_Mikrokosmos_104 - Wandering through the Keys  | D major     |
| Morley_1595_01_Go_ye_my_canzonettes            | F major     |
| Morley_1595_07_Leave_now_mine_eyes             | g minor     |
| Morley_1595_12_Flora_wilt_thou                 | G major     |

The keys are determined based on the key signatures extracted from each piece.


Correct Response:

Both were correct for all of the pieces.
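
Note that both runs say they relied on the extracted key signatures (and, for Bach, the titles) rather than on pitch content. A check that ignores the key signature can be sketched by correlating a piece's pitch-class distribution with the Krumhansl-Kessler major and minor profiles, in the style of the Krumhansl-Schmuckler key-finding approach. This is not what the notebook ran. The profile values are the commonly cited ones, the function names are hypothetical, and the pitch-class counts below are made up for illustration rather than taken from any of the nine pieces.

```python
from math import sqrt

# Krumhansl-Kessler profiles as commonly cited, indexed by semitones above the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def guess_key(pc_counts: list[float]) -> tuple[str, str]:
    """Return (tonic, mode) whose rotated profile best matches the 12 counts."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that its first entry lines up with the tonic.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(pc_counts, rotated)
            if best is None or score > best[0]:
                best = (score, NAMES[tonic], mode)
    return best[1], best[2]


# Hypothetical note counts per pitch class, C through B.
c_major_like = [6, 0, 2, 0, 4, 3, 0, 5, 0, 2, 0, 2]
e_minor_like = [2, 0, 1, 2, 6, 0, 2, 4, 0, 2, 0, 5]
print(guess_key(c_major_like), guess_key(e_minor_like))
```

In practice the counts would come from the note table for each piece, ideally weighted by duration. Pieces whose best correlation is low, or whose top two candidates are close, would be the ones that are hardest to rank by key.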

### Ranking by Difficulty: LLM with tools vs LLM vs Correct response

LLM With Tools:

In [69]:
result_state = run_graph(
    "Rank all of the pieces by difficulty. Explain your reasoning. List any tools you use.",
    use_tools=True,
)

print(result_state["answer"]) 

Scores chosen: None
To rank the pieces by difficulty, we can begin by considering factors such as the pitch range, complexity of rhythms, note density, harmonic complexity, and counterpoint. Let's analyze these factors:

1. **Pitch Diversity and Density:** Pieces with more diverse and dense pitch usage might be more technically demanding.
2. **Complexity of Rhythms and Melodies:** More complex rhythms and interweaving melodies can increase difficulty.
3. **Harmonic and Counterpoint Complexity:** More intricate harmonies and counterpoints can increase the sophistication needed to perform the piece.

### Analysis of the Provided Pieces:

- **Bach's Inventions (BWV 772, 778, 786):**
  - Bach_BWV_0772 and BWV_0786 involve a wider range of pitches including up to 4 or 5 accidentals, suggesting they use more complex harmonies or modulations, which can be challenging.
  - BWV_0778 includes frequent modulation and more accidentals, indicating advanced harmonic complexity.

- **Bartók's Mikroko

LLM Without Tools:

In [70]:
result_state = run_graph(
    "Rank all of the pieces by difficulty. Explain your reasoning. List any tools you use.",
    use_tools=False,
)

print(result_state["answer"]) 

Scores chosen: None
It seems that the tool to get time signatures didn't return the needed data. Given this limitation, I'll base part of the difficulty ranking on typical baroque and early 20th-century rhythmic complexities associated with the pieces' styles, along with the known analyses.

### Difficulty Ranking:

1. **Bach's Inventions (Challenging due to contrapuntal complexity)**
   - **Most Difficult**: BWV 0786 (B minor) - The minor key with more accidentals adds complexity.
   - **Moderate**: BWV 0778 (E minor) - Also a minor piece, typically a slight reduction in difficulty.
   - **Simplest**: BWV 0772 (C major) - No sharps/flats, structurally less complex.

2. **Bartók's Mikrokosmos (Each piece emphasizes specific technical skills)**
   - **Most Difficult**: No. 104 (D major) - Complex key changes give it more challenge.
   - **Moderate**: No. 31 (D minor) - Likely focuses on specific patterns or rhythmic elements beyond the key challenge.
   - **Simplest**: No. 22 (A minor) 

There is no "correct" answer for this, but it's interesting to see the paths the LLM takes. It gets much more specific when run with the tools, discussing each individual piece, whereas the run without tools groups the pieces by composer. For the most part, the two runs seem to agree with each other.
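
Although there is no ground truth, the criteria named in the goals (changes of direction and wide leaps) can be turned into a reproducible baseline to set beside the LLM's rankings. The sketch below is one possible heuristic, not an established difficulty measure. The function names, the weights, and the two example lines (given as MIDI note numbers) are all hypothetical.

```python
def melodic_features(midi_pitches: list[int]) -> dict[str, float]:
    """Summarise leap size and changes of direction for one melodic line."""
    steps = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    moves = [s for s in steps if s != 0]  # repeated notes have no direction
    direction_changes = sum(
        1 for a, b in zip(moves, moves[1:]) if (a > 0) != (b > 0)
    )
    return {
        "mean_leap": sum(abs(s) for s in steps) / len(steps) if steps else 0.0,
        "max_leap": float(max((abs(s) for s in steps), default=0)),
        "direction_changes": float(direction_changes),
    }


def difficulty_score(midi_pitches: list[int],
                     leap_weight: float = 1.0,
                     turn_weight: float = 0.5) -> float:
    """Combine the features into one number; the weights are arbitrary assumptions."""
    features = melodic_features(midi_pitches)
    return (leap_weight * features["mean_leap"]
            + turn_weight * features["direction_changes"])


# Hypothetical lines: a stepwise ascent and a line of alternating octave leaps.
lines = {
    "stepwise": [60, 62, 64, 65, 67],
    "leapy": [60, 72, 60, 72, 60],
}
ranking = sorted(lines, key=lambda name: difficulty_score(lines[name]), reverse=True)
print(ranking)
```

A fuller version would add rhythmic features and normalize by the length of the line, since a raw count of direction changes grows with the number of notes.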

### Musica Ficta with an LLM

In [72]:
result = loadXML(r"C:\Users\charl\Downloads\CRIM_Model_0001.mei").page_content

answer = llm.invoke(
    f"""Question: Find and explain any instances of Musica Ficta in this piece. 
    Context: 
    {result[0:50_000]}
    
    """
)

print(answer.content)

Musica ficta refers to the practice in medieval and Renaissance music of adding accidentals or altering pitches that are not notated in the musical score. This was done primarily to avoid dissonance and to create smoother melodic lines or better cadential progressions.

In the provided piece "Veni speciosam" by Johannes Lupi, instances of musica ficta may occur where you see alterations not explicitly written in the notation but implied based on the rules of counterpoint and performance practice of the period. A typical scenario where musica ficta might be applied is at cadences, where performers might raise the leading tone or alter pitches to facilitate smoother voice leading.

From the given XML (Music Encoding Initiative) data, we can observe some pitches marked with accidentals (e.g., `<accid xml:id="m-151" accid.ges="f"/>`), indicating that the editor has applied these alterations, potentially following the practice of musica ficta. An example can be found in measures 4 through 7
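
The LLM's answer can be checked against the file directly. One caveat is that in MEI `accid.ges` records the sounding (gestural) accidental, so on its own it may simply reflect the key signature rather than an editorial addition. The sketch below looks instead for accidentals marked `func="edit"` or wrapped in `<supplied>`, two ways MEI can flag editorial accidentals. Which of these a given CRIM file actually uses is an assumption to verify against the file itself. The sample fragment and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

MEI = "http://www.music-encoding.org/ns/mei"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: one gestural accidental and two editorial ones.
SAMPLE = """<measure xmlns="http://www.music-encoding.org/ns/mei" n="4">
  <note xml:id="n1" pname="b" oct="3"><accid xml:id="a1" accid.ges="f"/></note>
  <note xml:id="n2" pname="f" oct="4"><accid xml:id="a2" accid="s" func="edit"/></note>
  <note xml:id="n3" pname="c" oct="4">
    <supplied reason="edit"><accid xml:id="a3" accid="s"/></supplied>
  </note>
</measure>"""


def editorial_accidentals(mei_xml: str) -> list[tuple[str, str]]:
    """Return (note id, accidental) pairs for accidentals flagged as editorial."""
    root = ET.fromstring(mei_xml)
    # Accidentals that sit anywhere inside a <supplied> element.
    supplied = {
        id(accid)
        for wrapper in root.iter(f"{{{MEI}}}supplied")
        for accid in wrapper.iter(f"{{{MEI}}}accid")
    }
    found = []
    for note in root.iter(f"{{{MEI}}}note"):
        for accid in note.iter(f"{{{MEI}}}accid"):
            if accid.get("func") == "edit" or id(accid) in supplied:
                found.append((note.get(XML_ID), accid.get("accid")))
    return found


print(editorial_accidentals(SAMPLE))
```

Collecting these pairs per piece, together with the editor named in the header, would give the kind of structured ficta report described in the goals.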

## Conclusions

The LLM did surprisingly well on most of the queries, with or without tools. Several queries, however, such as returning tabular data, showed much higher accuracy when tools were used.

In all, the LLM performed with 100% accuracy (6/6) when it had access to relevant tools, music21 functions in this case. Without tools, it performed with 80% accuracy (4/5), still a solid number considering that it only struggled with counting massive numbers of notes.

Thus, gpt-4o had over 90% accuracy (10/11 across both configurations) when performing relatively complex analysis of MEI and JSON data. This figure is impressive in itself, and even more so given its flawless performance when it can compute the results with code.