(AI) Humdrum parser drops notes from sub-spines created by *^ after a *v-merge #1882

@weselyj

music21 version

9.9.1

Operating System(s) checked

Windows 11 (Python 3.13.13)
Linux (Python 3.14.4)

Parser output is identical on both, so the bug is not OS-specific.

Problem summary

When a Humdrum **kern spine is split with *^, merged with *v, and then split again with another *^ (a "voice rejoin" pattern common in piano music), the parser drops the notes in the sub-spines created by the second split. The notes are present in the source file and the spine syntax is valid Humdrum, but they never appear in the resulting Stream.

This is related to but distinct from #884 (closed). That issue was about the merge itself; here the merge succeeds but the subsequent re-split loses notes.

Steps to reproduce

The bug is in the humdrum subconverter, so a tinynotation reproducer isn't possible. Save the following as repro.krn:

**kern	**kern
*clefF4	*clefG2
*k[]	*k[]
*M4/4	*M4/4
=1	=1
*	*^
4C	4c	4d
4E	4e	4f
*	*v	*v
=2	=2
*	*^
*	*	*^
4F	4g	4a	4b
4G	4g	4a	4b
*	*v	*v
*	*v	*v
=3	=3
4A	4cc
*-	*-

(Columns are tab-separated. There is one bass spine and one treble spine; the treble splits into 2 voices in m1, merges, splits into 3 voices in m2, then merges back to 1 voice for m3.)

import music21

s = music21.converter.parse('repro.krn', format='humdrum')
notes = list(s.recurse().notes)
print(f'parsed: {len(notes)} notes')
for n in notes:
    parent = n.getContextByClass(music21.stream.Part)
    pidx = list(s.parts).index(parent) if parent else -1
    offset = float(n.getOffsetInHierarchy(s))
    print(f'  part={pidx} offset={offset:>4.1f} pitch={n.pitch.nameWithOctave}')

Expected vs. actual behavior

The kern source contains 16 notes:

| measure | bass | treble |
| --- | --- | --- |
| 1 | C E (2) | c d e f (4) |
| 2 | F G (2) | g a b g a b (6) |
| 3 | A (1) | cc (1) |
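
The count of 16 can be cross-checked straight from the kern text, without any spine tracking. The following is a minimal sketch, assuming every spine is **kern and that each tied token counts as its own note; `count_kern_notes` and `REPRO` are hypothetical names used only for illustration and are not part of music21:

```python
def count_kern_notes(kern_text: str) -> int:
    """Count note tokens in a **kern file, ignoring rests and null tokens."""
    total = 0
    for line in kern_text.splitlines():
        # Skip blank lines, comments (!), interpretations (*), and barlines (=).
        if not line or line[0] in '!*=':
            continue
        for cell in line.split('\t'):
            # A cell may hold several space-separated notes (a chord).
            for token in cell.split(' '):
                # '.' is a null token; rest tokens contain the letter 'r'.
                if token and token != '.' and 'r' not in token:
                    total += 1
    return total


# The reproducer from this report, with explicit tab escapes.
REPRO = "\n".join([
    "**kern\t**kern",
    "*clefF4\t*clefG2",
    "*k[]\t*k[]",
    "*M4/4\t*M4/4",
    "=1\t=1",
    "*\t*^",
    "4C\t4c\t4d",
    "4E\t4e\t4f",
    "*\t*v\t*v",
    "=2\t=2",
    "*\t*^",
    "*\t*\t*^",
    "4F\t4g\t4a\t4b",
    "4G\t4g\t4a\t4b",
    "*\t*v\t*v",
    "*\t*v\t*v",
    "=3\t=3",
    "4A\t4cc",
    "*-\t*-",
])

print(count_kern_notes(REPRO))  # 16
```

Because the count only looks at data lines, it is independent of how any parser resolves the *^ and *v interpretations.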

Actual output is 12 notes:

part=0 offset= 0.0 pitch=C4
part=0 offset= 1.0 pitch=E4
part=0 offset= 0.0 pitch=D4
part=0 offset= 1.0 pitch=F4
part=0 offset= 2.0 pitch=G4
part=0 offset= 3.0 pitch=G4
part=0 offset= 2.0 pitch=C5    ← cc note appears here, expected at offset 4.0
part=1 offset= 0.0 pitch=C3
part=1 offset= 1.0 pitch=E3
part=1 offset= 2.0 pitch=F3
part=1 offset= 3.0 pitch=G3
part=1 offset= 4.0 pitch=A3

Missing: the a b a b content from the second *^ split in m2 (4 notes). The cc note from m3 also appears at offset 2.0 instead of the expected 4.0, which suggests the offset error cascades from the dropped content.
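
The expected offset of 4.0 for cc follows from summing the durations that precede it in its voice. A minimal sketch of that arithmetic, assuming plain numeric kern durations with optional dots (the breve value `0` and tuplets written with `%` are not handled); `recip_to_quarter_length` and `spine_offsets` are hypothetical helpers, not music21 functions:

```python
import re
from fractions import Fraction


def recip_to_quarter_length(token: str) -> Fraction:
    """Convert the duration in a kern note token (e.g. '4c', '8.dd') to quarter lengths."""
    match = re.search(r'(\d+)(\.*)', token)
    if match is None:
        raise ValueError(f'no duration found in token {token!r}')
    recip = int(match.group(1))
    dots = len(match.group(2))
    base = Fraction(4, recip)  # a recip of 4 is one quarter note
    # Each dot adds half of the previous value: base * (2 - 1/2**dots).
    return base * (2 - Fraction(1, 2 ** dots))


def spine_offsets(tokens: list[str]) -> list[float]:
    """Return the starting offset of each token when played back to back."""
    offsets = []
    position = Fraction(0)
    for token in tokens:
        offsets.append(float(position))
        position += recip_to_quarter_length(token)
    return offsets


# First treble voice of the reproducer, read down the column.
treble_first_voice = ['4c', '4e', '4g', '4g', '4cc']
print(spine_offsets(treble_first_voice))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

The bass spine (4C 4E 4F 4G 4A) yields the same offsets, which matches the part=1 lines in the actual output above.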

No exception is raised; the notes are silently absent.

More information

I encountered this while building a music-OMR pipeline that compares music21's humdrum parse against a custom kern-to-token converter as a validation oracle. Across ~54k Beethoven sonata files from KernScores, ~560 (~1.0%) exhibit this pattern: the converter's output is a strict superset of music21's, with the extra notes always corresponding to post-*^-after-merge sub-spine content.
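
The "strict superset" comparison can be sketched with multisets. This is an illustration only, not my pipeline's actual code: the `(offset, pitch)` tuple representation and the name `extra_notes` are assumptions made for the example, and the sample data is the m2 treble content from the reproducer rather than real pipeline output.

```python
from collections import Counter


def extra_notes(converter_notes, reference_notes):
    """Return the notes present only in the converter output.

    Returns None when the reference contains a note the converter lacks,
    i.e. when the converter output is not a superset of the reference.
    """
    converter = Counter(converter_notes)
    reference = Counter(reference_notes)
    if reference - converter:
        return None
    return converter - reference


# m2 treble content as (offset, pitch) pairs.
music21_m2 = [(2.0, 'G4'), (3.0, 'G4')]
source_m2 = music21_m2 + [(2.0, 'A4'), (2.0, 'B4'), (3.0, 'A4'), (3.0, 'B4')]

extra = extra_notes(source_m2, music21_m2)
print(sum(extra.values()))  # 4
```

Using `Counter` rather than `set` keeps repeated notes at the same offset distinct, so a doubled pitch is not silently collapsed.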

Likely fix locus: music21/humdrum/spineParser.py::createMusic21Streams or nearby code (no exception is raised, so there is no stack trace to confirm this). Sub-spine tracking appears to lose the new sub-spines that a *^ creates after a *v merge.
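
To make the bookkeeping concrete, here is a minimal sketch of how the number of spines changes across a single interpretation line. It is not music21's implementation and makes no claim about how spineParser.py is structured; `next_spine_count` is a hypothetical helper that follows the usual Humdrum rules (a *^ splits one spine into two, a run of adjacent *v tokens merges into one spine, *+ adds a spine, *- terminates one).

```python
def next_spine_count(tokens: list[str]) -> int:
    """Return how many spines exist after one interpretation line is applied."""
    count = 0
    in_merge_run = False  # True while inside a run of adjacent *v tokens
    for token in tokens:
        if token == '*v':
            # A run of adjacent *v tokens collapses into a single spine.
            if not in_merge_run:
                count += 1
            in_merge_run = True
            continue
        in_merge_run = False
        if token in ('*^', '*+'):
            count += 2  # split, or original spine plus a newly added one
        elif token == '*-':
            pass  # terminated spine contributes nothing
        else:
            count += 1  # any other interpretation leaves the spine as is
    return count


print(next_spine_count(['*', '*^']))        # 3
print(next_spine_count(['*', '*v', '*v']))  # 2
print(next_spine_count(['*', '*', '*^']))   # 4
```

A parser has to carry this count, plus the parent of each sub-spine, across every split and merge; the symptom in this report is consistent with that state going stale once a split follows a merge.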

For now, in my pipeline I treat "music21 produces fewer notes than the kern source" as a known-incorrect-reference signal rather than a converter error.
