This repository has been archived by the owner on Apr 6, 2022. It is now read-only.

Schema validation of input MEI files #34

Open
napulen opened this issue May 20, 2021 · 2 comments
Labels: medium priority, question (Further information is requested), to investigate

Comments

@napulen
Member

napulen commented May 20, 2021

XML inputs are usually validated against a schema file (e.g., XSD).

MEI is no exception: https://music-encoding.org/resources/schemas.html

If possible, given the current Python library used to parse the XML, a first step to assess an MEI file could be to validate it against the corresponding schema (e.g., Neume or CWMN).

Not sure if this functionality is provided in the xml library.

If not, what are the options?
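For reference, the `xml` package in the Python standard library parses XML but does not validate against a schema. One option is the third-party `lxml` library, which can validate against RelaxNG, the format in which the MEI schemas are distributed. The following is a minimal sketch under that assumption, not the project's implementation; the function name `validate_mei` and the file paths in the usage note are hypothetical.

```python
try:
    from lxml import etree  # third-party dependency: pip install lxml
except ImportError:  # keep this sketch importable when lxml is missing
    etree = None


def validate_mei(mei_path, rng_path):
    """Validate an MEI file against a RelaxNG schema.

    Returns (is_valid, messages). `validate_mei` is a hypothetical helper.
    """
    if etree is None:
        raise RuntimeError("lxml is required for schema validation")
    # Build the validator from the .rng schema file.
    schema = etree.RelaxNG(etree.parse(rng_path))
    # A file that is not well-formed raises etree.XMLSyntaxError here.
    doc = etree.parse(mei_path)
    is_valid = schema.validate(doc)
    # Collect the validator's messages for reporting.
    messages = [f"line {err.line}: {err.message}" for err in schema.error_log]
    return is_valid, messages
```

A call might look like `validate_mei("chant.mei", "mei-Neumes.rng")`, where both paths are placeholders for a local MEI file and a schema downloaded from the MEI schemas page linked above.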

@napulen added the question, medium priority, and to investigate labels on May 20, 2021
@kemalkongar
Member

For Neume MEI Files:

These will be outputs of the OMR process. As far as I know, our assumption that syllable tags enclose the syl and note values holds for all outputs. Because of how the dictionary is created (mapping syllables to their neumes and neume components), that single condition is all that's needed to create a volpiano with at least one note.
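The single condition described above can be checked without a schema. The sketch below uses only the standard library and assumes the usual MEI neume layout, where a `syllable` element contains a `syl` element and one or more `nc` elements; `check_syllables` is a hypothetical name, not a function from the repository.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://...}syllable' -> 'syllable'."""
    return tag.rsplit("}", 1)[-1]


def check_syllables(mei_text):
    """Return a list of problems; an empty list means the assumption holds."""
    root = ET.fromstring(mei_text)
    syllables = [
        el for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == "syllable"
    ]
    problems = []
    if not syllables:
        problems.append("no <syllable> elements found")
    for index, syllable in enumerate(syllables, start=1):
        # Names of the syllable element and all of its descendants.
        names = {
            local_name(el.tag) for el in syllable.iter()
            if isinstance(el.tag, str)
        }
        if "syl" not in names:
            problems.append(f"syllable {index}: missing <syl>")
        # Assumption: notes appear as <nc> (neume MEI) or <note> (CWMN).
        if not names & {"nc", "note"}:
            problems.append(f"syllable {index}: no notes")
    return problems
```

A converter could run this check first and then decide whether to warn the user or to continue leniently.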

For CWMN Files:

The Andrew Hughes repository is set up so that notes precede syllables. I'm not sure if this is the generally accepted way of writing MEI files, but if it is, the algorithm should continue to work.

Idiosyncrasies such as unknown pnames and syllable-independent notes are handled below the volpiano output.

In general:

I think it's best to have a lenient volpiano converter that will be mostly correct even in the face of unknown inputs, given that it's the user's responsibility to input valid MEI files. Both automated cases, the conversion of the Andrew Hughes repository and the outputs from Rodan, involve known MEI formats that are being tested in pytest.

@kemalkongar
Member

The most recent version of dev has several methods for the "standardization" of volpiano strings. My aim was to create a stable way of doing database comparison, even in the case of human error or missing MEI information for hyphens. Basically, except for the initial "---" after the clef, all runs of multiple hyphens are reduced to a single hyphen. This allows a valid volpiano to be printed without the need for word information. Furthermore, it partially negates the difficulty of CWMN not having neume components for single hyphens.
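The hyphen rule can be illustrated with a short sketch. It is not the method in dev; `standardize_hyphens` is a hypothetical name, and the code assumes the string begins with a one-character clef followed by "---".

```python
import re


def standardize_hyphens(volpiano):
    """Collapse runs of hyphens to one, keeping the '---' after the clef."""
    prefix = ""
    body = volpiano
    # Assumption: the clef is a single non-hyphen character followed by '---'.
    match = re.match(r"[^-]---", volpiano)
    if match:
        prefix = match.group(0)
        body = volpiano[len(prefix):]
    # Every remaining run of two or more hyphens becomes a single hyphen.
    return prefix + re.sub(r"-{2,}", "-", body)


print(standardize_hyphens("1---g--f---e-h"))  # prints 1---g-f-e-h
```

Because word, syllable, and neume boundaries all collapse to the same separator, two strings that differ only in hyphen counts compare equal after this step.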

Basically, explicitly paired neume components, e.g. "gfeh", are kept together, while any sort of separation, whether by neume component or by syllable, inserts a single hyphen. I've added a method that compares a volpiano (say, one from Cantus) to an MEI file (CWMN or Neume) and returns a "standardized volpiano".
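As a rough illustration of the grouping rule, the sketch below builds a standardized string from groups of note characters that have already been extracted from an MEI file. The function name `groups_to_volpiano`, the input shape, and the default clef character are assumptions made for the example, not the repository's API.

```python
def groups_to_volpiano(groups, clef="1"):
    """Join note groups with single hyphens after the clef and its '---'.

    `groups` is a list of strings; each string holds the volpiano characters
    of neume components that were explicitly paired in the source.
    """
    # Skip empty groups so that no doubled hyphens appear in the output.
    body = "-".join(group for group in groups if group)
    return f"{clef}---{body}"


print(groups_to_volpiano(["gfeh", "k", "jh"]))  # prints 1---gfeh-k-jh
```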

I know we want to get everything to "work" as a first priority so I thought this would be a flawed but efficient way of streamlining the Cantus volpiano conversion. It also addresses the point in this issue, which is why I wrote it here.

By standardizing volpiano around notes, we ensure that close neume components are printed next to each other and that spaces will exist wherever the components weren't together, whatever the reason. This allows even invalid XML files to be converted into valid volpiano, with the only loss of information during conversion being the number of hyphens. It's a Band-Aid style solution, but I think it will help us immensely with testing.
