Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XMLLyricLanguage has no attribute {http://www.w3.org/XML/1998/namespace}lang #6

Open
MikeEcho opened this issue Jan 30, 2024 · 8 comments

Comments

@MikeEcho
Copy link
Collaborator

MikeEcho commented Jan 30, 2024

I wanted to point out a curious issue that arises immediately when I try to parse an XML exported directly from Sibelius Ultimate version 2022.12. I've read in your documentation that this library has been built with Finale in mind rather than other sheet music software, but I figure I'd ask just in case.

Calling score = parse_musicxml(Path(__file__).parent / 'test_language.xml') in a unit test I was able to reproduce this error for where the XML file itself contains the XML block before part-list:

<defaults>
    <lyric-language xml:lang="en" />
</defaults>

causes the exception (in the issue title) to be thrown. Full error message:

AttributeError: XMLLyricLanguage has no attribute {http://www.w3.org/XML/1998/namespace}lang. Allowed attributes are: ['lang', 'name', 'number'] or possible children as attributes: []

In debug mode I found that the {http://www.w3.org/XML/1998/namespace}lang key was already set after xml library calls ET.parse():

Screenshot 2024-01-30 at 10 43 57 pm

Which seems to me to be not an issue with this musicxml library, but with xml?

Because of this initial finding, I'm inclined to close this issue and code a workaround where my own xml code removes the XMLLyricLanguage element first before parsing the rest of the XML. Just wanted to get your thoughts before I do that!

@alexgorji
Copy link
Owner

The parse_musicxml function is not really matured yet. Until now the focus of both libraries musicxml and musicscore was to create MusicXML files rather than importing or analysing them, though that could be a very interesting feature for the future. So in the long run it would be desirable to fix this function to be able to parse also MusicXML files which are created with Sibelius. So if you have a good idea for an easy fix, go for it. Please keep in mind that the tests for this function are also still only a rough draft and are not included yet in the test suite.

@alexgorji
Copy link
Owner

alexgorji commented Feb 8, 2024 via email

@MikeEcho
Copy link
Collaborator Author

I'll be responding here soon, sorry for the delay. I'll create an issue for the other bug I found :)

@MikeEcho
Copy link
Collaborator Author

MikeEcho commented Apr 7, 2024

Do you have an idea how we could design tests for parse_musicxml function to cover all these kinds of problems? If you are going to invest some time in enhancing this function it would be nice to think about testing first. But anyway feel free to add issues and describe your ideas. It doesn’t do any harm at all!

Definitely, any issue found must have at least one test case (or more) to back it up. Currently I've created a simple test that invokes the parse_musicxml function to parse an xml file containing the namespace'd xml:lang element attribute:

score = parse_musicxml(Path(__file__).parent / 'test_lyric-language.xml')
assert score is not None

At this stage I'm considering which approach to fix it is most maintainable in the long run, since namespaces are an integral part of XML, and in musicXML there are multiple instances where a namespace is used in an element attribute.

@MikeEcho
Copy link
Collaborator Author

MikeEcho commented Apr 18, 2024

Hi @alexgorji, I've isolated the root cause for why parse_musicxml (in parser/parser.py) cannot parse musicxml files containing

<lyric-language xml:lang="en" />

It is not a fault of the function itself, but of how the xml:lang XSDAttribute is created when parsing the musicxml_4_0.xsd file. This line at https://github.com/alexgorji/musicxml/blob/master/musicxml/xsd/xsdattribute.py#L29 hardcodes the attribute name to lang. When the Python xml library parses a musicxml file with the lyric-language element shown above, the attribute name is parsed as {http://www.w3.org/XML/1998/namespace}lang. During parsing, a check is made against the allowed XSD attributes on the XMLLyricLanguage class (XSDAttribute(name=number), XSDAttribute(name=name), XSDAttribute(name=lang)), searching by name. Since {http://www.w3.org/XML/1998/namespace}lang is not in any of the names (it should be matched with lang somehow), parsing fails.

I'd like to ask why you hardcoded xml:lang parsing, was there a specific reason? It'd be good to know why before I propose different fix options!

@alexgorji
Copy link
Owner

Hi @MikeEcho,
very nice! I can absolutley not remember why I did that :- ) Feel free to change it.

@MikeEcho
Copy link
Collaborator Author

MikeEcho commented Apr 23, 2024

@alexgorji it'll be my pleasure! Before I do that, I'd like to brainstorm options for a fix, because there are different ways we could go about this. As the main maintainer of this library, you'd probably be in the best position to know which option is best.

What we know:

  • The Python library xml, used by the parse_musicxml function, parses namespaces as {...}:attr_name, where ... is the namespace URL
  • The musicxml library itself, when assembling Python classes from the musicxml XSD, currently hardcodes xml:lang (and xml:space) refs to be rendered a specific way and added as an ETree to the corresponding XSDAttribute class, losing the namespace information in the process

First off, I assume we will want the ability to export namespaced attributes (e.g. export Python -> musicxml with a lyric-language element specifying xml:lang="some language code"). Otherwise we'd have to build in some way for the "namespace matcher" at export time to figure out erased namespace associations and attach them as expected by the musicxml schema within the elements to be exported, which seems a bit messy. Therefore, going with the first assumption, we will need to maintain this namespace info somewhere so it can be accessed for export.

The question then becomes: how should namespaces be represented within the library w.r.t element attributes which rely on them according to the musicxml schema? For a class like XMLLyricLanguage, should the xml:lang attribute be internally represented like {http://www.w3.org/XML/1998/namespace}lang? This means that when parsing from the musicxml XSD, the relevant XSDAttribute would encode {http://www.w3.org/XML/1998/namespace}lang somehow rather than just lang as it currently is hardcoded to be. Then when parse_musicxml and subsequently the standard xml library is called, the _check_attribute() flow should work because it'll find XSDAttribute(name="{http://www.w3.org/XML/1998/namespace}lang") as an allowed attribute under XMLLyricLanguage. This approach would require the "namespace matcher" on export to replace the bracketed part with the appropriate namespace label.

Alternatively, we take a more "sophisticated" approach by introducing some sort of attribute-namespace mapping within a relevant element class like XMLLyricLanguage, such that we store just lang as an attribute name/ref-name and a dictionary looking like attribute_namespaces = {"lang": "xml"}, to be accessed at export time by a "namespace matcher" to reassemble the musicxml schema-compliant lyric-language attribute. The pros of this option are that namespaces are symbolically separated from attributes and we don't have to rely on string methods to piece the correct attributes back together on export. The con is that this introduces complexity to both the initial musicxml XSD parsing + the _check_attribute logic on import, and I can't predict at this stage how much work is required there, but I am guessing it will be more involved than the first option.

If you have any other ideas, please let me know, and I hope that wasn't too long! Namespace parsing will be an important milestone in the development of this library I think!

@alexgorji
Copy link
Owner

I see. It could indeed get a little bit complicated. I admit that I did have neglected these namespace exceptions at the first place but I understand the need to change it now.

I created an extra branch for this issue to be able to experiment around without damaging the master branch. I would suggest that you add one or more tests to: musicxml/tests/test_xsd/test_xsd_attribute.py to determine exactly which changes we are looking for. Then we can decide how deep we need to go into the subject.

I would recommend to stick to a tdd workflow: 1- Write a test that fails. 2- Make minimal changes to the code for the test to pass. 3- Refactor the changes if necessary to have a clean code.
I hope that the existing tests will let us know if some other changes are going to be necessary. Than we can think about a strategy and so on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants