Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GbXML ReverseTranslator needs validation and/or better error handling #4441

Closed
jmarrec opened this issue Sep 21, 2021 · 17 comments · Fixed by #4587
Closed

GbXML ReverseTranslator needs validation and/or better error handling #4441

jmarrec opened this issue Sep 21, 2021 · 17 comments · Fixed by #4587

Comments

@jmarrec
Copy link
Collaborator

jmarrec commented Sep 21, 2021

Enhancement Request

See #4440 (comment)

Really we need error handling in gbxml... There is currently no validation against the gbXML schema happening and there is very little checking that is done manually (like ensuring that required values/attributes are supplied, or checking that a YearSchedule references a weekScheduleIdRef you make sure it can be found).

Detailed Description

I found this because a test I added was failing without any logging when I had a typo in a weekScheduleIdRef

  <Schedule type="Fraction" id="Simple_Schedule">
    <YearSchedule id="Simple_Schedule_YearSchedule">
      <BeginDate>2017-01-01</BeginDate>
      <EndDate>2017-12-31</EndDate>
-      <WeekScheduleId weekScheduleIdRef="SOMETHING_THAT_CANT_BE_FOUND" />
    </YearSchedule>
    [ .... ]

Another issue might be when you pass Schedule with zero YearSchedule children. Or if YearSchedule doesn't have a BeginDate children. etc etc.

https://gbxml.org/schema/6-01/GreenBuildingXML_Ver6.01.xsd#element_YearSchedule

Possible Implementation

Schema validation against the gxbml XSD (herewould be an option. pugixml doesn't support that, so that would require using a different library for xml such as xeres-c (which is on conan-center-index: https://github.com/conan-io/conan-center-index/tree/master/recipes/xerces-c).

Another option is to manually enforce that required values and attributes are there, and handle errors appropriately (at least logging to inform the user).

@jmarrec
Copy link
Collaborator Author

jmarrec commented Sep 21, 2021

These two are wrong per 6.01

std::string type = element.attribute("type").value();

and

std::string type = element.attribute("type").value();

should be scheduleType. but type is correct for Schedule though (which is confusing)

https://www.gbxml.org/schema/6-01/GreenBuildingXML_Ver6.01.xsd#element_WeekSchedule

https://www.gbxml.org/schema/6-01/GreenBuildingXML_Ver6.01.xsd#element_DaySchedule

@jmarrec
Copy link
Collaborator Author

jmarrec commented Sep 21, 2021

Example "manual" error handling in 2dfa66b

@shorowit
Copy link
Contributor

shorowit commented Sep 21, 2021

If XSD validation is added to OS, could it be done in such a way that it’s exposed in the SDK? Maybe Schematron validation too?

I have wanted this for a long time for our HPXML work. The Oga library we’re using is great, but it doesn’t support XSD or Schemtron validation. And the performance suffers from being in Ruby.

@kbenne
Copy link
Contributor

kbenne commented May 6, 2022

@jmarrec my understanding is that the momentum is toward bringing in Xerces to support validation. I know that Xerces is popular and capable, and I have confidence that you did a wide search for other library options. But, some folks on the team mentioned libxml2 and have used it through Ruby/Python bindings and really liked it. Can you think of a reason to choose Xerces over libxml2?

@shorowit
Copy link
Contributor

shorowit commented May 6, 2022

I would love a solution that supports Schematron, which libxml2 appears to support but not xerces.

@shorowit
Copy link
Contributor

shorowit commented May 6, 2022

For what it's worth, I did find this document, which described why pugixml was chosen in the past (instead of xerces or libxml2).

@kbenne
Copy link
Contributor

kbenne commented May 6, 2022

I love pugixml. I don't think we will likely remove it as we already went through one conversion from QXML to pugi, and doing it again would be a lot of work and unnecessary. I suggest we continue encouraging pugi for parsing / xml manipulation needs, but we obviously need one of these other libraries for validation.

@kbenne
Copy link
Contributor

kbenne commented May 6, 2022

libxml2 is C based, but if we keep the application of it focused on validation, we can put a small C++ wrapper around it to hide the pointers.

@jmarrec
Copy link
Collaborator Author

jmarrec commented May 7, 2022

@jmarrec my understanding is that the momentum is toward bringing in Xerces to support validation. I know that Xerces is popular and capable, and I have confidence that you did a wide search for other library options. But, some folks on the team mentioned libxml2 and have used it through Ruby/Python bindings and really liked it. Can you think of a reason to choose Xerces over libxml2?

Libxml2 is C. Aside from that no. I don't really care which one we choose. Both are available for conan. Xerces does feel like C-ish as well (start / stop methods, takes const char pointers etc)

Also I do think like you that we should keep pugixml in place, just do the validation with another one, instead just using one lib to do both.

So we'll provide a simple class that does only one thing: validate a path or stream, reports errors and warnings. The implementation details would be hidden and we won't expose the internal library we use. So both would work

@jmarrec
Copy link
Collaborator Author

jmarrec commented May 10, 2022

@kbenne There's something that makes me uneasy about libxml2, is that it doesn't seem to be truly thread safe. It's possible to use it with multiple threads though if you're careful. It's also not like we necesarilly need thread safety here, but still.

https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-parser.html#xmlCleanupParser
https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Thread-safety

Anyways, we need to settle on a lib to use. @joseph-robertson has already started implementation with xerces-C in #4578, so if you insist on a library change it should happen now rather than when it's finished and polished. Did you guys discuss it on Friday maybe?

@joseph-robertson
Copy link
Collaborator

Here is a current "main points" list that @jmarrec and I complied:

  • A native ruby gem based on libxml2
    • nokogiri, but we dropped it for oga, or ruby-libxml, but do they work with MSVC being non C89 compliant?
    • also would this be slower than using our own C++ wrapper? And is it a big deal?
  • libxml2 over xerces because of schematron (c++ wrapper for schematron?)
  • could use existing libxml2 c++ wrapper - libxmlplusplus or xmlwrap - but are they really good options?
    • starred < 40 times,
    • no CMake support
    • no existing conan recipe... that means some amount of pain creating a conan recipe that uses autotools
  • could write own libxml2 c++ wrapper but that might be somewhat tricky and hard to maintain (as Jason Turner said, because this is C based)

@joseph-robertson
Copy link
Collaborator

Here is an additional note on using libxmlplusplus c++ wrapper for schematron.

@kbenne
Copy link
Contributor

kbenne commented May 10, 2022

@jmarrec you are correct we did talk about this at the iteration meeting on Friday. After that conversation and the discussion here, I favor libxml2. People seem to have a high opinion of libxml2 and especially the Schematron support.

But I think we should keep libxml2 confined to schema validation, and I feel like we should write our own C++ wrapper rather than adopting an existing wrapper. I think we basically need about one public API validate(std::path xmlfile).

@shorowit
Copy link
Contributor

Unfortunately it seems to be more complicated than I realized. lxml, which is a very popular python library for working with xml, uses libxml2 but not really for schematron support. From here:

lxml also provides support for ISO-Schematron, based on the pure-XSLT skeleton implementation of Schematron:

There is also basic support for pre-ISO-Schematron through the libxml2 Schematron features. However, this does not currently support error reporting in the validation phase due to insufficiencies in the implementation as of libxml2 2.6.30.

So it sounds like libxml2 support for Schematron is pretty outdated and poor. No idea if it will work well for us.

@jmarrec
Copy link
Collaborator Author

jmarrec commented May 10, 2022

@shorowit So this may be misleading. In my search of an example of using libxml2 to do schematron validation, the only example I could find was a Cython extension... for lxml. https://github.com/lxml/lxml/blob/master/src/lxml/schematron.pxi

I'm trying to make a MCVE in C right now, to see if it would work or not.

@jmarrec
Copy link
Collaborator Author

jmarrec commented May 11, 2022

I have a shitty MCVE implementation at https://github.com/jmarrec/TestCpp-xml-schematron

libxml2 uses a too old schematron. So I do the same as what python's lxml module does: I convert the schematron file to an XSLT stylesheet and apply that via libxslt (which is based on libxml2) to get the validation errors. Those do not include line numbers where the errors occurs though (neither does lxml as far as I can tell)

Example API usage: https://github.com/jmarrec/TestCpp-xml-schematron/blob/6dfc1f02fbe67f98e3e9279b0ee1228e5a151496/test/XMLValidator_GTest.cpp#L68-L80

@jmarrec
Copy link
Collaborator Author

jmarrec commented May 11, 2022

FYI, https://github.com/vslavik/xmlwrapp does include both libxml and libxslt, so it could be used to do this will less apparent pointers and co. The issue is making a conan recipe for it...

@tijcolem tijcolem moved this from In progress to Ready to Test in OpenStudio Jun 3, 2022
OpenStudio automation moved this from Ready to Test to Done Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
OpenStudio
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants