Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Error Reporting for Schema Validation #116

Merged
merged 8 commits into from
Jul 18, 2023

Conversation

JDSeiler
Copy link
Contributor

@JDSeiler JDSeiler commented Jul 15, 2023

I have some prior Rust experience, but I'd consider myself an intermediate Rust programmer at best, and I have very little experience with C and FFI. Point being, I'm very open to feedback on these changes.

Motivation

Closes #115

Previously, StructuredError was a thin wrapper around a raw pointer to libxml2's xmlError struct. This was problematic because libxml2 does not allocate separate xmlError structs. Instead, it uses an effectively global (or thread-global) xmlError that is rewritten (but does not move) every time a new error is generated. As a result, all of the errors produced by this library were all pointing to the same memory and all had the same contents: the last error libxml2 produced.

The following code from libxml2 was consulted to confirm its behavior:

  • https://github.com/GNOME/libxml2/blob/884474477284474e0151280aaa275a18e3d7a036/error.c#L470 is the function ultimately responsible for calling the structured error callback (called schannel in libxml)
    • line 632 is where the data is actually provided to the callback. The error object is called to.
    • Walking backwards, we see the error object being cleared and initialized starting on line 573
    • Walking back even farther, we see that to is not a fresh xmlError, but rather a reference to the xmlLastError global. This either points to a file-level xmlError defined in globals.c, or a struct-member that's part of a thread-specific global context. In either case, it does not appear that xmlLastError is ever allocated more than once.

Description of Changes

I replaced the wrapper around the raw pointer with a more traditional Rust struct so that each individual error could be preserved. Because the struct no longer has any ties to the underlying C-managed data once constructed, the Drop implementation was removed.

Not all of the fields in the xmlError struct had obvious utility, or even safe ways of managing them. The fields ommitted are:

  • str1, str2, str3 :: I couldn't find any indication as to what these would be used for. Perhaps they are "empty buckets" for users to put their own custom error messages into? It didn't seem worth spending the CPU cycles on looking at them.
  • int1 :: Similarly, I couldn't find any information on this aside from the cryptic "extra number information".
  • ctxt, node :: Both of these are void * and I have 0 clue how to safely include them without massively complicating things.

Regarding enums, the xmlErrorLevel field was converted to a normal Rust enum because it's small. The code and domain fields were included in their raw forms because they might be useful to someone. However, the xmlErrorDomain and xmlParserError C enums are 30 and 700+ (!!) members respectively, so it didn't seem wise to write out Rust enum versions of them.

Draft -> Ready for Review Checklist

  • :: What should the fallback value for the XmlErrorLevel::from_raw function be? Do we trust that libxml2 is always going to provide something in the range [0, 3], and the catch-all branch can be something like unreachable!()?
  • :: Is the type Option<String> acceptable for the message and filename fields? What about the lossy conversion to utf-8? I went through a few different options here. The underlying C string could be a null pointer (at least for filename) and of course there's no guarantee that the underlying C string is valid utf-8.
  • :: Any other finishing tasks I need to do? Run Valgrind? Improve documentation? I'm all ears.

JDSeiler and others added 2 commits July 14, 2023 23:02
Write down a plan

Add notes about replacing transmute

Add notes about whether calling xmlResetError is needed
Revert "Add notes about replacing transmute"

This reverts commit 7e7dab4.
Copy link
Collaborator

@imcsk8 imcsk8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also spent some time working on a solution and came out with the same approach. I will test your patch and send you more comments if needed.


/// Human-readable informative error message
pub fn message(&self) -> &str {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that at least for a couple of revisions leave this function for backwards compatibility

src/error.rs Outdated
fn drop(&mut self) {
unsafe { bindings::xmlResetError(self.0) }
/// Returns the provided c_str as Some(String), or None if the provided pointer is null.
unsafe fn convert_to_owned(c_str: *mut c_char) -> Option<String> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the function might be clearer like this: ptr_to_string.

src/error.rs Outdated
return None;
}

let raw_str = CStr::from_ptr(c_str);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only unsafe statement, might be more clear to have:

let raw_str = unsafe { CStr::from_ptr(c_str) };

Instead of making the whole function unsafe

src/error.rs Outdated
impl StructuredError {
/// Copies the error information stored at `error_ptr` into a new `StructuredError`
pub fn from_raw(error_ptr: *mut bindings::_xmlError) -> Self {
unsafe {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more clear to have just unsafe blocks in the respective assigments instead of having a big unsafe block

@dginev
Copy link
Member

dginev commented Jul 17, 2023

@JDSeiler thank you for the thorough work in this PR, and thanks to @imcsk8 for the unexpected review. This is already more than the usual PR burden on this wrapper repo, I appreciate the contributions. Maybe it is a sign of the crate starting to mature.

As to the questions by @JDSeiler :

  • I think 0..=3 is indeed a reliably documented range for xmlErrorLevel, so your suggestion to make other cases unreachable!() is the correct design to signal that expectation.

  • Running valgrind on the updated test is a good idea, though we haven't done it consistently in the past. The usual run of

    valgrind --leak-check=full ./target/debug/deps/schema_tests-1dd74906a490bf98
    

    seems to indicate that there are no issues with the new test on a local build here.

  • The Option<String> is fine in general, the crate uses that return type on occasion. But I would also honor the request by @imcsk8 and keep the original message around as #[deprecated] from version 0.3.3. We can keep it around until 0.4.0 - although if someone wants to sink in some time to improve the crate, I am more than happy to recruit co-maintainers, and have others speak into the versioning roadmap.

  • clippy also caught that StructuredError::from_raw should be marked as an unsafe function, see not_unsafe_ptr_arg_deref

  • I usually also update CHANGELOG.md in the crate root when a new PR gets merged - I can do that after merging here, or you're welcome to update it yourself if you want to choose the exact wording.


Aside: What actually stuck out to me is libxml's decision to add trailing \n newline chars to the error values. I am a bit uneasy about exposing those from the Rust struct, without first running a .trim() on them, especially if we are allocating a String anyway. But I could see this going either way.

@JDSeiler JDSeiler marked this pull request as ready for review July 17, 2023 23:41
src/error.rs Outdated
/// StructuredError `message` field directly.
#[deprecated(since="0.3.3", note="Please use the `message` field directly instead.")]
pub fn message(&self) -> &String {
self.message.as_ref().unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the original &str type, I think this is a workable variation:

pub fn message(&self) -> &str {
  self.message.as_deref().unwrap_or("")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah excellent! I was about to push a change where the implementation was self.message.as_ref().unwrap().as_str(). I didn't think it was even possible to get both the &str return type and a default value working, since the compiler was quite mad about returning references to local values.

I'll push up that change shortly.

Copy link
Member

@dginev dginev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a small type suggestion, and will wait for a final nod from @imcsk8 , but we're pretty much good to merge and ship a v0.3.3 here.

Thanks again @JDSeiler

@JDSeiler
Copy link
Contributor Author

I left a small type suggestion, and will wait for a final nod from @imcsk8 , but we're pretty much good to merge and ship a v0.3.3 here.

Thanks again @JDSeiler

Awesome, thank you (and @imcsk8 !) for feedback on the changes. This was a great learning experience!

@imcsk8
Copy link
Collaborator

imcsk8 commented Jul 18, 2023

... I am more than happy to recruit co-maintainers, and have others speak into the versioning roadmap.

I'll be happy to help maintaining. What do you need me to do?

@dginev
Copy link
Member

dginev commented Jul 18, 2023

@imcsk8 basically what you just did in reviewing here, whenever you have time/interest.

I am currently available enough to ship minor releases, but I have been a little easy-going with spending time in developing the crate. Btw this is also how @triptec got added in here some time back, when he had a season of libxml work.

And since this is currently a care-free open source crate you can also just sit on the rights without doing anything, it keeps the crate safer in case I'm missing for whatever reasons.

@dginev
Copy link
Member

dginev commented Jul 18, 2023

@imcsk8 and in the interest of speed, I just added you as an admin to the repo (you should get an invite).

Feel free to merge this PR in as a warm-up, if the general setup sounds reasonable enough.

Copy link
Collaborator

@imcsk8 imcsk8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I approved but also left a couple of suggestions in case you want to add them to the PR

Comment on lines +75 to +79
let line = if error.line == 0 {
None
} else {
Some(error.line)
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be too much but I leave you a more idiomatic way of doing these assignments

Suggested change
let line = if error.line == 0 {
None
} else {
Some(error.line)
};
let line = match error.line {
0 => None,
_ => Some(error.line)
};

Comment on lines +80 to +84
let col = if error.int2 == 0 {
None
} else {
Some(error.int2)
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let col = if error.int2 == 0 {
None
} else {
Some(error.int2)
};
let col = match error.int2 {
0 => None,
_ => Some(error.int2)
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the if-else variant is the first snippet in the std::option doc page, so it's not really criminal... This one may come down to taste.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the if-else variant is the first snippet in the std::option doc page, so it's not really criminal... This one may come down to taste.

You're right, it's a matter of taste 👍

Thanks @JDSeiler for the patch and thanks @dginev for the quick review and merge!!

I'm using heavily this library and I would like to help on whatever I can.

Cheers!

@dginev dginev merged commit 115c32e into KWARC:master Jul 18, 2023
1 check passed
@dginev dginev mentioned this pull request Jul 18, 2023
@JDSeiler JDSeiler deleted the feat/schema-validation-improvements branch September 7, 2023 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All StructuredError returned by SchemaValidationContext::validate_* are identical
3 participants