-
Notifications
You must be signed in to change notification settings - Fork 5
Conversation
The XMSyntaxError occurs when there is a problem with a string. The other errors occur when the inputted value is completely wrong. As such, they should be separated so they can be handled separately. This will become useful when creating detailed error logs.
This makes the not-a-string errors use an error log. This has been implemented first since there is a single error message required, unlike the many different ones that exist for the other types of error such as XMLSyntaxError.
To implement, have also had to add a name attribute to ValidationErrors.
The above-referenced functionality is now merged into |
This converts the lxml errors into something IATI-specific. Should there not be a specific case for a particular error, a generic error will be provided. In an actual implementation, any errors that do not have a specific case should be logged to help identify errors that occur frequently but have not been categorised.
It appears that lxml contains an internal error log that builds up over time. When a test function is parameterised by pytest, all calls to the function add to this singular error log. As such, later calls to the function will be provided with the errors from earlier calls. This is bad. By providing a custom XMLParser every time, the error log only refers to a single call to a test function. As such, buildup of past errors does not occur and the tests are checking the correct information. This is good.
The name attribute has been added, so it seems reasonable to check that the returned error has the correct name and is not something else in disguise.
A limited number of specific errors have been tested and handled. For more errors to be added, it should be determined which errors occur most frequently in real data. This will allow resources to be put towards actual problems rather than trawling through a long list and hoping the correct things are being picked out. |
iati/validator.py
Outdated
return False | ||
if isinstance(error_log, ValidationErrorLog): | ||
return not error_log.contains_errors() | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if-else is no longer required now that it it fully implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in cc978b5
By using validator to determine whether something is XML, the code is more DRY.
This provides a secondary dimension for classification, alongside the category.
The new implementation that uses TypeErrors is more general.
There are some encodings that are supported, some that are not. Unhelpfully, some unsupported encodings return an incorrect error - EMPTY_DOCUMENT rather than UNSUPPORTED_ENCODING
When there is an encoding mismatch it would be expected that the ERR_INVALID_ENCODING error is raised. This is not always the case. Other errors including ERR_INVALID_CHAR, ERR_GT_REQUIRED and ERR_DOCUMENT_EMPTY may also occur depending on the document being processed. Something like a newline can completely change the errors that are output. The XML spec permits a full range of character encodings to be used. libxml2, however, only supports UTF-8, UTF-16, ASCII and ISO-Latin-1 by default. Even within this set of supported encodings, the expected errors are not returned consistently (if at all). With this in mind, a limited set of encodings will be tested. Other encodings will lead to somewhat unhelpful errors being returned. Since the probability of these situations occurring is fairly low, this shouldn't be a major problem.
This makes the code more DRY. A class has also been created to group some related tests.
89fb099 A detailed commit message was made about some problems with libxml2. Some of this has now been extracted into the relevant docstring in a slightly simplified format so that you don't need to go hunting for the information.
Switch Dataset creation to use iati.validator
"""A valid XML string with the text declaration removed.""" | ||
return '\n'.join(xml_str.strip().split('\n')[1:]) | ||
|
||
@pytest.fixture(params=iati.core.tests.utilities.find_parameter_by_type(['str'], False) + [iati.core.tests.utilities.XML_STR_INVALID]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice way to construct a list of invalid types!
"""Check whether a given parameter is valid XML. | ||
|
||
Args: | ||
maybe_xml (str): An string that may or may not contain valid XML. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contain valid XML
or be valid XML
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be
- fixed!
return _check_codelist_values(dataset, schema) | ||
error_log = ValidationErrorLog() | ||
|
||
error_log.extend(_check_is_xml(dataset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be doing a check here for is_iati_xml
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This takes in a Dataset
. Since this is guaranteed to contain actual XML, the check is not necessary here.
iati/validator.py
Outdated
iati.validator.ValidationErrorLog: A log of the errors that occurred. | ||
|
||
""" | ||
return _check_is_xml(dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_check_is_xml
takes a str
, but this seems to be passing it a iati.core.Dataset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just suggest changing the docstring for validate_is_xml.
Fix a couple of docstrings
Docstring changed. Now merging. |
When
is_xml()
indicates that a value is not XML, errors are produced.This PR adds such errors to a
ValidationErrorLog
.