Validation tutorial #128

Merged
merged 7 commits into from

3 participants

@ralienpp

Some basic examples, nothing major - but they should help one get started.

@plq
Owner

hi alex,

first, thanks a lot for your efforts. here are my remarks:

1) you're talking about the advanced validation methods without talking about the simpler ones. E.g. Integer(ge=1, le=12) will check for integers between 1 and 12 inclusive. or String(pattern='some_regex') will validate the incoming string using regexps. actually, the validation can be done using lxml's schema validator if you use these methods. if you override validation functions, only validator='soft' can check them.

2) validating by overriding the validation functions doesn't change the schema itself. so using them is not really encouraged.

3) I always use the "rhetorical we" in the documentation. We should rather be consistent about that one.

4) I always wrap documentation and code by 80 columns. You can see that it's too hard to read even using the web interface.

Again, thanks for your time.

@ralienpp

Hi, thanks for the comments.

I was aware of the regexp pattern for the validation of strings, but I didn't know about the approach you used with Integers. I'll add these examples.

Some questions:

  1. are there other tricks besides the regexp pattern matching for strings and the comparison operators for numbers?
  2. are there any performance considerations one should be aware of? Is the underlying lxml validation method faster? If so, is it simply because it is "underneath" and there are no abstraction layers above it? Is it because lxml wraps C code? Anything else?
  3. are there any good reasons to use the "soft" validation methods instead of the simple ones? I think the example with prime numbers is fine, but I'm not sure about the other one. I chose to write it like that because "if ':' in value" is a simple check that doesn't involve regular expressions (which are most likely going to be slower). Is that the case?
  4. validation by overriding the functions doesn't change the schema itself, so people who use this SOAP service won't be aware of the actual constraints that are applied to the data. Are there other arguments against "soft" validation?

Style remarks:

  1. wrap at 80 lines
  2. no "I"; this reminds me of http://en.wikipedia.org/wiki/A_Time_of_Changes :-)
  3. is there anything else? I think I will create another section in the manual about this; so others can be synchronized

Questions that are related to other aspects I want to document:

  1. How to declare attributes? Can you provide a simple definition of a class that declares attributes such as "Instant", "MajorVersion" and "MinorVersion" in this schema http://pastebin.com/3hgStRi9 ? Preferably, the example should include optional and mandatory attributes
  2. The same schema has 'abstract="true" ' in the first line, I'm not sure how to express that in terms of rpclib either
  3. Please provide an example of creating an Enum. I saw enum.py in /model and I suppose that you use it like this: myEnum = Enum( ("foo", "bar"), type_name="MyCustomEnum"). Is there anything else one should keep in mind? Will this work: myEnum = Enum( (1, 3, 5), type_name="MyOtherEnum") ?
  4. Line 15 of the same schema contains this: type="mss:MeshMemberType". If I interpret this correctly, it refers to another namespace. To implement that, must I override the namespace variable in the class of the custom type with "mss:MeshMemberType"?

Thanks forward and good night :-)

edited to add numbers to make it easier to respond to individual points. --b.

@plq
Owner

Part I

  1. Yes, you have values constraint for every type. Integer(values=[1,2,3]) or Unicode(values=["a","b"]) or DateTime(values=[datetime.now()]) Also string types have a max_len constraint. You can check the class named Attributes that is defined inside every ModelBase child for possible declarative constraints.

  2. I'd imagine lxml's validation would be much faster because it's a call to C code. I didn't do any benchmarks though.

  3. Soft validation is done by python code. lxml validation won't tolerate one wrong byte, whereas soft validation is more forgiving, especially regarding namespaces. of course, lxml validation can only work on xml data. the prime numbers validation will be skipped when lxml validation is used, because it's just python code.

  4. None that I can think of. Soft validation could use a little bit more testing though. imho, lxml validation is rock solid.

Part II

  1. Columns

  2. Heh, thanks for the reference :) It seems that UKL further developed that idea in The Dispossessed.

  3. Code conforms with pep8. I can't think of anything else now.

Part III

  1. http://mail.python.org/pipermail/soap/2012-February/000736.html

  2. no abc support in rpclib yet.

  3. https://github.com/arskom/rpclib/blob/master/src/rpclib/test/model/test_enum.py

class MeshMemberType(ComplexModel):
    __namespace__ = "some_string"

Many examples can be found inside tests. They're also a valuable part of the documentation.
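
The declarative constraints mentioned in this thread (`ge`, `le`, `values`, `pattern`, `max_len`) can be pictured with a plain-Python sketch. It does not use rpclib at all: `check_integer` and `check_string` are hypothetical helpers that only mimic what the keywords describe, and matching the pattern against the whole string is an assumption borrowed from XML Schema semantics.

```python
import re


def check_integer(value, ge=None, le=None, values=None):
    """Return True if value respects the optional bounds and value set."""
    if ge is not None and value < ge:
        return False
    if le is not None and value > le:
        return False
    if values is not None and value not in values:
        return False
    return True


def check_string(value, pattern=None, max_len=None, values=None):
    """Return True if value respects the pattern, length and value set."""
    # Assumption: the pattern has to match the whole string, as in XML Schema.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    if max_len is not None and len(value) > max_len:
        return False
    if values is not None and value not in values:
        return False
    return True


print(check_integer(12, ge=1, le=12))            # True
print(check_integer(13, ge=1, le=12))            # False
print(check_string("2012", pattern="[0-9]+"))    # True
print(check_string("20x12", pattern="[0-9]+"))   # False
```

A real service would declare these constraints on the type itself so that they end up in the schema; the helpers above only show the logic being described.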

@ralienpp ralienpp - xml schema constraints
- notes about the schema not being updated in the case of advanced validation wizardry
7423988
@ralienpp

I applied the changes, here are some additional questions:

  1. For custom types derived from other types, rpclib will call the validation functions of each type in the inheritance chain. True or False?
  2. XML schema validation doesn't work for floats, is it true? I didn't see an Attribute class in the Float type declaration and I didn't see any code that would compare floats properly (i.e. within a margin of error)
  3. Creating new XML schema level constraints by adding an Attributes class - can this be done?

Thanks for the other examples, I will tinker with that. So far I haven't yet begun implementing my own system with rpclib, but I think there will be more questions as I begin working on it.

@plq

u"alpha", u"bravo", u"charlie"

no actual enforcement of this, but that's the right way to do it.

@plq
Owner

this is going great so far. a few comments:

  1. lxml validation and soft validation don't co-operate. you either use one or the other. They should behave mostly the same though. The expected differences are:

    • Soft validation ignores unknown fields, whereas lxml validation just rejects them.
    • Soft validation doesn't care about namespaces, whereas lxml validation rejects unexpected namespaces.
    • Errors thrown by soft and lxml validation differ quite a lot (even client-visible ones). Maybe this is a bug.

    So, if you're going to do imperative validation, you should use soft validation. If your validation rules can be expressed by the declarative tools rpclib offers you, you should use lxml validation, which is, as I said earlier, much more mature than rpclib's built-in validation.

    I guess the above could be added to the relevant section :)

    I think we should call these "validation subsystems" rather than "validation layers" because the term "layer" makes it sound like they actually cooperate. they do not, not currently at least.

  2. This document is a little bit too XML-centric. I understand that currently this is an area where rpclib is more popular, but non-xml protocols can't be validated using lxml's schema validation -- only soft validation will work there. I think this should be made explicit and any xml-related discussion should happen in its own section.

  3. I also made small remarks in-line in the commit page.
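
The first listed difference (unknown fields rejected versus ignored) can be illustrated with a small sketch. This is not how either subsystem is implemented; `KNOWN_FIELDS`, `validate_strict` and `validate_lenient` are hypothetical names, and a plain dict stands in for the incoming request.

```python
KNOWN_FIELDS = {"name", "month"}  # hypothetical field names


def validate_strict(payload):
    """Reject the payload if it carries any field that is not known."""
    unknown = set(payload) - KNOWN_FIELDS
    if unknown:
        raise ValueError("unknown fields: %s" % sorted(unknown))
    return dict(payload)


def validate_lenient(payload):
    """Silently drop unknown fields and keep the known ones."""
    return {k: v for k, v in payload.items() if k in KNOWN_FIELDS}


payload = {"name": "March", "month": 3, "extra": True}

print(validate_lenient(payload))
try:
    validate_strict(payload)
except ValueError as exc:
    print("rejected:", exc)
```

The strict variant corresponds to the "reject" behaviour described for lxml validation, the lenient one to the "ignore" behaviour described for soft validation.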

@plq
Owner

As for your questions:

  1. False. The overriding function must call its bases' validation. That may not be always necessary or desired.

  2. No proper float comparison either, no. I wonder how lxml deals with this when validating floats though. It'd be a nice question to ask to the libxml people, I guess.

  3. Yes, but you should modify the schema generation code as well, new fields in Attributes class are not picked up automatically
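
Answer 1 can be sketched in plain Python. The classes below are hypothetical stand-ins, not rpclib types; they only copy the `@staticmethod` plus explicit `cls` calling convention used in the tutorial, and `is_prime` is a helper written for this illustration.

```python
class BaseType(object):
    """Hypothetical stand-in for a library base type."""

    @staticmethod
    def validate_native(cls, value):
        # Base rule: the value must be present.
        return value is not None


class PositiveInt(BaseType):
    @staticmethod
    def validate_native(cls, value):
        # Nothing calls the parent's check for us, so do it explicitly.
        return BaseType.validate_native(cls, value) and value > 0


def is_prime(n):
    """Trial division; good enough for small illustrative inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


class PrimeNumber(PositiveInt):
    @staticmethod
    def validate_native(cls, value):
        # Chain to the direct parent, which in turn chains to its own parent.
        return PositiveInt.validate_native(cls, value) and is_prime(value)


print(PrimeNumber.validate_native(PrimeNumber, 7))     # True
print(PrimeNumber.validate_native(PrimeNumber, 8))     # False
print(PrimeNumber.validate_native(PrimeNumber, None))  # False
```

Leaving out the call to the parent is also legitimate when the inherited rule is not wanted, which is why the chaining is left to the author of the subclass.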

@ralienpp ralienpp - xml validation only in soap/xml
- typo fix
- adding custom Attribute
bb26041
@ralienpp

I applied the changes, here are some additional questions about this doc:

  1. Are non-unicode strings converted to unicode at the lower levels? What's the side effect of "charlie"?
  2. You can use one XOR the other, hmmmm... So, you mean that if I used lxml to check if the thing is a number, later I cannot use the 'soft' validator to see if it is prime?
  3. Do the users see some kind of a warning that says "you're using both, disable one"?
  4. If the code attempts to use both methods, which one is actually applied? (assuming that the code works and no exceptions are thrown at the time when the classes are loaded)
  5. "Errors thrown by soft and lxml validation differ quite a lot" - can you provide an example? Is this worthy of mentioning in the doc? Is the delta in readability, or in something else? Somehow I imagine errors thrown by lxml are similar to errors shown by C++ compilers that complain about templates :-)
  6. Floats... Hmm, maybe I will experiment with this some time later and reflect this in the doc. The problem is that I haven't yet written my system and haven't even set up an environment that enables me to run rpclib with a debugger. So, if you can tell me anything else about it - do so, otherwise I'll get back to this matter later, when I learn more about lxml's behaviour. In any case, comparing floats without taking another argument such as "error margin" doesn't sound right. Maybe lxml uses a default value for that? We'll see.

There are some additional questions about the other things you said, but they're not related to validation. I think it is better if I take that to the soap mailing list; otherwise those questions are off-topic in this context.

@ralienpp

7 DateTime(values=[datetime.now()]) - how do I interpret that? The only allowed value for this variable is the current system time? Does this ever happen in practice?

@plq
Owner
  1. No, no conversion is done. There's no side effect if you stay in ascii. Be prepared for a random UnicodeError if it's not.

  2. Correct. You need to use the soft validation to do non-standard validation.

  3. No. That's impossible to detect 100% due to python's dynamic nature.

  4. You can't use both because you set validator='soft' or validator='lxml' when instantiating the protocol.

  5. Not off the top of my head. They are both human readable, so unless you're trying to parse them, you're fine. And yes, lxml's errors are more, um, "verbose", though not as bad as g++'s templated class errors.

  6. I'd just leave that be for the time being.

  7. Yes this means what you think it means. i don't think this makes sense at all either, it was just an example :)
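
On the float question (point 6), a tolerance-based comparison could look like the sketch below. It uses `math.isclose` from the standard library (Python 3.5 or later); `float_in_values` is a hypothetical helper, and nothing here reflects what lxml or rpclib actually do with floats.

```python
import math


def float_in_values(value, allowed, rel_tol=1e-9, abs_tol=0.0):
    """Return True if value is close to at least one allowed value."""
    return any(
        math.isclose(value, candidate, rel_tol=rel_tol, abs_tol=abs_tol)
        for candidate in allowed
    )


total = 0.1 + 0.2
print(total == 0.3)                   # False: binary rounding error
print(float_in_values(total, [0.3]))  # True: equal within the tolerance
```

The margin of error has to be chosen per use case, which is one reason a generic `values` constraint is awkward for floats.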

@plq

it should be set to 1. making the type mandatory is not as trivial though. For example for string, you need to have String(min_occurs=1, min_len=1, nillable=False) to make it mandatory. see https://github.com/arskom/rpclib/blob/4f57b4abf284e51baca904d6649c8747ebc2804a/src/rpclib/model/primitive.py#L601

@plq

This will change for rpclib-2.8.0. You should set it to float('inf'), the native infinite value, to have arbitrary number of args.
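
A short sketch of why `float('inf')` works as an "unbounded" marker: it compares greater than any integer, so an ordinary range check needs no special case. `occurrences_ok` is a hypothetical helper, not part of rpclib.

```python
def occurrences_ok(count, min_occurs=0, max_occurs=1):
    """Return True if count lies between min_occurs and max_occurs."""
    return min_occurs <= count <= max_occurs


unbounded = float('inf')

print(occurrences_ok(10 ** 6, min_occurs=1, max_occurs=unbounded))  # True
print(occurrences_ok(0, min_occurs=1, max_occurs=unbounded))        # False
print(occurrences_ok(2))                                            # False
```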

@ralienpp ralienpp - ``unbounded`` in v2.8.0
- comparison table, soft vs lxml
- min_occurs 0->1
2dc0f05
@plq
Owner

Hi Alex,

I have no further comments. Unless there is anything else I can help you with, I'm going to merge this.

Thanks!

@ralienpp

No, just one remark: http://arskom.github.com/rpclib/reference/model.html#rpclib.model._base.ModelBase

On this page it says
{
min_occurs = 0
Set this to 0 to make the type mandatory. Can be set to any positive integer.
}

One of them must be wrong :-)

@plq plq merged commit 804e68e into from
@plq plq referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
@plq
Owner

just fixed that, and the document is merged. thank you for your time.

@plq plq referenced this pull request from a commit in plq/spyne
@plq plq update min_occurs docstring in ModelBase class. #128 0059bd7
@jhnwsk

Hi,

I am very happy after reading this, helped me solve some problems I was experiencing with my little project. Only one thing comes to mind at this point - you mention generic restrictions on primitive types with example but there is no example code for ComplexType derivatives. Maybe just a small snippet? I for one, love the 'learn by example' approach of python documentation.

class MyClass(ComplexModel):
    """
    lucky guess at complex type restrictions
    """
    __namespace__ = "partner"
    class Attributes(object):
        min_occurs = 0
        max_occurs = 1
        default = None
        nillable = False
@plq
Owner

the attributes class should inherit from its parent. here's the correct version:

class MyClass(ComplexModel):
    """lucky guess at complex type restrictions"""

    __namespace__ = "partner"
    __type_name__ = "RealClassName"

    class Attributes(ComplexModel.Attributes):
        min_occurs = 0
        max_occurs = 1
        default = None
        nillable = False

if you think it's going to be helpful, send a small pull request and i'll make sure it gets decent attention :)
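
Why the nested class should inherit from its parent can be shown without rpclib. In this sketch `Base`, `Shadowing` and `Inheriting` are hypothetical classes; the point is plain Python attribute lookup: a nested `Attributes(object)` replaces the parent's class wholesale, so any default it does not redefine disappears.

```python
class Base(object):
    class Attributes(object):
        min_occurs = 0
        max_occurs = 1
        nillable = True


class Shadowing(Base):
    # Starts from scratch: the defaults declared in Base.Attributes are lost.
    class Attributes(object):
        nillable = False


class Inheriting(Base):
    # Overrides one value and keeps every other default from the parent.
    class Attributes(Base.Attributes):
        nillable = False


print(hasattr(Shadowing.Attributes, "min_occurs"))  # False
print(Inheriting.Attributes.min_occurs)             # 0
print(Inheriting.Attributes.nillable)               # False
```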

@ralienpp

Hi,

I am planning to write more documentation about this, and I had the same thing in mind - just add a bunch of examples at the bottom, with comments embedded into them.

I will do so in the next revision, but at the moment I am focused on documenting another part of rpclib - the creation of custom types.

Commits on Mar 19, 2012
  1. @ralienpp
  2. @ralienpp
  3. @ralienpp
Commits on Mar 20, 2012
  1. @ralienpp
    - xml schema constraints
    - notes about the schema not being updated in the case of advanced validation wizardry
  2. @ralienpp
    - xml validation only in soap/xml
    - typo fix
    - adding custom Attribute
  3. @ralienpp
Commits on Mar 21, 2012
  1. @ralienpp
    - ``unbounded`` in v2.8.0
    - comparison table, soft vs lxml
    - min_occurs 0->1
Showing with 301 additions and 2 deletions.
  1. +301 −2 doc/source/manual/validation.rst
303 doc/source/manual/validation.rst
@@ -1,7 +1,306 @@
-
.. _manual-validation:
Input Validation
================
+This is necessary in the cases in which you have to ensure that the received
+data comply with a given format, such as:
+
+- a number that must be within a certain range
+- a string that must contain a specific character
+- a string that can only take certain values
+
+
+Data validation can be handled by two subsystems:
+
+XML schema
+    such rules are enforced by *lxml*, the underlying XML parsing library
+"Soft" level
+    *rpclib* itself applies the checks in Python code, independently of
+    *lxml*
+
+The differences between them are:
+
+- Soft validation ignores unknown fields, while *lxml* validation rejects
+ them.
+- Soft validation doesn't care about namespaces, while *lxml* validation
+ rejects unexpected namespaces.
+- Soft validation works with any transport protocol supported by *rpclib*,
+ while *lxml* validation only works for XML data (i.e. just SOAP/XML).
+
+==============================  ========  =========
+Criteria                        lxml      soft
+==============================  ========  =========
+Unknown fields                  reject    ignore
+Unknown namespaces              reject    ignore
+Supported transport protocols   SOAP/XML  any
+==============================  ========  =========
+
+
+
+
+.. NOTE::
+    The two validation subsystems operate independently: you can use either
+    one, but not both at the same time. The validator is selected when
+    instantiating the protocol: ``validator='soft'`` or ``validator='lxml'``.
+
+    ::
+
+        # using 'soft' validation with HttpRpc
+        application = Application([NameOfMonthService],
+            tns='rpclib.examples.multiprot',
+            in_protocol=HttpRpc(validator='soft'),
+            out_protocol=HttpRpc()
+        )
+
+        # using lxml validation with Soap
+        application = Application([UserService],
+            tns='rpclib.examples.authentication',
+            interface=Wsdl11(),
+            in_protocol=Soap11(validator='lxml'),
+            out_protocol=Soap11()
+        )
+
+
+
+
+Simple validation at the XML schema level
+-----------------------------------------
+This applies to all the primitive data types, and is suitable for simple logical
+conditions.
+
+.. NOTE::
+ Constraints applied at this level are reflected in the XML schema itself,
+ thus a client that retrieves the WSDL of the service will be able to see
+ what the constraints are.
+ As it was mentioned in the introduction, such validation is only effective
+ in the context of SOAP/XML.
+
+
+Any primitive type
+~~~~~~~~~~~~~~~~~~
+Certain generic restrictions can be applied to any type. They are listed below,
+along with their default values
+
+- ``default = None`` - default value if the input is ``None``
+- ``nillable = True`` - if True, the item is optional
+- ``min_occurs = 0`` - set this to 1 to make the type mandatory. Can be set to
+ any positive integer
+- ``max_occurs = 1`` - can be set to any strictly positive integer. Values
+ greater than 1 will imply an iterable of objects as native Python type. Can be
+ set to ``unbounded`` for arbitrary number of arguments
+
+ .. NOTE::
+ As of rpclib-2.8.0, use ``float('inf')`` instead of ``unbounded``.
+
+These rules can be combined. The example below illustrates how to create a
+mandatory string: ::
+
+    String(min_occurs=1, min_len=1, nillable=False)
+
+
+Numbers
+~~~~~~~
+Integers and other countable numerical data types (i.e. except Float or
+Double) can be compared with specific values, using the following keywords:
+``ge``, ``gt``, ``le``, ``lt`` (they correspond to >=, >, <=, <) ::
+
+ Integer(ge=1, le=12) #an integer between 1 and 12, i.e. 1 <= x <= 12
+ Integer(gt=1, le=42) #1 < x <= 42
+
+
+Strings
+~~~~~~~
+These can be validated against a regular expression: ::
+
+ String(pattern = "[0-9]+") #must contain at least one digit, digits only
+
+
+Length checks can be enforced as well: ::
+
+ String(min_len = 5, max_len = 10)
+ String(max_len = 10) #implicit value for min_len = 0
+
+
+Other string-related constraints are related to encoding issues. You can specify
+
+- which encoding the strings must be in
+- how to handle the situations in which a string cannot be decoded properly (to
+ understand how this works, consult `Python's documentation
+ <http://docs.python.org/howto/unicode.html>`_) ::
+
+ String(encoding = 'win-1251')
+ String(unicode_errors = 'strict') #could be 'replace' or 'ignore'
+
+
+These restrictions can be combined: ::
+
+ String(encoding = 'win-1251', max_len = 20)
+ String(min_len = 5, max_len = 20, pattern = '[a-z]+')
+
+
+Possible values
+~~~~~~~~~~~~~~~
+Sometimes you may want to allow only a certain set of values, which would be
+difficult to describe in terms of an interval. If this is the case, you can
+explicitly indicate the set: ::
+
+ Integer(values = [1984, 13, 45, 42])
+ Unicode(values = [u"alpha", u"bravo", u"charlie"]) #note the 'u' prefix
+
+
+
+Extending the rules of XML validation
+-------------------------------------
+It is possible to add your own attributes to the XML schema and enforce them.
+
+
+To do so, create an ``Attributes`` in the definition of your custom type derived
+from ``ModelBase``.
+
+
+After that, you must apply the relevant changes in the code that generates the
+XML schema, otherwise these attributes will **not** be visible in the output.
+
+Examples of how to do that:
+https://github.com/arskom/rpclib/tree/master/src/rpclib/interface/xml_schema/model
+
+
+
+
+
+Advanced validation
+-------------------
+*rpclib* offers several primitives for this purpose, they are defined in
+the **ModelBase** class, from which all the types are derived:
+https://github.com/arskom/rpclib/blob/master/src/rpclib/model/_base.py
+
+These primitives are:
+
+- *validate_string* - invoked when the variable is extracted from the input XML
+ data.
+- *validate_native* - invoked after the string is converted to a specific Python
+ value.
+
+Since XML is a text file, when you read it - you get a string; thus
+*validate_string* is the first filter that can be applied to such data.
+
+At a later stage, the data can be converted to something else, for example - a
+number. Once that conversion occurs, you can apply some additional checks - this
+is handled by *validate_native*.
+
+ >>> stringNumber = '123'
+ >>> stringNumber
+ '123' #note the quotes, it is a string
+ >>> number = int(stringNumber)
+ >>> number
+ 123 #notice the absence of quotes, it is a number
+ >>> stringNumber == 123
+ False #not quite what one would expect, right?
+ >>> number == 123
+ True
+
+In the example above, *number* is an actual number and can be validated with
+*validate_native*, whereas *stringNumber* is a string and can be validated by
+*validate_string*.
+
+
+Another case in which you need a native validation would be a sanity check on a
+date. Imagine that you have to verify if a received date complies with the
+*"YYYY-MM-DDThh:mm:ss"* pattern (which is *xs:datetime*). You can devise a
+regular expression that will look for 4 digits (YYYY), followed by a dash, then
+by 2 more digits for the month, etc. But such a regexp will happily absorb dates
+that have "13" as a month number, even though that doesn't make sense. You can
+make a more complex regexp to deal with that, but it will be very hard to
+maintain and debug. The best approach is to convert the string into a datetime
+object and then perform all the checks you want.
+
+
+
+A practical example
+~~~~~~~~~~~~~~~~~~~
+A custom string type that cannot contain the colon symbol ':'.
+
+We'll have to declare our own class, derived from *Unicode* (which, in turn, is
+derived from *SimpleModel*, which inherits from *ModelBase*)::
+
+
+    class SpecialString(Unicode):
+        """Custom string type that prohibits the use of colons"""
+
+        @staticmethod
+        def validate_string(cls, value):
+            """Override the function to enforce our own verification logic"""
+            # reject any value that contains a colon, accept everything else
+            if value and ':' in value:
+                return False
+            return True
+
+
+
+A slightly more complicated example
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A custom numerical type that verifies if the number is prime.
+
+This time both flavours of validation are combined: *validate_string* to see if
+it is a number, and then *validate_native* to see if it is prime.
+
+.. NOTE::
+ *rpclib* has a primitive type called *Integer*, it is reasonable to use that
+ one as a basis for this custom type. *Unicode* is used in this example
+ simply because it is an opportunity to show both types of validation
+ functions in action. This may be a good academic example, but it is
+ certainly not the approach one would use in production code.
+
+
+::
+
+    class PrimeNumber(Unicode):
+        """Custom integer type that only works with prime numbers"""
+
+        @staticmethod
+        def validate_string(cls, value):
+            """See if it is a number"""
+            import re
+
+            # anchor the pattern so that strings such as "12ab" are rejected
+            return re.match("^[0-9]+$", value) is not None
+
+        @staticmethod
+        def validate_native(cls, value):
+            """See if it is prime"""
+
+            # calling a hypothetical function that checks if it is prime
+            return IsPrime(value)
+
+
+.. NOTE::
+ Constraints applied at this level do **not modify** the XML schema itself,
+ thus a client that retrieves the WSDL of the service will not be aware of
+ these restrictions. Keep this in mind and make sure that validation rules
+ that are not visible in the XML schema are documented elsewhere.
+
+.. NOTE::
+ When overriding ``validate_string`` or ``validate_native`` in a custom type
+ class, the validation functions from the parent class are **not invoked**.
+ If you wish to apply those validation functions as well, you must call them
+ explicitly.
+
+
+
+Summary
+=======
+- simple checks can be applied at the XML schema level, you can control:
+
+ - the length of a string
+ - the pattern with which a string must comply
+ - a numeric interval, etc
+
+- *rpclib* can apply arbitrary rules for the validation of input data
-TODO
+ - *validate_string* is the first applied filter
+ - *validate_native* is applied in the second phase
+ - Override these functions in your derived class to add new validation rules
+ - The validation functions must return a *boolean* value
+ - These rules are **not** shown in the XML schema