
Validation tutorial #128

Merged
merged 7 commits into from Mar 21, 2012

3 participants

@ralienpp

Some basic examples, nothing major - but they should help one get started.

@plq
arskom member
plq commented Mar 19, 2012

hi alex,

first, thanks a lot for your efforts. here are my remarks:

1) you're talking about the advanced validation methods without talking about the simpler ones. E.g. Integer(ge=1, le=12) will check for integers between 1 and 12 inclusive. or String(pattern='some_regex') will validate the incoming string using regexps. actually, the validation can be done using lxml's schema validator if you use these methods. if you override validation functions, only validator='soft' can check them.

2) validating by overriding the validation functions doesn't change the schema itself, so using them is not really encouraged.

3) I always use the "rhetorical we" in the documentation. We should rather be consistent about that one.

4) I always wrap documentation and code by 80 columns. You can see that it's too hard to read even using the web interface.
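
A rough sketch of what the declarative constraints in point 1 express, written in plain Python so that it runs without rpclib. The helper names `check_integer` and `check_string` are made up for illustration and are not rpclib API; only the meaning of `ge`, `le` and `pattern` is taken from the comment above. Since XML Schema patterns have to match the whole value, the sketch uses `re.fullmatch`.

```python
import re


def check_integer(value, ge=None, le=None):
    """Return True if value is an int within the inclusive bounds [ge, le]."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    if ge is not None and value < ge:
        return False
    if le is not None and value > le:
        return False
    return True


def check_string(value, pattern=None):
    """Return True if value is a str and the whole value matches pattern."""
    if not isinstance(value, str):
        return False
    if pattern is None:
        return True
    return re.fullmatch(pattern, value) is not None


# Roughly what Integer(ge=1, le=12) expresses: a month number.
print(check_integer(12, ge=1, le=12))           # True
print(check_integer(13, ge=1, le=12))           # False
# Roughly what String(pattern='[a-z]+') expresses.
print(check_string("abc", pattern="[a-z]+"))    # True
print(check_string("abc1", pattern="[a-z]+"))   # False
```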

Again, thanks for your time.

@ralienpp

Hi, thanks for the comments.

I was aware of the regexp pattern for the validation of strings, but I didn't know about the approach you used with Integers. I'll add these examples.

Some questions:

  1. are there other tricks besides the regexp pattern matching for strings and the comparison operators for numbers?
  2. are there any performance considerations one should be aware of? Is the underlying lxml validation method faster? If so, is it simply because it is "underneath" and there are no abstraction layers above it? Is it because lxml wraps C code? Anything else?
  3. are there any good reasons to use the "soft" validation methods instead of the simple ones? I think the example with prime numbers is fine, but I'm not sure about the other one. I chose to write it like that because "if ':' in value" is a simple check that doesn't involve regular expressions (which are most likely going to be slower). Is that the case?
  4. validation by overriding the functions doesn't change the schema itself, so people who use this SOAP service won't be aware of the actual constraints that are applied to the data. Are there other arguments against "soft" validation?
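
Question 3 can be checked empirically rather than guessed at. The following is a minimal timing sketch using only the standard library; the function names and the sample string are hypothetical, and the absolute numbers depend on the machine and Python version. It says nothing about lxml's schema validator, which would need its own measurement.

```python
import re
import timeit

COLON = re.compile(":")


def has_colon_in(value):
    """Substring test, as in `if ':' in value`."""
    return ":" in value


def has_colon_re(value):
    """The same test expressed with a precompiled regular expression."""
    return COLON.search(value) is not None


sample = "urn:example:value"

# Both variants must agree before comparing their speed makes sense.
assert has_colon_in(sample) == has_colon_re(sample)

t_in = timeit.timeit(lambda: has_colon_in(sample), number=100000)
t_re = timeit.timeit(lambda: has_colon_re(sample), number=100000)
print("in: %.4fs  re: %.4fs" % (t_in, t_re))
```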

Style remarks:

  1. wrap at 80 lines
  2. no "I"; this reminds me of http://en.wikipedia.org/wiki/A_Time_of_Changes :-)
  3. is there anything else? I think I will create another section in the manual about this; so others can be synchronized

Questions that are related to other aspects I want to document:

  1. How to declare attributes? Can you provide a simple definition of a class that declares attributes such as "Instant", "MajorVersion" and "MinorVersion" in this schema http://pastebin.com/3hgStRi9 ? Preferably, the example should include optional and mandatory attributes
  2. The same schema has 'abstract="true" ' in the first line, I'm not sure how to express that in terms of rpclib either
  3. Please provide an example of creating an Enum. I saw enum.py in /model and I suppose that you use it like this: myEnum = Enum( ("foo", "bar"), type_name="MyCustomEnum"). Is there anything else one should keep in mind? Will this work: myEnum = Enum( (1, 3, 5), type_name="MyOtherEnum") ?
  4. Line 15 of the same schema contains this: type="mss:MeshMemberType". If I interpret this correctly, it refers to another namespace. To implement that, must I override the namespace variable in the class of the custom type with "mss:MeshMemberType"?

Thanks in advance and good night :-)

edited to add numbers to make it easier to respond to individual points. --b.

@plq
arskom member
plq commented Mar 20, 2012

Part I

  1. Yes, there is a values constraint for every type: Integer(values=[1,2,3]), Unicode(values=["a","b"]) or DateTime(values=[datetime.now()]). String types also have a max_len constraint. You can check the class named Attributes that is defined inside every ModelBase child for possible declarative constraints.

  2. I'd imagine lxml's validation would be much faster because it's a call to C code. I didn't do any benchmarks though.

  3. Soft validation is done by python code. lxml validation won't tolerate one wrong byte, whereas soft validation is more forgiving, especially regarding namespaces. of course, lxml validation can only work on xml data. the prime numbers validation will be skipped when lxml validation is used, because it's just python code.

  4. None that I can think of. Soft validation could use a little bit more testing though. imho, lxml validation is rock solid.
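
To illustrate why declarative constraints can show up in the schema while a prime-number check cannot, here is a sketch that turns constraint keywords into an XML Schema restriction using only the standard library. The facet names (`minInclusive`, `maxInclusive`, `pattern`, `maxLength`, `enumeration`) come from XML Schema; the `FACETS` mapping and the `restriction` helper are assumptions made for this example and are not rpclib's schema generation code.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Declarative constraint name -> XML Schema facet name (assumed mapping).
FACETS = {
    "ge": "minInclusive",
    "le": "maxInclusive",
    "pattern": "pattern",
    "max_len": "maxLength",
}


def restriction(base, **constraints):
    """Build an xs:restriction element from declarative constraints."""
    node = ET.Element("{%s}restriction" % XS, base="xs:" + base)
    for name, value in constraints.items():
        if name == "values":
            # An allowed-values list becomes one enumeration facet per value.
            for v in value:
                ET.SubElement(node, "{%s}enumeration" % XS, value=str(v))
        else:
            ET.SubElement(node, "{%s}%s" % (XS, FACETS[name]), value=str(value))
    return node


def is_prime(n):
    """Imperative check: plain Python, with no facet to map it to."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


ET.register_namespace("xs", XS)
month = restriction("integer", ge=1, le=12)
print(ET.tostring(month, encoding="unicode"))
```

Nothing in `FACETS` could describe `is_prime`, which is why such a rule stays invisible to clients reading the schema.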

Part II

  1. Columns

  2. Heh, thanks for the reference :) It seems that UKL further developed that idea in The Dispossessed.

  3. Code conforms with pep8. I can't think of anything else now.

Part III

  1. http://mail.python.org/pipermail/soap/2012-February/000736.html

  2. no abc support in rpclib yet.

  3. https://github.com/arskom/rpclib/blob/master/src/rpclib/test/model/test_enum.py

  4. Set __namespace__ on the class:

class MeshMemberType(ComplexModel):
    __namespace__ = "some_string"

Many examples can be found inside tests. They're also a valuable part of the documentation.

@ralienpp ralienpp - xml schema constraints
- notes about the schema not being updated in the case of advanced validation wizardry
7423988
@ralienpp

I applied the changes, here are some additional questions:

  1. For custom types derived from other types, rpclib will call the validation functions of each type in the inheritance chain. True or False?
  2. XML schema validation doesn't work for floats, is it true? I didn't see an Attribute class in the Float type declaration and I didn't see any code that would compare floats properly (i.e. within a margin of error)
  3. Creating new XML schema level constraints by adding an Attributes class - can this be done?

Thanks for the other examples, I will tinker with that. So far I haven't yet begun implementing my own system with rpclib, but I think there will be more questions as I begin working on it.

@plq

typo

@plq

u"alpha", u"bravo", u"charlie"

no actual enforcement of this, but that's the right way to do it.

@plq
arskom member
plq commented Mar 20, 2012

this is going great so far. a few comments:

  1. lxml validation and soft validation don't co-operate. you either use one or the other. They should behave mostly the same though. The expected differences are:

    • Soft validation ignores unknown fields, whereas lxml validation just rejects them.
    • Soft validation doesn't care about namespaces, whereas lxml validation rejects unexpected namespaces.
    • Errors thrown by soft and lxml validation differ quite a lot (even client-visible ones). Maybe this is a bug.

    So, if you're going to do imperative validation, you should use soft validation. If your validation rules can be expressed by the declarative tools rpclib offers you, you should use lxml validation, which is, as I said earlier, much more mature than rpclib's built-in validation.

    I guess the above could be added to the relevant section :)

    I think we should call these "validation subsystems" rather than "validation layers" because the term "layer" makes it sound like they actually cooperate. they do not, not currently at least.

  2. This document is a little bit too XML-centric. I understand that currently this is an area where rpclib is more popular, but non-xml protocols can't be validated using lxml's schema validation -- only soft validation will work there. I think this should be made explicit and any xml-related discussion should happen in its own section.

  3. I also made small remarks in-line in the commit page.
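
The first difference listed in point 1 (unknown fields ignored versus rejected) can be pictured with a toy example. This is only an illustration of the two behaviours on a plain dict, with hypothetical names throughout; it is not how either validation subsystem is implemented.

```python
EXPECTED_FIELDS = {"name": str, "age": int}


def validate_lenient(doc):
    """Check known fields only and ignore anything unexpected."""
    return all(
        isinstance(doc[key], kind)
        for key, kind in EXPECTED_FIELDS.items()
        if key in doc
    )


def validate_strict(doc):
    """Same checks, but reject documents that carry unknown fields."""
    unknown = set(doc) - set(EXPECTED_FIELDS)
    return not unknown and validate_lenient(doc)


doc = {"name": "alex", "age": 30, "nickname": "al"}
print(validate_lenient(doc))  # True: "nickname" is ignored
print(validate_strict(doc))   # False: "nickname" is rejected
```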

@plq
arskom member
plq commented Mar 20, 2012

As for your questions:

  1. False. The overriding function must call its bases' validation. That may not always be necessary or desired.

  2. No proper float comparison either, no. I wonder how lxml deals with this when validating floats though. It'd be a nice question to ask the libxml people, I guess.

  3. Yes, but you should modify the schema generation code as well, new fields in Attributes class are not picked up automatically
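
Answer 1 above is ordinary Python method overriding: a parent's check runs only if the child calls it. The sketch below uses stand-in classes rather than rpclib types, and the method name `validate_native` is used here as an assumption about what gets overridden.

```python
class Base(object):
    """Stand-in for a model base class; accepts everything."""

    @classmethod
    def validate_native(cls, value):
        return True


class PositiveInteger(Base):
    @classmethod
    def validate_native(cls, value):
        # The parent's check runs only because it is called explicitly.
        return super().validate_native(value) and value > 0


class PositiveEven(PositiveInteger):
    @classmethod
    def validate_native(cls, value):
        # Chains to PositiveInteger, so "value > 0" still applies.
        return super().validate_native(value) and value % 2 == 0


class BrokenEven(PositiveInteger):
    @classmethod
    def validate_native(cls, value):
        # No call to the base class: the "value > 0" rule is lost.
        return value % 2 == 0


print(PositiveEven.validate_native(-2))  # False
print(BrokenEven.validate_native(-2))    # True
```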

@ralienpp ralienpp - xml validation only in soap/xml
- typo fix
- adding custom Attribute
bb26041
@ralienpp

I applied the changes, here are some additional questions about this doc:

  1. Are non-unicode strings converted to unicode at the lower levels? What's the side effect of "charlie"?
  2. You can use one XOR the other, hmmmm... So, you mean that if I used lxml to check if the thing is a number, later I cannot use the 'soft' validator to see if it is prime?
  3. Do the users see some kind of a warning that says "you're using both, disable one"?
  4. If the code attempts to use both methods, which one is actually applied? (assuming that the code works and no exceptions are thrown at the time when the classes are loaded)
  5. "Errors thrown by soft and lxml validation differ quite a lot" - can you provide an example? Is this worthy of mentioning in the doc? Is the delta in readability, or in something else? Somehow I imagine errors thrown by lxml are similar to errors shown by C++ compilers that complain about templates :-)
  6. Floats... Hmm, maybe I will experiment with this some time later and reflect this in the doc. The problem is that I haven't yet written my system and haven't even set up an environment that enables me to run rpclib with a debugger. So, if you can tell me anything else about it - do so, otherwise I'll get back to this matter later, when I learn more about lxml's behaviour. In any case, comparing floats without taking another argument such as "error margin" doesn't sound right. Maybe lxml uses a default value for that? We'll see.
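
The concern in point 6 is easy to reproduce in plain Python. The sketch below uses `math.isclose` from the standard library; `float_in_range` and its `abs_tol` default are hypothetical and only show what a range check with an error margin could look like. It makes no claim about what lxml or rpclib actually do.

```python
import math

a = 0.1 + 0.2
print(a == 0.3)               # False
print(math.isclose(a, 0.3))   # True, default rel_tol is 1e-09


def float_in_range(value, low, high, abs_tol=1e-9):
    """Inclusive range check that tolerates rounding error at the bounds."""
    return low - abs_tol <= value <= high + abs_tol


print(a <= 0.3)                       # False
print(float_in_range(a, 0.0, 0.3))    # True
```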

There are some additional questions about the other things you said, but they're not related to validation. I think it is better if I take that to the soap mailing list; otherwise those questions are off-topic in this context.

@ralienpp

7. DateTime(values=[datetime.now()]) - how do I interpret that? The only allowed value for this variable is the current system time? Does this ever happen in practice?

@plq
arskom member
plq commented Mar 20, 2012
  1. No, no conversion is done. There's no side effect if you stay in ascii. Be prepared for a random UnicodeError if it's not.

  2. Correct. You need to use the soft validation to do non-standard validation.

  3. No. That's impossible to detect 100% due to python's dynamic nature.

  4. You can't use both because you set validator='soft' or validator='lxml' when instantiating the protocol.

  5. Not off the top of my head. They are both human readable, so unless you're trying to parse them, you're fine. And yes, lxml's errors are more, um, "verbose", though not as bad as g++'s templated class errors.

  6. I'd just leave that be for the time being.

  7. Yes this means what you think it means. i don't think this makes sense at all either, it was just an example :)
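
The UnicodeError mentioned in answer 1 comes from Python 2 implicitly decoding byte strings with the ASCII codec when they are mixed with unicode strings. Python 3 no longer does this implicitly, so the sketch below performs the same decode explicitly; the function name is hypothetical.

```python
def implicit_py2_decode(raw):
    """Mimic Python 2's implicit str -> unicode coercion (ASCII codec)."""
    return raw.decode("ascii")


# Pure ASCII bytes decode without trouble.
print(implicit_py2_decode(b"charlie"))

# Anything outside ASCII fails, and only when such data actually shows up.
try:
    implicit_py2_decode("caf\u00e9".encode("utf-8"))
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```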

@plq

it should be set to 1. making the type mandatory is not as trivial though. For example, for a string you need String(min_occurs=1, min_len=1, nillable=False) to make it mandatory. see https://github.com/arskom/rpclib/blob/4f57b4abf284e51baca904d6649c8747ebc2804a/src/rpclib/model/primitive.py#L601
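
Why all three settings are needed becomes clearer when each one is matched to the case it rules out: absent, nil, and empty. The function below is a plain-Python illustration with hypothetical names, where `None` stands for a nil element; it is not rpclib code.

```python
def is_mandatory_string_ok(occurrences, min_occurs=1, min_len=1, nillable=False):
    """Check the occurrences of one string field found in a message.

    occurrences: list of values, with None standing for a nil element.
    """
    if len(occurrences) < min_occurs:
        return False            # element missing altogether
    for value in occurrences:
        if value is None:
            if not nillable:
                return False    # present but nil
        elif len(value) < min_len:
            return False        # present but too short, e.g. empty
    return True


print(is_mandatory_string_ok([]))        # False: absent
print(is_mandatory_string_ok([None]))    # False: nil
print(is_mandatory_string_ok([""]))      # False: empty
print(is_mandatory_string_ok(["x"]))     # True
```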

@plq

This will change for rpclib-2.8.0. You should set it to float('inf'), the native infinite value, to have arbitrary number of args.
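
A short sketch of why `float('inf')` works well as an "unbounded" marker: it compares greater than any integer, so an ordinary range check needs no special case. The helper names are hypothetical; the only outside fact used is that XML Schema spells this as `maxOccurs="unbounded"`.

```python
UNBOUNDED = float("inf")


def occurs_ok(count, min_occurs=0, max_occurs=1):
    """True if count lies within [min_occurs, max_occurs]."""
    return min_occurs <= count <= max_occurs


def max_occurs_attr(max_occurs):
    """Render max_occurs as the text of an XML Schema maxOccurs attribute."""
    return "unbounded" if max_occurs == UNBOUNDED else str(max_occurs)


print(occurs_ok(10 ** 6, max_occurs=UNBOUNDED))  # True
print(occurs_ok(2, max_occurs=1))                # False
print(max_occurs_attr(UNBOUNDED))                # unbounded
print(max_occurs_attr(3))                        # 3
```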

@ralienpp ralienpp - ``unbounded`` in v2.8.0
- comparison table, soft vs lxml
- min_occurs 0->1
2dc0f05
@plq
arskom member
plq commented Mar 21, 2012

Hi Alex,

I have no further comments. Unless there is anything else I can help you with, I'm going to merge this.

Thanks!

@ralienpp

No, just one remark: http://arskom.github.com/rpclib/reference/model.html#rpclib.model._base.ModelBase

On this page it says
{
min_occurs = 0
Set this to 0 to make the type mandatory. Can be set to any positive integer.
}

One of them must be wrong :-)

@plq plq merged commit 804e68e into arskom:master Mar 21, 2012
@plq
arskom member
plq commented Mar 21, 2012

just fixed that, and the document is merged. thank you for your time.

@plq plq added a commit to plq/spyne that referenced this pull request Mar 21, 2012
@plq plq update min_occurs docstring in ModelBase class. #128 0059bd7
@jhnwsk
jhnwsk commented Mar 27, 2012

Hi,

I am very happy after reading this, helped me solve some problems I was experiencing with my little project. Only one thing comes to mind at this point - you mention generic restrictions on primitive types with example but there is no example code for ComplexType derivatives. Maybe just a small snippet? I for one, love the 'learn by example' approach of python documentation.

class MyClass(ComplexModel):
    """
    lucky guess at complex type restrictions
    """
    __namespace__ = "partner"
    class Attributes(object):
        min_occurs = 0
        max_occurs = 1
        default = None
        nillable = False
@plq
arskom member
plq commented Mar 27, 2012

the attributes class should inherit from its parent. here's the correct version:

class MyClass(ComplexModel):
    """lucky guess at complex type restrictions"""

    __namespace__ = "partner"
    __type_name__ = "RealClassName"

    class Attributes(ComplexModel.Attributes):
        min_occurs = 0
        max_occurs = 1
        default = None
        nillable = False

if you think it's going to be helpful, send a small pull request and i'll make sure it gets decent attention :)
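
The reason the nested Attributes class should inherit from its parent can be shown with stand-in classes: without inheritance, every default not restated in the child is simply gone. `Model` below is a hypothetical base class with made-up defaults, not rpclib's ComplexModel.

```python
class Model(object):
    """Stand-in base class with some default attributes."""

    class Attributes(object):
        min_occurs = 0
        max_occurs = 1
        nillable = True


class Detached(Model):
    class Attributes(object):              # does not inherit
        nillable = False


class Inherited(Model):
    class Attributes(Model.Attributes):    # inherits the defaults
        nillable = False


print(hasattr(Detached.Attributes, "max_occurs"))   # False
print(Inherited.Attributes.max_occurs)              # 1
print(Inherited.Attributes.nillable)                # False
```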

@ralienpp

Hi,

I am planning to write more documentation about this, and I had the same thing in mind - just add a bunch of examples at the bottom, with comments embedded into them.

I will do so in the next revision, but at the moment I am focused on documenting another part of rpclib - the creation of custom types.
