Support Python 2 basestring for schema fields #92
On Python 2, passing a … This has been …
I would suggest that's a problem in the way the data is being read, then. The supplied

```python
>>> from nti.externalization.representation import JsonRepresenter
>>> rep = JsonRepresenter()
>>> rep.load(b'1')
1
>>> rep.load(b'"a string"')
u'a string'
```
I think we are running into this "optimization" in lxml (on Python 2, lxml returns byte strings rather than unicode for text that is pure ASCII).
I'm not sure what to do about it yet, but I think that's why we are getting byte strings back even though everything else suggests we should get unicode strings (unicode string input, with the appropriate encoding declared in the XML document).
If so, I would expect the consumer of the lxml parse results to convert values that are intended to be text into text as needed. That can be done in an XML-schema-agnostic way if you can assume that all PCDATA (element content) is text (doesn't it have to be?). (I don't think tag names would matter for this, but if they happen to specify field names, that's fine; field names are supposed to be native strings anyway.)
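As a minimal sketch of that suggestion, assuming the parsed data arrives as nested dicts and lists and that every byte string in it is UTF-8-encoded element content, the consumer could decode byte strings to text before handing the data to the schema. The helper name `ensure_text` is hypothetical; it is not part of lxml or nti.externalization.

```python
def ensure_text(data, encoding="utf-8"):
    """Recursively decode byte strings in parsed data to text.

    Assumes every byte string is element content (PCDATA) encoded with
    ``encoding``. Dict keys (field names) are left untouched, since field
    names are expected to be native strings anyway.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    if isinstance(data, dict):
        return {key: ensure_text(value, encoding) for key, value in data.items()}
    if isinstance(data, list):
        return [ensure_text(item, encoding) for item in data]
    # Text, numbers, booleans and None are returned unchanged.
    return data


parsed = {"count": b"1", "flags": [b"true", "false"], "ratio": 0.5}
print(ensure_text(parsed))
# {'count': '1', 'flags': ['true', 'false'], 'ratio': 0.5}
```

Because the conversion only looks at value types, it needs no knowledge of the XML schema, which is the property the comment above relies on.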
Yes, I agree. Unfortunately, we're wrapped around a third-party library that does the parsing/consumption, so it's going to take some workarounds on our side. I think it is doable, though, and it does seem like the proper place to do it. I think this can probably be closed.
I can propose an
I'd be in favor of that if we agree it's a reasonable idea, assuming they would be open to it.
I was thinking about that some more, and I've come to the conclusion that it's actually fairly problematic. The reason is that on Python 3 (which, according to the most recent TIOBE rankings, is now the version used by the majority), native strings and unicode are the same thing. An object would have to implement both interfaces and methods, which is redundant and ugly:

```python
import sys

from zope import interface
from zope.interface import Interface
from zope.schema.interfaces import IFromUnicode

PY3 = sys.version_info[0] >= 3


class IFromNativeString(Interface):
    def fromNativeString(value):
        """Convert a native ``str`` into the field's value."""


@interface.implementer(IFromUnicode, IFromNativeString)
class Python3StrIsUnicode(object):
    def fromUnicode(self, value):
        pass  # existing code

    if PY3:
        # On Python 3, native str *is* unicode, so this is just an alias.
        fromNativeString = fromUnicode
    else:
        def fromNativeString(self, value):
            pass  # Python 2: do something with a byte string
```

I thought of two alternatives. One would be a

```python
from zope.interface import implementer
from zope.schema import Field
from zope.schema.interfaces import IFromUnicode


# IFromByteString is the proposed interface (not defined here).
@implementer(IFromUnicode, IFromByteString)
class Number(Field):
    def fromByteString(self, bytestring):
        # Decode (assumed UTF-8) and reuse the existing text parser.
        return self.fromUnicode(bytestring.decode('utf-8'))
```

That's more general, and for things like

The other alternative is to ... actually, I can't remember right now. I've been pulled away several times while writing this up, and I don't remember what I was thinking before.
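The byte-string alternative can be illustrated in plain Python without zope. This is a sketch under stated assumptions: `IntField`, `from_unicode` and `from_bytes` are hypothetical stand-ins, not zope.schema's API, and input bytes are assumed to be UTF-8. The point it shows is that the byte entry point only decodes and delegates, so on Python 3 there is no duplicated "native string" method.

```python
class IntField(object):
    """Hypothetical field that parses integers from text or bytes."""

    def from_unicode(self, value):
        # The existing text-based parser: only text is accepted here.
        if not isinstance(value, str):
            raise TypeError("expected text, got %r" % type(value).__name__)
        return int(value.strip())

    def from_bytes(self, value):
        # Byte strings are decoded (assumed UTF-8) and handed to the
        # text parser, so the parsing logic lives in one place.
        return self.from_unicode(value.decode("utf-8"))


field = IntField()
print(field.from_unicode("42"), field.from_bytes(b" 7 "))  # 42 7
```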
#93 ought to resolve this if you want to take a look (cf. OpenNTI/nti.schema#34).
Add support for IFromBytes. Fixes #92
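A rough sketch of how calling code might route raw input once fields can advertise byte support. It uses `hasattr` duck typing in place of an interface check such as `IFromBytes.providedBy(field)`; `BoolField`, `TextOnlyField` and `convert` are hypothetical names, and decoding bytes for fields without byte support is a design choice of this sketch, not something #93 is known to do.

```python
class BoolField(object):
    """Hypothetical boolean field supporting text and byte input."""

    def from_unicode(self, value):
        return value.strip().lower() in ("1", "true", "yes")

    def from_bytes(self, value):
        return self.from_unicode(value.decode("utf-8"))


class TextOnlyField(object):
    """Hypothetical field that only knows how to parse text."""

    def from_unicode(self, value):
        return value


def convert(field, value):
    """Route raw input to the field's matching parser."""
    if isinstance(value, bytes):
        if hasattr(field, "from_bytes"):
            return field.from_bytes(value)
        # Fields without byte support receive decoded text (assumed UTF-8).
        return field.from_unicode(value.decode("utf-8"))
    if isinstance(value, str):
        return field.from_unicode(value)
    # Already-typed values (int, bool, ...) pass through untouched.
    return value


print(convert(BoolField(), b"true"), convert(TextOnlyField(), b"abc"))
# True abc
```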
We have encountered an issue with validating schema fields when the supplied value is a Python 2.7 byte string instead of a unicode object. In nti.externalization.internalization.fields.py,
In Python 2, the type check here fails for a byte string (`str`) value, even though it is a `basestring`. This seems to be a functional difference from the behavior on Python 3, where native strings pass the check. Should we consider comparing against `string_types` here rather than `text_type`? We are seeing this as an issue in cases where an int or bool value is supplied as a byte string by a third-party library.
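For reference, on Python 2 `six.string_types` is `(basestring,)` and `six.text_type` is `unicode`, so a `str` byte string passes the first check but fails the second. The sketch below mimics the proposed widening under Python 3 by treating `(str, bytes)` as the accepted string types and decoding before parsing. `STRING_TYPES` and `coerce_from_string` are hypothetical names, and UTF-8 is assumed.

```python
# Hypothetical stand-in for Python 2's ``basestring``: accept both
# text and byte strings, not only text (``text_type``).
STRING_TYPES = (str, bytes)


def coerce_from_string(parse, value, encoding="utf-8"):
    """Parse ``value`` with ``parse`` if it is any kind of string.

    ``parse`` is a callable taking text (for example, a field's
    fromUnicode). Byte strings are decoded first instead of being
    rejected by a text-only type check.
    """
    if not isinstance(value, STRING_TYPES):
        return value  # not a string: leave it for normal validation
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return parse(value)


print(coerce_from_string(int, b"12"), coerce_from_string(int, "3"))  # 12 3
```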