Description
Getting data from Perl into Python has one major challenge: The Scalar type
cannot be mapped 1:1 into Python. For example, a scalar can be an integer, a
float or even a string. The scalar deserializers in python-storable
(SX_SCALAR, SX_LSCALAR and SX_TIED_SCALAR) do not properly determine the
actual type. Instead the type is guessed. This is bound to cause bugs
down-stream. For example if a user of the library stored the string "23" in
Perl and deserialises in Python, the value will become the integer value 23!
Blindly reading the values as bytes is also impractical, and caused issues
with the existing unit-tests.
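To make the failure mode concrete, here is a minimal sketch of what naive type
guessing does. The function name guess_scalar is hypothetical and this is not
the code in python-storable; it only illustrates why guessing from the bytes
alone loses information.

```python
def guess_scalar(raw: bytes):
    """Hypothetical guesser: try int, then float, else keep the raw bytes.

    This is an illustration of the problem, not python-storable's code.
    """
    # latin-1 maps every byte to a character, so decoding cannot fail.
    text = raw.decode("latin-1")
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return raw


# The Perl string "23" silently becomes a Python int.
print(type(guess_scalar(b"23")).__name__)
# A string with leading zeros (say, a product code) loses them.
print(guess_scalar(b"007"))
# Only values that do not look numeric survive as bytes.
print(guess_scalar(b"abc"))
```

Any value that merely looks like a number is converted, so the caller can no
longer tell whether the Perl side stored a string or a number.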
Personally, I do not yet have enough experience with Perl (and the storable
format) to provide a fix with enough certainty of correctness, so any help is
greatly appreciated.
With agreement of the involved parties, I'll summarise the previous discussion
about this topic:
exhuma Feb. 22. 2017
I've contacted the Perl maintainers to get some information about the
"scalar" type. And the main issue stems from the fact that in Perl, numbers
are actually subclasses from strings. Which means that a scalar can either be
a string, an int or a float/double which makes decoding into Python
impossible without type guessing. The best one could do would be to return
the values as plain bytes. But that's kind of impractical.
mhart Feb. 23. 2017
Perl is, kinda, typeless. It's a very hard thing to work with when moving
data into a typed language like Python (or between Perl and JSON even...).
The scalar type is a C struct which can hold multiple value types and a set
of flags indicating which of those entries can be read to extract the current
value held. This means that if you read in the string value "123abc" and then
tried to use that in an integer context ($var + 0) you'd get the value 123
returned, but then the struct of your original var would also hold the
integer 123 in the struct. Magic... :|
http://perldoc.perl.org/perlguts.html#Working-with-SVs and
http://cpansearch.perl.org/src/RURBAN/illguts-0.49/index.html might help you,
I'm not sure.
Part of the type-guessing you're working on might want to decide which of the
flags to give precedence to. For example, if a number in the value-part of a
perl hash were used as a string at some point, when outputting the Python
dict equivalent would you output a string or int value? My preference is for
the number, but that's mostly because even when debugging Perl you can
accidentally cause the string value of a scalar to be created which would
affect the types you get. See the Perl JSON modules for example, or this kind
of question on SO
[...]
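The precedence question raised above can be sketched with a toy model. The
class ToyScalar, its field names and the prefer_number switch are all
hypothetical; this is neither Perl's actual SV layout nor a statement about
which flags the storable format records. It only shows how one rule changes
the Python type that comes out.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ToyScalar:
    """Toy stand-in for a Perl scalar: each slot is None when not valid."""
    iv: Optional[int] = None    # integer slot
    nv: Optional[float] = None  # floating-point slot
    pv: Optional[bytes] = None  # string slot


def to_python(sv: ToyScalar, prefer_number: bool = True) -> Union[int, float, bytes, None]:
    """Pick one Python value from the slots using a fixed precedence."""
    numeric = sv.iv if sv.iv is not None else sv.nv
    if prefer_number and numeric is not None:
        return numeric
    if sv.pv is not None:
        return sv.pv
    # No string slot (or no slots at all): fall back to the numeric value.
    return numeric


# A number that was printed at some point: both slots are populated.
printed_number = ToyScalar(iv=123, pv=b"123")
print(to_python(printed_number))                       # number wins
print(to_python(printed_number, prefer_number=False))  # string wins
```

Note the trade-off: with prefer_number enabled, a scalar modelled as
ToyScalar(iv=123, pv=b"123abc") would come out as 123 and drop the "abc"
suffix, so neither rule is safe for every input.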
mhart 28. Feb. 2017
I think people make poor assumptions about types in Perl. If a value contains
something that looks like a number, it can be treated as such. If it is
treated as a number then it acts like one. That does not, however, make it a
number. It's just a scalar with some special flags. Working out how to best
use that is hard. The JSON encoders in Perl prove this, where numbers may
be rendered as strings because someone tried printing the variable earlier,
etc. :( Perl doesn't appear to support bigint by default, so expecting
b'112233445566778899' != 112233445566778899 to do "the right thing" won't
work either.
However... there may also have been some poor decoding in our storable.py
module where we aren't doing the right thing with the flags associated with a
scalar. You remember that perlguts page I referred you to earlier? I think we
might need to revisit that to determine what the type should be in Python
in case we've made a mistake somewhere here.