Implement conversion from IBufferProtocol to IList<byte> in Binder #762
The only thing the
The whole buffer protocol thing has been bugging me for a while. It should really be a zero-copy way of exposing the underlying memory instead of serializing the object. It seems to me like we could easily do this using Span/Memory. For example, if
If we're going with a new attribute to enable conversion, maybe we could do something like
Did you already resolve this? It seems like you're binding to any
I've thought about having
These are probably left over from 2.7 and could be removed. |
1. BytesConversionAttribute
OK, I understand now, but then maybe it is an opportunity to rethink/reorganize the mechanism. When I remove
With
Without
In CPython (example):
The type error still occurs because the Binder wraps
But maybe the whole problem is stated backwards. I have scanned the whole codebase; there are 154 public interface functions that use an
A cursory check shows that most of them should have been tagged with
2. IBufferProtocol
I absolutely agree. I was having exactly the same thoughts. But I thought that we cannot use
3. BytesLikeAttribute
I like the name. CPython's documentation often talks about a "bytes-like" object as an acceptable input type, but on some occasions it talks about objects implementing a buffer protocol. This is reflected in error messages as well:
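The original example did not survive extraction. As a rough illustration only (the two calls below were picked for this sketch and are not taken from the thread), recent CPython 3 releases word the error like this when a str is passed where a buffer is expected; older releases may phrase it in terms of the "buffer interface" instead:

```python
import binascii


def type_error_message(func, arg):
    """Call func(arg) and return the TypeError text, or None if no error."""
    try:
        func(arg)
    except TypeError as exc:
        return str(exc)
    return None


# Both calls expect an object supporting the buffer protocol; a str does not.
hexlify_msg = type_error_message(binascii.hexlify, "abc")
view_msg = type_error_message(memoryview, "abc")

print(hexlify_msg)
print(view_msg)
```

The exact wording depends on the CPython version, so the sketch prints the messages rather than assuming them.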
I don't know if there is any difference between those two concepts; I will have to dig deeper. But it appears that whenever the Python API accepts a "bytes-like object" or an object with a "buffer interface", in IronPython the parameter is tagged with
So here are the options for the interface:
[BytesLike]IList<bytes> // if object must be modifiable
[BytesLike]IReadOnlyList<bytes> // my current favorite
[BytesLike,BufferProtocol]IList<bytes> // in case buffer interface is something different than bytes-like
[BytesLike,BufferProtocol]IReadOnlyList<bytes> // in case buffer interface is something different than bytes-like
Also, if a parameter is tagged with
And finally, maybe the overload resolver could produce a better error message in such a case, something more in line with CPython, e.g. "expected bytes-like object, got xxx" rather than "expected IList[byte], got xxx"?
4. Binding Constraint
No, I haven't solved it yet. I think I wasn't clear in my description of the issue. It is true that the proposed code will perform binding for all instances implementing
But after thinking about it for a day I am now not so sure what is best; after all, there is a trade-off. If the first expression tree produced were restricted to apply to every object implementing
5. Memoryview
If implementing
6. str to bytes conversion
I will gladly remove this code at the first opportunity. |
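For context on why an implicit str to bytes conversion is out of place in Python 3, here is a small CPython sketch (the sample strings are made up for illustration). It shows that Latin-1 covers only code points up to U+00FF and that `bytes()` refuses a str unless an encoding is named:

```python
# Explicit encoding works for characters inside the Latin-1 range.
latin = "café".encode("latin-1")

# Characters outside Latin-1 (here the euro sign) cannot be encoded.
try:
    "€".encode("latin-1")
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True

# Python 3 has no implicit str -> bytes conversion: an encoding is required.
try:
    bytes("abc")
    implicit_failed = False
except TypeError:
    implicit_failed = True

print(latin, euro_failed, implicit_failed)
```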
1. BytesConversionAttribute
Preventing
2. IBufferProtocol
We could include the
3. BytesLikeAttribute
CPython is somewhat inconsistent with their error messages. I think in most cases where they say "buffer protocol" they mean "bytes-like". I think it's somewhat cleaner in newer releases.
I am fine with renaming it.
While this is true, if you look at Python's definition of bytes-like object they specifically mention contiguous memory.
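The contiguity requirement can be observed from Python itself. The following is only a sketch using `memoryview` (the variable names are illustrative): a strided slice still supports the buffer protocol but is no longer contiguous, so flattening it requires a copy.

```python
data = bytearray(b"abcdef")

whole = memoryview(data)    # contiguous view over the whole buffer
strided = whole[::2]        # every second byte: same memory, stride of 2

print(whole.contiguous, strided.contiguous)

# tobytes() copies the selected elements into a new contiguous bytes object.
flattened = strided.tobytes()
print(flattened)
```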
Because it didn't exist when the bulk of the codebase was written! 😄 I would be fine with making use of this since 4.5 is now the minimum target. Presumably
[BytesLike]IList<bytes> // if object must be modifiable
[BytesLike]IReadOnlyList<bytes> // my current favorite
These seem good.
5. Memoryview
That would be an option. We could go further and implement it on all
Anyway, it seems to me like |
So a bytes-like object has two constraints: implement buffer protocol and be a contiguous buffer. Great. This indeed makes
I agree. I have realized that a
Below are some thoughts on the choices I currently see. I evaluate them from the implementation point of view and from the client point of view (e.g. C# host code calling into IronPython code).
A.
|
@BCSharp Thanks for the analysis!
Why do we need a reference to the original array? If the concern is that methods like
unsafe {
    fixed (byte* bp = span.Slice(start)) {
        var z = e.GetString(bp, length);
    }
}
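As a loose Python-side analogy to decoding directly from a slice of a span (a sketch only, not IronPython code; the payload and offsets are invented for illustration), CPython can decode a `memoryview` slice without first copying it into a separate bytes object:

```python
payload = bytearray("prefix:héllo:suffix", "utf-8")
view = memoryview(payload)

start = len(b"prefix:")
length = len("héllo".encode("utf-8"))   # byte length, not character count

# Slicing a memoryview creates a new view over the same memory, not a copy.
window = view[start:start + length]

# str() accepts any bytes-like object when an encoding is given.
decoded = str(window, "utf-8")
print(decoded, window.obj is payload)
```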
This sounds like a good approach.
I agree. If you think about it, a Python list of bytes does not qualify as bytes-like, so why would a .NET list of bytes work (although one could argue about the IList being strongly typed)? There are enough differences between IronPython 2 and 3 that I think this one will be a minor pain point. Before jumping all in we should do some small-scale experiments. I think there are issues with using |
I like the idea of going through unsafe code in this case. Why bump up to .NET 4.6.1? Default
Agreed. I will start with the experiments around encodings and make a draft pull request if there is anything interesting to share/review.
I cannot think of any issues. Do you have any (pointers to) information or examples of those issues? |
Pointer overloads were only added to
|
Superseded by #765. |
This PR originated from a failing codecs test. It turns out that all codec functions accept a 'b' type array just as well as Bytes. At first I thought of adding another overload accepting IBufferProtocol objects, but it turns out that [BytesConversion]IList<byte> will catch array as well. However, it does not go through IBufferProtocol; instead, the binder sees that array implements IList<object> and wraps it in ListGenericWrapper<byte>, hoping for the best.

But it does not work.
ListGenericWrapper<byte>, on access or on copy (which is triggered by ToArray() in the first line of StringOps.DoDecode), accesses each element as object and unboxes it to byte. But the array, even for type 'B', when accessed through IList<object>, boxes its elements to int for some reason. Even if I changed that and made array box a byte, this still would not be good enough: the conversion would fail for other array types, like sbyte or float, and in CPython all objects work as long as they support the buffer protocol:

So it is clear that wrapping in ListGenericWrapper<byte> is not the way to do it. In the end I went ahead and modified the Binder logic to do the conversion on the fly in the expression tree. This seems to do the right thing, does it?

Questions/issues:
1. Is BytesConversionAttribute really needed? I do not see it being checked/used anywhere by the Binder.
2. Wouldn't it be better to wrap IBufferProtocol objects on binding and serialize to a memory stream only on access? This would delay the serialization until the object is actually accessed, rather than having it done already in the binding expression tree.
3. Use a dedicated attribute to enable this functionality, e.g. BufferProtocolAttribute? In this way, BytesConversion would be used only for Bytes, ByteArray, and MemoryView. On the other hand, I haven't found a case where BytesConversion would apply but BufferProtocol conversion would not. In all cases I tried where IronPython uses BytesConversion on parameters, CPython accepts an arbitrary array as well:
4. Is the binding constraint right? It is supposed to constrain the binding to arguments that match the type exactly (I assume that LimitType means the specialized type for generics). Ideally, the constraint would be more relaxed, matching any type that implements the interface IBufferProtocol, but I couldn't figure out how to define such a constraint. Maybe constraining on the exact type match is more efficient in the average case.
5. Even with this change, things are not quite right. CPython will accept memoryview in all those instances, but IronPython will not. Shouldn't MemoryView implement IBufferProtocol (or at least IList<byte>)?
6. A side issue, but it caught my eye: ConversionBinder.cs, line 173, "blindly" converts a string to Bytes. This is error-prone (only Latin-1 characters succeed), and it seems to me it is not the way to do it in Python 3, where all str to bytes conversions are explicit. See NewTypeMaker.cs, line 413: "[BytesConversionAttribute] to make enable automatic cast between .net IList and string." Really?
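The CPython examples from the description were lost in extraction. The sketch below is a stand-in written for illustration, not the author's original snippet; it shows the behaviour the description relies on: codec functions accept `array` objects of any item type because they read the raw memory through the buffer protocol.

```python
import array
import codecs

# A signed-byte array ('b') is accepted just like a bytes object.
signed = array.array("b", [72, 105])            # the bytes of "Hi"
text, consumed = codecs.utf_8_decode(signed)

# An array of floats is accepted too; the codec sees its raw bytes.
floats = array.array("f", [1.0])
raw_text, raw_consumed = codecs.latin_1_decode(floats)

print(repr(text), consumed)
print(len(raw_text), raw_consumed, floats.itemsize)
```

Latin-1 is used for the float array because it maps every byte value to a character, so the call cannot fail on the byte contents.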