-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bitarray implementation to eliminate external dependency #265
Conversation
Great performance gain compared to the previous direct approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mndza thanks very much for doing this!
I've had a rummage, and found a few issues - sorry!
... this should be a "request changes" review, I'm still learning GitHub's interface.
def setall(self, value): | ||
value_int = ~(-1 << self._len_) if bool(value) else 0 | ||
value_bytes = value_int.to_bytes(len(self._array_), "little") | ||
self._array_ = array('B', value_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation makes me nervous as it stands...
It seems to work with absurdly large integers, but did I just not go high / complex enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still haven't found a way to do it faster than this. Regarding your concern, this code should in fact work for arbitrarily large numbers, as Python's int
implementation is only limited by the amount of available memory.
I also tried some other approaches. The following runs at least 10x slower in my machine, testing with 1GiB bitarray
:
def setall(self, value):
byte = 0xFF if bool(value) else 0
self._array_ = array('B', bytes(byte for _ in range(len(self._array_))))
if self._len_ % 8:
self._array_[-1] &= ~(-1 << (self._len_ % 8))
self.fuse = bitarray(int(count, 10), endian="little") | ||
self.fuse = bitarray(0, int(count, 10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again re avoiding changes...
Do you handle endianness at all at the moment? Is it important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding endianness, the class currently only works in "little-endian" mode: least significant bytes are stored at the smallest address, and bits are ordered from LSB to MSB inside a byte. This is also consistent with glasgow.support.bits
.
I am not sure if this is a problem though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Endianness should emphatically not be a property of the new bitarray
class. However, it must be possible to specify endianness when serializing/deserializing to/from buffer objects. E.g. SPI and JTAG have opposite endianness and we are actually going to hit this when working on gateware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I can see that glasgow.support.bits
does not have a way to do so. It's fine to extend both of them at the same time in a different PR, provided that existing bitarray
-using code can be ported as it is.
def byte_reverse(self): | ||
""" In-place byte reversal """ | ||
if not hasattr(self.byte_reverse, "table"): # only build once | ||
self.byte_reverse.__func__.table = self.__build_revbyte_table() | ||
table = self.byte_reverse.table | ||
for i in range(len(self._array_)): | ||
self._array_[i] = table[self._array_[i]] | ||
if self._len_ % 8: | ||
self._array_[-1] &= ~(-1 << (self._len_ % 8)) | ||
return self | ||
|
||
def byte_reversed(self): | ||
new = self.__class__(self._array_, self._len_) | ||
return new.byte_reverse() | ||
|
||
def reversed(self): | ||
if self._len_ % 8: | ||
# Adding zero padding in the low bits allows easier bit reversal | ||
# for the cases where bit length is not multiple of byte size | ||
out = self.__class__(0, 8-(self._len_%8)) + self | ||
else: | ||
out = self.__class__(self._array_, self._len_) | ||
out._array_.reverse() # Reverse byte order (last to first) | ||
out.byte_reverse() # Reverse bit order in every byte | ||
out._len_ = self._len_ # Truncate possible length extension due to padding | ||
return out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distinction between bitarray.byte_reverse()
, bitarray.byte_reversed()
and bitarray.reversed()
is very subtle on first glance... perhaps a comment?
Also, some don't behave properly, or observe self._len_
properly: 😬
>>> x = bitarray.from_bytes(b'abc', 8*4)
>>> x
bitarray('00000000011000110110001001100001')
>>> x.byte_reverse()
bitarray('00000000110001100100011010000110')
>>> x.byte_reversed()
bitarray('00000000011000110110001001100001')
>>> x.reversed()
bitarray('00000000011000010110001001100011')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know the distinction is not very clear (see my original PR message). I will add some docstrings, but maybe we could remove one of the methods (byte_reverse()
or byte_reversed()
) ?
The wrong result in reversed()
was caused by the bug fixed in b99b66f.
Regarding self._len_
:
- In
reversed()
,self._len_
is used to determine how much padding should be added before the actual reverse process. Not seeing the problem here. - In
byte_reverse()
/byte_reversed()
, entire bytes are reversed, so the only thing I do whenself._len_
is not multiple of 8 is mask the final byte. I also stated my doubts about the correct operation here in the original PR. Could you explain what behavior are you expecting in this case? Then I may modify the method to comply.
since commit b99b66f this assertion is redundant
software/glasgow/support/bitarray.py
Outdated
|
||
def __bitop(self, other, op, clear_top=False): | ||
other = self.__class__(other) | ||
# Only perform operation in-place when the other bitarray is smaller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a wrong condition to pick between mutating and copying operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the proper response when other
's bit length is larger? Maybe raise an exception like @attie (#265 (comment)) mentioned? If so, should that only happen for the in-place / mutating operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the behavior in commit ffc277a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution. Unfortunately, while the basic approach to the task was sound, this particular implementation includes a substantial degree of complexity (e.g. the __copy_bit_slice
method) and numerous issues (see inline comments), and as a result I cannot convince myself that we should merge it. Again unfortunately, I also do not expect that I will have the review bandwidth necessary to bring this implementation to the point where it would be mergeable.
I apologize for causing you to spend your time in this way. I think that some of the code in this PR could be reused by a future developer handling the same task, so hopefully it will have some utility in the end.
I do agree with you: looking at this code now (it's been a while) and I'm not happy with it. I should have targeted for simplicity and correctness first. My apologies. |
Thank you for understanding. I'm happy to work with you on any future contributions, if you would like to. |
This PR is an attempt to address #179. This bitarray implementation follows the interface established by
glasgow.support.bits
while changing the backing storage to anarray()
object. Currently the element size of this array is a byte and not a machine word.I am not particularly happy with the
byte_reverse
/byte_reversed
method names to imply in-place modification or not. Also, I am not sure about the correct behavior in those functions when the number of stored bits is not a multiple of the byte size. Right now it drops the bits that exceed the bit length after the byte reversal.Please note that for initializing the bitarray from a
bytes
object you need to provide the explicit length in bits. This is because I copied the constructor interface fromglasgow.support.bits
, but I do not know if that is the desired operation.Most of the test cases were copied from
glasgow.support.bits
.