## Binary data

We can convert a number to its binary representation using **`bin`**. Binary literals are written with the `0b` prefix.

In [1]:
int(0b101)

5

In [2]:
bin(6)

'0b110'

We can also operate at the bit level. For example, the following cell computes the bitwise AND of 6 (binary 110) and binary 010.

That is, 110 AND 010 = 010, which is 2. Likewise, 7 AND 101 is 111 AND 101 = 101, which is 5.

In [3]:
6 & 0b010, 7 & 0b101

(2, 5)

In [4]:
bin(1 << 4)

'0b10000'

In [5]:
bin(0x8 << 4) 
bin(0x7 << 0), bin(0x7 << 1), bin(0x7 << 2), bin(0x7 << 3)

('0b111', '0b1110', '0b11100', '0b111000')
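
The cells above use `&` and `<<` without much commentary. As a rough summary, here is a small sketch of the common bitwise operators on the same values. The helper `show` is a hypothetical name introduced only for this illustration.

```python
# Hypothetical helper (not part of the notebook): format an int as fixed-width binary.
def show(value, width=8):
    return format(value, '0{}b'.format(width))

a, b = 0b110, 0b010          # 6 and 2

and_result = a & b           # bits set in both operands
or_result = a | b            # bits set in either operand
xor_result = a ^ b           # bits set in exactly one operand
left = 0x7 << 2              # shifting left by k multiplies by 2**k
right = 0b11100 >> 2         # shifting right by k floor-divides by 2**k

print(show(and_result), show(or_result), show(xor_result))
print(show(left), show(right))
```

Padding to a fixed width makes it easier to line the operands up by eye than the variable-width output of `bin`.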

### Bitarray

https://pypi.org/project/bitarray/

In [6]:
from bitarray import bitarray

In [7]:
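# bitarray(n) allocates n bits; in the version used here they are not initialized, so the contents are arbitrary
x = bitarray(5)
x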

bitarray('10100')

In [8]:
a = bitarray('1010')
a

bitarray('1010')

In [9]:
a = bitarray(8)
a.setall(0)
a[5] = 1
a[6] = 1
print(a)
print(a.tobytes())

bitarray('00000110')
b'\x06'


In [10]:
a = bitarray(8)
a.setall(0)
a[2] = 1
a[0] = 1
print(a)
print(a.tobytes())

bitarray('10100000')
b'\xa0'
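
The two cells above suggest that bit index 0 maps to the most significant bit of the first byte, which matches bitarray's default big-endian bit order. A minimal pure-Python sketch of that mapping, assuming that convention, could look like this. `bits_to_bytes` is a hypothetical helper, not part of bitarray.

```python
def bits_to_bytes(bits):
    """Pack a sequence of 0/1 values into bytes, first bit = most significant."""
    if len(bits) % 8:
        raise ValueError("this sketch only handles whole bytes")
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit      # shift in one bit at a time
        out.append(byte)
    return bytes(out)

bits = [0] * 8
bits[5] = 1
bits[6] = 1
packed_a = bits_to_bytes(bits)      # same bit pattern as bitarray('00000110')

bits = [0] * 8
bits[0] = 1
bits[2] = 1
packed_b = bits_to_bytes(bits)      # same bit pattern as bitarray('10100000')

print(packed_a, packed_b)
```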


## Bytes

In [9]:
b = bytes(2)
print(b)
print('len(b)=', len(b))

b'\x00\x00'
len(b)= 2


In [10]:
b = bytes([1, 11, 3, 254])
print(b)
print('len(b)=', len(b))

b'\x01\x0b\x03\xfe'
len(b)= 4


In [11]:
import uuid
x = uuid.UUID('{0b010203-0405-0607-0809-0a0b0c0d0e0f}')
x.bytes

b'\x0b\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'

The function `random.randbytes` (added in Python 3.9) generates random bytes. The cell below was run on Python 3.8, so the import fails.

In [11]:
from random import randbytes

randbytes(16)

ImportError: cannot import name 'randbytes' from 'random' (/Users/davidbuchaca/opt/anaconda3/lib/python3.8/random.py)
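
One way to keep a notebook working on both sides of that version boundary is to fall back to `getrandbits` when the import fails. This is only a sketch: the fallback function below is our own stand-in, and it assumes `n` is at least 1.

```python
import random

try:
    from random import randbytes          # available from Python 3.9
except ImportError:
    def randbytes(n):
        # Stand-in for older interpreters: draw n*8 random bits and pack them.
        # Assumes n >= 1.
        return random.getrandbits(n * 8).to_bytes(n, 'little')

data = randbytes(16)
print(len(data))
```

Like the rest of the `random` module, this is meant for simulation and testing rather than for security-sensitive values.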

In [12]:
import os
%timeit os.urandom(10)

2.27 µs ± 53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [14]:
import random
random.getrandbits(8)

22

In [15]:
import random
N = 100000
bits = random.getrandbits(N)

In [15]:
%timeit bytearray((random.getrandbits(8) for i in range(16)))

1.9 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [16]:
%timeit random.getrandbits(128)

112 ns ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [17]:
bin(random.getrandbits(128))

'0b10011101111110011110111000110100001011111010000111100010111010010100001000001000010001000000000110110011000110110111100100100111'

In [18]:
?random.getrandbits

In [19]:
i = random.getrandbits(128)
i

60918683762361551615632537392054648668

In [73]:
format(i, '128b')

'  111101010011101000001001001010101110111100100110110111110001000100001010001111101100100001000010111011110110011001000010111101'

We can get an array of bytes representing an integer `i` using `i.to_bytes()`. It needs the number of bytes to produce and the byte order.

In [90]:
zero_one_string = format(i, '128b')
n_bytes = (len(zero_one_string) + 7) // 8
barray = int(zero_one_string, 2).to_bytes(n_bytes, byteorder='big')
len(barray)

16
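
The cell above derives the byte count from the length of a formatted string. A possibly simpler route, sketched here under the assumption that a variable-length result is acceptable, is `int.bit_length()`. Note that the result can be shorter than 16 bytes when the leading bits of `i` happen to be zero.

```python
import random

i = random.getrandbits(128)

# Smallest number of bytes that can hold i (at least 1, so that 0 still encodes).
n_bytes = max(1, (i.bit_length() + 7) // 8)
barray = i.to_bytes(n_bytes, byteorder='big')

# Round trip back to the integer.
restored = int.from_bytes(barray, byteorder='big')
print(n_bytes, restored == i)
```

When a fixed width is required, as for a UUID, passing the constant 16 to `to_bytes` pads with leading zero bytes instead.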

Here `byteorder` states the byte order used to represent the integer.
- If `byteorder` is `'big'`, the most significant byte is at the beginning of the byte array.
- If `byteorder` is `'little'`, the most significant byte is at the end of the byte array.

In [173]:
i = 280

x_big = i.to_bytes(2, 'big')
x_little = i.to_bytes(2, 'little')

print(x_big)
print(x_little)

b'\x01\x18'
b'\x18\x01'


In [174]:
x_little

b'\x18\x01'

We can interpret bytes as ints with `int.from_bytes`

In [177]:
print(int.from_bytes(x_big, byteorder='big'))
print(int.from_bytes(x_little, byteorder='little'))

280
280


Note that if we don't pass the correct byteorder, the result might not be what we expect.

In [135]:
print(int.from_bytes(x_big, byteorder='little'))
print(int.from_bytes(x_little, byteorder='big'))

6145
6145


What is happening here?

The bytes are interpreted differently depending on the byteorder.

In [182]:
print(x_big)

# Here * marks the most significant byte (the one on the left because the order is big)
#     *
# b'\x01\x18'

# from right to the left
print(8*16**0 + 1*16**1 + 1*16**2 + 0*16**3)

# from left to right
print(0*16**3 + 1*16**2 + 1*16**1 + 8*16**0)

b'\x01\x18'
280
280


In [187]:
print(x_little)

# Here * marks the most significant byte (the one on the right because the order is little)
#          *
# b'\x18\x01'

# read as little-endian: the first byte is the least significant
print(0x18*256**0 + 0x01*256**1)

# read as big-endian: the first byte is the most significant
print(0x18*256**1 + 0x01*256**0)

b'\x18\x01'
280
6145
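
The byte-order cells above can be summarised by treating each byte as one base-256 digit. The following sketch rebuilds the integer by hand for either order. `from_bytes_manual` is a hypothetical name, and the sketch handles unsigned values only.

```python
def from_bytes_manual(data, byteorder):
    """Rebuild an unsigned integer from bytes, one base-256 digit at a time."""
    if byteorder == 'little':
        data = reversed(data)      # put the most significant byte first
    elif byteorder != 'big':
        raise ValueError("byteorder must be 'big' or 'little'")
    value = 0
    for byte in data:
        value = value * 256 + byte
    return value

x_big = (280).to_bytes(2, 'big')        # b'\x01\x18'
x_little = (280).to_bytes(2, 'little')  # b'\x18\x01'

print(from_bytes_manual(x_big, 'big'))        # 280
print(from_bytes_manual(x_little, 'little'))  # 280
print(from_bytes_manual(x_big, 'little'))     # 6145, the wrong order
```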


In [23]:
def create_random_bytes():
    i = random.getrandbits(128)
    zero_one_string = format(i, '128b')
    barray = int(zero_one_string, 2).to_bytes((len(zero_one_string) + 7) // 8, 'big')
    return barray

In [24]:
%timeit create_random_bytes()

1.04 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [200]:
def create_random_bytes2():
    random.seed(123)
    i = random.getrandbits(128)
    barray = i.to_bytes((128 + 7) // 8, 'big')
    return barray

In [26]:
%timeit create_random_bytes2()

275 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [201]:
x = create_random_bytes2()

In [202]:
# [hex(x_k) for x_k in x]

In [198]:
def create_random_bytes3():
    random.seed(123)
    i = random.getrandbits(128)
    barray = i.to_bytes((128 + 7) // 8, 'big')
    return bytearray(barray)

In [199]:
create_random_bytes3()

bytearray(b'\xc4\xdaS|\x16Q\xdd\xaeD\x86}\xb3\rg\xb3f')

In [42]:
%timeit create_random_bytes3()

409 ns ± 8.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


We can modify a byte in a bytearray.

In [299]:
# UUID layout in hexadecimal characters: 8-4-4-4-12
# UUID layout in bytes: 4-2-2-2-6

x = bytearray(create_random_bytes2())
print(uuid.UUID(bytes = bytes(x)))

x = bytearray(create_random_bytes2())
x[6:7] = b'\x40'
print(uuid.UUID(bytes = bytes(x)))

c4da537c-1651-ddae-4486-7db30d67b366
c4da537c-1651-40ae-4486-7db30d67b366


What if we want to modify only the upper half of a byte?

To turn these bytes into a version 4 UUID we only want to change the first hexadecimal character (the high nibble) of the seventh byte, index 6, and leave its low nibble untouched. Assigning to a bytearray element always replaces the whole byte, so we can't do this directly at the bytearray level.

In [319]:
x = bytearray(b'\xff\xff')
print(x)
x[0] = 0
print(x)

bytearray(b'\xff\xff')
bytearray(b'\x00\xff')


In [345]:
int.from_bytes(b'\xff', byteorder="little") 

255

Modifying a nibble (one hex character) of a byte: https://stackoverflow.com/questions/52535980/how-to-set-bits-within-a-bytes-object-that-represents-hex-symbols-in-python?rq=1

https://stackoverflow.com/questions/42896154/python-split-byte-into-high-low-nibbles


The following cell modifies the first byte of the bytearray so that only its first nibble (the first half of the byte) changes. The bytearray initially contains `[ff,bb]` and we convert it to `[af,bb]`.

In [407]:
x = bytearray(b'\xff\xbb')
x[0] = (0xa << 4) | (x[0] & 0xf) 
x

bytearray(b'\xaf\xbb')

In [408]:
bin((0xa << 4) + (x[0] & 0xf))

'0b10101111'

In [409]:
bin((0xa << 4) | (x[0] & 0xf))

'0b10101111'

In [418]:
bin((0xa << 4))

'0b10100000'
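
The mask-and-shift pattern in the cells above can be wrapped in two small helpers. This is only a sketch, and both function names are hypothetical.

```python
def set_high_nibble(byte, nibble):
    """Return byte with its upper 4 bits replaced by nibble (0..15)."""
    return ((nibble & 0xF) << 4) | (byte & 0x0F)

def set_low_nibble(byte, nibble):
    """Return byte with its lower 4 bits replaced by nibble (0..15)."""
    return (byte & 0xF0) | (nibble & 0xF)

x = bytearray(b'\xff\xbb')
x[0] = set_high_nibble(x[0], 0xA)   # ff -> af
x[1] = set_low_nibble(x[1], 0x0)    # bb -> b0
print(x.hex())
```

Masking the new nibble with `0xF` keeps an out-of-range argument from spilling into the other half of the byte.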

Using this trick we can modify the first nibble of the first byte in the third block of a UUID to set the UUID version to 4 (for example).

In [426]:
# UUID layout in hexadecimal characters: 8-4-4-4-12
# UUID layout in bytes: 4-2-2-2-6

x = bytearray(create_random_bytes2())
print(uuid.UUID(bytes = bytes(x)))

x = bytearray(create_random_bytes2())
x[6] = (0x4 << 4) | (x[6] & 0xf) 
print(uuid.UUID(bytes = bytes(x)))

c4da537c-1651-ddae-4486-7db30d67b366
c4da537c-1651-4dae-4486-7db30d67b366


In [443]:
byte = int('ff', 16)
high, low = byte >> 4, byte & 0x0F
print(hex(byte), hex(high), hex(low))

0xff 0xf 0xf


Set the two most significant bits of the 9th byte (index 8) to binary `10`, so the high nibble will be one of `{8,9,A,B}`. These bits mark the UUID variant. In C-like notation:

abData[8] = 0x80 | (abData[8] & 0x3f);

In [503]:
def create_random_bytes2(seed):
    random.seed(seed)
    i = random.getrandbits(128)
    barray = i.to_bytes((128 + 7) // 8, 'big')
    return barray

x = bytearray(create_random_bytes2(1238))
x[6] = (0x4 << 4) | (x[6] & 0xf) 
x[8] = (0x8 << 4) | (x[8] & 0x3f) 
print(uuid.UUID(bytes = bytes(x)))

5a6d0c82-364d-469c-be05-89cf06b0cb7d
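
To check that the two masks have the intended effect, the standard `uuid` module can report the version and variant it reads from the bytes. A minimal sketch, using a local `random.Random` instance with an arbitrary seed so the global generator is left alone:

```python
import random
import uuid

rng = random.Random(1238)                       # local generator, arbitrary seed
raw = bytearray(rng.getrandbits(128).to_bytes(16, 'big'))

raw[6] = 0x40 | (raw[6] & 0x0F)   # version: high nibble of byte 6 becomes 4
raw[8] = 0x80 | (raw[8] & 0x3F)   # variant: top two bits of byte 8 become 10

u = uuid.UUID(bytes=bytes(raw))
print(u, u.version, u.variant == uuid.RFC_4122)
```

Note that `0x40` is the same value as `0x4 << 4`, and `0x80` the same as `0x8 << 4`.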


#### Creating a UUID4 from scratch

In [240]:
def create_random_bytes_for_uuid_4(seed):
    # NOTE: seed is accepted but not used here; the global random state is left as is
    i = random.getrandbits(128)
    barray = bytearray(i.to_bytes((128 + 7) // 8, 'big'))
    barray[6] = (0x4 << 4) | (barray[6] & 0xf) 
    barray[8] = (0x8 << 4) | (barray[8] & 0x3f) 
    return barray


In [241]:
%timeit r = create_random_bytes_for_uuid_4(12)

583 ns ± 7.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [242]:
import binascii
r = create_random_bytes_for_uuid_4(12)
r_str = binascii.hexlify(r).decode('ascii')

In [243]:
res = r_str[0:8] + '-' +r_str[8:12] + '-' + r_str[12:16] + '-' + r_str[16:20]+ '-' + r_str[20:]
res

'1b0dc5e0-e2b5-44de-9e32-ee7659bccd9d'

In [244]:
def create_uuid_4(seed):
    barray = create_random_bytes_for_uuid_4(seed)
    r_str = binascii.hexlify(barray).decode('ascii')
    return r_str[0:8] + '-' +r_str[8:12] + '-' + r_str[12:16] + '-' + r_str[16:20]+ '-' + r_str[20:]


In [256]:
%timeit create_uuid_4(12)

1.68 µs ± 43.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [257]:
%timeit str(uuid.uuid4())

4.72 µs ± 85.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
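
The faster version above draws its bits from `random.getrandbits`, and the `random` module is documented as unsuitable for security purposes, whereas CPython's `uuid.uuid4()` draws from `os.urandom`. If unpredictability matters, the same bit manipulation can be applied to `os.urandom` output. A minimal sketch follows. `uuid4_from_urandom` is a hypothetical name, and no timing is claimed for it.

```python
import os
import uuid

def uuid4_from_urandom():
    raw = bytearray(os.urandom(16))
    raw[6] = 0x40 | (raw[6] & 0x0F)   # version 4
    raw[8] = 0x80 | (raw[8] & 0x3F)   # RFC 4122 variant
    h = raw.hex()
    # Hexadecimal layout 8-4-4-4-12
    return '-'.join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:]))

text = uuid4_from_urandom()
parsed = uuid.UUID(text)              # parse it back as a sanity check
print(text, parsed.version)
```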


### Speeding up UUID4 creation with Cython

In [16]:
%load_ext autoreload
%autoreload 2
%load_ext cython

import Cython

In [235]:

import random

def create_random_bytes_for_uuid_4(seed):
    i = random.getrandbits(128)
    barray = bytearray(i.to_bytes((128 + 7) // 8, 'big'))
    barray[6] = (0x4 << 4) | (barray[6] & 0xf) 
    barray[8] = (0x8 << 4) | (barray[8] & 0x3f) 
    return barray

In [236]:
%timeit create_random_bytes_for_uuid_4(123)

588 ns ± 18 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [237]:
%%cython -a

import random
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef create_random_bytes_for_uuid_5(int seed):
    cdef unsigned char[::1] barray
    i = random.getrandbits(128)
    
    #barray = bytearray(i.to_bytes((128 + 7) // 8, 'big'))
    barray = bytearray(i.to_bytes(16, 'big'))
    barray[6] = (0x4 << 4) | (barray[6] & 0xf) 
    barray[8] = (0x8 << 4) | (barray[8] & 0x3f) 
    return barray

In [238]:
%timeit create_random_bytes_for_uuid_5(123)

663 ns ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [239]:
barray = create_random_bytes_for_uuid_5(123)
print(bytes(barray))

b'J\x8c\x9c\xfa\x8f|F$\xb7q\xe1\xca\xf5\x98\xc2\xe1'


In [190]:
binascii.hexlify(barray).decode('ascii')

'12b8852259f64f41a72a1696751cb834'

In [172]:
%%timeit 
barray = create_random_bytes_for_uuid_5(123)
binascii.hexlify(barray).decode('ascii')

905 ns ± 13.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [226]:
%%cython -a

import random
cimport cython

import binascii

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef create_random_bytes_for_uuid_5(int seed):
    cdef unsigned char[::1] barray
    
    i = random.getrandbits(128)
    
    #barray = bytearray(i.to_bytes((128 + 7) // 8, 'big'))
    barray = bytearray(i.to_bytes(16, 'big'))
    barray[6] = (0x4 << 4) | (barray[6] & 0xf) 
    barray[8] = (0x8 << 4) | (barray[8] & 0x3f) 
    return barray

cpdef create_uuid_5(seed):
    

    barray = create_random_bytes_for_uuid_5(seed)
    r_str = binascii.hexlify(barray).decode('ascii')
    return r_str[0:8] + '-' +r_str[8:12] + '-' + r_str[12:16] + '-' + r_str[16:20]+ '-' + r_str[20:]


In [227]:
%timeit create_uuid_5(123)

1.44 µs ± 49.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [228]:
create_uuid_5(123)

'5fb11372-e8dd-4c91-8e19-e54833eb7a31'

In [224]:
create_uuid_4(123)

'c4da537c-1651-4dae-8486-7db30d67b366'

In [200]:
%timeit create_uuid_4(123)

8.4 µs ± 94.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [204]:
import uuid
%timeit uuid.uuid4()

3.74 µs ± 291 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [255]:
%timeit str(uuid.uuid4())

5.06 µs ± 275 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


## Bytearray

In [188]:
b = bytearray(3)
print(b)
print('len(b)=', len(b))

bytearray(b'\x00\x00\x00')
len(b)= 3


In [194]:
b = bytearray(3)
b[0]=255
b

bytearray(b'\xff\x00\x00')

In [197]:
b = bytearray(3)
b[2]=255
b

bytearray(b'\x00\x00\xff')

In [201]:
b = bytearray(3)
print(b)
b.append(5)
print(b)

bytearray(b'\x00\x00\x00')
bytearray(b'\x00\x00\x00\x05')


In [249]:
b = bytearray(3)
b[2:3] = b'\x41'
print(b)

bytearray(b'\x00\x00A')


Some objects, such as a UUID, can be created from bytes data.

In [383]:
%timeit uuid.UUID(bytes=b'\x0b\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f')

995 ns ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [388]:
%timeit uuid.uuid4()

3.75 µs ± 43 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [389]:
len(b'\x0b\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f')

16

## Hexadecimal

In [64]:
int('0xff', 16), int('0x20', 16), int('0x0a', 16)

(255, 32, 10)


## Bits


```
Transformations Summary

Strings to Integers:

"1011101101": int(str, 2)
"m": ord(str)
"0xdecafbad": int(str, 16) (known to work in Python 2.4)
"decafbad": int(str, 16) (known to work in Python 2.4)

Integers to Strings:

"1011101101": format(val, 'b') (bin(val) gives "0b1011101101")
"m": chr(val)
"0xdecafbad": hex(val)
"decafbad": "%x" % val
```
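
The summary above can be checked with a short runnable sketch. The variable names are ours, chosen only for the illustration.

```python
bits = int("1011101101", 2)      # binary string -> int
code = ord("m")                  # one-character string -> code point
n1 = int("0xdecafbad", 16)       # hex string with prefix -> int
n2 = int("decafbad", 16)         # hex string without prefix -> int

print(format(bits, 'b'))         # int -> binary string without the 0b prefix
print(bin(bits))                 # int -> binary string with the 0b prefix
print(chr(code))                 # code point -> one-character string
print(hex(n1))                   # int -> hex string with the 0x prefix
print("%x" % n2)                 # int -> hex string without the prefix
```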

In [32]:
print(int('000101', 2))

5
