# Binary Data

Text data can be challenging, but binary data can be, well, interesting. You need to
know about concepts such as endianness (the order in which your computer’s processor
stores the bytes of a multibyte value) and sign bits for integers. You might need to delve
into binary file formats or network packets to extract or even change data. This section
shows you the basics of binary data wrangling in Python.
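Here is a minimal sketch of those two ideas. It uses the built-in int.to_bytes() and int.from_bytes() methods, which this section doesn't otherwise cover; they are just a convenient way to see the raw bytes. The same integer produces different byte sequences depending on byte order, and the same byte produces different integers depending on whether its top bit is treated as a sign bit:

```python
# 1000 decimal is 0x03E8, so it needs two bytes.
value = 1000
big = value.to_bytes(2, byteorder='big')        # most significant byte first
little = value.to_bytes(2, byteorder='little')  # least significant byte first
print(big)     # b'\x03\xe8'
print(little)  # b'\xe8\x03'

# One byte, two interpretations: the top bit is either a value bit or a sign bit.
raw = b'\xff'
as_unsigned = int.from_bytes(raw, byteorder='big', signed=False)
as_signed = int.from_bytes(raw, byteorder='big', signed=True)
print(as_unsigned)  # 255
print(as_signed)    # -1
```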

# bytes and bytearray

Python 3 introduced two types of sequences of eight-bit integers, each with possible values from 0 to 255:
• bytes is immutable, like a tuple of bytes
• bytearray is mutable, like a list of bytes

Beginning with a list called blist, this next example creates a bytes variable called
the_bytes and a bytearray variable called the_byte_array:

In [4]:
blist = [1, 2, 3, 255]
the_bytes = bytes(blist)
print(the_bytes)

the_byte_array = bytearray(blist)
print(the_byte_array)


b'\x01\x02\x03\xff'
bytearray(b'\x01\x02\x03\xff')
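Both types behave like other Python sequences. This short sketch, reusing the blist values, shows that indexing returns a plain int, slicing keeps the original type, and values outside 0 to 255 are rejected with a ValueError:

```python
the_bytes = bytes([1, 2, 3, 255])
the_byte_array = bytearray([1, 2, 3, 255])

# Indexing either type gives a plain int between 0 and 255.
print(the_bytes[3], the_byte_array[3])  # 255 255

# Slicing keeps the type: bytes stays bytes, bytearray stays bytearray.
print(the_bytes[1:3])       # b'\x02\x03'
print(the_byte_array[1:3])  # bytearray(b'\x02\x03')

# Values that don't fit in eight bits are rejected.
try:
    bytes([256])
except ValueError:
    print('256 does not fit in a byte')
```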


# NOTE:

The representation of a bytes value begins with a b and a quote
character, followed by hex sequences such as \x02 or ASCII characters, and ends with a matching quote character. Python converts
the hex sequences or ASCII characters to small integers, but when it
displays a bytes value it shows bytes that correspond to printable ASCII
characters as those characters:

print(b'\x61')
=> result:
b'a'

print(b'\x01abc\xff')
=> result:
b'\x01abc\xff'


In [5]:
print(b'\x61')
print(b'a')

b'a'
b'a'
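A few more checks make the display rule concrete. This sketch compares an escape with its ASCII equivalent, uses list() to reveal the integers underneath, and shows that only printable ASCII (plus the \t, \n and \r escapes) is displayed as characters:

```python
# An escape sequence and the equivalent ASCII character are the same byte.
same = b'\x61' == b'a'
print(same)  # True

# list() exposes the underlying integers.
values = list(b'\x01abc\xff')
print(values)  # [1, 97, 98, 99, 255]

# Printable ASCII is shown as characters; tab, newline and carriage return
# use their familiar escapes; everything else is shown as \xNN.
mixed = bytes([9, 10, 13, 65, 127])
print(mixed)  # b'\t\n\rA\x7f'
```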


### This next example demonstrates that you can’t change a bytes variable:

In [2]:
blist = [1, 2, 3, 255]
the_bytes = bytes(blist)
the_bytes[1] = 127

TypeError: 'bytes' object does not support item assignment
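If you do need a modified bytes value, one approach (a sketch, not the only option) is to build a new object from slices of the old one:

```python
the_bytes = bytes([1, 2, 3, 255])

# A bytes object can't be modified in place, but slicing and concatenation
# build a new bytes object that contains the change.
changed = the_bytes[:1] + bytes([127]) + the_bytes[2:]
print(changed)    # b'\x01\x7f\x03\xff'
print(the_bytes)  # b'\x01\x02\x03\xff' (the original is unchanged)
```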

### But a bytearray variable is mellow and mutable:

In [6]:
blist = [1, 2, 3, 255]
the_byte_array = bytearray(blist)

print(the_byte_array)

the_byte_array[1] = 127
the_byte_array[0] = 127

print(the_byte_array)

bytearray(b'\x01\x02\x03\xff')
bytearray(b'\x7f\x7f\x03\xff')
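Because it is mutable, a bytearray also supports list-style operations such as append(), extend() and del. This sketch grows and shrinks the same four-byte array, then converts it back to an immutable bytes object:

```python
the_byte_array = bytearray([1, 2, 3, 255])

the_byte_array.append(4)            # add one integer at the end
the_byte_array.extend(b'\x05\x06')  # add several bytes at the end
del the_byte_array[0]               # remove the first byte
print(the_byte_array)  # bytearray(b'\x02\x03\xff\x04\x05\x06')

# bytes() makes an immutable copy, e.g. to use as a dict key.
frozen = bytes(the_byte_array)
print(frozen)          # b'\x02\x03\xff\x04\x05\x06'
```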


### Each of these would create a 256-element result, with values from 0 to 255:

In [6]:
the_bytes = bytes(range(0, 256))
print(the_bytes)
the_byte_array = bytearray(range(0, 256))
print(the_byte_array)

b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x

In [22]:
the_bytes = bytes(range(33, 127))  # printable ASCII, from '!' (33) to '~' (126)
print(the_bytes)

b'!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~'


In [12]:
print(b'\x1f')

b'\x1f'
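Long reprs like the 256-byte output above are hard to scan. As a sketch, the bytes.hex() method and the bytes.fromhex() class method convert between bytes and a plain hex string:

```python
data = bytes([1, 2, 3, 255])

# hex() gives two hex digits per byte, which is easier to scan than the repr.
as_hex = data.hex()
print(as_hex)  # 010203ff

# bytes.fromhex() goes the other way; whitespace between pairs is allowed.
round_trip = bytes.fromhex('01 02 03 ff')
print(round_trip)  # b'\x01\x02\x03\xff'
```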


# Convert Binary Data with struct

As you’ve seen, Python has many tools for manipulating text. Tools for binary data
are much less prevalent. The standard library contains the struct module, which
handles data similar to structs in C and C++. Using struct, you can convert binary
data to and from Python data structures.
Let’s see how this works with data from a PNG file—a common image format that
you’ll see along with GIF and JPEG files. We’ll write a small program that extracts the
width and height of an image from some PNG data.

In [1]:
import struct
valid_png_header = b'\x89PNG\r\n\x1a\n'
data = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR' + \
b'\x00\x00\x00\x9a\x00\x00\x00\x8d\x08\x02\x00\x00\x00\xc0'
if data[:8] == valid_png_header:
    width, height = struct.unpack('>LL', data[16:24])
    print('Valid PNG, width', width, 'height', height)
else:
    print('Not a valid PNG')

Valid PNG, width 154 height 141



Here’s what this code does:
• data contains the first 30 bytes from the PNG file. To fit on the page, I joined two
byte strings with + and the continuation character (\).
• valid_png_header contains the eight-byte sequence that marks the start of a
valid PNG file.
• width is extracted from bytes 16–19, and height from bytes 20–23.
The >LL is the format string that instructs unpack() how to interpret its input byte
sequences and assemble them into Python data types. Here’s the breakdown:
• The > means that integers are stored in big-endian format.
• Each L specifies a four-byte unsigned long integer.
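To see what each part of the format string controls, this sketch unpacks the same four bytes (the width field from data) with both byte orders, and compares uppercase L (unsigned) with lowercase l (signed). struct.calcsize() confirms that >LL consumes eight bytes:

```python
import struct

chunk = b'\x00\x00\x00\x9a'

# '>' reads the bytes as big-endian, '<' as little-endian.
big = struct.unpack('>L', chunk)
little = struct.unpack('<L', chunk)
print(big)     # (154,)
print(little)  # (2583691264,)

# Uppercase L is unsigned; lowercase l treats the top bit as a sign bit.
all_ones = b'\xff\xff\xff\xff'
print(struct.unpack('>L', all_ones))  # (4294967295,)
print(struct.unpack('>l', all_ones))  # (-1,)

# calcsize() reports how many bytes a format consumes.
print(struct.calcsize('>LL'))  # 8
```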

## You can examine each four-byte value directly:


In [2]:
print(data[16:20])

print(data[20:24])

b'\x00\x00\x00\x9a'
b'\x00\x00\x00\x8d'


Big-endian integers have the most significant bytes to the left. Because the width and
height are each less than 256, they fit into the last byte of each sequence. You can
verify that these hex values match the expected decimal values:


In [3]:
0x9a

154

In [4]:
0x8d

141
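The big-endian rule can also be checked by hand. In this sketch, each byte is folded into a running total that is multiplied by 256 at every step, so the leftmost byte ends up with the largest weight; int.from_bytes() (not used elsewhere in this section) gives the same answer:

```python
width_bytes = b'\x00\x00\x00\x9a'

# Big-endian: the leftmost byte is the most significant, so each step
# shifts the running total left by eight bits (multiplies by 256).
total = 0
for byte in width_bytes:
    total = total * 256 + byte
print(total)  # 154

# int.from_bytes() performs the same computation.
print(int.from_bytes(width_bytes, byteorder='big'))  # 154
```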

When you want to go in the other direction and convert Python data to bytes, use the
struct pack() function:

In [5]:
import struct
struct.pack('>L', 154)

b'\x00\x00\x00\x9a'

In [6]:
struct.pack('>L', 141)

b'\x00\x00\x00\x8d'
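pack() accepts the same multi-value format strings as unpack(), so the two functions round-trip. This sketch rebuilds the eight width and height bytes from the PNG example and shows how the little-endian < changes the layout:

```python
import struct

# Several values can be packed with one format string.
packed = struct.pack('>LL', 154, 141)
print(packed)  # b'\x00\x00\x00\x9a\x00\x00\x00\x8d'

# unpack() reverses pack() when given the same format.
print(struct.unpack('>LL', packed))  # (154, 141)

# The byte-order character changes the layout of the result.
print(struct.pack('<L', 154))  # b'\x9a\x00\x00\x00'
```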