# Working With Bytes

## Strings, Bytes and Unicode

In Python 3, every `str` is a sequence of Unicode code points. Storing raw byte values in a `str` instance is neither easy nor advisable, because a single character may encode to more than one byte.

In [7]:
# Add unicode characters
u = chr(40960) + '\u0394abcd' + chr(1972)
print(u)

u.encode('utf-8')

ꀀΔabcd޴


b'\xea\x80\x80\xce\x94abcd\xde\xb4'

The takeaway is that Unicode strings are the wrong tool for raw byte storage.
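To make the mismatch between characters and bytes concrete, the following minimal sketch compares the length of a one-character string with the length of its UTF-8 encoding. The variable names are illustrative only.

```python
# One character can occupy several bytes once it is encoded.
text = '\u0394'                  # GREEK CAPITAL LETTER DELTA
encoded = text.encode('utf-8')

print(len(text))      # 1, one code point
print(len(encoded))   # 2, two bytes in UTF-8
print(list(encoded))  # [206, 148]

# Decoding with the same codec recovers the original string.
print(encoded.decode('utf-8') == text)  # True
```

The string's length counts code points, while the encoded length counts bytes, so the two only agree for pure ASCII text.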

## The `bytes()` Type

*class* **`bytes`**([*source*[, *encoding*[, *errors*]]])

A call to `bytes()` returns a new `bytes` object, an immutable sequence of integers in the range `0 <= x < 256`. `bytes` is the immutable counterpart of `bytearray` (see below): it has the same non-mutating methods and the same indexing and slicing behavior.

`bytes` objects can also be written as literals using the `b` prefix, for example `b'abc'`.
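As a short illustration of literals and of the sequence behavior described above, the sketch below indexes and slices a `bytes` literal and then attempts an in-place change. The name `literal` is just a placeholder.

```python
# A bytes literal may contain only ASCII characters; other values need escapes.
literal = b'abc\xff'

print(literal[0])    # 97, indexing yields an int
print(literal[0:1])  # b'a', slicing yields a bytes object
print(len(literal))  # 4

# bytes is immutable, so item assignment raises TypeError.
try:
    literal[0] = 65
except TypeError as exc:
    print('TypeError:', exc)
```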

In [27]:
a = bytes("foobar".encode('utf-8'))
print(a)

b = bytes("foobar\u0394", 'utf-8')
print(b)

c = bytes(1)
print(c)

d = bytes([1,2,3])
print(d)


b'foobar'
b'foobar\xce\x94'
b'\x00'
b'\x01\x02\x03'
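The third and fourth calls above are easy to confuse: an integer argument sets the length of a zero-filled object, while an iterable supplies the byte values themselves. A minimal sketch of the difference, with illustrative variable names:

```python
zeros = bytes(4)     # integer argument: four zero bytes
single = bytes([4])  # iterable argument: one byte with value 4

print(zeros)   # b'\x00\x00\x00\x00'
print(single)  # b'\x04'

# Every element must lie in the range 0 <= x < 256.
try:
    bytes([256])
except ValueError as exc:
    print('ValueError:', exc)
```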


## The `bytearray()` Type

*class* **`bytearray`**([*source*[, *encoding*[, *errors*]]])

A call to `bytearray()` returns a new array of bytes. The `bytearray` class is a mutable sequence of integers in the range `0 <= x < 256`. It has most of the usual methods of mutable sequences, as well as most methods that the `bytes` type has.

In [28]:
a = bytearray("foobar".encode('utf-8'))
print(a)

b = bytearray("foobar\u0394", 'utf-8')
print(b)

c = bytearray(1)
print(c)

d = bytearray([1,2,3])
print(d)

bytearray(b'foobar')
bytearray(b'foobar\xce\x94')
bytearray(b'\x00')
bytearray(b'\x01\x02\x03')
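The constructor calls above mirror those of `bytes`; the practical difference is mutability. The following sketch, using a placeholder buffer named `buf`, shows a few in-place operations and a conversion back to an immutable `bytes` object.

```python
buf = bytearray(b'foobar')

buf[0] = ord('F')   # item assignment takes an int in the range 0..255
buf.append(0x21)    # append a single byte, here '!'
buf.extend(b'??')   # extend with any iterable of byte values
del buf[-1]         # remove the last byte again

print(buf)          # bytearray(b'Foobar!?')

frozen = bytes(buf) # immutable copy of the current contents
print(frozen)       # b'Foobar!?'
```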


## Useful Functions

Besides the byte storage types, Python provides built-in functions that are useful for working with byte and bit representations.

### `bin(x)`

Convert an integer number to a binary string prefixed with `0b`. The result is a valid Python expression. If `x` is not a Python `int` object, it has to define an `__index__()` method that returns an integer.


In [29]:
bin(14)

'0b1110'
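A few further cases may help here: negative numbers, output without the `0b` prefix via the built-in `format()`, and an object that supplies `__index__()`. The `Flags` class below is hypothetical and exists only to demonstrate the `__index__()` rule.

```python
print(bin(-14))           # -0b1110
print(format(14, 'b'))    # 1110, no prefix
print(format(14, '08b'))  # 00001110, zero-padded to 8 digits


class Flags:
    """Hypothetical wrapper used only to demonstrate __index__()."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


print(bin(Flags(5)))      # 0b101
```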

### `int(x=0)`, `int(x, base=10)`

Return an integer object constructed from a number or string `x`, or return `0` if no arguments are given. If `x` is a number, return `x.__int__()`. For floating point numbers, this truncates towards zero.
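The numeric cases can be checked quickly. This is a minimal sketch with illustrative variable names; note that truncation towards zero differs from rounding down for negative values.

```python
nothing = int()      # no arguments
positive = int(3.7)  # fractional part dropped
negative = int(-3.7) # truncates towards zero, not towards -infinity

print(nothing)   # 0
print(positive)  # 3
print(negative)  # -3
```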

If `x` is not a number or if `base` is given, then `x` must be a string, `bytes`, or `bytearray` instance representing an integer literal in radix `base`.

A base-`n` literal consists of the digits `0` to `n-1`, with `a` to `z` (or `A` to `Z`) having values `10` to `35`. The default base is `10`. The allowed values are `0` and `2`–`36`. Base-`2`, -`8`, and -`16` literals can be optionally prefixed with `0b/0B`, `0o/0O`, or `0x/0X`, as with integer literals in code.

Base `0` means the string is interpreted exactly as a code literal, so the actual base is 2, 8, 10, or 16. As a result, `int('010', 0)` is not legal, while `int('010')` and `int('010', 8)` both are.

In [33]:
i = int("100101101")
print(i)

i = int("100101101", 2)
print(i)

100101101
301
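The base rules described in this section can be exercised with a handful of calls. The sketch below covers optional prefixes, a `bytes` argument, base `0`, and the round trip with `bin()`; the inputs are arbitrary examples.

```python
print(int('0b1110', 2))  # 14, the prefix is optional for base 2
print(int('ff', 16))     # 255
print(int('0x1F', 0))    # 31, base 0 infers the base from the prefix
print(int(b'42'))        # 42, bytes are accepted as well as str
print(int('010'))        # 10, leading zeros are fine in base 10
print(int('010', 8))     # 8

# With base 0 the string must be a valid code literal, so '010' is rejected.
try:
    int('010', 0)
except ValueError as exc:
    print('ValueError:', exc)

# bin() and int(..., 2) are inverses of each other.
print(int(bin(301), 2))  # 301
```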
