## Builtins Module: The Mutable Byte String Class (bytearray)

The previous notebook examined the immutable byte string (```bytes```). Immutable means that once an instance is created, it cannot be modified. The ```bytes``` class has a mutable counterpart, the ```bytearray```. The ```bytearray``` can be conceptualised as the spreadsheet on the left where each field can be entered and mutated to a new value or deleted. The ```bytes``` on the other hand can be conceptualised as the pdf on the right, where the data can be read but not modified.

<img src='./images/img_001.png' alt='img_001' width='450'/>

## Categorize_Identifiers Module

This notebook will use the functions ```dir2```, ```variables``` and ```view``` from the custom module ```categorize_identifiers```, which is found in the same directory as this notebook file. ```dir2``` is a variant of ```dir``` that groups identifiers into a ```dict``` under categories and ```variables``` is an IPython-based variable inspector. ```view``` is used to view a ```Collection``` in more detail:

In [None]:
from categorize_identifiers import dir2, variables, view

## Initialisation Signature

The docstring of the initialisation signature can be examined:

In [None]:
bytearray?

Notice that it is similar to that of the ```bytes``` class. Construction from a ```tuple``` iterable of ```int``` values between ```0:256``` gives:

In [None]:
bytearray((20, 104, 101, 108, 108, 111, 129))

Notice that the formal representation shown in the cell output casts a ```bytes``` instance to a ```bytearray```.
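The other construction routes listed in the docstring can be compared with a short sketch. The variable names below are illustrative only and are not used elsewhere in this notebook:

```python
# Several ways to construct the same bytearray (illustrative names)
from_ints = bytearray((20, 104, 101, 108, 108, 111, 129))  # iterable of int in 0:256
from_bytes = bytearray(b'\x14hello\x81')                   # copy of a bytes instance
from_hex = bytearray.fromhex('14 68 65 6c 6c 6f 81')       # hexadecimal str

zeros = bytearray(3)                                       # three null bytes
encoded = bytearray('hello', encoding='ascii')             # a str requires an encoding

print(from_ints)
print(from_ints == from_bytes == from_hex)
```

All three of the first instances hold the same seven bytes, so the equality check returns ```True```.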

## Identifiers

The identifiers for the ```bytearray``` class can be examined. Notice it has datamodel methods consistent with its immutable counterpart, the ```bytes``` class:

In [None]:
dir2(bytearray, bytes, consistent_only=True)

Each of these identifiers behaves consistently between the two classes. All the methods listed above are immutable and therefore have a ```return``` value. Where a method in the ```bytes``` class returns a ```bytes``` instance, its counterpart in the ```bytearray``` class returns a ```bytearray``` instance. The methods which ```return``` a single byte, which is an ```int``` between ```0:256```, are consistent. Because the ```bytearray``` is mutable, a ```bytearray``` instance can be changed and therefore cannot be hashed. Its ```__hash__``` attribute is therefore ```None```, unlike that of ```bytes```:

In [None]:
bytearray.__hash__ == None

In [None]:
bytes.__hash__ == None

The ```bytearray``` has supplementary mutable methods. Most of these have no ```return``` value and instead directly alter the instance data inplace. The exception is ```pop```, which pops off a value, altering the instance data, and returns the popped value:

In [None]:
dir2(bytearray, bytes, unique_only=True)

To recap, the ```bytes``` class is consistent with the design pattern of the ```str``` class. This design pattern uses the following abstract base classes:

* object
* Container
* Hashable
* Collection
  * Sized
  * Iterable
* Sequence


The ```bytearray``` class is a ```MutableSequence``` which follows most of the abstract classes in the design pattern seen above but lacks the abstract class ```Hashable``` because it can be mutated. In addition it has the additional abstract class ```MutableSequence```:

* object
* Container
* ~~Hashable~~
* Collection
  * Sized
  * Iterable
* Sequence
* MutableSequence
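The two lists above can be checked against the abstract base classes in the standard library module ```collections.abc```. This is a minimal sketch that only tests three of the classes listed:

```python
from collections.abc import Hashable, MutableSequence, Sequence

# Print which abstract base classes each class is a (virtual) subclass of
for cls in (bytes, bytearray):
    print(cls.__name__,
          'Sequence:', issubclass(cls, Sequence),
          'MutableSequence:', issubclass(cls, MutableSequence),
          'Hashable:', issubclass(cls, Hashable))
```

Both classes report ```True``` for ```Sequence```. Only ```bytearray``` reports ```True``` for ```MutableSequence``` and only ```bytes``` reports ```True``` for ```Hashable```.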

## Hashable

Because the ```bytes``` instance is immutable, it has a value that cannot be changed and a corresponding hash value. The ```bytearray``` is mutable and has no hash value:

In [None]:
hash(b'\x14hello\x81')

Attempting to do the same with the ```bytearray``` will result in a ```TypeError``` because a ```bytearray``` is not hashable:

```python
hash(bytearray(b'\x14hello\x81'))
```
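To see the error without stopping the notebook, the call can be wrapped in a ```try``` block. The names ```data``` and ```message``` below are illustrative:

```python
data = bytearray(b'\x14hello\x81')

try:
    hash(data)                 # a bytearray has no hash value
except TypeError as error:
    message = str(error)       # keep the error text for inspection
    print(message)
```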

A consequence of a ```bytearray``` instance being unhashable is that a ```bytes``` instance can be used as a key in a ```dict``` instance and a ```bytearray``` instance cannot:

In [None]:
mapping = {b'r': 'red',
           b'g': 'green',
           b'b': 'blue'}

In [None]:
mapping[b'r']

Conceptualise the ```dict``` instance as a grouping of unique lockers where each locker contains a reference to a value. Each unique locker has a lock and a unique ```key``` is designed to fit in the lock. Distorting the ```key``` will prevent the key from working and therefore a mutable datatype cannot be used as a ```key```. Attempting to do so will raise a ```TypeError```:

```python
mapping = {bytearray(b'r'): 'red',
           bytearray(b'g'): 'green',
           bytearray(b'b'): 'blue'}
```
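When the data to look up happens to be held in a ```bytearray```, one possible workaround is to cast it to an immutable ```bytes``` copy first. The names ```colours``` and ```key_ba``` are illustrative:

```python
colours = {b'r': 'red',
           b'g': 'green',
           b'b': 'blue'}

key_ba = bytearray(b'r')

# bytes(key_ba) is an immutable, hashable copy with the same value
colour = colours[bytes(key_ba)]
print(colour)
```

The copy is independent of ```key_ba```, so later mutation of the ```bytearray``` does not affect the ```dict```.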

## Memory Allocation

Suppose the following immutable ```bytes``` and mutable ```bytearray``` instances are created:

In [None]:
greeting_b = b'hello'
greeting_ba = bytearray(b'hello')

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)

Because these are immutable and mutable counterparts, the values at each index look the same:

In [None]:
view(greeting_b)

In [None]:
view(greeting_ba)

If the memory occupied by the two instances is examined:

In [None]:
import sys

In [None]:
sys.getsizeof(greeting_b)

In [None]:
sys.getsizeof(greeting_ba)

Notice that the memory allocation in bytes is larger for the mutable counterpart. The methods are stored on the class rather than on each instance, so they do not account for the difference. Instead, mutability requires additional memory overhead, such as the bookkeeping for a buffer that can be resized. If the length of the mutable ```bytearray``` instance is examined, it has a length of 5 bytes:

In [None]:
len(greeting_ba)

However it has a memory allocation of 6 bytes:

In [None]:
greeting_ba.__alloc__()

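The allocation is one byte larger than the length. In CPython this spare byte typically holds a trailing null byte, and when an instance grows the allocation is usually increased by more than is immediately needed. Having spare allocated capacity under the hood optimises some of the ```bytearray``` instance's mutable methods such as ```append```.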
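The relationship between the length and the allocation can be observed while an instance grows. This is a rough sketch; the exact allocation values are an implementation detail of the interpreter and may differ between versions, so none are assumed here. The names ```buffer``` and ```sizes``` are illustrative:

```python
buffer = bytearray(b'hello')
sizes = [(len(buffer), buffer.__alloc__())]

# Append one byte at a time and record length against allocation
for _ in range(20):
    buffer.append(ord('!'))
    sizes.append((len(buffer), buffer.__alloc__()))

for length, allocated in sizes:
    print(length, allocated)
```

The allocation should never be smaller than the length, and it normally does not change on every single ```append```.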

## Indexing

Both the ```bytes``` and ```bytearray``` classes have the immutable datamodel method ```__getitem__``` implemented, which means a byte can be retrieved using square brackets:

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)

In [None]:
ord('h')

In [None]:
greeting_b[0]

In [None]:
greeting_ba[0]

However, only the mutable counterpart has ```__setitem__``` implemented. This means a value at an index can be reassigned:

In [None]:
ord('H')

In [None]:
greeting_ba[0] = 72

Notice that this mutates the ```bytearray``` instance inplace:

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)

In [None]:
view(greeting_ba)

Notice that the ID is the same because it is the same instance but the value of the first letter is updated from the ASCII letter ```h``` to ```H```. The mutable counterpart also has ```__delitem__``` defined, meaning the ```del``` keyword can be used on the referenced byte:

In [None]:
del greeting_ba[2]

In [None]:
view(greeting_ba)

Notice the length is now ```4``` bytes and the reference to the value ```108``` that was at index ```2``` has been removed. The reference to the value ```108``` that was at index ```3``` is now at index ```2``` and so on.

A slice can also be accessed:

In [None]:
greeting_ba[0:2]

A slice can also be reassigned using another ```bytearray``` instance:

In [None]:
greeting_ba[0:2] = bytearray(b'ki')

Once again, the ```bytearray``` is mutated:

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)
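The indexing operations in this section can be summarised in one cell, alongside the error raised when the same assignment is attempted on the immutable counterpart. The names ```word_b```, ```word_ba``` and ```identity``` are illustrative:

```python
word_b = b'hello'
word_ba = bytearray(b'hello')
identity = id(word_ba)

word_ba[0] = ord('H')      # __setitem__ -> bytearray(b'Hello')
del word_ba[2]             # __delitem__ -> bytearray(b'Helo')
word_ba[0:2] = b'ki'       # slice assignment -> bytearray(b'kilo')

print(word_ba, id(word_ba) == identity)

try:
    word_b[0] = ord('H')   # bytes has no __setitem__
except TypeError as error:
    print(error)
```

The ID comparison is ```True``` throughout because every operation mutates the same ```bytearray``` instance.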

## Inplace Operators

There is a subtle difference between the "inplace" operators ```+=``` and ```*=``` in the mutable and immutable instances. Returning to:

In [None]:
greeting_b = b'hello'
greeting_ba = bytearray(b'hello')

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)

When the "inplace" operation is used with the immutable instance:

In [None]:
greeting_b += b' world!'

The instance ID is updated. This is because two operations have been carried out: concatenation with a return value, and then assignment of this return value to the instance name ```greeting_b```. In other words the instance name is peeled off the old instance and placed on the new instance:

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)

In comparison:

In [None]:
greeting_ba += bytearray(b' world!')

Notice that the instance ID has not changed as a single inplace operation has been performed:

In [None]:
variables(['greeting_b', 'greeting_ba'], show_id=True)

Mutable methods are generally more efficient as there is no additional memory overhead required to create a new instance. There are however some side effects to watch out for, such as unintended mutation, which may result in debugging challenges in large code bases.
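The side effect can be made visible by giving each instance a second name before the inplace operation. This is a minimal sketch with illustrative names:

```python
text_b = b'hello'
text_ba = bytearray(b'hello')

# A second name for each instance
alias_b, alias_ba = text_b, text_ba

text_b += b' world!'     # new bytes instance, name reassigned
text_ba += b' world!'    # same bytearray instance, mutated inplace

print(text_b is alias_b, alias_b)      # False b'hello'
print(text_ba is alias_ba, alias_ba)   # True bytearray(b'hello world!')
```

The second name for the ```bytes``` instance still sees the old value, whereas the second name for the ```bytearray``` instance sees the mutation.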

## Mutable Methods

The mutable counterpart has the following mutable methods: 

In [None]:
dir2(bytearray, bytes, unique_only=True, print_output=False)['method']

The mutable methods below all mutate the ```bytearray``` instance directly and have no ```return``` value:

* append
* extend
* insert
* remove
* reverse
* clear

The immutable method ```copy``` is usually a companion for mutable methods and returns a copy without modifying the original instance:

* copy

```pop``` mutates the original instance and returns the value popped:

* pop
  

If the following ```bytearray``` instance is instantiated:

In [None]:
greeting_ba = bytearray(b'hello')

In [None]:
variables(['greeting_ba'], show_id=True)

The mutable ```bytearray``` method ```append``` appends a single byte to the end of the ```bytearray``` instance:

In [None]:
greeting_ba.append?

The byte being appended is normally supplied in the form of an ordinal ```int``` instance. Recall each character has an ordinal ```int``` which can be seen using the ```ord``` function:

In [None]:
ord('!')

In [None]:
return_val = greeting_ba.append(33)

Notice that the instance is mutated inplace and the ```return``` value is therefore ```None```:

In [None]:
variables(['greeting_ba', 'return_val'], show_id=True)

A common mistake new programmers make when first encountering a mutable method is to use it in combination with reassignment:

In [None]:
greeting_ba = greeting_ba.append(33)

Because the ```return``` value of a mutable method is ```None```, the instance name gets reassigned to ```None```:

In [None]:
variables(['greeting_ba'], show_id=True)
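The two patterns can be contrasted in a short sketch: call the mutable method on its own line, or use an expression that returns a new instance when a new instance is what is wanted. The names below are illustrative:

```python
message = bytearray(b'hello')

result = message.append(ord('!'))   # mutates message, returns None
print(result)
print(message)

# An immutable operator returns a new instance and leaves message unchanged
longer = message + b'?'
print(longer)
```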

Returning to:

In [None]:
greeting_ba = bytearray(b'hello')

In [None]:
variables(['greeting_ba'], show_id=True)

The mutable ```bytearray``` method ```extend```, extends the end of the ```bytearray``` instance using another ```bytearray``` which usually spans over multiple bytes:

In [None]:
greeting_ba.extend?

In [None]:
greeting_ba.extend(bytearray(b' world!')) # No return value

In [None]:
variables(['greeting_ba'], show_id=True)

In [None]:
greeting_ba # Original instance mutated

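The mutable method ```extend``` and inplace addition ```+=``` are very similar, as both mutate the instance inplace. ```+=``` is stricter and requires the right-hand operand to be a bytes-like object such as a ```bytes``` or ```bytearray``` instance. ```extend``` on the other hand also accepts any iterable of ```int``` values between ```0:256```. For comparison, the ```+``` operator returns a new instance: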

In [None]:
greeting_ba + bytearray(b'Bye World!') # immutable return value

Note the original instance is unchanged by the immutable ```+``` operator:

In [None]:
variables(['greeting_ba'], show_id=True)

In [None]:
greeting_ba += bytearray(b'Bye World!') # mutated inplace, no return value

In [None]:
variables(['greeting_ba'], show_id=True)

In [None]:
greeting_ba.extend(b'???') # mutated inplace, no return value

In [None]:
variables(['greeting_ba'], show_id=True)
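The difference in strictness between ```+=``` and ```extend``` can be checked with a ```list``` of ```int``` values, which is an iterable but not a bytes-like object. The names ```target``` and ```iadd_error``` are illustrative:

```python
target = bytearray(b'hello')

target += b' world'        # bytes-like operand is accepted
target.extend([33, 33])    # any iterable of int in 0:256 is accepted

try:
    target += [33]         # a list is not bytes-like
except TypeError as error:
    iadd_error = str(error)
    print(iadd_error)

print(target)
```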

The mutable ```bytearray``` method ```insert```, can be used to insert a byte into the ```bytearray``` at a given index:

In [None]:
greeting_ba.insert?

In [None]:
greeting_ba = bytearray(b'hello')

In [None]:
view(greeting_ba)

The ASCII character:

In [None]:
ord('L')

Can be inserted at index ```3```:

In [None]:
greeting_ba.insert(3, 76) # mutated inplace, no return value

In [None]:
variables(['greeting_ba'], show_id=True)

The mutable method ```insert``` inserts a new byte so that the original byte at that index and all bytes at subsequent indexes are shifted along one. This behaviour is different from the mutable datamodel method ```__setitem__``` which replaces an original byte or slice with a new byte or slice. 
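The difference between inserting and replacing can be seen side by side. This sketch uses two fresh instances with illustrative names:

```python
inserted = bytearray(b'hello')
inserted.insert(3, ord('L'))   # later bytes shift along, length grows by one

replaced = bytearray(b'hello')
replaced[3] = ord('L')         # byte at index 3 is overwritten, length unchanged

print(inserted, len(inserted))   # bytearray(b'helLlo') 6
print(replaced, len(replaced))   # bytearray(b'helLo') 5
```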

The mutable ```bytearray``` method ```remove```, removes the first occurrence of a byte in the ```bytearray``` instance:

In [None]:
greeting_ba.remove?

In [None]:
view(greeting_ba)

For example the first occurrence of ```l``` can be removed:

In [None]:
ord('l')

In [None]:
greeting_ba.remove(108) # mutated inplace, no return value

In [None]:
view(greeting_ba)
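When the byte is not present, ```remove``` raises a ```ValueError```. A membership check with ```in``` is one way to guard against this. The name ```word``` is illustrative:

```python
word = bytearray(b'hello')

word.remove(ord('l'))          # only the first 108 is removed
print(word)                    # bytearray(b'helo')

if ord('z') in word:           # membership check avoids the error
    word.remove(ord('z'))

try:
    word.remove(ord('z'))      # 122 is not in the bytearray
except ValueError as error:
    print(error)
```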

The mutable ```bytearray``` method ```reverse``` can be used to reverse the order of the bytes in the ```bytearray``` instance:

In [None]:
greeting_ba.reverse?

In [None]:
greeting_ba.reverse() # mutated inplace, no return value

In [None]:
view(greeting_ba)

This should not be confused with the reversed iterator that is returned when the ```reversed``` builtins function is used:

In [None]:
backward = reversed(greeting_ba)

In [None]:
backward

In [None]:
next(backward)
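The three ways of reversing can be compared in one cell. Only the ```reverse``` method mutates the original instance; the iterator and the slice leave it unchanged. The names below are illustrative:

```python
original = bytearray(b'hello')

from_iterator = bytearray(reversed(original))   # new instance from the iterator
from_slice = original[::-1]                     # new instance from a slice
unchanged = original == bytearray(b'hello')     # True, nothing mutated yet

original.reverse()                              # mutated inplace, returns None

print(from_iterator, from_slice, unchanged)
print(original)
```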

The mutable ```bytearray``` method ```clear```, clears all bytes in the ```bytearray``` instance:

In [None]:
greeting_ba.clear?

In [None]:
greeting_ba.clear() # mutated inplace, no return value

In [None]:
variables(['greeting_ba'], show_id=True)

In [None]:
view(greeting_ba)

The ```bytearray``` method ```copy``` can be used to copy a ```bytearray``` instance. It does not mutate the original instance and instead returns a new instance:

In [None]:
bytearray.copy?

Reassigning ```greeting_ba``` to a new instance:

In [None]:
greeting_ba = bytearray(b'hello') 

In [None]:
greeting_ba_copy = greeting_ba.copy() # return value assigned to new instance name

Notice that these two ```bytearray``` instances have the same values but different IDs:

In [None]:
variables(['greeting_ba', 'greeting_ba_copy'], show_id=True)

In [None]:
greeting_ba == greeting_ba_copy # equal value

In [None]:
greeting_ba is greeting_ba_copy # but not the same instance

One ```bytearray``` instance can be modified without influencing the other:

In [None]:
greeting_ba.clear() # mutated inplace, no return value

In [None]:
variables(['greeting_ba', 'greeting_ba_copy'], show_id=True)
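A copy should not be confused with a second name for the same instance. This minimal sketch, using illustrative names, shows that assignment alone creates an alias, which follows every mutation:

```python
source = bytearray(b'hello')

alias = source             # second name, same instance
duplicate = source.copy()  # new instance, same value

source.clear()

print(alias)       # bytearray(b'')
print(duplicate)   # bytearray(b'hello')
```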

The mutable ```bytearray``` method ```pop``` pops off the last byte in the ```bytearray``` instance, mutating it inplace **and** has a ```return``` value, returning the popped value:

In [None]:
greeting_ba_copy.pop?

In [None]:
greeting_ba_copy.pop() # return value

In [None]:
variables(['greeting_ba', 'greeting_ba_copy'], show_id=True) # also mutated inplace

An index can also be selected to pop:

In [None]:
view(greeting_ba_copy)

For example index ```1```:

In [None]:
greeting_ba_copy.pop(1) # return value

In [None]:
variables(['greeting_ba', 'greeting_ba_copy'], show_id=True) # also mutated inplace
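The behaviour of ```pop``` can be recapped on a fresh instance, including the ```IndexError``` raised when there is nothing left to pop. The names below are illustrative:

```python
stack = bytearray(b'hello')

last = stack.pop()       # 111, the ordinal of 'o'
second = stack.pop(1)    # 101, the ordinal of 'e'
print(last, second, stack)

empty = bytearray()
try:
    empty.pop()          # nothing to pop
except IndexError as error:
    print(error)
```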
