# Python data structures

This section covers some of Python's key built-in data structures: their properties, how CPython implements them, and how their common operations scale.



## `list`

### Key properties

- Mutable: contents can change
- Contents can be of differing types 

### Implementation

The CPython implementation is a variable-length array rather than a linked list.
The list head structure holds:

1. a pointer to a contiguous array of references to other objects
2. the current length of the array

Because the references sit in a contiguous array, indexing a list takes the same time regardless of the size of the list or the index requested.

The array is resized as items are appended or removed. When an append triggers a resize, some extra space is allocated so that repeated appends do not each need a resize.
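
The over-allocation described above can be observed, as a rough sketch, by watching the size of a list object while appending to it. This assumes CPython; the exact growth pattern is an implementation detail and varies between versions, so only the count of resizes is printed.

```python
import sys

# Track the size in bytes of a list object as items are appended.
# sys.getsizeof covers the list and its array of references,
# not the objects that the references point to.
items = []
sizes = [sys.getsizeof(items)]
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# Count how many appends actually changed the allocated size.
resizes = sum(1 for before, after in zip(sizes, sizes[1:]) if after != before)
print(f"{len(items)} appends triggered {resizes} resizes")
```

If every append resized the array, the two numbers would be equal; with over-allocation the resize count should be much smaller.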

### Performance / complexity

| Operation         | Example            | Complexity class     | Notes |
| :-------------    |:-------------      |---------------       |-------------------------------|
| Initialisation    | `list(...)`        | $O(N)$               | depends on the number of elements |
| Retrieve by index | `l[i]`             | $O(1)$               | |
| Retrieve slice    | `l[a:b]`           | $O(b-a)$             | `l[1:5]` copies 4 items; `l[:]` is $O(N)$ |
| Write at index    | `l[i] = 0`         | $O(1)$               | |
| Write slice       | `l[a:b] = ...`     | $O(N)$               |  |
| List length       | `len(l)`           | $O(1)$               | |
| Append item       | `l.append(5)`      | $O(1)$               | amortised; an occasional resize costs $O(N)$ |
| Pop               | `l.pop()`          | $O(1)$               | same as `l.pop(-1)`, popping at the end |
| Pop by index      | `l.pop(i)`         | $O(N)$               | $O(N-i)$; `l.pop(0)` is $O(N)$ |
| Extend            | `l.extend(...)`    | $O(len(...))$        | depends only on len of extension |
| check ==, !=      | `l1 == l2`         | $O(N)$               | |
| Search            | `x in/not in l`    | $O(N)$               | linearly searches list  |
| Copy              | `l.copy()`         | $O(N)$               | Same as l[:] which is O(N) |
| Remove            | `l.remove(...)`    | $O(N)$               |  |
| Delete            | `del l[i]`         | $O(N)$               | depends on i; O(N) in worst case |
| Reverse           | `l.reverse()`      | $O(N)$               | |
| Iteration         | `for v in l:`      | $O(N)$               | Worst: no return/break in loop |
| Sort              | `l.sort()`         | $O(N \log N)$        | `key`/`reverse` mostly do not change this |
| Multiply          | `k*l`              | $O(k N)$             | `5*l` is $O(N)$; `len(l)*l` is $O(N^2)$ |
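
As a rough illustration of the difference between `l.pop()` and `l.pop(0)` in the table, the sketch below times emptying a list from each end. The helper names and the list size are arbitrary choices for this example, and the timings depend on the machine, so no expected values are shown.

```python
from timeit import timeit

def drain_from_end(n):
    """Empty a list of n items by popping from the end: O(1) per pop."""
    items = list(range(n))
    while items:
        items.pop()
    return len(items)

def drain_from_front(n):
    """Empty a list of n items by popping from the front: O(N) per pop."""
    items = list(range(n))
    while items:
        items.pop(0)  # every remaining reference shifts down by one
    return len(items)

n = 10_000  # arbitrary size chosen for illustration
end_time = timeit(lambda: drain_from_end(n), number=3)
front_time = timeit(lambda: drain_from_front(n), number=3)
print(f"pop():  {end_time:.4f} s")
print(f"pop(0): {front_time:.4f} s")
```

Doubling `n` should roughly double the first timing and roughly quadruple the second.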

## `tuple`

The tuple in Python is an immutable sequence, much like a list that cannot be changed. As with the list, it is implemented as an array, but because it is immutable, its size is fixed when it is created.

### Key properties

- Immutable: once created, a tuple cannot change which objects it refers to. To get a tuple with modified contents, we must create a new tuple (and should consider why we are using a tuple if we need to do this)
- Contents can be of different types
- Contents of tuples can be mutable: a tuple could contain a list as one of its elements
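
The properties above can be checked with a short example. The `record` tuple and its values are made up for illustration.

```python
record = ("sensor-1", [20.5, 21.0])  # a tuple holding a str and a list

# Rebinding an element of the tuple is not allowed.
try:
    record[0] = "sensor-2"
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# The list inside the tuple is still mutable: the tuple keeps referring
# to the same list object, but that object's contents can change.
record[1].append(21.7)
print(record)

# "Modifying" a tuple means building a new one.
updated = ("sensor-2",) + record[1:]
print(updated)
```

Note that `updated` shares the same inner list as `record`, since only references are copied.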

### Named tuples

Named tuples allow us to provide labels for the elements of our tuples, and to access values using these labels.


```python
from collections import namedtuple

# Define the labels for the elements of the tuples you plan to create
LabelledDataStruct = namedtuple('LabelledDataStruct', ['name', 'age', 'attr2'])

# Create some records
nt1 = LabelledDataStruct('tup-1', 12, 'large')
nt2 = LabelledDataStruct('tup-2', 14, 'small')

# Index values as with normal tuples
print('Values accessed by indexing: ', nt1[1:])

# And access values using the labels
print('Values accessed by named attributes: ', nt2.name, nt2.attr2)
```

Output:

```
Values accessed by indexing:  (12, 'large')
Values accessed by named attributes:  tup-2 small
```


## `set`


### Key properties

- Mutable: you can add / remove elements from a set
- Contents can be of different types (but more common to have a set of the same types)
- Contents must be hashable, which in practice means immutable: a set cannot contain a list, but it can contain a tuple of hashable items
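
A small example of which elements a set accepts. The `points` set is made up for illustration.

```python
points = {(0, 0), (1, 2)}   # tuples of ints are hashable, so this works
points.add((1, 2))          # adding a duplicate leaves the set unchanged
print(len(points))

try:
    points.add([3, 4])      # lists are mutable and not hashable
except TypeError as err:
    set_error = str(err)
    print("TypeError:", set_error)

# A tuple that contains a list is not hashable either.
try:
    points.add((5, [6]))
except TypeError as err:
    nested_error = str(err)
    print("TypeError:", nested_error)
```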

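### Implementation

The CPython implementation of `set` is a hash table that stores only keys, which is why membership tests, additions and removals do not depend on the size of the set.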

### Performance

| Operation         | Example            | Complexity class     | Notes |
| :-------------    |:-------------      |---------------       |-------------------------------|
| Initialisation    | `set(...)`         | $O(N)$               | Depends on the number of elements |
| Set size          | `len(s)`           | $O(1)$               | |
| Add item          | `s.add(5)`         | $O(1)$               | |
| Remove item       | `s.remove(...)`    | $O(1)$               | Scales better than list: $O(N)$ |
| Search for item   | `x in/not in s`    | $O(1)$               | Scales better than list/tuple: $O(N)$ |
| check ==, !=      | `s1 == s2`         | $O(1)$ or $O(N)$     | $O(1)$ when the sets differ in size: they cannot be equal |
| Is subset         | `s1 <= s2`         | $O(len(s1))$         | Mnemonic: the chevron points at the set whose size sets the cost |
| Is superset       | `s1 >= s2`         | $O(len(s2))$         | |
| Union             | `s1 \| s2`         | $O(len(s1)+len(s2))$ | |
| Intersection      | `s1 & s2`          | $O(len(s1)+len(s2))$ | |
| Difference        | `s1 - s2`          | $O(len(s1)+len(s2))$ | |
| Symmetric diff.   | `s1 ^ s2`          | $O(len(s1)+len(s2))$ | |
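
The operations in the table, applied to two small example sets, followed by a rough timing of membership tests on a list and on a set. The sizes and repeat counts are arbitrary, and the timings vary by machine, so they are printed without expected values.

```python
from timeit import timeit

s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}

print(sorted(s1 | s2))  # union: [1, 2, 3, 4, 5]
print(sorted(s1 & s2))  # intersection: [3, 4]
print(sorted(s1 - s2))  # difference: [1, 2]
print(sorted(s1 ^ s2))  # symmetric difference: [1, 2, 5]
print({3, 4} <= s1)     # subset test: True
print(s1 >= s2)         # superset test: False, because 5 is not in s1

# Membership: the list is scanned item by item, the set is looked up by hash.
n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # worst case for the list: every item is compared
list_time = timeit(lambda: missing in as_list, number=50)
set_time = timeit(lambda: missing in as_set, number=50)
print(f"list: {list_time:.5f} s, set: {set_time:.5f} s")
```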


## `dict`

### Key properties

- Keys must be hashable, which in practice means immutable, so that the hash function returns the same value for as long as the key is in the dict
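
To see why a changing hash is a problem, here is a sketch with a hypothetical `MutableKey` class (not part of any library) whose hash depends on state that can be modified. Once the key is mutated after insertion, the entry is still stored but can no longer be reliably found.

```python
class MutableKey:
    """Hypothetical key type whose hash depends on mutable state."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value


key = MutableKey(1)
d = {key: "stored"}

key.value = 2  # the key's hash changes after it was inserted

print(len(d))              # the entry is still in the dict
print(MutableKey(1) in d)  # old value: hash matches, equality does not
print(key in d)            # new value: lookup starts from a different hash
```

On CPython both lookups are expected to fail, although the result of the last one depends on implementation details of the hash table.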

### Implementation

The CPython implementation of `dict` is a **resizable hash table**.

The hash function is **locality-insensitive**: for most key types, such as strings, even a small difference between two keys can produce very different hash values.

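A key is assigned to a slot in the hash array by reducing its hash value to the size of the table. Because CPython keeps the table size a power of two, this amounts to taking the hash modulo the table size. When two keys land on the same slot, other slots are probed.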
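
A toy model of the slot assignment described above. The `slot_index` helper and the table size of 8 are assumptions made for illustration; the real `dict` adds collision handling and resizing on top of this idea.

```python
def slot_index(key, table_size=8):
    """Toy model: map a key to a slot via hash(key) modulo the table size."""
    return hash(key) % table_size

# Small non-negative ints hash to themselves in CPython, so these results
# are reproducible; string hashes are randomised per process by default.
for key in (2, 10, 3):
    print(key, "->", slot_index(key))
```

Here `2` and `10` both map to slot 2, which is the kind of collision a real hash table has to resolve.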

### Performance / complexity

| Operation         | Example            | Complexity class     | Notes |
| :-------------    |:-------------      |---------------       |-------------------------------|
| Initialisation    | `dict(...)`        | $O(N)$               | depends on the number of inputs |
| Retrieve by key   | `d[k]`             | $O(1)$               |  |
| Retrieve w get()  | `d.get(k)`         | $O(1)$               |  |
| Insert k,v pair   | `d[k] = v`         | $O(1)$               | also updates the value of an existing key |
| Delete key        | `del d[k]`         | $O(1)$               |  |
| List keys / values| `d.keys()` / `d.values()` | $O(1)$        | returns a view; `list(d.keys())` is $O(N)$ |
| Iteration         | `for k in d:`      | $O(N)$               |  |
| Pop               | `d.pop(k)`         | $O(1)$               |  |
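
A short example of the operations in the table, using a made-up `ages` dict. It assumes Python 3.7 or later, where dicts preserve insertion order.

```python
ages = {"ann": 31, "bob": 27}

keys = ages.keys()         # O(1): a view onto the dict, not a copy
ages["cy"] = 45            # the existing view reflects the new key
print(list(keys))          # building a list from the view is O(N)

# get() and pop() accept a default instead of raising KeyError.
print(ages.get("dan"))     # None
print(ages.pop("dan", 0))  # 0
print(ages.pop("bob"))     # 27
print(len(ages))           # 2
```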


## Summary

| type | sequence| mutable |
|:----:|:-------:|:-------:|
| tuple| yes     | no      |
| list | yes     | yes     |
| dict | no      | yes     |
| set  | no      | yes     |
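
The table can be cross-checked against the abstract base classes in `collections.abc`. This is one possible way to test the two properties, not the only one.

```python
from collections.abc import MutableMapping, MutableSequence, MutableSet, Sequence

summary = {}
for obj in ((), [], {}, set()):
    name = type(obj).__name__
    is_sequence = isinstance(obj, Sequence)
    is_mutable = isinstance(obj, (MutableSequence, MutableMapping, MutableSet))
    summary[name] = (is_sequence, is_mutable)
    print(f"{name:<5} sequence={is_sequence!s:<5} mutable={is_mutable}")
```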


## (Re)sources

[Python design docs](https://docs.python.org/3.8/faq/design.html#how-are-dictionaries-implemented-in-cpython)

[Complexity of Python Operations](https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt)