# The Python Data Model

Many of Python's features rely on duck typing and the existence of protocols.

### Lengths

An object is _sized_ if it can be asked for its length. This protocol can be implemented using the `__len__` dunder.

In [1]:
len('a string') # sized

8

In [4]:
class LengthyThing:
    def __init__(self, x):
        self._x = x

    def __len__(self):
        return self._x

example_thing = LengthyThing(12)
len(example_thing)

12

### Truthiness

`None` is falsy, non-zero numbers are truthy, strings are truthy when non-empty, and so on. For other objects, truthiness is determined as follows:

1. `__bool__` - when this dunder is implemented, it determines the truthiness of the object; otherwise:

2. `__len__` - when the object is sized, it is truthy if its size is non-zero; otherwise:

3. The object is considered truthy (the assumption being that the existence of an object means some expression's value is not `None`)

In [5]:
bool(example_thing)

True

In [6]:
bool(LengthyThing(0))

False
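
The precedence between `__bool__`, `__len__`, and the default can be checked with a small sketch. The class names `AlwaysFalse` and `Plain` are hypothetical and exist only for this illustration.

```python
class AlwaysFalse:
    """Hypothetical class that defines both __bool__ and __len__."""

    def __len__(self):
        return 5  # non-zero, so this alone would make the object truthy

    def __bool__(self):
        return False  # __bool__ is consulted first and wins


class Plain:
    """Hypothetical class that defines neither __bool__ nor __len__."""


print(bool(AlwaysFalse()))  # False
print(bool(Plain()))        # True
```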

### Indexing

Indexable objects can implement `__getitem__(self, index)`, `__setitem__(self, index, value)`, and `__delitem__(self, index)`.

In [10]:
class Indexable:
    def __init__(self):
        pass

    def __getitem__(self, index):
        print(f'__getitem__ : {index}')

    def __setitem__(self, index, value):
        print(f'__setitem__ : {index} {value}')

    def __delitem__(self, index):
        print(f'__delitem__ : {index}')

In [11]:
indexable_obj = Indexable()

indexable_obj[2]
indexable_obj[82] = 'test'
del indexable_obj[11]

__getitem__ : 2
__setitem__ : 82 test
__delitem__ : 11
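
As a slightly less artificial sketch, the three methods can be backed by real storage. The `SparseRow` class below is hypothetical; treating unset positions as `0` and ignoring deletes of unset positions are assumptions made for the example, not rules of the protocol.

```python
class SparseRow:
    """Hypothetical container that stores only the positions that were set."""

    def __init__(self):
        self._cells = {}

    def __getitem__(self, index):
        # Positions that were never set read as 0 (a design choice).
        return self._cells.get(index, 0)

    def __setitem__(self, index, value):
        self._cells[index] = value

    def __delitem__(self, index):
        # Deleting an unset position is a no-op here (also a design choice).
        self._cells.pop(index, None)


row = SparseRow()
row[82] = 'test'
print(row[82])  # test
del row[82]
print(row[82])  # 0
```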


### Sequences

When an object implements both `__len__` and `__getitem__`, it follows the sequence protocol, which allows it to be iterated over without an explicit `__iter__` method.

**Note from testing:**
The implicit forward iterator calls `__getitem__` with 0, 1, 2, ... and stops only when `__getitem__` raises `IndexError`, whether or not `__len__` is implemented. `reversed`, by contrast, uses `__len__` to find the last index to start from, so it needs both methods.

In [21]:
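class SequenceThing:
    def __init__(self, x):
        self._len = x

    def __getitem__(self, index):
        # Raising IndexError is what ends implicit forward iteration
        if not 0 <= index < self._len:
            raise IndexError(index)
        return index * 2

    def __len__(self):
        return self._len

even_nums = SequenceThing(3)
list(iter(even_nums))

[0, 2, 4]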

In [23]:
list(reversed(even_nums))

[4, 2, 0]
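
The note from testing can be reproduced with a class that defines only `__getitem__`. The `Countdown` class is hypothetical; the sketch shows forward iteration working without `__len__` while `reversed` fails.

```python
class Countdown:
    """Hypothetical class with __getitem__ only (no __len__, no __iter__)."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # signals the end of forward iteration
        return 3 - index


print(list(Countdown()))  # [3, 2, 1]

try:
    reversed(Countdown())
except TypeError as exc:
    # reversed() cannot work without __len__ (or __reversed__)
    print(type(exc).__name__)  # TypeError
```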

When building sequences, we may often want to provide:

1. `__iter__(self)` - to construct custom iteration

2. `__reversed__(self)` - to construct custom reverse iteration

3. `__contains__(self, item)` - so we can use the `in` keyword on this object
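
The three optional methods listed above can be sketched on a single class. `EvenNumbers` is a hypothetical example; the point is that each method can use knowledge of the data that the generic `__getitem__` fallback does not have.

```python
class EvenNumbers:
    """Hypothetical sequence of the first `count` even numbers."""

    def __init__(self, count):
        self._count = count

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        return 2 * index

    def __iter__(self):
        # Custom forward iteration, no IndexError round trip needed.
        return (2 * i for i in range(self._count))

    def __reversed__(self):
        # Custom reverse iteration.
        return (2 * i for i in reversed(range(self._count)))

    def __contains__(self, item):
        # Constant-time membership test instead of scanning every element.
        return isinstance(item, int) and item % 2 == 0 and 0 <= item < 2 * self._count


evens = EvenNumbers(3)
print(list(evens))             # [0, 2, 4]
print(list(reversed(evens)))   # [4, 2, 0]
print(4 in evens, 5 in evens)  # True False
```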

### Slicing

Slicing allows us to 'slice' a section of a sequence out of an object. The syntax is similar to indexing, but instead of `obj[index]` it is `obj[start:stop:step]`.

When slice syntax is used, the value passed as `index` to the indexing methods such as `__getitem__` is a `slice` object, which has the attributes `start`, `stop`, and `step`.

In [24]:
class Sliceable:
    def __getitem__(self, index):
        print(index)

some_object = Sliceable()
some_object[1]
some_object[1:2:3]
some_object[1::3]

1
slice(1, 2, 3)
slice(1, None, 3)


Omitted or negative values can be resolved against the object's length with `slice.indices`, which returns a normalized `(start, stop, step)` tuple:
```Python
start, stop, step = index.indices(len(self))
```
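
To make the use of `indices` concrete, here is a minimal sketch of a `__getitem__` that handles both plain indexes and slices. The `Letters` class is hypothetical, and returning a list for slices is just one possible choice.

```python
class Letters:
    """Hypothetical wrapper around a string that supports indexing and slicing."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # indices() resolves None and negative values against the length.
            start, stop, step = index.indices(len(self))
            return [self._text[i] for i in range(start, stop, step)]
        return self._text[index]


letters = Letters('abcdef')
print(slice(-3, None).indices(6))  # (3, 6, 1)
print(letters[-3:])                # ['d', 'e', 'f']
print(letters[::-2])               # ['f', 'd', 'b']
```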

### Hashing

Hashing is a means to distill an object's value down to a single integer. Mutable objects generally shouldn't be hashable, because their value, and with it their hash, can change while they are stored in a `set` or used as a `dict` key.

In [25]:
hash('hello world')

1294980323861224649

In [26]:
hash(['hello world'])

TypeError: unhashable type: 'list'

Hashable objects implement the dunder `__hash__(self)`, which returns a single integer that accounts for all of the data that takes part in equality. Often this means combining the object's attributes into a tuple and hashing that tuple.

It's also good practice to implement `__eq__` for hashables, because two objects that compare equal must have the same hash, and objects with different hashes should not compare equal.
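
A minimal sketch of the tuple approach, using a hypothetical `Point` class whose attributes are treated as immutable by convention:

```python
class Point:
    """Hypothetical 2D point, immutable by convention."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def _key(self):
        # The data that defines both equality and the hash.
        return (self._x, self._y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash the same tuple that __eq__ compares, so equal points hash equally.
        return hash(self._key())


p, q = Point(1, 2), Point(1, 2)
print(p == q, hash(p) == hash(q))  # True True
print(len({p, q}))                 # 1
```

Note that a class which defines `__eq__` without also defining `__hash__` has its `__hash__` set to `None`, which makes its instances unhashable.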

### Comparison

Between two objects we may compare equality with `==` or `!=` and identity with `is` or `is not`.

Equality asks "do these two objects contain the same thing" whereas identity asks "are these objects literally the same object". `a is b` can be thought of as `id(a) == id(b)`.

In [28]:
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(id(a), id(b), id(c))
print(a is b)
print(a is c)
print(a == b)
print(a == c)

4517190272 4517194816 4517190272
False
True
True
True


The remaining comparison operators, which also evaluate to booleans, are the relational ones:
`<`, `>`, `<=`, `>=`

### Implementing comparisons

For comparisons of the form `self comparator other`, we can implement the following methods:

- `__lt__(self, other)` : $<$ (less than)

- `__gt__(self, other)` : $>$ (greater than)

- `__le__(self, other)` : $\le$ (less than or equal to)

- `__ge__(self, other)` : $\ge$ (greater than or equal to)

- `__eq__(self, other)` : $==$ (equal)

- `__ne__(self, other)` : $\neq$ (not equal, written `!=`)

- The operator `is` cannot be overridden

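Some comparisons can be derived from others. Implementing `__eq__` means that `!=` returns the opposite result unless `__ne__` is specified. Python also falls back to the reflected method on the other operand: if `a > b` is not handled by `a.__gt__`, Python tries `b.__lt__(a)`, and likewise `>=` falls back to `<=`. So between two instances of the same class, implementing `__lt__` also handles `>`, and `__le__` handles `>=`. However, `__lt__` and `__eq__` together do not imply `__le__`.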
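
Since `__lt__` and `__eq__` together do not imply `__le__`, one common way to avoid writing every ordering method by hand is the standard library's `functools.total_ordering` decorator, which fills in the missing ones. The `Version` class below is a hypothetical example; returning `NotImplemented` for foreign types lets Python try the other operand instead.

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical version number compared by (major, minor)."""

    def __init__(self, major, minor):
        self._key = (major, minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key == other._key

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key < other._key


old, new = Version(1, 4), Version(2, 0)
print(old < new)   # True
print(old >= new)  # False, derived by total_ordering
print(old != new)  # True, derived from __eq__
print(new > old)   # True
```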