# Salvaging Python's broken `object` by adding `__getstate__`

## Intro: Pickling and Backwards Compatibility

This unpattern is concerned with the persistence of python objects via pickle. I
realize that in the wild of python programming, comparatively few people care
about backwards compatibility in general, let alone about backwards
compatibility of pickled objects.

Ensuring the latter means that if you (or your user) saved an object of some
class, say `A` with pickle, you should be able to load it with a newer version
of the codebase. Curiously, this is especially important in the context of 
machine learning (precisely where it is routinely neglected) because one might
want to load a previously saved model even after
performing a seemingly innocent update of some ML library dependency.

Alas, as most of us learn the hard way sooner or later, there is no such thing
as an innocent update of a dependency in the python world...

The main mechanism for providing backwards compatibility for pickled objects is
the `__setstate__` magic method. If your class definition has changed between
the time it was pickled and the time it is loaded, implementing `__setstate__`
allows you to modify how the state of the object is restored.

It's not always sufficient (e.g., if the class name has changed, it won't work),
but knowing about and using `__setstate__` already covers a lot of cases. In the
last section of the article I'll add a short overview of additional techniques
for backwards compatibility when just `__setstate__` is not enough. Since these
things are actually reasonable or even required, they don't fit the "unpattern"
scheme and are therefore not the main focus. If you want to learn useful stuff,
give them a glance, but useful stuff is not what we're here for ;).

### Example of Non-problematic Persistence

Before outlining the problem, let's have a look at a case without issues:

In [1]:
from typing import Any
import pickle

class A:
    def __init__(self, foo: int):
        self.foo = foo

serialized_a = pickle.dumps(A(42))


# changing the class definition to illustrate deserialization
# We add a new default argument to the constructor and a new field

class A:
    def __init__(self, foo: int, baz: str = "baz"):
        self.foo = foo
        self.baz = baz
        self.new_field = "new_value"


If we try to load the pickled object, we get a malformed object (note that there
is no error on loading, which is a problem in itself, since no static
code analysis will ever inform you about this):

In [2]:
def print_error(e: Exception):
    print(f"{e.__class__.__name__}: {e}")


In [3]:
a: A = pickle.loads(serialized_a)
try:
    a.baz
except AttributeError as e:
    print_error(e)

AttributeError: 'A' object has no attribute 'baz'
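The reason `baz` is missing is that unpickling does not run `__init__`. By default, pickle only restores the attributes that were stored at pickling time onto a freshly created instance. Here is a minimal sketch of that behavior (the class `Tracked` is hypothetical and only serves as illustration):

```python
import pickle


class Tracked:
    """Hypothetical class that counts how often __init__ runs."""

    init_calls = 0

    def __init__(self, foo: int):
        Tracked.init_calls += 1
        self.foo = foo


original = Tracked(42)
restored = pickle.loads(pickle.dumps(original))

# __init__ ran for `original`, but not again during unpickling
print(f"{Tracked.init_calls=}")
# only the attributes that were present at pickling time come back
print(f"{restored.__dict__=}")
```

So anything that a newer `__init__` would set up is simply not there after loading, unless something else takes care of it.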


With `__setstate__` we can fix this:

In [4]:
class A:
    def __init__(self, foo: int, baz: str = "baz"):
        self.foo = foo
        self.baz = baz
        self.new_field = "new_value"
        
    def __setstate__(self, state: dict[str, Any]):
        full_state = {"baz": "baz", "new_field": "new_value"}
        full_state.update(state)
        self.__dict__.update(full_state)

In [5]:
a: A = pickle.loads(serialized_a)
a.new_field


'new_value'

```{note} 
Be careful what you put into your classes' states! Any extension in
fields, private or public, needs to be reflected in `__setstate__` if there's a
chance that you or your users might have pickled objects of the old version.

This is not true for methods, which are not part of the state and therefore
don't need special consideration. 
```

So far so good, `__setstate__` is a well-known mechanism,
and it allows loading serialized objects of older code versions.
Now we arrive at the central problem that this article deals with:

**It is possible to run into non-recoverable situations!**

## Problem: Non-recoverable Serialization Errors

Let's change the original class definition to not include state, just methods, 
and then to extend it with a new field.

In [6]:
class A:
    @staticmethod
    def get_foo():
        return "foo"
    
    
serialized_a = pickle.dumps(A())

class A:
    def __init__(self, foo: str = "foo"):
        self.foo = foo
        
    def __setstate__(self, state):
        full_state = {"foo": "foo"}
        full_state.update(state)
        self.__dict__.update(full_state)
    
    def get_foo(self):
        return self.foo

This is something that can easily happen in a real-world scenario. In a first version,
the class might have been a simple container for methods, but later it was
extended to include state as well (which is precisely how I ran into this
issue and why I'm writing this article).

Now, at deserialization, something unexpected happens:

In [7]:
a_deserialized: A = pickle.loads(serialized_a)

try:
    a_deserialized.get_foo()
except Exception as e:
    print_error(e)

AttributeError: 'A' object has no attribute 'foo'


The same happens if we try to access the attribute directly:

In [8]:
try:
    a_deserialized.foo
except Exception as e:
    print_error(e)

AttributeError: 'A' object has no attribute 'foo'


What's going on? We did the right thing and implemented `__setstate__`! Why is it
not working?

The reason is that if `__getstate__` is not implemented explicitly
in a stateless class, the deserialization process will not call `__setstate__` at all!
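One way to see this for yourself is to look at the opcodes inside the pickle. `BUILD` is the opcode that triggers `__setstate__` (or a plain `__dict__` update) on loading. The sketch below uses `pickletools` from the standard library; the two classes and the helper `opcode_names` are made up for illustration:

```python
import pickle
import pickletools


class Stateless:
    pass


class StatelessWithGetstate:
    def __getstate__(self):
        return self.__dict__


def opcode_names(obj) -> list[str]:
    """Return the names of the pickle opcodes used to serialize obj."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(pickle.dumps(obj))]


has_build_plain = "BUILD" in opcode_names(Stateless())
has_build_with_getstate = "BUILD" in opcode_names(StatelessWithGetstate())
print(f"{has_build_plain=}, {has_build_with_getstate=}")
```

On the CPython versions I'm aware of, the plain stateless class produces a pickle without `BUILD`, while the variant with an explicit `__getstate__` contains it, even though the state is just an empty dict.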

In all my years of python programming, I have never heard of this and I don't
fully understand why the python developers decided to implement it this way.
Optimization reasons don't really make sense here since setting an empty state
would never be a performance bottleneck.

This behavior, however, is documented and thus "desired" - see 
[here](https://docs.python.org/3/library/pickle.html#object.__setstate__)
(the output of `__reduce__`, which I fortunately never had to use, is essentially
controlled by `__getstate__`).
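To make the connection between `__getstate__` and `__reduce__` a bit more tangible, here is a small sketch that inspects the state slot of the tuple returned by `__reduce_ex__`. The classes and the helper `reduced_state` are hypothetical, and I'm assuming the default pickle protocol:

```python
import pickle


class NoState:
    pass


class WithGetstate:
    def __getstate__(self):
        return self.__dict__


def reduced_state(obj):
    """Return the state entry (third item) of the reduce tuple, if there is one."""
    reduced = obj.__reduce_ex__(pickle.DEFAULT_PROTOCOL)
    return reduced[2] if len(reduced) > 2 else None


print(f"{reduced_state(NoState())=}")
print(f"{reduced_state(WithGetstate())=}")
```

The first call gives `None` (so `__setstate__` will be skipped on loading), the second gives an empty dict, which is enough for `__setstate__` to be called.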

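Here the crux of this behavior is clearly demonstrated (calling `__getstate__` directly like this requires python 3.11 or newer, where `object` gained a default implementation):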

In [9]:
class ClNoState:
    def amethod(self):
        pass
    
class ClWithState:
    def __init__(self):
        self.a = "a"
        
print(f"{ClNoState().__getstate__()=}")

print(f"{ClWithState().__getstate__()=}")

ClNoState().__getstate__()=None
ClWithState().__getstate__()={'a': 'a'}


The technical reason behind this is probably that `object()` doesn't have a
`__dict__`. It is not the only object in python without one (instances of
classes that define `__slots__` don't have a `__dict__` either), but there's
still a philosophical question whether `object()` is really an object...

Interestingly, the `ClNoState` does have a `__dict__`, but `__getstate__` still
returns `None`.

In [10]:
print(f"{ClNoState().__dict__=}")
print(f"{object().__getstate__()=}")

try:
    object().__dict__
except AttributeError as e:
    print_error(e)

ClNoState().__dict__={}
object().__getstate__()=None
AttributeError: 'object' object has no attribute '__dict__'


I think it's fair to say that `object()` is not a proper object. Without the
`__dict__` it can't have attributes, so there's some magic happening when a
class is inheriting from `object` that adds all the functionality of python
objects. As far as I can tell, this happens at runtime when the class is
created (unless the class defines `__slots__`).

In [11]:
# Can't assign attributes
try:
    object().a = "a"
except AttributeError as e:
    print_error(e)

AttributeError: 'object' object has no attribute 'a'
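For comparison, instances of classes that define `__slots__` behave much like `object()` in this respect: they have no `__dict__` and reject attributes that were not declared. A short sketch (the class `Slotted` is hypothetical):

```python
class Slotted:
    """Hypothetical class whose instances have no __dict__."""

    __slots__ = ("x",)


slotted = Slotted()
slotted.x = 1  # declared in __slots__, so this is fine

print(f"{hasattr(slotted, '__dict__')=}")
print(f"{hasattr(5, '__dict__')=}")  # builtin instances like ints have none either

try:
    slotted.y = 2  # not declared, so there is no place to store it
except AttributeError as e:
    print(f"AttributeError: {e}")
```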


### A small rant:

Note that dealing with such problems from unpickling is pretty much a nightmare!
You can't properly debug, because neither `__setstate__` nor `__init__` will
ever be called. All you get is a malformed object, and you have to figure out on
your own what's going on. Googling things like "pickle not calling
`__setstate__`" does not provide immediate relief, and I was lucky enough that a
colleague had found the right place in the python docs to understand what was
going on.


Even after understanding the problem, we are still in a bad spot. There's no way
of fixing this! If any of your users have serialized an object of the old
version (without state), you can't help them. It can never be loaded with the
updated codebase. They would need to do some pretty nasty hacking on their side
to overcome this.
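To give an idea of what such hacking on the user side might look like, here is a rough sketch. The helper `load_and_repair` is hypothetical, and it only handles the top-level object: anything nested deeper in the pickle would need its own traversal.

```python
import pickle


class A:  # old, stateless version
    @staticmethod
    def get_foo():
        return "foo"


serialized_a = pickle.dumps(A())


class A:  # new version with state
    def __init__(self, foo: str = "foo"):
        self.foo = foo

    def __setstate__(self, state):
        full_state = {"foo": "foo"}
        full_state.update(state)
        self.__dict__.update(full_state)

    def get_foo(self):
        return self.foo


def load_and_repair(data: bytes):
    """Hypothetical helper: load a pickle and call __setstate__ by hand
    if the top-level object came back with an empty __dict__."""
    obj = pickle.loads(data)
    if hasattr(obj, "__setstate__") and not getattr(obj, "__dict__", None):
        obj.__setstate__({})
    return obj


a_repaired = load_and_repair(serialized_a)
print(a_repaired.get_foo())
```

This is exactly the kind of workaround nobody should have to write or remember to call.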

It seems almost as if python suggests that classes once defined without state,
should remain without state forever. This is not a reasonable limitation, and a
very unnecessary one at that.

## Avoiding this Mess

Without going into too many details on `__reduce__` and `__getstate__`, the
problem can be avoided by always implementing `__getstate__` in stateless classes that
might be pickled. For classes with state it's not strictly necessary (see above).

This is not an unpattern yet (have patience) but actual advice. Here it is in action:

In [12]:
class A:
    def __getstate__(self):
        return self.__dict__
    
    @staticmethod
    def get_foo():
        return "foo"
    
    
serialized_a = pickle.dumps(A())

# we no longer need __getstate__ in the new version since we have state now
class A:
    def __init__(self, foo: str = "foo"):
        self.foo = foo
        
    def __setstate__(self, state):
        full_state = {"foo": "foo"}
        full_state.update(state)
        self.__dict__.update(full_state)
    
    def get_foo(self):
        return self.foo

Now things work as expected:

In [13]:
a_deserialized: A = pickle.loads(serialized_a)
a_deserialized.get_foo()

'foo'

## The Unpattern: Overwriting builtins

### Part 1: Overwriting `object`

There is no real reason that I'm aware of for any class not to have the default
of `__getstate__` returning `self.__dict__`.

Well, if `object`, from which any class inherits, does not behave the way we want it to
(does not implement `__getstate__` properly), then let's force it! We're in
python after all - everything should be possible!

![what-to-do](images/i_know_what_to_do.jpeg)

### Disclaimer

I did not in fact have the strength to do it... But not for lack of trying.

What follows below is a mostly failed attempt to overwrite python builtin behavior
of how classes are defined and objects instantiated
(with only partial and unsatisfactory success). Note that even if it worked (I think
in python 2.7 it was possible to fully overwrite the default metaclass), it would have
been a **terrible, terrible idea**!

With this sorted out, let's go ahead.

The first thing to do is to define a class that will always have `__getstate__`.
And the proper way of using this class (not the unpattern way) is to inherit
from it when needed - so it's going to be a mixin.

In [14]:
class GetstateMixin:
    def __getstate__(self):
        return self.__dict__

Now let's try to overwrite the default behavior of `object`
such that all classes inherit from `GetstateMixin` by default.

In [15]:
import builtins

builtins.object = GetstateMixin

Did that do the trick? Well, almost, but not good enough. Classes that explicitly
inherit from `object` now do get the `__getstate__` we wanted:

In [16]:
class AExplicitObject(object):
    pass

In [17]:
AExplicitObject().__getstate__()

{}

However, unfortunately, all that we have achieved is that now inheriting from `object` explicitly
and implicitly leads to different behavior. So:

In [18]:
class AImplicitObject:
    pass

will still lead to the old behavior of `__getstate__` returning `None`:

In [19]:
print(f"{AImplicitObject().__getstate__()=}")

AImplicitObject().__getstate__()=None


This unexpected difference in behavior is a major WTF, and you should never do
the hack outlined above!

We can better understand why this happened by looking at the `__mro__` (method
resolution order) attribute of the classes, which will list the parent classes
in the order in which they are searched for attributes:

In [20]:
AExplicitObject.__mro__

(__main__.AExplicitObject, __main__.GetstateMixin, object)

In [21]:
AImplicitObject.__mro__

(__main__.AImplicitObject, object)

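This makes clear: we didn't actually overwrite `object`, we only rebound the name in `builtins`.
The real `object` class is added as a base somewhere inside the interpreter when a class is
defined without explicit bases, apparently without looking up the name, and we can
neither get rid of it nor overwrite it.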
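The difference between the two cases can be reproduced in isolation. In the sketch below (the class `Marker` is hypothetical), the name `object` is rebound only temporarily and restored right away, so that nothing else is affected:

```python
import builtins

real_object = builtins.object


class Marker:
    """Hypothetical stand-in that we temporarily register under the name `object`."""


builtins.object = Marker
try:
    class Explicit(object):  # the name `object` is looked up and now resolves to Marker
        pass

    class Implicit:  # no name is looked up here, the real object is used as base
        pass
finally:
    builtins.object = real_object  # always undo the hack

print(f"{Explicit.__bases__=}")
print(f"{Implicit.__bases__=}")
```

`Explicit` ends up with `Marker` as its base, while `Implicit` still has the real `object`, which matches the two `__mro__` outputs above.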

### Part 2: Overwriting `type`

I'm not giving up yet! Since I had the misfortune of having to deal with
metaclasses in the past, I know that there is something beyond inheritance to
influence how classes are defined.

A metaclass defines how a class is defined, thus acting before the constructor
of the class is called, or before inheritance is carried out. The relevant
method for metaclasses is `__new__`.
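As a rough illustration of when a metaclass's `__new__` runs and what it receives, consider the following sketch. `RecordingMeta` and the classes built with it are hypothetical names, not part of any library:

```python
class RecordingMeta(type):
    """Hypothetical metaclass that records the arguments of each class creation."""

    seen: list = []

    def __new__(mcs, name, bases, namespace, **kwargs):
        RecordingMeta.seen.append((name, bases))
        return super().__new__(mcs, name, bases, namespace, **kwargs)


class Base(metaclass=RecordingMeta):
    pass


class Child(Base):  # the metaclass is inherited from Base
    pass


# A class statement is roughly equivalent to calling the metaclass directly
Dynamic = RecordingMeta("Dynamic", (Base,), {})

print(RecordingMeta.seen)
print(f"{Base.__bases__=}")
```

Note that `bases` arrives as an empty tuple for `Base`, even though `Base.__bases__` contains `object` afterwards. The default base is only filled in further down, inside `type.__new__`.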

The default metaclass that is used for all classes implicitly (just like `object`)
is `type`. If we didn't succeed in overwriting `object`, maybe we can overwrite
`type`? Let's try!

![go-deeper](images/have_to_go_deeper.jpg)

If we call `help(type)`, the first lines show its signature:

```
class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict, **kwds) -> a new type
 |  
```

So, if we want to sneak in our `GetstateMixin`,
we need to extend the `bases` to include it.
Here is the extended type implementation:

In [22]:
class type_with_getstate(type):
    def __new__(cls, *args):
        args = list(args)
        args[1] += (GetstateMixin,)
        return super().__new__(cls, *args)

Before overwriting the builtin, let's see whether this works:

In [23]:
class AWithMeta(metaclass=type_with_getstate):
    pass

print(f"{AWithMeta().__getstate__()=}")
print(f"{AWithMeta.__mro__=}")

AWithMeta().__getstate__()={}
AWithMeta.__mro__=(<class '__main__.AWithMeta'>, <class '__main__.GetstateMixin'>, <class 'object'>)


Looks good. Quick check on the other functionality of `type` (you know, retrieving
the type of an object):

In [24]:
try:
    type_with_getstate(5)
except Exception as e:
    print_error(e)

IndexError: list index out of range


What happened? Why did it stop doing its job - we only overwrote `__new__`, not `__call__`.
Is `__new__` being called when we use it to determine an object's type? Then this should work:


In [25]:
class type_with_getstate_attempt2(type):
    def __new__(cls, *args):
        if len(args) > 1:
            args = list(args)
            args[1] += (GetstateMixin,)
        return super().__new__(cls, *args)


In [26]:
try:
    type_with_getstate_attempt2(5)
except Exception as e:
    print_error(e)

TypeError: type.__new__() takes exactly 3 arguments (1 given)


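This got too weird for me, so I gave up on trying to understand it exactly... My best guess
is that the one-argument form is special-cased for `type` itself and is simply not
available to subclasses. Note that even this won't work: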

In [27]:
class type_extended_with_pass(type):
    pass

try:
    type_extended_with_pass(5)
except Exception as e:
    print_error(e)

TypeError: type.__new__() takes exactly 3 arguments (1 given)


#### Bruteforcing the Solution

As above, I take the attitude that if things don't want to behave my way, I will force them.
Here is an actually working extension of type:

In [28]:
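# Keep a reference to the builtin before we overwrite it further below.
# (No deepcopy needed: copying a class just returns the very same class.)
_original_type = type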

class extended_type(_original_type):
    def __new__(cls, *args, **kwargs):
        # type of an object
        if len(args) == 1:
            return _original_type(*args)
        # used as metaclass
        args = list(args)
        args[1] += (GetstateMixin,)
        return super().__new__(cls, *args, **kwargs)
 
 
print(f"{extended_type(3)=}")

extended_type(3)=<class 'int'>


This works, so let's overwrite the builtin:

In [29]:
builtins.type = extended_type

Unfortunately, this is only a partial success. Just like with overwriting `object`, all this
has done is create a difference between classes that use `type` as metaclass explicitly
and classes that don't.

In [30]:
class AExplicitMeta(metaclass=type):
    pass

class AImplicitMeta:
    pass

print(f"{AExplicitMeta().__getstate__()=}")
print(f"{AExplicitMeta.__mro__=}")
print("--------------------------------------")
print(f"{AImplicitMeta().__getstate__()=}")
print(f"{AImplicitMeta.__mro__=}")

AExplicitMeta().__getstate__()={}
AExplicitMeta.__mro__=(<class '__main__.AExplicitMeta'>, <class '__main__.GetstateMixin'>, <class 'object'>)
--------------------------------------
AImplicitMeta().__getstate__()=None
AImplicitMeta.__mro__=(<class '__main__.AImplicitMeta'>, <class 'object'>)


We can't fight the compiler. Or maybe we can, but I don't know how. Feel free to
fire up a PR if you want to hack even deeper and find a solution!

## Conclusion

We can't fully override builtin behavior because this behavior is not only
rooted in the `builtins` module but also somewhere else. We can only partially
override it, by making explicit invocations of `object` and `type` behave
differently, but that's really not satisfactory...

In any case, trying that was a bad idea from the start! Although having
`__getstate__` return `None` for objects without a state seems like a bad idea
as well, so the goal was a noble one. Note, however, that two minuses
usually don't make a plus in software development.

## Last Remarks: Advice on Backwards Compatibility with Pickling

Here are some actual things you could and should do to prevent deserialization
errors:

1. Use `__setstate__`. I usually use the [setstate utility from sensAI](https://github.com/aai-institute/sensAI/blob/1d5d3d3bcd2b041d0d3084076a863d2b19f179db/src/sensai/util/pickle.py#L154)
which provides a very convenient way of taking care of backwards compatibility.
2. Write a `Serializable` class and always inherit from it
for all objects that are meant to be persisted. This is also a useful marker
interface, so you can easily find all things that are expected to be serialized
by you or by users.

```python 
class Serializable: 
    def __getstate__(self): 
        return self.__dict__
```
3. If you rename classes, add the old names to keep backwards compatibility.
Note that you can do that inside a function that you then call in the module, this way
the old name won't exist for code-analysis tools and won't appear in suggestions to import.
This looks something like this (imagine code within some python module):

```python

# was previously called AOld
class ANew:
    pass

def _restore_backwards_compatibility():
    global AOld
    AOld = ANew

# For new users and for yourself AOld has disappeared from all IDE suggestions, but
# by calling this you add it to the global scope, and thus in reality it will be there
_restore_backwards_compatibility()
```

4. Set up tests that previously pickled objects can still be unpickled. It's fairly easy
to do, just save instances of some classes as test resources, load them
with pickle in tests and
test basic properties like access to attributes and `isinstance` checks.
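Such a test could look roughly like the sketch below. The class `Model`, the helper `check_old_pickle_loads` and the file name are all hypothetical. In a real project the resource file would be created once with an old release and committed to the repository; here the old instance is simulated by removing the newer field before pickling.

```python
import pickle
import tempfile
from pathlib import Path


class Model:
    """Hypothetical persisted class, shown in its current version."""

    def __init__(self, foo: int = 42, bar: str = "bar"):
        self.foo = foo
        self.bar = bar  # imagine this field was added after the first release

    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, state):
        full_state = {"bar": "bar"}  # defaults for fields missing in old pickles
        full_state.update(state)
        self.__dict__.update(full_state)


def check_old_pickle_loads(resource: Path) -> Model:
    """Load a stored pickle and verify basic properties of the result."""
    loaded = pickle.loads(resource.read_bytes())
    assert isinstance(loaded, Model)
    assert loaded.foo == 42
    assert loaded.bar == "bar"
    return loaded


with tempfile.TemporaryDirectory() as tmp_dir:
    # Simulate an instance saved before `bar` existed
    old_instance = Model()
    del old_instance.__dict__["bar"]
    resource = Path(tmp_dir) / "model_v1.pkl"
    resource.write_bytes(pickle.dumps(old_instance))

    loaded_model = check_old_pickle_loads(resource)
```

With a test runner like pytest, the body of the `with` block would shrink to loading the committed file and running the same checks.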