# Hashing

Hashing and equivalence are tightly linked, because a dictionary uses both to decide whether two keys are the same.

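Equal immutable objects such as strings can be the same object (`a is b`). For identical string literals this is a CPython optimization rather than a language guarantee.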

In [1]:
a = "my string"
b = "my string"

if a is b:
    print(f"a = {a}")
    print(f"b = {b}")
    print(f"id(a) ={id(a)} ")
    print(f"id(b) ={id(b)} ")
    print("immutable objects such as strings are also equivalent")
    print(f"a == b: {a==b}")
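
The identity result above depends on CPython sharing one object between identical string literals. A minimal sketch, assuming CPython, of an equal string built at runtime (`literal` and `runtime_built` are illustrative names, not from the original notebook):

```python
import sys

literal = "my string"
# Build an equal string at runtime so the compiler cannot share the literal.
runtime_built = "".join(["my ", "string"])

# Equal values are always equivalent...
print(f"literal == runtime_built: {literal == runtime_built}")
# ...but identity is an implementation detail and should not be relied on.
print(f"literal is runtime_built: {literal is runtime_built}")

# sys.intern returns one canonical object for equal strings.
print(f"interned copies identical: {sys.intern(literal) is sys.intern(runtime_built)}")
```

The practical point is to compare strings with `==` and keep `is` for identity checks such as `x is None`.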

Built-in mutable objects such as lists with the same value are different objects (`a is not b`), although they are still equivalent.

In [2]:

a = [1,2,3]
b = [1,2,3]

if a is not b:
    print(f"a = {a}")
    print(f"b = {b}")
    print(f"id(a) ={id(a)} ")
    print(f"id(b) ={id(b)} ")
    print("built-in mutable objects such as lists are equivalent")
    print(f"a == b: {a==b}")

a = [1, 2, 3]
b = [1, 2, 3]
id(a) =2789185299584 
id(b) =2789185306432 
built-in mutable objects such as lists are equivalent
a == b: True


Instances of a user-defined class are not the same object (`a is not b`), and without `__eq__` they are not equivalent either.

In [3]:
class MyClass:

    def __init__(self, name):
        self.name = name

a = MyClass("my string")
b = MyClass("my string")

if a is not b:
    print(f"a = {a}")
    print(f"b = {b}")
    print(f"id(a) ={id(a)} ")
    print(f"id(b) ={id(b)} ")
    print("mutable user objects without __eq__ are not equivalent")
    print(f"a == b: {a==b}")

a = <__main__.MyClass object at 0x00000289686FFEB0>
b = <__main__.MyClass object at 0x000002896855A0B0>
id(a) =2789185945264 
id(b) =2789184217264 
mutable user objects without __eq__ are not equivalent
a == b: False


Mutable user objects that implement `__eq__` are still not the same object, but they can be equivalent.

In [4]:
class MyEquivClass:

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name

a = MyEquivClass("my string")
b = MyEquivClass("my string")

print("\n")
print(f"a = {a}")
print(f"b = {b}")
print(f"a == b: {a==b}")



a = <__main__.MyEquivClass object at 0x00000289685D6AD0>
b = <__main__.MyEquivClass object at 0x00000289685D6740>
a == b: True
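
The `__eq__` above assumes `other` always has a `name` attribute, so comparing with an unrelated type such as a string would raise `AttributeError`. A common pattern, sketched here with a hypothetical class `MyEquivClassChecked` that is not part of the original notebook, is to return `NotImplemented` for other types:

```python
class MyEquivClassChecked:
    """Hypothetical variant of MyEquivClass, for illustration only."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Defer to the other operand for unrelated types instead of raising.
        if not isinstance(other, MyEquivClassChecked):
            return NotImplemented
        return self.name == other.name


checked = MyEquivClassChecked("my string")
print(checked == MyEquivClassChecked("my string"))  # same name, so equivalent
print(checked == "my string")  # unrelated type, so not equivalent
```

When both operands return `NotImplemented`, Python falls back to an identity comparison, which is why the second comparison is `False` rather than an error.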


These kinds of object also differ in hashability. A hashable object can be used as a key in a dictionary.

In [5]:
print("hash('my string'): ", hash('my string'))
print("'my string'.__hash__(): ", 'my string'.__hash__())

hash('my string'):  -3588552068763006609
'my string'.__hash__():  -3588552068763006609


Immutable built-in objects can be used as dictionary keys.

In [6]:
my_dict = {'my string': 45, 'my other string': True}
print(my_dict)

# built-in mutable objects are not hashable
try:
    hash([1,2,3])
except TypeError:
    print("\nbuilt-in mutable objects are not hashable e.g. hash([1,2,3])....")

try:
    my_dict = {[1,2,3]: True}
except TypeError:
    print("as such, built-in mutable objects cannot be keys e.g. {[1,2,3]: True}")


{'my string': 45, 'my other string': True}

built-in mutable objects are not hashable e.g. hash([1,2,3])....
as such, built-in mutable objects cannot be keys e.g. {[1,2,3]: True}
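
When list-like data is needed as a key, the usual workaround is an immutable counterpart. A small sketch, where `point_labels` is an illustrative name not taken from the original:

```python
# Tuples and frozensets are immutable, so they are hashable
# as long as everything they contain is hashable too.
point_labels = {(1, 2, 3): True}
point_labels[frozenset([1, 2, 3])] = "frozen"
print(point_labels[(1, 2, 3)])

# A tuple that contains a list is still not hashable.
try:
    hash((1, [2, 3]))
except TypeError as exc:
    print(f"tuple holding a list is not hashable: {exc}")
```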


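What about the user-defined classes themselves? A class object is hashed by its metaclass (`type`), so the methods it defines for its instances do not matter here.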

In [7]:
hash(MyClass)
hash(MyEquivClass)
m_dict = {MyEquivClass: 'hello class'}
print("..classes are hashable! e.g. {MyEquivClass: 'hello class'} even if they only implement __eq__")

..classes are hashable! e.g. {MyEquivClass: 'hello class'} even if they only implement __eq__


By default, instances of a user-defined mutable class are also hashable.

In [8]:
print("hash(MyClass('my string')) =", hash(MyClass('my string')))
print("this is because the hash is based on the id")

try:
    hash(MyEquivClass('my string'))
except TypeError:
    print("\nbut user objects that only implement __eq__ are not e.g. hash(MyEquivClass('my string'))....")

hash(MyClass('my string')) = 174324093649
this is because the hash is based on the id

but user objects that only implement __eq__ are not e.g. hash(MyEquivClass('my string'))....
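
The reason for the `TypeError` is that a class defining `__eq__` without `__hash__` has its `__hash__` set to `None`. A short sketch that inspects this directly, using the hypothetical classes `OnlyEq` and `NoOverrides` (neither is in the original notebook):

```python
class OnlyEq:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.name == other.name


class NoOverrides:
    pass


# Defining __eq__ alone removes the inherited hash.
print("OnlyEq.__hash__ =", OnlyEq.__hash__)
# A class that overrides neither method keeps object's identity-based hash.
print("NoOverrides uses object.__hash__:", NoOverrides.__hash__ is object.__hash__)

try:
    hash(OnlyEq("my string"))
except TypeError as exc:
    print("hash(OnlyEq('my string')) failed:", exc)
```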


If both `__eq__` and `__hash__` are implemented, instances of a mutable class are hashable again.

In [9]:
class MyEquivAndHashClass:

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)

print("hash(MyEquivAndHashClass('my string')) =", hash(MyEquivAndHashClass('my string')))
a = MyEquivAndHashClass('bob')
b = MyEquivAndHashClass('bob')
print(f"a = {a}")
print(f"b = {b}")
print(f"now equivalent a == b: {a==b}")
_my_dict = {a: True}
print("and can be used in a dictionary _my_dict = {a: True}", _my_dict)
print("and updating dictionary with either a or b works as they are the same thing")
_my_dict[a] = False
print("_my_dict[a] = False _my_dict becomes ", _my_dict)
_my_dict[b] = True
print("_my_dict[b] = True _my_dict becomes ", _my_dict)

hash(MyEquivAndHashClass('my string')) = -3588552068763006609
a = <__main__.MyEquivAndHashClass object at 0x0000028968692710>
b = <__main__.MyEquivAndHashClass object at 0x00000289686930D0>
now equivalent a == b: True
and can be used in a dictionary _my_dict = {a: True} {<__main__.MyEquivAndHashClass object at 0x0000028968692710>: True}
and updating dictionary with either a or b works as they are the same thing
_my_dict[a] = False _my_dict becomes  {<__main__.MyEquivAndHashClass object at 0x0000028968692710>: False}
_my_dict[b] = True _my_dict becomes  {<__main__.MyEquivAndHashClass object at 0x0000028968692710>: True}


So what happens to the dictionary if the key changes?

In [10]:
print("let's change a.name = 'tim'")
a.name = 'tim'
print(f"no longer equivalent a == b: {a==b} as you would expect")
print(_my_dict)
try:
    condition = _my_dict[a]
except KeyError:
    print("however key is now different (different hash) so will not be found e.g. condition = _my_dict[a]")
_my_dict[a] = False
print("if you try to update dictionary _my_dict[a] = False a new key-value pair is created")
print(_my_dict)
print("which is slightly weird as we now have two keys for the same object id")
print("but ok as we are saying tim and bob are not the same person so use different key")
_my_dict[b] = 'change from bool'
print("updating dictionary _my_dict[b] now creates a third new entry!")
print(_my_dict)

let's change a.name = 'tim'
no longer equivalent a == b: False as you would expect
{<__main__.MyEquivAndHashClass object at 0x0000028968692710>: True}
however key is now different (different hash) so will not be found e.g. condition = _my_dict[a]
if you try to update dictionary _my_dict[a] = False a new key-value pair is created
{<__main__.MyEquivAndHashClass object at 0x0000028968692710>: True, <__main__.MyEquivAndHashClass object at 0x0000028968692710>: False}
which is slightly weird as we now have two keys for the same object id
but ok as we are saying tim and bob are not the same person so use different key
updating dictionary _my_dict[b] now creates a third new entry!
{<__main__.MyEquivAndHashClass object at 0x0000028968692710>: True, <__main__.MyEquivAndHashClass object at 0x0000028968692710>: False, <__main__.MyEquivAndHashClass object at 0x00000289686930D0>: 'change from bool'}


So what happens when `__hash__` is computed from different attributes than `__eq__`?

In [11]:
# but you need hash and equivalence to use same information
# else weird stuff will happen
class MyBadEquivAndHashClass:

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):
        return hash(self.name+str(self.age))

a = MyBadEquivAndHashClass('bob', 15)
b = MyBadEquivAndHashClass('bob', 16)
print(f"a = {a}")
print(f"b = {b}")
print(f"again these are equivalent a == b: {a==b}")
print("but what happens in a dictionary")
_my_dict = {a: True}
print("and can be used in a dictionary _my_dict = {a: True}", _my_dict)
_my_dict[a] = False
print("_my_dict[a] = False _my_dict becomes ", _my_dict)
_my_dict[b] = True
print("_my_dict[b] = True _my_dict becomes ", _my_dict)
print("so this is weird as a and b are supposedly equivalent but there are now 2 keyed items in the dictionary")

a = <__main__.MyBadEquivAndHashClass object at 0x00000289686912D0>
b = <__main__.MyBadEquivAndHashClass object at 0x0000028968690970>
again these are equivalent a == b: True
but what happens in a dictionary
and can be used in a dictionary _my_dict = {a: True} {<__main__.MyBadEquivAndHashClass object at 0x00000289686912D0>: True}
_my_dict[a] = False _my_dict becomes  {<__main__.MyBadEquivAndHashClass object at 0x00000289686912D0>: False}
_my_dict[b] = True _my_dict becomes  {<__main__.MyBadEquivAndHashClass object at 0x00000289686912D0>: False, <__main__.MyBadEquivAndHashClass object at 0x0000028968690970>: True}
so this is weird as a and b are supposedly equivalent but there are now 2 keyed items in the dictionary


So if you want to use a mutable user object as a dictionary key, `__hash__` and `__eq__` must be computed from the same attributes!

In [12]:
class MyGoodEquivAndHashClass:

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        return self.name == other.name and self.age == other.age

    def __hash__(self):
        return hash(self.name+str(self.age))

a = MyGoodEquivAndHashClass('bob', 15)
b = MyGoodEquivAndHashClass('bob', 15)
a == b
print(f"a = {a}")
print(f"b = {b}")
print(f"again these are equivalent a == b: {a==b}")
_my_dict = {a: 0}
_my_dict[a] = 1
_my_dict[b] = 2
print("can use a or b to update dictionary, _my_dict = ", _my_dict)
print("setting a.age to 16")
a.age = 16
_my_dict[a] = 3
print("a is changed so new key, _my_dict = ", _my_dict)
print("_my_dict[a] = ", _my_dict[a])
a.age = 15
print("a change is reversed, _my_dict[a] = ", _my_dict[a])
print("so in this example we are saying 15 year old bob is treated differently to 16 year old bob")

a = <__main__.MyGoodEquivAndHashClass object at 0x0000028968691270>
b = <__main__.MyGoodEquivAndHashClass object at 0x0000028968692020>
again these are equivalent a == b: True
can use a or b to update dictionary, _my_dict =  {<__main__.MyGoodEquivAndHashClass object at 0x0000028968691270>: 2}
setting a.age to 16
a is changed so new key, _my_dict =  {<__main__.MyGoodEquivAndHashClass object at 0x0000028968691270>: 3}
_my_dict[a] =  3
a change is reversed, _my_dict[a] =  3
so in this example we are saying 15 year old bob is treated differently to 16 year old bob
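
Changing an attribute that feeds `__hash__` while the object is a dictionary key is what makes the examples above hard to follow. One way to rule that out, sketched here as an alternative rather than as the notebook's own approach, is a frozen dataclass, which generates matching `__eq__` and `__hash__` from the same fields and blocks attribute assignment. `FrozenPerson`, `p`, `q` and `lookup` are illustrative names:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPerson:
    # __eq__ and __hash__ are both generated from (name, age).
    name: str
    age: int


p = FrozenPerson("bob", 15)
q = FrozenPerson("bob", 15)

lookup = {p: "first"}
lookup[q] = "second"  # q is equivalent to p, so this updates the same entry
print(lookup)

try:
    p.age = 16  # frozen instances reject attribute assignment
except FrozenInstanceError:
    print("cannot change a FrozenPerson after creation")
```

To represent 16 year old bob you would create a new `FrozenPerson("bob", 16)`, which is then a distinct key.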


Dictionaries work by using hashing and hash tables. The key is converted to a hash code.

A quick reminder of how hashing works. It is fundamental to Python, as nearly everything under the hood is a dict.

Suppose we have 100000 telephone number records in a linear array and we want to find the record for a particular number. We can search linearly: if the record is first in the array, great; if it is last, the search time will be long.

A solution is to apply a hashing function to the number (or string, or whatever the key is). The resulting value, an integer, indexes into a hash table. A hash table has a number of rows, or buckets, and the bucket index is usually computed as `hash_value % number_of_buckets`. Each bucket may have a small number of phone records in it.

When I want to do a look-up, I hash the number (the key), which gives me a bucket. I then go along the row (chain) to find the record, or a pointer to the record, associated with my key. If there is a collision, where two keys generate the same hash value, the actual numbers need to be compared to find the right record. The choice of hash function is therefore important: it should spread records evenly over all the buckets and avoid collisions.
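
The bucket-and-chain scheme described above can be sketched as a toy class. This is an illustration of the idea only: `ChainedHashTable` and `phone_book` are hypothetical names, the phone numbers are made up, and CPython's own `dict` resolves collisions differently (open addressing rather than chains).

```python
class ChainedHashTable:
    """Toy hash table: a fixed list of buckets, each a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # hash_value % number_of_buckets picks the row to search.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            # Keys sharing a bucket are told apart by comparing the keys themselves.
            if existing_key == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)


phone_book = ChainedHashTable()
phone_book.put("555-0100", "Alice")
phone_book.put("555-0199", "Bob")
phone_book.put("555-0100", "Alice (updated)")  # same key, so the record is replaced
print(phone_book.get("555-0199"))
```

Both `hash(key)` and `==` are used on every look-up, which is why `__hash__` and `__eq__` have to agree for user-defined keys.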
