<a href="https://colab.research.google.com/github/KenzieAcademy/python-notebooks/blob/master/demo_nodict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<table border="0" align="left" width="700" height="144">
<tbody>
<tr>
<td width="120"><img width="100" src="https://static1.squarespace.com/static/5992c2c7a803bb8283297efe/t/59c803110abd04d34ca9a1f0/1530629279239/" /></td>
<td style="width: 600px; height: 67px;">
<h1 style="text-align: left;">Interview Question</h1>
<h3 style="text-align: left;">Create your own Dict class without using the 'dict' keyword.</h3>
</td>
</tr>
</tbody>
</table>

### This coding test question was asked during an actual technical interview of a Kenzie student: 

> "Create a class that behaves like a Python `dict`.  However, the class definition cannot use the `dict` keyword or Python dictionary class objects."


This is a difficult problem, but not beyond our comprehension.  If we spend some time outlining and understanding the problem first, coding it becomes easier.

- Study the properties of a dictionary:
   - Fast lookups (uses hashing instead of indexing)
   - What is hashing?
   - Uses 'buckets' internally.  What's a bucket?
   - How are duplicates handled?
   - How to handle square-bracket lookups?
- Model a key-value pair as a distinct OOP Thing.
- Model our dict as a container of those key-value OOP Thingys.

## `<Derail>` Part I: Hashing
Hashing is the process of using an algorithm to map data of any size to a value of fixed length, called a hash value. Hashing is used to build high-performance, direct-access data structures in which large amounts of data can be stored and retrieved quickly. Hash values are computed with hash functions.
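
As a small illustration of the "any size to a fixed length" idea, the sketch below hashes strings of very different lengths with Python's built-in `hash()` and checks that every result fits in the same number of bits. It assumes CPython, where `sys.hash_info.width` reports that width; the sample strings are arbitrary.

```python
import sys

# Width, in bits, of the integers that hash() returns on this interpreter.
width = sys.hash_info.width

samples = ["a", "a slightly longer string", "x" * 10_000]
hashes = [hash(s) for s in samples]

for s, h in zip(samples, hashes):
    # String hashes vary between runs unless PYTHONHASHSEED is fixed.
    print(f"{len(s):>6} chars -> {h}")

# Every hash is a signed integer that fits in `width` bits.
fits = all(-(2 ** (width - 1)) <= h < 2 ** (width - 1) for h in hashes)
print(f"All hashes fit in {width} bits: {fits}")
```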

### Let's look at what kinds of types are hashable in Python ...

In [0]:
# What are some hashable objects in Python?
# You may be tempted to simply check for the presence of a __hash__ method ...
print(f"String is hashable? {hasattr(str, '__hash__')}")
print(f"List is hashable? {hasattr(list, '__hash__')}")
print(f"Int is hashable? {hasattr(int, '__hash__')}")
print(f"Set is hashable? {hasattr(set, '__hash__')}")
print(f"Tuple is hashable? {hasattr(tuple, '__hash__')}")

# But remember: All built-in data types inherit from the generic 'object' class.
# 'object' defines __hash__, so hasattr() is always True. Unhashable types
# such as list and set override it with __hash__ = None.

# Use callable instead
# print(f"String is hashable? {callable(str.__hash__)}")
# print(f"List is hashable? {callable(list.__hash__)}")
# print(f"Int is hashable? {callable(int.__hash__)}")
# print(f"Set is hashable? {callable(set.__hash__)}")
# print(f"Tuple is hashable? {callable(tuple.__hash__)}")
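
Here is another way to check hashability, sketched as an alternative to the `callable` check in the cell above. Unhashable built-ins set `__hash__` to `None`, and `collections.abc.Hashable` reports that through `isinstance`. Neither check looks inside containers, so actually calling `hash()` remains the only definitive test. The helper name `is_hashable` is made up for this example.

```python
from collections.abc import Hashable

def is_hashable(obj):
    """Return True only if hash(obj) actually succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(list.__hash__ is None)           # True: list opts out of hashing
print(isinstance([1, 2], Hashable))    # False
print(isinstance((1, [2]), Hashable))  # True, yet hashing this tuple fails
print(is_hashable((1, [2])))           # False: the inner list is unhashable
print(is_hashable((1, 2)))             # True
```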

## Python hash() function
The hash() function returns the hash value of the object if it has one. Hash values are integers. 



In [0]:
# You can use try/except to detect unhashables. 
# Let's hash some things.
thing = "Some string"
try:
    h = hash(thing)
    print(f"{thing} : hashes into {h}")
except TypeError:
    print(f"{thing} : No hash for you.")

## What is the hashing algorithm that Python uses?
Actually there are several.  Python hashes objects differently, according to their types.  If you would like to review the C source code you can find it at https://github.com/python/cpython/blob/master/Python/pyhash.c

## Python immutable builtins are hashable.
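Hashable built-in types include integers, floats, strings, bytes, and frozensets. A tuple is hashable only if every element it contains is hashable.

Instances of user-defined classes are hashable by default. Their hash is derived from their `id()`.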



In [0]:
class User:
    """A generic user"""
    def __init__(self, name, agency):
        self.name = name
        self.agency = agency

u1 = User('John Doe', 'cia')
u2 = User('John Doe', 'cia')

print(f'u1 hash: {hash(u1)}')
print(f'u2 hash: {hash(u2)}')

if u1 == u2:
    print('same user')
else:
    print('different users')

# Can they be added to a set?
myset = {u1, u2}
print(f"myset = {myset}")

The user attributes are identical, but `u1` and `u2` are not the same object: they occupy two separate memory locations and their IDs are different.  By default, `==` compares identity, so to compare users by their attributes we need to implement the `__eq__()` method.



In [0]:
class User:
    """A generic user, with __eq__ and __repr__"""
    def __init__(self, name, agency):
        self.name = name
        self.agency = agency
    
    def __repr__(self):
        """Rendering ourself in a more readable way"""
        return f'[{self.name}:{self.agency}]'
    
    def __eq__(self, other):
        """Equality comparison func"""
        if not isinstance(other, User):
            return NotImplemented
        return self.name == other.name and self.agency == other.agency

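# Test it
u1 = User('Valerie', 'CIA')
u2 = User('Valerie', 'CIA')

print(f"{u1} and {u2} are {'same' if u1 == u2 else 'different'}")

# Defining __eq__ without __hash__ sets __hash__ to None,
# so hashing these objects now raises TypeError.
try:
    print(f'u1 hash: {hash(u1)}')
    print(f'u2 hash: {hash(u2)}')
except TypeError as e:
    print(f'Not hashable: {e}')

# Can they be added to a set?
try:
    myset = {u1, u2}
    print(f"myset = {myset}")
except TypeError as e:
    print(f'Cannot build a set: {e}')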

The attribute comparison now returns the expected result, but the objects are no longer hashable: a class that defines `__eq__` without defining `__hash__` has its `__hash__` set to `None`.  We need to add the final piece of the puzzle: the `__hash__` method.

In [0]:
class User:
    """A generic user, with __eq__, __repr__ and __hash__"""
    def __init__(self, name, agency):
        self.name = name
        self.agency = agency
    
    def __repr__(self):
        """Make output look pretty"""
        return f'[{self.name}:{self.agency}]'
    
    def __eq__(self, other):
        """Equality comparison func"""
        if not isinstance(other, User):
            return NotImplemented
        return self.name == other.name and self.agency == other.agency

    def __hash__(self):
        """Performs a hash on the attributes. Note the tuplization of hash input"""
        return hash((self.name, self.agency))

# Test it
u1 = User('Valerie', 'CIA')
u2 = User('Valerie', 'CIA')

print(f'u1 hash: {hash(u1)}')
print(f'u2 hash: {hash(u2)}')

print(f"{u1} and {u2} are {'same' if u1 == u2 else 'different'}")

# Can they be added to a set?
myset = {u1, u2}
print(f"myset = {myset}")
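
One consequence of hashing on attributes is worth a quick sketch: if an attribute that feeds `__hash__` changes after the object has been placed in a set, the set can no longer find it. The class below repeats the `User` definition in shortened form so the example runs on its own; the variable names `team`, `found_before`, and `found_after` are illustrative.

```python
class User:
    """Shortened copy of the User class, hashed on its attributes."""
    def __init__(self, name, agency):
        self.name = name
        self.agency = agency

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return self.name == other.name and self.agency == other.agency

    def __hash__(self):
        return hash((self.name, self.agency))

u = User('Valerie', 'CIA')
team = {u}
found_before = u in team   # looked up with the hash it was stored under

u.agency = 'FBI'           # mutating a hashed attribute changes hash(u)
found_after = u in team    # the lookup now uses the new hash

print(f"found before mutation: {found_before}")
print(f"found after mutation:  {found_after}")
```

Barring an astronomically unlikely hash coincidence, the second lookup fails because the set stored the object under its old hash. This is why objects used as set members or dictionary keys should be treated as immutable.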

## Hashing `</derail>`

## Part II : Creating a Node class object
Here we'll create an associative object that binds a key and a value together, along with a hash of the key.  This represents a single key/value association in the dictionary.

In [0]:
class Node:
    """A class that binds a key and a value together with a hash"""
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        # Create a hash value for this node, based on the key.
        self.hash = hash(self.key)

    def __repr__(self):
        """How this class renders its own data when asked for a string representation"""
        return f'{self.__class__.__name__}: k={self.key} v={self.value}'

    def __eq__(self, other):
        """This gets called when comparing to other Nodes, using == """
        if not isinstance(other, Node):
            return NotImplemented
        return self.key == other.key

# Test it
n = Node('Mike', 21)
print(n)

## The NoDict class
This class implements a basic dictionary data type.  It is an associative hash map.  The hash of each Node determines which 'bucket' the Node is stored in.

Note that the hash value of each Node starts as a large integer.  Using the modulo operator (`%`), the hash value is reduced to a range that maps onto one of the bucket indexes.

This is what makes the lookups fast: only the one bucket that a key maps to has to be searched, not the whole collection.
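
The bucket calculation can be sketched in isolation. Integer keys are used here because small non-negative integers hash to themselves in CPython, which makes the arithmetic easy to follow. The function name `bucket_index` is hypothetical and is not part of the class that follows.

```python
def bucket_index(key, num_buckets=10):
    """Reduce a key's hash to a valid index into the bucket list."""
    return hash(key) % num_buckets

for key in (7, 42, 1234567):
    print(f"key={key} hash={hash(key)} bucket={bucket_index(key)}")

# Python's % always yields a result in range(num_buckets) for a positive
# divisor, so negative hashes still map to a valid bucket.
print(bucket_index(-7))  # -7 % 10 is 3
```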

In [0]:
# Create a dictionary class to manage the key value pairs (Nodes)
# The dictionary will be initialized with "buckets" aka "swim lanes" 
# We will default to 10 buckets.
class NoDict:
    def __init__(self, num_buckets=10):
        self.size = num_buckets
        self.buckets = [ [] for _ in range(self.size) ]

    def __repr__(self):
        """How this dict renders its data when printed"""
        # For this dict, we want to show all the buckets vertically
        return '\n'.join([f'{self.__class__.__name__}.{i}:{bucket}' for i, bucket in enumerate(self.buckets)])

    def add(self, key, value):
        """Inserts a new key-value Node into the NoDict"""
        new_kv = Node(key, value)
        index = new_kv.hash % self.size
        bucket = self.buckets[index]
        # Insert new node, remove duplicates
        for node in bucket:
            if node == new_kv:
                print("Hey I'm removing a duplicate")
                bucket.remove(node)
                break
        bucket.append(new_kv)

d = NoDict()
d.add('Daniel', 25)
d.add('Piero', 99)
d.add('Stew', 35)
print(d)

### Adding the `get` method
After adding the `get` method, we can do lookups as well as insertions!  We also add two magic methods, `__setitem__` and `__getitem__`, because dictionaries should be usable with square-bracket operators.

In [0]:
# Create a dictionary class to manage the key value pairs (Nodes)
# The dictionary will be initialized with "buckets" aka "swim lanes" 
# We will default to 10 buckets.
class NoDict:
    def __init__(self, num_buckets=10):
        self.size = num_buckets
        self.buckets = [ [] for _ in range(self.size) ]

    def __repr__(self):
        """How this dict renders its data when printed"""
        # For this dict, we want to show all the buckets
        return '\n'.join([f'{self.__class__.__name__}.{i}:{bucket}' for i, bucket in enumerate(self.buckets)])

    def add(self, key, value):
        """Inserts a new key-value Node into the NoDict"""
        new_kv = Node(key, value)
        index = new_kv.hash % self.size
        bucket = self.buckets[index]
        # Insert new node, remove duplicates
        for node in bucket:
            if node == new_kv:
                print("Hey I'm removing a duplicate")
                bucket.remove(node)
                break
        bucket.append(new_kv)

    def get(self, key):
        """Lookup the value of a key"""
        to_find = Node(key)
        bucket = self.buckets[to_find.hash % self.size]
        for node in bucket:
            if node == to_find:
                return node.value
        raise KeyError(f'{key} not found')

    def __getitem__(self, key):
        """Makes our class responsive to [] lookup operators"""
        return self.get(key)

    def __setitem__(self, key, value):
        """Makes our class responsive to [] assignment operators"""
        self.add(key, value)

d = NoDict()
# Yay now it behaves like a real dict!
d['Daniel'] = 25
d['Piero'] = 99
d['Stew'] = 35
print(d)
print(f"Stew's age is {d['Stew']}")

In [0]:
# This should raise a KeyError exception
print(f"Janell's age is {d['Janell']}")

### Conclusion
It's nice to understand the mechanics of what is going on inside a dictionary.  Hashing is an important topic to understand as well.  

Dictionaries in Python are already highly optimized containers.  But under the hood, there are always tradeoffs.  By tweaking things like the number of buckets or the hashing algorithm, a dictionary implementation can be tuned to perform better in certain situations.

Of course there are more methods we could add:  How would you implement a method to remove a key?  What happens if two completely different Nodes produce the same hash value, aka a "hash collision"?
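
As one possible answer to both questions, here is a minimal sketch rather than a definitive implementation. It assumes a stripped-down stand-in for `NoDict`, called `MiniDict` here, that stores `(key, value)` tuples in its buckets instead of `Node` objects, and it adds a hypothetical `remove` method. The keys -1 and -2 are used because CPython hashes both of them to -2. That is an implementation detail, so treat it as an assumption on other interpreters.

```python
class MiniDict:
    """Stand-in for NoDict that keeps (key, value) tuples in each bucket."""
    def __init__(self, num_buckets=10):
        self.size = num_buckets
        self.buckets = [[] for _ in range(self.size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % self.size]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(f'{key} not found')

    def remove(self, key):
        """Delete a key, or raise KeyError if it is absent."""
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(f'{key} not found')

d = MiniDict()
d[-1] = 'minus one'
d[-2] = 'minus two'
print(hash(-1) == hash(-2))  # whether the two keys really collide here
print(d[-1], d[-2])          # both survive: keys are compared, not just hashes

d.remove(-1)
print(d[-2])                 # the colliding neighbor is untouched
```

Because every lookup compares the actual keys inside the bucket, a collision only costs a slightly longer scan of that bucket. It does not lose or overwrite data.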

### Further Reading
[What Happens when you Mess with Hashing in Python](https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/)