# EC2202 Hashing

**Disclaimer.**
These code examples are based on:
1. [MIT 6.006 (Professor Erik Demaine, Dr. Jason Ku, and Professor Justin Solomon)](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/index.htm)
2. [KAIST CS206 (Professor Otfried Cheong)](https://otfried.org/courses/cs206/)
3. [LeetCode](https://leetcode.com/)
4. [GeeksForGeeks](https://practice.geeksforgeeks.org/)
5. Coding Interviews

## Chaining

In [None]:
class _Node():
  def __init__(self, key, value, next):
    self.key = key
    self.value = value  # the key's hash is the index into the list
    self.next = next

def _hash(key):
  return (key) % 100

class dict():
  def __init__(self):
    self._data = [ None ] * 100 # the memory space we have is 100

  def __repr__(self):
    s = ""
    for i in range(100):
      s += "%02d: " % i
      p = self._data[i]
      while p is not None:
        s += str(p.key) + " "
        p = p.next
      s += "\n"
    return s

  def _findnode(self, key):
    i = _hash(key)     # memory location
    p = self._data[i]  # the head of the linked list at memory location i
    while p is not None: 
      if p.key == key:
        return p
      p = p.next
    return None

  def __contains__(self, key):
    return self._findnode(key) is not None

  # print(d[k]) -> value of the item with the key k
  def __getitem__(self, key):
    p = self._findnode(key)
    if p:
      return p.value
    else:
      raise KeyError(key)  # a missing key raises KeyError, as with the built-in dict
  
  # 'ppp' exercise
  def __setitem__(self, key, value):  # d[k] = v
    p = self._findnode(key)
    if p:
      p.value = value
    else:
      h = _hash(key)
      self._data[h] = _Node(key, value, self._data[h])
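
To see what happens when two keys land in the same slot, here is a minimal, self-contained sketch of chaining. `ChainedTable` is a hypothetical name, the table size of 10 is an arbitrary choice, and each chain is a plain Python list rather than the linked `_Node` list used above, so this illustrates the idea rather than restating the class above.

```python
class ChainedTable:
    """Hypothetical compact chaining table; each bucket is a Python list."""

    def __init__(self, size=10):
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Integer keys only, mirroring the _hash(key) = key % size idea above.
        return self._buckets[key % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:      # key already present: overwrite its value
                pair[1] = value
                return
        bucket.append([key, value])  # new key: extend the chain

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


t = ChainedTable()
t[7] = "a"
t[17] = "b"   # 17 % 10 == 7, so this collides with key 7
t[7] = "c"    # overwrites the value stored for key 7
chain = [k for k, _ in t._buckets[7]]
print(chain, t[7], t[17])
```

Both keys live in the chain of slot 7, and a lookup walks that chain until the key matches.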

## Open addressing



In [None]:
class _Entry():
  def __init__(self, key, value):
    self.key = key
    self.value = value

def _hash(key):
  return (key) % 100

class dict():
  def __init__(self):
    self._data = [ None ] * 100

  def __repr__(self):
    s = ""
    for i in range(100):
      s += "%02d: " % i
      if self._data[i] is not None:
        s += str(self._data[i].key)
      s += "\n"
    return s

  def _findkey(self, key):
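    i = _hash(key)
    # Linear probing: step forward until the key or an empty slot is found.
    # This assumes the table has at least one empty slot; if it is full, the loop never ends.
    while self._data[i] is not None: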
      if self._data[i].key == key:
        return (True, i)
      i = (i + 1) % 100
    return (False, i)

  def __contains__(self, key):
    found, i = self._findkey(key) 
    return found
  
  def __getitem__(self, key):
    found, i = self._findkey(key) 
    if found:
      return self._data[i].value
    else:
      raise KeyError(key)  # a missing key raises KeyError, as with the built-in dict

  # 'ppp' exercise
  def __setitem__(self, key, value):
    found, i = self._findkey(key) 
    if found:
      self._data[i].value = value
    else:
      self._data[i] = _Entry(key, value)
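
The probing behaviour can be observed with a small, self-contained sketch. `ProbingTable` is a hypothetical name, the table size of 8 is arbitrary, and entries are stored as `(key, value)` tuples instead of `_Entry` objects. Unlike the class above, this sketch stops after one full pass so that a full table raises an error instead of looping forever.

```python
class ProbingTable:
    """Hypothetical open-addressing table with linear probing."""

    def __init__(self, size=8):
        self._slots = [None] * size   # each slot is None or a (key, value) tuple

    def _find(self, key):
        n = len(self._slots)
        i = key % n                   # integer keys only
        for _ in range(n):            # visit each slot at most once
            slot = self._slots[i]
            if slot is None:
                return False, i       # empty slot: key is absent, could go here
            if slot[0] == key:
                return True, i
            i = (i + 1) % n           # occupied by another key: try the next slot
        raise RuntimeError("table is full")

    def __setitem__(self, key, value):
        _, i = self._find(key)
        self._slots[i] = (key, value)

    def __getitem__(self, key):
        found, i = self._find(key)
        if not found:
            raise KeyError(key)
        return self._slots[i][1]


t = ProbingTable()
t[3] = "a"
t[11] = "b"   # 11 % 8 == 3, slot 3 is taken, so it goes to slot 4
t[19] = "c"   # 19 % 8 == 3, slots 3 and 4 are taken, so it goes to slot 5
positions = {slot[0]: i for i, slot in enumerate(t._slots) if slot is not None}
print(positions)
```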

## Practical issues

### Naive implementation

In [1]:
class Point():
  def __init__(self, x, y):
    self.x = x
    self.y = y

  def __repr__(self):
    return "Point(%s, %s)" % (self.x, self.y)

**WWPP**

In [2]:
s = set()
s.add(Point(3, 5))
print(s)
print(Point(3, 5) in s)

{Point(3, 5)}
False


Even though we can see that s contains a Point(3, 5), we cannot find it in the set. The reason becomes clear when we try the following:

In [None]:
p = Point(3, 5)
q = Point(3, 5)
print(p == q)

In [None]:
print(hash(p))
print(hash(q))

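Even though the two points have the same coordinates, Python does not consider them equal: by default, `==` compares object identity, and the default hash code is derived from identity as well. The two points therefore have different hash codes, so the set has no way to find the entry.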
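
A short sketch makes the identity-based default visible. The variable names are illustrative only; the `Point` class is the naive version without `__eq__` or `__hash__`.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(3, 5)
q = Point(3, 5)
s = {p}

same_object_equal = (p == p)        # an object is always equal to itself
distinct_objects_equal = (p == q)   # same coordinates, but different objects
found_same_object = (p in s)        # the very object that was added is found
found_lookalike = (q in s)          # a look-alike is not
print(same_object_equal, distinct_objects_equal, found_same_object, found_lookalike)
```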

### Implementing `__eq__`

In [None]:
class Point():
  def __init__(self, x, y):
    self.x = x
    self.y = y

  def __repr__(self):
    return "Point(%s, %s)" % (self.x, self.y)

  def __eq__(self, rhs):
    return self.x == rhs.x and self.y == rhs.y

In [None]:
p = Point(3, 5)
q = Point(3, 5)
print(p == q)

In [None]:
s = set()
s.add(Point(3, 5))

Python can now determine that the two points are equal, but adding a point to a set fails with `TypeError: unhashable type: 'Point'`: Point objects can no longer be used in a hash table. In fact, it’s the hash function that no longer works:

In [None]:
print(hash(p))

The Python interpreter will not use its default implementation of the hash function for objects with an equality operator. Why not? Because the hash code of equal objects needs to be the same, and Python has no way to ensure this.
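
The mechanism can be inspected directly: when a class defines `__eq__` without defining `__hash__`, Python sets the class's `__hash__` attribute to `None`. The sketch below only restates the `Point` class with `__eq__`; the variable names are illustrative.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, rhs):
        return self.x == rhs.x and self.y == rhs.y


hash_attr = Point.__hash__   # None, because __eq__ is defined and __hash__ is not

try:
    hash(Point(3, 5))
    error = None
except TypeError as exc:     # unhashable type: 'Point'
    error = exc
print(hash_attr, error)
```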

### Implementing `__hash__`

In [None]:
class Point():
  def __init__(self, x, y):
    self.x = x
    self.y = y

  def __repr__(self):
    return "Point(%s, %s)" % (self.x, self.y)

  def __eq__(self, rhs):
    return self.x == rhs.x and self.y == rhs.y

  def __hash__(self):
    return hash((self.x, self.y))

In [None]:
s = set()
s.add(Point(3, 5))
print(s)
print(Point(3, 5) in s)

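The lesson is: hash tables require that keys satisfy the following “contract”: a key type must provide both `__eq__` and `__hash__`, and whenever `a == b` holds, `hash(a) == hash(b)` must hold as well.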
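
The requirement that equal keys have equal hash codes can be spot-checked on a few sample pairs. This is only a sanity check on hand-picked values, not a proof; the list `pairs` and the name `contract_holds` are illustrative.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, rhs):
        return self.x == rhs.x and self.y == rhs.y

    def __hash__(self):
        # Built from exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


pairs = [
    (Point(3, 5), Point(3, 5)),   # equal
    (Point(0, 0), Point(0, 0)),   # equal
    (Point(3, 5), Point(5, 3)),   # not equal, so the contract says nothing about it
]
# For every pair that compares equal, the hash codes must match.
contract_holds = all(hash(a) == hash(b) for a, b in pairs if a == b)
print(contract_holds)
```

Building the hash from the same fields that `__eq__` compares is what keeps the two methods consistent.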

### Mutable keys

In [None]:
p = Point(3, 5)
s = set()
s.add(p)
print(s)

**WWPP**

In [None]:
p.y = 9
print(s)
print(Point(3, 9) in s)
print(Point(3, 5) in s)

Even though s clearly contains Point(3,9), the set cannot find it. The reason is that p’s hash code has changed after it was added to the hash table, so p is simply in the wrong slot of the hash table!
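
The following self-contained sketch replays the experiment and also scans the set element by element, which bypasses hashing. It assumes the `Point` class with `__eq__` and `__hash__` from the previous section; the variable names are illustrative.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, rhs):
        return self.x == rhs.x and self.y == rhs.y

    def __hash__(self):
        return hash((self.x, self.y))


p = Point(3, 5)
s = {p}
p.y = 9   # mutate the key while it sits in the set

lookup_new = Point(3, 9) in s   # searched under the new hash; the stored entry has the old one
lookup_old = Point(3, 5) in s   # old hash matches the entry, but p no longer equals (3, 5)
scan_new = any(q == Point(3, 9) for q in s)   # a linear scan still sees the element
print(lookup_new, lookup_old, scan_new)
```

The element is still stored, yet neither hash-based lookup can reach it.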

The lesson here: Never modify keys after they have been added to a hash table.
In fact, I would go further and recommend: Never use mutable objects as keys in a hash table. This is yet another example of why immutable objects make programming safer and easier.
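
One way to follow this advice, offered here as a suggestion rather than as part of the course material, is a frozen dataclass from the standard library: `@dataclass(frozen=True)` generates `__eq__` and `__hash__` from the fields and rejects attribute assignment. `FrozenPoint` is a hypothetical name.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


p = FrozenPoint(3, 5)
s = {p}

try:
    p.y = 9                      # assignment to a frozen instance is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

found = FrozenPoint(3, 5) in s   # equal fields give equal hash codes
print(mutated, found)
```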

Python encourages this idea: Python lists and Python sets are themselves not hashable. You cannot put a Python list, or a Python set into a set! What you can do instead is to use a tuple or a frozenset. These objects are hashable, and can be used as keys in a map or as elements of a set.

In [None]:
d = {}
d[[1, 2, 3]] = 5  # not allowed! raises TypeError: unhashable type: 'list'
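
The hashable alternatives mentioned above can be tried in a short sketch. It uses the built-in dictionary via the `{}` literal; the variable names are illustrative.

```python
d = {}
d[(1, 2, 3)] = 5                # a tuple is hashable
d[frozenset({1, 2, 3})] = 6     # so is a frozenset

try:
    d[[1, 2, 3]] = 7            # a list is not
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(d[(1, 2, 3)], d[frozenset({3, 2, 1})], list_key_ok)
```

Note that the frozenset lookup succeeds regardless of element order, because sets with the same elements are equal and hash equally.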