In [None]:
// run this cell to prevent Jupyter from displaying the null output cell
com.twosigma.beakerx.kernel.Kernel.showNullExecutionResult = false;

<a id='notebook_id'></a>
# `hashCode`

The `hashCode` method often puzzles new Java programmers: it is almost never called explicitly in application code, yet it often must be implemented for certain parts of the standard library to work correctly.

A fundamental rule that Java programmers must remember is that **the programmer must override `hashCode` in a class whenever `equals` is overridden in that class**. Failure to override `hashCode` when `equals` is overridden will cause standard library collection classes such as `HashSet` and `HashMap` to behave incorrectly.

## Hash functions

A [*hash function*](https://en.wikipedia.org/wiki/Hash_function) is any function that maps information of arbitrary size to a set of fixed-size values. The `hashCode` method should implement a hash function that maps the state of an object (the "information of arbitrary size") to an `int` value (the "set of fixed-size values"). Hash functions are used by a data structure called a [*hash table*](https://en.wikipedia.org/wiki/Hash_table) that supports constant-time search over the elements in the hash table.

### Hash tables

Suppose that you have an unsorted list of $n$ strings and you want to know if the list contains a particular target string. In Java you would use a loop to search for the string; for example, to search a list `t` for the target string `"fish"` you could use the following loop:

```java
// assume that t is a List<String>
boolean hasFish = false;
for (String s : t) {
    if (s.equals("fish")) {
        hasFish = true;
    }
}
```

Of course, you would actually use the `List` method `contains` but that method uses a loop closely resembling the one shown above.

In the worst case, every element of the list is compared to the target string using `equals`, so the worst-case complexity is $O(n)$ where $n$ is the number of elements in the list. Assuming the target string is always in the list and the search stops as soon as a match is found, on average approximately $n / 2$ comparisons using `equals` are needed to find a target string.

A hash table can be thought of as a list of buckets where each bucket can hold multiple references. Instead of using a linear index to directly access an element, a hash table converts the state of an object to an integer value called a *hash code* and then uses the hash code to access a bucket of the hash table. Unlike a list, a well-implemented hash table can search for an element in $O(1)$ time on average. The following video is a brief introduction to how hash tables store and retrieve elements.

<video controls src="../resources/images/hash-tables.mp4" />
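
As a rough illustration of the bucket idea, the following sketch stores strings in a fixed number of buckets and searches only the bucket selected by the hash code. The class name `TinyHashSet` and the bucket count of 8 are assumptions made for this example; `java.util.HashSet` is implemented differently (for instance, it grows as elements are added).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified hash table of strings; for illustration only.
class TinyHashSet {

    private final List<List<String>> buckets = new ArrayList<>();

    TinyHashSet(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            this.buckets.add(new ArrayList<>());
        }
    }

    // Converts a hash code to a valid bucket index;
    // floorMod keeps the index non-negative even for negative hash codes.
    private int indexOf(String s) {
        return Math.floorMod(s.hashCode(), this.buckets.size());
    }

    // Adds s if it is not already present; returns true if s was added.
    boolean add(String s) {
        List<String> bucket = this.buckets.get(this.indexOf(s));
        if (bucket.contains(s)) {
            return false;
        }
        bucket.add(s);
        return true;
    }

    // Searches only the one bucket that s hashes to.
    boolean contains(String s) {
        return this.buckets.get(this.indexOf(s)).contains(s);
    }

    public static void main(String[] args) {
        TinyHashSet t = new TinyHashSet(8);
        t.add("fish");
        t.add("cat");
        System.out.println(t.contains("fish"));
        System.out.println(t.contains("dog"));
    }
}
```

Searching a bucket still uses `equals`, but if the elements are spread evenly over the buckets then each bucket holds only a few elements regardless of how many elements are in the table.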

## What happens if you override `equals` but not `hashCode`?

Consider the `Point2` class where `equals` has been overridden so that two points are equal if and only if they have equal coordinates:

In [None]:
public class Point2 {
    
    private double x;
    private double y;
    
    public Point2(double x, double y) {
        this.x = x;
        this.y = y;
    }
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Point2)) {
            return false;
        }
        Point2 other = (Point2) obj;
        if (Double.compare(this.x, other.x) == 0 && 
            Double.compare(this.y, other.y) == 0) {
            return true;
        }
        return false;
    }
}

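`hashCode` has not been overridden yet in the `Point2` class which means that when `hashCode` is invoked using a `Point2` reference the `hashCode` method inherited from `Object` is used. `Object.hashCode` returns a value tied to the identity of the object rather than to its state; a useful mental model is that the hash code is computed from the memory address of the object (the documentation notes that this may or may not be how it is actually implemented). Consider what happens when we create two points having the same coordinates: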

```java
Point2 p = new Point2(1.0, -2.0);
Point2 q = new Point2(1.0, -2.0);
```

A memory diagram after `q` is created would look something like:

| Address | Type | Variable | Value | |
| -: | -: | -: | -: | :- |
| 0 | | | | |
| 1 | | | | |
| ... | | | | |
| 100 | `Point2` | `p` | 2004 | |
| 101 | `Point2` | `q` | 2020 | |
| ... | | | | |
| 2004 | `Point2` object | | |  |
|      | double | x | 1.0 | |
|      | double | y | -2.0 | |
| ... | | | | |
| 2020 | `Point2` object | | |  |
|      | double | x | 1.0 | |
|      | double | y | -2.0 | |
| ... | | | | |

The two objects referenced by `p` and `q` reside at different memory addresses (2004 and 2020, respectively) because objects cannot overlap in memory. Now suppose that we add `p` to a `HashSet`:

In [None]:
import java.util.HashSet;

Point2 p = new Point2(1.0, -2.0);
Point2 q = new Point2(1.0, -2.0);

HashSet<Point2> t = new HashSet<>();
t.add(p);
System.out.println(t.contains(p));

Everything seems fine; the set `t` says that it contains `p` after adding `p` to the set. According to [the documentation for `contains`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/HashSet.html#contains(java.lang.Object)) the set should also contain `q` because `p.equals(q)` is `true`. Try running the following cell to test if the set thinks it contains `q`:

In [None]:
import java.util.HashSet;

Point2 p = new Point2(1.0, -2.0);
Point2 q = new Point2(1.0, -2.0);

HashSet<Point2> t = new HashSet<>();
t.add(p);
System.out.println(t.contains(p));
System.out.println(t.contains(q));

`t.contains(q)` returns `false` even though `p` and `q` refer to points having the same coordinates. The reason this happens is that `t.contains(q)` causes `q.hashCode()` to be invoked, and the hash code computed for the object referenced by `q` is almost certainly different from the hash code computed for the object referenced by `p`. Because we failed to override `hashCode` in `Point2`, classes such as `HashSet` can no longer satisfy the contracts of their methods.

## Requirements of `hashCode`

The [documentation for `hashCode`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#hashCode()) states the general contract for `hashCode` as:

* Whenever it is invoked on the same object more than once during an execution of a Java application, the `hashCode` method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
* If two objects are equal according to the `equals(Object)` method, then calling the `hashCode` method on each of the two objects must produce the same integer result.
* It is not required that if two objects are unequal according to the `equals(java.lang.Object)` method, then calling the `hashCode` method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables. 

In simpler terms the contract can be read as:

* Repeatedly invoking `hashCode` on an object should return the same value if the state of the object does not change.
* Two objects that are equal must produce equal hash codes.
* Two objects that are not equal may produce equal hash codes but performance of the hash table-based containers may suffer.
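
The second rule is the one that is easiest to break, and it can be checked mechanically for any pair of objects. The helper below is a minimal sketch; the class name `HashContractCheck` and the method name `consistentWithEquals` are made up for this example.

```java
// Hypothetical helper for checking the second rule of the hashCode contract.
class HashContractCheck {

    // Returns false only when a and b are equal but have different hash codes,
    // which is the one situation the second rule forbids.
    static boolean consistentWithEquals(Object a, Object b) {
        if (a.equals(b)) {
            return a.hashCode() == b.hashCode();
        }
        // unequal objects may or may not share a hash code
        return true;
    }

    public static void main(String[] args) {
        String s1 = "fish";
        String s2 = new String("fish");   // a different object with the same state
        System.out.println(s1 == s2);
        System.out.println(s1.equals(s2));
        System.out.println(consistentWithEquals(s1, s2));
    }
}
```

`String` overrides both `equals` and `hashCode`, so the check succeeds even though `s1` and `s2` refer to different objects. A check like this can only find violations for the pairs it is given; it cannot prove that a `hashCode` implementation is correct.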

### Legal but poor implementations of `hashCode`

The following is a legal but poor implementation of `hashCode` (for any class):

```java
@Override
public int hashCode() {
    return 1;
}
```

The preceding implementation ensures that equal objects return the same hash code by ensuring that every object has the same hash code. This causes every object to hash to the same bucket in a hash table.

The following is a legal but poor implementation of `hashCode` for the `Point2` class:

```java
@Override
public int hashCode() {
    return (int) this.x + (int) this.y;
}
```

The preceding implementation causes points whose coordinates differ only in the digits after the decimal point to return the same hash code. A practical example where this `hashCode` implementation would degrade performance is in computer graphics where it is common to deal with normalized coordinates in which every point has coordinates inside the square with corners $(0.0, 0.0)$ and $(1.0, 1.0)$. In such an application, almost all normalized points (every point whose coordinates are both less than 1.0) would hash to the same bucket in a hash table.

Both of the above implementations of `hashCode` are fast to compute, but speed alone is not enough. A good hash function should tend to produce different hash codes for unequal objects and should produce a roughly uniform distribution of hash codes over any collection of unequal objects; both of the above implementations fail to do so.
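
To make the difference concrete, the sketch below counts how many distinct hash codes are produced for a 10-by-10 grid of points in the unit square. The class `HashSpread` and its methods are hypothetical names for this example, and `bitBasedHash` is just one possible alternative that uses all of the bits of both coordinates; it is included only for comparison.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical comparison of two hash functions for 2D points.
class HashSpread {

    // the poor hash function from the text: discards the fractional digits
    static int truncatingHash(double x, double y) {
        return (int) x + (int) y;
    }

    // an alternative (assumed for comparison) that uses every bit of x and y
    static int bitBasedHash(double x, double y) {
        return 31 * Double.hashCode(x) + Double.hashCode(y);
    }

    // Counts the distinct hash codes over the points (i / 10, j / 10)
    // for i and j between 0 and 9.
    static int countDistinct(boolean useTruncatingHash) {
        Set<Integer> codes = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            for (int j = 0; j < 10; j++) {
                double x = i / 10.0;
                double y = j / 10.0;
                codes.add(useTruncatingHash ? truncatingHash(x, y) : bitBasedHash(x, y));
            }
        }
        return codes.size();
    }

    public static void main(String[] args) {
        System.out.println("truncating: " + countDistinct(true));
        System.out.println("bit based:  " + countDistinct(false));
    }
}
```

Every point in the grid has coordinates less than 1.0, so the truncating hash function produces the single value 0 for all 100 points.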


## Implementing `hashCode`

Joshua Bloch published a step-by-step recipe for implementing a usable `hashCode` method in the book *Effective Java, Third Edition*. The recipe for overriding `hashCode` is quoted directly from its source below.

1. Declare an `int` variable named `result`, and initialize it to the hash code `c` for the first significant field in your object, as computed in Step 2A. (Recall that a significant field is a field that affects equals comparisons.)
2. For every remaining significant field `f` in your object, do the following:
    1. Compute an `int` hash code `c` for the field:
        1. If the field is a primitive type, compute *`Type`* `.hashCode(f)`, where *`Type`* is the boxed primitive class corresponding to `f`'s type.
        2. If the field is an object reference and this class's `equals` method compares the field by recursively invoking `equals`, recursively invoke `hashCode` on the field. If a more complex comparison is required in `equals`, compute a "canonical representation" for this field and invoke `hashCode` on the canonical representation. If the value of the field is `null`, use 0 (or some other constant, but 0 is traditional).
        3. If the field is an array, treat it as if each significant element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine the values per step 2B. If the array has no significant elements, use a constant, preferably 0. If all elements are significant, use `Arrays.hashCode`.
    2. Combine the hash code `c` computed in step 2A into `result` as follows:
    ```java
    result = 31 * result + c;
    ```
3. Return `result`.

The recipe seems somewhat complicated but is actually easy to implement in practice (except for the "canonical representation" part of Step 2A(b); see the exercises for an example).
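
As a sketch of how the steps fit together, consider a class with one field of each kind mentioned in Step 2A: a primitive field, a reference field that may be `null`, and an array field whose elements are all significant. The `Reading` class and its fields are invented for this illustration and do not appear elsewhere in the notebook.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class used only to illustrate the recipe.
class Reading {

    private final int sensorId;       // primitive field
    private final String label;       // reference field, may be null
    private final double[] samples;   // array field, all elements significant

    Reading(int sensorId, String label, double[] samples) {
        this.sensorId = sensorId;
        this.label = label;
        this.samples = samples.clone();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Reading)) {
            return false;
        }
        Reading other = (Reading) obj;
        return this.sensorId == other.sensorId
                && Objects.equals(this.label, other.label)
                && Arrays.equals(this.samples, other.samples);
    }

    @Override
    public int hashCode() {
        // Step 1: hash code of the first significant field (a primitive)
        int result = Integer.hashCode(this.sensorId);

        // Step 2A(b) and 2B: reference field, using 0 if the field is null
        int c = (this.label == null) ? 0 : this.label.hashCode();
        result = 31 * result + c;

        // Step 2A(c) and 2B: array field where every element is significant
        c = Arrays.hashCode(this.samples);
        result = 31 * result + c;

        // Step 3
        return result;
    }

    public static void main(String[] args) {
        Reading a = new Reading(7, null, new double[] {1.0, 2.5});
        Reading b = new Reading(7, null, new double[] {1.0, 2.5});
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());
    }
}
```

Notice that `hashCode` uses exactly the fields that `equals` uses, and that each field is hashed in a way that agrees with how `equals` compares it (`Arrays.hashCode` is paired with `Arrays.equals`).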

## Implementing the `hashCode` for `Counter`

A minimal version of the `Counter` class where we have started to implement the `hashCode` method is shown in the following cell. Run the cell to compile the class.

In [None]:
public class Counter {

    private int value;

    public Counter(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        this.value = value;
    }
    
    // other constructors and methods not shown
    
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Counter)) {
            return false;
        }
        Counter other = (Counter) obj;
        if (this.value == other.value) {
            return true;
        }
        return false;
    }
    
    @Override
    public int hashCode() {
        
        return 0;
    }
}

When implementing `hashCode` in a class you must refer to the `equals` method for the class to identify the significant fields. For `Counter` the lone significant field is the `int` field `value`.

Step 1 of the recipe says to declare an `int` variable named `result`, and initialize it to the hash code `c` for the first significant field in your object, as computed in Step 2A. To compute the hash code for an `int` field, Step 2A says to use `Integer.hashCode`. Implementing Step 1 yields the following:

```java
    @Override
    public int hashCode() {
        int result = Integer.hashCode(this.value);
        
        return 0;
    }
```

We skip Step 2 because there are no other significant fields.

Step 3 says to return `result` which completes the implementation of `hashCode`:

```java
    @Override
    public int hashCode() {
        int result = Integer.hashCode(this.value);
        
        return result;
    }
```


In [None]:
public class Counter {

    private int value;

    public Counter(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        this.value = value;
    }
    
    // other constructors and methods not shown
    
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Counter)) {
            return false;
        }
        Counter other = (Counter) obj;
        if (this.value == other.value) {
            return true;
        }
        return false;
    }
    
    @Override
    public int hashCode() {
        int result = Integer.hashCode(this.value);

        return result;
    }
}

After compiling the completed class we can try adding various counters to a hash set:

In [None]:
import java.util.HashSet;

HashSet<Counter> t = new HashSet<>();

Counter c1 = new Counter(5);
Counter c2 = new Counter(5);
Counter c3 = new Counter(3);

System.out.println("add c1? : " + t.add(c1));
System.out.println("add c2? : " + t.add(c2));
System.out.println("add c3? : " + t.add(c3));

## Implementing `hashCode` for `Point2`

A minimal version of the `Point2` class where we have started to implement the `hashCode` method is shown in the following cell. Run the cell to compile the class.

In [None]:
public class Point2 {

    private double x;
    private double y;
    
    public Point2(double x, double y) {
        this.x = x;
        this.y = y;
    }
    
    // other constructors and methods not shown
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Point2)) {
            return false;
        }
        Point2 other = (Point2) obj;
        if (Double.compare(this.x, other.x) == 0 && 
                Double.compare(this.y, other.y) == 0) {
            return true;
        }
        return false;
    }
    
    @Override
    public int hashCode() {
        
        return 0;
    }
}

`Point2` has two significant fields used in the implementation of `equals`. 

Step 1 of the recipe requires us to compute a hash code for the field `x` using `Double.hashCode`. Implementing Step 1 yields the following:

```java
    @Override
    public int hashCode() {
        int result = Double.hashCode(this.x);
        
        return 0;
    }
```

Step 2A of the recipe requires us to compute a hash code for the remaining field `y` using `Double.hashCode`. Implementing Step 2A yields the following:

```java
    @Override
    public int hashCode() {
        int result = Double.hashCode(this.x);
        int c = Double.hashCode(this.y);
        
        return 0;
    }
```

Step 2B of the recipe requires us to combine `c` with `result`. Implementing Step 2B yields the following:

```java
    @Override
    public int hashCode() {
        int result = Double.hashCode(this.x);
        int c = Double.hashCode(this.y);
        result = 31 * result + c;
        
        return 0;
    }
```

As there are no other significant fields we can return `result` in Step 3:

```java
    @Override
    public int hashCode() {
        int result = Double.hashCode(this.x);
        int c = Double.hashCode(this.y);
        result = 31 * result + c;
        
        return result;
    }
```
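
With `hashCode` overridden, the earlier `HashSet` experiment can be repeated. The following is a self-contained sketch that combines the `equals` method shown earlier with the completed `hashCode` method; the `main` method is added here only to make the example runnable outside of the notebook.

```java
import java.util.HashSet;

class Point2 {

    private double x;
    private double y;

    Point2(double x, double y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Point2)) {
            return false;
        }
        Point2 other = (Point2) obj;
        return Double.compare(this.x, other.x) == 0
                && Double.compare(this.y, other.y) == 0;
    }

    @Override
    public int hashCode() {
        int result = Double.hashCode(this.x);
        int c = Double.hashCode(this.y);
        result = 31 * result + c;
        return result;
    }

    public static void main(String[] args) {
        Point2 p = new Point2(1.0, -2.0);
        Point2 q = new Point2(1.0, -2.0);

        HashSet<Point2> t = new HashSet<>();
        t.add(p);
        System.out.println(t.contains(p));   // prints true
        System.out.println(t.contains(q));   // prints true
    }
}
```

Because `p` and `q` now have equal hash codes, the set looks for `q` in the bucket that holds `p` and finds it using `equals`.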

### Why does `equals` use `Double.compare` instead of `==`?

Recall that the `Point2` version of `equals` uses `Double.compare` instead of `==` to compare the $x$ and $y$ coordinates of a point. Suppose that we did not follow the `equals` recipe and instead used `==` like so:

```java
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Point2)) {
            return false;
        }
        Point2 other = (Point2) obj;
        if (this.x == other.x && 
                this.y == other.y) {
            return true;
        }
        return false;
    }
```

With this version of `equals` the point $(-0.0, 1.0)$ is equal to the point $(0.0, 1.0)$ because `-0.0 == 0.0` is `true` and `1.0 == 1.0` is `true`. If two points are equal then their hash codes must also be equal. Is this true?

Notice that the implementation of `hashCode` uses `Double.hashCode(this.x)` and `Double.hashCode(this.y)`. What are the values of `Double.hashCode(-0.0)` and `Double.hashCode(0.0)`?

In [None]:
System.out.println(Double.hashCode(-0.0));
System.out.println(Double.hashCode(0.0));

The values are (very) different which means that the hash code of the point $(-0.0, 1.0)$ will be different than the hash code of the point $(0.0, 1.0)$. Because their hash codes are different, every hashed container will conclude that the two points are different even though `equals` concludes that the two points are equal.

A similar problem occurs if two points have a coordinate that is equal to `Double.NaN`. `Double.NaN == Double.NaN` is `false` as mandated by the IEEE 754 floating-point standard. As a result, the point $(\text{NaN}, 1.0)$ is not equal to a second point having the coordinates $(\text{NaN}, 1.0)$ in the version of `equals` that uses `==` to compare coordinates. This means that you can put an arbitrary number of distinct point objects having the coordinates $(\text{NaN}, 1.0)$ into a `HashSet`, and the `HashSet` will still conclude that a newly created point $(\text{NaN}, 1.0)$ is not in the set.

Both problems can be illustrated using a small class:

In [None]:
public class Widget {
    private double x;
    
    public Widget(double x) {
        this.x = x;
    }
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Widget)) {
            return false;
        }
        Widget other = (Widget) obj;
        return this.x == other.x;
    }
    
    @Override
    public int hashCode() {
        return Double.hashCode(this.x);
    }
}

In [None]:
import java.util.HashSet;

HashSet<Widget> t = new HashSet<>();

Widget w1 = new Widget(-0.0);
Widget w2 = new Widget(0.0);
// are the two widgets equal?
System.out.println("equal?: " + w1.equals(w2));

// if they are equal then we should be able to put only one of them into the set
t.add(w1);
t.add(w2);
System.out.println("size?: " + t.size());   // oops

Widget w3 = new Widget(Double.NaN);
Widget w4 = new Widget(Double.NaN);
Widget w5 = new Widget(Double.NaN);
Widget w6 = new Widget(Double.NaN);

t.clear();
// we should only be able to put one of w3-w6 into the set
t.add(w3);
t.add(w4);
t.add(w5);
System.out.println("size?: " + t.size());   // oops

// is w6 in the set?
System.out.println("in set?: " + t.contains(w6));   // oops

In conclusion, when implementing `equals` in a class that has a floating-point field always use `Float.compare` or `Double.compare` to compare the field for equality, otherwise the hashed collections in the standard library will not behave correctly.
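
For comparison, here is a sketch of the same small class with `equals` rewritten to use `Double.compare`. The name `FixedWidget` is chosen for this example so that it is not confused with the `Widget` class above.

```java
import java.util.HashSet;

// Hypothetical corrected version of Widget; equals now agrees with hashCode.
class FixedWidget {

    private double x;

    FixedWidget(double x) {
        this.x = x;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof FixedWidget)) {
            return false;
        }
        FixedWidget other = (FixedWidget) obj;
        // Double.compare treats -0.0 and 0.0 as different and NaN as equal to NaN
        return Double.compare(this.x, other.x) == 0;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(this.x);
    }

    public static void main(String[] args) {
        HashSet<FixedWidget> t = new HashSet<>();

        // -0.0 and 0.0 are now unequal, so both widgets belong in the set
        t.add(new FixedWidget(-0.0));
        t.add(new FixedWidget(0.0));
        System.out.println("size?: " + t.size());   // prints size?: 2

        t.clear();
        // all NaN widgets are now equal, so only one is stored
        t.add(new FixedWidget(Double.NaN));
        t.add(new FixedWidget(Double.NaN));
        t.add(new FixedWidget(Double.NaN));
        System.out.println("size?: " + t.size());   // prints size?: 1
        System.out.println("in set?: " + t.contains(new FixedWidget(Double.NaN)));   // prints in set?: true
    }
}
```

The behaviour of the set is now consistent with `equals` in both cases: widgets that are equal have equal hash codes, and widgets with different hash codes are never equal.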

## Implementing `hashCode` for `Card`

The `Card` class has two non-null reference type fields both of which are used in `equals`; Step 2A(b) says to recursively invoke `hashCode` on the fields. A complete version of `hashCode` is shown below:

In [None]:
import java.util.Arrays;

public class Card {
    private String rank;
    private String suit;
    
    public static final String[] RANKS = {
        "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"
    };
    
    public static final String[] SUITS = {
        "CLUBS", "DIAMONDS", "HEARTS", "SPADES"
    };
    
    public Card(String rank, String suit) {
        if (!Arrays.asList(RANKS).contains(rank)) {
            throw new IllegalArgumentException();
        }
        if (!Arrays.asList(SUITS).contains(suit)) {
            throw new IllegalArgumentException();
        }
        this.rank = rank;
        this.suit = suit;
    }
    
    // other constructors and methods not shown
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Card)) {
            return false;
        }
        Card other = (Card) obj;
        if (this.rank.equals(other.rank) && 
            this.suit.equals(other.suit)) {
            return true;
        }
        return false;
    }
    
    @Override
    public int hashCode() {
        int result = this.rank.hashCode();   // simply call hashCode on this.rank
        int c = this.suit.hashCode();        // and on this.suit
        result = 31 * result + c;
        return result;
    }

}

### Some explanation of Step 2B

The curious reader might be wondering why `c` is added to `31 * result` instead of just adding it to `result`. For example, why is `hashCode` for `Point2` not implemented as:

```java
    @Override
    public int hashCode() {
        int result = Double.hashCode(this.x);
        int c = Double.hashCode(this.y);
        result = result + c;                  // remove the multiplication
        
        return result;
    }
```

The problem with the implementation shown above is that the result *no longer depends on the order of the fields*; for example, a point with coordinates $(1.5, 2.0)$ now produces the same hash code as $(2.0, 1.5)$. Pre-multiplying `result` by a constant factor ensures that the order in which the fields are considered affects the final result.
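
The effect is easy to observe by computing both versions for the two points mentioned above. The class `FieldOrderDemo` and its method names are hypothetical; the two methods take the coordinates as parameters so that they can be compared side by side.

```java
// Hypothetical comparison of the two ways of combining field hash codes.
class FieldOrderDemo {

    // combines the field hash codes by addition only
    static int additiveHash(double x, double y) {
        return Double.hashCode(x) + Double.hashCode(y);
    }

    // combines the field hash codes as in Step 2B of the recipe
    static int recipeHash(double x, double y) {
        int result = Double.hashCode(x);
        int c = Double.hashCode(y);
        result = 31 * result + c;
        return result;
    }

    public static void main(String[] args) {
        // same hash code for (1.5, 2.0) and (2.0, 1.5)
        System.out.println(additiveHash(1.5, 2.0) == additiveHash(2.0, 1.5));   // prints true

        // different hash codes for (1.5, 2.0) and (2.0, 1.5)
        System.out.println(recipeHash(1.5, 2.0) == recipeHash(2.0, 1.5));       // prints false
    }
}
```

Addition is commutative, so the additive version collides for every pair of points whose coordinates are swapped; the version from the recipe does not have this systematic weakness, although individual collisions are still possible.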

The curious reader might also be wondering why multiply by the constant value 31? It turns out that there is no one value that is guaranteed to produce good hash codes in all circumstances but the number 31 [worked well](https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4045622) when developing the `String` `hashCode` method and testing the method on words found in an English language dictionary. Another minor benefit is that multiplication by 31 can be done efficiently on many CPUs.
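
The efficiency claim rests on the identity `31 * i == (i << 5) - i`, which lets a compiler or CPU replace the multiplication with a shift and a subtraction. The sketch below checks the identity for a few values, including values where the multiplication overflows; the class and method names are made up for this example.

```java
// Hypothetical check of the shift-and-subtract identity for multiplication by 31.
class MultiplyBy31 {

    // computes 31 * i without using multiplication
    static int viaShift(int i) {
        return (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            System.out.println(i + ": " + (31 * i == viaShift(i)));
        }
    }
}
```

The identity holds even when the result overflows because `int` arithmetic in Java wraps around in the same way for both expressions.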

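The constant should not be an even value because of how binary multiplication works. Multiplying by an even number always includes a left shift, so whenever the multiplication overflows the high-order bits of `result` are lost for good, and the low-order bit of the product is always zero. Information about earlier fields is gradually discarded, which makes it harder to produce a uniform distribution of hash code values for a collection of unequal objects. Multiplying by an odd number does not lose information in this way.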
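
The loss of information is easiest to see with an extreme case. The sketch below applies the combining step of the recipe to a sequence of eight field hash codes using the multiplier 32 (an even number chosen for this example because it is a power of two) and the multiplier 31. The class `EvenMultiplierDemo` and the method `combine` are hypothetical.

```java
// Hypothetical demonstration of why an even multiplier loses information.
class EvenMultiplierDemo {

    // Applies result = multiplier * result + c to each field hash code in turn.
    static int combine(int multiplier, int[] fieldHashes) {
        int result = fieldHashes[0];
        for (int i = 1; i < fieldHashes.length; i++) {
            result = multiplier * result + fieldHashes[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // two sequences of field hash codes that differ only in the first field
        int[] a = {1, 0, 0, 0, 0, 0, 0, 0};
        int[] b = {2, 0, 0, 0, 0, 0, 0, 0};

        // multiplying by 32 seven times shifts the first field out of the int
        System.out.println(combine(32, a) == combine(32, b));   // prints true

        // multiplying by 31 keeps the contribution of the first field
        System.out.println(combine(31, a) == combine(31, b));   // prints false
    }
}
```

With the multiplier 32, each combining step shifts `result` left by 5 bits, so after seven steps the first field has been shifted out of the 32-bit `int` entirely and no longer affects the hash code. Other even multipliers lose bits more slowly, but they still lose them.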

## Exercises

1. Implement `hashCode` for the version of the `Domino` class shown below. Be careful; it's trickier than it might first appear!

In [None]:
public class Domino {

    /**
     * The smallest possible value for a side of a domino.
     */
    public static final int MIN_VALUE = 0;
    
    /**
     * The largest possible value for a side of a domino. 
     */
    public static final int MAX_VALUE = 6;

    private int val1;
    private int val2;

    public Domino(int value1, int value2) {
        if (!isValueOK(value1) || !isValueOK(value2)) {
            throw new IllegalArgumentException();
        }
        this.val1 = value1;
        this.val2 = value2;
    }

    public static boolean isValueOK(int value) {
        return value >= MIN_VALUE && value <= MAX_VALUE;
    }
    
    public int getSmallerValue() {
        int result = this.val1 <= this.val2 ? this.val1 : this.val2;
        return result;
    }

    public int getLargerValue() {
        int result = this.val1 >= this.val2 ? this.val1 : this.val2;
        return result;
    }
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Domino)) {
            return false;
        }
        Domino other = (Domino) obj;
        return (this.getSmallerValue() == other.getSmallerValue() && 
                this.getLargerValue() == other.getLargerValue());
    }
    
    @Override
    public int hashCode() {
        
    }
}

2. The class in the following cell represents a Canadian nickel. A nickel is equal to every other nickel (presumably because they are all worth 5 cents). What is your assessment of the correctness of the `hashCode` method?

In [None]:
/**
 * A class representing a Canadian nickel. The monetary value of a nickel is
 * defined to be five cents. Nickels were first minted in the year 1858. A
 * nickel has an issue year which is the year in which the nickel was issued by
 * the mint.
 *
 */
public class Nickel {

    private int year;

    /**
     * The monetary value of a nickel in cents.
     */
    public final int CENTS = 5;

    /**
     * Initializes this nickel to have the specified issue year.
     * 
     * @param year the year this coin was issued in
     * @pre. year must be greater than or equal to 1858
     * @throws IllegalArgumentException if year is less than 1858
     */
    public Nickel(int year) {
        if (year < 1858) {
            throw new IllegalArgumentException();
        }
        this.year = year;
    }

    /**
     * Returns the issue year of this coin.
     * 
     * @return the issue year of this coin
     */
    public int issueYear() {
        return this.year;
    }

    /**
     * Compares this nickel to the specified object for equality. The result is true
     * if obj is a nickel. The issue year is not considered when comparing two
     * nickels for equality.
     * 
     * @return true if obj is a nickel
     */
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (this.getClass() != obj.getClass()) {
            return false;
        }
        return true;
    }

    /**
     * Returns a hash code for this nickel. Specifically, this method returns the
     * issue year of this nickel.
     * 
     * @return the issue year of this nickel
     */
    @Override
    public int hashCode() {
        return this.year;
    }
}

3. See Exercises 3-6 in the [Designing simple classes](./designing_simple_classes.ipynb#notebook_id) notebook. Implement suitable `hashCode` methods for these classes.

4. See Exercise 5 in the [Constructors](./constructors.ipynb#notebook_id) notebook. Implement `hashCode` for your complex number class.

5. See Exercise 9 in the [Constructors](./constructors.ipynb#notebook_id) notebook. Implement `hashCode` for your card game hand class.

6. A `CombinationLock` represents a lock that is unlocked with a sequence of $4$ digits between 0 and 9. Two combination locks are equal if they have the same combination. Implement `hashCode` for the `CombinationLock` class.

In [None]:
import java.util.List;
import java.util.ArrayList;

public class CombinationLock {
    
    private List<Integer> combo;
    
    public CombinationLock(List<Integer> combo) {
        if (combo.size() != 4) {
            throw new IllegalArgumentException("combination requires 4 digits");
        }
        this.combo = new ArrayList<>(combo);
    }
    
    @Override
    public String toString() {
        return this.combo.toString();
    }
    
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof CombinationLock)) {
            return false;
        }
        CombinationLock other = (CombinationLock) obj;
        return (this.combo.equals(other.combo));
    }
    
    @Override
    public int hashCode() {
        
    }
}

7. An unsigned binary number is a sequence of binary digits called *bits* where each bit is either zero or one. The binary number $10100$ has the decimal (base-10) value of:

  $$10100 = 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 20$$ 
  
  Notice that the binary number $0010100$ also has the decimal value 20:
  
  $$0010100 = 0 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 20$$ 
  
  In fact, every binary number made up of any number of leading zeros and ending in $10100$ has a decimal value of 20.
  
  Suppose that you have a class that represents binary numbers having an arbitrary number of bits and that the class defines equality to mean two binary numbers are equal if their decimal values are equal. Note that you cannot use any of the primitive numeric types to reliably compute the decimal value of an arbitrary length binary number because such values have at most 64 bits of precision. The class uses a `List` of `Boolean` values to store the bits of the binary number (where `true` is 1 and `false` is 0). How do you define `hashCode` for such a class? For that matter, how do you implement `equals`?