## 8.1 Maps

For many applications, we wish to retrieve data when we know only part of it.
For example, we may wish to get an employee's full record by just their last name or
office extension.
This section introduces an ADT that allows us to handle data in such a way.

### 8.1.1 The map ADT

A **map** is an unordered collection of key–value pairs without duplicate keys.
Each key–value pair is an item of the map.
Different keys may be associated to the same value.
The map ADT supports the following operations.

Operation | Effect | Algorithm in English
:-|:-|:-
new  | create a new empty map  | let _m_ be an empty map
size  | the number of items in _m_ | │*m*│
membership  | check if _m_ has a given key  | _key_ in _m_
associate | associate a value to a key in _m_ | let _m_(_key_) be _value_
lookup  | obtain the value associated to a key  | _m_(_key_)
delete  | remove a key–value pair from _m_  | remove _m_(_key_)
(in)equality | are two maps the same or different? | _m1_ = _m2_ or _m1_ ≠ _m2_

The associate operation replaces the current value if the key is in the map,
otherwise it creates a new key–value pair.
The precondition of the lookup and delete operations
is for the key to be in the map.
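
As a minimal sketch, the operations in the table can be tried out with Python's built-in `dict` type, which is one possible realisation of the map ADT. The office extensions and names below are made up for illustration.

```python
# Illustrative data only: office extensions (keys) and names (values).
extensions = {}                       # new: create an empty map
extensions[4021] = 'Ada'              # associate: key not in map, pair is created
extensions[4022] = 'Grace'
extensions[4021] = 'Alan'             # associate: key in map, value is replaced

print(len(extensions))                # size
print(4022 in extensions)             # membership
print(extensions[4021])               # lookup; precondition: key is in the map
del extensions[4022]                  # delete; precondition: key is in the map
print(extensions == {4021: 'Alan'})   # equality
```

In Python, a lookup or deletion that violates the precondition raises a `KeyError`.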

We assume that maps are **iterable**:
it's possible to go through all keys with 'for each _key_ in _map_'.
This allows us to discover the keys in the map to then access the values.
As maps are unordered collections, the order in which keys are traversed isn't
known in advance and may even change after adding or removing key–value pairs,
depending on how the map is implemented.
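
The following sketch shows the iteration pattern with a Python `dict`, used here only as a convenient stand-in for a map. The stock levels are hypothetical. The loop computes a total, so the result doesn't depend on the order in which the keys are visited.

```python
# Hypothetical stock levels, for illustration only.
stock = {'apples': 3, 'pears': 0, 'plums': 12}

total = 0
for item in stock:                  # 'for each key in map'
    total = total + stock[item]     # use the key to look up the value
print(total)
```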

<div class="alert alert-info">
<strong>Info:</strong> Maps are also called associative arrays.
Some texts have separate operations to insert a pair and to replace a value
instead of a single associate operation.
A lookup is also called a search.
</div>

The above table uses functional notation for some
operations because a map is, mathematically, a
[function](../02_Sequence/02_5_maths_functions.ipynb#2.5-Functions-in-mathematics):
there's a set of inputs (the keys), a set of outputs (the values) and
for each input there's a single output.
Functions usually have a rule that computes the output for each input,
whereas maps explicitly list all the inputs and their corresponding outputs.
Most functions have infinitely many inputs, e.g. all positive integers,
and hence would correspond to infinite maps.

### 8.1.2 Using maps

Maps are widely used because they allow us to access an item by a mnemonic key,
instead of an arbitrary position that is meaningless for unordered collections.
Returning to the employee example, we could define several maps,
all with essentially the same values (the employee records)
but with different keys:
last names, social security numbers, office phone numbers, etc.
Depending on which piece of information the user has,
they would use a different map to look up the employee.
If there are several employees for the same key, then the associated value is
a sequence of employees or some other type of collection.
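
Here is one way the employee example might look in Python, again using `dict` as a stand-in for a map. The records and the names `by_extension` and `by_last_name` are invented for this sketch. It assumes that each extension belongs to one employee, whereas a last name may be shared.

```python
# Hypothetical employee records: (first name, last name, extension).
employees = [
    ('Ana', 'Silva', 4021),
    ('Rui', 'Costa', 4022),
    ('Eva', 'Silva', 4023),
]

by_extension = {}   # each value is a single record
by_last_name = {}   # each value is a list of records
for record in employees:
    first, last, extension = record
    by_extension[extension] = record
    if last not in by_last_name:
        by_last_name[last] = []
    by_last_name[last].append(record)

print(by_extension[4022])
print(by_last_name['Silva'])
```

Both maps refer to the same records; only the keys differ.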

Another example of a map is a bilingual dictionary:
it associates each word in one language to one or more words in
another language.
Each key is a string and each value is a sequence of strings.
Maps are best written as two-column tables.
Here's a tiny Portuguese–English dictionary.

Key | Value
:-|:-
'alface' | ('lettuce')
'carro' | ('car')
'andar' | ('floor', 'walk')
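
In Python, the table above could be written as a dictionary literal with tuples of strings as values. This is only a sketch, and the name `pt_to_en` is made up. Note that Python needs a trailing comma to write a tuple with a single item.

```python
# Keys are Portuguese words; values are tuples of English translations.
pt_to_en = {
    'alface': ('lettuce',),     # trailing comma: a tuple with one string
    'carro': ('car',),
    'andar': ('floor', 'walk'),
}

for word in pt_to_en:
    print(word, '->', ', '.join(pt_to_en[word]))
```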


Maps and other ADTs allow us to think about data in the most appropriate way
for the problem at hand. The same data can be organised in different ways.
The next two exercises provide an example.

#### Exercise 8.1.1

A max-priority queue is an ordered collection of priority–item pairs,
ordered by priority. The priorities can be any comparable values.
For this and the next exercise, assume that items with the same priority
must be ordered by arrival, i.e. FIFO, order.

A priority queue can be seen as a map. What are the keys and values of the map?

_Write your answer here._

[Answer](../32_Answers/Answers_08_1_01.ipynb)

#### Exercise 8.1.2

Using map operations, describe an algorithm for each operation:
find max, remove max, add.

_Write your answer here._

[Hint](../31_Hints/Hints_08_1_02.ipynb)
[Answer](../32_Answers/Answers_08_1_02.ipynb)

Almost anything can be seen as a map.
The next exercise is a reminder that just because we have a hammer,
it doesn't mean that every problem is a nail.

#### Exercise 8.1.3

Booleans are hardly ever used as map keys. Why?

_Write your answer here._

[Answer](../32_Answers/Answers_08_1_03.ipynb)

### 8.1.3 Lookup tables

The simplest way to implement a map is with a dynamic array of key–value pairs.
With this approach, the new and size operations take constant time, but
the membership, lookup, associate and delete operations are linear
in the size of the map, as they have to do a linear search for the given key.
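
The approach can be sketched as follows, assuming a Python list plays the role of the dynamic array and each pair is a two-item list. The function names are chosen for this illustration and aren't part of any library.

```python
def position(pairs: list, key: object) -> int:
    """Return the index of the pair with the given key, or -1 if absent."""
    for index in range(len(pairs)):     # linear search
        if pairs[index][0] == key:
            return index
    return -1

def associate(pairs: list, key: object, value: object) -> None:
    """Replace the value of key if present, otherwise add a new pair."""
    index = position(pairs, key)
    if index == -1:
        pairs.append([key, value])
    else:
        pairs[index][1] = value

def lookup(pairs: list, key: object) -> object:
    """Return the value of key. Preconditions: key is in the map."""
    return pairs[position(pairs, key)][1]

def delete(pairs: list, key: object) -> None:
    """Remove key and its value. Preconditions: key is in the map."""
    pairs.pop(position(pairs, key))

ages = []                               # new: an empty map
associate(ages, 'Alice', 30)
associate(ages, 'Bob', 25)
associate(ages, 'Alice', 31)            # replaces 30
print(lookup(ages, 'Alice'))
delete(ages, 'Bob')
print(position(ages, 'Bob') != -1)      # membership
```

Every operation other than new and size calls `position`, which is why they are all linear in the size of the map.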

If a map's keys are natural numbers, we can use them to directly index the
dynamic array. The membership, lookup, associate and delete operations thereby
take constant time, provided the array already has a position for the key.
For example, a map from house numbers to sequences of strings,
representing the residents of each house, can use this scheme.
We initialise the dynamic array with, say, 10 empty sequences.
When adding a new key–value pair, say Alice and Bob at house 50,
we grow the dynamic array so that the largest index is 50,
and then put string sequence ('Alice', 'Bob') at index&nbsp;50.
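
The house number example might be sketched like this, with a Python list as the dynamic array and tuples as the string sequences. The function name `set_residents` is hypothetical.

```python
def set_residents(table: list, house: int, residents: tuple) -> None:
    """Associate the residents to the house number, growing table if needed.

    Preconditions: house >= 0
    """
    while len(table) <= house:      # grow until index `house` exists
        table.append(())            # houses without known residents
    table[house] = residents

residents_of = [()] * 10            # initially ten empty sequences
set_residents(residents_of, 50, ('Alice', 'Bob'))
print(len(residents_of))
print(residents_of[50])
print(residents_of[7])
```

Growing the array takes time proportional to the number of positions added. Once the index exists, reading or replacing the value at that index takes constant time.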

If an array represents a map rather than a sequence, it's called a
**lookup table**. Lookup tables are widely used to store pre-computed results.
They're another example of a space–time tradeoff.

Although lookup tables can be dynamic arrays, they're usually static arrays,
i.e. they're used when the map's keys form a fixed and relatively small
collection, known in advance.

For example, consider a travel website that displays ever-changing forecast
temperatures in degrees Celsius and Fahrenheit.
They will all be within a certain range, e.g. −50 to 50 degrees Celsius.
Using two lookup tables (mapping Celsius to Fahrenheit and vice versa)
is more efficient than constantly re-applying the conversion formulas
to the same temperature values.

In Python, one of the tables could be created like this:

```python
FAHRENHEIT = [0] * 101
for celsius in range(-50, 51):
    FAHRENHEIT[celsius] = celsius * 9 // 5 + 32
FAHRENHEIT = tuple(FAHRENHEIT)  # don't change table anymore
```

The first 51 positions have the Fahrenheit values for 0 to 50 degrees Celsius,
and the last 50 positions have the values for −50 to −1 degrees Celsius.
In programming languages that don't support negative indices,
we would write `FAHRENHEIT[celsius + 50] = ...`.
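
For comparison, here is a sketch of the offset-based variant, written in Python so that it can be run. The names `LOWEST`, `HIGHEST`, `FAHRENHEIT_OFFSET` and `to_fahrenheit` are introduced only for this illustration.

```python
LOWEST = -50     # lowest Celsius temperature in the table
HIGHEST = 50     # highest Celsius temperature in the table

table = []
for celsius in range(LOWEST, HIGHEST + 1):
    table.append(celsius * 9 // 5 + 32)     # same formula as above
FAHRENHEIT_OFFSET = tuple(table)            # index 0 holds the value for -50

def to_fahrenheit(celsius: int) -> int:
    """Return the table entry for celsius.

    Preconditions: LOWEST <= celsius <= HIGHEST
    """
    return FAHRENHEIT_OFFSET[celsius - LOWEST]

print(to_fahrenheit(-50))   # -58
print(to_fahrenheit(0))     # 32
print(to_fahrenheit(50))    # 122
```

Because the formula uses integer division, the stored values are rounded down: 1 degree Celsius maps to 33, not 33.8.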

What if the map's keys aren't integers? If we implement a **hash function** that
converts each key to an index (this is called hashing the key),
then we can still implement maps with lookup tables.
Consider the following lookup table (in the form of a string)
with a substitution cipher to encrypt text.

```python
cipher = 'lejqawntgckmfyrboxvhzsdipu'
```

We must convert each lowercase English letter to
an index from 0 to 25 to then look up the corresponding encrypted letter.
Fortunately, this is easy with Python's `ord` function, which
returns the Unicode code point of a character.

```python
def index_of(character: str) -> int:
    """Return a valid index for the character.

    Preconditions: character is a lowercase English letter
    Postconditions: the output is 0 for a, 1 for b, ..., 25 for z
    """
    return ord(character) - ord('a')
```

The hash function takes advantage of the lowercase letters being consecutive
in the Unicode standard.

Any text can now be encrypted.

```python
def encrypt(text: str, cipher: str) -> str:
    """Return the encrypted version of text, using cipher.

    Preconditions: len(cipher) = 26
    Postconditions: the output is text, with every lowercase letter
    replaced by the corresponding character in cipher
    """
    encrypted = ''
    for character in text:
        if 'a' <= character <= 'z':
            # apply hash function and lookup table
            encrypted = encrypted + cipher[index_of(character)]
        else:
            encrypted = encrypted + character
    return encrypted

encrypt('This is top secret!', cipher)
```

```
'Ttgv gv hrb vajxah!'
```

To sum up, a map can be implemented with a lookup table, together with
a hash function that converts the keys to indices if the keys aren't integers.
When a hash function is used, the map operations have the same complexity as
the hash function, because once an index is obtained,
accessing, replacing or deleting a value in the array takes constant time.
(In a lookup table, deleting a value doesn't shift the others:
the array represents a map, not a sequence.)
Often, the hash function takes constant time, like `index_of`.
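
To illustrate the summary, here is a sketch of a small map from lowercase letters to counts, stored in a lookup table. It assumes that `None` marks an absent key, so `None` can't be used as a value. The helper `has` and the table `counts` are names made up for this example, and `index_of` is repeated to keep the code self-contained.

```python
def index_of(character: str) -> int:
    """Return 0 for 'a', 1 for 'b', ..., 25 for 'z' (the hash function)."""
    return ord(character) - ord('a')

def has(table: list, letter: str) -> bool:
    """Return True if and only if letter is a key of the map in table."""
    return table[index_of(letter)] is not None

counts = [None] * 26                    # new: None marks an absent key
for letter in 'banana':
    index = index_of(letter)            # hash the key once
    if has(counts, letter):
        counts[index] = counts[index] + 1   # associate: replace the value
    else:
        counts[index] = 1                   # associate: create the pair

print(counts[index_of('a')])            # lookup
counts[index_of('b')] = None            # delete: no other entry moves
print(has(counts, 'b'))                 # membership
```

Each operation hashes the key and then does a single array access, so here all of them take constant time.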

<div class="alert alert-info">
<strong>Info:</strong> Lookup tables are also called direct address (or addressing) tables.
</div>

#### Exercise 8.1.4

You've now seen two implementations of a priority queue:
with a dynamic array of priority–item pairs
sorted by priority ([Section&nbsp;7.6](../07_Ordered/07_6_priority_queue.ipynb#7.6-Priority-queues)) and
with a map of priorities to queues (exercises above).

Implementing an ordered collection (priority queue) with an unordered one (map)
led to each operation searching for the highest key (priority)
and thus taking time linear in the size of the map (number of priorities).

Operation | Sorted dynamic array | Map
:-|:-|:-
find max  | Θ(1)  |  Θ(_priorities_)
remove max  | Θ(1)  | Θ(_priorities_)
add  | Θ(│*priority queue*│)  |  Θ(_priorities_)

These complexities assume that priorities aren't known in advance:
they can be any comparable objects.

1. In which way may the complexities be different
   if we know the priorities in advance?

_Write your answer here._

2. If we don't know the priorities in advance, is it
   always better to use sorted dynamic arrays?

_Write your answer here._

[Hint](../31_Hints/Hints_08_1_04.ipynb)
[Answer](../32_Answers/Answers_08_1_04.ipynb)
