# Sets and Maps

In [1]:
#include <string>
#include <iostream>
using namespace std;

In [2]:
#include <vector>
#include <algorithm>  // std::find, used by add_to_vector_no_dups below

In [3]:
bool add_to_vector_no_dups(vector<int>& numbers, int number) {
    // If not in the vector, add it
    if (find(numbers.begin(), numbers.end(), number) == numbers.end()) {
        numbers.push_back(number);
        return true;
    }
    return false;
}

In [4]:
vector<int> numbers;

for (int i = 0; i < 1000000; i++) {
    numbers.push_back(i);
}

In [5]:
add_to_vector_no_dups(numbers, 12345)

false

In [7]:
%%timeit
add_to_vector_no_dups(numbers, 999999);

2.83 ms +- 24.3 us per loop (mean +- std. dev. of 7 runs 100 loops each)


A new player comes on the scene...

<div class='big centered'> 🤠 </div>

In [8]:
#include <set>

In [9]:
set<int> numbers;

for (int i = 0; i < 1000000; i++) {
    numbers.insert(i);
}

In [11]:
%%timeit
numbers.insert(999999);

379 ns +- 2.78 ns per loop (mean +- std. dev. of 7 runs 1000000 loops each)


In [12]:
#include <unordered_set>

In [13]:
unordered_set<int> numbers;

for (int i = 0; i < 1000000; i++) {
    numbers.insert(i);
}

In [15]:
%%timeit
numbers.insert(999999);

63.7 ns +- 0.104 ns per loop (mean +- std. dev. of 7 runs 10000000 loops each)


## Sets

- No defined order
- No duplicates
- Fast lookup

https://en.cppreference.com/w/cpp/container/set


In [16]:
set<string> names;
names.insert("Joseph");
names.insert("Phebe");
names.insert("Brigham");
names.insert("John");
names.insert("Mary");
names.insert("Wilford");
names.insert("Eliza");
names.insert("Lorenzo");
names.insert("Joseph");
names.insert("Joseph");
names

{ "Brigham", "Eliza", "John", "Joseph", "Lorenzo", "Mary", "Phebe", "Wilford" }

In [17]:
names.find("Joseph") != names.end()

true

In [18]:
names.find("Josephine") != names.end()

false

In [19]:
*names.find("Joseph")

"Joseph"

In [20]:
unordered_set<string> names;
names.insert("Joseph");
names.insert("Phebe");
names.insert("Brigham");
names.insert("John");
names.insert("Mary");
names.insert("Wilford");
names.insert("Eliza");
names.insert("Lorenzo");
names.insert("Joseph");
names.insert("Joseph");
names

{ "Wilford", "Eliza", "Mary", "John", "Brigham", "Phebe", "Lorenzo", "Joseph" }

### Examples of sets in real life

- Students in class
- Integers between 1 and 10
- My list of favorite colors

#### Common use-cases

- Filtering to unique values
- Tracking the occurrence of a value ("Have I threatened you before?" - Jack Sparrow was using a set to keep track)
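
Both use-cases fit in a few lines. This is a minimal sketch: the helper names `unique_values` and `first_time_seen` are made up for illustration. The second helper relies on `insert` returning a pair whose `second` member is `true` only when the value was newly added.

```cpp
#include <set>
#include <string>
#include <vector>
using namespace std;

// Filtering to unique values: copy everything into a set and duplicates collapse.
// (Hypothetical helper name, not part of the standard library.)
set<int> unique_values(vector<int> const& values) {
    return set<int>(values.begin(), values.end());
}

// Tracking occurrence: insert() returns a pair<iterator, bool>;
// the bool is true only when the value was not already in the set.
bool first_time_seen(set<string>& seen, string const& name) {
    return seen.insert(name).second;
}
```

Calling `first_time_seen(seen, "Will")` twice on the same set returns `true` the first time and `false` the second, which is all Jack Sparrow needed.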


### Sets: key ideas

- No duplicate items
- No defined order (as an abstract idea)
  - C++ `set`s keep their items sorted least to greatest; `std::set` guarantees this
  - C++ `unordered_set`s make no promise about order
  - Sets in other languages may or may not have a useful ordering: it's not part of the general contract.
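
If you use an `unordered_set` for speed but still need a predictable order for printing, one option is to copy the items out and sort them. The helper name `sorted_items` below is hypothetical; this is a sketch of the idea, not the only way to do it.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// Hypothetical helper: returns the contents of an unordered_set in sorted order.
vector<string> sorted_items(unordered_set<string> const& items) {
    vector<string> result(items.begin(), items.end());  // order here is unspecified
    sort(result.begin(), result.end());                 // now least to greatest
    return result;
}
```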
  

Sets are known as *associative* containers.

*Associative* means that items are looked up by key (for a set, the item itself) rather than by position. Insertion order does not matter: adding $A$ then $B$ gives the same set as adding $B$ then $A$.

As an abstract idea, the order of the items in a set is undefined. An implementation may impose some ordering (`std::set` keeps its items sorted), but unless the container documents an ordering, do not rely on one.

There is another associative container we need to discuss. 

It is possibly the most common/frequently-used container in all of programming!

## Maps

- Maps link a **key** with a **value**


- You use the key to store and retrieve the value
  - Table of contents
    - Key: chapter name, Value: page number
  - Student Information Database
    - Key: Student ID, Value: student information
  - Scoreboard
    - Key: athlete name, Value: score


- The collection of keys is a **set**
  - Unique, undefined order (associative)
  
https://en.cppreference.com/w/cpp/container/map
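
As a minimal sketch of the scoreboard idea, the hypothetical helper below pulls the keys out of a `map<string, int>` and returns them as a `set`, which shows that the keys of a map behave like a set: each athlete name appears once.

```cpp
#include <map>
#include <set>
#include <string>
using namespace std;

// Hypothetical helper: collects the keys of a scoreboard into a set.
set<string> athlete_names(map<string, int> const& scoreboard) {
    set<string> names;
    for (auto const& entry : scoreboard) {
        names.insert(entry.first);  // entry.first is the key (athlete name)
    }
    return names;
}
```

Storing a second score under an existing name replaces the old value instead of adding another entry, so the set of names never grows from a repeat.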

### Maps in Action!

In [21]:
#include <map>

#### Map literals

In [22]:
int stuff[] = {1, 2, 3, 4};

In [23]:
map<string, string> id2name = {
    {"235", "Data Structures"},
    {"142", "Intro to Programming"},
    {"236", "Discrete Stuff"}
};
id2name

{ "142" => "Intro to Programming", "235" => "Data Structures", "236" => "Discrete Stuff" }

In [24]:
id2name["235"]

"Data Structures"

#### Counts

In [25]:
map<string, int> counts;
string name;
while (cin >> name && name != "quit") {
    if (counts.find(name) == counts.end()) {
        counts[name] = 0;
    }
    counts[name]++;
}

// Print the counts
cout << endl << "Counts" << endl << "-------------" << endl;
for (auto& entry : counts) {
    cout << entry.first << ": " << entry.second << endl;
}


october
june
april
january
august
july
december
january
may
may
september
february
march
april
friday
july
november
quit

Counts
-------------
april: 2
august: 1
december: 1
february: 1
friday: 1
january: 2
july: 2
june: 1
march: 1
may: 2
november: 1
october: 1
september: 1


In [26]:
counts["sprummer"]

0

In [27]:
// Print the counts
cout << endl << "Counts" << endl << "-------------" << endl;
for (auto& entry : counts) {
    cout << entry.first << ": " << entry.second << endl;
}


Counts
-------------
april: 2
august: 1
december: 1
february: 1
friday: 1
january: 2
july: 2
june: 1
march: 1
may: 2
november: 1
october: 1
september: 1
sprummer: 0


In [28]:
counts.find("sprummer") != counts.end()

true

In [29]:
counts.find("sprummer")->second

0

In [30]:
counts.find("abcdefg") != counts.end()

false

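In C++, simply referencing a key with `[]` (as in `counts["sprummer"]`) creates an entry with a default value if the key is not already present. For an `int` value, the default is `0`.

If you want to check for a key without creating a value, use `.find()` and compare the result to `.end()`.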

(For hyper-efficient ways of initializing new values, see https://stackoverflow.com/questions/97050/stdmap-insert-or-stdmap-find)
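
The two behaviors can be put side by side. In this sketch, `count_or_zero` and `tally` are hypothetical helper names: the first reads through `.find()` and never changes the map, while the second leans on `[]` creating a zero for a missing key.

```cpp
#include <map>
#include <string>
using namespace std;

// Looks up a count without modifying the map.
// Returns 0 when the key is absent (hypothetical helper for illustration).
int count_or_zero(map<string, int> const& counts, string const& key) {
    auto it = counts.find(key);
    return it == counts.end() ? 0 : it->second;
}

// Counts a word. operator[] value-initializes a missing int to 0,
// so no explicit "if not found, set to 0" step is required.
void tally(map<string, int>& counts, string const& word) {
    counts[word]++;
}
```

Because `count_or_zero` takes the map by `const` reference, the compiler would reject an accidental `counts[key]` inside it: `operator[]` is not available on a `const` map.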

We can also see how to iterate over a map.

Each entry you get while iterating is a key-value pair with `first` and `second` members. `first` is the key, `second` is the value.
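
If your compiler supports C++17 (an assumption; the cells in this notebook do not require it), structured bindings let you name the key and value directly instead of writing `first` and `second`. The function name `format_counts` is made up for this sketch.

```cpp
#include <map>
#include <sstream>
#include <string>
using namespace std;

// Formats each entry as "key: value" on its own line.
// Structured bindings require C++17; with older compilers,
// use entry.first and entry.second instead.
string format_counts(map<string, int> const& counts) {
    ostringstream out;
    for (auto const& [name, count] : counts) {
        out << name << ": " << count << "\n";
    }
    return out.str();
}
```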

#### Codex

In [31]:
(char)('d' - 'a' + 'A')

'D'

In [32]:
map<char, char> codex;
string alphabet = "abcdefghijklmnopqrstuvwxyz";
string scramble = "xyzijkafgwlmnohuvbcdepqrst";

for (int i = 0; i < alphabet.length(); i++) {
    codex[alphabet[i]] = scramble[i];
    codex[alphabet[i]-'a'+'A'] = scramble[i]-'a'+'A';
}

for (auto& entry : codex) {
    cout << entry.first << " -> " << entry.second << "; ";
}
cout << endl;

A -> X; B -> Y; C -> Z; D -> I; E -> J; F -> K; G -> A; H -> F; I -> G; J -> W; K -> L; L -> M; M -> N; N -> O; O -> H; P -> U; Q -> V; R -> B; S -> C; T -> D; U -> E; V -> P; W -> Q; X -> R; Y -> S; Z -> T; a -> x; b -> y; c -> z; d -> i; e -> j; f -> k; g -> a; h -> f; i -> g; j -> w; k -> l; l -> m; m -> n; n -> o; o -> h; p -> u; q -> v; r -> b; s -> c; t -> d; u -> e; v -> p; w -> q; x -> r; y -> s; z -> t; 


In [33]:
#include <sstream>

In [34]:
string encode(string const& in, map<char, char> const& codex) {
    stringstream result;
    for (int i = 0; i < in.length(); i++) {
        auto c = codex.find(in[i]);
        if (c == codex.end()) {
            result << in[i];
        } else {
            result << c->second;
        }
    }
    return result.str();
}

In [35]:
encode("Hello, my name is Inigo Montoya", codex)

"Fjmmh, ns oxnj gc Gogah Nhodhsx"

Let's try using a different character mapping!

In [36]:
(15 + 16) % 26

5

In [37]:
map<char, char> rot13;
int offset = 13;
for (int i = 0; i < 26; i++) {
    rot13['a' + i] = 'a' + (i+offset)%26;
    rot13['A' + i] = 'A' + (i+offset)%26;
}
rot13

{ 'A' => 'N', 'B' => 'O', 'C' => 'P', 'D' => 'Q', 'E' => 'R', 'F' => 'S', 'G' => 'T', 'H' => 'U', 'I' => 'V', 'J' => 'W', 'K' => 'X', 'L' => 'Y', 'M' => 'Z', 'N' => 'A', 'O' => 'B', 'P' => 'C', 'Q' => 'D', 'R' => 'E', 'S' => 'F', 'T' => 'G', 'U' => 'H', 'V' => 'I', 'W' => 'J', 'X' => 'K', 'Y' => 'L', 'Z' => 'M', 'a' => 'n', 'b' => 'o', 'c' => 'p', 'd' => 'q', 'e' => 'r', 'f' => 's', 'g' => 't', 'h' => 'u', 'i' => 'v', 'j' => 'w', 'k' => 'x', 'l' => 'y', 'm' => 'z', 'n' => 'a', 'o' => 'b', 'p' => 'c', 'q' => 'd', 'r' => 'e', 's' => 'f', 't' => 'g', 'u' => 'h', 'v' => 'i', 'w' => 'j', 'x' => 'k', 'y' => 'l', 'z' => 'm' }

In [38]:
encode("Hello, my name is Inigo Montoya", rot13)

"Uryyb, zl anzr vf Vavtb Zbagbln"

#### Decode!

In [39]:
map<char, char> xedoc;
for (auto key_value_pair : codex) {
    xedoc[key_value_pair.second] = key_value_pair.first;
}

Notice how we swapped the keys and values:

```
key_value_pair.second -> key_value_pair.first;
```

When you have one-to-one mappings, the ability to reverse the mapping can be very useful.

Will the same practice work for the "counts" map?
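
One way to explore that question is to try it. This sketch uses hypothetical helper names. It inverts a counts-style map two ways: a direct swap, where names that share a count overwrite each other, and a grouped version that keeps every name by mapping each count to a `set` of names.

```cpp
#include <map>
#include <set>
#include <string>
using namespace std;

// Naive inversion: later keys overwrite earlier ones that share a value.
map<int, string> invert_naive(map<string, int> const& counts) {
    map<int, string> inverted;
    for (auto const& entry : counts) {
        inverted[entry.second] = entry.first;
    }
    return inverted;
}

// Grouping inversion: each count maps to the set of all names that had it.
map<int, set<string>> invert_grouped(map<string, int> const& counts) {
    map<int, set<string>> grouped;
    for (auto const& entry : counts) {
        grouped[entry.second].insert(entry.first);
    }
    return grouped;
}
```

With `april: 2`, `may: 2`, and `june: 1` as input, the naive version ends up with only two entries, while the grouped version keeps both `april` and `may` under the key `2`.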

In [40]:
string message = "I think maps are the coolest thing since sliced bread. 🍞";
cout << message << endl;

string secret_message = encode(message, codex);
cout << secret_message << endl;

string message_restored = encode(secret_message, xedoc);
cout << message_restored << endl;

I think maps are the coolest thing since sliced bread. 🍞
G dfgol nxuc xbj dfj zhhmjcd dfgoa cgozj cmgzji ybjxi. 🍞
I think maps are the coolest thing since sliced bread. 🍞
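
Rotation ciphers offer a shortcut for decoding: a rotation by `k` is undone by a rotation by `26 - k`, so a rotation by 13 undoes itself. The sketch below uses hypothetical names. `apply_table` mirrors `encode` and is repeated only so the block compiles on its own.

```cpp
#include <map>
#include <string>
using namespace std;

// Builds a rotation cipher table for the given offset (0 to 25).
map<char, char> make_rotation(int offset) {
    map<char, char> table;
    for (int i = 0; i < 26; i++) {
        table['a' + i] = 'a' + (i + offset) % 26;
        table['A' + i] = 'A' + (i + offset) % 26;
    }
    return table;
}

// Replaces each character found in the table; other characters pass through.
string apply_table(string const& in, map<char, char> const& table) {
    string result;
    for (char ch : in) {
        auto it = table.find(ch);
        result += (it == table.end()) ? ch : it->second;
    }
    return result;
}
```

Applying `make_rotation(13)` twice to a message returns the original text, so no reversed map is needed for that particular cipher.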


### Nested structures

In [41]:
map<int, set<string>> words_of_length;

string word;
while (cin >> word && word != "q") {
    int length = word.length();
    if (words_of_length.find(length) == words_of_length.end()) {
        words_of_length[length] = set<string>();
    }
    words_of_length[length].insert(word);
}

BOO
frog
karate
friday
yay
gravy
pi
pie
other
the
ant
welcome
steam
to
help
skip
false
a
hog
seven
queue
q


In [42]:
for (auto entry : words_of_length) {
    cout << "Words of length " << entry.first << ":";
    for (auto word : entry.second) {
        cout << " " << word;
    }
    cout << endl;
}

Words of length 1: a
Words of length 2: pi to
Words of length 3: BOO ant hog pie the yay
Words of length 4: frog help skip
Words of length 5: false gravy other queue seven steam
Words of length 6: friday karate
Words of length 7: welcome


### What if I want to bin words by a range? Say, 0-2, 3-5, etc.

In [45]:
map<int, set<string>> words_of_length;
int range = 3;

string word;
while (cin >> word && word != "q") {
    int length = word.length();
    int key = length / range;  // bin index: 0 for lengths 0-2, 1 for 3-5, ...
    if (words_of_length.find(key) == words_of_length.end()) {
        words_of_length[key] = set<string>();
    }
    words_of_length[key].insert(word);
}

999999 another thing yo i u yep 123asdfad./kjczg
stuff
q


In [46]:
for (auto entry : words_of_length) {
    cout << "Words of length >= " << entry.first * range << ":";
    for (auto word : entry.second) {
        cout << " " << word;
    }
    cout << endl;
}

Words of length >= 0: i u yo
Words of length >= 3: stuff thing yep
Words of length >= 6: 999999 another
Words of length >= 15: 123asdfad./kjczg
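
The binning arithmetic can be pulled into a small function that also produces a label such as `0-2` or `3-5`. This is a sketch with a hypothetical name, `bin_label`. It assumes `range` is positive.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: label for the bin that a word of this length falls into.
// With range 3: lengths 0-2 give "0-2", lengths 3-5 give "3-5", and so on.
// Assumes range > 0.
string bin_label(int length, int range) {
    int key = length / range;    // integer division picks the bin index
    int low = key * range;       // smallest length in the bin
    int high = low + range - 1;  // largest length in the bin
    return to_string(low) + "-" + to_string(high);
}
```

For the 16-character word in the run above, `bin_label(16, 3)` gives `15-17`.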


## Performance

In [47]:
map<int, int> numbers;

In [48]:
%%timeit
for (int i = 0; i < 10000; i++) {
    numbers[i] = i + 1;
}

1.6 ms +- 4.7 us per loop (mean +- std. dev. of 7 runs 1000 loops each)


In [49]:
#include <unordered_map>

In [50]:
unordered_map<int, int> numbers;

In [51]:
%%timeit
for (int i = 0; i < 10000; i++) {
    numbers[i] = i + 1;
}

356 us +- 1.94 us per loop (mean +- std. dev. of 7 runs 1000 loops each)


## Maps: key ideas

- Unique keys, not necessarily unique values
- Fast lookup
- Speed vs ordering
  - `map` stores things in "sorted" order, but is slower at storage/retrieval than `unordered_map`
  - `unordered_map` does not guarantee any order, but is faster at storage/retrieval than `map`
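
The sorted order buys more than tidy printing. A `map` can answer ordered questions such as "what is the first key at or above this value?" through `lower_bound`, a member function that `unordered_map` does not provide. The helper name and the score-to-name map in this sketch are illustrative assumptions.

```cpp
#include <map>
#include <string>
using namespace std;

// Returns the name stored at the smallest key >= min_score,
// or an empty string when no such key exists (hypothetical helper).
string first_at_or_above(map<int, string> const& by_score, int min_score) {
    auto it = by_score.lower_bound(min_score);
    return it == by_score.end() ? string() : it->second;
}
```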


## Other key ideas

- Nested data structures!
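
Nesting is not limited to a map of sets. As one more sketch, here is a map of maps, with made-up names (`Gradebook`, `record_score`, `total_score`): the outer key is a student, the inner key is an assignment, and the value is a score.

```cpp
#include <map>
#include <string>
using namespace std;

// Outer key: student name; inner key: assignment name; value: score.
using Gradebook = map<string, map<string, int>>;

// operator[] creates the inner map on first use, then stores the score.
void record_score(Gradebook& book, string const& student,
                  string const& assignment, int score) {
    book[student][assignment] = score;
}

// Sums a student's scores; returns 0 for a student who is not in the book.
int total_score(Gradebook const& book, string const& student) {
    auto it = book.find(student);
    if (it == book.end()) {
        return 0;
    }
    int total = 0;
    for (auto const& entry : it->second) {
        total += entry.second;
    }
    return total;
}
```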