## 3.4 Sets, Dictionaries, and Comprehensions
### Sets
A **set**, like its mathematical namesake, is an *unordered* collection of *distinct* items. 
* *Unordered* means that there is no order to the items – sequences have a first item, second item, third item, and so on; but sets are just a collection of items in no particular order.
* *Distinct* means that items are either members of the set or they are not; there cannot be multiple copies of an item in a set.

The notation also mirrors the mathematical one, with curly brackets used to write sets:

In [1]:
{1, 2, 3}

{1, 2, 3}

In [2]:
type({1, 2, 3})

set

In [3]:
# sets are unordered
{1, 2, 3} == {2, 3, 1}

True

In [4]:
# they really are unordered, we can't even ask for the 'first' element
nums = {1, 2, 3}
nums[0]

TypeError: 'set' object is not subscriptable

In [5]:
# items are distinct
{1, 1, 1, 2, 2, 3} == {2, 3, 3, 3, 1}

True
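Because duplicates are discarded, converting a list to a set is a quick way to find its distinct items. The snippet below is a minimal sketch; the list `rolls` and its contents are made up for illustration.

```python
# Hypothetical example data: a list with repeated items
rolls = [3, 1, 3, 2, 1, 3]

distinct_rolls = set(rolls)          # duplicates are discarded
print(len(rolls))                    # 6 items in the list
print(len(distinct_rolls))           # 3 distinct items
print(distinct_rolls == {1, 2, 3})   # True
```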

Apart from being unordered and having distinct items, sets behave a lot like lists. They can hold a mixture of data types (provided each item is *hashable*, which rules out mutable items such as lists), and they are mutable:

In [6]:
my_set = {"my", 2, "set"}
my_set.add(10)
my_set.remove("my")
my_set

{10, 2, 'set'}
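One caveat to the comparison with lists: every item in a set must be *hashable*, which in practice rules out mutable built-in types such as lists and dictionaries. The sketch below uses an illustrative set called `mixed` and catches the error so that the code runs to completion; the exact wording of the error message may vary between Python versions.

```python
mixed = {"text", 42, (1, 2)}     # strings, numbers and tuples are hashable

try:
    mixed.add([3, 4])            # a list is mutable, so it is not hashable
except TypeError as err:
    print(f"TypeError: {err}")   # the message mentions an unhashable type

print(len(mixed))                # still 3 items
```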

The most common use for a set is fast membership testing. Let's return to some code that we used a long time ago: censoring vowels from an input string. Before, we had a monolithic `if` statement:
```python
if char == "a" or char == "e" or char == "i" or char == "o" or char == "u":
```
Using a set makes this significantly nicer to read:
```python
if char in {"a", "e", "i", "o", "u"}:
```

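A list would work here too, and for a handful of items the difference is minimal. Building a set costs slightly more than building a list, so for a single one-off membership test a list can be marginally cheaper. Looking an item up in a set, however, takes roughly the same time regardless of its size, whereas a list is searched item by item. So if you reuse the collection in multiple places then a set is the better choice, as in the example below: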

In [7]:
VOWELS = {"a", "e", "i", "o", "u"}

def censor_vowels(word):
    out_str = ""
    for i in range(len(word)):
        char = word[i]
        if char in VOWELS:
            out_str += "*"
        else:
            out_str += char
    return out_str

censor_vowels("balderdash")

'b*ld*rd*sh'
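As an aside, a `for` loop can iterate over the characters of a string directly, so the index is not strictly needed. The variant below is a sketch of the same idea rather than a replacement for the function above; the name `censor_vowels_direct` is made up for this example.

```python
VOWELS = {"a", "e", "i", "o", "u"}

def censor_vowels_direct(word):
    out_str = ""
    for char in word:              # iterate over the characters, no index needed
        if char in VOWELS:
            out_str += "*"
        else:
            out_str += char
    return out_str

print(censor_vowels_direct("balderdash"))  # b*ld*rd*sh
```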

In the code above, `VOWELS` is being used as a constant; perhaps we use it in multiple places in the code. If we really want to get technical, we can actually create an immutable set using the `frozenset` function, and ensure that no code ever tries to modify the set:

In [8]:
VOWELS = frozenset({"a", "e", "i", "o", "u"})
VOWELS.remove("a")

AttributeError: 'frozenset' object has no attribute 'remove'

If you want to create an empty set to add elements to, you have to write `set()`:

In [9]:
empty_set = set()
len(empty_set)

0
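A quick check of what empty curly brackets actually produce (the variable names are illustrative):

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))   # <class 'dict'>, not a set
print(type(empty_set))      # <class 'set'>
```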

We cannot just write `{}`, because this is reserved for an empty *dictionary*, another data structure which uses curly brackets. Most programmers probably find themselves using dictionaries more often than sets, so let's move on!

### Dictionaries
A **dictionary** in Python is the inbuilt implementation of a data structure that is sometimes also called a *map* or an *associative array*. Whereas lists and tuples are indexed by their position (and sets are not indexable at all), dictionaries are indexed by arbitrary **keys**. Each key has an associated **value**. We could think of the entire dictionary as being a collection of *key-value pairs*.

To create a dictionary from some data, we can list the key-value pairs inside curly brackets, with a colon separating each key from its value:

In [10]:
high_scores = {"ray": 5000, "ali": 3000, "sam": 2000}
high_scores

{'ray': 5000, 'ali': 3000, 'sam': 2000}

In [11]:
type(high_scores)

dict

The keys can be *any immutable type*. It is common to use strings and numbers, but you could also use tuples provided they themselves only contain immutable types.
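As a sketch of tuple keys, the hypothetical dictionary `grid` below maps (x, y) coordinates to a label. A list cannot be used in the same way because it is mutable; the error is caught here so that the code runs to completion.

```python
# Hypothetical example: tuple keys used as grid coordinates
grid = {(0, 0): "start", (2, 3): "treasure"}
print(grid[(2, 3)])              # treasure

try:
    grid[[1, 1]] = "trap"        # a list is mutable, so it cannot be a key
except TypeError as err:
    print(f"TypeError: {err}")

print(len(grid))                 # still 2 entries
```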

We can think of a dictionary like a table of values:

|key  |value  |
|:---:|:-----:|
|ray  |5000   |
|ali  |3000   |
|sam  |2000   |

And we can retrieve the value of any particular key using the same square bracket syntax we use with lists:

In [12]:
high_scores["ali"]

3000
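Looking up a key that is not in the dictionary raises a `KeyError`. Below is a minimal sketch of two common ways to guard against that, using the `in` operator and the built-in `dict.get` method; the player name `"zed"` is made up for this example.

```python
high_scores = {"ray": 5000, "ali": 3000, "sam": 2000}

print("ali" in high_scores)        # True: membership tests check the keys
print("zed" in high_scores)        # False

# .get returns a default value instead of raising KeyError for a missing key
print(high_scores.get("zed", 0))   # 0
print(high_scores.get("ali", 0))   # 3000
```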

We can add values too by simply assigning a value to a new key:

In [13]:
high_scores["andrew"] = 10000
high_scores

{'ray': 5000, 'ali': 3000, 'sam': 2000, 'andrew': 10000}

Dictionaries can only store one value for any given key. `high_scores["ali"]` currently produces `3000`, and if we try to insert another value for the key `"ali"` we will just overwrite the previous one:

In [14]:
high_scores["ali"] = 7000
high_scores

{'ray': 5000, 'ali': 7000, 'sam': 2000, 'andrew': 10000}

So, in an actual game, a dictionary might be a good choice to store each player's personal high score. It would not necessarily be the best choice to store a traditional high score table, because then you might expect the same person's name to occur multiple times. These are the kinds of things to think about when deciding what data structure to use.
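For comparison, a traditional high score table, where a name may appear more than once, could be sketched as a list of `(name, score)` tuples. The data, the name `score_table`, and the helper `get_score` are all illustrative.

```python
score_table = [("ray", 5000), ("ali", 3000), ("ali", 7000), ("sam", 2000)]

def get_score(entry):
    return entry[1]              # the score is the second item of each tuple

# Sort by score, highest first; "ali" appears twice, which a dict could not hold
score_table.sort(key=get_score, reverse=True)
for name, score in score_table:
    print(name, score)
```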

The value of the key-value pair *can* be mutable, so it is possible to store a list for each key, thereby effectively storing multiple values per key provided you account for this in the syntax:

In [15]:
recent_scores = {"ray": [5000], "ali": [3000], "sam": [2000]}
recent_scores["ali"].append(7000)
recent_scores["sam"].append(1000)
recent_scores

{'ray': [5000], 'ali': [3000, 7000], 'sam': [2000, 1000]}
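One thing to account for is a player who has no list yet: calling `.append` on a missing key would raise a `KeyError`. The sketch below uses the built-in `dict.setdefault` method, which stores a default value if the key is missing and then returns whatever is stored; the player `"kim"` is made up for this example.

```python
recent_scores = {"ray": [5000], "ali": [3000, 7000]}

# "kim" has no entry yet, so setdefault stores an empty list first
recent_scores.setdefault("kim", []).append(4000)
# "ray" already has a list, so setdefault just returns it
recent_scores.setdefault("ray", []).append(1500)

print(recent_scores)
# {'ray': [5000, 1500], 'ali': [3000, 7000], 'kim': [4000]}
```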

You can use the `.keys()` and `.values()` methods on a dictionary to get quick access to just its keys or values. This allows statements like:

In [5]:
high_scores = {"ray": 5000, "ali": 3000, "sam": 2000}
max(high_scores.values())  # evaluates to 5000; a notebook cell only displays its last expression

type(high_scores.keys())

dict_keys

#### Iteration
If you want to iterate through each item of a dictionary, the easiest method is to iterate through its keys and retrieve the value for each key by querying the dictionary. In fact, using a dictionary directly as the target of a `for` loop will give you the keys by default. Take some time to read through and understand the code below:

In [1]:
def winning_score(scores):
    top_player = ""
    top_score = -1
    for key in scores:
        player_scores = scores[key]
        max_score = max(player_scores)
        if max_score > top_score:
            top_player = key
            top_score = max_score
    return top_player, top_score

recent_scores = {"ray": [5000], "ali": [3000, 7000], "sam": [2000, 1000]}
player, score = winning_score(recent_scores)
print(f"The top player today was **{player}** with a score of **{score}**!")

The top player today was **ali** with a score of **7000**!
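Dictionaries also provide an `.items()` method, which yields each key together with its value. As a sketch, the loop in `winning_score` could be written this way instead; the function name `winning_score_items` is made up for this example, and like the original it assumes every player has at least one score and that scores are non-negative.

```python
def winning_score_items(scores):
    top_player = ""
    top_score = -1
    # .items() yields (key, value) pairs, unpacked here into two names
    for player, player_scores in scores.items():
        max_score = max(player_scores)
        if max_score > top_score:
            top_player = player
            top_score = max_score
    return top_player, top_score

recent_scores = {"ray": [5000], "ali": [3000, 7000], "sam": [2000, 1000]}
print(winning_score_items(recent_scores))  # ('ali', 7000)
```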


When you are dealing with data you will often find yourself having to navigate through the data structure to actually get the data you are looking for. Just like in the example above, we have stored our score data in a format that is perfectly sensible, but it requires a little bit of work to get the winning score. 

This is something worth bearing in mind as you move into this week's final section, but first let's see some exercises!

### Questions
#### Question 1:  Longest Value
Given a dictionary, return the *key* of the element whose *value* has the greatest length, as measured by the function `len`. 

The dictionary will only contain values which support the `len` function. So it could include strings, tuples, lists, a mix of these, and so on. Only a single element will have the longest value.

In [18]:
%run ../scripts/show_examples.py ./questions/3.3/longest_value

Example tests for function longest_value

Test 1/5: longest_value({'a': 'a'}) -> 'a'
Test 2/5: longest_value({'a': 'a', 'b': 'bb', 'c': 'ccc'}) -> 'c'
Test 3/5: longest_value({'c': '0', 'b': '000', 'd': '00'}) -> 'b'
Test 4/5: longest_value({'dog': (0, 1, 1, 1, 0), 'egg': (0, 1, 1, 0), 'cat': (0, 1, 0), 'fox': (0, 0)}) -> 'dog'
Test 5/5: longest_value({'a': 'aaaaa', 2: (1, 2, 3, 4), (3,): {'ccc': 2, 'ddd': 3}, 'four': {0, -100, 100}}) -> 'a'


In [None]:
def longest_value(dictionary):
    pass

%run -i ../scripts/function_tester.py ./questions/3.3/longest_value

#### Question 2: Substitution Cipher
Given a lowercase string and a dictionary containing substitutions from one letter to another, return the string that results from making all of the substitutions. So the dictionary `{"t": "f"}` will replace all `t`s with `f`s.

In [19]:
%run ../scripts/show_examples.py ./questions/3.3/sub_cipher

Example tests for function sub_cipher

Test 1/5: sub_cipher('test', {'t': 'f'}) -> 'fesf'
Test 2/5: sub_cipher('pear', {'r': 'p', 'p': 'r'}) -> 'reap'
Test 3/5: sub_cipher('string', {}) -> 'string'
Test 4/5: sub_cipher('011011', {'0': '1', '1': '0'}) -> '100100'
Test 5/5: sub_cipher('hello!', {'!': '?'}) -> 'hello?'


In [None]:
def sub_cipher(s, dic):
    pass

%run -i ../scripts/function_tester.py ./questions/3.3/sub_cipher

#### Question 3: Longest Cycle
Given a dictionary, return the length of the longest *cycle*. 

For the purposes of this question, a cycle is defined as a repeating sequence of elements found by using the *value* of one element as the *key* of the next element. For example, suppose 1 maps to 4 which maps to 2 which maps to 1. We could represent this cycle in the following dictionary: `{1: 4, 4: 2, 2: 1}`. This cycle has length 3.

The following dictionary has two independent cycles. To make it clearer, one is made only of integers, and one is made only of single character strings: 

`{1: 4, "b": "a", "h": "b", 2: 1, "a": "t", 4: 2, "t": "h"}` 

Can you detangle the two cycles just by inspection? What are they?

Once you've worked it out, you'll see that the cycle of integers has length 3, and the cycle of strings has length 4. So given this dictionary, the function you write for this exercise should return 4, because this is the length of the longest cycle.

*Your code must not modify the dictionary that is passed in!* You can, of course, make a *copy* of the dictionary which you then modify. You can always look up how to do specific things (like making a copy of a dictionary or removing elements from a dictionary) online.

In [20]:
%run ../scripts/show_examples.py ./questions/3.3/longest_cycle

Example tests for function longest_cycle

Test 1/5: longest_cycle({1: 4, 4: 2, 2: 1}) -> 3
Test 2/5: longest_cycle({}) -> 0
Test 3/5: longest_cycle({12754: 12754, 'mpz': 'mpz', 'dxx': 'dxx', 9066: 9066}) -> 1
Test 4/5: longest_cycle({1: 4, 'b': 'a', 'h': 'b', 2: 1, 'a': 't', 4: 2, 't': 'h'}) -> 4
Test 5/5: longest_cycle({3: 1, 1: 3, 'x': 'x', 'j': 'z', 'z': 'j', 6: 6}) -> 2


In [None]:
def longest_cycle(dic):
    pass

%run -i ../scripts/function_tester.py ./questions/3.3/longest_cycle

## What Next?
When you are done with this notebook, go back to Engage and move on to the next section.