# Sets

You've already learned quite a bit about sets, so most of this is a refresher, but you may still pick up a few new tricks. Throughout the notebook and at the end there are exercises you can work through with your partner to put the theory into practice.

The next data structure that we're going to take a look at today is the `set`. A set combines some of the awesome features of both the `list` and the `dictionary`. A set is defined as an unordered, mutable collection of unique items. This means that a `set` is a data structure where you can store items without caring about their order, knowing that each item is stored at most once.
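The three properties in that definition (unique, unordered, mutable) can each be seen in a few lines. This is a small sketch; the name `letters` and its values are arbitrary choices for illustration:

```python
# Unique: duplicates collapse, so each value is stored at most once
letters = set(['a', 'b', 'a', 'c', 'b'])
print(len(letters))  # prints 3

# Unordered: there is no "first" item, so indexing is not supported
try:
    letters[0]
except TypeError as err:
    print("TypeError:", err)

# Mutable: items can be added and removed after the set is created
letters.add('d')
letters.remove('a')
print(sorted(letters))  # prints ['b', 'c', 'd']
```

`sorted()` is used only to get a predictable display order, since the set itself has none.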

This description, while informal, is rather spot on. Sets in Python are analogous to sets in mathematics. For this reason, much of the jargon and functionality that you will hear about when learning and talking about Python sets is similar to, if not exactly the same as, that of mathematical sets ([here's](https://en.wikipedia.org/wiki/Set_(mathematics)) the Wikipedia article on sets if you want a quick overview).
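To make the analogy concrete, here is a minimal sketch mapping three mathematical relations (subset ⊆, superset ⊇, and disjointness) onto set methods. The set names are made up for illustration:

```python
evens = {2, 4, 6}
numbers = {1, 2, 3, 4, 5, 6}
odds = {1, 3, 5}

# Subset (evens ⊆ numbers) and superset (numbers ⊇ evens)
print(evens.issubset(numbers))    # True
print(numbers.issuperset(evens))  # True
print(evens <= numbers)           # True, operator form of issubset()

# Disjoint: the two sets have no elements in common
print(evens.isdisjoint(odds))     # True
print(evens.isdisjoint(numbers))  # False
```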

## Objectives

At the end of this notebook you should be able to:

- understand the difference between sets and lists
- apply mathematical operations to sets
- know when to use sets


Let's take a look at how we construct sets. The first way should look familiar. The second way not so much.

```python
my_set = set([1, 2, 3])
my_set
```

```
{1, 2, 3}
```

```python
my_other_set = {1, 2, 3}
my_other_set
```

```
{1, 2, 3}
```

```python
# we create identical sets:
my_set == my_other_set
```

```
True
```

Here, we see the two ways to make sets. We can use the `set` constructor, which takes an iterable, or the syntactic sugar of curly braces. (**Note**: curly braces are also used for dictionaries. In a dictionary, a colon separates each key from its value, and that is how Python determines whether you're declaring a set or a dictionary. The one case Python can't disambiguate is an empty structure, so to remove the ambiguity, empty curly braces `{}` always mean an empty dictionary; use `set()` to create an empty set.) Sets with the same items in them evaluate as equal, regardless of the order in which the items were written.
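A short sketch of the empty-braces rule and of set equality; the variable names are only for illustration:

```python
empty_braces = {}
print(type(empty_braces))   # <class 'dict'>: empty braces are a dictionary

empty_set = set()
print(type(empty_set))      # <class 'set'>: the constructor is the only way to get an empty set

# Equality ignores both order and duplicates
print({3, 1, 2} == set([1, 2, 2, 3]))  # True
```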

If we take a look at the methods that are available on sets, we see:
```
set.add                          set.intersection                 set.remove
set.clear                        set.intersection_update          set.symmetric_difference
set.copy                         set.isdisjoint                   set.symmetric_difference_update
set.difference                   set.issubset                     set.union
set.difference_update            set.issuperset                   set.update
set.discard                      set.pop  
```

As discussed earlier, many of these methods mirror, or match exactly, the operations on mathematical sets. Naturally, we see ways to compute set operations (`intersection()`, `union()`, etc.) and to alter the set (`add()`, `update()`, `pop()` and `remove()`). Let's take a look at some of these methods in action.

```python
my_set, my_other_set = {1, 2, 3}, {5, 6, 7}
print(my_set, my_other_set)
```

```
{1, 2, 3} {5, 6, 7}
```


```python
# union() returns a new set; my_set and my_other_set are left unchanged
my_set.union(my_other_set)
```

```
{1, 2, 3, 5, 6, 7}
```

```python
my_set.add(4)
my_set
```

```
{1, 2, 3, 4}
```

```python
my_set.update(my_other_set)
my_set
```

```
{1, 2, 3, 4, 5, 6, 7}
```

```python
my_set.remove(5)
my_set
```

```
{1, 2, 3, 4, 6, 7}
```

```python
# intersection() also returns a new set: the items both sets share
my_set.intersection(my_other_set)
```

```
{6, 7}
```

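All of these methods should look fairly intuitive. The `update()` method is like an `add()` en masse. The `union()` method is like adding two sets together, but since a set holds only unique elements, duplicates are dropped. The `intersection()` method returns the elements that the sets have in common. Note that `union()` and `intersection()` build and return a new set and leave the original untouched, whereas `add()`, `update()` and `remove()` change the set in place.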
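The contrast between methods that return a new set and methods that modify a set in place is easy to trip over, so here is a small sketch. The sets `a` and `b` are arbitrary examples; the operators `|`, `&`, `-` and `^` are Python's shorthand for union, intersection, difference and symmetric difference:

```python
a = {1, 2, 3}
b = {3, 4, 5}

# union() builds and returns a new set; a itself is not modified
print(a.union(b))   # {1, 2, 3, 4, 5}
print(a)            # {1, 2, 3}

# Operator equivalents, which also return new sets
print(a | b)        # union: {1, 2, 3, 4, 5}
print(a & b)        # intersection: {3}
print(a - b)        # difference (in a but not in b): {1, 2}
print(a ^ b)        # symmetric difference (in exactly one of them): {1, 2, 4, 5}

# The *_update variants modify the set in place and return None
a.intersection_update(b)
print(a)            # {3}
```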

These are some of the most common set operations you will ever use. If you'd like to take a look at the documentation for all of them, check it out [here](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset).
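Three removal methods from the list above behave differently when it matters most, namely when the item is missing or you don't care which item goes. A minimal sketch, with `colors` as a made-up example set:

```python
colors = {'red', 'green', 'blue'}

colors.discard('purple')      # discard() silently ignores a missing item
try:
    colors.remove('purple')   # remove() raises KeyError for a missing item
except KeyError:
    print("'purple' was not in the set")

item = colors.pop()           # pop() removes and returns an arbitrary item
print(item in {'red', 'green', 'blue'})  # True
print(len(colors))            # 2
```

Because sets are unordered, you can't predict which item `pop()` returns; use it only when any item will do.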



#### Why Do We Need Sets?

Alright, that's cool, but when would I use a set? The most obvious use case is finding the unique items in an iterable. Another is whenever you need set operations, like checking which elements two lists have in common: convert both lists to sets and take the intersection. There's also another place where sets shine that might not be so apparent.
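Both everyday uses fit in a couple of lines. A sketch with made-up visitor lists:

```python
visitors_monday = ['ana', 'ben', 'ana', 'cleo', 'ben']
visitors_tuesday = ['cleo', 'dev', 'ana']

# Unique items: converting to a set drops the duplicates
print(len(set(visitors_monday)))   # 3

# Common elements: intersect the two sets
both_days = set(visitors_monday) & set(visitors_tuesday)
print(sorted(both_days))           # ['ana', 'cleo']
```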

Remember when we discussed dictionaries and saw that checking whether an item is in a list means looking at the items in the list one by one? This can be computationally expensive, and generally we want to avoid it. What do we do instead, then?

We use a set! The reason lies in the fact that sets in Python are built very similarly to dictionaries: an underlying hash table stores the elements and lets us query membership in the set (*note: this means that the elements of a set have to be hashable, which in practice means immutable types such as numbers, strings, and tuples of immutable values*). A membership test on a set takes constant time on average, no matter how big the set is, while on a list it takes time proportional to the list's length ([here's](https://wiki.python.org/moin/TimeComplexity) some coverage on how quickly common Python operations run). Let's take a look at this in action and, at the same time, learn how to time things in IPython.
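The hashability requirement shows up as soon as you try to store a mutable object. In this sketch, the names `coords` and `groups` are illustrative. `frozenset`, Python's immutable counterpart of `set`, is what you'd reach for if you needed a set of sets:

```python
coords = set()
coords.add((0, 0))          # a tuple of immutable values is hashable: fine

try:
    coords.add([1, 1])      # a list is mutable and therefore unhashable
except TypeError as err:
    print("TypeError:", err)

# A set of sets needs frozenset, because a regular set is unhashable
groups = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in groups)  # True
```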

```python
my_list = list(range(1000000))  # the integers 0 through 999,999
my_list[:5]  # show only the first few items instead of all one million
```

```
[0, 1, 2, 3, 4]
```

```python
my_set = set(my_list)
len(my_set)
```

```
1000000
```

```python
%timeit 100000 in my_list
```

```
611 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```

```python
%timeit 100000 in my_set
```

```
19.5 ns ± 0.0731 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```


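Here, we used the `timeit` magic command that's built into IPython, and therefore Jupyter. To use it, write `%timeit` followed by a single line of code (with automagic enabled, which is the default, the `%` can be omitted). The list membership check took roughly 30,000 times longer than the set version (611 µs versus 19.5 ns). And that is a mild case: `100000` sits near the front of the list, so Python only had to compare about 100,000 items before finding it. Checking for a value near the end of the list, or one that isn't there at all, means scanning all one million items. The gap keeps widening as the collection grows, because a list lookup grows with the list's length while a set lookup stays roughly constant.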
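Outside Jupyter, `%timeit` isn't available, but the standard-library `timeit` module does the same job. This sketch looks up the last element (`999999`), the worst case for a list, because `in` scans it from the front. The repetition count of 20 is an arbitrary choice to keep the run short, and your timings will differ by machine:

```python
import timeit

# Built once per timeit() call, outside the timed statement
setup = "data_list = list(range(1000000)); data_set = set(data_list)"
n = 20  # arbitrary number of repetitions for this sketch

list_time = timeit.timeit("999999 in data_list", setup=setup, number=n)
set_time = timeit.timeit("999999 in data_set", setup=setup, number=n)

print(f"list: {list_time / n * 1e6:.1f} µs per lookup")
print(f"set:  {set_time / n * 1e9:.1f} ns per lookup")
```

`timeit.timeit()` returns the total time in seconds for all `number` runs, which is why each result is divided by `n`.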

## Check your understanding!

**Part 1**

1. Make a set called `first_set` with the values 1-10 and another with the values 5-15 called `second_set`.
2. Add the value 11 to `first_set`.
3. Add the string `'hello'` to `second_set`.
4. Using one of the methods discussed above, find what elements `first_set` and `second_set` have in common.
5. In one line of code, add all the elements of `second_set` to `first_set`.

```python
first_set = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
second_set = {5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
first_set.add(11)
first_set
```

```
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
```

```python
# add() inserts a single item directly; no need to wrap it in another set first
second_set.add('hello')
# second_set now holds 5 through 15 plus 'hello'; a set mixing ints and strings
# can't be sorted for display, so the order shown may vary between runs
second_set
```

```python
# intersection() returns the elements both sets have in common
first_set.intersection(second_set)
```

```
{5, 6, 7, 8, 9, 10, 11}
```

```python
# update() adds every element of second_set to first_set in place
first_set.update(second_set)
# first_set now holds 1 through 15 plus 'hello' (display order may vary)
first_set
```

**Part 2**

Given the lists `first_list = ['hello', 'there', 'things', 'stuff', 'other', 'soda', 'chicken wings', 'things', 'soda']` and `second_list = ['turkey sandwich', 'guacamole', 'chicken wings', 'OJ', 'soda']`:
1. Find the unique elements in `first_list`.
2. In one line, find the common elements in the two lists.
3. Write a single line that outputs `True` or `False` depending on whether the string `'pizza'` is in both lists.

```python
first_list = ['hello', 'there', 'things', 'stuff', 'other', 'soda', 'chicken wings', 'things', 'soda']
second_list = ['turkey sandwich', 'guacamole', 'chicken wings', 'OJ', 'soda']

# 1. converting to a set keeps only the unique elements
set(first_list)
```

```
{'chicken wings', 'hello', 'other', 'soda', 'stuff', 'there', 'things'}
```

```python
# 2. the & operator is shorthand for intersection()
set(first_list) & set(second_list)
```

```
{'chicken wings', 'soda'}
```

```python
# 3. 'pizza' is in both lists only if it is in their intersection
'pizza' in (set(first_list) & set(second_list))
```

```
False
```