# Introductory Notes

Throughout this entire notebook you should be experimenting with the code in the non-text cells. A great way to begin to get a feel for Python is by playing with it. So, have some fun by changing the values in the cells and then running them again with Shift-Enter.

At the end of each section there will be some questions to help further your understanding. Remember, in Python we can always manually test things by trying them out; however, you should try to think about the answers to these questions before you run some code. This way you can check and verify your understanding of the section's topic.


## Sets

There is one more data structure that we're going to take a look at today, the `set`. A set combines some of the awesome features of both the `list` and the `dictionary`. A set is defined as an unordered, mutable collection of unique items. This means that a `set` is a data structure where you can store items without caring about their order, knowing that each item appears in the structure at most once.

This description, while highly informal, is rather spot on. Sets in Python are actually analogous to sets in math. For this reason, much of the jargon and functionality that you will hear about when learning and talking about Python sets is similar to, if not exactly the same as, that which applies to mathematical sets ([here's](https://en.wikipedia.org/wiki/Set_(mathematics)) the wiki on sets if you want a quick overview of them).

Let's take a look at how we construct sets.

In [1]:
my_set = set([1, 2, 3])

In [2]:
my_other_set = {1, 2, 3}

In [3]:
my_set == my_other_set

True

Here, we see the two ways to make sets. We can use the `set` constructor, which takes an iterable, or the syntactic sugar of curly braces. Sets with the same items in them evaluate as equal, regardless of how they were built.

(**Note**: curly braces are also used for dictionaries. With dictionaries, a colon separates each key from its value, and this is how Python determines whether you're declaring a set or a dictionary. The one ambiguous case is an empty structure: Python can't tell whether you want a dictionary or a set, so empty curly braces `{}` always mean an empty dictionary. To make an empty set, call `set()` with no arguments.)
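As a quick check of the empty-braces rule and of uniqueness, here is a minimal sketch. The variable names (`empty_braces`, `empty_set`, `deduped`) are made up for this illustration.

```python
# Empty curly braces create a dict, not a set.
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Duplicates in a set literal collapse to a single item.
deduped = {1, 1, 2, 3, 3}
print(deduped == set([1, 2, 3]))  # True
```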

If we take a look at the methods that are available on sets we see:
```
set.add                          set.intersection                 set.remove
set.clear                        set.intersection_update          set.symmetric_difference
set.copy                         set.isdisjoint                   set.symmetric_difference_update
set.difference                   set.issubset                     set.union
set.difference_update            set.issuperset                   set.update
set.discard                      set.pop  
```

As discussed earlier, many of these methods are similar to, if not the same as, those available to mathematical sets. Naturally, we see ways to compute set operations (`intersection()`, `union()`, etc.) and alter the set (`add()`, `update()`, `pop()` and `remove()`). Let's take a look at some of these methods in action.

In [4]:
my_set, my_other_set = {1, 2, 3}, {5, 6, 7}

In [5]:
my_set.union(my_other_set)

{1, 2, 3, 5, 6, 7}

In [6]:
my_set.add(4)
my_set

{1, 2, 3, 4}

In [7]:
my_set.update(my_other_set)
my_set

{1, 2, 3, 4, 5, 6, 7}

In [8]:
my_set.remove(5)
my_set

{1, 2, 3, 4, 6, 7}

In [9]:
my_set.intersection(my_other_set)

{6, 7}

All of these methods should look fairly intuitive. The `update()` method is like an `add()` en masse. The `union()` method is like adding two sets together, but since there are only unique elements in a set, it removes duplicates. The `intersection()` method returns those elements that the sets have in common.
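The method listing also includes `discard()` and `pop()`, which were not run above. The sketch below shows how they differ from `remove()`; the set `numbers` and the flag `remove_raised` are names chosen just for this example.

```python
numbers = {1, 2, 3}

# discard() silently does nothing when the item is missing.
numbers.discard(10)

# remove() raises KeyError when the item is missing.
try:
    numbers.remove(10)
    remove_raised = False
except KeyError:
    remove_raised = True
print(remove_raised)  # True

# pop() removes and returns an arbitrary item from the set.
popped = numbers.pop()
print(popped in {1, 2, 3})  # True
print(len(numbers))         # 2
```

If you aren't sure whether an item is present and don't want an error, `discard()` is usually the more convenient choice.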

These are some of the most common set operations you will ever use. If you'd like to take a look at the documentation for all of them, check it out [here](https://docs.python.org/2/library/stdtypes.html#set).

**Set Questions**

1. Make a set called `first_set` with the values 1-10 and another with the values 5-15 called `second_set`.
2. Add the value 11 to `first_set`.
3. Add the string `'hello'` to `second_set`.
4. Using one of the methods discussed above, find what elements `first_set` and `second_set` have in common.
5. In one line of code, add all the elements of `second_set` to `first_set`.

#### Why Do We Need Sets?

Alright, that's cool, but when would I use a set? The most apparent answer is for times when you need to perform set operations, like checking what elements two lists have in common: convert both lists to sets and find the intersection of those sets. Another obvious use case is finding the unique items in an iterable. There's also another amazing place where we'll want to use sets that might not be so apparent.

Remember, when discussing dictionaries above, we talked about how checking if an item is in a list requires us to check every item in the list? This can be computationally expensive and generally we want to avoid it. What do we do instead, then?

We use a set! The reason lies in the fact that sets in Python are built very similarly to dictionaries. An underlying hash table stores the elements and lets us query them for membership (*Note: this means that the elements of a set have to be hashable, which in practice usually means immutable*). Membership checks are much faster with sets than with lists ([here's](https://wiki.python.org/moin/TimeComplexity) some coverage on how quickly some Python methods run). Let's take a look at this in action, and simultaneously learn how to time things in IPython.
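Before the timing cells, a brief aside on the note about what a set can hold. This is a small sketch with made-up names (`points`, `list_rejected`, `nested`) showing which kinds of objects are accepted as elements.

```python
# Tuples (of hashable items) are hashable, so they can be set elements.
points = {(0, 0), (1, 2)}
points.add((0, 0))  # already present, so the set is unchanged
print(len(points))  # 2

# Lists are mutable and unhashable, so adding one raises TypeError.
try:
    points.add([3, 4])
    list_rejected = False
except TypeError:
    list_rejected = True
print(list_rejected)  # True

# frozenset is the immutable, hashable counterpart of set.
nested = {frozenset({1, 2}), frozenset({2, 1})}
print(len(nested))  # 1
```

With that aside out of the way, the timing comparison follows.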

In [10]:
my_list = list(range(10000))

In [11]:
my_set = set(my_list)

In [12]:
timeit 1000 in my_list

10000 loops, best of 3: 21.5 µs per loop


In [13]:
timeit 1000 in my_set

The slowest run took 19.49 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 97.9 ns per loop


Here, we used the magic `timeit` function that's built into IPython. To use it, write `timeit` followed by a line of code (the explicit form is `%timeit`; the bare name works because IPython's automagic is on by default). We can see that checking membership in the list took ~200 times longer than in the set. That's two orders of magnitude! The gap only grows with the size of the collection, because a list check scans items one by one while a set lookup takes roughly constant time on average.
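Outside IPython, a similar comparison can be made with the standard library's `timeit` module. The sketch below is one way to do it, not the notebook's original measurement: the names `list_seconds` and `set_seconds`, the lookup value `9999` (the last item, so the list has to scan all the way through), and `number=1000` are all choices made for this example. Exact timings will vary by machine.

```python
import timeit

my_list = list(range(10000))
my_set = set(my_list)

# Time 1000 membership checks against each collection.
# 9999 is the last item in the list, which is the slowest case for a list scan.
list_seconds = timeit.timeit(lambda: 9999 in my_list, number=1000)
set_seconds = timeit.timeit(lambda: 9999 in my_set, number=1000)

print(f"list: {list_seconds:.6f} s, set: {set_seconds:.6f} s")
```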

**Set Uses Questions**

Given the lists `first_list = ['hello', 'there', 'things', 'stuff', 'other', 'soda', 'chicken wings', 'things', 'soda']` and `second_list = ['turkey sandwich', 'guacamole', 'chicken wings', 'OJ', 'soda']`:
1. Find the unique elements in `first_list`.
2. In one line, find the common elements in the two lists.
3. Write a single line that outputs `True` or `False` depending on whether the string `'pizza'` is in both lists.

## Set Math

Sets also support operators for the common set operations:

 - `&` intersection
 - `|` union
 - `^` symmetric difference
 - `-` difference (sets do not support `+`)

In [3]:
a = set([1,2,3,'a','b'])
b = set([3,5,1,'b','c'])
print(a & b)
print(a | b)
print(a ^ b)
print(a - b)
print(b - a)

{1, 3, 'b'}
{'a', 1, 2, 3, 5, 'c', 'b'}
{'a', 2, 5, 'c'}
{'a', 2}
{'c', 5}
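Each operator above corresponds to one of the methods seen earlier. The following sketch checks that correspondence and points out one practical difference: the methods accept any iterable, while the operators need a set on both sides. The flag `operator_accepts_list` is a name invented for this example.

```python
a = {1, 2, 3, 'a', 'b'}
b = {3, 5, 1, 'b', 'c'}

# Each operator gives the same result as the matching method.
print((a & b) == a.intersection(b))          # True
print((a | b) == a.union(b))                 # True
print((a ^ b) == a.symmetric_difference(b))  # True
print((a - b) == a.difference(b))            # True

# Methods accept any iterable, such as a list.
print(a.intersection([1, 3, 99]) == {1, 3})  # True

# Operators require sets on both sides, so a list raises TypeError.
try:
    a & [1, 3, 99]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```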
