# L12 - Sets
Sets are our next type to explore. Simply put, a set is a collection of values in no particular order. The big difference between sets and lists is this lack of order. Let's take a look.

In [None]:
l = [1, 2, 3] # list
s = {1, 2, 3} # set

Because sets are unordered, they do not support indexing.

In [None]:
s[0] # raises TypeError: 'set' object is not subscriptable
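If you do need a value out of a set, you have to ask for it by some rule other than position. A small sketch of a few common options using built-in functions:

```python
s = {1, 2, 3}

# Grab an arbitrary element without removing it (which one is not guaranteed).
any_value = next(iter(s))

# Pick by value instead of by position.
smallest = min(s)
largest = max(s)

# Or make a sorted list first, then index that list.
first = sorted(s)[0]

print(any_value in s, smallest, largest, first)
```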

Another big difference is that sets cannot contain repeat values.

In [None]:
l = [1,1,1,1,2,3,4]
print(len(l))
s = set(l) # function used to turn things into sets
print(len(s))
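Because duplicates collapse and order is ignored, two sets are equal whenever they hold the same values, which is not true of lists. This also makes len(set(...)) a quick way to count distinct items. The votes list below is made-up example data:

```python
# Duplicates and order are both ignored when comparing sets.
print({1, 2, 3} == {3, 2, 1, 1})   # True

# Lists, by contrast, care about both.
print([1, 2, 3] == [3, 2, 1, 1])   # False

# Counting the distinct values in a list (hypothetical data).
votes = ['red', 'blue', 'red', 'green', 'blue', 'red']
distinct = len(set(votes))
print(distinct)                    # 3
```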

You can't get specific values out of them, and you can't store repeat values, so what's the point? Sets are stuuuupid fast. Imagine you have tons of data to sort through and you're just looking for one name. With a list, Python has to check the values one by one until it finds a match or reaches the end. A set's underlying structure, a hash table, lets it jump almost directly to the spot where that value would be stored, so on average the check is nearly instant no matter how big the set is. Let's take a look.
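To build some intuition for why a hash-based lookup is fast, here is a toy model. This is a simplified sketch only: CPython's real set uses a single open-addressing table that resizes as it grows, not a fixed list of buckets, and the class name ToyHashSet and the bucket count are made up for illustration. The core idea carries over, though: hash(value) points you to one small bucket, so you compare against a handful of items instead of scanning everything.

```python
class ToyHashSet:
    """Hypothetical, simplified hash set built from a fixed list of buckets."""

    def __init__(self, num_buckets=1024):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, value):
        # hash() turns the value into an integer; modulo picks one bucket.
        return self.buckets[hash(value) % len(self.buckets)]

    def add(self, value):
        bucket = self._bucket_for(value)
        if value not in bucket:      # only scans this one small bucket
            bucket.append(value)

    def __contains__(self, value):
        # Go straight to one bucket instead of checking every stored value.
        return value in self._bucket_for(value)


names = ToyHashSet()
for name in ['Hanks, Tom', 'Streep, Meryl', 'Washington, Denzel']:
    names.add(name)

print('Hanks, Tom' in names)        # True
print('Nobody, Someone' in names)   # False
```

Defining __contains__ is what lets the in keyword work on the toy class.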

In [None]:
import time

# Keep only the name (the text before the first '|') from each line.
with open('imdb_data.txt', encoding="ISO-8859-1") as f:
    names_list = [line.split('|')[0].strip() for line in f.read().split('\n')]

print("First 10 names:\n", names_list[:10], sep='') # first ten names
print("There are {:,} names in this text file".format(len(names_list)))

# Time a membership check on the list. perf_counter() is a high-resolution
# clock, and using it for both checks keeps both results in seconds.
t0 = time.perf_counter()
hanks_list = 'Hanks, Tom' in names_list
t_list = time.perf_counter() - t0

# Build the set once, then time the same check on it.
names_set = set(names_list)
t0 = time.perf_counter()
hanks_set = 'Hanks, Tom' in names_set
t_set = time.perf_counter() - t0

print("Found Tom Hanks in List: {} in {} seconds".format(hanks_list, t_list))
print("Found Tom Hanks in Set: {} in {} seconds".format(hanks_set, t_set))

This is a list of over a quarter-million names. When this notebook was run, finding Tom Hanks in the list took about 0.001 seconds, while the set lookup was nearly instant. Maybe we got lucky with where Tom Hanks' name sits in the list. What if the name isn't in the list at all? Then Python has to look through every single entry before it can answer False.

In [None]:
import time

# names_list and names_set were built in the previous cell,
# so there's no need to read the file again.
missing_name = "Name that's not in the list"

t0 = time.perf_counter()
found_in_list = missing_name in names_list
t_list = time.perf_counter() - t0

t0 = time.perf_counter()
found_in_set = missing_name in names_set
t_set = time.perf_counter() - t0

print("Found missing name in List: {} in {} seconds".format(found_in_list, t_list))
print("Found missing name in Set: {} in {} seconds".format(found_in_set, t_set))

On the original run, the list took about 4 times longer, because it had to check every entry, but the set still found almost instantly that the name is not there. It's not terribly important that you understand how this works since you're not a CS nerd; just know that it works.
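If you don't have imdb_data.txt handy, the sketch below makes the same comparison with made-up data. The generated names ('Person 0', 'Person 1', ...) and the variable names are hypothetical, and timeit repeats each lookup 100 times to smooth out noise. Exact timings will depend on your machine.

```python
import timeit

# Hypothetical stand-in data: 250,000 generated names.
fake_names_list = ['Person {}'.format(i) for i in range(250_000)]
fake_names_set = set(fake_names_list)
missing = 'Nobody'

# Run each membership test 100 times and report the total time in seconds.
t_list = timeit.timeit(lambda: missing in fake_names_list, number=100)
t_set = timeit.timeit(lambda: missing in fake_names_set, number=100)

print("List: {:.6f} s for 100 lookups".format(t_list))
print("Set:  {:.6f} s for 100 lookups".format(t_set))
print("Same answer?", (missing in fake_names_list) == (missing in fake_names_set))
```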

So what can we use sets for, now that we know they're stupid quick? First, a recap of how to make them: you can write values inside curly brackets, as in {1, 2, 3}, or you can turn other containers into sets with the function set().
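One gotcha with curly brackets: {} on its own makes an empty dictionary, not an empty set, so an empty set has to be written as set(). A quick sketch, which also shows that set() accepts any iterable, including strings:

```python
empty_dict = {}        # empty braces give a dict
empty_set = set()      # the way to write an empty set
print(type(empty_dict), type(empty_set))

# set() accepts any iterable: lists, tuples, strings, ranges...
print(set((1, 2, 2)))          # {1, 2}
print(set(range(3)))           # {0, 1, 2}
letters = set('banana')        # one entry per distinct character
print(sorted(letters))         # ['a', 'b', 'n']
```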
# Working with Sets
We can add a single value to a set with the add() method.

In [None]:
s = {1,4,7,3}
s.add(9)
s.add('Blueberry')
print(9 in s)
print('Blueberry' in s)
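add() takes one value at a time. Here is a short sketch of a few related built-in set methods: update() adds every value from another iterable, remove() deletes a value and raises KeyError if it's missing, and discard() deletes it quietly if present. The name nums is just for illustration, so the s from the previous cell is left alone:

```python
nums = {1, 4, 7, 3}

nums.add(4)                 # 4 is already there, so nothing changes
nums.update([10, 11, 1])    # add several values at once; 1 is a duplicate
print(sorted(nums))         # [1, 3, 4, 7, 10, 11]

nums.remove(10)             # fine: 10 is present
nums.discard(99)            # no error even though 99 is absent

try:
    nums.remove(99)         # remove() complains about missing values
except KeyError:
    print("99 was not in the set")

print(sorted(nums))         # [1, 3, 4, 7, 11]
```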

You can remove everything from a set with the clear() method.

In [None]:
s.clear()
print(9 in s)
print(1 in s)
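clear() empties the set in place. That matters if another variable refers to the same set: both names see the change. Reassigning a name to set() only affects that one name, as this sketch shows:

```python
a = {1, 2, 3}
b = a            # b and a are two names for the same set object
a.clear()        # empties the shared object
print(b)         # set()

c = {1, 2, 3}
d = c
c = set()        # c now points at a brand-new empty set
print(d)         # {1, 2, 3}, since d still refers to the original
```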

There are many set operations that combine two sets. For example, we can take the difference of two sets: the values in the first set that are not in the second.

In [None]:
s1 = set(range(1, 11)) # make sure you know what's in this set
s2 = set(range(1, 11, 2)) # what about this one

print(s1.difference(s2))
print(s1 - s2) # multiple ways to do this
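Order matters for difference: s1 - s2 and s2 - s1 are usually not the same. Also, difference() and - return a new set, while -= updates the set in place. A small worked example with the same s1 and s2:

```python
s1 = set(range(1, 11))      # {1, 2, ..., 10}
s2 = set(range(1, 11, 2))   # {1, 3, 5, 7, 9}

print(sorted(s1 - s2))      # [2, 4, 6, 8, 10]
print(s2 - s1)              # set(), because every value of s2 is also in s1

# -= changes the left-hand set in place.
s3 = set(s1)                # copy so s1 itself is untouched
s3 -= s2
print(s3 == s1 - s2)        # True
```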

We can find the intersection of two sets.

In [None]:
print(s1.intersection(s2))
print(s1 & s2)
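One practical difference between the two spellings: the method versions accept any iterable, while the operators need a set on both sides. A quick sketch:

```python
s1 = set(range(1, 11))

# The method happily takes a list.
print(s1.intersection([2, 4, 42]))   # {2, 4}

# The operator does not.
try:
    s1 & [2, 4, 42]
except TypeError as err:
    print("TypeError:", err)

# Convert first if you want to use the operator.
print(s1 & set([2, 4, 42]))          # {2, 4}
```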

You can create the union of two sets, which contains every element that appears in either set.

In [None]:
print(s1.union({'a','b','c'}))
print(s1 | {'a', 'b', 'c'})
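union() can take several iterables at once, and neither union() nor | changes the original set; |= (like update()) does. The name base is just for illustration, so s1 keeps its values for the cells below:

```python
base = set(range(1, 11))

# union() accepts several iterables, including a string ('d' and 'e').
combined = base.union({'a', 'b'}, ['c'], 'de')
print(len(combined))         # 15: the 10 numbers plus 5 letters

print(len(base))             # 10: base was not modified
base |= {'a', 'b', 'c'}      # in-place union
print(len(base))             # 13
```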

You can check whether one set is a subset of another.

In [None]:
print(s2.issubset(s1))
print(s2 <= s1)
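<= asks "is every value on the left also on the right?", so every set is a subset of itself. The strict version < additionally requires the right side to have something extra (a proper subset):

```python
s1 = set(range(1, 11))
s2 = set(range(1, 11, 2))

print(s2 <= s1)   # True: every value of s2 is in s1
print(s2 < s1)    # True: and s1 has extra values, so s2 is a proper subset
print(s1 <= s1)   # True: every set is a subset of itself
print(s1 < s1)    # False: but never a proper subset of itself
```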

Similarly, you can check for supersets.

In [None]:
print(s1.issuperset(s2))
print(s1 >= s2)
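A superset check reads naturally for "does this have everything I need?" questions. In this sketch, required_columns, row, and partial are made-up names and values for illustration:

```python
required_columns = {'name', 'year', 'rating'}                       # hypothetical
row = {'name': 'Movie A', 'year': 2000, 'rating': 5.0, 'genre': 'Comedy'}

# Calling set() on a dict gives the set of its keys.
print(set(row) >= required_columns)              # True
print(set(row).issuperset(required_columns))     # True, same check

# When the check fails, a difference shows exactly what is missing.
partial = {'name': 'Movie B'}
print(required_columns - set(partial))           # 'year' and 'rating', in some order
```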

You can create a new set of the elements that are in either s1 or s2, but not both. This is called the symmetric difference.

In [None]:
print(s1.symmetric_difference(s2))
print(s1 ^ s2)
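The symmetric difference can also be built from the operations above, which is a handy way to check your understanding of it:

```python
s1 = set(range(1, 11))
s2 = set(range(1, 11, 2))

sym = s1 ^ s2

# The two one-sided differences, combined.
print(sym == (s1 - s2) | (s2 - s1))     # True

# Or: everything in either set, minus what they share.
print(sym == (s1 | s2) - (s1 & s2))     # True

print(sorted(sym))                      # [2, 4, 6, 8, 10]
```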

So far, we haven't really noticed the unordered nature of sets because we've been using sets whose values were already in sorted order. But what if we combine two sets whose numbers are not in sorted order?

In [None]:
print({3, 5, 8} | {9, 0, 4})

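The returned set appears to be sorted, but that is a side effect of how CPython stores small integers (a small integer's hash is the integer itself, which decides where it lands in the hash table), not a promise. Sets of other values, such as strings, can come out in any order, and that order can even change from one run of Python to the next. Never rely on the order of a set.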
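Here is a small sketch where integers don't come out sorted. The display order is a CPython implementation detail, so treat the first comment as what you will likely see, not a guarantee. Use sorted() whenever order matters; it returns a new sorted list and leaves the set alone.

```python
small = {8, 1}
print(small)           # in CPython this typically shows {8, 1}: not sorted
print(sorted(small))   # [1, 8]

words = {'pear', 'apple', 'fig'}
print(words)           # the order can differ from run to run
print(sorted(words))   # ['apple', 'fig', 'pear']
```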

When comparing the efficiency of sets and lists, we used the keyword 'in'. The expression a in b is True if the value a is stored in the container b, and False otherwise. It's the simplest way to get a boolean telling you whether a certain value is stored in a list or set.
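The 'in' keyword works on other built-in containers too, though what it checks depends on the type. A quick sketch:

```python
print(3 in [1, 2, 3])              # True: list, checked one item at a time
print(3 in {1, 2, 3})              # True: set, checked via hashing
print('ank' in 'Hanks, Tom')       # True: for strings, 'in' finds substrings
print('year' in {'year': 2000})    # True: for dicts, 'in' checks the keys
print(4 not in {1, 2, 3})          # True: 'not in' is the negation
```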

Sets are also iterable, so we can loop over them with a for loop (the for ... in syntax) even though they do not preserve order. Just don't count on the values coming out in any particular order.

In [None]:
for value in {3, 5, 8} | {9, 0, 4}:
    print(value)
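One thing to watch when looping over a set: adding or removing values inside the loop changes its size, which raises a RuntimeError. A common workaround, sketched below, is to loop over a copy, or to build a new set with a set comprehension instead:

```python
values = {0, 3, 4, 5, 8, 9}

# Removing from the set we're looping over would raise
# "RuntimeError: Set changed size during iteration", so loop over a copy.
for v in list(values):
    if v % 2 == 1:
        values.discard(v)
print(sorted(values))      # [0, 4, 8]

# Alternatively, build a new set with a set comprehension.
evens = {v for v in {0, 3, 4, 5, 8, 9} if v % 2 == 0}
print(evens == values)     # True
```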