# Lesson: Sets in Python

#### Sets are closely related to lists and dictionaries. Although we will probably use them less often than lists or dictionaries, they are still a valuable concept to understand.
- A set is a data structure that represents an unordered collection of distinct, non-repeated items.
- Sets have been part of Python since version 2.4.
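- The items in a set must be unique and immutable (hashable), such as numbers, strings, or tuples, but the set itself is mutable: items can be added and removed.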
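To make the difference between immutable items and a mutable set concrete, here is a minimal sketch (the values are arbitrary) showing which kinds of values a set accepts and that the set itself can still change:

```python
# Numbers, strings, and tuples are hashable, so they can be set elements.
mixed = {1, "apple", (2, 3)}

# A list is mutable and therefore unhashable: it cannot go into a set.
try:
    bad = {[1, 2]}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# The set itself is mutable, so items can still be added or removed.
mixed.add(4.5)
mixed.discard("apple")
print(len(mixed))  # 3
```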

## Create a simple set:

In [46]:
# Example:
s = set()  # set() creates an empty set.
s.add(1)   # the set now contains only 1
s

{1}
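One detail worth knowing when creating sets: empty curly braces produce a dictionary, not a set, so set() is needed for an empty set. Non-empty sets can also be written directly as literals. A short sketch using only built-ins:

```python
empty_dict = {}        # Curly braces alone create an empty dict, not a set.
empty_set = set()      # set() is the way to create an empty set.
literal = {1, 2, 5}    # Non-empty sets can be written as literals.

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set
print(literal == set([5, 2, 1]))  # True: order does not matter for equality
```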

In [3]:
s.add(2)  # adds the number 2 to the set
s

{1, 2}

In [4]:
s.add(5)  # you can add any other number (or other hashable value) to the set.
s

{1, 2, 5}

In [5]:
set_len = len(s)  # len() returns the number of items in the set.
set_len

3

In [6]:
x = 2 in s  # The 'in' operator checks whether an item is contained in the set.
            # 2 is in this set, so it returns True.
x

True

In [7]:
y = 4 in s  # 4 is not part of this set, so it returns False.
y

False

In [8]:
s.discard(2)   # Removes the number 2 from the set.
s

{1, 5}
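discard() is forgiving: it does nothing when the item is absent. The related remove() method raises a KeyError instead, which can help catch mistakes. A small sketch continuing from the set {1, 5}:

```python
s = {1, 5}

s.discard(2)          # 2 is not in the set: discard() silently does nothing.
print(s)              # {1, 5}

try:
    s.remove(2)       # remove() raises KeyError when the item is missing.
except KeyError:
    print("2 was not in the set")

s.remove(5)           # remove() deletes the item when it is present.
print(s)              # {1}
```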

## Finding distinct items in a list using sets:
- One reason to use sets is to find distinct items or numbers in lists.
- These are examples of short and simple lists, but these same techniques can be used on much larger lists and sets. 

In [9]:
# Example:
num_in_list = [1, 6, 3, 3, 7, 8, 2, 5, 6, 9]  # Create a list with random numbers 
                                              # with some repeated numbers. 

In [10]:
len_list = len(num_in_list)  # Like before, we can find the length of the list. 
len_list

10

In [11]:
num_set = set(num_in_list)  # Turning the list into a set removes repeated numbers/items.
                            # Sets are unordered; the ascending display here is not guaranteed.
num_set

{1, 2, 3, 5, 6, 7, 8, 9}

In [12]:
distinct_num = len(num_set)  # Finds the number of distinct numbers/items contained in the set. 
distinct_num

8

In [13]:
distinct_list = list(num_set)  # Turn the distinct set back into a list; use sorted(num_set) if order matters.
distinct_list

[1, 2, 3, 5, 6, 7, 8, 9]
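Because a set has no guaranteed order, two common alternatives are sketched below. The first sorts the distinct values; the second swaps in a different technique, dict.fromkeys, which is not a set at all but keeps the first occurrence of each value in its original position (dictionaries preserve insertion order since Python 3.7):

```python
num_in_list = [1, 6, 3, 3, 7, 8, 2, 5, 6, 9]

# Distinct values in ascending order: sorted() always returns a sorted list.
distinct_sorted = sorted(set(num_in_list))
print(distinct_sorted)      # [1, 2, 3, 5, 6, 7, 8, 9]

# Distinct values in order of first appearance: dict keys are unique and
# keep insertion order, so repeated numbers are dropped.
distinct_in_order = list(dict.fromkeys(num_in_list))
print(distinct_in_order)    # [1, 6, 3, 7, 8, 2, 5, 9]
```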

## Using 'in' with sets:
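- Membership tests with "in" are very fast on sets: on average they take constant time no matter how large the set is, whereas a list has to be scanned item by item.
- If we need to check membership repeatedly in a very large collection of items/numbers, a set is more appropriate than a list.
- Sets can also be useful for finding the distinct words in a list of words or a piece of text.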
- This can be useful for Natural Language Processing, which we will learn about in more depth later in the course. 
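A rough way to see the speed difference is to time the same membership test against a list and a set with the standard timeit module. The collection size and repetition count below are arbitrary choices, and absolute timings depend on your machine, so no specific numbers are shown:

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # Worst case for the list: the item sits at the very end.

# Each membership test runs 200 times; the list is scanned every time,
# while the set looks the item up by its hash.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```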

In [14]:
# Example:
words = ['the', 'for', 'you', 'this', 'me', 'my']

In [15]:
word_set = set(words)

In [16]:
'cat' in word_set  # Returns False since 'cat' is not in word_set.

False

In [17]:
'you' in word_set  # Returns True since 'you' is in word_set.

True

In [18]:
print(word_set)  # View your set. 

{'you', 'the', 'for', 'this', 'me', 'my'}


## Using 'not in' with sets:
- "not in" can be used in the same way as 'in' to examine what words/items are not contained in a set.
- We will use the same set from above.

In [19]:
'cat' not in word_set  # Returns True since 'cat' is not in the set.

True

In [20]:
'the' not in word_set  # Returns False since 'the' is in the set.

False
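Combining "not in" with a set gives a simple way to drop common words from a sentence, a typical first step in Natural Language Processing. A minimal sketch; the sentence is made up, and the set of common words reuses the words from above:

```python
stop_words = {'the', 'for', 'you', 'this', 'me', 'my'}
sentence = "this is the best book for you and me"

# Keep only the words that are NOT in the set; each check is a fast set lookup.
content_words = [w for w in sentence.split() if w not in stop_words]
print(content_words)  # ['is', 'best', 'book', 'and']
```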

## Intersections:
- Find common items in two or more different sets.

In [21]:
x = ['a', 'b', 'c', 'd', 'e', 'f']  # First list.
y = ['f', 'a', 'z', 'u', 'p', 'b']  # Second list.

In [22]:
set_x = set(x) # Turn lists into sets.
set_y = set(y)

In [23]:
set_x, set_y  # Displays both sets together as a tuple.

({'a', 'b', 'c', 'd', 'e', 'f'}, {'a', 'b', 'f', 'p', 'u', 'z'})

In [26]:
print(set_x)  # print() shows each set on its own line, in no particular order.
print(set_y)

{'e', 'a', 'd', 'b', 'f', 'c'}
{'z', 'p', 'a', 'b', 'f', 'u'}


In [29]:
set_x.intersection(set_y)  # intersection() finds the items common to both sets.

{'a', 'b', 'f'}
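Python also provides the & operator as shorthand for intersection() when both operands are sets, while the intersection() method accepts any iterable, such as a plain list. A short sketch using the same letters as above; sorted() is used only to make the printed order predictable:

```python
set_x = {'a', 'b', 'c', 'd', 'e', 'f'}
set_y = {'f', 'a', 'z', 'u', 'p', 'b'}

# The & operator requires both operands to be sets (or frozen sets).
common = set_x & set_y
print(sorted(common))  # ['a', 'b', 'f']

# The intersection() method also accepts a list or any other iterable.
print(sorted(set_x.intersection(['a', 'q', 'c'])))  # ['a', 'c']
```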

## Unions:
- A union combines two or more sets, keeping each distinct item only once.

In [59]:
set_x.union(set_y)  # union() combines the sets from above.

{'a', 'b', 'c', 'd', 'e', 'f', 'p', 'u', 'z'}

In [31]:
z = ['r', 'q', 'c', 'l', 'j', 'a']  # Create a third list.

In [32]:
set_z = set(z)  # Change list into a set. 

In [33]:
set_x.union(set_y, set_z)  # union() accepts several sets at once;
                           # intersection() works the same way.

{'a', 'b', 'c', 'd', 'e', 'f', 'j', 'l', 'p', 'q', 'r', 'u', 'z'}
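Unions and intersections can also be chained across several sets with the | and & operators. A sketch with the same three sets as above, written as literals so it runs on its own:

```python
set_x = {'a', 'b', 'c', 'd', 'e', 'f'}
set_y = {'f', 'a', 'z', 'u', 'p', 'b'}
set_z = {'r', 'q', 'c', 'l', 'j', 'a'}

# The | operator chains unions across any number of sets.
combined = set_x | set_y | set_z
print(len(combined))  # 13

# The & operator chains intersections the same way.
in_all_three = set_x & set_y & set_z
print(in_all_three)   # {'a'}
```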

## Frozen Sets:
- Frozen sets are immutable: unlike the regular sets we were working with before, a frozen set cannot be changed once it is created.

In [34]:
states = ['New York', 'Colorado', 'North Carolina', 'Texas', 'Alaska']

In [35]:
frozen_states = frozenset(states)  # Creates a frozen set which is immutable and cannot be changed. 

In [36]:
frozen_states  # Sets have no fixed order; the alphabetical display here is not guaranteed.

frozenset({'Alaska', 'Colorado', 'New York', 'North Carolina', 'Texas'})

In [37]:
frozen_states.add('Maine')  # Frozen sets have no add() method, so this raises an AttributeError.

AttributeError: 'frozenset' object has no attribute 'add'

In [38]:
frozen_states.discard('Colorado')  # discard() is not available either, since a frozen set cannot be changed.

AttributeError: 'frozenset' object has no attribute 'discard'
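Because a frozen set can never change, Python can hash it, which means it can be used as a dictionary key or as an element of another set; a regular set can do neither. A minimal sketch; the route names and the label string are hypothetical, chosen only for illustration:

```python
# Hypothetical routes built from state names used above.
route_a = frozenset(['New York', 'Texas'])
route_b = frozenset(['Texas', 'New York'])  # Same items, listed in a different order.

# Frozen sets are hashable, so they can be dictionary keys.
labels = {route_a: 'east-south route'}      # Hypothetical label.
print(labels[route_b])  # east-south route (equal frozen sets hash equally)

# They can also be elements of another set.
routes = {route_a, frozenset(['Alaska'])}
print(len(routes))      # 2

# A regular set is mutable and therefore unhashable, so it cannot be a key.
try:
    labels[{'Texas'}] = 'will not work'
except TypeError:
    print("regular sets cannot be dict keys")
```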

In [39]:
'New York' in frozen_states  # 'in' can still be used with frozen sets.

True

In [40]:
'Wyoming' not in frozen_states  # 'not in' can also still be used. 

True

In [41]:
other_states = ['Connecticut', 'New Jersey', 'Arizona', 'California', 'Montana', 'New York']

In [42]:
frozen_states_2 = frozenset(other_states)

In [43]:
frozen_states_2

frozenset({'Arizona',
           'California',
           'Connecticut',
           'Montana',
           'New Jersey',
           'New York'})

In [44]:
frozen_states.intersection(frozen_states_2)  # Intersections can still be used as well.

frozenset({'New York'})

In [45]:
frozen_states.union(frozen_states_2)  # Unions can also still be used. 

frozenset({'Alaska',
           'Arizona',
           'California',
           'Colorado',
           'Connecticut',
           'Montana',
           'New Jersey',
           'New York',
           'North Carolina',
           'Texas'})
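When a frozen set and a regular set are mixed in one operation, the Python documentation states that the result has the type of the first operand. A small sketch with a couple of the state names from above:

```python
frozen = frozenset(['New York', 'Texas'])
regular = {'New York', 'Montana'}

# With mixed operands, the result takes the type of the left-hand operand.
left_frozen = frozen.intersection(regular)
left_regular = regular.intersection(frozen)

print(type(left_frozen).__name__)   # frozenset
print(type(left_regular).__name__)  # set
print(left_frozen == left_regular)  # True: equal items compare equal across both types
```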

## Overview:

- Sets are mutable, iterable, unordered collections of numbers/words/items that contain no duplicates or repeated items.
- Sets are highly optimized for checking whether a specific item is contained in them.
- Frozen sets are sets that are immutable and cannot have items added or removed from them.
- Frozen sets can still use operations such as intersections and unions. 