# Lesson: Sets in Python

#### Sets are related to lists and dictionaries, and although we will use them less often, they are still a valuable concept to understand.
- Sets are a data structure that represents a collection of distinct items.
- The built-in set type has been part of Python since version 2.4.
- A set is an unordered collection of unique items. The set itself can be changed, but each item in it must be hashable (for example a number, a string, or a tuple).
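
To make the points above concrete, here is a minimal sketch of what "unique" and "hashable" mean in practice. The names colors and added_list are made up for this illustration.

```python
# Sets themselves are mutable: items can be added and removed.
colors = {"red", "green"}
colors.add("blue")
colors.discard("red")

# Duplicates are ignored, so adding an existing item changes nothing.
colors.add("green")

# Items must be hashable. A list is mutable and unhashable, so this fails.
try:
    colors.add(["cyan", "magenta"])
    added_list = True
except TypeError:
    added_list = False

# A tuple is hashable, so it is accepted as a single item.
colors.add(("cyan", "magenta"))

print(colors)
print(added_list)
```

The set ends up with three items: 'green', 'blue', and the tuple. The attempt to add a list is rejected with a TypeError.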

## Create a simple set:

In [4]:
# Example:
s = set()
s.add(1)   # the set now contains only the number 1
s

{1}

In [6]:
s.add(2)  # adds the number 2 to the set
s

{1, 2}

In [8]:
s.add(5)  # you can add any other number to the set.
s

{1, 2, 5}

In [10]:
set_len = len(s)  # Shows how many numbers are in your set. 
set_len

3

In [12]:
x = 2 in s  # Allows you to check if a certain number is contained in the set. 2 is part of this set, 
            # and therefore it returns True.  
x

True

In [83]:
y = 4 in s  # 4 is not part of this set, so it returns False.
y

False

In [85]:
s.discard(2)   # Removes the number 2 from the set.
s

{1, 5}
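
Besides discard, sets also have a remove method. A short sketch of the difference, reusing the values from the cell above (remove_failed is a name made up for this illustration):

```python
s = {1, 5}

# discard() does nothing if the item is missing.
s.discard(2)

# remove() raises a KeyError if the item is missing.
try:
    s.remove(2)
    remove_failed = False
except KeyError:
    remove_failed = True

# remove() works normally when the item is present.
s.remove(5)

print(s)
print(remove_failed)
```

Use discard when a missing item is fine, and remove when a missing item should be treated as an error.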

## Finding distinct items in a list using sets:
- One reason to use sets is to find distinct items or numbers in lists.
- These are examples of short and simple lists, but these same techniques can be used on much larger lists and sets. 

In [16]:
# Example:
num_in_list = [1, 6, 3, 3, 7, 8, 2, 5, 6, 9]  # Create a list with random and repeated numbers. 

In [18]:
len_list = len(num_in_list)  # Like before, we can find the length of the list. 
len_list

10

In [19]:
num_set = set(num_in_list)  # Turn your list into a set, which removes repeated numbers/items.
                            # Sets are unordered; the sorted display below is not guaranteed.
num_set

{1, 2, 3, 5, 6, 7, 8, 9}

In [21]:
distinct_num = len(num_set)  # Finds the number of distinct numbers/items contained in the set. 
distinct_num

8

In [23]:
distinct_list = list(num_set)  # You can turn your distinct set back into a list if you want to.
distinct_list

[1, 2, 3, 5, 6, 7, 8, 9]
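
Because a set does not promise any particular order, it can help to be explicit when the order of the distinct items matters. The sketch below shows two common options using the same list as above. The names sorted_distinct and ordered_distinct are made up for this illustration.

```python
num_in_list = [1, 6, 3, 3, 7, 8, 2, 5, 6, 9]

# Option 1: sort the distinct values explicitly.
sorted_distinct = sorted(set(num_in_list))

# Option 2: keep the first occurrence of each value in its original position.
# Dictionary keys are unique and keep insertion order (Python 3.7+).
ordered_distinct = list(dict.fromkeys(num_in_list))

print(sorted_distinct)
print(ordered_distinct)
```

The first option gives the distinct numbers from smallest to largest. The second keeps them in the order they first appeared in the list.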

## Using 'in' with sets:
- "in" is a very fast operation on sets: a membership check takes roughly the same time on average no matter how large the set is.
- If we need to run many membership checks against a very large collection of items/numbers, a set is more appropriate than a list, which has to be scanned item by item.
- Sets can also be useful for finding distinct words in a list.
- This can be useful for Natural Language Processing, which we will learn about in more depth later in the course. 
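
The speed difference can be measured with the standard library's timeit module. This is only a rough sketch: the size n and the number of repetitions are arbitrary choices, and the exact timings depend on your machine.

```python
import timeit

n = 100_000
big_list = list(range(n))
big_set = set(big_list)
target = n - 1  # worst case for the list: the item is at the very end

# Time 100 membership checks against each collection.
list_time = timeit.timeit(lambda: target in big_list, number=100)
set_time = timeit.timeit(lambda: target in big_set, number=100)

print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

On most machines the set lookup should come out far faster, and the gap grows as the collection gets larger.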

In [51]:
# Example:
words = ['the', 'for', 'you', 'this', 'me', 'my']

In [52]:
stop_words = set(words)

In [53]:
'cat' in stop_words  # Returns False since 'cat' is not in the set of stop words.

False

In [54]:
'you' in stop_words

True

In [55]:
stop_words_set = set(words)  # Turn the list into a set just like before, but now using words instead of numbers.

In [56]:
'cat' in stop_words_set

False

In [57]:
'the' in stop_words_set

True

In [58]:
print(stop_words_set)  # View your set. 

{'this', 'you', 'for', 'me', 'my', 'the'}


## Using 'not in' with sets:
- "not in" can be used in the same way to check that a word is not contained in a set.
- We will use the same set from above.

In [37]:
'cat' not in stop_words_set  # Returns True since 'cat' is not in the set.

True

In [38]:
'the' not in stop_words_set  # Returns False since 'the' is in the set.

False
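
A typical use of "not in" is filtering stop words out of a piece of text. The sketch below uses the same stop words as above. The sentence and the names tokens, kept, and distinct_kept are made up for this illustration.

```python
stop_words_set = {'the', 'for', 'you', 'this', 'me', 'my'}
sentence = "this is the book you left for me"  # made-up example sentence

# Split the sentence into words, then keep only the words
# that are not stop words.
tokens = sentence.split()
kept = [word for word in tokens if word not in stop_words_set]

# Turning the result into a set gives the distinct remaining words.
distinct_kept = set(kept)

print(kept)
print(len(distinct_kept))
```

Only 'is', 'book', and 'left' remain after filtering.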

## Intersections
- Find the items that two or more different sets have in common.

In [42]:
x = ['a', 'b', 'c', 'd', 'e', 'f']
y = ['f', 'a', 'z', 'u', 'p', 'b']

In [43]:
set_x = set(x)
set_y = set(y)

In [44]:
set(x), set(y)

({'a', 'b', 'c', 'd', 'e', 'f'}, {'a', 'b', 'f', 'p', 'u', 'z'})

In [45]:
set_x.intersection(set_y)

{'a', 'b', 'f'}
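
The same intersection can also be written with the & operator. One difference worth knowing: the intersection method accepts any iterable, while the operator needs a set on both sides. A short sketch using the same values as above (common, from_list, and operator_accepts_list are names made up for this illustration):

```python
set_x = {'a', 'b', 'c', 'd', 'e', 'f'}
set_y = {'f', 'a', 'z', 'u', 'p', 'b'}

# The & operator is equivalent to set_x.intersection(set_y).
common = set_x & set_y

# The method accepts any iterable, such as a plain list.
from_list = set_x.intersection(['f', 'a', 'z'])

# The operator requires a set on both sides, so a list raises a TypeError.
try:
    set_x & ['f', 'a', 'z']
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

print(common)
print(from_list)
print(operator_accepts_list)
```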

## Unions
- Unions combine two or more different sets while only returning distinct items. 

In [46]:
set_x.union(set_y)  # Combining the sets from above.

{'a', 'b', 'c', 'd', 'e', 'f', 'p', 'u', 'z'}

In [47]:
z = ['r', 'q', 'c', 'l', 'j', 'a']

In [48]:
set_z = set(z)

In [49]:
set_x.union(set_y, set_z)  # Creating a union of 3 sets. intersection() accepts several sets in the same way.

{'a', 'b', 'c', 'd', 'e', 'f', 'j', 'l', 'p', 'q', 'r', 'u', 'z'}
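
As a sketch of the three-set case for intersections, the example below uses the same three sets. It also shows the | operator, which is equivalent to union. The names all_three and combined are made up for this illustration.

```python
set_x = {'a', 'b', 'c', 'd', 'e', 'f'}
set_y = {'f', 'a', 'z', 'u', 'p', 'b'}
set_z = {'r', 'q', 'c', 'l', 'j', 'a'}

# Items that appear in all three sets.
all_three = set_x.intersection(set_y, set_z)

# The | operator can be chained and gives the same result as union().
combined = set_x | set_y | set_z

print(all_three)
print(len(combined))
```

Only 'a' appears in all three sets, while the union contains 13 distinct letters.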

## Frozen Sets
- Frozen sets are immutable: once created, they cannot be changed like the sets we were working with before.

In [61]:
states = ['New York', 'Colorado', 'North Carolina', 'Texas', 'Alaska']

In [65]:
frozen_states = frozenset(states)  # Creates a frozen set which is immutable and cannot be changed. 

In [67]:
frozen_states  # Sets are unordered; the notebook may sort the items for display only.

frozenset({'Alaska', 'Colorado', 'New York', 'North Carolina', 'Texas'})

In [70]:
frozen_states.add('Maine')  # Nothing can be added to a frozen set, so this raises an AttributeError.

AttributeError: 'frozenset' object has no attribute 'add'

In [87]:
frozen_states.discard('Colorado')  # Items cannot be removed from a frozen set either, so this also raises an AttributeError.

AttributeError: 'frozenset' object has no attribute 'discard'

In [72]:
'New York' in frozen_states  # 'in' can still be used with frozen sets.

True

In [73]:
'Wyoming' not in frozen_states  # 'not in' can also still be used. 

True

In [78]:
other_states = ['Connecticut', 'New Jersey', 'Arizona', 'California', 'Montana', 'New York']

In [79]:
frozen_states_2 = set(other_states)  # Despite the name, this is a regular set. Frozen sets can be combined with regular sets.

In [80]:
frozen_states_2

{'Arizona', 'California', 'Connecticut', 'Montana', 'New Jersey', 'New York'}

In [81]:
frozen_states.intersection(frozen_states_2)  # Intersections can still be used as well.

frozenset({'New York'})

In [82]:
frozen_states.union(frozen_states_2)  # Unions can also still be used. 

frozenset({'Alaska',
           'Arizona',
           'California',
           'Colorado',
           'Connecticut',
           'Montana',
           'New Jersey',
           'New York',
           'North Carolina',
           'Texas'})
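
One practical reason to use a frozen set is that it is hashable, so it can serve as a dictionary key or as an item inside another set, which a regular set cannot. The sketch below illustrates this. The region groupings and the names northeast, west, region_names, regions, set_as_key, and mixed are made up for this illustration.

```python
northeast = frozenset(['New York', 'Connecticut', 'New Jersey'])
west = frozenset(['California', 'Arizona', 'Montana'])

# Frozen sets are hashable, so they can be dictionary keys...
region_names = {northeast: 'Northeast', west: 'West'}

# ...and items of another set.
regions = {northeast, west}

# A regular set is not hashable, so using one as a key raises a TypeError.
try:
    {set(['Texas']): 'South'}
    set_as_key = True
except TypeError:
    set_as_key = False

# Combining a frozen set with a regular set returns the type
# of the left-hand operand, here a frozenset.
mixed = northeast | {'Maine'}

print(region_names[northeast])
print(set_as_key)
print(type(mixed).__name__)
```

This also explains the outputs above: calling intersection or union on frozen_states returns a frozenset even though the other operand is a regular set.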