# Sets

**Sets** are data structures that can contain only **unique immutable objects**. Sets in Python are actually very close to their prototypes — mathematical sets.

To create a set, we list objects inside curly brackets.


In [None]:
ex_list = ['cat', 'dog', 'cat'] # creating a list with []
ex_tuple = ('cat', 'dog', 'cat') # creating a tuple with ()
ex_set = {'cat', 'dog', 'cat'} # creating a set with {}
print(ex_list)
print(ex_tuple)
print(ex_set) # note that we have only one 'cat' in our set

['cat', 'dog', 'cat']
('cat', 'dog', 'cat')
{'dog', 'cat'}


Indeed. Even though we've tried to store two `'cat'` strings in our set, only one was added. It happened because sets can store **only unique objects**.

The Python data type is also called `set`. Calling the `set()` function without arguments creates an empty set. We can also convert a list to a set by passing the list as an argument.

In [None]:
empty_set = set() # creating an empty set
example_set = set(['cat', 'dog', 'cat']) # converting a list to a set
print(type(empty_set), empty_set)
print(type(example_set), example_set)

<class 'set'> set()
<class 'set'> {'dog', 'cat'}


In the same way we can convert strings or tuples into a set.

In [None]:
print(set('kitty')) # set of unique symbols from a string
print(set(('dog', 'dog'))) # set of unique objects from a tuple

{'k', 'i', 'y', 't'}
{'dog'}


A set can store objects of different data types, but only immutable ones.

In [None]:
example_1 = {1, 'cat', (2,4), 4.5, False}
print(example_1)

{False, 1, 4.5, 'cat', (2, 4)}


In [None]:
example_2 = {[4, 4], 4} # trying to create a set that contains a list

TypeError: unhashable type: 'list'

We got an error because a list is mutable. So far we know only two mutable data types: `list` and `set` itself. All the other objects we have met so far can be part of a set.

A set is not an ordered sequence. Python has no idea which item of a set is the first and which is the last, so we cannot use indexing with sets.
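As a small sketch of this rule: when we need to keep a pair of values inside a set, a tuple can stand in for a list. The names `bad_set`, `good_set` and `error_message` are made up for this illustration.

```python
error_message = ''  # hypothetical variable that keeps the text of the error

# A list is mutable, so Python refuses to put it into a set.
try:
    bad_set = {[4, 4], 4}
except TypeError as error:
    error_message = str(error)
    print('Error:', error_message)

# A tuple with the same items is immutable, so it is allowed.
good_set = {(4, 4), 4}
print(good_set)
```

The `try` / `except` construction is used here only to let the example run to the end despite the error.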

In [None]:
ex_set = {'cat', 'dog', 'cat'}
ex_set[0]

TypeError: 'set' object is not subscriptable

The error `'set' object is not subscriptable` means that indexing cannot be applied to a set. The usual way to get items from a set is to loop through it.

In [None]:
ex_set = {'cat', 'dog'}
for item in ex_set: # looping through our set
  print(item) # printing each item

dog
cat
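If we do need to refer to items by position, one possible approach (a sketch, not the only way) is to build a sorted list from the set first. The built-in `sorted()` function returns a new list, and lists support indexing. The name `sorted_items` is an assumption made for this example.

```python
ex_set = {'cat', 'dog'}
sorted_items = sorted(ex_set)  # a new list in alphabetical order
print(sorted_items)
print(sorted_items[0])  # indexing works on the list, not on the set
```

The set itself stays unchanged and unordered; only the list has positions.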


Speaking of other operations we already know, we can compute the length of a set and check whether something belongs to it or not.

In [None]:
ex_set = {'cat', 'dog', 'cat'}
print(len(ex_set)) # how many items are in the set?
print('cat' in ex_set) # does the string 'cat' belong to a set?
print('python' in ex_set) # what about the string 'python'?

2
True
False


Imagine that a teacher wants to check which marks her students got and calculate some statistics.

In [None]:
marks = [10, 10, 8, 7, 4, 5, 4, 10] # marks for 8 students
for item in marks:
  print('Grade:', item) # printing the mark
  print('Count:', marks.count(item)) # printing how many students got that mark


Grade: 10
Count: 3
Grade: 10
Count: 3
Grade: 8
Count: 1
Grade: 7
Count: 1
Grade: 4
Count: 2
Grade: 5
Count: 1
Grade: 4
Count: 2
Grade: 10
Count: 3


Ouch! The non-unique marks were printed several times. We can convert our list to a set to avoid the repetition.

In [None]:
marks = [10, 10, 8, 7, 4, 5, 4, 10] # list of non-unique marks
for item in set(marks): # converting our list to a set of unique marks
  print('Grade:', item)
  print('Count:', marks.count(item))

Grade: 4
Count: 2
Grade: 5
Count: 1
Grade: 7
Count: 1
Grade: 8
Count: 1
Grade: 10
Count: 3
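Because a set has no order, the grades could come out in a different order than shown here. A minimal sketch, assuming we want the grades from lowest to highest: wrap the set in `sorted()`. The list `grade_counts` is a hypothetical helper that stores the results as (grade, count) pairs.

```python
marks = [10, 10, 8, 7, 4, 5, 4, 10]  # list of non-unique marks
grade_counts = []  # hypothetical helper list of (grade, count) pairs

for item in sorted(set(marks)):  # unique marks in ascending order
    count = marks.count(item)  # how many students got that mark
    grade_counts.append((item, count))
    print('Grade:', item, 'Count:', count)
```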


There are two set methods that are particularly useful for us. The first one is `.add()`. It requires an argument: the item to add to a set.

The second is `.remove()`. It requires an argument: the item to remove from a set.

In [None]:
ex_set = {'dog', 'cat'}
ex_set.add('python') # adding the string 'python' to a set
print(ex_set)
ex_set.remove('dog') # removing the string 'dog' from a set
print(ex_set)

{'dog', 'cat', 'python'}
{'cat', 'python'}


If we were to try to remove an item that is not in the set, then we would get an error.

In [None]:
ex_set = {'dog', 'cat'}
ex_set.remove('python') # getting an error, there is no 'python' string in our set

KeyError: 'python'
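Two possible ways to avoid this error, shown here as a sketch: the `.discard()` method removes an item only if it is present and stays silent otherwise, and the `in` operator lets us check before calling `.remove()`.

```python
ex_set = {'dog', 'cat'}

ex_set.discard('python')  # 'python' is not in the set, but no error is raised
print(ex_set)

if 'dog' in ex_set:  # checking first, then removing
    ex_set.remove('dog')
print(ex_set)
```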

Let's try to solve the following problem. For a group of people, we want to find all the unique languages they speak. We will read language names until the string `'end'` is entered.

In [None]:
language = input() # reading the first language
languages_we_speak = set() # creating an empty set

while language != 'end': # starting the loop
  languages_we_speak.add(language) # adding read language to a set
  language = input() # reading new string

print('We speak:')
print(*languages_we_speak, sep=',') # outputting all the unique languages we speak

english
japanese
english
french
german
japanese
french
chinese
end
We speak:
english,french,german,japanese,chinese
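The same idea can be tried without typing anything. In the sketch below a hypothetical list `answers` stands in for the keyboard input, and `sorted()` is used only to make the printed order predictable.

```python
# hypothetical list that replaces the strings typed by the user
answers = ['english', 'japanese', 'english', 'french', 'end', 'german']
languages_we_speak = set()  # creating an empty set

for language in answers:
    if language == 'end':  # stop as soon as we meet 'end'
        break
    languages_we_speak.add(language)  # duplicates are ignored by the set

print('We speak:')
print(*sorted(languages_we_speak), sep=',')
```

Note that `'german'` comes after `'end'`, so it never reaches the set.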


# Set operations

![](https://github.com/rogovich/Data/blob/master/img/eiler_1_eng.png?raw=true)

If you've ever encountered sets before, then such a diagram should be familiar to you. It is called an **Euler diagram**. In such a diagram, circles represent different sets and the relations between them.

Let's start with an example. Imagine that you and your friend are roommates, and you are considering getting a pet. There is a set of pets that you like (the `my_pets_list` variable, green + blue areas on the diagram): chinchilla, cat, fish, and grass snake. And there is a set of pets that your friend likes (`friend_pets_list`, orange + blue areas on the diagram): cat, grass snake, dog, python, and chameleon.

So on our diagram the blue area is the **intersection** of those two sets: pets that both of you like (cats and grass snakes). Let's find the intersection of two sets in Python. We use the `&` operator for it.

The result of every set operation is also a set. You can always save it to a variable if needed.

In [None]:
my_pets_list = {'chinchilla', 'fish', 'grass snake', 'cat'}
friend_pets_list = {'grass snake', 'cat', 'python', 'chameleon', 'dog'}
print(my_pets_list & friend_pets_list) # intersection, blue area on the diagram

{'grass snake', 'cat'}
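A short sketch of saving the result to a variable. The name `common_pets` is made up for this example; the result is an ordinary set, so everything we learned above (such as `len()` and `in`) works on it.

```python
my_pets_list = {'chinchilla', 'fish', 'grass snake', 'cat'}
friend_pets_list = {'grass snake', 'cat', 'python', 'chameleon', 'dog'}

common_pets = my_pets_list & friend_pets_list  # saving the intersection
print(type(common_pets))
print(len(common_pets))  # how many pets do we both like?
print('cat' in common_pets)  # do we both like cats?
```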


Then we can find a **union**: a set of pets that *at least one of you likes*. Those pets belong to your set, your friend's set, or both. To find a union in Python, we use the `|` operator.

In [None]:
print(my_pets_list | friend_pets_list) # union, green + blue + orange areas on the diagram

{'chameleon', 'dog', 'fish', 'grass snake', 'chinchilla', 'cat', 'python'}


For the union and intersection, the order in which you pass the sets does not matter; the result is the same. But the order does matter for a **difference**. A difference is a way to find animals that you like and your friend does not, or vice versa.

To find a difference we use the `-` operator. Please pay attention to the order of the sets.

In [None]:
print(my_pets_list - friend_pets_list) # difference, pets that I like and my friend does not like, green area on the diagram
print(friend_pets_list - my_pets_list) # difference, pets that my friend likes and I don't like, orange area on the diagram

{'chinchilla', 'fish'}
{'chameleon', 'dog', 'python'}

Sometimes it is useful to find a **symmetric difference**: objects that belong to only one of the sets. In other words, *objects that do not belong to the intersection of the sets*. We use the `^` operator for it.

In [None]:
print(friend_pets_list ^ my_pets_list) # symmetric difference, green + orange areas on the diagram

{'chameleon', 'dog', 'fish', 'chinchilla', 'python'}
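Each of these operators also has a method counterpart on the `set` type: `.intersection()`, `.union()`, `.difference()` and `.symmetric_difference()`. The sketch below simply checks that both spellings give equal sets.

```python
my_pets_list = {'chinchilla', 'fish', 'grass snake', 'cat'}
friend_pets_list = {'grass snake', 'cat', 'python', 'chameleon', 'dog'}

# each line compares an operator with the matching method
print(my_pets_list & friend_pets_list == my_pets_list.intersection(friend_pets_list))
print(my_pets_list | friend_pets_list == my_pets_list.union(friend_pets_list))
print(my_pets_list - friend_pets_list == my_pets_list.difference(friend_pets_list))
print(my_pets_list ^ friend_pets_list == my_pets_list.symmetric_difference(friend_pets_list))
```

All four lines print `True`. Which spelling to use is mostly a matter of taste.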


We can also perform several set operations in one expression. But keep in mind that Python does not simply run them from left to right: the operators have different priorities (`-` is applied first, then `&`, then `^`, then `|`), so we have to control the order of operations with brackets.

For example, a symmetric difference is the difference between the union of two sets and their intersection. Let's compute it this way.

In [None]:
print(my_pets_list | friend_pets_list - my_pets_list & friend_pets_list)

{'chameleon', 'dog', 'chinchilla', 'fish', 'grass snake', 'cat', 'python'}


The result does not look like the result of the operation above. It happened because Python followed the priority of the operators: it (1) computed the difference `friend_pets_list - my_pets_list`, (2) intersected that result with `friend_pets_list`, and (3) united the set from (2) with `my_pets_list`. That is not what we wanted. Let's help Python by putting brackets around the operations we want to be performed first:

In [None]:
print((my_pets_list | friend_pets_list) - (my_pets_list & friend_pets_list)) # union - intersection

{'chameleon', 'dog', 'fish', 'chinchilla', 'python'}


Now we've found the symmetric difference.
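To see how Python groups the expression without brackets, we can compare it with explicitly bracketed versions. This is a sketch; the variable names `no_brackets`, `by_priority`, `left_to_right` and `with_brackets` are made up for the comparison.

```python
my_pets_list = {'chinchilla', 'fish', 'grass snake', 'cat'}
friend_pets_list = {'grass snake', 'cat', 'python', 'chameleon', 'dog'}

# the expression exactly as written without brackets
no_brackets = my_pets_list | friend_pets_list - my_pets_list & friend_pets_list

# grouping by operator priority: '-' first, then '&', then '|'
by_priority = my_pets_list | ((friend_pets_list - my_pets_list) & friend_pets_list)

# grouping strictly from left to right, for comparison
left_to_right = ((my_pets_list | friend_pets_list) - my_pets_list) & friend_pets_list

# the grouping we actually wanted: union minus intersection
with_brackets = (my_pets_list | friend_pets_list) - (my_pets_list & friend_pets_list)

print(no_brackets == by_priority)  # True: Python follows operator priority
print(no_brackets == left_to_right)  # False: it is not plain left to right
print(with_brackets == my_pets_list ^ friend_pets_list)  # True: same as ^
```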