# Python Sets

A set is a collection that is unordered and unindexed. 

Items in a set do not have a defined order. They can appear in a different order from one run to the next, and they cannot be referred to by index or key.
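As a small sketch of what "unindexed" means in practice, the cell below tries to index a set and then builds a sorted list when a predictable order is needed. The name `colors` is a hypothetical example and is not used elsewhere in this notebook.

```python
# Hypothetical example set; `colors` is an illustrative name.
colors = {'red', 'green', 'blue'}

index_error = None
try:
    colors[0]                  # sets do not support indexing
except TypeError as err:
    index_error = err
    print(type(err).__name__)  # TypeError

# When a predictable order is needed, build a sorted list from the set.
ordered = sorted(colors)
print(ordered)                 # ['blue', 'green', 'red']
```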

Set items are unchangeable, meaning that we cannot modify an item in place after the set has been created. The set itself can still change: items can be added and removed.

Duplicate values are not allowed.
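One way to see this rule at work is to build a set from a list that contains repeats. This is only a sketch; the names `votes` and `unique_votes` are made up for the example.

```python
# Hypothetical data; `votes` and `unique_votes` are illustrative names.
votes = ['apple', 'banana', 'apple', 'cherry', 'banana']

# set() keeps a single copy of each value.
unique_votes = set(votes)

print(len(votes))         # 5
print(len(unique_votes))  # 3

# Writing a value twice in a set literal has no extra effect.
print({'apple', 'apple', 'banana'} == {'apple', 'banana'})  # True
```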

In Python, sets are written with curly brackets.

In [1]:
fruit = {'apple', 'banana', 'cherry'}
print(fruit)

{'cherry', 'banana', 'apple'}


In [2]:
type(fruit)

set

In [3]:
'banana' in fruit

True

In [4]:
'mango' in fruit

False
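A detail worth knowing about the curly-bracket syntax: empty curly brackets create a dictionary, not a set. The sketch below shows the difference; `empty_dict` and `empty_set` are names chosen only for this example.

```python
# Illustrative names, not used elsewhere in this notebook.
empty_dict = {}      # empty curly brackets give a dict
empty_set = set()    # an empty set needs the set() constructor

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>
print(len(empty_set))    # 0
```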

## Python methods for working with sets

**add()**: add an item to a set

In [5]:
fruit.add('mango')
fruit

{'apple', 'banana', 'cherry', 'mango'}

**update()**: add more than one item to the set

In [6]:
fruit.update(['orange','grapes'])
fruit

{'apple', 'banana', 'cherry', 'grapes', 'mango', 'orange'}
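update() accepts any iterable, which has one common pitfall: a string is itself an iterable of characters. The sketch below uses hypothetical sets named `basket` and `basket_ok` so that `fruit` stays untouched.

```python
# Hypothetical sets; `basket` and `basket_ok` are illustrative names.
basket = {'apple'}
basket.update('kiwi')        # iterates over the string, adding 'k', 'i', 'w'
print(len(basket))           # 4

basket_ok = {'apple'}
basket_ok.update(['kiwi'])   # wrap the string in a list to add it whole
basket_ok.update(('pear',), {'plum'})  # several iterables can be passed at once
print(sorted(basket_ok))     # ['apple', 'kiwi', 'pear', 'plum']
```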

Remember, sets do not contain repeated items:

In [7]:
fruit.add('mango')
fruit

{'apple', 'banana', 'cherry', 'grapes', 'mango', 'orange'}

'mango' already belongs to the fruit set, so adding it again leaves the set unchanged.

**len()**: determine how many items the set has

In [8]:
print(len(fruit)) 

6


**remove()**: remove an item from a set

In [9]:
fruit.remove('banana')
fruit

{'apple', 'cherry', 'grapes', 'mango', 'orange'}

If the item to remove does not exist, **remove()** will raise an error.

In [10]:
fruit.remove('banana')       # This will raise an error

KeyError: 'banana'

**discard()**: remove an item from a set. If the item does not exist, discard() will not raise an error.

In [11]:
fruit.discard('banana')
fruit

{'apple', 'cherry', 'grapes', 'mango', 'orange'}

In [12]:
fruit.discard('mango')
fruit

{'apple', 'cherry', 'grapes', 'orange'}
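When it matters whether the item was actually present, remove() can be wrapped in a try/except block. This is a minimal sketch: the helper `remove_if_present` and the set `pantry` are hypothetical names, not part of Python or of this notebook.

```python
# Hypothetical helper and data, for illustration only.
def remove_if_present(items, item):
    """Remove item from the set and report whether it was there."""
    try:
        items.remove(item)
        return True
    except KeyError:
        return False

pantry = {'apple', 'cherry'}

removed_first = remove_if_present(pantry, 'cherry')
removed_again = remove_if_present(pantry, 'cherry')
print(removed_first)  # True
print(removed_again)  # False

# discard() gives the same final set without reporting anything.
pantry.discard('cherry')
print(pantry)         # {'apple'}
```

Use discard() when a missing item is fine, and remove() when a missing item should be treated as a mistake.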

**clear()**: empties the set

In [13]:
fruit.clear()
fruit

set()

**del**: the del keyword deletes the set variable completely

In [14]:
del fruit

Now fruit is no longer defined. If you refer to fruit, it will raise an error.

In [15]:
fruit           # This will raise an error

NameError: name 'fruit' is not defined

A set can contain different data types.

In [16]:
set3 = {'John', 20, True, 30.33, 'male'}
print(type(set3))

<class 'set'>
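Mixing types has two limits that are easy to miss: values that compare equal count as duplicates, and every item must be hashable. The sketch below illustrates both; `mixed`, `unhashable_error` and `ok` are names invented for this example.

```python
# Illustrative names, not used elsewhere in this notebook.
mixed = {1, True, 1.0}   # 1 == True == 1.0, so only one item is kept
print(len(mixed))        # 1

unhashable_error = None
try:
    bad = {'John', [20, 30]}   # a list is mutable, so it cannot be a set item
except TypeError as err:
    unhashable_error = err
    print(type(err).__name__)  # TypeError

ok = {'John', (20, 30)}        # a tuple of numbers is hashable, so this works
print(len(ok))                 # 2
```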


**union()**: join two sets. Remember, sets do not have repeated items.

In [17]:
# union
set1 = {1, 2, 3}
set2 = {3, 4, 5}
uni = set1.union(set2)
uni

{1, 2, 3, 4, 5}

**intersection()**: returns the intersection of two sets. The intersection contains only the items that are in both sets.

In [18]:
# intersection
inter = set1.intersection(set2)
inter

{3}

**difference()**: returns a set containing the difference between two or more sets. 

The difference A - B is the set of all elements of A that are not in B.

In [19]:
# difference
diff = set1.difference(set2)
diff

{1, 2}
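Unlike union and intersection, the difference depends on which set comes first. The sketch below uses new names `a` and `b` with the same values as set1 and set2.

```python
# `a` and `b` are illustrative names mirroring set1 and set2.
a = {1, 2, 3}
b = {3, 4, 5}

a_minus_b = a.difference(b)
b_minus_a = b.difference(a)
print(a_minus_b)  # {1, 2}
print(b_minus_a)  # {4, 5}

# difference() also accepts more than one set to subtract.
a_minus_both = a.difference(b, {1})
print(a_minus_both)  # {2}
```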

**symmetric_difference()**: returns a set containing the symmetric difference of two sets.

The symmetric difference is the set of elements that belong to either of the two sets but not to both.

symmetric_difference = union - intersection

In [20]:
# symmetric_difference 
simdiff = set1.symmetric_difference(set2)
simdiff

{1, 2, 4, 5}

In [21]:
# symmetric_difference = union - intersection
set1.union(set2).difference(set1.intersection(set2)) 

{1, 2, 4, 5}
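Each of the four methods above also has an operator form. The sketch below checks that both forms agree, using new names `a` and `b`. One difference between them: the methods accept any iterable, while the operators need a set on both sides.

```python
# `a` and `b` are illustrative names mirroring set1 and set2.
a = {1, 2, 3}
b = {3, 4, 5}

print(a | b == a.union(b))                 # True
print(a & b == a.intersection(b))          # True
print(a - b == a.difference(b))            # True
print(a ^ b == a.symmetric_difference(b))  # True

# Methods accept any iterable, such as a list.
from_list = a.union([3, 4, 5])
print(from_list)  # {1, 2, 3, 4, 5}

# Operators raise TypeError when the other operand is not a set.
try:
    a | [3, 4, 5]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False
```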

Reference:
- VanderPlas, J. (2017) Python Data Science Handbook: Essential Tools for Working with Data. USA: O’Reilly Media, Inc. 