# Sets


## Sets

* unordered collections of unique elements
* duplicates are dropped automatically, which makes sets great for getting the unique items out of a collection
* written with curly braces: {3, 6, 7}
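Two details of the bullets above are easy to trip over. First, empty curly braces create a dictionary, not a set. Second, set elements must be hashable, so a list cannot go inside a set but a tuple can. A minimal sketch:

```python
# {} is an empty dict; an empty set needs the set() constructor
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__, type(empty_set).__name__)  # dict set

# set elements must be hashable: a list is not, a tuple is
try:
    bad = {[1, 2], 3}
except TypeError as err:
    print("TypeError:", err)

ok = {(1, 2), 3, (1, 2)}  # the duplicate tuple is dropped
print(len(ok))  # 2
```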

![Set](https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Venn_A_intersect_B.svg/440px-Venn_A_intersect_B.svg.png)

https://en.wikipedia.org/wiki/Set_theory


## Creating a set


In [1]:
s = {3,3,6,1,3,6,7} # curly braces create a set too, not only a dictionary; duplicates are dropped
print(s)

{1, 3, 6, 7}


In [2]:
nset = set((3,3,6,1,3,6,7)) # alternatively, use the set() constructor, which takes any iterable (here a tuple)
nset

{1, 3, 6, 7}

In [3]:
num_set = set([1,2,6,2,7,2,1]) # could pass a list
num_set

{1, 2, 6, 7}

In [7]:
a = set("ķiļķēni un klimpas") # set() takes any iterable, so a string qualifies: we get its unique characters
a

{' ', 'a', 'i', 'k', 'l', 'm', 'n', 'p', 's', 'u', 'ē', 'ķ', 'ļ'}

In [11]:
# a set can mix element types, as long as each element is hashable
b = {"abracadbra","abba", "dubba", "abba",56,7,2,12,2,2,1,1}
b

{1, 12, 2, 56, 7, 'abba', 'abracadbra', 'dubba'}

In [12]:
bset = set(["abracadbra","abba", "dubba", "abba"])
bset

{'abba', 'abracadbra', 'dubba'}

In [None]:
aset = set("abracadbra") # compare with the next cell, which passes a list
aset

{'a', 'b', 'c', 'd', 'r'}

In [None]:
set(["abracadbra"])  # compare with the previous cell: a list holding one string gives a one-element set

{'abracadbra'}

## Python sets - NOT ordered!

Python sets do not have a specific order. You cannot access set items by index: because sets are unordered, their items have no index. The iteration order you see is an implementation detail and may change between runs.
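Because there is no index, `s[0]` fails with a `TypeError`. If you only need some element, one option is `next(iter(...))`. It returns an arbitrary element without removing it. A small sketch:

```python
letters = set("abracadbra")

try:
    letters[0]  # sets do not support indexing
except TypeError as err:
    print("TypeError:", err)

# next(iter(...)) returns *some* element; which one is not guaranteed
some_letter = next(iter(letters))
print(some_letter in letters)  # True
```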

In [None]:
# we can loop through a set
for c in aset: # notice no guarantee on order
    print(c)

d
c
r
a
b


## Membership testing in sets - very fast! O(1) on average

In [18]:
# this lookup (membership testing) is very quick even for large sets
# sets are hash tables, so the average lookup cost is O(1): roughly constant time even with millions of elements
# much faster than in a list
'a' in aset, 'b' in aset, 'f' in aset 

(True, True, False)

In [20]:
# if you need sorted list from a set
# then use sorted function which returns a list
mylist = sorted(aset) # sorted gives you a list
mylist

['a', 'b', 'c', 'd', 'r']

In [21]:
# list membership is a linear scan, O(n), so it gets much slower for large lists (say 10_000+ items)
'a' in mylist, 'b' in mylist, 'f' in mylist 


(True, True, False)
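To see the set vs. list difference for yourself, you can time both lookups with the standard `timeit` module. The size `N` and the repeat count are arbitrary choices for illustration. Absolute timings depend on your machine, so none are claimed here.

```python
import timeit

# arbitrary sizes chosen for illustration
N = 100_000
big_list = list(range(N))
big_set = set(big_list)
target = N - 1  # worst case for the list: it must scan to the last element

list_time = timeit.timeit(lambda: target in big_list, number=200)
set_time = timeit.timeit(lambda: target in big_set, number=200)
print(f"list: {list_time:.4f} s, set: {set_time:.6f} s")
```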

In [22]:
type(s), type(aset)

(set, set)

In [None]:
a

{' ', 'a', 'i', 'k', 'l', 'm', 'n', 'p', 's', 'u', 'ē', 'ķ', 'ļ'}

In [28]:
myletters = list(a)
myletters

['p', 'ē', 'u', 's', 'l', 'k', 'n', 'ķ', ' ', 'a', 'ļ', 'i', 'm']

In [29]:
"|".join(sorted(a)) # join with any separator string, even a space
# sorted() compares Unicode code points (see ord()), so Latvian letters come after the English ones
# TODO: sort in a locale-specific way

' |a|i|k|l|m|n|p|s|u|ē|ķ|ļ'
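One way to approach the TODO above without depending on which locales are installed is to pass a custom `key` to `sorted()`. This is a minimal sketch under a loud assumption: the hand-written lowercase Latvian alphabet string `LATVIAN_ORDER` and the helper `latvian_key` are illustrative, not a library feature. Uppercase letters are not handled.

```python
# Assumption: hand-written lowercase Latvian alphabet (33 letters), used as a sort order.
LATVIAN_ORDER = "aābcčdeēfgģhiījkķlļmnņoprsštuūvzž"

def latvian_key(ch):
    # hypothetical helper: position in the alphabet;
    # characters not in it (e.g. space) get -1 and sort first
    return LATVIAN_ORDER.find(ch)

letters = set("ķiļķēni un klimpas")
print("|".join(sorted(letters, key=latvian_key)))
```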

In [None]:
myletters[:3] # list(a) order is arbitrary, so this slice can differ between runs

['a', 'ķ', 'u']

In [None]:
al = list(a)
al

['a', 'ķ', 'u', 'l', 'm', 'n', 'p', 's', ' ', 'ē', 'ļ', 'i', 'k']

In [None]:
sorted(al)

[' ', 'a', 'i', 'k', 'l', 'm', 'n', 'p', 's', 'u', 'ē', 'ķ', 'ļ']

In [24]:
s = {1,2,65,2,6,3}
s

{1, 2, 3, 6, 65}

In [23]:
nset = set(range(10))
nset

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

## Set as a way to remove duplicates from a list

Sets offer an easy way to remove duplicates from a list: convert the list to a set and then back to a list. Note that the set does not keep the original order, so the order of the resulting list is arbitrary.



In [35]:
shopping_list = ["apple","banana", "carrot", "banana", "apple", "banana", "pumpkin","candy", "apple"]
shopping_set = set(shopping_list)
unique_items = list(shopping_set)

# let's print all three
print("Original Shopping list:\n",shopping_list)
print("Unique items set:\n",shopping_set)
print("Unique items list:\n",unique_items)

Original Shopping list:
 ['apple', 'banana', 'carrot', 'banana', 'apple', 'banana', 'pumpkin', 'candy', 'apple']
Unique items set:
 {'carrot', 'apple', 'pumpkin', 'candy', 'banana'}
Unique items list:
 ['carrot', 'apple', 'pumpkin', 'candy', 'banana']


In [36]:
# we could have done this in one line
unique_items = list(set(shopping_list))
print("Unique items list:\n",unique_items)


Unique items list:
 ['carrot', 'apple', 'pumpkin', 'candy', 'banana']
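If you need to keep the original order, a common alternative swaps the set for a dictionary. Dictionary keys are unique and, since Python 3.7, keep insertion order. A sketch:

```python
shopping_list = ["apple", "banana", "carrot", "banana", "apple",
                 "banana", "pumpkin", "candy", "apple"]

# dict.fromkeys keeps the first occurrence of each item, in its original position
unique_in_order = list(dict.fromkeys(shopping_list))
print(unique_in_order)  # ['apple', 'banana', 'carrot', 'pumpkin', 'candy']
```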


In [25]:
s.issubset(nset)  # false because s has 65 which is outside of nset values

False

## Set operations

Sets support a wide range of set-algebra operations. Here are the most common ones:

* issubset
* issuperset
* union
* intersection
* difference
* symmetric difference
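Each of these methods also has an operator form. One practical difference is that the methods accept any iterable, while the operators require both sides to be sets. A compact sketch with small example sets:

```python
a = {1, 2, 3}
b = {3, 4}

# method form vs operator form: each pair gives the same result
print(a.union(b), a | b)                 # elements in either set
print(a.intersection(b), a & b)          # elements in both sets
print(a.difference(b), a - b)            # elements only in a
print(a.symmetric_difference(b), a ^ b)  # elements in exactly one set
print(a.issubset(a | b), a <= (a | b))   # True True
print((a | b).issuperset(a), (a | b) >= a)  # True True

# methods accept any iterable; operators need sets on both sides
print(a.union([5, 6]))
try:
    a | [5, 6]
except TypeError as err:
    print("TypeError:", err)
```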



In [37]:
# range is a sequence of numbers so we can convert it to a set
n_3_7 = set(range(3,8))
n_3_7

{3, 4, 5, 6, 7}

In [40]:
nset = set(range(10))
print("number set is", nset)
n_3_7.issubset(nset)

number set is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}


True

In [41]:
# Alternative syntax
n_3_7 < nset # proper (strict) subset: n_3_7 must not be equal to nset

True

In [42]:
n_3_7 <= nset  # same as issubset: a subset that may also be equal

True

In [None]:
nset < nset # proper (strict) subset: False, because a set is equal to itself

False

In [None]:
nset <= nset # same as issubset: equal sets count, so True

True

In [45]:
s

{1, 3, 6, 7}

In [46]:
# superset is the same check from the other side
nset.issuperset(s)

True

In [47]:
nset, s

({0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {1, 3, 6, 7})

In [50]:
s.add(65) # we can add elements
print(s)
try:
    s.remove(65) # we can remove elements
except KeyError as e:
    print("Error:", e)
print(s)


{65, 1, 3, 6, 7}
{1, 3, 6, 7}
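Besides `remove`, sets offer `discard` and `pop`. `discard` silently ignores a missing element, while `remove` raises `KeyError`. `pop` removes and returns an arbitrary element. A short sketch:

```python
s = {1, 3, 6, 7}

s.discard(65)     # discard ignores missing elements: no error
try:
    s.remove(65)  # remove raises KeyError if the element is missing
except KeyError as e:
    print("KeyError:", e)

item = s.pop()    # removes and returns an arbitrary element
print(item, s)
```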


In [37]:
nset, s

({0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {1, 2, 3, 6})

In [38]:
nset.issuperset(s)

True

In [39]:
nset > s, nset >= s, nset < s

(True, True, False)

In [40]:
s.issuperset(range(6))

False

In [41]:
nset.issuperset(range(6))

True

In [42]:
n_5_9 = set(range(5,10))
n_5_9

{5, 6, 7, 8, 9}

### Set union

The union of two sets is a set containing all elements that are in either set.


In [52]:
# (re)create n_5_9 so this cell runs on its own
n_5_9 = set(range(5,10))
print("n_5_9 is", n_5_9)
print("Union of n_3_7 and n_5_9 is", n_3_7.union(n_5_9))

n_5_9 is {5, 6, 7, 8, 9}
Union of n_3_7 and n_5_9 is {3, 4, 5, 6, 7, 8, 9}


In [53]:
# shorter union syntax is 
n_3_7 | n_5_9 # means we make a new set out of ALL elements of the two sets

{3, 4, 5, 6, 7, 8, 9}
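`union` accepts any number of iterables, which is handy when the sets to combine are collected in a list. The tag data below is hypothetical. Starting from `set()` also handles an empty list of groups.

```python
# hypothetical data: tags collected from several posts
groups = [{"python", "sets"}, {"python", "lists"}, {"dicts"}]

# set().union(*groups) combines all groups; with no groups it returns set()
all_tags = set().union(*groups)
print(sorted(all_tags))  # ['dicts', 'lists', 'python', 'sets']
```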

### Intersection

The intersection of two sets is a set containing all elements that are in both sets.



In [54]:
n_3_7.intersection(n_5_9)

{5, 6, 7}

In [None]:
# syntactic sugar for intersection is &
n_3_7 & n_5_9 # same as intersection above so only elements in BOTH sets

{5, 6, 7}

In [55]:
n_5_7 = n_3_7 & n_5_9  # we can store the values
n_5_7

{5, 6, 7}

In [56]:
n_5_7 = n_3_7 & n_5_9 & nset # nset is 0 to 9 so this will not change the result
n_5_7

{5, 6, 7}

In [57]:
set(range(7))

{0, 1, 2, 3, 4, 5, 6}

In [58]:
n_5_6 = n_3_7 & n_5_9 & set(range(7)) # range(7) stops at 6
n_5_6 # only 5 and 6 are in ALL 3 sets

{5, 6}
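Chaining `&` works for a fixed number of sets. For a list of sets of unknown length, one option is to call `set.intersection` on the class with the sets unpacked. The attendance data is hypothetical.

```python
# hypothetical data: which students attended every lecture
attendance = [
    {"Anna", "Jānis", "Pēteris"},
    {"Anna", "Jānis"},
    {"Anna", "Jānis", "Līga"},
]

# the first set acts as `self`; this raises TypeError if `attendance` is empty
always_present = set.intersection(*attendance)
print(sorted(always_present))  # ['Anna', 'Jānis']
```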

### Set difference

Set difference is the set of elements that are only in the first set but not in the second set.

Set difference is therefore not commutative: `a - b` and `b - a` generally give different results, so the order of the sets matters.


In [59]:
n_3_7.difference(n_5_9) # only elements unique to left side

{3, 4}

In [60]:
n_3_7 - n_5_9, n_5_9 - n_3_7 # - is syntactic sugar for difference

({3, 4}, {8, 9})
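A typical everyday use of difference, in both directions, is a shopping check. The recipe and kitchen contents are hypothetical.

```python
# hypothetical data: what the recipe needs vs. what is already in the kitchen
needed = {"flour", "eggs", "milk", "sugar"}
in_kitchen = {"eggs", "sugar", "salt"}

to_buy = needed - in_kitchen   # needed but not yet available
unused = in_kitchen - needed   # available but not needed
print(sorted(to_buy), sorted(unused))  # ['flour', 'milk'] ['salt']
```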

### Set symmetric difference

Symmetric difference is the set of elements that are in one of the sets but not in both.

Symmetric difference is commutative. This means that the order of the sets is not important for symmetric difference.

Symmetric difference is the complement of the intersection within the union: it contains everything in `a | b` except what is in `a & b`.
Equivalently, it is the union of the two one-sided differences, `(a - b) | (b - a)`.


In [53]:
n_3_7.symmetric_difference(n_5_9) # elements in exactly one of the sets; analogous to XOR in logic

{3, 4, 8, 9}

In [61]:
n_3_7 ^ n_5_9 # ^ is short for .symmetric_difference

{3, 4, 8, 9}

In [62]:
# compare to union of set differences
(n_3_7 - n_5_9) | (n_5_9 - n_3_7)

{3, 4, 8, 9}
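The "union minus intersection" description can be checked directly. So can commutativity. Plain `print` may show set elements in a different order than the notebook display, so the sketch compares sets with `==`.

```python
n_3_7 = set(range(3, 8))
n_5_9 = set(range(5, 10))

# symmetric difference = everything in the union except the intersection
print((n_3_7 | n_5_9) - (n_3_7 & n_5_9) == n_3_7 ^ n_5_9)  # True
# symmetric difference is commutative
print(n_3_7 ^ n_5_9 == n_5_9 ^ n_3_7)  # True
```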

## Updating sets

In [63]:
# we can update a set with many different data types, as long as each argument is an iterable
s.update({3,3,6,2,7,9},range(4,15), [3,6,7,"Valdis", "Badac","Valdis"],"Abba")
s

{1,
 10,
 11,
 12,
 13,
 14,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 'A',
 'Badac',
 'Valdis',
 'a',
 'b'}
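Each set operation also has an in-place form that modifies the set instead of returning a new one. The operators `|=`, `&=`, `-=` and `^=` require a set on the right side. The corresponding methods (`update`, `intersection_update`, `difference_update`, `symmetric_difference_update`) accept iterables, and `update` accepts several of them as in the cell above. A sketch:

```python
s = {1, 2, 3}

s |= {4, 5}        # same as s.update({4, 5})
s &= {2, 3, 4, 5}  # same as s.intersection_update({2, 3, 4, 5})
s -= {5}           # same as s.difference_update({5})
s ^= {4, 6}        # same as s.symmetric_difference_update({4, 6})
print(sorted(s))   # [2, 3, 6]
```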

In [None]:
dir(s)

['__and__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

In [64]:
# we can check whether our set has anything in common with another data structure
n_3_7.isdisjoint(n_5_9) # False because sets do intersect with 5,6,7

False

In [65]:
n_8_9 = set((8,9))
n_8_9

{8, 9}

In [66]:
n_3_7.isdisjoint(n_8_9)

True
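`isdisjoint` is True exactly when the intersection is empty. Like the other set methods, it accepts any iterable. A quick check:

```python
n_3_7 = set(range(3, 8))
n_8_9 = {8, 9}

# isdisjoint is equivalent to "the intersection is empty"
print(n_3_7.isdisjoint(n_8_9), not (n_3_7 & n_8_9))  # True True
# any iterable works, e.g. a range
print(n_3_7.isdisjoint(range(10, 20)))  # True
```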

In [67]:
sentence = "a quick brown fox jumped over a sleeping dog which is not a normal dog"
words = sentence.split()
words

['a',
 'quick',
 'brown',
 'fox',
 'jumped',
 'over',
 'a',
 'sleeping',
 'dog',
 'which',
 'is',
 'not',
 'a',
 'normal',
 'dog']

In [68]:
unique_words_set = set(words)
unique_words_set

{'a',
 'brown',
 'dog',
 'fox',
 'is',
 'jumped',
 'normal',
 'not',
 'over',
 'quick',
 'sleeping',
 'which'}

In [69]:
unique_words_list = list(unique_words_set)
unique_words_list

['over',
 'brown',
 'dog',
 'normal',
 'sleeping',
 'which',
 'fox',
 'jumped',
 'is',
 'a',
 'not',
 'quick']

In [None]:
# so use sets to obtain unique elements,
# then convert them back to other data structures if needed
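Converting back often means counting or sorting. `len` gives the number of unique words. `sorted` turns the set into a list with a stable, predictable order, unlike `list(set(...))`.

```python
sentence = "a quick brown fox jumped over a sleeping dog which is not a normal dog"
words = sentence.split()
unique_words = set(words)

print(len(words), len(unique_words))  # 15 12
print(sorted(unique_words))
```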