## 8.4 Sets

A **set** is an unordered collection of unique items, i.e. without duplicates.
The items in a set are traditionally called its elements or members.
In mathematics a set is written with curly braces.
For example, {1, 2, 'hi'} and {'hi', 2, 1} are the same heterogeneous set,
as the order in which we list set members doesn't matter.

### 8.4.1 The set ADT

The set ADT supports the following operations:

Operation | Effect | Maths | English
:-|:-|:-|:-
new  | create new set  | let _s_ be {}  |  let _s_ be the empty set
size  | the number of elements  | │*s*│ | │*s*│
membership  | check if item _i_ is in _s_  | $i \in s$  | _i_ in _s_
add  | add an item _i_ to _s_  |   |   add _i_ to _s_
remove  | take an item _i_ from _s_  |   | remove _i_ from _s_
intersection  | the items in _s1_ and in _s2_ | $s1 \cap s2$  |  intersection of _s1_ and _s2_
union  | the items in _s1_ or in _s2_   | $s1 \cup s2$  | union of _s1_ and _s2_
difference  | the items in _s1_ but not in _s2_  | _s1_ − _s2_  | _s1_ − _s2_

Some examples of the above operations:

- {1, 2, 3} $\cup$  {4, 5, 2} = {1, 2, 3, 4, 5}
- {1, 2, 3} − {4, 5, 2} = {1, 3}
- {1, 2, 3} $\cap$ {4, 5, 2} = {2}
- {1, 2, 3} $\cap$ {4, 5, 6} = {}

Two sets are said to be **disjoint** if they have no common elements:
their intersection is the empty set.

Adding an item corresponds to doing 'let _s_ be _s_ $\cup$ {_i_}'.
Removing an item corresponds to doing 'let _s_ be _s_ − {_i_}'.
Adding an already existing item or removing a non-existent item
therefore has no effect on the set.

The set ADT also includes comparison operations.
Two sets can be compared for (in)equality, e.g. {1, 2, 3} = {3, 1, 2}
but {1, 2, 3} ≠ {3, 1, 4}.

A set _A_ is a **subset** of _B_, and _B_ a **superset** of _A_,
written _A_ $\subseteq$ _B_ or _B_ $\supseteq$ _A_,
if every item of _A_ is also in _B_.
Set _A_ is a **proper** subset of _B_, written _A_ $\subset$ _B_
(or _B_ is a proper superset of _A_, written _B_ $\supset$ _A_),
if _A_ $\subseteq$ _B_ and _A_ ≠  _B_.
For example, {1, 2} $\subset$ {1, 2, 3}.

<div class="alert alert-info">
<strong>Info:</strong> The size of a set is also known as its cardinality.
The difference operation is also written $s1 \setminus s2$.
The empty set is also written as $\emptyset$.
MST124 Unit&nbsp;3 Section&nbsp;1.1 introduces sets of real numbers and set notation.
</div>

### 8.4.2 Sets in Python

Python has a built-in class `set` to represent sets.
Set literals are written as comma-separated items within curly braces,
e.g. `{1, 2, 3}`. Python sets are iterable.
The operations are written as follows.

Operation | Python
:-|:-
new  | `s = set()`
size  | `len(s)`
membership  |  `item in s` or `item not in s`
add  |  `s.add(item)`
remove  |  `s.discard(item)`
union  | `s1.union(s2)`
intersection  | `s1 & s2` or `s1.intersection(s2)`
difference  | `s1 - s2` or `s1.difference(s2)`

The union operation can also be written as `s1 | s2`.
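The following minimal sketch illustrates the add and remove rows of the table: adding an existing member or discarding a missing item leaves the set unchanged. The variable name `vowels` is only illustrative. Python sets also have a `remove` method, which raises a `KeyError` if the item is absent, so `discard` is the closer match to the ADT's remove operation.

```python
vowels = {'a', 'e', 'i'}
vowels.add('o')          # new item: the set grows
vowels.add('a')          # already a member: no effect
vowels.discard('u')      # not a member: no effect and no error
vowels.discard('i')      # a member: it is removed
print(len(vowels))       # 3
print('i' in vowels)     # False

try:
    vowels.remove('u')   # unlike discard, remove raises an error for a missing item
except KeyError:
    print("'u' is not in the set")
```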

<div class="alert alert-warning">
<strong>Note:</strong> In Python, <code>{}</code> represents the empty dictionary, not the empty set,
so always use <code>set()</code> instead.
</div>
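A quick check of the note above, using the illustrative names `empty_dict` and `empty_set`:

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict))   # <class 'dict'>
print(type(empty_set))    # <class 'set'>
print(len(empty_set))     # 0
print(empty_set)          # set()
```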

The last three operations create a new set: they don't modify either input set.
Here's a simple example with positive integers.

In [1]:
odd = {1, 3, 5}
even = {2, 4, 6}
prime = {2, 3, 5}
print('all:', odd | even)
print('even primes:', even & prime)
print("odd primes (primes that aren't even):", prime - even)
print("even numbers that aren't prime:", even - prime)

all: {1, 2, 3, 4, 5, 6}
even primes: {2}
odd primes (primes that aren't even): {3, 5}
even numbers that aren't prime: {4, 6}


Note that the IPython interpreter displays set members in sorted order.
Internally, the items may be stored in any order, e.g.
`odd | even` may be stored as {1, 3, 5, 2, 4, 6}.

<div class="alert alert-warning">
<strong>Note:</strong> Your algorithms on sets must not rely on any particular order of the items.
</div>
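As a small sketch of order-independent code, the following computations give the same result however the items are stored. The names `numbers` and `total` are illustrative. If an algorithm does need a particular order, one option is to create a sorted list from the set explicitly.

```python
numbers = {5, 3, 1} | {6, 4, 2}

# Equality ignores order, so this comparison is safe.
print(numbers == {1, 2, 3, 4, 5, 6})    # True

# Iterating over a set visits each item once, in an unspecified order.
total = 0
for number in numbers:
    total = total + number              # the sum doesn't depend on the order
print(total)                            # 21

# If a particular order is needed, create a sorted list explicitly.
print(sorted(numbers))                  # [1, 2, 3, 4, 5, 6]
```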

#### Exercise 8.4.1

Write alternative expressions for the even primes and the odd primes,
without using the set `even`.

_Write your answer here._

[Hint](../31_Hints/Hints_08_4_01.ipynb)
[Answer](../32_Answers/Answers_08_4_01.ipynb)

It's possible to build expressions from these operations.
Their associativity and precedence in relation to other operations are
as follows, with highest precedence at the top of the table.

Operators | Associativity
:-|:-
arithmetic  |  left (except for exponentiation and negation)
intersection  |  left
union  |  left
comparison and membership  |  left
logical  |  left (except negation)

The set difference operator has the same precedence and associativity as
arithmetic difference (subtraction).
The set membership operator has the same precedence and associativity
as membership for other collections. In the next example,
'number' refers to a positive integer.

In [2]:
print('all:', odd | even | prime)
print("odd numbers that aren't even primes", odd - (prime & even))
print("non-prime odd numbers that are even:", odd - prime & even)
print("prime numbers that are odd or even", (odd | even) & prime)
print("numbers that are even primes or odd", odd | even & prime)

all: {1, 2, 3, 4, 5, 6}
odd numbers that aren't even primes {1, 3, 5}
non-prime odd numbers that are even: set()
prime numbers that are odd or even {2, 3, 5}
numbers that are even primes or odd {1, 2, 3, 5}


As with Boolean expressions, it's best to always put parentheses to show our
intentions, e.g. to write `odd - prime & even` as `(odd - prime) & even`.

<div class="alert alert-warning">
<strong>Note:</strong> Write all parentheses in set expressions, even if they're redundant.
</div>

The set comparison operators are written like the arithmetic comparisons.

In [3]:
print('are all primes odd?', prime <= odd)
print('are all odd numbers prime?', odd <= prime)
print('are some numbers not prime?', prime < odd | even)

are all primes odd? False
are all odd numbers prime? False
are some numbers not prime? True


The last expression asks the equivalent question:
are the prime numbers a proper subset of all numbers?
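The difference between the subset and proper subset operators only shows when both sets are equal. Here is a short sketch, with the illustrative names `small`, `large` and `same`:

```python
small = {1, 2}
large = {1, 2, 3}
same = {2, 1}

print(small <= large)   # True: every item of small is in large
print(small < large)    # True: small is a subset of large and they differ
print(small <= same)    # True: every set is a subset of itself
print(small < same)     # False: equal sets aren't proper subsets
print(large >= small)   # True: large is a superset of small
print(small == same)    # True
print(small != large)   # True
```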

Any sequence can be converted to a set of its unique items,
using the type constructor.

In [4]:
set([3, 4, 3, 6, 2, 1, 6])

{1, 2, 3, 4, 6}

This is a shorter way of writing:

In [5]:
unique = set()
for item in [3, 4, 3, 6, 2, 1, 6]:
    unique.add(item)
unique

{1, 2, 3, 4, 6}
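The same conversion works for other sequences, like strings, provided the items are hashable. One possible use, sketched below with a hypothetical function `has_duplicates`, is to compare the size of the set with the length of the sequence.

```python
letters = set('banana')            # a string is a sequence of characters
print(len(letters))                # 3: the unique characters are b, a, n

def has_duplicates(items: list) -> bool:
    """Return True if and only if some item occurs more than once.

    Precondition: all items are hashable
    """
    return len(set(items)) < len(items)

print(has_duplicates([3, 4, 3, 6, 2, 1, 6]))   # True
print(has_duplicates([3, 4, 6, 2, 1]))         # False
```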

### 8.4.3 Implementing sets

The set ADT can be implemented with a sequence data type, but that
makes adding an item take linear time, to check if it's already there.
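To make this concrete, here is a minimal sketch that assumes the set is stored as a Python list without duplicates. The function `add` and the variable `members` are hypothetical names, not part of any library.

```python
def add(items: list, item: object) -> None:
    """Add item to the set represented by the list items.

    Precondition: items has no duplicates
    """
    if item not in items:      # linear search: may visit every item
        items.append(item)     # constant time, once we know the item is new

members = []
for value in [3, 4, 3, 6]:
    add(members, value)
print(members)                 # [3, 4, 6]
```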

A set can be seen as a map from items to Booleans, stating
for each item if it's a member of the set.
Therefore, any map implementation can be used to implement sets.
For example, if the potential set members are limited and known in advance,
we can use a lookup table of Booleans.
Set operations like intersection are easy to implement by
going through two lookup tables and applying Boolean operations.
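The lookup table approach could be sketched as follows, assuming that the potential members are the integers from 0 to 6. The constant `SIZE` and the functions `to_table` and `intersection` are hypothetical names chosen for this illustration.

```python
SIZE = 7    # assumption: the potential members are the integers 0 to 6

def to_table(items: list) -> list:
    """Return a lookup table with True at the indices in items."""
    table = [False] * SIZE
    for item in items:
        table[item] = True
    return table

def intersection(table1: list, table2: list) -> list:
    """Return the table for the items that are in both sets."""
    result = []
    for index in range(SIZE):
        result.append(table1[index] and table2[index])
    return result

odd_table = to_table([1, 3, 5])
prime_table = to_table([2, 3, 5])
odd_primes = intersection(odd_table, prime_table)
print(odd_primes[3])    # True: 3 is odd and prime
print(odd_primes[2])    # False: 2 is prime but not odd
```

Union and difference follow the same pattern, with `table1[index] or table2[index]` and `table1[index] and not table2[index]`, respectively.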

Python's sets are implemented with hash tables and thus items must be hashable.
Like dictionaries, sets aren't hashable themselves and so can't be used as keys.
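The following sketch shows what happens when we try to add unhashable items to a set. The variable name `points` is illustrative.

```python
points = set()
points.add((1, 2))          # tuples of hashable items are hashable

try:
    points.add([3, 4])      # lists aren't hashable
except TypeError as error:
    print('cannot add a list:', error)

try:
    points.add({5, 6})      # sets aren't hashable either
except TypeError as error:
    print('cannot add a set:', error)

print(points)               # {(1, 2)}
```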

<div class="alert alert-info">
<strong>Info:</strong> In Java, the interface <code>Set</code> defines the set data type.
Class <code>HashSet</code> implements the interface with a hash table.
Both the interface and class are in package <code>java.util</code>.
</div>

The add, remove and membership operations take amortised constant time
for Python sets. As for the operations on two sets _a_ and _b_,
union has complexity Θ(│*a*│ + │*b*│),
intersection has complexity Θ(min(│*a*│, │*b*│)),
i.e. is linear in the size of the smaller of the two sets,
and the difference _a_ − _b_ is linear in the size of the first set: Θ(│*a*│).

#### Exercise 8.4.2

Explain the complexities of the union, intersection and difference operations.

_Write your answer here._

[Hint](../31_Hints/Hints_08_4_02.ipynb)
[Answer](../32_Answers/Answers_08_4_02.ipynb)

#### Exercise 8.4.3

Checking if two sets are disjoint can be done with the Boolean expression
`len(s1 & s2) == 0`.

1. Why isn't this an efficient way of checking disjointness?

_Write your answer here._

2. Describe a better algorithm.

_Write your answer here._

3. Explain why it's better, by comparing the memory use and
   the best- and worst-case complexities against those of the expression.

_Write your answer here._

<div class="alert alert-warning">
<strong>Note:</strong> The shortest algorithm is not necessarily the most efficient.
</div>

[Hint](../31_Hints/Hints_08_4_03.ipynb)
[Answer](../32_Answers/Answers_08_4_03.ipynb)

### 8.4.4 Using sets

An efficient implementation of a set is very useful:
contrary to lists, stacks and queues, it supports the
add, remove and membership operations in constant time, for _every_ item.
Here's a problem that uses a set just as a basic collection of items.

#### Exercise 8.4.4

A computing society is organising a programming contest for schools.
Each school can send up to 4 teams of students.
Each team is identified by a string like 'DS2',
with the team's number after the school's initials.
The best team of each school gets a certificate.
Given the final team ranking, compute the teams that get a certificate.
Add tests.

In [6]:
%run -i ../m269_util

def certificates(ranking: list) -> list:
    """Return the best team of each school.

    The input and output are lists of strings (team names).
    Each string is the name of a school and a digit from 1 to 4.

    Preconditions:
    - ranking is a non-empty list ordered from first to last team
    - there are no duplicate teams
    Postconditions:
    - the output has the first team in 'ranking' of each school
    - the output strings are in the same order as in 'ranking'
    """
    best_teams = []
    pass
    return best_teams

certificates_tests = [
    # case,         ranking,                    certificates
    ('3 schools',   ['C1','B2','B1','A1','C2'], ['C1','B2','A1']),
    # new tests:
]

test(certificates, certificates_tests)

[Hint](../31_Hints/Hints_08_4_04.ipynb)
[Answer](../32_Answers/Answers_08_4_04.ipynb)

#### Optional exercises

The Kattis Guide lists further
[problems on sets](https://mwermelinger.github.io/kattis-guide/unordered.html#sets).
