# Data Structures: Sets

> A set is an unordered "bag" of unique values.

## Creation

In [None]:
girls = {"Andrea", "Sofia", "Lucia", "Maria"}

In [None]:
girls

In [None]:
boys = {"Andrea", "Hugo", "Martin"}

In [None]:
boys

Items can never be repeated in a set, and Python ensures that is always the case:

In [None]:
boys = {"Andrea", "Hugo", "Martin", "Martin", "Martin"}

In [None]:
boys

To create an empty set, use the `set()` function, **not** empty curly braces (`{}`), which create an empty dictionary:

In [None]:
empty_set = set()

<div class="alert alert-warning">

<b>Beware:</b> To create an empty set, use the <code>set()</code> function

</div>

Note that an empty set is represented by `set()`, and **not** by empty curly braces (`{}`):

In [None]:
empty_set

<div class="alert alert-warning">

<b>Beware:</b> An empty set is represented by <code>set()</code>, and not by empty curly braces

</div>
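As a quick check of the two warnings above, the sketch below compares what empty curly braces and `set()` actually produce. The variable names `braces` and `empty` are illustrative only.

```python
# Empty curly braces create a dictionary, not a set.
braces = {}
empty = set()

print(type(braces))  # <class 'dict'>
print(type(empty))   # <class 'set'>
print(len(empty))    # 0
```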

<div class="alert alert-info">

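<b>Note:</b> Sets can contain any hashable objects, including a mix of data types (e.g. strings, numbers and tuples). Mutable objects such as lists, dictionaries or other sets cannot be elements of a set; use a <code>frozenset</code> to nest one set inside another.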

</div>
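One caveat worth illustrating is that set elements must be hashable. The following is a minimal sketch with made-up values: it mixes several hashable types in one set and then shows that a list is rejected. The names `mixed` and `list_rejected` are hypothetical.

```python
# Hashable values of different types can share a set.
mixed = {"Andrea", 42, 3.14, ("a", "tuple"), frozenset({"Hugo"})}
print(len(mixed))  # 5

# A list is mutable and therefore unhashable, so it cannot be an element.
list_rejected = False
try:
    invalid = {["Andrea", "Hugo"]}
except TypeError as error:
    list_rejected = True
    print("TypeError:", error)
```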

## Type

In [None]:
type(girls)

In [None]:
type(boys)

In [None]:
type(empty_set)

## Conversion

### From lists

In [None]:
habsburgs = ["Philip", "Charles", "Philip", "Philip", "Philip", "Charles"]

In [None]:
type(habsburgs)

In [None]:
len(habsburgs)

In [None]:
set(habsburgs)

In [None]:
kings = set(habsburgs)

In [None]:
kings

In [None]:
type(kings)

In [None]:
len(kings)

## Length

In [None]:
len(boys)

In [None]:
len(girls)

In [None]:
len(empty_set)

## Differences with lists

The following operations are not allowed for sets:
* Indexing
* Slicing
* `.count()` and `.index()`
* In-place sorting with `.sort()` (the built-in `sorted()` works on a set, but returns a list)
* ...
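To make the list above concrete, here is a small sketch using a throwaway set called `names` (not one of the sets used elsewhere in this notebook). It shows that indexing fails, while the built-in `sorted()` still works because it accepts any iterable and returns a new list.

```python
names = {"Andrea", "Hugo", "Martin"}

# Indexing is not supported because set items have no position.
indexing_failed = False
try:
    names[0]
except TypeError as error:
    indexing_failed = True
    print("TypeError:", error)

# sorted() builds a new list; the set itself stays unordered.
ordered = sorted(names)
print(ordered)        # ['Andrea', 'Hugo', 'Martin']
print(type(ordered))  # <class 'list'>
```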

## Demo 1: Searching for items

In [None]:
girls

In [None]:
"Carla" in girls

In [None]:
"Lucia" in girls

## Exercise 1

### Skeleton

In [None]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}

Check if Python is one of the skills of a Data Scientist:

Check if Java is one of the skills of a Data Scientist:

## Demo 2: Adding items

In [None]:
girls

In [None]:
len(girls)

To `add()` a single item to the set:

In [None]:
girls.add("Emma")

In [None]:
girls

In [None]:
len(girls)

Adding an item that already exists in the set has no effect:

In [None]:
girls.add("Emma")

In [None]:
girls

In [None]:
len(girls)

To `update()` the set with multiple items:

In [None]:
girls.update(["Alba", "Sara", "Carmen"])

In [None]:
girls

In [None]:
len(girls)
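A detail that may be useful here: `update()` accepts any iterable, and more than one at a time. The sketch below uses hypothetical sets `names` and `letters` to show this, along with a common pitfall: a string is itself an iterable of characters, so passing it to `update()` adds each character separately.

```python
names = {"Emma"}

# Several iterables of different kinds can be passed at once.
names.update(("Alba", "Sara"), {"Carmen"})
print(len(names))  # 4

# Pitfall: a string is iterated character by character.
letters = set()
letters.update("Emma")
print(sorted(letters))  # ['E', 'a', 'm']
```

To add a whole string as one item, `add()` is the safer choice.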

## Exercise 2

### Skeleton

In [None]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}

Add Pandas to the skills of a Data Scientist:

Check the number of elements in the set:

Add Python to the skills of a Data Scientist:

Check the number of elements in the set:

Add Scikit-learn and Seaborn to the skills of a Data Scientist:

Check the number of elements in the set:

## Demo 3: Removing items

In [None]:
girls = {"Andrea", "Sofia", "Lucia", "Maria"}

In [None]:
girls

To get rid of an item from the set, use either `discard()` or `remove()`:

In [None]:
girls.discard("Lucia")

In [None]:
girls

In [None]:
girls.discard("Maria")

In [None]:
girls

The difference is that, when the item is not in the set, `remove()` raises a `KeyError` while `discard()` does nothing:

In [None]:
girls.discard("Lucia")

In [None]:
girls

In [None]:
# Raises a KeyError, because Lucia is no longer present in the set:
girls.remove("Lucia")
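When the error from `remove()` should be handled rather than avoided, one option is to catch the `KeyError`. This is only a sketch on a hypothetical set `names`; whether to use `discard()` or a `try`/`except` block depends on whether a missing item is expected.

```python
names = {"Andrea", "Sofia"}

# discard() silently ignores a missing item.
names.discard("Lucia")

# remove() raises KeyError for a missing item, which can be caught.
remove_failed = False
try:
    names.remove("Lucia")
except KeyError as error:
    remove_failed = True
    print("KeyError:", error)

print(len(names))  # 2
```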

## Exercise 3

### Skeleton

In [None]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}

Remove SAS from the skills of a Data Scientist:

Check the number of elements in the set:

Remove Java from the skills of a Data Scientist, ensuring that no error is raised if it is not present:

Check the number of elements in the set:

## Demo 4: Set operations

In [None]:
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys = {"Andrea", "Hugo", "Martin"}

### Intersection

To get the intersection between two sets (i.e. elements that belong to both sets):

In [None]:
girls.intersection(boys)

In [None]:
len(girls.intersection(boys))

This operation is symmetric:

In [None]:
boys.intersection(girls)

An alternative notation to get the intersection between two sets:

In [None]:
girls & boys

Note that this notation requires both operands to be sets, while the `.intersection()` method accepts any iterable (such as a list) as its argument:

In [None]:
girls.intersection(list(boys))

In [None]:
# Raises a TypeError, because both operands must be sets:
girls & list(boys)

<div class="alert alert-warning">

<b>Beware:</b> The <code>&</code> notation for set intersection requires both operands to be sets

</div>
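If one of the collections is not a set, a simple workaround is to convert it with `set()` before applying the operator. The sketch below uses hypothetical names (`group`, `names_list`) so that it does not overwrite the sets used in this demo.

```python
group = {"Andrea", "Sofia", "Lucia", "Maria"}
names_list = ["Andrea", "Hugo", "Martin"]

# The method accepts any iterable, such as a list or a tuple.
print(group.intersection(names_list))         # {'Andrea'}
print(group.intersection(tuple(names_list)))  # {'Andrea'}

# The operator needs a set on both sides, so convert explicitly.
common = group & set(names_list)
print(common)  # {'Andrea'}

# Without the conversion, the operator raises TypeError.
operator_failed = False
try:
    group & names_list
except TypeError:
    operator_failed = True
print(operator_failed)  # True
```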

### Union

To get the union between two sets (i.e. elements that belong to either set, or to both):

In [None]:
girls.union(boys)

In [None]:
len(girls.union(boys))

This operation is symmetric:

In [None]:
boys.union(girls)

An alternative notation to get the union between two sets:

In [None]:
girls | boys

<div class="alert alert-warning">

<b>Beware:</b> The <code>|</code> notation for set union requires both operands to be sets

</div>

### Difference

To get the difference between two sets (i.e. elements that belong to the first set but not to the second):

In [None]:
girls.difference(boys)

In [None]:
len(girls.difference(boys))

This operation is **not** symmetric:

In [None]:
boys.difference(girls)

An alternative notation to get the difference between two sets:

In [None]:
girls - boys

<div class="alert alert-warning">

<b>Beware:</b> The <code>-</code> notation for set difference requires both operands to be sets

</div>

### Symmetric difference

To get the symmetric difference between two sets (i.e. elements that belong to either set, but not to both):

In [None]:
girls.symmetric_difference(boys)

In [None]:
len(girls.symmetric_difference(boys))

This operation is symmetric:

In [None]:
boys.symmetric_difference(girls)

This operation is equivalent to the union of both set differences:

In [None]:
boys - girls

In [None]:
girls - boys

In [None]:
(boys - girls) | (girls - boys)

An alternative notation to get the symmetric difference between two sets:

In [None]:
girls ^ boys

<div class="alert alert-warning">

<b>Beware:</b> The <code>^</code> notation for set symmetric difference requires both operands to be sets

</div>
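The symmetric difference can also be read as "the union minus the intersection". The sketch below re-creates the two sets from this demo with the same values and checks both equivalent formulations.

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys = {"Andrea", "Hugo", "Martin"}

sym = girls ^ boys

# Union of both differences, as described above.
print(sym == (girls - boys) | (boys - girls))  # True

# Union minus intersection gives the same result.
print(sym == (girls | boys) - (girls & boys))  # True

print(len(sym))  # 5
```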

## Exercise 4

### Skeleton

In [None]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}
data_engineer_skills = {"Python", "Git", "SQL", "Java", "Scala", "Hadoop"}

Get the intersection between the skills of a Data Scientist and the skills of a Data Engineer:

Get the union between the skills of a Data Scientist and the skills of a Data Engineer:

Get the difference between the skills of a Data Scientist and the skills of a Data Engineer:

Get the difference between the skills of a Data Engineer and the skills of a Data Scientist:

Get the symmetric difference between the skills of a Data Scientist and the skills of a Data Engineer:

## Demo 5: Subsets and supersets

In [None]:
students = {"Alice", "Bob", "Carol", "Dennis"}
teachers = {"Pablo", "Ivan", "JC"}
strangers = {"Snoopy", "Garfield"}

Create a set with everyone who should have access to the IE:

In [None]:
ie = students | teachers

In [None]:
ie

### Subsets

To check if the students are a subset of IE:

In [None]:
students.issubset(ie)

To check if the strangers are a subset of IE:

In [None]:
strangers.issubset(ie)

### Supersets

To check if the IE is a superset of the teachers:

In [None]:
ie.issuperset(teachers)

To check if the IE is a superset of the strangers:

In [None]:
ie.issuperset(strangers)
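As with the other set operations, subset and superset checks also have an operator notation: `<=` and `>=`, plus `<` and `>` for proper (strict) subsets and supersets. The sketch below re-creates the sets from this demo with the same values; `isdisjoint()` is included as a related check for sets that share no items.

```python
students = {"Alice", "Bob", "Carol", "Dennis"}
teachers = {"Pablo", "Ivan", "JC"}
strangers = {"Snoopy", "Garfield"}
ie = students | teachers

print(students <= ie)  # True, same as students.issubset(ie)
print(ie >= teachers)  # True, same as ie.issuperset(teachers)

# A proper subset must be strictly smaller than the other set.
print(students < ie)   # True
print(ie <= ie)        # True
print(ie < ie)         # False

# Two sets are disjoint when they have no items in common.
print(strangers.isdisjoint(ie))  # True
```

Like the other operators, `<=` and `>=` require both operands to be sets, whereas `issubset()` and `issuperset()` accept any iterable as their argument.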

## Exercise 5

### Skeleton

In [None]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}
skills_alba = {"Python", "Git", "Tableau"}
skills_bea = {"Python", "Java", "Scala"}

Check if all of Alba's skills are relevant to a Data Scientist:

Check if all of Bea's skills are relevant to a Data Scientist:

Check which of Bea's skills are useful for a Data Scientist: