# Data Structures: Sets

> A set is an unordered "bag" of unique values.

## Creation

In [1]:
girls = {"Andrea", "Sofia", "Lucia", "Maria"}

In [2]:
girls

{'Andrea', 'Lucia', 'Maria', 'Sofia'}

In [3]:
boys = {"Andrea", "Hugo", "Martin"}

In [4]:
boys

{'Andrea', 'Hugo', 'Martin'}

Items can never be repeated in a set. Python silently discards any duplicates:

In [5]:
boys = {"Andrea", "Hugo", "Martin", "Martin", "Martin"}

In [6]:
boys

{'Andrea', 'Hugo', 'Martin'}

To create an empty set, use the `set()` function, and **not** empty curly braces (`{}`), which create an empty dictionary instead:

In [7]:
empty_set = set()

<div class="alert alert-warning">

<b>Beware:</b> To create an empty set, use the <code>set()</code> function

</div>

Note that an empty set is represented by `set()`, and **not** by empty curly braces (`{}`):

In [8]:
empty_set

set()

<div class="alert alert-warning">

<b>Beware:</b> An empty set is represented by <code>set()</code>, and not by empty curly braces

</div>
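The reason is that Python reads empty curly braces as an empty dictionary literal. A quick check (the variable names are only illustrative):

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Braces with bare values (no key: value pairs) do create a set
print(type({"Andrea"}))    # <class 'set'>
```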

<div class="alert alert-info">

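<b>Note:</b> Sets can contain any hashable objects, including a mix of data types (e.g. strings, numbers and tuples). Mutable data structures such as lists, dictionaries or other sets are not hashable and therefore cannot be set items; to nest a set inside another set, use a <code>frozenset</code>.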

</div>
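A minimal sketch of what "hashable" means in practice. The names and values are illustrative; the exact wording of the `TypeError` message can differ between Python versions, so the comment only describes it:

```python
# Hashable values of different types can be mixed in one set
mixed = {"Andrea", 42, 3.5, ("Hugo", "Martin")}
print(len(mixed))  # 4

# Lists (like dicts and sets) are mutable and unhashable, so they are rejected
try:
    bad = {"Andrea", ["Hugo", "Martin"]}
except TypeError as error:
    error_message = str(error)
    print(error_message)  # mentions that 'list' is an unhashable type

# A frozenset is an immutable, hashable set, so it can be an item of another set
nested = {"Andrea", frozenset({"Hugo", "Martin"})}
print(frozenset({"Martin", "Hugo"}) in nested)  # True
```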

## Type

In [9]:
type(girls)

set

In [10]:
type(boys)

set

In [11]:
type(empty_set)

set

## Conversion

### From lists

In [12]:
habsburgs = ["Philip", "Charles", "Philip", "Philip", "Philip", "Charles"]

In [13]:
type(habsburgs)

list

In [14]:
len(habsburgs)

6

In [15]:
set(habsburgs)

{'Charles', 'Philip'}

In [16]:
kings = set(habsburgs)

In [17]:
kings

{'Charles', 'Philip'}

In [18]:
type(kings)

set

In [19]:
len(kings)

2
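Lists are not the only source: `set()` accepts any iterable, such as tuples, strings or ranges. Since sets are unordered, `sorted()` is a convenient way to turn the result back into a list with a predictable order. A short sketch (the extra inputs are illustrative):

```python
habsburgs = ["Philip", "Charles", "Philip", "Philip", "Philip", "Charles"]

# set() accepts any iterable, not only lists
from_tuple = set(("Philip", "Charles", "Philip"))
from_string = set("Philip")  # the unique characters of the string
from_range = set(range(3))

print(from_tuple)   # the two names, in some order
print(from_string)  # the characters P, h, i, l, p, in some order
print(from_range)   # {0, 1, 2}

# Converting back to a list: sorted() gives a predictable order
unique_kings = sorted(set(habsburgs))
print(unique_kings)  # ['Charles', 'Philip']
```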

## Length

In [20]:
len(boys)

3

In [21]:
len(girls)

4

In [22]:
len(empty_set)

0

## Differences with lists

The following list operations are not supported by sets:
* Indexing
* Slicing
* `.count()` and `.index()`
* Sorting in place with `.sort()` (the built-in `sorted()` still works, but returns a new list)
* ...
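A small demonstration of these limitations, reusing the `girls` set from above:

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}

# Sets have no positions, so indexing (and slicing) raises a TypeError
try:
    first = girls[0]
except TypeError as error:
    error_message = str(error)
    print(error_message)  # 'set' object is not subscriptable

# There are no list-style .sort(), .count() or .index() methods...
print(hasattr(girls, "sort"), hasattr(girls, "count"), hasattr(girls, "index"))  # False False False

# ...but sorted() accepts a set and returns a new sorted list
ordered = sorted(girls)
print(ordered)        # ['Andrea', 'Lucia', 'Maria', 'Sofia']
print(type(ordered))  # <class 'list'>
```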

## Demo 1: Searching for items

In [23]:
girls

{'Andrea', 'Lucia', 'Maria', 'Sofia'}

In [24]:
"Carla" in girls

False

In [25]:
"Lucia" in girls

True
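Membership tests are where sets shine: Python looks the item up by its hash instead of scanning every element, so `in` takes roughly constant time on average, whereas on a list it checks items one by one. A small sketch of `not in` and of the common pattern of filtering another collection against a set (the `candidates` list is illustrative data):

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}

# `not in` is the negated membership test
print("Carla" not in girls)  # True

# Keep only the candidates that appear in the set
candidates = ["Carla", "Lucia", "Hugo", "Sofia"]
found = [name for name in candidates if name in girls]
print(found)  # ['Lucia', 'Sofia']
```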

## Exercise 1

### Skeleton

In [26]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}

Check if Python is one of the skills of a Data Scientist:

In [27]:
"Python" in data_scientist_skills

True

Check if Java is one of the skills of a Data Scientist:

In [28]:
"Java" in data_scientist_skills

False

## Demo 2: Adding items

In [29]:
girls

{'Andrea', 'Lucia', 'Maria', 'Sofia'}

In [30]:
len(girls)

4

To `add()` a single item to the set:

In [31]:
girls.add("Emma")

In [32]:
girls

{'Andrea', 'Emma', 'Lucia', 'Maria', 'Sofia'}

In [33]:
len(girls)

5

Adding an item that already exists in the set has no effect:

In [34]:
girls.add("Emma")

In [35]:
girls

{'Andrea', 'Emma', 'Lucia', 'Maria', 'Sofia'}

In [36]:
len(girls)

5

To `update()` the set with multiple items:

In [37]:
girls.update(["Alba", "Sara", "Carmen"])

In [38]:
girls

{'Alba', 'Andrea', 'Carmen', 'Emma', 'Lucia', 'Maria', 'Sara', 'Sofia'}

In [39]:
len(girls)

8
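`update()` accepts any iterable, and even several of them in a single call. One pitfall: a string is itself an iterable of characters, so passing a bare string to `update()` adds its letters, not the whole word. A sketch with illustrative names:

```python
girls = {"Andrea", "Sofia"}

# update() accepts any iterable, and several of them in one call
girls.update(["Alba"], ("Sara",), {"Carmen"})
print(len(girls))  # 5

# Pitfall: a string is an iterable of characters...
letters = set()
letters.update("Emma")
print(sorted(letters))  # ['E', 'a', 'm']

# ...so use add() (or wrap the string in a list) to insert the whole name
names = set()
names.add("Emma")
names.update(["Sara"])
print(sorted(names))  # ['Emma', 'Sara']
```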

## Exercise 2

### Skeleton

In [40]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}

Add Pandas to the skills of a Data Scientist:

In [41]:
data_scientist_skills.add("Pandas")

In [42]:
data_scientist_skills

{'Git', 'Pandas', 'Python', 'R', 'SAS', 'SQL', 'Tableau'}

Check the number of elements in the set:

In [43]:
len(data_scientist_skills)

7

Add Python to the skills of a Data Scientist:

In [44]:
data_scientist_skills.add("Python")

In [45]:
data_scientist_skills

{'Git', 'Pandas', 'Python', 'R', 'SAS', 'SQL', 'Tableau'}

Check the number of elements in the set:

In [46]:
len(data_scientist_skills)

7

Add Scikit-learn and Seaborn to the skills of a Data Scientist:

In [47]:
data_scientist_skills.update(["Scikit-learn", "Seaborn"])

In [48]:
data_scientist_skills

{'Git',
 'Pandas',
 'Python',
 'R',
 'SAS',
 'SQL',
 'Scikit-learn',
 'Seaborn',
 'Tableau'}

Check the number of elements in the set:

In [49]:
len(data_scientist_skills)

9

## Demo 3: Removing items

In [50]:
girls = {"Andrea", "Sofia", "Lucia", "Maria"}

In [51]:
girls

{'Andrea', 'Lucia', 'Maria', 'Sofia'}

To get rid of an item from the set, use either `discard()` or `remove()`:

In [52]:
girls.discard("Lucia")

In [53]:
girls

{'Andrea', 'Maria', 'Sofia'}

In [54]:
girls.discard("Maria")

In [55]:
girls

{'Andrea', 'Sofia'}

The difference is that, when asked to remove an item that is not in the set, `remove()` raises a `KeyError`, whereas `discard()` silently does nothing:

In [56]:
girls.discard("Lucia")

In [57]:
girls

{'Andrea', 'Sofia'}

In [58]:
# Raises a KeyError, because Lucia is no longer present in the set:
girls.remove("Lucia")

KeyError: 'Lucia'
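One way to think about the choice: `remove()` suits cases where a missing item should be noticed, and `discard()` cases where it does not matter. A sketch of both, handling the `KeyError` explicitly:

```python
girls = {"Andrea", "Sofia"}

# remove() reports a missing item with a KeyError, which can be caught
try:
    girls.remove("Lucia")
except KeyError as error:
    message = f"Not in the set: {error}"
    print(message)  # Not in the set: 'Lucia'

# discard() never raises, so no try/except is needed
girls.discard("Lucia")
print(sorted(girls))  # ['Andrea', 'Sofia']
```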

## Exercise 3

### Skeleton

In [59]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}

Remove SAS from the skills of a Data Scientist:

In [60]:
data_scientist_skills.remove("SAS")

In [61]:
data_scientist_skills

{'Git', 'Python', 'R', 'SQL', 'Tableau'}

Check the number of elements in the set:

In [62]:
len(data_scientist_skills)

5

Remove Java from the skills of a Data Scientist, ensuring that no error is raised if it is not present:

In [63]:
data_scientist_skills.discard("Java")

In [64]:
data_scientist_skills

{'Git', 'Python', 'R', 'SQL', 'Tableau'}

Check the number of elements in the set:

In [65]:
len(data_scientist_skills)

5

## Demo 4: Set operations

In [66]:
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys = {"Andrea", "Hugo", "Martin"}

### Intersection

To get the intersection between two sets (i.e. elements that belong to both sets):

In [67]:
girls.intersection(boys)

{'Andrea'}

In [68]:
len(girls.intersection(boys))

1

This operation is commutative (the order of the operands does not matter):

In [69]:
boys.intersection(girls)

{'Andrea'}

An alternative notation to get the intersection between two sets:

In [70]:
girls & boys

{'Andrea'}

Note that this notation requires both operands to be sets, while the `.intersection()` method accepts any iterable (e.g. a list) as its argument:

In [71]:
girls.intersection(list(boys))

{'Andrea'}

In [72]:
# Raises a TypeError, because both operands must be sets:
girls & list(boys)

TypeError: unsupported operand type(s) for &: 'set' and 'list'

<div class="alert alert-warning">

<b>Beware:</b> The <code>&</code> notation for set intersection requires both operands to be sets

</div>
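The `.intersection()` method also accepts several iterables at once, while the `&` operator can be chained as long as every operand is a set. A sketch, where the `teachers` list is illustrative data:

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys = {"Andrea", "Hugo", "Martin"}
teachers = ["Andrea", "Pablo"]  # deliberately a list

# The method accepts several iterables at once, of any type
common = girls.intersection(boys, teachers)
print(common)  # {'Andrea'}

# The operator form can be chained, but every operand must be a set
chained = girls & boys & set(teachers)
print(chained)  # {'Andrea'}
```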

### Union

To get the union between two sets (i.e. elements that belong to either set, or to both):

In [73]:
girls.union(boys)

{'Andrea', 'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

In [74]:
len(girls.union(boys))

6

This operation is commutative (the order of the operands does not matter):

In [75]:
boys.union(girls)

{'Andrea', 'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

An alternative notation to get the union between two sets:

In [76]:
girls | boys

{'Andrea', 'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

<div class="alert alert-warning">

<b>Beware:</b> The <code>|</code> notation for set union requires both operands to be sets

</div>
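As with intersection, the method form is more permissive than the operator. A sketch with the boys stored in a list:

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys_list = ["Andrea", "Hugo", "Martin"]

# .union() accepts any iterable
everyone = girls.union(boys_list)
print(len(everyone))  # 6

# The | operator does not: the list must be converted first
try:
    girls | boys_list
except TypeError as error:
    error_message = str(error)
    print(error_message)  # unsupported operand type(s) for |: 'set' and 'list'

print(girls | set(boys_list) == everyone)  # True
```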

### Difference

To get the difference between two sets (i.e. elements that belong to the first set but not to the second):

In [77]:
girls.difference(boys)

{'Lucia', 'Maria', 'Sofia'}

In [78]:
len(girls.difference(boys))

3

This operation is **not** commutative (the order of the operands matters):

In [79]:
boys.difference(girls)

{'Hugo', 'Martin'}

An alternative notation to get the difference between two sets:

In [80]:
girls - boys

{'Lucia', 'Maria', 'Sofia'}

<div class="alert alert-warning">

<b>Beware:</b> The <code>-</code> notation for set difference requires both operands to be sets

</div>
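The difference `girls - boys` can be read as "keep the items of `girls` that are not in `boys`", which a set comprehension spells out explicitly. A sketch comparing both forms:

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys = {"Andrea", "Hugo", "Martin"}

# girls - boys keeps the items of girls that are not in boys
manual = {name for name in girls if name not in boys}
print(manual == girls - boys)  # True

# Swapping the operands gives a different result
print(girls - boys == boys - girls)  # False
```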

### Symmetric difference

To get the symmetric difference between two sets (i.e. elements that belong to either set, but not to both):

In [81]:
girls.symmetric_difference(boys)

{'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

In [82]:
len(girls.symmetric_difference(boys))

5

This operation is commutative (the order of the operands does not matter):

In [83]:
boys.symmetric_difference(girls)

{'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

This operation is equivalent to the union of both set differences:

In [84]:
boys - girls

{'Hugo', 'Martin'}

In [85]:
girls - boys

{'Lucia', 'Maria', 'Sofia'}

In [86]:
(boys - girls) | (girls - boys)

{'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

An alternative notation to get the symmetric difference between two sets:

In [87]:
girls ^ boys

{'Hugo', 'Lucia', 'Maria', 'Martin', 'Sofia'}

<div class="alert alert-warning">

<b>Beware:</b> The <code>^</code> notation for set symmetric difference requires both operands to be sets

</div>
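Another equivalent way to read the symmetric difference is "the union minus the intersection", i.e. everything except the shared items. A quick check:

```python
girls = {"Andrea", "Sofia", "Lucia", "Maria"}
boys = {"Andrea", "Hugo", "Martin"}

# Everything in either set, minus what they have in common
via_union = (girls | boys) - (girls & boys)
print(via_union == girls ^ boys)  # True
```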

## Exercise 4

### Skeleton

In [88]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}
data_engineer_skills = {"Python", "Git", "SQL", "Java", "Scala", "Hadoop"}

Get the intersection between the skills of a Data Scientist and the skills of a Data Engineer:

In [89]:
data_scientist_skills & data_engineer_skills

{'Git', 'Python', 'SQL'}

Get the union between the skills of a Data Scientist and the skills of a Data Engineer:

In [90]:
data_scientist_skills | data_engineer_skills

{'Git', 'Hadoop', 'Java', 'Python', 'R', 'SAS', 'SQL', 'Scala', 'Tableau'}

Get the difference between the skills of a Data Scientist and the skills of a Data Engineer:

In [91]:
data_scientist_skills - data_engineer_skills

{'R', 'SAS', 'Tableau'}

Get the difference between the skills of a Data Engineer and the skills of a Data Scientist:

In [92]:
data_engineer_skills - data_scientist_skills

{'Hadoop', 'Java', 'Scala'}

Get the symmetric difference between the skills of a Data Scientist and the skills of a Data Engineer:

In [93]:
data_scientist_skills ^ data_engineer_skills

{'Hadoop', 'Java', 'R', 'SAS', 'Scala', 'Tableau'}

## Demo 5: Subsets and supersets

In [94]:
students = {"Alice", "Bob", "Carol", "Dennis"}
teachers = {"Pablo", "Ivan", "JC"}
strangers = {"Snoopy", "Garfield"}

Create a set with everyone who should have access to the IE:

In [95]:
ie = students | teachers

In [96]:
ie

{'Alice', 'Bob', 'Carol', 'Dennis', 'Ivan', 'JC', 'Pablo'}

### Subsets

To check if the students are a subset of IE:

In [97]:
students.issubset(ie)

True

To check if the strangers are a subset of IE:

In [98]:
strangers.issubset(ie)

False
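Subset checks also have operator forms: `<=` matches `.issubset()`, while `<` tests for a *proper* subset (a subset that is not equal to the other set). As with the other operations, the method form accepts any iterable. A sketch:

```python
students = {"Alice", "Bob", "Carol", "Dennis"}
teachers = {"Pablo", "Ivan", "JC"}
ie = students | teachers

# <= is the operator form of issubset()
print(students <= ie)  # True

# < tests for a proper subset
print(students < ie)   # True (ie also contains the teachers)
print(ie <= ie)        # True (every set is a subset of itself)
print(ie < ie)         # False (but not a proper subset of itself)

# The method form also accepts a list
print(students.issubset(list(ie)))  # True
```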

### Supersets

To check if the IE is a superset of the teachers:

In [99]:
ie.issuperset(teachers)

True

To check if the IE is a superset of the strangers:

In [100]:
ie.issuperset(strangers)

False
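Superset checks mirror subset checks: `>=` matches `.issuperset()`, `>` tests for a proper superset, and A is a superset of B exactly when B is a subset of A. A sketch:

```python
students = {"Alice", "Bob", "Carol", "Dennis"}
teachers = {"Pablo", "Ivan", "JC"}
strangers = {"Snoopy", "Garfield"}
ie = students | teachers

# >= is the operator form of issuperset(), > tests for a proper superset
print(ie >= teachers)  # True
print(ie > teachers)   # True

# The two checks mirror each other
print(ie.issuperset(strangers) == strangers.issubset(ie))  # True
print(ie.issuperset(teachers) == teachers.issubset(ie))    # True
```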

## Exercise 5

### Skeleton

In [102]:
data_scientist_skills = {"Python", "Git", "SQL", "R", "Tableau", "SAS"}
skills_alba = {"Python", "Git", "Tableau"}
skills_bea = {"Python", "Java", "Scala"}

Check if all of Alba's skills are relevant to be a Data Scientist:

In [103]:
data_scientist_skills.issuperset(skills_alba)

True

Check if all of Bea's skills are relevant to be a Data Scientist:

In [104]:
data_scientist_skills.issuperset(skills_bea)

False

Check which of Bea's skills are useful for a Data Scientist:

In [105]:
skills_bea & data_scientist_skills

{'Python'}