In [1]:
# Remember to execute this cell with Shift+Enter

import jupman


# Sets

## [Download exercises zip](../_static/generated/sets.zip)

[Browse online files](https://github.com/DavidLeoni/softpython-en/tree/master/sets)

A set is a _mutable_ _unordered_ collection of _immutable_ _distinct_ elements (that is, without duplicates). The Python datatype to represent sets is called `set`.

## What to do

1. Unzip the [exercises zip](../_static/generated/sets.zip) into a folder. You should obtain something like this:

```
sets
    sets1.ipynb    
    sets1-sol.ipynb         
    jupman.py         
```

<div class="alert alert-warning">

**WARNING: to correctly visualize the notebook, it MUST be in an unzipped folder !**
</div>

2. Open Jupyter Notebook from that folder. Two things should open: first a console, then a browser. The browser should show a file list: navigate the list and open the notebook `sets1.ipynb`

3. Go on reading the exercises file; every now and then you will find paragraphs marked **Exercises** which will ask you to write Python commands in the following cells.

Shortcut keys:

- to execute Python code inside a Jupyter cell, press `Control + Enter`

- to execute Python code inside a Jupyter cell AND select next cell, press `Shift + Enter`

- to execute Python code inside a Jupyter cell AND create a new cell afterwards, press `Alt + Enter`

- If the notebooks look stuck, try to select `Kernel -> Restart`


## Creating a set

We can create a set using curly brackets, and separating the elements with commas `,`

Let's try a set of characters:

In [2]:
s = {'b','a','d','c'}

In [3]:
type(s)

set

<div class="alert alert-warning">

**WARNING: SETS ARE** ***NOT*** **ORDERED !!!**
    
**DO NOT** BELIEVE IN WHAT YOU SEE !!

</div>

Let's try printing the set:

In [4]:
print(s)

{'b', 'd', 'c', 'a'}


The output shows that the elements are printed in a different order from the one we used to build the set. Also, depending on the Python version you're using, the order on your computer might be different again!

This is because order in sets is NOT guaranteed: the only thing that matters is whether or not an element belongs to a set.

As a further demonstration, we may ask Jupyter to show the content of the set, by writing only the variable `s` WITHOUT `print`:

In [5]:
s

{'a', 'b', 'c', 'd'}

Now it appears in alphabetical order! This happens because Jupyter shows variables by implicitly using [pprint](https://docs.python.org/3/library/pprint.html) (_pretty_ print), which ONLY for sets is courteous enough to sort the result before printing it. We can thank Jupyter, but let's not allow it to confuse us!

**Elements index**: since sets have no order, asking Python to extract an element at a given position would make no sense. Thus, unlike strings, lists and tuples, with sets it's NOT possible to extract an element by index:

```python
s[0]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-352-c9c96910e542> in <module>
----> 1 s[0]

TypeError: 'set' object is not subscriptable
```
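
When a definite order or a positional access is really needed, one possible approach is to build a sorted list from the set first. The following is only a minimal sketch (the names `ordered` and `first` are illustrative, and it assumes the elements can be compared with each other):

```python
s = {'b', 'a', 'd', 'c'}

# sorted() accepts the set and returns a NEW list:
# a list has a well-defined order, so it can be indexed
ordered = sorted(s)
first = ordered[0]

print(ordered)   # ['a', 'b', 'c', 'd']
print(first)     # a
```

The original set `s` is left untouched, and it still cannot be indexed.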

We said that a set has only _distinct_ elements, that is without duplicates - what happens if we try to place some duplicate anyway?

In [6]:
s = {6,7,5,9,5,5,7}

In [7]:
s

{5, 6, 7, 9}

We note that Python silently removed the duplicates.

### Converting sequences to sets

As for lists and strings, we can create a `set` from another sequence:

In [8]:
set('acacia') # from string

{'a', 'c', 'i'}

In [9]:
set( [1,2,3,1,2,1,2,1,3,1] ) # from list

{1, 2, 3}

In [10]:
set( (4,6,1,5,1,4,1,5,4,5) ) # from tuple

{1, 4, 5, 6}

Again, we notice there are no duplicates in the generated set.

<div class="alert alert-info">

**REMEMBER: Sets are useful to remove duplicates from a sequence**    
   
</div>
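
A related use is checking whether a sequence contains duplicates at all: if converting it to a set makes it shorter, something was repeated. This is a small sketch with made-up lists, and it assumes all the elements are immutable (otherwise they could not be placed in a set):

```python
with_dups = ['c', 'a', 'b', 'c']
no_dups = ['c', 'a', 'b']

# the set drops repeated elements, so a shorter set means duplicates existed
has_dups_1 = len(set(with_dups)) != len(with_dups)
has_dups_2 = len(set(no_dups)) != len(no_dups)

print(has_dups_1)   # True
print(has_dups_2)   # False
```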

### Mutable elements and hashes

Let's see again the definition from the beginning:

> A set is a _mutable_ _unordered_ collection of _immutable_ _distinct_ elements

So far we have only created sets using _immutable_ elements like numbers and strings.

What happens if we place some mutable elements, like lists?

```python
>>> s = { [1,2,3], [4,5] }  

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-a6c538692ccb> in <module>
----> 1 s = { [1,2,3], [4,5]  } 

TypeError: unhashable type: 'list'
```    

We obtain `TypeError: unhashable type: 'list'`, which literally means Python didn't manage to calculate the _hash_ of the list. What could this particular dish ever be?

**What is the hash?** The _hash_ of an object is a number that Python can associate with it. For example, you can see the _hash_ of an object by using the function with the same name, `hash`:

In [11]:
hash( "This is a nice day" )  # string

-3262299758023616108

In [12]:
hash( 111112222223333333344444445555555555 )   # number

651300278308214397

Imagine the _hash_ is some kind of label with these properties:
 
- it is too short to completely describe the object to which it is associated (that is: given a hash label, you _cannot_ reconstruct the object it represents)
- it is long enough to identify the object _almost uniquely_...
- ... even if in the world there _might_ be different objects which are associated with exactly the same label

**What's the relation with our sets?** The _hash_ has various applications, but typically Python uses it to quickly find an object in collections which are based on hashes, like sets and dictionaries. How fast? Very: even with humongous sets, we always obtain an answer in a very short, constant time! In other words, the answer speed _does not_ depend on the set size (except for pathological cases we don't review here).

This speed is possible because, given some object to search for, Python can rapidly calculate its _hash_ label: then, with the label in hand, so to speak, it can quickly look up in memory whether there are objects with the same label. If any are found, they will almost surely be very few, so Python will only need to compare them with the searched one.

***Immutable*** **objects always have the same hash label** from when they are created until the end of the program. _Mutable_ ones behave differently: each time we change an object, the _hash_ also changes. Imagine a market where employees place goods by looking at labels and sorting them accordingly, for example putting coffee on the breakfast shelves and bleach on the detergent shelves. If you are a customer and you want some coffee, you look at the signs and go straight to the breakfast shelves. Imagine what could happen if an evil sorcerer could transform objects already on the shelves into other objects, for example the coffee into bleach (let's assume that at the moment of the transmutation the _hash_ label also changes). Much confusion would certainly follow and, if we aren't cautious, also a great stomachache or worse.

So, to offer you the advantage of a fast search while avoiding disastrous situations, Python only allows objects with a stable _hash_ inside sets, that is, _immutable_ objects.
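
The behaviour described above can be observed directly with the `hash` function. This is just an illustrative sketch: the actual hash numbers of strings typically change from one Python session to the next, so the code only compares hashes instead of showing their values. The `try` / `except` construct is used here only to keep the program running after the error.

```python
a = "This is a nice day"
b = "This is a " + "nice day"     # built differently, but equal to a

# equal immutable objects get the same hash label
print(a == b)                     # True
print(hash(a) == hash(b))         # True

# a list is mutable, so Python refuses to compute its hash
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError as err:
    list_is_hashable = False
    print("TypeError:", err)      # TypeError: unhashable type: 'list'
```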

**QUESTION**: Can we insert a tuple inside a set? Try to verify your intuition with a code example.

**ANSWER**: Yes, tuples are _immutable,_ so they have a corresponding _hash_ which remains stable for the whole duration of the program. For example, this is a set of tuples: `{(1,2), (3,4,5)}`

Note we can consider a tuple as really immutable only if it contains elements which are also immutable.
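
As a quick check of this note, the sketch below tries to place a tuple containing a list inside a set. The variable names are illustrative, and the `try` / `except` construct only serves to capture the error without stopping the program:

```python
ok = {(1, 2), (3, 4, 5)}          # tuples holding only numbers: accepted

try:
    bad = {(1, [2, 3])}           # this tuple holds a (mutable) list
    created = True
except TypeError as err:
    created = False
    message = str(err)

print(len(ok))    # 2
print(created)    # False
print(message)    # unhashable type: 'list'
```

To compute the hash of a tuple, Python needs the hash of each of its elements, so a single mutable element is enough to make the whole tuple unusable in a set.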

### Empty set

<div class="alert alert-warning">

**WARNING: If you write** `{}` **you will obtain a dictionary, NOT a set !!!**    

</div>

To create an empty set we must call the function `set()`:

In [13]:
s = set()

In [14]:
s

set()

**EXERCISE**: try writing `{}` in the cell below and look at the object type obtained with `type`

In [15]:
# write here


**QUESTION**: Can we try inserting a set inside another set? Have a careful look at the set definition, then verify your suppositions by writing some code to create a set which has  another set inside.

<div class="alert alert-warning">

**WARNING: To perform the check, DO NOT use the** `set` **function, only use creation with curly brackets**

</div>

**ANSWER**: A set is _mutable,_ so we _cannot_ insert it as an element of another set (its _hash_ label could vary over time). By writing  `{{1,2,3}}` you will get an error.

**QUESTION**: If we write something like this, what do we get? (careful!)

```python
set(set(['a','b']))
```

1. a set with `a` and `b` inside
2. a set containing another set which contains `a` and `b` as elements
3. an error (which one?)

**ANSWER**: 1:

- inside we have the expression  `set(['a','b'])` which generates the set `{'a','b'}`
- outside we have the expression `set( set(['a','b'])  )` which is given the set just created, so we can rewrite it as `set({'a','b'})`
- Since `set` when used as a function expects a sequence, and a set _is_ a sequence, the external `set` takes all the elements it finds inside the sequence `{'a','b'}` we passed, and generates a new set with `'a'` and `'b'` inside.

**QUESTION**: Have a look at following expressions, and for each of them try to guess which result it produces (or if it gives an error):

1.  ```python
    {'oh','la','la'}
    ```   
1.  ```python
    set([3,4,2,3,2,2,2,-1])
    ```  
1.  ```python    
    {(1,2),(2,3)}
    ```   
1.  ```python    
    set('aba')
    ```   
1.  ```python
    str({'a'})
    ```       
1.  ```python    
    {1;2;3}
    ```   
1.  ```python    
    set(  1,2,3  )
    ```   
1.  ```python    
    set( {1,2,3} )
    ```   
1.  ```python    
    set( [1,2,3] )
    ```   
1.  ```python    
    set( (1,2,3) )
    ```   
1.  ```python    
    set(  "abc"  )
    ```   
1.  ```python    
    set(  "1232"  )
    ```   
1.  ```python    
    set( [ {1,2,3,2} ] )
    ```   
1.  ```python    
    set( [ [1,2,3,2] ] )
    ```   
1.  ```python    
    set( [ (1,2,3,2) ] )
    ```   
1.  ```python    
    set( [ "abcb"   ] )
    ```   
1.  ```python    
    set( [ "1232"   ] )
    ```   
1.  ```python    
    set((1,2,3,2))
    ```   
1.  ```python    
    set([(),()])
    ```   
1.  ```python    
    set([])
    ```   
1.  ```python    
    set(list(set()))
    ```

### Exercise - dedup

Write some brief code to create a list `lb` which contains all the elements of the list `la` without duplicates and alphabetically sorted.

- DO NOT change original list `la`
- DO NOT use loops
- your code should work for any `la`

```python
la = ['c','a','b','c','d','b','e']
```

After your code, you should obtain:

```python
>>> print(la)
['c', 'a', 'b', 'c', 'd', 'b', 'e']
>>> print(lb)
['a', 'b', 'c', 'd', 'e']
```

In [16]:
la = ['c','a','b','c','d','b','e']

# write here

lb = list(set(la))
lb.sort()
#lb = list(sorted(set(la)))  # alternative, NOTE sorted generates a NEW sequence

print("la =",la)
print("lb =",lb)

la = ['c', 'a', 'b', 'c', 'd', 'b', 'e']
lb = ['a', 'b', 'c', 'd', 'e']



### Frozenset

<div class="alert alert-info" >

**INFO: this topic is optional for the purposes of the book**
</div>

Python also provides _immutable_ sets, which are called `frozenset`. Here we just point out that, since frozensets are _immutable_, they do have an associated _hash_ label and thus can be inserted as elements of other sets. For more information, see the [official documentation](https://docs.python.org/3/library/stdtypes.html#frozenset).
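
A minimal sketch of a set containing frozensets follows (the names `fa`, `fb` and `has_add` are only illustrative):

```python
fa = frozenset(['a', 'b'])
fb = frozenset(['c'])

s = {fa, fb}            # allowed: frozensets have a stable hash

print(len(s))                        # 2
print(frozenset(['b', 'a']) in s)    # True

# being immutable, a frozenset offers no method to add elements
has_add = hasattr(fa, 'add')
print(has_add)                       # False
```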

## Operators

|Operator| Result | Description |
|---------|-----------|-------------|
|`len`(set)|`int` | the number of elements in the set|
|el `in` set|`bool`|verifies whether an element is contained in the set|
|set <code>&#124;</code>  set| `set` | union, creates a NEW set|
|set `&` set| `set` | intersection, creates a NEW set|
|set `-` set| `set` | difference, creates a NEW set|
|set `^` set| `set` | symmetric difference, creates a NEW set|
|`==`,`!=`|`bool`| checks whether two sets are equal or different|


### len

In [17]:
len( {'a','b','c'}  )

3

In [18]:
len( set() )

0

### Exercise - distincts


Given a string `word`, write some code that:

* prints the distinct characters present in `word` in alphabetical order (without the square brackets!), together with their number
* prints the number of duplicate characters found in total

Example 1 - given:

```python
word = "ababbbbcdd"
```
after your code it must print:

```
word     : ababbbbcdd
4 distincts : a,b,c,d
6 duplicates
```

Example 2 - given:

```python
word = "cccccaaabbbb"
```

after your code it must print:

```
word     : cccccaaabbbb
3 distincts : a,b,c
9 duplicates
```


In [19]:
# write here
word = "ababbbbcdd"
#word = "cccccaaabbbb"
s = set(word)
print("word     :", word)
la = list(s) 
la.sort()
print(len(s), 'distincts :', ",".join(la))
#print(len(s), 'distincts :', ",".join(sorted(s)))  # ALTERNATIVE WITH SORTED
print(len(word) - len(s), 'duplicates')

word     : ababbbbcdd
4 distincts : a,b,c,d
6 duplicates



### Union

The union operator  `|` (called _pipe_)  produces a NEW set containing all the elements from both the first and second set.

![eiu3](img/union.png)

In [20]:
{'a','b','c'} | {'b','c','d','e'}

{'a', 'b', 'c', 'd', 'e'}

Note there aren't duplicated elements

**EXERCISE**: What if we use the `+`? Try writing in a cell `{'a','b'} + {'c','d','e'}`. What happens?

In [21]:
# write here


**QUESTION**: Look at the following expressions, and for each try guessing the result (or if they give an error):

1.  ```python    
    {'a','d','b'}|{'a','b','c'}
    ```           
1.  ```python    
    {'a'}|{'a'}
    ```       
1.  ```python    
    {'a'|'b'}
    ```
1.  ```python    
    {1|2|3}
    ```   
1.  ```python    
    {'a'|'b'|'a'}
    ```       
1.  ```python    
    {{'a'}|{'b'}|{'a'}}
    ```       
1.  ```python    
    [1,2,3] | [3,4]
    ```       
1.  ```python    
    (1,2,3) | (3,4)
    ```       
1.  ```python        
    "abc" | "cd"
    ```       
1.  ```python        
    {'a'} | set(['a','b'])
    ```       
1.  ```python        
    set(".".join('pacca'))
    ```       
1.  ```python        
    '{a}'|'{b}'|'{a}'
    ```       
1.  ```python        
    set((1,2,3))|set([len([4,5])])
    ```
1.  ```python        
    {()}|{()}
    ```       
1.  ```python        
    {'|'}|{'|'}
    ```       


**QUESTION**: Given two sets `x` and `y`, the expression

```python
len(x | y) <= len(x) + len(y)
```

produces:

1. an error (which one?)
2. always `True`
3. always `False`
4. sometimes `True` sometimes  `False` according to values of `x` and `y`

**ANSWER**: 2: the number of elements in the union will always be less than or equal to the sum of the number of elements of the sets we are merging, so from the `<=` comparison we will always get `True`.
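
The two possible situations can be sketched with some example sets (the values are made up for illustration): when the sets share elements the union is strictly smaller than the sum, and when they share nothing the two numbers coincide.

```python
x = {'a', 'b', 'c'}
y = {'b', 'c', 'd', 'e'}    # shares 'b' and 'c' with x
z = {'p', 'q'}              # shares nothing with x

print(len(x | y), len(x) + len(y))   # 5 7
print(len(x | z), len(x) + len(z))   # 5 5

print(len(x | y) <= len(x) + len(y))   # True
print(len(x | z) <= len(x) + len(z))   # True
```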

### Exercise - everythingbut 1

Write some code which creates a set `s4` which contains all the elements of `s1` and `s2` but does not contain the elements of `s3`.

* Your code should work with _any_ set `s1`, `s2`, `s3`


Example - given:

```python
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])
```

After your code you should obtain:

```python
>>> print(s4)
{'d', 'a', 'c', 'g', 'e'}
```

In [22]:
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])

# write here
s4 = (s1 | s2) - s3
#print(s4)


### Intersection

The intersection operator `&` produces a NEW set which contains all the common elements of the first and second set.

![okoerioe](img/intersection.png)

In [23]:
{'a','b','c'} & {'b','c','d','e'}

{'b', 'c'}

**QUESTION**: Look at the following expressions, and for each try guessing the result (or if it gives an error):


1.  ```python
    {0}&{0,1}
    ```
1.  ```python    
    {0,1}&{0}
    ```
1.  ```python
    set("capra") & set("campa")
    ```
1.  ```python
    set("cba") & set("dcb")
    ```
1.  ```python    
    {len([1,2,3]),4} & {len([5,6,7])}
    ```
1.  ```python    
    {1,2}&{1,2}
    ```
1.  ```python    
    {0,1}&{}
    ```
1.  ```python    
    {0,1}&set()
    ```
1.  ```python    
    set([1,2,3,4,5][::2]) & set([1,2,3,4,5][2::2])
    ```
1.  ```python        
    {((),)}&{()}
    ```
1.  ```python        
    {(())}&{()}
    ```    

### Difference

The difference operator `-` produces a NEW set containing all the elements of the first set except the ones from the second:

![3423dde](img/difference.png)

In [24]:
{'a','b','c','d'} - {'b','c','e','f','g'}

{'a', 'd'}

**QUESTION**: Look at the following expressions, and for each try guessing the result  (or if it gives an error):


    
1.  ```python
    {3,4,2}-2
    ```
1.  ```python    
    {1,2,3}-{3,4}
    ```
1.  ```python        
    '{"a"}-{"a"}'
    ```
1.  ```python    
    {1,2,3}--{3,4}
    ```
1.  ```python    
    {1,2,3}-(-{3,4})
    ```    
1.  ```python        
    set("chiodo") - set("chiave")
    ```
1.  ```python        
    set("prova") - set("prova".capitalize())
    ```
1.  ```python        
    set("BarbA") - set("BARBA".lower())
    ```
1.  ```python        
    set([(1,2),(3,4),(5,6)]) - set([(2,3),(4,5)])
    ```
1.  ```python        
    set([(1,2),(3,4),(5,6)]) - set([(3,4),(5,6)])
    ```
1.  ```python            
    {1,2,3} - set()
    ```
1.  ```python            
    set() - {1,2,3}
    ```

**QUESTION**: Given two sets `x` and `y`, what does the following code produce? An error? Is it simplifiable?
    
```python
(x & y) | (x-y)
```

**ANSWER**: We are merging the common elements between `x` and `y`, with the elements present in `x` but not in `y`. Thus, we are taking all the elements of `x`, so the expression can be greatly simplified by just writing:

```python
x
```
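
The simplification can be checked on some sample values. This is only a spot check with made-up sets, not a proof:

```python
x = {1, 2, 3, 4}
y = {3, 4, 5}

common = x & y      # elements of x which are also in y
only_x = x - y      # elements of x which are not in y

print(sorted(common))            # [3, 4]
print(sorted(only_x))            # [1, 2]
print((common | only_x) == x)    # True

# it also holds when y is empty
empty = set()
print(((x & empty) | (x - empty)) == x)   # True
```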

### Symmetric difference

The symmetric difference of two sets is their union except their intersection, that is all elements except the common ones:

![kjdfslkj](img/symmetric-difference.png)

In Python you can directly express it with the `^` operator:

In [25]:
{'a','b','c'} ^ {'b','c','d','e'}

{'a', 'd', 'e'}

Let's check the result corresponds to the definition:

In [26]:
s1 = {'a','b','c'}
s2 = {'b','c','d','e'}

(s1 | s2) - (s1 & s2)

{'a', 'd', 'e'}
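
An equivalent way to look at the symmetric difference is as the union of the two one-sided differences. The sketch below checks this on the same values, and also shows that swapping the operands does not change the result:

```python
s1 = {'a', 'b', 'c'}
s2 = {'b', 'c', 'd', 'e'}

only_s1 = s1 - s2      # elements found only in s1
only_s2 = s2 - s1      # elements found only in s2

print(sorted(only_s1))                     # ['a']
print(sorted(only_s2))                     # ['d', 'e']
print((only_s1 | only_s2) == (s1 ^ s2))    # True
print((s1 ^ s2) == (s2 ^ s1))              # True
```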

**QUESTION**: Look at the following expressions, and for each try guessing the result (or if it gives an error):


1.  ```python    
    {'p','e','p','p','o'} ^ {'p','a','p','p','e'}
    ```
1.  ```python        
    {'ab','cd'} ^ {'ba','dc'}
    ```
1.  ```python    
    set('brodino') ^ set('bordo')
    ```
1.  ```python        
    set((1,2,5,3,2,3,1)) ^ set((1,4,3,2))
    ```

**QUESTION**: given 3 sets `A`, `B`, `C`, what's the expression to obtain the azure part?
    
![sewqe](img/ex-abc-common.png)

**ANSWER**: 

```python
(A & B) | (A & C) | (B & C)
```

**QUESTION**: If we use the following values in the previous exercise, what would the set which denotes the azure part contain?

```python
A = {'a','ab','ac','abc'}
B = {'b','ab','bc','abc'}
C = {'c','ac','bc','abc'}
```

Once you guessed the result, try executing the formula you obtained in the previous exercise with the provided values and compare the results with the solution.

**ANSWER**: If the formula is correct you should obtain:

```python
{'abc', 'ac', 'bc', 'ab'}
```

### Membership

As for any sequence, when we want to check whether an element is contained in a set we can use the `in` operator which returns a boolean value: 

In [27]:
'a' in {'m','e','n','t','a'}

True

In [28]:
'z' in {'m','e','n','t','a'}

False

<div class="alert alert-warning">

`in` **IS VERY FAST WHEN USED WITH SETS**

The speed of `in` operator DOES NOT depend on the set dimension
</div>

This is a substantial difference with respect to the other sequences we've already seen: if you search for an element with `in` in strings, lists or tuples, and the element is toward the end (or isn't there at all), Python will have to look through the whole sequence.
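
To get a feeling for the difference, one could time the same search on a list and on a set with the standard `timeit` module. This is only a rough sketch: the collection size and the number of repetitions are arbitrary choices, and the measured times depend on the machine, so no expected values are shown.

```python
import timeit

n = 100000
big_list = list(range(n))
big_set = set(big_list)
missing = -1        # not present: the worst case for the list

in_list = missing in big_list
in_set = missing in big_set
print(in_list, in_set)    # False False

# repeat each search 100 times and measure the total time in seconds
t_list = timeit.timeit(lambda: missing in big_list, number=100)
t_set = timeit.timeit(lambda: missing in big_set, number=100)

print("list:", t_list, "seconds")
print("set :", t_set, "seconds")
```

On a typical machine the search in the set should turn out to be much faster, and the gap should widen as `n` grows.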

#### not in

To check whether something does **not** belong to a sequence, we can use two forms:

**not in - form 1**:

In [29]:
"carrot" not in {"watermelon","banana","apple"}

True

In [30]:
"watermelon" not in {"watermelon","banana","apple"}

False

**not in - form 2**:

In [31]:
not "carrot" in {"watermelon","banana","apple"}

True

In [32]:
not "watermelon" in {"watermelon","banana","apple"}

False

**QUESTION**: Look at the following expressions, and for each try guessing the result (or if it gives an error):


1.  ```python
    2*10 in {10,20,30,40}
    ```
1.  ```python    
    'four' in {'f','o','u','r'}
    ```
1.  ```python    
    'aa' in set('aa')
    ```
1.  ```python    
    'a' in set(['a','a'])
    ```
1.  ```python    
    'c' in (set('parco') - set('cassa'))
    ```
1.  ```python    
    'cc' in (set('pacca') & set('zucca'))
    ```
1.  ```python    
    [3 in {3,4}, 6 in {3,4} ]
    ```
1.  ```python    
    4 in set([1,2,3]*4)
    ```
1.  ```python    
    2 in {len('3.4'.split('.'))}
    ```
1.  ```python    
    4 not in {1,2,3}
    ```
1.  ```python    
    '3' not in {1,2,3}
    ```
1.  ```python    
    not 'a' in {'b','c'}
    ```
1.  ```python    
    not {} in set([])
    ```
1.  ```python    
    {not 'a' in {'a'}}
    ```
1.  ```python    
    4 not in set((4,))
    ```
1.  ```python    
    () not in set([()])
    ```

    


**QUESTION**: the following expressions are similar. What do they have in common? What is the difference with the last one (beyond the fact it is a set)?


1.  ```python    
    'e' in 'abcde'
    ```    
2.  ```python    
    'abcde'.find('e') >= 0
    ```
3.  ```python
    'abcde'.count('e') > 0
    ```
4.  ```python
    'e' in ['a','b','c','d','e']
    ```
5.  ```python
    ['a','b','c','d','e'].count('e') > 0
    ```
6.  ```python
    'e' in ('a','b','c','d','e')
    ```
7.  ```python
    ('a','b','c','d','e').count('e') > 0
    ```
8. ```python    
    'e' in {'a','b','c','d','e'}
    ```

**ANSWER**: All the expressions reported above return a boolean which is `True` if the element `'e'` is present in the sequence.

All the search and/or counting operations (`in`, `find`, `index`, `count`) on strings, lists and tuples take a search time which in the worst case, like here, can be proportional to the sequence length (`'e'` is at the end).

On the other hand, since sets (expression 8.) are based on _hashes,_ they allow an immediate search, regardless of the set size or the position of the elements (so creating the set with `e` at the end makes no difference).


<div class="alert alert-info">

**To make performant searches it's preferable to use hash based collections, like sets or dictionaries !**    
</div>

## Equality

We can check whether two sets are equal by using the equality operator `==`, which given two sets returns `True` if they contain the same elements or `False` otherwise:

In [33]:
{4,3,6} == {4,3,6}

True

In [34]:
{4,3,6} == {4,3}

False

In [35]:
{4,3,6} == {4,3,6, 'hello'}

False

Careful about removal of duplicates !

In [36]:
{2,8} == {2,2,8}

True

To verify the inequality, we can use the `!=` operator:

In [37]:
{2,5} != {2,5}

False

In [38]:
{4,6,0} != {2,8}

True

In [39]:
{4,6,0} != {4,6,0,2}

True

Beware of duplicates and order!

In [40]:
{0,1} != {1,0,0,0,0,0,0,0}

False
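
For comparison, here is a small sketch that puts the same values in lists and in sets: with lists both order and repetitions matter, with sets neither does. The variable names are only illustrative.

```python
lists_reordered = [0, 1] == [1, 0]
sets_reordered = {0, 1} == {1, 0}

lists_repeated = [2, 8] == [2, 2, 8]
sets_repeated = set([2, 8]) == set([2, 2, 8])

print(lists_reordered)   # False
print(sets_reordered)    # True
print(lists_repeated)    # False
print(sets_repeated)     # True
```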

**QUESTION**: Look at the following expressions, and for each try guessing the result (or if it gives an error):

1.  ```python
    {2 == 2, 3 == 3}
    ```
1.  ```python    
    {1,2,3,2,1} == {1,1,2,2,3,3}
    ```
1.  ```python
    {'aa'} == {'a'}
    ```
1.  ```python    
    set('aa') == {'a'}
    ```
1.  ```python    
    [{1,2,3}] == {[1,2,3]}
    ```
1.  ```python    
    set({1,2,3}) == {1,2,3}
    ```
1.  ```python    
    set((1,2,3)) == {(1,2,3)}
    ```
1.  ```python    
    {'aa'} != {'a', 'aa'}
    ```
1.  ```python    
    {set() != set()}
    ```
1.  ```python    
    set('scarpa') == set('capras')
    ```
1.  ```python    
    set('papa') != set('pappa')
    ```
1.  ```python    
    set('pappa') != set('reale')
    ```
1.  ```python    
    {(),()} == {(())}
    ```
1.  ```python    
    {(),()} != {(()), (())}
    ```
1.  ```python    
    [set()] == [set(),set()]
    ```
1.  ```python    
    (set('gosh') | set('posh')) == (set('shopping') - set('in'))
    ```

## Methods like operators

There are methods which behave like the operators `|`, `&`, `-`, `^` by creating a **NEW** set.

**NOTE**: unlike the operators, these methods accept _any_ sequence as parameter, not just sets:

|Method| Result | Description|Related operator|
|---------|-----------|-------------|------|
|`set.union(seq)`|`set`|union, creates a NEW set |<code>&#124;</code>|
|`set.intersection(seq)`| `set`| intersection, creates a NEW set|`&`|
|`set.difference(seq)`| `set` | difference, creates a NEW set|`-`|
|`set.symmetric_difference(seq)`| `set` | symmetric difference, creates a NEW set|`^`|

Methods which **MODIFY** the first set on which they are called (and return `None`!):

|Method| Result | Description |
|---------|-----------|-------------|
|`setA.update(setB)`|`None`| union, MODIFIES `setA`|
|`setA.intersection_update(setB)` |`None` |intersection, MODIFIES `setA`|
|`setA.difference_update(setB)`| `None` | difference, MODIFIES `setA`|
|`setA.symmetric_difference_update(setB)`| `None` | symmetric difference, MODIFIES `setA`|

### union

We'll only have a look at `union`/`update`; all the other methods behave similarly.

With `union`, given a set and a generic sequence (so not necessarily a set) we can create a NEW set:

In [41]:
sa = {'g','a','r','a'}

In [42]:
la = ['a','g','r','a','r','i','o']

In [43]:
sb = sa.union(la)

In [44]:
sb

{'a', 'g', 'i', 'o', 'r'}

**EXERCISE**: with `union` we can use any sequence, but that's not the case with operators. Try writing `{1,2,3} | [2,3,4]` and see what happens.

In [45]:
# write here


We can verify `union` creates a new set with Python Tutor:

In [46]:
sa = {'g','a','r','a'}
la = ['a','g','r','a','r','i','o']
sb = sa.union(la)

jupman.pytut()

### update

If we want to MODIFY the first set instead, we can use the methods ending with `update`:

In [47]:
sa = {'g','a','r','a'}

In [48]:
la = ['a','g','r','a','r','i','o']

In [49]:
sa.update(la)

In [50]:
print(sa)

{'a', 'g', 'i', 'r', 'o'}


**QUESTION**: what did the call to `update` return?

**ANSWER**: since Jupyter didn't show anything, it means the call to `update` method implicitly returned the `None` object.

Let's look at what happened with Python Tutor - we also added an `x =` to highlight what was returned by calling `.update`:

In [51]:
sa = {'g','a','r','a'}
la = ['a','g','r','a','r','i','o']
x = sa.update(la)
print(sa)
print(x)

jupman.pytut()

{'a', 'g', 'i', 'r', 'o'}
None


**QUESTION**: Look at the following expressions, and for each try guessing the result (or if it gives an error):


1.  ```python
    set('case').intersection('sebo') == 'se'
    ```
1.  ```python
    set('naso').difference('caso')
    ```
1.  ```python
    s = {1,2,3}
    s.intersection_update([2,3,4])
    print(s)
    ```
1.  ```python
    s = {1,2,3}
    s = s & [2,3,4]
    ```
1.  ```python
    s = set('cartone')
    s = s.intersection('parto')
    print(s)
    ```
1.  ```python
    sa = set("mastice")
    sb = sa.difference("mastro").difference("collo")
    print(sa)
    print(sb)
    ```
1.  ```python
    sa = set("mastice")
    sb = sa.difference_update("mastro").difference_update("collo")
    print(sa)
    print(sb)
    ```    

### Exercise - everythingbut 2

Given sets `s1`, `s2` and `s3`, write some code which MODIFIES `s1` so that it also contains the elements of `s2` but not the elements of `s3`:

* Your code should work with _any_ set `s1`, `s2`, `s3`
* **DO NOT** create new sets

Example - given:

```python
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])
```
After your code you should obtain:


```python
>>> print(s1)
{'a', 'g', 'e', 'd', 'c'}
```

In [52]:
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])

# write here
s1.update(s2)
s1.difference_update(s3)
print(s1)

{'d', 'a', 'g', 'c', 'e'}



## Other methods

|Method| Result | Description |
|---------|-----------|-------------|
|[set.add(el)](#add-method)| `None`| adds the specified element - if already present does nothing|
|[set.remove(el)](#remove-method)|`None`|removes the specified element - if not present raises an error|
|[set.discard(el)](#discard-method)|`None`|removes the specified element - if not present does nothing|
|`set.pop()`|obj|removes an arbitrary element from the set and returns it|
|`set.clear()`|`None`|removes all the elements|
|[setA.issubset(setB)](#issubset-method)|`bool`|checks whether `setA` is a subset of `setB`|
|[setA.issuperset(setB)](#issuperset-method)|`bool`|checks whether `setA` contains all the elements of `setB`|
|`setA.isdisjoint(setB)`|`bool`|checks whether `setA` has no element in common with `setB`|

### `add` method

Given a set, we can add an element with the method `.add`:

In [53]:
s = {3,7,4}

In [54]:
s.add(5)

In [55]:
s

{3, 4, 5, 7}

If we add the same element twice, nothing happens:

In [56]:
s.add(5)

In [57]:
s

{3, 4, 5, 7}

**QUESTION**: If we write this code, which result do we get?

```python
s = {'a','b'}
s.add({'c','d','e'})
print(s)
```

1. prints `{'a','b','c','d','e'}`
2. prints `{{'a','b','c','d','e'}}`
3. prints `{'a','b',{'c','d','e'}}`
4. an error (which one?)

**ANSWER**: 4 - produces `TypeError: unhashable type: 'set'` : we are trying to insert a set as element of another set, but sets are _mutable_  so their _hash_ label (which allows Python to find them quickly) might vary over time.

**QUESTION**: Look at the following code, which result does it produce?

```python
x = {'a','b'}
y = set(x)
x.add('c')
print('x=',x)
print('y=',y)
```

1. an error (which one?)
2. `x` and `y` will be the same (how?)
3. `x` and `y` will be different (how?)


**ANSWER**: 3. It will print:

```python
x= {'c', 'a', 'b'}
y= {'a', 'b'}
```

because `y=set(x)` creates a NEW set by copying all the elements in the input sequence `x`.

Let's verify with Python Tutor:

In [58]:
x = {'a','b'}
y = set(x)
x.add('c')

jupman.pytut()
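
The copy made by `set(x)` can be contrasted with a plain assignment, which does NOT create a new set but only a second name for the same one. A minimal sketch (the names `alias` and `clone` are illustrative):

```python
x = {'a', 'b'}
alias = x           # NO copy: alias and x refer to the same set
clone = set(x)      # NEW set holding the same elements

x.add('c')

print(sorted(x))       # ['a', 'b', 'c']
print(sorted(alias))   # ['a', 'b', 'c']
print(sorted(clone))   # ['a', 'b']
```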

### `remove` method

The `remove` method  takes the specified element out of the set. If it doesn't exist, it produces an error:

In [59]:
s = {'a','b','c'}

In [60]:
s.remove('b')

In [61]:
s

{'a', 'c'}

In [62]:
s.remove('c')

In [63]:
s

{'a'}

```python
s.remove('z')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-266-a9e7a977e50c> in <module>
----> 1 s.remove('z')

KeyError: 'z'

```
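
When it is not known in advance whether the element is present, the error can be avoided in a couple of ways. The sketch below shows two possibilities, checking with `in` first or catching the `KeyError`; the variable `removed` is only illustrative:

```python
s = {'a', 'c'}

# option 1: check membership before removing
if 'z' in s:
    s.remove('z')

# option 2: attempt the removal and catch the error
try:
    s.remove('z')
    removed = True
except KeyError:
    removed = False

print(removed)      # False
print(sorted(s))    # ['a', 'c']
```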

### Exercise - bababiba

Given a string `word` of exactly 4 syllables of two characters each, create a set `s` which contains tuples with 2 characters each. Each tuple must represent a syllable taken from `word`.


* to add elements to the set, only use `add`
* your code must work for any `word` of 4 two-character syllables

Example 1 - given:

```python
word = "bababiba"
```

after your code, you should obtain:

```python
>>> print(s)
{('b', 'a'), ('b', 'i')}
```

Example 2 - given

```python
word = "rubareru"
```

after your code, you should obtain:

```python
>>> print(s)
{('r', 'u'), ('b', 'a'), ('r', 'e')}
```


In [64]:
word = "bababiba"
#word = "rubareru"

# write here

s = set()
s.add(tuple(word[:2]))
s.add(tuple(word[2:4]))
s.add(tuple(word[4:6]))
s.add(tuple(word[6:8]))
print(s)

{('b', 'a'), ('b', 'i')}


In [64]:
word = "bababiba"
#word = "rubareru"

# write here



{('b', 'a'), ('b', 'i')}


### `discard` method

The `discard` method removes the specified element from the set. If the element isn't in the set, it does nothing (we may also say it _silently_ discards the element):

In [65]:
s = {'a','b','c'}

In [66]:
s.discard('a')

In [67]:
s

{'b', 'c'}

In [68]:
s.discard('c')

In [69]:
s

{'b'}

In [70]:
s.discard('z')

In [71]:
s

{'b'}
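
Note that `discard` MODIFIES the set it is called on and gives back `None`, so its result should not be assigned back to the set variable. A small sketch (the variable name `result` is just for illustration):

```python
s = {'a','b','c'}

result = s.discard('a')   # discard changes s in place and returns None

print(result)             # None
print(s == {'b','c'})     # True
```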

### Exercise - trash

✪✪ A waste processing plant receives a load of `trash`, which we represent as a set of strings:

```python
trash = {'alkenes','vegetables','mercury','paper'}
```

To remove the contaminant elements which _might_ be present (NOTE: they're not always present), the plant has exactly 3 `filters` (a list of strings) which it will apply in sequence to the trash:

```python
filters = ['cadmium','mercury','alkenes']
```

In order to check whether filters have effectively removed the contaminant(s), for each applied filter we want to see the state of the processed `trash`.

At the end, we also want to print all and _only_ the contaminants which were actually removed (put them together in the variable `separated`)


* **DO NOT** use `if` commands
* **DO NOT** use loops (the number of filters is fixed to 3, so you can just copy and paste code)
* Your code must work for _any_ list `filters` of 3 elements and _any_ set `trash`

Example - given:

```python
filters = ['cadmium','mercury','alkenes']
trash = {'alkenes','vegetables','mercury','paper'}
```

After your code, it must show:

```
Initial trash: {'mercury', 'alkenes', 'vegetables', 'paper'}
Applying filter for cadmium : {'mercury', 'alkenes', 'vegetables', 'paper'}
Applying filter for mercury : {'alkenes', 'vegetables', 'paper'}
Applying filter for alkenes : {'vegetables', 'paper'}

Separated contaminants: {'mercury', 'alkenes'}
```


In [72]:

filters = ['cadmium','mercury','alkenes']
trash = {'alkenes','vegetables','mercury','paper'}

separated = trash.intersection(filters) # creates a NEW set

# write here
s = "Applying filter for"
print("Initial trash:", trash)
trash.discard(filters[0])
print(s,filters[0],":", trash)
trash.discard(filters[1])
print(s,filters[1],":", trash)
trash.discard(filters[2])
print(s,filters[2],":", trash)
print("")


print("Separated contaminants:", separated)


Initial trash: {'alkenes', 'mercury', 'vegetables', 'paper'}
Applying filter for cadmium : {'alkenes', 'mercury', 'vegetables', 'paper'}
Applying filter for mercury : {'alkenes', 'vegetables', 'paper'}
Applying filter for alkenes : {'vegetables', 'paper'}

Separated contaminants: {'alkenes', 'mercury'}


In [72]:

filters = ['cadmium','mercury','alkenes']
trash = {'alkenes','vegetables','mercury','paper'}

separated = trash.intersection(filters) # creates a NEW set

# write here



### `issubset` method

To check whether all elements in a set `sa` are contained in another set `sb` we can write `sa.issubset(sb)`. Examples:

In [73]:
{2,4}.issubset({1,2,3,4})

True

In [74]:
{3,5}.issubset({1,2,3,4})

False

<div class="alert alert-warning">

**WARNING: the empty set is always considered a subset of any other set**    
</div>

In [75]:
set().issubset({3,4,2,5})

True
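
The same check can also be written with the comparison operator `<=`, while `<` additionally requires the two sets to be different. The sketch below compares the two forms; the names `sa` and `sb` follow the text above:

```python
sa = {2,4}
sb = {1,2,3,4}

print(sa.issubset(sb))          # True
print(sa <= sb)                 # True: same check written as an operator
print(sb <= sb)                 # True: every set is a subset of itself
print(sb < sb)                  # False: < requires the sets to be different
print(sa.issubset([1,2,3,4]))   # True: the method accepts any iterable
```

Unlike the method, the operators require both sides to be sets.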

### `issuperset` method

To verify whether a set `sa` contains all the elements of another set `sb` we can write `sa.issuperset(sb)`. Examples:

In [76]:
{1,2,3,4,5}.issuperset({1,3,5})

True

In [77]:
{1,2,3,4,5}.issuperset({2,4})

True

In [78]:
{1,2,3,4,5}.issuperset({1,3,5,7,9})

False

<div class="alert alert-warning">

**WARNING: any set is always considered a superset of the empty set**    
</div>

In [79]:
{1,2,3,4,5}.issuperset(set())

True
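
`issuperset` asks the same question as `issubset`, only from the other side. A short sketch of the correspondence, which also shows the operator form `>=`:

```python
sa = {1,2,3,4,5}
sb = {1,3,5}

print(sa.issuperset(sb))   # True
print(sb.issubset(sa))     # True: same question asked from the other side
print(sa >= sb)            # True: operator form
print(sb >= sa)            # False: sb lacks 2 and 4
```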

### `isdisjoint` method

A set is disjoint from another one if the two have no elements in common. We can check for disjointness by using the method `isdisjoint`:

In [80]:
{1,3,5}.isdisjoint({2,4})

True

In [81]:
{1,3,5}.isdisjoint({2,3,4})

False

**QUESTION**: Given a set `x`, what does the following expression produce?

```python
x.isdisjoint(x)
```

1. an error (which one?)
2. always `True`
3. always `False`
4. `True` or `False` according to the value of `x`


**ANSWER**: 4, `True` or `False` according to the value of `x`.

Probably you thought the expression always returns `False`: after all, how could a set ever be disjoint from itself? In fact the expression almost always returns `False` _except_ for the particular case of the empty set:

```python
x = set()
x.isdisjoint(x)
```
in which it returns `True`.

<div class="alert alert-warning">

**MORAL OF THE STORY: ALWAYS CHECK FOR THE EMPTY SET !**

For this and many other methods the empty set often causes behaviours which aren't always intuitive, so we invite you to always check case by case.

</div>
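
One way to make sense of this result is to think of disjointness as "the intersection is empty". The sketch below checks that this reading agrees with `isdisjoint` in both cases (the name `e` for the empty set is just for illustration):

```python
x = {1,3,5}
e = set()

# x shares all its elements with itself, so the intersection is not empty
print(len(x.intersection(x)) == 0)   # False
print(x.isdisjoint(x))               # False

# the empty set has no elements to share, not even with itself
print(len(e.intersection(e)) == 0)   # True
print(e.isdisjoint(e))               # True
```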

### Exercise - matrioska

✪✪ Given a list `sets` of exactly 4 sets, we call it a _matrioska_ if each set contains all the elements of the previous set (plus possibly others). Write some code which PRINTS `True` if the sequence is a matrioska, otherwise PRINTS `False`.

* **DO NOT** use `if`
* your code must work for _any_ sequence of exactly 4 sets
* **HINT**: you can create a list of 3 booleans which verify whether a set is contained in the next one ...

Example 1 - given:

```
sets = [{'a','b'}, 
        {'a','b','c'},
        {'a','b','c','d','e'},
        {'a','b','c','d','e','f','g','h','i'}]
```
after your code, it must print:

```
Is the sequence a matrioska? True
```

Example 2 - given:

```
sets = [{'a','b'}, 
        {'a','b','c'},
        {'a','e','d'},
        {'a','b','d','e'}]
```
after your code, it must print:

```
Is the sequence a matrioska? False
```


In [82]:

sets = [{'a','b'}, 
        {'a','b','c'},
        {'a','b','c','d','e'},
        {'a','b','c','d','e','f','g','h','i'}]


#sets = [{'a','b'}, 
#        {'a','b','c'},
#        {'a','e','d'},
#        {'a','b','d','e'}]


# write here

checks = [ sets[0].issubset(sets[1]),
           sets[1].issubset(sets[2]), 
           sets[2].issubset(sets[3]) ]

print("Is the sequence a matrioska?", checks.count(True) == 3)

Is the sequence a matrioska? True


In [82]:

sets = [{'a','b'}, 
        {'a','b','c'},
        {'a','b','c','d','e'},
        {'a','b','c','d','e','f','g','h','i'}]


#sets = [{'a','b'}, 
#        {'a','b','c'},
#        {'a','e','d'},
#        {'a','b','d','e'}]


# write here




<!--
## Continue

Go on with [first challenges](https://en.softpython.org/sets/sets2-chal.html)
-->