In [1]:
#Please execute this cell
import sys;
sys.path.append('../../'); 
import jupman;


#  Sets solutions


## [Download exercises zip](../../_static/sets-exercises.zip) 

[Browse files online](https://github.com/DavidLeoni/datasciprolab/tree/master/exercises/sets)


### What to do

- unzip exercises in a folder, you should get something like this: 

```

-jupman.py
-sciprog.py
-exercises
     |-sets
         |- sets-exercise.ipynb
         |- sets-solution.ipynb

```


- open the editor of your choice (for example Visual Studio Code, Spyder or PyCharm), you will edit the files ending in `_exercise.py`
- Go on reading this notebook, and follow the instructions inside.

## Introduction

A set is an _unordered_ collection of _distinct_ elements, so no duplicates are allowed.

### Creating a set
In Python you can create an empty set with a call to `set()`:

In [2]:
s = set()

In [3]:
s

set()

To add elements, use the `.add()` method:

In [4]:
s.add('hello')
s.add('world')

Notice that Python represents a set with curly brackets but, unlike a dictionary, you won't see colons `:` or key/value pairs:

In [5]:
s

{'hello', 'world'}

#### Set from a sequence

You can create a set from any sequence, such as a list. Doing so eliminates any duplicates:

In [6]:
set(['a','b','c','b','a','d'])

{'a', 'b', 'c', 'd'}
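
The same call also works with other iterables, not only lists. The following is a minimal sketch; the variable names `from_string`, `from_tuple` and `from_range` are made up here just for illustration:

```python
# set() accepts any iterable of hashable elements, not only lists
from_string = set('hello')        # iterating a string yields its characters
from_tuple = set((1, 2, 2, 3))    # the duplicate 2 is dropped
from_range = set(range(3))

# sorted() builds a list, so the printed order is predictable
print(sorted(from_string))   # ['e', 'h', 'l', 'o']
print(sorted(from_tuple))    # [1, 2, 3]
print(sorted(from_range))    # [0, 1, 2]
```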

### Empty sets

<div class="alert alert-warning">

**WARNING**: `{}` means empty dictionary, not empty set!
</div>

Since the printed representation of a set starts and ends with curly brackets, just like a dictionary, when you see `{}` you might wonder whether it is the empty set or the empty dictionary.

The empty set is represented with `set()`:

In [7]:
s = set()

In [8]:
s

set()

In [9]:
type(s)

set


The empty dictionary, instead, is represented as a pair of curly brackets:


In [10]:
d = {}

In [11]:
d

{}

In [12]:
type(d)

dict
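
To make the distinction concrete, here is a small sketch that checks the types directly. The names `empty_dict`, `empty_set` and `one_item` are only illustrative:

```python
empty_dict = {}         # curly brackets alone: a dictionary
empty_set = set()       # the only way to write an empty set
one_item = {'hello'}    # curly brackets with elements and no colons: a set

print(type(empty_dict) is dict)   # True
print(type(empty_set) is set)     # True
print(type(one_item) is set)      # True
print(empty_set == empty_dict)    # False: a set is never equal to a dict
```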

### Iterating a set

You can iterate over a set with the `for in` construct:

In [13]:
for el in s:
    print(el)

When a set contains elements, the printout shows that they are not necessarily iterated in the order in which they were inserted (the same was true of dictionary keys before Python 3.7). This also means sets do not support access by index:

```python
s[0]
```

```bash
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-61-f8bb2b116405> in <module>()
----> 1 s[0]

TypeError: 'set' object does not support indexing


```
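
If you need a predictable order or access by position, one option is to build a list from the set first. This is only a sketch, and the names `letters` and `ordered` are invented for the example:

```python
letters = set(['c', 'a', 'b'])

# sorted() returns a NEW list with the elements in ascending order
ordered = sorted(letters)
print(ordered)       # ['a', 'b', 'c']
print(ordered[0])    # a

# the original set is untouched and still cannot be indexed
try:
    letters[0]
except TypeError as err:
    # the exact message depends on the Python version
    print('TypeError:', err)
```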

### Adding twice

Since sets must contain distinct elements, if we add the same element twice the set remains unmodified, with no complaints from Python:

In [14]:
s.add('hello')

In [15]:
s

{'hello'}

In [16]:
s.add('world')

In [17]:
s

{'hello', 'world'}

A set can hold heterogeneous elements, so we can also add a number:

In [18]:
s.add(7)

In [19]:
s

{7, 'hello', 'world'}

To remove an element, use the `.remove()` method:

In [20]:
s.remove('world')

In [21]:
s

{7, 'hello'}
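
Note that `.remove()` raises a `KeyError` when the element is missing, while the related `.discard()` method does nothing in that case. A short sketch, using a separate illustrative set called `s_demo` so the running example is not touched:

```python
s_demo = set(['hello', 7])

try:
    s_demo.remove('world')    # 'world' is not in the set
except KeyError:
    print('remove() raised KeyError')

s_demo.discard('world')       # missing element: silently ignored
s_demo.discard(7)             # present element: removed
print(s_demo)                 # {'hello'}
```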

### Belonging to a set

To determine whether an item belongs to a set you can use the usual `in` operator, as you would with a sequence such as a list:

In [22]:
'b' in set(['a','b','c','d'])

True

In [23]:
'z' in set(['a','b','c','d'])

False

There is an important difference from sequences such as lists, though: looking up an item in a set takes, on average, roughly the same short time regardless of how big the set is, while searching a list requires Python, in the worst case, to scan the whole list.

There is a catch though: to get such performance, a set can only hold hashable data, which in practice means immutable values such as numbers and strings. If you try to add a mutable type such as a list, you will get an error:

```python
s = set()
s.add( ['a','b','c'] )
```

```bash
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-b345c7f28446> in <module>
----> 1 s.add( ['a','b','c'] )

TypeError: unhashable type: 'list'


```
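
When you do need to store a group of values as a single element, a common workaround is to use an immutable container instead of a list. The sketch below uses a tuple and a `frozenset`; the variable name `hashables` is just for illustration:

```python
hashables = set()
hashables.add(('a', 'b', 'c'))          # a tuple of strings is hashable
hashables.add(frozenset(['x', 'y']))    # frozenset is an immutable set

print(len(hashables))                   # 2
print(('a', 'b', 'c') in hashables)     # True

# a tuple is hashable only if everything inside it is hashable too
try:
    hashables.add(('a', ['b']))
except TypeError as err:
    print('TypeError:', err)
```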


### Operations

Sets support the usual set operations through methods such as `.union(s2)`, `.intersection(s2)` and `.difference(s2)`, among others.

**NOTE: set operations which don't have 'update' in the name create a NEW set each time!!!**

In [24]:
s1 = set(['a','b','c','d','e'])
print(s1)

{'a', 'e', 'c', 'b', 'd'}


In [25]:
s2 = set(['b','c','f'])

In [26]:
s3 = s1.intersection(s2)  # NOTE: it returns a NEW set !!!
print(s3)

{'c', 'b'}


In [27]:
print(s1)  # did not change

{'a', 'e', 'c', 'b', 'd'}
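
Python also offers operators for these methods when both operands are sets. As a sketch, with the same `s1` and `s2` values used above and `sorted()` added only to make the printed order predictable:

```python
s1 = set(['a', 'b', 'c', 'd', 'e'])
s2 = set(['b', 'c', 'f'])

print(sorted(s1 | s2))   # union:                ['a', 'b', 'c', 'd', 'e', 'f']
print(sorted(s1 & s2))   # intersection:         ['b', 'c']
print(sorted(s1 - s2))   # difference:           ['a', 'd', 'e']
print(sorted(s1 ^ s2))   # symmetric difference: ['a', 'd', 'e', 'f']

print(sorted(s1))        # s1 did not change:    ['a', 'b', 'c', 'd', 'e']
```

Like the methods without 'update' in the name, each operator returns a NEW set.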



#### Updating sets

If you do want to change the original, you have to use `intersection_update`:

In [28]:
s4 = set(['a','b','c','d','e'])
s5 = set(['b','c','f'])
res = s4.intersection_update(s5)  # NOTE: this MODIFIES s4 and thus returns None !!!!
print(res) 

None


In [29]:
print(s4)

{'c', 'b'}
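
The other operations have in-place variants as well. The following sketch shows `.update()` (in-place union), `.difference_update()` and the `&=` operator; the set name `sa` is invented for the example:

```python
sa = set(['a', 'b', 'c'])

res = sa.update(['c', 'd'])         # in-place union, accepts any iterable
print(res)                          # None
print(sorted(sa))                   # ['a', 'b', 'c', 'd']

sa.difference_update(['a'])         # in-place difference
print(sorted(sa))                   # ['b', 'c', 'd']

sa &= set(['c', 'd', 'z'])          # operator form of intersection_update
print(sorted(sa))                   # ['c', 'd']
```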


### Exercise: set operators

Write some code that creates a set `s4` which contains all the elements of `s1` and `s2` but none of the elements of `s3`. Your code should work with _any_ `s1`, `s2`, `s3`.

With 

```python
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])
```

After your code you should get

```python
{'d', 'a', 'c', 'g', 'e'}
```

In [30]:
s1 = set(['a','b','c','d','e'])
s2 = set(['b','c','f','g'])
s3 = set(['b','f'])

# write here
s4 = s1.union(s2).difference(s3)
print(s4)

{'g', 'a', 'e', 'c', 'd'}


### Exercise: dedup

Write some short code to create a `listb` which contains all elements from `lista`  without duplicates and sorted alphabetically.

- MUST NOT change the original `lista`
- no loops allowed!
- your code should work with any `lista`

```python
lista = ['c','a','b','c','d','b','e']
```

after your code, you should get

```bash
lista = ['c', 'a', 'b', 'c', 'd', 'b', 'e']
listb = ['a', 'b', 'c', 'd', 'e']
```

In [31]:
lista = ['c','a','b','c','d','b','e']

# write here
s = set(lista)
listb = sorted(s)  # NOTE: sorted already returns a NEW list
print("lista =",lista)
print("listb =",listb)

lista = ['c', 'a', 'b', 'c', 'd', 'b', 'e']
listb = ['a', 'b', 'c', 'd', 'e']
