# Lists and Sets

## *"['da', 'da', 'da']"* 
*(– Franz Lis[z]t)*

# Built-in data types: Lists

A list is one of the most versatile data structures in Python. It is a composite data structure, meaning that its basic parts are other data structures. At the same time it's simple, and it's easy to relate lists to how computers actually work.

In many respects, strings can be thought of as lists of characters. 

# Lists

* are enclosed within square brackets: `[]`
* contain **elements** separated by commas
* can contain *anything* as an element. Even other lists...
* are **mutable**: i.e., you can change an element without creating a new list
* have an **index** (=position within the list) for each element 
* are **ordered**: there is a first, second, ..., last element

In [1]:
things_I_like = ['bread',
                 42,
                 [
                     ['eggs for breakfast', 'eggs for lunch', 'eggs for dinner']
                 ]
                ]

print(things_I_like)

['bread', 42, [['eggs for breakfast', 'eggs for lunch', 'eggs for dinner']]]
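The properties listed above (mutable, indexed, nestable) can be sketched in a few lines. The list `things` below is a made-up stand-in for `things_I_like`, not part of the original notebook.

```python
# Hypothetical example list, mirroring the nested structure used above
things = ['bread', 42, [['eggs for breakfast', 'eggs for lunch']]]

# Lists are mutable: assigning to an index changes the list in place
things[1] = 43

# Nested lists are reached by chaining indices, one per level
first_meal = things[2][0][0]

print(things)
print(first_meal)  # eggs for breakfast
```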


### Building a list

There are several ways of creating a list.

1. define the entire list
2. start empty and append incrementally
3. repeat one element
4. define a range
5. list comprehension (later)

In [3]:
catman_list = ["C", "A", "T", "M", "A", "N"]
print(catman_list)

['C', 'A', 'T', 'M', 'A', 'N']


Incrementally constructing the list

In [4]:
catman_list_incr = [] # or: list()
catman_list_incr.append("C")
catman_list_incr.append("A")
print(catman_list_incr)
catman_list_incr.append("T")
catman_list_incr.append("M")
catman_list_incr.append("A")
catman_list_incr.append("N")
print(catman_list_incr)

['C', 'A']
['C', 'A', 'T', 'M', 'A', 'N']


Repetitive lists

In [6]:
lotsa_ones = [1] * 10
print(lotsa_ones)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


Ranges

The `range()` function returns a **range object**, a lazy sequence of numbers starting from `0` (or a defined start) up to **but not including** the end number. Wrap it in `list()` to get an actual list.

In [11]:
print(range(10))
print(list(range(1,11)))

range(0, 10)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
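As a rough illustration of why `print(range(10))` does not show the numbers themselves: a range only stores its start and end, yet it still answers most of the questions a list would. The variable names here are invented for the sketch.

```python
r = range(1, 11)

# A range knows its length and supports indexing and membership tests
# without building the full list of numbers
print(len(r))            # 10
print(r[0], r[-1])       # 1 10
print(10 in r, 11 in r)  # True False

# list() materializes the numbers when an actual list is needed
as_list = list(r)
print(as_list)
```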


### Activity

* create a list of the Beatles' members and assign it to the variable `beatles`
* create a list `let_it_be` that contains the lyrics "Let it be" 38 times
* create a list with the numbers from 50 to 100 (both included!)

In [18]:
# your code here
beatles = ['john', 'paul', 'george', 'the other']
let_it_be = ['Let it be'] * 38
numbers = list(range(50, 101))

## Common list operations




### Conceptual model of lists 

A list consists of a number of slots in which you put *references* to objects, such as numbers, strings, or even other lists.

| Index | 0 | 1 | 2 | 3 | 4 | 5 |
|:------|:--|:--|:--|:--|:--|:--|
| Object| C | A | T | M | A | N |
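A small sketch of what "slots holding references" means in practice, using made-up names `inner` and `outer`: when a slot refers to another list, changes to that list are visible through the slot, because both names point at the same object.

```python
inner = ['eggs']
outer = ['bread', inner]   # the second slot refers to the same object as `inner`

inner.append('more eggs')  # change the inner list through its own name

print(outer)               # ['bread', ['eggs', 'more eggs']]
print(outer[1] is inner)   # True: one object, reachable two ways
```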

In [51]:
numbers = [1.3, 3, 5, 8]
shows = ['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer']

## Element Retrieval

Single elements and sublists can be retrieved using indexing and the **slice operators**.

#### Single element retrieval

We can retrieve any element simply by its position. 

NOTE: the first element is at position `0`! No, that does not make sense. Yes, it will be confusing. But that's ok.

In [23]:
print(shows)
print(shows[0])

['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer']
Master Chef


DOUBLE NOTE: the **last** element of *any* list has the index **`-1`**. The second to last has index `-2`, and so on. That does make a little sense, and it can be *very* handy.

In [25]:
print(shows)
print(shows[-2])

['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer']
Rick & Morty


### Activity

- add your favorite show to `shows`
- retrieve *Rick & Morty* from `shows` and print it
- retrieve the third element (the number `5`) from `numbers` and print it

In [31]:
# your code here
print(shows)
print(numbers)

shows.append("Grey's Anatomy, \"The real one\"")
print(shows)
print(shows[-1])
print(numbers[2])

['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"']
[1.3, 3, 5, 8]
['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"']
Grey's Anatomy, "The real one"
5


### Remove specific element by value

In [36]:
shows.append("Norsemen")
shows = ["Norsemen"] + shows
print(shows)
shows.remove("Norsemen")
shows.remove('Grey\'s Anatomy, "The real one"')

print(shows)

['Norsemen', 'Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"', 'Norsemen', 'Norsemen', 'Norsemen', 'Norsemen']
['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer', 'Grey\'s Anatomy, "The real one"', 'Grey\'s Anatomy, "The real one"', 'Norsemen', 'Norsemen', 'Norsemen', 'Norsemen']


Naturally, the element has to be in the list; otherwise, `remove()` raises an error:
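A minimal sketch of that failure and two common ways around it, using a hypothetical list `shows_demo` so the notebook's own `shows` is left alone. Note also that `remove()` only deletes the first matching element.

```python
shows_demo = ['Master Chef', 'Breaking Bad', 'Archer']  # hypothetical list

# Option 1: check membership first, so remove() is only called when safe
if 'Vikings' in shows_demo:
    shows_demo.remove('Vikings')

# Option 2: attempt the removal and handle the ValueError
try:
    shows_demo.remove('Vikings')
    removed = True
except ValueError:
    removed = False

print(removed, shows_demo)
```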

#### Slices

We can retrieve a **sublist** by defining a range of list indices, separated by a colon `[<START>: <END>]`

NOTE: the `<END>` element is **not** included.

Getting the three first elements

In [54]:
print(shows[0:3])
# Alternative syntax if starting at 0:
print(shows[:3])

['Master Chef', 'Breaking Bad', 'Rick & Morty']
['Master Chef', 'Breaking Bad', 'Rick & Morty']


Getting everything *but* the last element

In [55]:
print(shows[:-1])

['Master Chef', 'Breaking Bad', 'Rick & Morty']


Getting the last three elements

In [58]:
print(catman_list[-3:])
# Alternative syntax if you know where to start
print(catman_list[3:])

['M', 'A', 'N']
['M', 'A', 'N']


### Appending

To add a single element to the end of a list, use `append()`, as before

In [52]:
print("list before we append: {}".format(numbers))
numbers.append(11)
print(numbers)

list before we append: [1.3, 3, 5, 8]
[1.3, 3, 5, 8, 11]


### Extending

If we want to add more than one element, we can use `extend()`

In [48]:
himym = ['season 1', 'season2', 'season3']
print(himym)
new_seasons = ['season 4', 'season5', 'season6']
himym.extend(new_seasons)
print(himym)

['season 1', 'season2', 'season3']
['season 1', 'season2', 'season3', 'season 4', 'season5', 'season6']


You can achieve a similar result by concatenating lists with `+`, which creates a new list instead of changing an existing one

In [60]:
mixed_members = numbers + catman_list
print(mixed_members)

[1.3, 3, 5, 8, 11, 'C', 'A', 'T', 'M', 'A', 'N']
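One difference between the two approaches is worth sketching: `+` builds a new list and leaves its operands alone, while `extend()` modifies the existing list. The names `base`, `alias`, and `combined` are invented for this example.

```python
base = [1, 2, 3]
alias = base               # a second name for the same list

combined = base + [4, 5]   # `+` builds a new list; `base` is untouched
print(base, combined)      # [1, 2, 3] [1, 2, 3, 4, 5]

base.extend([4, 5])        # extend() changes the existing list in place
print(base, alias)         # the alias sees the change too
```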


### Activity

* retrieve seasons 3 thru 5 from `himym`

In [61]:
# your code here
print(himym[2:5])

['season3', 'season 4', 'season5']


### Removing and returning the last element

In [45]:
print(shows)
print(shows.pop())
print(shows)
L = []
L.pop()

['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Norsemen', 'Norsemen', 'Norsemen']
Norsemen
['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Norsemen', 'Norsemen']


IndexError: pop from empty list

You can also remove the `n`th element, by using `n` as an argument to `pop()`

In [62]:
print(shows)
print(shows.pop(shows.index('Grey\'s Anatomy, "The real one"')))
shows.remove(shows[3])
print(shows)

['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer']


ValueError: 'Grey\'s Anatomy, "The real one"' is not in list
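The cell above fails because the title is no longer in the list at that point. For a version that runs on its own, here is a small sketch of `pop(n)` on a freshly defined, hypothetical list `shows_demo`.

```python
shows_demo = ['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer']  # hypothetical

second = shows_demo.pop(1)   # remove and return the element at index 1
print(second)                # Breaking Bad
print(shows_demo)

last = shows_demo.pop(-1)    # negative indices work as well
print(last)                  # Archer
print(shows_demo)
```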

### Activity

* Discuss with your neighbor: If you were to implement a stack of cards as a list, how would you draw the last (top) card put on it?
* If it was a queue, how would you retrieve the last person to enter it?

In [35]:
shows.remove('Hanna Montana')

ValueError: list.remove(x): x not in list

### Checking for elements

To see whether an element is contained in a list, we can use the keyword `in`

In [37]:
print('Breaking Bad' in shows)
print('Good Eats' in shows)

True
False


### Finding the index of an element

In [63]:
print(shows, shows.index('Norsemen'))

ValueError: 'Norsemen' is not in list

If an element is not in a list, we get an error

In [64]:
print(shows.index('Vikings'))

ValueError: 'Vikings' is not in list
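A sketch of both cases on a hypothetical list `shows_demo`: a successful lookup, and a lookup guarded with `in` so that a missing element yields `None` instead of an error. Using `None` as the "not found" value is just one possible convention.

```python
shows_demo = ['Master Chef', 'Breaking Bad', 'Rick & Morty', 'Archer']  # hypothetical

# The element exists: index() returns its position (counting from 0)
print(shows_demo.index('Rick & Morty'))  # 2

# The element may be missing: check with `in` before asking for the index
title = 'Vikings'
position = shows_demo.index(title) if title in shows_demo else None
print(position)  # None
```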

### Length of list

We have already seen the function `len()`, but it also works for lists:

In [65]:
print(len(numbers))

5


### Converting between strings and lists

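Strings behave much like lists of characters, and that can be very handy: indexing, slicing, `len()`, and `in` all work on strings just as they do on lists. The main difference is that strings are immutable, so you cannot change a character in place.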
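A short sketch of the list-like operations that strings support, and of the one thing they do not allow: modifying a character in place. The variable names are made up for the example.

```python
word = "hovercraft"

print(word[0], word[-1])   # indexing: h t
print(word[:5])            # slicing: hover
print(len(word))           # length: 10
print('craft' in word)     # membership, which also works for substrings: True

# Unlike a list, a string cannot be changed in place
try:
    word[0] = 'H'
    changed = True
except TypeError:
    changed = False
print(changed)             # False

# Instead, build a new string from pieces of the old one
new_word = 'H' + word[1:]
print(new_word)            # Hovercraft
```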

In addition, we can convert back and forth between the two.

To create a list from a string, use `split()`. Without an argument it splits on whitespace; to split on something else, pass the separator as an argument

In [69]:
wise_words = "My hovercraft is full of eels"
wise_list = wise_words.split()
print(wise_list)

['My', 'hovercraft', 'is', 'full', 'of', 'eels']


To join a list (of strings) into a string, use `join()`. It has a funny syntax, since the joining character comes first

In [73]:
fruits = ["Apple", "Orange", "Pear"]
joining_char = '|'
print(joining_char.join(fruits))

# Alternative syntax
print(str.join(joining_char, fruits))

Apple|Orange|Pear
Apple|Orange|Pear
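One thing to keep in mind, sketched below with a hypothetical list `numbers_demo`: `join()` only accepts strings, so other elements need to be converted with `str()` first. A plain loop is used here since list comprehensions come later.

```python
numbers_demo = [1.3, 3, 5, 8]   # hypothetical list of numbers

# Joining non-strings directly raises a TypeError
try:
    ', '.join(numbers_demo)
    joined_directly = True
except TypeError:
    joined_directly = False

# Convert each element to a string first, then join
as_strings = []
for n in numbers_demo:
    as_strings.append(str(n))
joined = ', '.join(as_strings)

print(joined_directly)  # False
print(joined)           # 1.3, 3, 5, 8
```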


### Activity

* split the string below on commas and write the result to a list
* re-assemble the list as string by joining the elements with the pipe symbol `|`

In [79]:
lessons = "learn to juggle,learn to use chainsaws,learn about one-hand-clapping"
# your code here
list_of_lessons = lessons.split(',')
print('|'.join(list_of_lessons))

learn to juggle|learn to use chainsaws|learn about one-hand-clapping


## Sets

Sets are collections of elements like lists, but with some different properties, examined pictorially below. 

### Sets are unordered

<img src="sets_are_unordered.png">

### Sets contain no duplicates

<img src="sets_contain_no_dups.png">

### Membership test is fast

<img src="membership_test_is_fast.png">

### Initialization

Sets can be instantiated empty with `set()`, by putting elements in curly braces `{}`, or by converting a list with `set()`. Note that empty braces `{}` on their own create a dictionary, not a set

In [82]:
empty_set = set()
primes = {2, 3, 5, 7, 11, 13}
up_to_ten = set([1, 2, 3, 4] + [4, 5, 6, 7, 8, 9, 10])

sentence = 'a good sentence is a sentence with good words'
tokens = sentence.split()
types = set(tokens)
print(tokens, types, set(sentence))

['a', 'good', 'sentence', 'is', 'a', 'sentence', 'with', 'good', 'words'] {'sentence', 'a', 'good', 'is', 'words', 'with'} {'w', 't', 'h', 'o', 'd', 'g', 'a', 'r', 'n', 'e', 'i', 'c', ' ', 's'}
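Two details of set initialization are easy to trip over, so here is a small sketch with invented names: empty curly braces give a dictionary rather than a set, and two sets compare equal regardless of the order or repetition of the elements used to build them.

```python
empty_braces = {}      # this is an empty dict, not a set
empty_set_demo = set() # this is the way to get an empty set

print(type(empty_braces).__name__)    # dict
print(type(empty_set_demo).__name__)  # set

# Order and duplicates do not matter for sets
same = {1, 2, 3} == {3, 1, 2, 2, 1}
print(same)                  # True
print(len({3, 1, 2, 2, 1}))  # 3
```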


Converting a list to a set is also a great way of getting rid of duplicates (at the cost of losing the original order):

In [83]:
lunches_this_week = ['Pizza', 'Burger', 'Pasta', 'Pizza', 'Roast squirrel', 'Pasta', 'Salad']
unique_lunches = set(lunches_this_week)
print(len(lunches_this_week), len(unique_lunches), unique_lunches)

7 5 {'Roast squirrel', 'Salad', 'Pizza', 'Burger', 'Pasta'}


## Set operations

<img src="set_ops.png"/>

### Intersection
Common elements in two sets

In [85]:
print(primes, up_to_ten)
print(primes & up_to_ten)
# Alternative syntax: 
print(primes.intersection(up_to_ten))

{2, 3, 5, 7, 11, 13} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
{2, 3, 5, 7}
{2, 3, 5, 7}


### Difference 
Removing elements in one set from the other

In [88]:
print(up_to_ten, primes)
print(up_to_ten - primes)
# Alternative syntax: 
print(up_to_ten.difference(primes))

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10} {2, 3, 5, 7, 11, 13}
{1, 4, 6, 8, 9, 10}
{1, 4, 6, 8, 9, 10}


### Union
Combine elements that are in either set

In [89]:
print(primes, up_to_ten)
print(primes | up_to_ten)
# Alternative syntax: 
print(primes.union(up_to_ten))

{2, 3, 5, 7, 11, 13} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13}
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13}


### Subsets
Checking if all elements of one set are contained in another

In [90]:
print(primes.issubset(up_to_ten.union(primes)))

True
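Subset checks also have an operator form, analogous to `&`, `-`, and `|` above. A brief sketch with hypothetical sets `small` and `primes_demo`:

```python
small = {2, 3}
primes_demo = {2, 3, 5, 7}   # hypothetical set for this example

print(small.issubset(primes_demo))    # True
print(small <= primes_demo)           # operator form of issubset: True
print(primes_demo.issuperset(small))  # the same question asked from the other side: True
print(primes_demo <= small)           # False
```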


### Activity

Given the sets of flying animals, birds and mammals,
- retrieve the set of flightless birds and 
- the set of flying mammals.

Use set operations like `.intersection`, `.union` or `.difference`

BONUS: Try to do it in one single line.

In [None]:
fly = {"bat","crow","eagle","sparrow","finch"}
bird = {"crow","eagle","sparrow","finch","penguin","ostrich"}
mammal = {"bat","possum","cow","whale"}

#your code here
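One possible solution, offered as a sketch rather than the only answer. The result names `flightless_birds` and `flying_mammals` are invented here; the three input sets are repeated so the block runs on its own.

```python
fly = {"bat", "crow", "eagle", "sparrow", "finch"}
bird = {"crow", "eagle", "sparrow", "finch", "penguin", "ostrich"}
mammal = {"bat", "possum", "cow", "whale"}

# Birds that are not in the set of flying animals
flightless_birds = bird.difference(fly)

# Mammals that are also in the set of flying animals
flying_mammals = mammal.intersection(fly)

print(flightless_birds, flying_mammals)

# BONUS: the same in a single line, using the operator forms
print(bird - fly, mammal & fly)
```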


### Modifying elements

In [None]:
print(primes)
primes.add(17)
print(primes)

In [None]:
print(up_to_ten)
up_to_ten.remove(10)
print(up_to_ten)

In [None]:
print(up_to_ten.pop())
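A few behaviours of these methods differ from their list counterparts, sketched below on a hypothetical set `digits`: `remove()` raises a `KeyError` (not a `ValueError`) for a missing element, `discard()` silently does nothing in that case, and `pop()` returns an arbitrary element because sets have no last position.

```python
digits = {1, 2, 3}   # hypothetical set for this example

digits.discard(99)   # element is missing: no error, nothing happens

try:
    digits.remove(99)   # element is missing: KeyError
    raised = False
except KeyError:
    raised = True

taken = digits.pop()    # removes and returns some element; which one is not defined

print(raised)                 # True
print(taken in {1, 2, 3})     # True
print(len(digits))            # 2
```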

### Variables $\neq$ values

While people often refer to an assignment as *saving* a value in a variable, in Python, the ***value*** leads an existence independent from the ***variable***, which is merely a *reference* to the value. It is therefore possible to have more than one variable referencing the same value. 

E.g.: Kaj and Pia like the same kinds of dishes. 

In [None]:
kajs_favorite_dishes = ["falafel"]
pias_favorite_dishes = kajs_favorite_dishes  # both names now refer to the same list
print(kajs_favorite_dishes, pias_favorite_dishes)

At some point Pia's taste evolves radically:

In [None]:
pias_favorite_dishes.append("squirrel")
print(pias_favorite_dishes)

Now Kaj has a problem…

In [None]:
print(kajs_favorite_dishes)

Solution: use `copy()` when you create a new list based on an existing one.

In [None]:
pias_favorite_dishes = kajs_favorite_dishes.copy()
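The difference between sharing a list and copying it can be checked with the `is` operator. This is a minimal sketch with shortened, made-up names; note that `copy()` makes a shallow copy, which is enough for a list of strings like this one.

```python
kaj = ["falafel"]
pia_alias = kaj          # plain assignment: a second name for the same list
pia_copy = kaj.copy()    # copy(): a new, independent list with the same elements

pia_alias.append("squirrel")

print(kaj)        # ['falafel', 'squirrel']  (changed through the alias)
print(pia_copy)   # ['falafel']              (the copy is unaffected)
print(pia_alias is kaj, pia_copy is kaj)  # True False
```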