
DS 2500: Notebook 3

Basic Python

Prof. Marina Kogan

based in part on materials by Prof. Alex Lex

In this lecture we will introduce more data structures: lists, tuples, sets, and dictionaries.

## 1. Lists

Now we'll take a look at a compound data type: lists.

**A list is a collection of items.** Another word commonly used for a list in other programming languages is an array (though there are differences between lists and arrays in many languages).

**Lists are created with square brackets `[]` and can be accessed via an index:**

In [None]:
beatles = ["Paul", "John", "George", "Ringo"]
# printing the whole array
print(beatles)
# printing the first element of that array, at index 0
print(beatles[0])
# third element, at index 2
print(beatles[2])
# access the last element
print(beatles[-1])
# access the second-to-last element
print(beatles[-2])

['Paul', 'John', 'George', 'Ringo']
Paul
George
Ringo
George


If we try to access an index outside the range of a list, we get an `IndexError`:

In [None]:
beatles[4]

IndexError: list index out of range

Sometimes, it makes sense to pre-initialize an array of a certain size:

In [None]:
[0] * 10

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
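One caveat: `*` repeats *references* to the same object. That is harmless for numbers, but repeating a list produces several references to one and the same inner list. A list comprehension, introduced later in this notebook, creates independent inner lists instead. A short sketch (the variable names are illustrative):

```python
# Repetition with * copies references, not objects.
# For immutable items such as 0 this is harmless:
row = [0] * 3
row[0] = 5
print(row)          # [5, 0, 0]

# Repeating a list yields three references to the SAME inner list,
# so changing one "row" changes all of them
grid_shared = [[0] * 3] * 3
grid_shared[0][0] = 1
print(grid_shared)  # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# A list comprehension builds a separate inner list for each row
grid = [[0] * 3 for _ in range(3)]
grid[0][0] = 1
print(grid)         # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```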

There is also a handy shortcut for quickly initializing lists. This uses the [range()](https://docs.python.org/3/library/functions.html#func-range) function.
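A minimal sketch of that shortcut: `range()` produces a sequence of integers, and passing it to `list()` collects them into a list.

```python
# range(n) yields 0, 1, ..., n-1; list() collects them into a list
first_ten = list(range(10))
print(first_ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# range(start, stop, step): start is included, stop is excluded
evens = list(range(0, 10, 2))
print(evens)      # [0, 2, 4, 6, 8]
```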

We can also create **slices of an array with the slice operator `:`**

```python
a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # a copy of the whole array
```

There is also the step value, which can be used with any of the above:

```python
a[start:end:step] # start through not past end, by step
```

See [this post](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation) for a good explanation on slicing.
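The step value is easiest to see on a small example. This sketch uses the same four names as the `beatles` list, stored under a separate, illustrative name so the notebook's own list is left untouched:

```python
fab_four = ["Paul", "John", "George", "Ringo"]

# every second element, starting at index 0
print(fab_four[::2])    # ['Paul', 'George']

# every second element, starting at index 1
print(fab_four[1::2])   # ['John', 'Ringo']

# a negative step walks backwards, giving a reversed copy
print(fab_four[::-1])   # ['Ringo', 'George', 'John', 'Paul']
```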

In [None]:
# Get the slice from 0 (included) to 2 (excluded)
beatles[:2] # this can also be written as [0:2]

['Paul', 'John']

In [None]:
# Slice from index 2 (3rd element) to the end
beatles[2:]

['George', 'Ringo']

In [None]:
# A copy of the array
beatles[:]

['Paul', 'John', 'George', 'Ringo']

Slice operations return a new list; the original list is untouched:

In [None]:
beatles

['Paul', 'John', 'George', 'Ringo']

Unlike indexing, slicing outside the range of a list does not raise an error; it simply returns an empty list:

In [None]:
beatles[4:9]

[]

Strings can be indexed and sliced the same way as lists:

In [None]:
paul = "Paul McCartney"
paul[0:4]

'Paul'

Lists (in contrast to strings) are mutable.

That means **we can change the elements that are contained in a list**:

In [None]:
beatles[1] = "JohnYoko"
beatles

['Paul', 'JohnYoko', 'George', 'Ringo']

This does not work with strings, because strings are immutable:

In [None]:
# This raises a TypeError
paul[1] = "o"

TypeError: 'str' object does not support item assignment

We can **add an element to the end of a list with the `append()` method**:

In [None]:
beatles.append("George Martin")
beatles

['Paul', 'JohnYoko', 'George', 'Ringo', 'George Martin']
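`append()` always adds exactly one element, even if that element is itself a list. To add every element of another list individually, lists provide the `extend()` method. A short comparison (the variable names are illustrative):

```python
# append() adds its argument as a single element, even if it is a list
with_append = ["Paul", "John"]
with_append.append(["George", "Ringo"])
print(with_append)   # ['Paul', 'John', ['George', 'Ringo']]

# extend() adds each element of the argument one by one
with_extend = ["Paul", "John"]
with_extend.extend(["George", "Ringo"])
print(with_extend)   # ['Paul', 'John', 'George', 'Ringo']
```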

Lists can be **concatenated**:

In [None]:
zeppelin = ["Jimmy", "Robert", "John", "John"]
supergroup = beatles + zeppelin
supergroup

['Paul',
 'JohnYoko',
 'George',
 'Ringo',
 'George Martin',
 'Jimmy',
 'Robert',
 'John',
 'John']

We can **check the length** of a list using the built-in [`len()`](https://docs.python.org/3.3/library/functions.html#len) function:

In [None]:
len(supergroup)

9

Lists can also be **nested**:

In [None]:
bands = [beatles, zeppelin]
bands

[['Paul', 'JohnYoko', 'George', 'Ringo', 'George Martin'],
 ['Jimmy', 'Robert', 'John', 'John']]
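Elements of a nested list are accessed with one index per level. Note also that the outer list holds *references* to the inner lists rather than copies. A small sketch with separate, illustrative names, so that the notebook's `beatles` and `bands` stay unchanged:

```python
lineup_a = ["Paul", "John"]
lineup_b = ["Jimmy", "Robert"]
nested = [lineup_a, lineup_b]

# the first index selects an inner list, the second an element within it
print(nested[1])     # ['Jimmy', 'Robert']
print(nested[1][0])  # Jimmy

# the outer list stores references to the inner lists, not copies,
# so a change to an inner list is visible through the outer list too
lineup_a.append("George")
print(nested[0])     # ['Paul', 'John', 'George']
```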

Lists can even hold a mix of data types, although this is something you typically shouldn't do:

In [None]:
bad_bands = bands + [1, 0.3, 17, "This is bad"]
# this list contains lists, integers, floats and strings
bad_bands

[['Paul', 'JohnYoko', 'George', 'Ringo', 'George Martin'],
 ['Jimmy', 'Robert', 'John', 'John'],
 1,
 0.3,
 17,
 'This is bad']



### **Exercise 1: Lists**

* Create a list for the Rolling Stones: Mick, Keith, Charlie, Ronnie.
* Create a slice of that list that contains only members of the original lineup (Mick, Keith, Charlie).
* Add the Stones list to the `bands` list.

## 2. Tuples

[Tuples](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences) are list-like sequences that, in contrast to lists, are **immutable**.

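Tuples are typically used to group objects of different types. By convention, lists usually hold homogeneous data (items of the same kind), while tuples are meant for the heterogeneous case.

Immutability also has practical consequences: tuples use somewhat less memory than lists, and a tuple whose elements are all hashable is itself hashable, so it can be used as a set element or a dictionary key (hashing is explained in the sets section below).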
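A minimal sketch of the hashability point, using Python's built-in `hash()` function. The names `point` and `labels` are illustrative, and the dictionary syntax is explained in the dictionaries section further down. Note that a tuple is only hashable if all of its elements are: a tuple that contains a list, like the `train_schedule` example below, is not.

```python
# A tuple of hashable elements is itself hashable ...
point = (2, 3)
print(isinstance(hash(point), int))  # True

# ... so it can serve as a dictionary key
labels = {(0, 0): "start", (2, 3): "goal"}
print(labels[point])  # goal

# A list is mutable and therefore unhashable: hash() raises a TypeError
try:
    hash([2, 3])
except TypeError as err:
    print("TypeError:", err)
```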

In [None]:
person = "Dave", 2001, "Computer Science"
person

('Dave', 2001, 'Computer Science')

Initialization with parentheses is preferred, since it's more explicit:

In [None]:
person = ("Dave", 2001, "Australia")
person

('Dave', 2001, 'Australia')

We can access elements by index, just like with lists:

In [None]:
person[1]

2001

We cannot, however, change values. Trying to do so raises a **TypeError**.

In [None]:
# throws TypeError
person[1] = 1985

TypeError: 'tuple' object does not support item assignment

Arbitrary objects can be part of a tuple:

In [None]:
train_schedule = ("Train 1", [9,11])
# this works because we're modifying the mutable list inside the immutable tuple
train_schedule[1][0] = 15
train_schedule

('Train 1', [15, 11])

Of course, a tuple can also contain other tuples, which are immutable as well:

In [None]:
train_schedule = ("Train 1", (9,11))
# this raises a TypeError, because the inner tuple is immutable too
train_schedule[1][0] = 15
train_schedule

TypeError: 'tuple' object does not support item assignment
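Two more tuple features are worth knowing. *Tuple unpacking* assigns each element to its own variable; the `for k, v in musicians.items()` loop in the dictionaries section relies on it. And a one-element tuple needs a trailing comma, because parentheses alone only group an expression. A short sketch with illustrative names:

```python
# Tuple unpacking: one variable per element
record = ("Dave", 2001, "Australia")
name, year, country = record
print(name, year, country)  # Dave 2001 Australia

# A one-element tuple needs a trailing comma
single = (5,)
just_an_int = (5)           # parentheses only group; this is the int 5
print(type(single))         # <class 'tuple'>
print(type(just_an_int))    # <class 'int'>
```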

## 3. Sets

A [set](https://docs.python.org/3/tutorial/datastructures.html#sets) is a mutable collection, similar to a list; however, it is
 * **not ordered**, and
 * **cannot contain the same element twice**

Here is an example:

In [None]:
# Initialize a set with curly braces (an empty {} creates a dict; use set() for an empty set)
beatles = {"John", "Paul", "Ringo", "George"}
beatles

{'George', 'John', 'Paul', 'Ringo'}

In [None]:
# Initialize the set from a list
usernames = set(["Jimmy", "Robert", "John", "John"])
usernames

{'Jimmy', 'John', 'Robert'}

We've initialized the set `usernames` from a list of names. We chose a set because we don't want duplicate usernames. **In this second example, however, the list contained a duplicate: John was specified twice.** As the output shows, **John was added to the set only once**.

Also note the order of the elements in the first set:

Initialized with:
```python
{"John", "Paul", "Ringo", "George"}
```
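But the output does not preserve that order:
```python
{'George', 'John', 'Paul', 'Ringo'}
```
Sets have no defined order, so don't rely on the order in which their elements are displayed.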

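Sets are useful for many tasks. For example, they can be used to remove duplicate entries from a list. Most importantly, they make it very efficient to check whether an element is already present: a membership test on a set takes constant time on average, whereas a list has to be scanned element by element.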
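A sketch of removing duplicates from a list. Because a set has no defined order, the result is sorted here to make it predictable. If the original order matters, `dict.fromkeys()` is a common alternative, since dictionaries (covered below) keep their keys in insertion order as of Python 3.7.

```python
names = ["Jimmy", "Robert", "John", "John"]

# set() drops the duplicate; sorted() returns the result as an ordered list
unique_sorted = sorted(set(names))
print(unique_sorted)    # ['Jimmy', 'John', 'Robert']

# dict.fromkeys() keeps the first occurrence of each name in its original order
unique_ordered = list(dict.fromkeys(names))
print(unique_ordered)   # ['Jimmy', 'Robert', 'John']
```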

A set is built on a *hash function*, which computes a number (a "hash code") from an element. This hash code is then used as an index into an array. For example, "Jimmy" could hash to the value 13, so Jimmy would be stored at index 13 of the array. To test whether "Jimmy" is already in the set, we compute the hash again, which again produces 13, and look up whether something is stored at index 13. (Real implementations also reduce the hash code to the size of the array and must handle *collisions*, where two different elements map to the same index.)
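To make the idea concrete, here is a deliberately simplified hash set written from scratch. `ToyHashSet` is a hypothetical teaching class, not how Python's built-in `set` is implemented (CPython uses a more elaborate open-addressing scheme). It keeps a fixed number of *buckets* (small lists), uses `hash(item) % number_of_buckets` as the index, and handles collisions by letting colliding items share a bucket:

```python
class ToyHashSet:
    """Hypothetical, simplified hash set for illustration only.

    Each bucket is a small list; items whose hashes map to the same
    index (a collision) simply share a bucket.
    """

    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, item):
        # reduce the hash code to a valid bucket index
        return hash(item) % len(self.buckets)

    def add(self, item):
        bucket = self.buckets[self._index(item)]
        if item not in bucket:      # never store the same element twice
            bucket.append(item)

    def __contains__(self, item):
        # only the one bucket the item could live in is searched
        return item in self.buckets[self._index(item)]

    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)


toy = ToyHashSet()
for user in ["Jimmy", "Robert", "John", "John"]:
    toy.add(user)

print(len(toy))          # 3
print("Jimmy" in toy)    # True
print("Ringo" in toy)    # False
```

Because Python randomizes string hashes per process, the bucket a name lands in can change from run to run; the membership results do not.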

We can check whether a set contains a value using the `in` keyword:

In [None]:
"Jimmy" in usernames

True

In [None]:
"Ringo" in usernames

False

We can add values using the `add()` method:

In [None]:
usernames.add("JohnB")
usernames

{'Jimmy', 'John', 'JohnB', 'Robert'}

And remove elements with the `remove()` method:

In [None]:
usernames.remove("John")
usernames

{'Jimmy', 'JohnB', 'Robert'}

If the set doesn't contain the element we want to remove, `remove()` raises a `KeyError`:

In [None]:
usernames.remove("Joseph")

KeyError: 'Joseph'

To prevent that, it is advisable to first check whether a set actually contains a value, if you're not 100% sure:

In [None]:
if "Joseph" in usernames:
    usernames.remove("Joseph")

In [None]:
usernames

{'Jimmy', 'JohnB', 'Robert'}
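Alternatively, sets provide a `discard()` method, which removes an element if it is present and does nothing otherwise, so no check is needed. A sketch on a separate set (named `members` here for illustration):

```python
members = {"Jimmy", "JohnB", "Robert"}

members.discard("Joseph")  # not in the set: nothing happens, no KeyError
members.discard("Jimmy")   # in the set: removed

print(sorted(members))     # ['JohnB', 'Robert']
```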

We can iterate over the values of a set. Note, however, that sets make no guarantee about iteration order.

In [None]:
for name in usernames:
    print(name)

Jimmy
JohnB
Robert


Make sure to check out the [documentation](https://docs.python.org/3.5/library/stdtypes.html#set) to see what else a set can do.

### **Exercise 2: Sets**

Write a function that finds the overlap (the elements common to both) of two sets and prints it. Initialize two sets, e.g., with the values {13, 25, 37, 45, 13} and {14, 25, 38, 8, 45}, and call the function with them.

## 4. Dictionaries

[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) are related to sets but are more powerful: like a set, a dictionary stores unique keys, but in addition it stores a value associated with each key. Other terms commonly used for dictionaries are *associative arrays*, *(hash) maps*, and *hash tables*.

Here is a simple example:

In [None]:
musicians = {"John":"Zeppelin", "Jimmy":"Zeppelin", "Paul":"Beatles", "Ringo":"Beatles"}
musicians

{'John': 'Zeppelin',
 'Jimmy': 'Zeppelin',
 'Paul': 'Beatles',
 'Ringo': 'Beatles'}

As we can see, a dictionary can be created with curly brackets containing key-value pairs, where each key is separated from its value by a colon `:`. Here, the names are the keys and the bands are the values.

There are other ways of creating a dictionary. Here, we pass a list of tuples to the `dict()` constructor; a list of two-element lists would work as well.

In [None]:
a = [("Thom", "Radiohead"), ("Dave", "Foo Fighters")]
more_musicians = dict(a)
more_musicians

{'Thom': 'Radiohead', 'Dave': 'Foo Fighters'}
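Two further common ways to build a dictionary are pairing up two lists with `zip()` and passing keyword arguments to `dict()` (the latter only works when the keys are strings that are valid Python identifiers). A sketch with illustrative variable names:

```python
players = ["Thom", "Dave"]
groups = ["Radiohead", "Foo Fighters"]

# zip() pairs up the i-th elements of both lists; dict() turns the pairs into entries
from_zip = dict(zip(players, groups))
print(from_zip)                  # {'Thom': 'Radiohead', 'Dave': 'Foo Fighters'}

# keyword arguments become string keys
from_kwargs = dict(Thom="Radiohead", Dave="Foo Fighters")
print(from_kwargs == from_zip)   # True
```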

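Keys and values are not limited to strings: values can be of any type, and keys can be of any hashable type (for example, numbers, strings, or tuples of these). Here is an example with ints as keys and floats as values: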

In [None]:
numbers = {3:1.45, 4:1.32, 19:9.97, 6:9.99}
numbers

{3: 1.45, 4: 1.32, 19: 9.97, 6: 9.99}

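Note that it's generally not a good idea to use floats as keys: floating-point numbers are stored only as approximations, so a computed value can differ slightly from the key you expect, and the lookup then fails.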
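A small demonstration of the problem: mathematically, 0.1 + 0.2 equals 0.3, but in floating-point arithmetic the result is a slightly different number, so it does not find the key `0.3` (the dictionary name `lookup` is illustrative):

```python
lookup = {0.3: "found"}

key = 0.1 + 0.2
print(key)            # 0.30000000000000004
print(key == 0.3)     # False
print(key in lookup)  # False: the dictionary has no such key
```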

Dict elements are accessed just as elements in a list, with square brackets, but instead of the index, we pass in the key:

In [None]:
numbers[3]

1.45

In [None]:
musicians["John"]

'Zeppelin'

We can add elements to a dict by assigning a value to a new key (assigning to an existing key overwrites its value):

In [None]:
musicians["Thom"] = "Radiohead"
musicians

{'John': 'Zeppelin',
 'Jimmy': 'Zeppelin',
 'Paul': 'Beatles',
 'Ringo': 'Beatles',
 'Thom': 'Radiohead'}

And remove them using the `del` keyword:

In [None]:
del musicians["Thom"]
musicians

{'John': 'Zeppelin',
 'Jimmy': 'Zeppelin',
 'Paul': 'Beatles',
 'Ringo': 'Beatles'}

Again, we have to watch out for key errors: if we try to remove Thom a second time, we get a `KeyError`.

In [None]:
del musicians["Thom"]

KeyError: 'Thom'
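As with sets, you can check with `in` before deleting. Dictionaries also offer two methods that avoid the error altogether: `get()` returns `None` (or a default you pass) for a missing key, and `pop()` with a default removes a key and returns its value, or returns the default if the key is absent. A sketch on a separate, illustrative dictionary:

```python
band_of = {"John": "Zeppelin", "Paul": "Beatles"}

# get() never raises a KeyError
print(band_of.get("Thom"))             # None
print(band_of.get("Thom", "unknown"))  # unknown

# check with `in` before deleting ...
if "Thom" in band_of:
    del band_of["Thom"]

# ... or use pop() with a default: returns the removed value, or the default
print(band_of.pop("Thom", None))  # None
print(band_of.pop("Paul", None))  # Beatles
print(band_of)                    # {'John': 'Zeppelin'}
```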

We can access the keys and the values separately:

In [None]:
musicians.keys()

dict_keys(['John', 'Jimmy', 'Paul', 'Ringo'])

Notice that the result is neither a list nor a set, but a [view object](https://docs.python.org/3/library/stdtypes.html#dict-views). A view reflects the current contents of the dictionary, so it changes automatically whenever the dictionary changes, and we can use it to iterate over the dictionary.
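A sketch of what this means in practice: a view obtained before a change reflects that change afterwards. If you need a fixed list instead, `list()` takes a snapshot. (The names below are illustrative.)

```python
lineup = {"Paul": "Beatles", "Ringo": "Beatles"}
names_view = lineup.keys()
print(len(names_view))          # 2

# adding an entry to the dictionary is reflected in the existing view
lineup["George"] = "Beatles"
print(len(names_view))          # 3
print("George" in names_view)   # True

# list() copies the current keys into an independent list
snapshot = list(names_view)
print(snapshot)                 # ['Paul', 'Ringo', 'George']
```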

In [None]:
for musician in musicians.keys():
    print(musician)

John
Jimmy
Paul
Ringo


This also works with `values()` and `items()`:

In [None]:
musicians.values()

dict_values(['Zeppelin', 'Zeppelin', 'Beatles', 'Beatles'])

In [None]:
musicians.items()

dict_items([('John', 'Zeppelin'), ('Jimmy', 'Zeppelin'), ('Paul', 'Beatles'), ('Ringo', 'Beatles')])

The latter is especially handy for iterating over the key-value pairs in a dictionary:

In [None]:
# notice that we iterate over tuples and assign the elements of the tuple to k and v, respectively.
for k, v in musicians.items():
    print(k + ", " + v)

John, Zeppelin
Jimmy, Zeppelin
Paul, Beatles
Ringo, Beatles


Another way to write the previous loop is to iterate over the keys and look up each value:

In [None]:
for k in musicians.keys():
    print(k + ", " + musicians[k])

John, Zeppelin
Jimmy, Zeppelin
Paul, Beatles
Ringo, Beatles


Make sure to check out [the dictionary documentation](https://docs.python.org/3/library/stdtypes.html#typesmapping) for more info.

### **Exercise 3: Dictionaries**

 * Create a dictionary that maps the two-letter codes of two US states to their full names, e.g., UT: Utah, NY: New York.
 * After initially creating the dictionary, add two more states to it.
 * Create a second dictionary that maps the state codes to a list of cities in that state, e.g., UT: [Salt Lake City, Ogden, Provo, St. George].
 * Write a function that takes a state code, prints the full name of the state, and lists the cities in that state.

## 5. Revisiting Lists: List Comprehension

Now that we know about loops, we can also take a look at [list comprehensions](https://docs.python.org/3.5/tutorial/datastructures.html#list-comprehensions). A list comprehension builds a new list by evaluating an expression for each item of an iterable, and it can be used to initialize and transform lists.



In [None]:
# _ is customary for a variable name if you don't need it
[0 for _ in range(10)]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [None]:
["John" for _ in range(10)]

['John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John']

In [None]:
# we can also make use of values we iterate over
[i*100+32 for i in range(10)]

[32, 132, 232, 332, 432, 532, 632, 732, 832, 932]
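For comparison, the comprehension above does the same thing as this explicit loop, which starts from an empty list and appends one value per iteration:

```python
result = []
for i in range(10):
    result.append(i * 100 + 32)

print(result)  # [32, 132, 232, 332, 432, 532, 632, 732, 832, 932]
print(result == [i * 100 + 32 for i in range(10)])  # True
```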

We can also call functions in the expression. Here, for example, we initialize a list of random numbers in the interval [0, 1):

In [None]:
import random
rands = [random.random() for _ in range(10)]
rands

[0.22350870528444533,
 0.28202211760031737,
 0.2808106733917789,
 0.3900586414989149,
 0.5522770075672638,
 0.3553197933315855,
 0.6658153653541674,
 0.7542598792539316,
 0.06765684103805703,
 0.579904893258964]

You can also use list comprehension to create a list based on another list:

In [None]:
[x*10 for x in rands]

[2.2350870528444533,
 2.820221176003174,
 2.808106733917789,
 3.900586414989149,
 5.522770075672638,
 3.5531979333158548,
 6.6581536535416745,
 7.542598792539316,
 0.6765684103805703,
 5.79904893258964]
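A list comprehension can also filter: an `if` clause at the end keeps only the items for which the condition is true. This sketch uses fixed, made-up numbers rather than `rands`, so that the output is predictable:

```python
samples = [0.22, 0.28, 0.55, 0.67, 0.75]

# keep only the values greater than 0.5
high = [x for x in samples if x > 0.5]
print(high)  # [0.55, 0.67, 0.75]
```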

### **Exercise 4: List Comprehension**

Write a list comprehension that creates a list with the length of each word in the following sentence:

In [None]:
sentence = "the quick brown fox jumps over the lazy dog"
word_list = sentence.split()
word_list

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [None]:
# your solution