[Day 1 afternoon **_"Python crash course part 2"_**]

<a name="data containers"></a>Data Containers
===

<a name="top"></a>Outline
---

* [Data Containers](#data containers)
  * [Lists](#lists)
  * [Accessing list elements](#accessing elements)
  * [List operations](#list operations)
  * [List comprehensions](#list comprehensions)
  * [Tuples and sets](#sets)
  * [Dictionaries](#dictionaries)
* [Exercise 04: Data Containers](#exercise04)

**Learning goals:** By the end of this lecture you will
* have learned the concept of an _iterable_ in Python
* know how to store and access variables in lists
* know how to create sophisticated lists automatically
* know the difference between a list, a tuple, a set and a dictionary

<a name="lists"></a>Lists
---

* lists are collections of items, stored in a variable
* there are no restrictions on what can be stored in a list
* lists are declared using square brackets
* items in a list are separated with commas

_Everything in Python that can be perceived as a collection of different items is an **iterable**_ (i.e. we can _iterate_ over its items)

##### Example 

In [1]:
data_science = ['reproducible', 'transparent', 'open', 'scalable']
for trait in data_science:
    print("data science is {}".format(trait))

data science is reproducible
data science is transparent
data science is open
data science is scalable


##### List index

You can access a specific item in a list using its **index** (i.e. position) in the list.  
**NOTE:** the list index starts with zero.

In [2]:
first_trait = data_science[0]
print(first_trait)

reproducible


You can also use negative indices to traverse the list from end to start.

In [3]:
last_trait = data_science[-1]
print(last_trait)

scalable


##### Index error 

In [4]:
data_science[4]

IndexError: list index out of range

If you try to access an index that lies outside the length of the list, you will get an **IndexError**.  
**HINT:** you can check the number of elements in a list using the ```len()``` function.

In [5]:
num_elements = len(data_science)
print('data science has {} traits'.format(num_elements))

data science has 4 traits
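
A minimal sketch of how ```len()``` can be used to avoid an **IndexError**: we compare the index with the length of the list before using it. The names ```wanted_index``` and ```result``` are made up for this illustration.

```python
data_science = ['reproducible', 'transparent', 'open', 'scalable']
wanted_index = 4  # hypothetical index, one past the last valid position

# valid (non-negative) indices run from 0 to len(list) - 1
if wanted_index < len(data_science):
    result = data_science[wanted_index]
else:
    result = None
    print('index {} is out of range, the last valid index is {}'
          .format(wanted_index, len(data_science) - 1))
```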

[top](#top)

<a name="accessing elements"></a>Accessing elements
---

##### Looping over lists 

This is one of the most important concepts related to lists. You can have a list with a million items in it, and in three lines of code you can write a sentence for each of those million items.  

We use a loop to access all the elements in a list. A loop is a block of code that repeats itself until it runs out of items to work with, or until a certain condition is met. In this case, our loop will run once for every item in our list.

In [None]:
for trait in data_science:
    print(trait)

What happened?
* ```for``` is a keyword in Python that is used to start a _loop_.
* ```trait``` is a temporary variable that is meant to be used _within_ the loop (Python does keep it around afterwards, holding the last item).
* The temporary variable takes the value of each item in the list in turn, one item per run of the loop.
* The loop will run four times as the number of items in the list is four and then finish.  

The ```for``` keyword is also an example of a control element (more on that later).  
**NOTE:** The colon at the end of the first line is how Python knows that the control element has ended and the indented loop body begins.  

##### Inside and outside loops

Python uses _indentation_ to recognize which parts of the code are inside and outside a loop. Use a ```tab``` or four spaces (and not a single space) to indent your code to make the structure clearly recognizable!

In [None]:
for trait in data_science:
    print('important trait:')
    print(trait + '\n')

print("It's time to change the example list...")

In [None]:
cool_tools = ['jupyter', 'git', 'xenodo', 'bokeh']

##### Enumerating 

While looping through a list, you might be interested in the index of the current item. There are two neat ways of achieving that:

In [None]:
# access the index of a certain item in the list
print(data_science.index('open'))

In [None]:
# access every index of every item in the list
for index, tool in enumerate(cool_tools):
    print('{}. cool tool is {}'.format(index, tool))

We can also modify values the loop _returns_ to us:

In [None]:
print("actually zero doesn't make much sense for an index...\n" )
for index, tool in enumerate(cool_tools):
    print('{}. cool tool is {}'.format(index + 1, tool))
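
As an alternative sketch, ```enumerate()``` accepts an optional ```start``` argument, so the counter can begin at one without adding anything by hand. The variable name ```position``` is just chosen for this example.

```python
cool_tools = ['jupyter', 'git', 'xenodo', 'bokeh']

# start=1 makes the counter begin at one instead of zero
for position, tool in enumerate(cool_tools, start=1):
    print('{}. cool tool is {}'.format(position, tool))
```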

##### Iteration keywords 

You can use two additional keywords _inside_ a loop such as a ```for``` loop (and later a ```while``` loop):  
* **break** will immediately _end_ the loop and exit it
* **continue** will _skip_ to the next iteration step
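
A small sketch of both keywords in action. It uses an ```if``` condition to decide when to skip or stop, and the counter ```count``` is only there to show how many items were actually printed.

```python
cool_tools = ['jupyter', 'git', 'xenodo', 'bokeh']
count = 0

for tool in cool_tools:
    if tool == 'git':
        # skip the rest of this iteration and move on to the next item
        continue
    if tool == 'bokeh':
        # leave the loop entirely
        break
    print(tool)
    count = count + 1

print('{} tools were printed'.format(count))
```

Only 'jupyter' and 'xenodo' are printed: 'git' is skipped and the loop ends as soon as it reaches 'bokeh'.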

##### Slicing lists 

A very powerful concept for accessing lists is slicing. It allows you to access any subset of the items in the list. For this we access the list using
* the start index
* the stop index
* the step size (which is optional and 1 by default)  

```[start : stop (: step)]```

In [None]:
numbers = [1, 2, 3, 4, 5, 6, 7]
# items in the middle of the list using only start and stop
print(numbers[2:5])

# all even numbers 
print(numbers[1:-1:2])

# first two items
print(numbers[0:2])

# last four items (notice: omitting 'stop' means 'until list end')
print(numbers[-4:])
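
A few more slicing variations, as a sketch of what happens when parts of ```[start : stop : step]``` are left out or are negative. The variable names are picked for this illustration only.

```python
numbers = [1, 2, 3, 4, 5, 6, 7]

# omitting 'start' means 'from the beginning of the list'
first_three = numbers[:3]

# omitting start and stop but giving a step selects every second item
odd_numbers = numbers[::2]

# a negative step walks through the list backwards
backwards = numbers[::-1]

# a slice that reaches past the end does not raise an IndexError
tail = numbers[5:100]

print(first_three, odd_numbers, backwards, tail)
```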

##### Copying lists

Note that you can assign list slices to variables and you can make a copy of the whole list like this:

In [None]:
# assign a slice to a new variable
strings = ['here', 'there', 'and', 'all', 'over']
view = strings[0:2]
print(view)

# copy the list using the slicing syntax
copied_strings = strings[:]
print(copied_strings)
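
Why bother copying at all? The sketch below contrasts a plain assignment, which only gives the same list a second name, with a slice copy. The names ```alias``` and ```copied``` are made up for this example.

```python
strings = ['here', 'there', 'and', 'all', 'over']

alias = strings        # a second name for the very same list
copied = strings[:]    # a new list containing the same items

# change the original list
strings[0] = 'everywhere'

print(alias[0])   # the alias sees the change
print(copied[0])  # the copy does not
```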

[top](#top)

<a name="list operations"></a>List operations
---

##### Modifying elements 
You can change the value of any list element by accessing it via its index:

In [None]:
print(cool_tools)
cool_tools[2] = 'scipy'
cool_tools[3] = 'numpy'
print(cool_tools)

##### Checking for existence 

If you don't know whether a certain item is in a list, you can use the **in** keyword to check:

In [None]:
print('numpy' in cool_tools)
print('xenodo' in cool_tools)
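
One place where this check comes in handy: ```index()``` raises a **ValueError** if the item is not in the list. A minimal sketch, with ```wanted``` and ```position``` as made-up names:

```python
cool_tools = ['jupyter', 'git', 'scipy', 'numpy']
wanted = 'xenodo'  # hypothetical item we are looking for

# only ask for the index if the item is really there
if wanted in cool_tools:
    position = cool_tools.index(wanted)
else:
    position = None
    print('{} is not in the list'.format(wanted))
```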

##### Adding items to a list 

If you want to add items to a list you can use
* **insert()** to add an item at a certain position
* **append()** to add an item to the end of the list
* **extend()** to append another list to your list

In [None]:
string_list = ['a', 'b', 'c']
number_list = [1, 2, 3]

In [None]:
# insert
string_list.insert(2,'z')
print(string_list)

In [None]:
# append
string_list.append('y')
print(string_list)

In [None]:
# extend
string_list.extend(number_list)
print(string_list)
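
A common stumbling block is mixing up ```append()``` and ```extend()``` when the thing you add is itself a list. The sketch below uses two fresh lists (names chosen for this example) to show the difference.

```python
appended = ['a', 'b', 'c']
extended = ['a', 'b', 'c']
number_list = [1, 2, 3]

# append() adds the whole list as one single (nested) item
appended.append(number_list)
print(appended)

# extend() adds every item of the other list separately
extended.extend(number_list)
print(extended)
```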

##### Removing items from a list 

If you want to remove items from a list, you can use
* **remove()** to remove a certain item by value
* **del** to remove an item at a certain index
* **pop()** to remove and return the last item in the list

In [None]:
cool_tools = ['jupyter', 'git', 'xenodo', 'bokeh']
cool_tools.remove('jupyter')
print(cool_tools)

In [None]:
del cool_tools[1]
print(cool_tools)

In [None]:
last_item = cool_tools.pop()
print(last_item)
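
Two details that are easy to miss, sketched with a small example list: ```remove()``` only removes the _first_ matching item, and ```pop()``` also accepts an index if you want an item other than the last one.

```python
letters = ['a', 'b', 'a', 'c']

# only the first 'a' is removed, the second one stays
letters.remove('a')
print(letters)

# pop() with an index removes and returns the item at that position
first_item = letters.pop(0)

# pop() without an index removes and returns the last item
last_item = letters.pop()

print(first_item, last_item, letters)
```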

##### Empty lists 

Sometimes it can be useful to initialize an empty list that can be filled with items later on.

In [None]:
letters = ['a', 'b', 'c', 'd']
# here we initialize an empty list
alphabet = []

# we fill the new list in the loop
for letter in letters:
    alphabet.append(letter + letter)
    
print(alphabet)

##### Sorting and reversing lists 

We can sort lists depending on their content. Strings will be sorted alphabetically by default, numbers numerically. We have two options:  
* **sorted()** returns a new sorted list and keeps the original order of the elements
* **sort()** modifies the order of the elements in the list itself

In [None]:
# using sorted()

letters = ['b','a','z','c','y']
# Print the letters in alphabetical order but keep the original order
print('Letters in alphabetical order')
for letter in sorted(letters):
    print(letter)
    
print('\nLetters in reverse alphabetical order')    
# Print the letters in reverse alphabetical order but keep original order
for letter in sorted(letters, reverse=True):
    print(letter)
    
print('\nOriginal list order')    
# Show that the original order is preserved
for letter in letters:
    print(letter)

In [None]:
numbers = [10, 2, 5, 3]

# reverse() reverses the order of the list and is permanent
numbers.reverse()
print(numbers)

In [None]:
# increasing order using sort()
numbers.sort()
print(numbers)

# decreasing order
numbers.sort(reverse=True)
print(numbers)
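
The difference between the two options can be sketched side by side. Note that ```sort()``` changes the list in place and returns ```None```, so assigning its result to a variable is a typical mistake. The variable names are chosen for this illustration.

```python
numbers = [10, 2, 5, 3]

# sorted() hands back a new list, the original is untouched
ordered = sorted(numbers)
before_sort = numbers[:]

# sort() reorders the list itself and returns None
result = numbers.sort()

print(ordered, before_sort, numbers, result)
```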

##### Functionality for numerical lists

For lists containing only numbers, we have some special helper functions:
* **range(start, stop, step)** helps us create large lists of numbers
* **min()** returns the smallest item in a list
* **max()** returns the largest item in a list
* **sum()** returns the sum of all items in the list

In [None]:
# using the range() function to print the first ten odd numbers:
for number in range(1,21,2):
    print(number)

To turn this into a list, we can use the **list()** function

In [None]:
# create a list of the first 15 numbers.
# NOTE: starts with 0!
numbers = list(range(15))
print(numbers)

# print min, max and sum
print('the minimum is: {}\nthe maximum is: {}\nthe sum is: {}'\
     .format(min(numbers), max(numbers), sum(numbers)))

[top](#top)

<a name="list comprehensions"></a>List comprehensions
---

Creating more complicated lists manually is tedious. If we wanted to create a list of the first ten square numbers, it takes us three lines of code:

In [None]:
# make an empty list to store the squares
squares = []

# loop through the numbers, square them and append them to the list
for number in range(10):
    squares.append(number**2)
    
# make sure the result is correct    
print(squares)

We can do that in a more _pythonic_ way, using a _list comprehension_:

In [None]:
squares = [number**2 for number in range(10)]
print(squares)

What just happened?
* the part after the **for** keyword initializes a loop that runs ten times and returns the next number every time
* the part before the **for** keyword performs an operation on the number
* the square brackets "channel" the results into a list which is then assigned to the variable ```squares```

##### A few more examples

In [None]:
# dividing every number by 2
halves = [number/2 for number in numbers]
print(halves)

# adding the sum of numbers to every item
sums = [number + sum(numbers) for number in numbers]
print(sums)

# works with strings too
names = ['deb', 'dimitra', 'jana']
narcissists = [name + ' the great' for name in names]
print(narcissists)
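
Going one small step beyond the examples above: a list comprehension can also carry an ```if``` condition at the end, which keeps only the items that pass the test. This is a sketch with made-up variable names, shown next to the equivalent loop.

```python
numbers = list(range(15))

# keep only the even numbers and square them
even_squares = [number**2 for number in numbers if number % 2 == 0]
print(even_squares)

# the same thing written as an ordinary loop
even_squares_loop = []
for number in numbers:
    if number % 2 == 0:
        even_squares_loop.append(number**2)
print(even_squares_loop)
```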

[top](#top)

<a name="sets"></a>Tuples and sets
---

##### Tuples

* Basically, tuples are lists that can never be changed (i.e. they are _immutable_). 
* This can be handy in some cases where you want to make sure that the values/positions in your container stay consistent.
* Tuples are denoted by round brackets ().

In [1]:
very_important_parameters = (1,2,3)
print(very_important_parameters[0])

1
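
What _immutable_ means in practice can be sketched as follows. The ```try```/```except``` construct is not covered in this lecture and is only used here to catch the error so the example keeps running. The names ```editable``` and ```changed``` are made up.

```python
very_important_parameters = (1, 2, 3)

# assigning to an item of a tuple raises a TypeError
try:
    very_important_parameters[0] = 10
except TypeError as error:
    message = str(error)
    print('not allowed: {}'.format(message))

# if you need different values, build a new container instead
editable = list(very_important_parameters)
editable[0] = 10
changed = tuple(editable)
print(very_important_parameters, changed)
```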


##### Sets

* Sets are **unordered** collections of **unique** objects.
* Sets are very efficient for membership testing.
* Sets can be used to perform mathematical operations like union and intersection.
* Sets are denoted by curly brackets {} (note that empty brackets ```{}``` create a dictionary, so an empty set is written ```set()```).
* Sets can be created from other containers using the **set()** function.

In [None]:
# sets contain only unique elements
letters1 = ['a','b','c','d','b','a']
letters1 = set(letters1)
print(letters1)

letters1.add('e')
print(letters1)

letters1.add('a')
print(letters1)

In [None]:
# membership testing
'b' in letters1

In [None]:
# operations on sets:
letters2 = {'c','d','e','f','g','h'}

# intersection
print(letters1.intersection(letters2))

# union
print(letters1.union(letters2))
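
As a side note, the same operations can also be written with the operators ```&``` and ```|```. A small sketch, with ```letters1``` written out by hand here so the example runs on its own:

```python
letters1 = {'a', 'b', 'c', 'd', 'e'}
letters2 = {'c', 'd', 'e', 'f', 'g', 'h'}

# & gives the same result as intersection()
common = letters1 & letters2

# | gives the same result as union()
combined = letters1 | letters2

print(common == letters1.intersection(letters2))
print(combined == letters1.union(letters2))
```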

[top](#top)

<a name="dictionaries"></a>Dictionaries
---

* Dictionaries are a way to store information that is connected in some way.
* Dictionaries store information in key-value pairs.
* Dictionaries are accessed by key rather than by position (since Python 3.7 they do remember the order in which items were inserted).
* Syntax: ```{key : value} ```.
* Like with indices in lists, we can access values by using the key to access the dictionary.

##### Example 

In [None]:
# a dictionary of the word and meaning of different data containers
# in python
concepts = {'list': 'A mutable, ordered collection of values',
            'tuple': 'An immutable, ordered collection of values',
            'set': 'An unordered collection of unique values',
            'dictionary': 'A collection of {key:value} pairs accessed by key'}

# access the value by using dictionary[key]
for key in concepts.keys():
    meaning = concepts[key]
    print('Word: {}'.format(key))
    print('Meaning: {}\n'.format(meaning))
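
A slightly shorter way to write the same loop, sketched here with a reduced version of the dictionary: ```items()``` hands us the key and the value together, so we don't have to look up the value ourselves. The list ```lines``` is only there to collect what gets printed.

```python
concepts = {'list': 'A mutable, ordered collection of values',
            'set': 'An unordered collection of unique values'}

lines = []
# items() returns every key together with its value
for word, meaning in concepts.items():
    lines.append('{}: {}'.format(word, meaning))
    print(lines[-1])
```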

##### Adding and removing items

In [None]:
# add a new entry
concepts['function'] = 'A named set of instructions'

# remove an existing entry
del concepts['tuple']

# print the dictionary
for key in concepts.keys():
    print('{} : {}'.format(key, concepts[key]))
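
Asking for a key that is not (or no longer) in the dictionary raises a **KeyError**. A minimal sketch of two ways around that, using a reduced dictionary and made-up variable names: the **in** keyword checks for a key, and ```get()``` returns a fallback value instead of raising an error.

```python
concepts = {'list': 'A mutable, ordered collection of values',
            'set': 'An unordered collection of unique values'}

# 'in' checks the keys of a dictionary
has_tuple = 'tuple' in concepts
print(has_tuple)

# get() returns the second argument if the key is missing
meaning = concepts.get('tuple', 'no entry found')
print(meaning)
```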

[top](#top)

<a name="exercise04"></a>Exercise 04: Data Containers
---

0. **Git**
  1. Switch to the branch you created for today.
  2. Create a new notebook for the exercise and add it to the index.
1. **Lists and loops**
  1. Create a list containing strings that form a sentence.
  2. Concatenate all the list items into a single string by looping over the list.
  3. Create a list containing integers.
  4. Perform a mathematical operation on every item in the list by looping over the list.
  5. Print the result of the calculation for every iteration of the loop.
  6. Make a list with all integers from zero to ten.
  7. Print all even and then all odd numbers in the list. Print only the first 4 and then the last 4 elements in the list.
  8. (Optional) Set every second element in the list to zero using a slice.
2. **List operations**
  1. Create a list containing strings using some combination of ```append()```, ```insert()``` and ```extend()```.
  2. Create a second empty list. Loop over the first list and multiply each item with its index. Store the results in the new list.
  3. Print the length of the new list and the total number of characters in all strings of the list.  
  HINT: strings themselves can be perceived as lists and have a ```len()```.
  4. Create a list of the first 50 even numbers using ```range()```. Calculate the sum of the numbers once using a for loop and once using the ```sum()``` function.
3. **List comprehensions**
  1. Create a list of the first ten cubes using a list comprehension.
  2. (Optional) Make a list with the first 100 elements of the Fibonacci series. What are the problems when trying to use a list comprehension for this?
4. **Tuples and sets**
  1. Try to change the value of an item in a tuple.
  2. Make two sets of ten letters each. Five letters should appear in both sets, the other five should be different for each set.
  3. Calculate and print the union and intersection of the two sets.
  4. (Optional) Use set operations to show that the symmetric difference of two sets is equivalent to the union of both relative complements of the sets.
5. **Dictionaries**
  1. Look at the ```airport.dat``` file located in the folder /day1 in the repository. Create a dictionary with the airport name short form (for example 'LHR' for London Heathrow) as key and the latitude/longitude stored in a tuple as value. The dictionary should contain at least 3 different airports.
  2. Loop through the dictionary and print the information in a meaningful way.
  3. (Optional) Can you figure out an automated way to read the data-file into a dictionary?
6. **Git**
  1. Commit the document with the exercise to the current working branch with a meaningful commit message.  

[top](#top)

[Kudos to **Aron Ahmadia** (US Army ERDC) and **David Ketcheson** (KAUST) from whom I copied shamelessly]