# Computer Class 1a - basic Python and NumPy

The examples and exercises of this computer class introduce the student to working with the Python Standard Library and the NumPy library. It can be used in conjunction with chapters 1-4 of the McKinney book.

*Authors: Cees Diks and Bram Wouters, Faculty Economics and Business, University of Amsterdam (UvA)* <br>
*Copyright (C): UvA (2020)* <br>
*Credits: some of the examples and formulations are taken from McKinney and/or the material of the Computational Finance course by Simon Broda (UvA)*

# 1. Basics

## Arithmetic
The basic arithmetic operations are `+`, `-`, `*`, `/`, and `**` for exponentiation:

**Exercise 1:** calculate $\frac{2 \cdot (3-1)^4}{\sqrt{25}}$.

In [1]:
2*(3-1)**4/25**(1/2)

6.4

## Printing

**Examples:** when executing a cell, in many cases the last line is printed as output. One can use the ``print`` statement to display additional output. In the cell below one finds examples of print statements.

In [2]:
print('This is a line.') # Printing a string

a = 137
b = [1,2,6,3,7,4,5,3]

print(a) # Printing an integer
print(b) # Printing a list
print('') # Creating an empty line
print(a, b) # Printing and integer and a list

print('This is a line with an integer,',a ,'and a list,', b)

print('This is a line with an integer, %i and a list, %s' % (a, b))  # Same line, using a tuple

print('This is a line with an integer, %.3f and a list, %s \n' % (a, b)) # Same line, turning the integer into a float
                                                                         # with three decimals.
                                                                         # Note the \n to create and extra empty line

print('This is a line with an integer, {0} and a list, {1}'.format(a, b))  # Same line, using a different syntax

print('This is a line with an integer,',a ,'and a list,', b, sep='') # Same line, without additional spaces

This is a line.
137
[1, 2, 6, 3, 7, 4, 5, 3]

137 [1, 2, 6, 3, 7, 4, 5, 3]
This is a line with an integer, 137 and a list, [1, 2, 6, 3, 7, 4, 5, 3]
This is a line with an integer, 137 and a list, [1, 2, 6, 3, 7, 4, 5, 3]
This is a line with an integer, 137.000 and a list, [1, 2, 6, 3, 7, 4, 5, 3] 

This is a line with an integer, 137 and a list, [1, 2, 6, 3, 7, 4, 5, 3]
This is a line with an integer,137and a list,[1, 2, 6, 3, 7, 4, 5, 3]


**Exercise 2:** print a line similar to the examples above, but now with three insertions: an integer, a float and a list.

In [3]:
print('This is a line with an integer, %i, a float, %f and a list, %s' % (a, a, b))

This is a line with an integer, 137, a float, 137.000000 and a list, [1, 2, 6, 3, 7, 4, 5, 3]


## Types

Everything in Python is an object. Each object is of a certain type. Here's a list of Python types you will often use:
* integer number (int)
* decimal number (float)
* boolean (bool)
* string of characters (str)
* list of objects (list)
* tuple of objects (tuple)
* dictionary (dict)
* set of objects (set)
* function (function)

**Examples:** the function ``type`` gives the type of the object in its argument.

In [4]:
type(a) # Functions take one or more inputs/arguments (in parentheses) and return an output.

int

In [5]:
type(b)

list

In [6]:
c = 'ThIs is A stRinG! #$$3@6**#9&@'

type(c)

str

## Attributes and methods

Objects in Python typically have attributes and methods. Attributes are other Python objects stored inside the object. Methods are functions associated with the object, which have access to the internal data of the object. Which attributes and methods an object has depends on its type. So str objects have different attributes/methods than list objects, for example. In this tutorial, we will be mainly concerned with methods of objects.

**Exercise 3:** to get an overview of the attributes/methods associated with a certain object, one can use the Tab key. For example, to see which methods are associated to an integer you can type `a.<Tab>` or `int.<Tab>`. Try both. Try `b.<Tab>` and `c.<Tab>` as well and note the differences.

**Example:** there are two equivalent syntactic expressions to access a method. We illustrate this with the string method `upper`, which transforms all lowercase characters into uppercase characters.

In [7]:
print(str.upper(c))

print(c.upper())

THIS IS A STRING! #$$3@6**#9&@
THIS IS A STRING! #$$3@6**#9&@


**Example:** two equivalent syntactic expressions to count the occurrences of the integer 3 in the list `b`.

In [8]:
print(list.count(b,3))

print(b.count(3))

2
2


**Example:** applying a method to an object of the wrong type will give you an error.

In [9]:
c.count(3)

TypeError: must be str, not int

**Exercise 4:** apply a single method to `c` that transforms all lowercase letters to uppercase letters, and vice versa. (Hint: use the Tab key to find the appropriate method.)

In [12]:
c.swapcase()

'tHiS IS a STrINg! #$$3@6**#9&@'

**Exercise 5:** count how often the character 's' occurs in `c`.

In [13]:
c.count('s')

3

**Example:** it is possible to apply multiple methods subsequently. For this, one can also use the two equivalent syntactic expressions. Note the order of methods in the second expression.

In [14]:
str.count(str.upper(c),'s')

0

In [15]:
c.upper().count('s')

0

**Example:** accessing an attribute does not require parentheses, because attributes are other objects already stored inside the object.

In [16]:
a.real # Getting the real part of the numeric quantity a.

137

## Built-in functions

The Python Standard Library has a number of built-in functions that can be called at any time (as opposed to attributes and methods, which can only be called in association with an object of the correct type). Click [here](https://docs.python.org/3.6/library/functions.html) for a list and description of all built-in functions.

**Example**: using `len` to calculate the length of iterable objects.

In [17]:
print(len(b))
print(len(c))

8
30


**Exercise 6:** use a built-in function to determine the sum of the numbers in `b`.

In [18]:
sum(b)

31

**Exercise 7:** use built-in functions to determine the difference between the maximum and minimum value of `b`.

In [19]:
max(b)-min(b)

6

**Exercise 8:** use a built-in function to turn the string object `c` into a list of characters and print the result.

In [20]:
print(list(c))

['T', 'h', 'I', 's', ' ', 'i', 's', ' ', 'A', ' ', 's', 't', 'R', 'i', 'n', 'G', '!', ' ', '#', '$', '$', '3', '@', '6', '*', '*', '#', '9', '&', '@']


# 2. Built-in types

This section introduces the most important built-in types of the Python Standard Library.

## Numeric types

**Example:** Computers distinguish between integers and floating point numbers. Python integers can be arbitrary large (will use as many bits as necessary). Python floats are between $\pm 1.8\cdot 10^{308}$, but are stored with just 64 bits of precision. Hence, not all real numbers can be represented, and floating point arithmetic is not exact.

In [21]:
a = 1
type(a)

int

In [22]:
a = 1.0
type(a)  # Note that variables can change type: a was an integer before

float

In [23]:
a-0.9 # Example of the non-exactness of floating point arithmetic in Python. 

0.09999999999999998

**Example:** If any of the operands is a float, then Python will convert the others to float, too.

In [24]:
print(4/2.0)

2*(3-1.0)**2

2.0


8.0

**Example:** in Python 3.x (as opposed to Python 2.x) division `/` of two integers returns a float. One uses ``//`` for integer division and `%` to obtain the remainder.

In [25]:
print(6/3) # Ordinary division turns integers into a float.
print(5//3) # Integer division, discarding the remainder.
print(5 % 3) # Obtaining the remainder of integer division.

2.0
1
2


## Booleans

The two boolean type objects in Python are written as `True` and `False`.

**Example**: comparisons and other conditional expressions evaluate to either `True` or `False`.

In [26]:
3 > 4.0

False

In [27]:
a <= 1

True

In [28]:
not(1 < 2)

False

**Example:** Boolean values can be combined with `and` and `or`.

In [29]:
True and False

False

In [30]:
True or False

True

In [31]:
1 < 2 and 2 < 1 

False

## Strings

**Example:** we have seen already examples of strings.

In [32]:
b = 'This is the first half'
c = 'and this is the second half.'

**Exercise 9:** use the `+` to combine the string `b` and `c` into a single string.

In [33]:
b + c

'This is the first halfand this is the second half.'

**Exercise 10:** the resulting string is missing a space in between the words 'half' and 'and'. Correct this by inserting a third string into the sum.

In [34]:
b + ' ' + c

'This is the first half and this is the second half.'

**Exercise 11:** use the built-in function `str` to turn the integer number 8471 into a string. Call the resulting object `d`.

In [35]:
d = str(8471)

**Exercise 12:** use the built-in function `isinstance` to check whether the variable `d` refers to an object of string type.

In [36]:
isinstance(d, str)

True

## Lists

A list is an ordered sequence of Python objects. You can define them using square brackets `[]` or the built-in function `list`.

**Example:** defining lists using `[]`. Note that a list can contain Python objects of different type, including sequence-type objects like lists themselves. Also note the `None`, which is the only Python object of NoneType, and which is often used to mark the absence of a value. In a list, you can consider it as an empty entry.

In [37]:
list1 = [7,4,3,9,0,1,6,2,3,6,8,0,5]
list2 = ['April','May','July','August']
list3 = [4, 6.5, 'butterfly', [4,5], None]

**Exercise 13:** use the list method `append` to attach the integer 17 to the end of `list1`. Print the result.

In [38]:
list1.append(17)
list1

[7, 4, 3, 9, 0, 1, 6, 2, 3, 6, 8, 0, 5, 17]

**Exercise 14:** use the list method `insert` to include the string 'June' in the correct position of `list2`. Note that the index of sequence-like objects in Python starts at 0. Print the result.

In [39]:
list2.insert(2, 'June')
list2

['April', 'May', 'June', 'July', 'August']

**Example:** checking whether `list3` contains the object `6.5`.

In [40]:
6.5 in list3

True

**Exercise 15:** check whether the integer `9` is not an element of `list1`. The answer should be `False`.

In [41]:
9 not in list1

False

**Exercise 16:** use `+` to concatenate `list1`, `list3` and `list2` (in that order), and print the result.

In [42]:
print(list1 + list3 + list2)

[7, 4, 3, 9, 0, 1, 6, 2, 3, 6, 8, 0, 5, 17, 4, 6.5, 'butterfly', [4, 5], None, 'April', 'May', 'June', 'July', 'August']


**Exercise 17:** use the list method `extend` to add the 4 last months of the year to `list2`. A single line of code should be enough. Print the result.

In [43]:
list2.extend(['September','October','November','December'])
print(list2)

['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']


**Example:** with indices within brackets `[]` one can access specific elements of sequence-like/iterable objects. One can also use this to re-define objects within a list.

In [44]:
print(list3[0])
print(list3[1])
print(list3[2])
print(list3[-1]) # Negative indices start at the end of a list and count backwards.

list3[0] = [1,2,3] # Replacing the first element of c_list by a list if length 3.
print(list3)

4
6.5
butterfly
None
[[1, 2, 3], 6.5, 'butterfly', [4, 5], None]


**Example:** using the `:` symbol to select slices of lists. 

In [45]:
print(list1[2:7])
print(list2[0:1])
print(list1[:7])
print(list1[2:])
print(list3[-2:])
print(list1[2:7:2]) # An (optional) third integer gives the step size of the slicing.
print(list1[2::2])

list3[1:3] = [1.23, 'spider','flower'] # Replacing the second and third element of list3 by three new elements.
print(list3)

[3, 9, 0, 1, 6]
['April']
[7, 4, 3, 9, 0, 1, 6]
[3, 9, 0, 1, 6, 2, 3, 6, 8, 0, 5, 17]
[[4, 5], None]
[3, 0, 6]
[3, 0, 6, 3, 8, 5]
[[1, 2, 3], 1.23, 'spider', 'flower', [4, 5], None]


**Exercise 18:** the slicing also works for strings, where a string is seen as a sequence of unicode characters. Below we have defined the string `alphabet`. Use the slicing syntax to create a string with the first, third, fifth, seventh, etc., letter of the alphabet.

In [46]:
alphabet = 'abcdefghijklmnopqrstuvwxyz'

alphabet[::2]

'acegikmoqsuwy'

**Exercise 19:** complete `list2` with the three missing months of the year and make sure they are in the correct position. Print the result.

In [47]:
list2[0:0] = ['January', 'February','March']
print(list2)

['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']


**Exercise 20:** use the list method `sort` to sort `list1` and print the result.

In [48]:
list1.sort()

print(list1)

[0, 0, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 17]


## Tuples

Like a list, a tuple is an ordered sequence of Python objects. Crucially, unlike a list, a tuple is an immutable object. This means that once the tuple is defined, its length and its objects cannot be changed anymore.

**Example:** One can define a tuple using commas only. Parentheses `()` are optional.

In [49]:
tuple1 = 1, 2, ['tree','house',9.9] , 4, 'king'
tuple2 = ('queen', 'door', 'leaf')

**Exercise 21:** use the built-in function `tuple` to create a tuple out of the string `alphabet` that was defined earlier. Print the result.

In [50]:
print(tuple(alphabet))

('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')


**Exercise 22:** try to append the integer 6 to `tuple1`, like you did earlier for a list. You will encounter an error, because a tuple is immutable.

In [51]:
tuple1.append(6)

AttributeError: 'tuple' object has no attribute 'append'

## Dictionaries

A dictionary (`dict`) is a mapping from (immutable) Python objects, called *keys*, to other Python objects, called *values*. 

**Example:** creating a dictionary and printing the keys and values. Note that the dictionary methods `keys` and `values` create an iterable object, which can be turned into a list with the built-in function `list`.

In [52]:
d1 = {'a' : 'blue', 'b' : [1,2,3,4], 3 : ('al','pha','bet')}

print('The keys of the dictionary are: {0}'.format(list(d1.keys())))

print('The values of the dictionary are: {0}'.format(list(d1.values())))

The keys of the dictionary are: ['a', 'b', 3]
The values of the dictionary are: ['blue', [1, 2, 3, 4], ('al', 'pha', 'bet')]


**Example:** instead of the index for a list, inserting the key in the brackets `[]` returns the associated value.

In [53]:
print(d1['a'])

print(d1['b'])

print(d1[3])

blue
[1, 2, 3, 4]
('al', 'pha', 'bet')


**Example:** one method for adding entries to the dictionary, and two different methods for deleting entries. Note that the `pop` method returns the deleted value of the dictionary.

In [54]:
d1['c'] = 'red' # Adding a dictionary entry.
print(d1)

del d1['a'] # Deleting a dictionary entry.
print(d1)

popped_value = d1.pop(3) # Deleting a dictionary entry using the `pop` method.
print('The deleted value is: {0}.'.format(popped_value))
print(d1)

{'a': 'blue', 'b': [1, 2, 3, 4], 3: ('al', 'pha', 'bet'), 'c': 'red'}
{'b': [1, 2, 3, 4], 3: ('al', 'pha', 'bet'), 'c': 'red'}
The deleted value is: ('al', 'pha', 'bet').
{'b': [1, 2, 3, 4], 'c': 'red'}


**Example:** one often encounters a situation with two equal-length lists (or tuples) that need to be combined into a single dictionary. One can use the built-in function `zip`, which creates a so-called iterator over 2-tuples. Iterators are subtle objects, but for now all you need to know is that you can turn them into an iterable object by (for example) applying `list` to them. Subsequently, the built-in function `dict` creates a dictionary.

In [55]:
list1 = ['a','b','c']
list2 = [1,2,3]

print(list(zip(list1,list2))) # Showing that the zip function creates a 'list' of tuples of length 2.

dict(zip(list1,list2))

[('a', 1), ('b', 2), ('c', 3)]


{'a': 1, 'b': 2, 'c': 3}

**Exercise 23:** use the above example to create two dictionaries. The first dictionary, called `digit_to_word`, has as keys the digits (integers) 0,1,...,9 and as values the words zero, one,...,nine. For the second dictionary, called `word_to_digit`, the words are the keys and the integers are the values. Print the resulting dictionaries.

In [56]:
list1 = ['zero','one','two','three','four','five','six','seven','eight','nine']
list2 = [0,1,2,3,4,5,6,7,8,9]

word_to_digit = dict(zip(list1,list2))
digit_to_word = dict(zip(list2,list1))

print(word_to_digit)
print(digit_to_word)

{'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}
{0: 'zero', 1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five', 6: 'six', 7: 'seven', 8: 'eight', 9: 'nine'}


## Sets

A set is an unordered list of unique Python objects.

**Example:** a set can be created in two ways. One can use curly braces or the built-in function `set`. Note that duplicates are deleted automatically.

In [57]:
set1 = {4,7,1,4,6} 
set2 = set([1,2,3,4,5,4,3,2,1])
set3 = set('abcde')

print(set1)
print(set2)
print(set3)

{1, 4, 6, 7}
{1, 2, 3, 4, 5}
{'c', 'd', 'e', 'a', 'b'}


**Example:** adding elements to the set.

In [58]:
set1.add(8)
set2.add(5) # Since 5 is already in set2, this operation does not change the set.

print(set1)
print(set2)

{1, 4, 6, 7, 8}
{1, 2, 3, 4, 5}


**Example:** basic operations from set theory are available to objects of the `set` type through methods.

In [59]:
print(set1.union(set2))

print(set1.intersection(set2))

set1.issubset(set2)

{1, 2, 3, 4, 5, 6, 7, 8}
{1, 4}


False

# 3. Control flow

Control flow statements are an essential part of Python. In this section we introduce to most important ones.

## `if-elif-else` statements

Python uses colons `:` to end a conditional statement and tabs for expressions that depend on the conditional statement.

**Exercise 24:** make sure you understand the `if-elif-else` structure below by running the code for different values of `x`. Also note the use of colons `:` and tabs.
* The `if` block is executed if and only if the first condition is true.
* The optional `elif` (short for 'else if') block is executed if and only if the first condition is false and the second one is true. There could be more than `elif` blocks.
* The optional `else` block is executed if and only if none of the others was. 

In [60]:
x = 3

if x < 0:
    print("You have entered a negative number.")
elif x > 9:
    print("You have entered a number greater than 9.")
else:
    print("Thank you. You entered %s." %x)

Thank you. You entered 3.


**Exercise 25:** use the variable `y` to define a Python object. If `y` is a list or a string, the code should proceed as follows: if `y` has length less than or equal to 5, print `y`; if `y` has length between 6 and 10, print the first 5 elements of `y`; if the list is longer than 10, print the sentence 'Wrong size.'. If `y` is neither a list or a string, print the sentence 'This is not a list or a string.'. Make sure your code processes all options correctly by explicitely testing for different `y`. (Hint: use the built-in function `isinstance` twice to test whether `y` is a list or a string.)

In [61]:
y = [6,8,3,'snake',0,2,'33',1.2,4,'python']
#y = 'kddffasfdfsdfafef'
#y = 'kddffasdf'
#y = 12
#y = 12345678901234567890
#y = ('python', 123, [1,2,3,4])

if isinstance(y,list) or isinstance(y,str):
    if len(y) <= 5:
        print(y)
    elif len(y)<=10:
        print(y[:5])
    else:
        print('Wrong size.')
else:
    print('This is not a list or a string.')

[6, 8, 3, 'snake', 0]


## `while` loops

`while` loops are similar to `if-else` statements, but jump back to the `while` statement after the `while` block is finished. There are two ways to exit a `while` loop: when the condition becomes false or with a `break` statement. In the first case, an optional `else` block can be executed.

**Example:** a simple `while` loop. Note that the integer 4 is not printed anymore.

In [62]:
i=0

while i < 4:
    print(i)
    i += 1

0
1
2
3


**Exercise 26:** run the code below, try different inputs and make sure you understand what the code does. Then, adjust the code such that it can deal with float inputs as well, instead of giving an error. If the input is a float number, the code should tell the user 'This is not an integer.' and start over again. (Hint: observe that the `input` function returns a string. If the string contains a number that is non-integer, acting with `int` on it gives an error. Use `float` instead of `int` and subsequently use the float method `is_integer` to check whether the float is a whole number or not.)

In [63]:
x = -1

while x < 0 or x > 9:
    x = float(input("Enter an integer between 0 and 9: ")) # The built-in input function enables realtime user input.
    if x.is_integer():
        if x < 0:
            print("You have entered a negative number.")
        elif x > 9:
            print("You have entered a number greater than 9.")
    else:
        x = -1
        print("This is not an integer.")
        
x = int(x)
print("Thank you. You entered %s." % x)

Enter an integer between 0 and 9: 3
Thank you. You entered 3.


**Example:** this is an alternative implementation, using `continue` (which skips the remainder of the loop and goes back to `while`) and `break`. The `break` only exits the innermost loop. Potential other loops are not exited.

In [64]:
while True:  # Change to True to run.
    x = float(input("Enter an integer between 0 and 9: "))
    if x.is_integer():
        if x < 0:
            print("You have entered a negative number.")
            continue  # Skip remainder of loop body and go back to `while`.
        if x > 9:
            print("You have entered a number greater than 9.")
            continue
    else:
        print("This is not an integer.")
        continue
    x = int(x)
    print("Thank you. You entered %s." % x)
    break  # Exit innermost enclosing loop.

Enter an integer between 0 and 9: 3
Thank you. You entered 3.


## `for` loops

A `for` loop can only iterate over iterable objects, or simply called iterables. Examples of iterables are lists, strings and tuples:

**Example:** a simple `for` loop iterating over a string, including a conditional `break`. In this example `letter` is called the loop variable. Every time the loop body is executed, the loop variable assumes the next value of the sequence.

In [65]:
for letter in "Python":
    if letter == 'o':
        break
    print(letter)

P
y
t
h


**Example:** `for` loops are typically used to execute a block of code a pre-specified number of times. Note that for a conditional block consisting of a single line, after the colon there is no need for an Enter and Tab. 

In [66]:
squares = [] # Creating an empty list, which will be filled in the for loop.

for i in [0,1,2,3,4,5,6,7,8,9]: squares.append(i**2)
    
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


**Example:** the built-in function `range` creates a lazy iterable object. It is lazy because the elements of the object are only created when they are called. Ranges can be turned into other iterable objects, like lists. Iterating over them in a `for`-loop can be done directly.

In [67]:
print(list(range(10))) # Turning a range into a list. 
print(tuple(range(10))) # Turning a range into a tuple.
print(set(range(10))) # Turning a range into a set.
print(list(range(3,10))) # Defining a range with an (optional) starting point and a (required) end point.
print(list(range(3,10,3))) # Defining a range with an (optional) step size.

for i in range(3,10,3): # Using a for loop to iterate over a range.
    print(i)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
[3, 4, 5, 6, 7, 8, 9]
[3, 6, 9]
3
6
9


**Exercise 27:** what is computed in the cell below?

In [68]:
n = 7
f = 1

for i in range(n):
    f *= i+1
    
print(f)

5040


Answer: 7 factorial.

**Exercise 28:** use the earlier defined `digit_to_word` dictionary to print all digits (0,1,...,9) whose corresponding word as defined in `digit_to_word` consists of exactly four letters. Print both the digit and the word. (Hint: a dictionary is already an iterable object. In a for loop the keys become the loop variable.)

In [71]:
digit_to_word

{0: 'zero',
 1: 'one',
 2: 'two',
 3: 'three',
 4: 'four',
 5: 'five',
 6: 'six',
 7: 'seven',
 8: 'eight',
 9: 'nine'}

In [72]:
for key in digit_to_word: # The keys of a dictionary are automatically the loop variable.
    if len(digit_to_word[key]) == 4:
        print(key, digit_to_word[key])

0 zero
4 four
5 five
9 nine


In [73]:
# A more elegant solution to the exercise above.
for key, value in digit_to_word.items():
    if len(value) == 4:
        print(key, value)

0 zero
4 four
5 five
9 nine


## List, set and dict comprehensions

Comprehensions are syntactical expressions for creating new lists, sets or dictionaries. They are used often, because they make a code concise and readable.

**Example:** a list comprehension is always of the same form, with an (optional) condition at the end. 

In [74]:
print([x**(1/2) for x in range(6)])

list1 = ['cat', (1,'dog'), 999, 'rabbit', str(22)]

print([x.upper() for x in list1 if isinstance(x, str)])  # List comprehension with a conditional expression.

[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979]
['CAT', 'RABBIT', '22']


**Exercise 29:** create a list with all powers of 3 between 100 and 100000 (including the boundaries of the interval).

In [75]:
[3**x for x in range(1, 15) if 3**x >= 100 and 3**x <= 100000] # The choices for 1 and 15 are a bit random, but save.

[243, 729, 2187, 6561, 19683, 59049]

**Exercise 30:** use list comprehension to calculate the average length of the words in `list2` defined below.

In [76]:
list2 = ['a','as','bat','car','dove','python','aligator']

sum([len(x) for x in list2])/len(list2)

3.857142857142857

**Exercise 31:** set comprehension has the same syntax as list comprehension (with the `[]` replaced by `{}`). Use set comprehension to get a list of occuring lengths of the words in `list2`.

In [77]:
list({len(x) for x in list2})

[1, 2, 3, 4, 6, 8]

**Exercise 32:** dict comprehension has the same syntax as set comprehension (obviously, the expression part will now have the form `key : value`). Create a dictionary with the words of `list2` as keys and the length of the words as values. Exclude words that have less than 3 characters. (Hint: see p. 67 of McKinney)

In [78]:
{x : len(x) for x in list2 if len(x) > 2}

{'bat': 3, 'car': 3, 'dove': 4, 'python': 6, 'aligator': 8}

# 4. Binding names and mutable/immutable objects

Expressions of the form `name1 = object1` assign names (`name1`) to Python objects (`object1`). The names are also called variables and instead of assigning we talk of binding. What happens when you bind a name to an object, is that the Python object is stored in the memory of your machine and the name/variable is a reference to the object stored in the memory. When you execute `name2 = name1`, then both names are referring to the same object in memory (there is no copy of the object created).

There is a fundamental difference between mutable object types and immutable object types. After binding a name to a mutable object (i.e. store this object in the memory of your machine), the object can still be changed while keeping the same location in the memory of your machine and keeping the same name that is referring to this object. For immutable objects, this is not possible. "Changing" an immutable object always means creating a new object in memory. The name will subsequently refer to the new object. Examples of mutable objects are lists, dicts and NumPy arrays (see below). Examples of immutable objects are ints, strings and tuples.

**Example:** run the cell below. In this example, the variables `a` and `b` are referring to the same object in the sense that they are associated with the same object in the memory of your machine. This is not the case for `c`. Notice that by appending 4 to `a`, one merely changes the already existing object that both `a` and `b` are referring to. One does not create a new object. Hence `b` is now referring to a list of length 4.

In [79]:
a = [1,2,3]
b = a
c = [1,2,3]
a.append(4)

print(a)
print(b)
print(c)

[1, 2, 3, 4]
[1, 2, 3, 4]
[1, 2, 3]


**Example:** run the cell below. Notice a fundamental difference with the previous example. By adding 1 to the object that `a` is referring to, one creates a new object in the memory of your machine. `a` now refers to the new object (an integer with value 5), while `b` still refers to the old object. Actually, `b` and `c` are referring to the same object: it is the unique integer type object with value 4. One can verify this by using the built-in function `id`, which gives the integer that is guaranteed to be unique for the object during its lifetime.

In [80]:
a = 4
b = a
c = 4

a += 1 # In this line, one creates a new object in the memory of the machine.

print(a)
print(b)
print(c)

print(id(a))
print(id(b))
print(id(c))

5
4
4
4369995008
4369994976
4369994976


**Exercise 33:** although a tuple is immutable, the elements of a tuple can be of mutable type. Verify this by appending the integer 4 to the list in the tuple `t1` defined below and printing `t1` afterwards.

In [81]:
t1 = (3.4, 'abc', [1,2,3], {'a' : 'Amsterdam', 'b' : 'Berlin'})

t1[2].append(4)

t1

(3.4, 'abc', [1, 2, 3, 4], {'a': 'Amsterdam', 'b': 'Berlin'})

## Clearing

It is also possible to delete an object from memory. The name(s) that was/were bound to that object, become(s) free again.

**Example:** one can clear memory by using `del`.

In [82]:
a = 'abc'
b = ['a','b','c']
c = 'a', 'b', 'c'

del b, c

print(a)
print(b)
print(c)

abc


NameError: name 'b' is not defined

# 5. Functions

Besides built-in functions and methods, Python also allows for user-defined functions. One of the main advantages of functions is that they make code better readable and amendable. Whenever you (potentially) need to execute a certain routine in your code more than once, it is probably useful to convey it in a function. In this course, we will heavily rely on user-defined functions.

## Defining Functions

User-defined functions are declared using the `def` keyword. The output of a function is declared by the `return` statement.

**Example:** functions can have zero, one, two, etc., arguments. The arguments are ordered. Here an example of a function with two arguments. In the first cell we define the function, while in the second cell we call it. Note the use of the colon and tabs.

In [83]:
def mypower(a, b):
    return a**b

In [84]:
mypower(3,2) # Calling the function

9

**Example:** a function can also have multiple output arguments. They are returned as a tuple.

In [85]:
def plusminus(a, b):
    """This is the docstring of the function plusminus."""
    return a+b, a-b

c, d = plusminus(1, 2)

c, d

(3, -1)

**Exercise 34:** run the following cell. Using the `?` gives you information about the object. You can apply the `?` to any Python object to retrieve information. Note the docstring that we created in the example above. Docstrings are a tool for creating a proper documentation of your code, which is indispensable as soon as your Python project grows and/or multiple people need to use the code.

In [86]:
plusminus?

**Example:** a function can have multiple `return` statements. As soon as a `return` statement is encountered, the function is exited. If no `return` statement is encountered, the function output is `None`.

In [87]:
def signtest(a):
    if a > 0:
        return 'Positive'
    if a < 0:
        return 'Negative'
    
print(signtest(-1.2))

print(signtest(0)) # Function output is None, since no return statement is encountered.

Negative
None


**Example:** instead of positional arguments, we can also pass keyword arguments (using the `=` sign). For keyword arguments, the order does not matter. But positional arguments always need to precede keyword argument. 

In [88]:
mypower(b=2, a=3) # For keyword arguments the order does not matter.

9

**Example:** if often happens that keyword arguments are used in the definition of a function. In that case they are used to specify default values for an argument.

In [89]:
def mypower(x, y=2):  # Positional (nonkeyword) arguments always precede keyword arguments.
    return x**y 

print(mypower(3))
print(mypower(3, 2))
print(mypower(3, y=2))
print(mypower(y=2, x=3))

9
9
9
9


## Calling scope

**Example:** variables/name defined inside functions are local (not visible in the calling scope).

In [90]:
def f():
    z = 1
    
f() # In this line, we execute a function with zero arguments.

z # We get an error, because z is not globally defined.

NameError: name 'z' is not defined

**Example:** Python uses a *calling convention* known as *call by object reference*. This means that any modifications a function makes to its (mutable) arguments are visible to the caller (i.e., outside the function).

In [91]:
x = [1]

def f(y):
    y[0] = 2
    
f(x) 

print('x = {}'.format(x))  # Note that x has been modified in the calling scope.

x = [2]


## Closures

Functions are *first class objects* in Python. This implies, inter alia, that functions can return other functions. Such functions are called *closures*, because they close around (capture) the local variables of the enclosing function. Later in the course, we will use closures numerous times.

**Example:** the function `makemultiplier` returns a function `multiplier` that multiplies a numeric input with a pre-specified factor.

In [92]:
def makemultiplier(factor):
    
    def multiplier(x):
        return x*factor
    
    return multiplier

timesfive = makemultiplier(5) # Creating a function that multiples times 5.

print(timesfive(3))
type(timesfive)

15


function

## Practicing with functions

**Exercise 35:** create a function called `digit_word_switch` that translates integer digits (0, 1, ..., 9) to the associated words ('zero', 'one, ..., 'nine') and vice versa. If the input is neither a valid digit, nor a valid word, the function should print the sentence 'Your input cannot be interpreted' and return `None`. To test your function, run the second cell below and inspect the output. (Hint: use the earlier defined dictionaries `digit_to_word` and `word_to_digit`).

In [93]:
def digit_word_switch(input_var):
    
    if not input_var in list(digit_to_word.keys()) + list(digit_to_word.values()):
        print('Your input cannot be interpreted.')
        return None
    
    if input_var in list(digit_to_word.keys()):
        return digit_to_word[input_var]
    else:
        return word_to_digit[input_var]

In [94]:
test_set = [0, 'four', -3, 'nine', 9, [2], 'Two',7,{2}, 'abc']

[digit_word_switch(i) for i in test_set]

Your input cannot be interpreted.
Your input cannot be interpreted.
Your input cannot be interpreted.
Your input cannot be interpreted.
Your input cannot be interpreted.


['zero', 4, None, 9, 'nine', None, None, 'seven', None, None]

**Example:** the function `error` computes the mean squared error ('mse') between two equal-length lists of numeric values. One can think of them as the observed data and the predicted data based on some model. 

In [95]:
observed_data = [1.0, 1.22, -2, 4.3, 9, 0.5, -1.1, -3.1, -1.2, 2.22, -4.3, 1, 11.5, 1] # Example data
predicted_data1 = [1.1, 1.2, 2.1, 4.1, 8, 1.2, -1.4, -3.2, -1.1, 2, -4.3, 1, 10.2, 1.001] # Example data
predicted_data2 = [1.1, '1.2', 2.1, 4.1, 8, 1.2, None, -3.2, -1.1, 2, -4.3, 1, 10.2, 1.001] # Example data

def error(observed_data, predicted_data, error_function='mse'):
    
    if len(observed_data) != len(predicted_data):
        print('Lengths of prediction and real data do not match.')
        return None
    
    n = len(observed_data)
    differences = [predicted_data[i]-observed_data[i] for i in range(n)]
    
    if error_function == 'mse':
        return sum([x**2 for x in differences])/n        

error(observed_data, predicted_data1) # Testing the function on the example data.

1.4427715000000003

**Exercise 36:** improve the function `error` such that:
* it checks whether all objects in the input lists are integer or float. If this is not the case, then it prints 'Input data is corrupted' and returns `None`. Check your adjustments by using `prediction_data2` as one of the inputs.
* depending on the keyword argument ``error_function=``, it can also calculate the root mean squared error ('rmse'), the mean absolute error ('mae') and the bias ('bias'). The latter is simply the average difference between the predicted and observed values.

In [96]:
def error(observed_data, predicted_data, error_function='mse'):
    
    if len(observed_data) != len(predicted_data):
        print('Lengths of prediction and real data do not match.')
        return None
    
    n = len(observed_data)
    
    if not all([isinstance(x, int) or isinstance(x, float) for x in observed_data + predicted_data]):
        print('Input data is corrupted.')
        return None
    
    differences = [predicted_data[i]-observed_data[i] for i in range(n)]
    
    if error_function == 'mse':
        return sum([x**2 for x in differences])/n        
    if error_function == 'rmse':
        return (sum([x**2 for x in differences])/n)**(1/2)        
    if error_function == 'mae':
        return sum([abs(x) for x in differences])/n        
    if error_function == 'bias':
        return sum(differences)/n        

print(error(observed_data, predicted_data1, error_function='bias'))
print(error(observed_data, predicted_data2, error_function='bias'))

0.13292857142857134
Input data is corrupted.
None


# 6. The NumPy library

## Modules

Apart from the built-in objects, functions, etc. of the Python Standard Library, packages and modules for additional functionality need to be imported. Most of the packages/modules relevant for this course come preinstalled with Anaconda.


**Example:** importing modules can be done with `import`. There are some conventions for the shorthands of some packages (e.g., `np` for `numpy`). Following them improves code readability. For the same reason, it is good practice to put your `import` statements at the beginning of your document (which we didn't do in this notebook).

In [97]:
import math  # Importing the math module of the Python Standard Library.
import numpy as np # Importing the NumPy module and giving it the conventional shorthand name np.

print(math.factorial(7)) # Calling functions from the math modules requires the math.-prefix.

# Note that the following functions are not the same functions, because they are defined in different modules.
print(2**(1/2)) # Using Python's basic arithmetic functionality
print(math.sqrt(2)) # Using the square root function of math
print(np.sqrt(2)) # Using the square root function of NumPy

5040
1.4142135623730951
1.4142135623730951
1.4142135623730951


**Exercise 37:** you can use *tab completion* to discover which functions are defined by the math module: type `math.` and press the Tab key. Alternatively, use dir(math). Try both options!

**Example:**  note that importing the package/module does not bring the functions into the *global namespace*: they need to be called as `module.function()`. It is possible to bring a function into the global namespace (see example below). Heavy use of this option is discouraged, because it can lead to confusion and/or conflicting functions names.

In [98]:
from math import factorial # Importing the math-function factorial into the global namespace

factorial(7)

5040

## Numpy's ndarray

Arguably the most important object of the NumPy package is its N-dimensional array object, or ndarray. Using them is efficient in two ways:
* they are easy/intuitive to use and very flexible, meaning that there is an implementation for most operations you can think of.
* they are computationally efficient (i.e. much faster than Python's standard objects, like lists and tuples).

**Example:** the NumPy function `array` converts the input data (e.g. list, tuple, array, or other sequence type) to an ndarray.

In [99]:
arr1 = np.array([1.0, 5.1, -8.9, 0.2])
arr2 = np.array([[1,2,3,4],[5,6,7,8]])

arr1

array([ 1. ,  5.1, -8.9,  0.2])

In [100]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

**Example:** the objects `arr1` and `arr2` are of ndarray type.

In [101]:
print(isinstance(arr1, np.ndarray))

type(arr2)

True


numpy.ndarray

**Example:** all ndarrays have attributes dimension (`ndim`), shape (`shape`) and type of data that it contains (`dtype`). 

In [102]:
print(arr1.ndim)
print(arr1.shape)
print(arr1.dtype)

print(arr2.ndim)
print(arr2.shape)
print(arr2.dtype)

1
(4,)
float64
2
(2, 4)
int64


**Example:** an ndarray is a container for homogeneous data, meaning that all data must be of the same type. It is important to keep this in mind, because Python will automatically transform data when uploaded into an ndarray. Using the `dtype` keyword in the `array` function, you can overrule this.

In [103]:
np.array([1, 2, 3, 4.0])

array([1., 2., 3., 4.])

In [104]:
np.array([1, 2, 3, 4.0], dtype=int)

array([1, 2, 3, 4])

In [105]:
np.array([1, 2, 3, '4'])

array(['1', '2', '3', '4'], dtype='<U21')

In [106]:
np.array([1, 2, 3, '4'], dtype=np.float64) # float64 is a NumPy object of double-precision floating-point format

array([1., 2., 3., 4.])

**Example:** typical other ways of creating an ndarray.

In [107]:
print(np.zeros((2,3))) # 2x3 ndarray of zeros (floats).
print(np.ones((1,8))) # 1x8 ndarry of ones (floats).

print(np.identity(5)) # 5x5 identity matrix (floats).

print(np.arange(1, 15, 2)) # Using the arange function with (start, stop[, step]).

arr3 = np.random.randn(2,4) # Using NumPy's random number generator to create a 2x4 ndarray.

print(arr3)

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1. 1. 1. 1. 1.]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
[ 1  3  5  7  9 11 13]
[[ 0.94768864 -1.96257996  0.33676948 -0.82159906]
 [-0.51676293 -1.27662895 -0.71388906 -0.43036232]]


**Exercise 38:** in contrast to Python's list objects, arithmetic operations on an ndarray are executed element-by-element. Use this to create and print the following ndarrays:
* element-by-element sum of `arr2` and `arr3`.
* element-by-element product of `arr2` and `arr3`.
* elements of `arr3` multiplied by a factor of 10.
* elements of `arr2` to the power 4.
* one divided by the elements of `arr2`.

In [108]:
print(arr2 + arr3)

print (arr2*arr3)

print(arr3*10)

print(arr2**4)

print(1/arr2)

[[1.94768864 0.03742004 3.33676948 3.17840094]
 [4.48323707 4.72337105 6.28611094 7.56963768]]
[[ 0.94768864 -3.92515991  1.01030843 -3.28639624]
 [-2.58381463 -7.65977368 -4.99722343 -3.44289859]]
[[  9.47688638 -19.62579957   3.36769478  -8.21599059]
 [ -5.16762925 -12.76628947  -7.13889061  -4.30362323]]
[[   1   16   81  256]
 [ 625 1296 2401 4096]]
[[1.         0.5        0.33333333 0.25      ]
 [0.2        0.16666667 0.14285714 0.125     ]]


## Indexing and slicing

The basics of selecting elements or slices of an ndarray are the same as for other sequence types. Most of the time (the exception is so-called "fancy slicing"), the extra dimensions are separated by a comma between the brackets `[]`. For the exercises in this subsection, you may want to use pp. 94-105 of McKinney.

In [109]:
arr1D = np.arange(1,11)
arr2D = np.arange(1,25).reshape(4,6)
arr3D = np.arange(1,13).reshape(2,2,3)

print('arr1D:\n {0}'.format(arr1D))
print('arr2D:\n {0}'.format(arr2D))
print('arr3D:\n {0}'.format(arr3D))

arr1D:
 [ 1  2  3  4  5  6  7  8  9 10]
arr2D:
 [[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]
 [19 20 21 22 23 24]]
arr3D:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


**Exercise 39:** use the arrays `arr1D`, `arr2D` and `arr3D` defined above to print the following objects.
* for `arr1D`: the fifth element.
* for `arr1D`: the array containing (only) the fifth element.
* for `arr1D`: the array containing the third until the nineth element.
* for `arr1D`: the array containing the first, second and seventh element. (see p.103 of McKinney)


* for `arr2D`: the element in the second row and third column.
* for `arr2D`: the array that is the fourth row.
* for `arr2D`: the array containing the second and the third row.
* for `arr2D`: the array containing the first and third row.
* for `arr2D`: the array containing the first and second row, but only the third until the fifth column thereof.
* for `arr2D`: the array containing the fifth and sixth column.
* for `arr2D`: the array containing the fifth and sixth column, which are represented as rows. (Hint: use `tranpose()`, or simply `T`)
* for `arr2D`: the 1-dimensional array containing the elements whose values are 1, 22, 12 and 3. (i.e., the output should be `[1,22,12,3]`) (see p.103 of McKinney, use fancy indexing)
* for `arr2D`: the array containing the first, second and fourth row, but only the fourth and second column thereof (in that order). (see p.103 of McKinney)


* for `arr3D`: the first 2-dimensional array of `arr3D`.
* for `arr3D`: the 2-dimensional array that looks like `[[5,6],[11,12]]`.
* for `arr3D`: the 1-dimensional array containing the elements whose values are 1, 5 and 9 (in that order). (see p.103 of McKinney, use fancy indexing)

In [110]:
print(arr1D[6])
print(arr1D[6:7])
print(arr1D[2:9])
print(arr1D[[0,1,6]])

print(arr2D[1,2])
print(arr2D[3])
print(arr2D[1:3])
print(arr2D[[0,2]])
print(arr2D[:2,2:5])
print(arr2D[:,4:6])
print(arr2D[:,4:6].T)
print(arr2D[[0,3,1,0],[0,3,5,2]])
print(arr2D[[0,1,3]][:,[3,1]])

print(arr3D[0])
print(arr3D[:,1,1:])
print(arr3D[[0,0,1],[0,1,0],[0,1,2]])

7
[7]
[3 4 5 6 7 8 9]
[1 2 7]
9
[19 20 21 22 23 24]
[[ 7  8  9 10 11 12]
 [13 14 15 16 17 18]]
[[ 1  2  3  4  5  6]
 [13 14 15 16 17 18]]
[[ 3  4  5]
 [ 9 10 11]]
[[ 5  6]
 [11 12]
 [17 18]
 [23 24]]
[[ 5 11 17 23]
 [ 6 12 18 24]]
[ 1 22 12  3]
[[ 4  2]
 [10  8]
 [22 20]]
[[1 2 3]
 [4 5 6]]
[[ 5  6]
 [11 12]]
[1 5 9]


**Example:** assigning a single value to a slice of an ndarray means propagation (also called: "broadcasted") to the entire selection.

In [111]:
arr = np.arange(10)

arr[3:6] = [10, 10, 10]
print(arr)

arr[3:6] = 12 # Broadcasting to entire selection
print(arr)

[ 0  1  2 10 10 10  6  7  8  9]
[ 0  1  2 12 12 12  6  7  8  9]


**Example:** there is a subtle difference between slicing an ndarray and slicing other sequence types (like a list). A slice of an array is a "view" on the original slice. This means that the data is not copied, and any modifications to the view will be reflected in the original array (and vice versa). This is in contrast with a list object for example, in which case a slice is a copy of the original data and hence a new object in the memory of your machine. (see McKinney pp. 94-95 for more information)

In the example below we take slices of a list and an ndarray, change the originals and inspect what happens to the slices. For completeness, we also included the corresponding elements (as opposed to slices). Elements are always copies, regardless of the sequence type.

N.B. If you want a slice to be a copy of the data instead of a view, you can use the ndarray method `copy`.

In [112]:
list0 = [0,1,2,3]
array0 = np.array(list0)

list_element = list0[2]
array_element = array0[2]
list_slice = list0[2:3]
array_slice = array0[2:3]

list0[2] += 10
array0[2] += 10

print('list: {0}'.format(list0))
print('array: {0}'.format(array0))
print('list_element: {0}'.format(list_element))
print('array_element: {0}'.format(array_element))
print('list_slice: {0}'.format(list_slice))
print('array_slice: {0}'.format(array_slice))

list: [0, 1, 12, 3]
array: [ 0  1 12  3]
list_element: 2
array_element: 2
list_slice: [2]
array_slice: [12]


In [116]:
list0

[0, 1, 12, 3]

## Boolean indexing

**Example:** suppose 4 people participate three times in a series of 5 experiments. The names of the participants (in order of participation) are in `names`, their test results are in `scores`. Putting an ndarray in a logical statement produces an array with boolean values. Subsequently, this can be used to manipulate (e.g. filter) other ndarray objects.

In [117]:
names = np.array(['Bob', 'Jane', 'Will','Bob','Mary','Mary','Mary','Jane','Jane','Will','Will','Bob'])

scores = np.random.randn(12,5)

print(scores, '\n')

print(names == 'Bob') # An array in a logical statement produces an array with boolean values.

scores[names == 'Bob'] # Selecting the test results of Bob.

[[-1.10177079 -0.55125814 -0.01047893 -1.31361691 -2.2368869 ]
 [ 0.18662465  0.63174021 -1.077271    2.00495003 -0.37815972]
 [ 1.52824211  0.86853807  0.70308301  0.24893945  0.48924303]
 [-0.57438893  0.31710735 -1.11439142 -1.2005542  -0.86317797]
 [-0.7315687  -1.15492811  0.60299109 -0.06701894 -0.82897168]
 [-1.11472649 -0.21447941  1.29382017  1.38507831 -0.66506746]
 [-0.50781158 -0.75948523  0.01433336  0.6313552   0.86610849]
 [ 0.59750092 -0.35919801  0.37153421 -0.46653961  0.02362334]
 [ 0.3093329   1.46629232 -1.15790123  0.53418483  0.28394924]
 [ 0.18646137 -1.29354766 -0.98269984  0.31723934  0.14838902]
 [-0.84022123  0.29478842  0.47702701  0.47613003 -2.14579293]
 [ 0.83219879 -0.12175142 -0.47450194  0.86121623 -0.42754763]] 

[ True False False  True False False False False False False False  True]


array([[-1.10177079, -0.55125814, -0.01047893, -1.31361691, -2.2368869 ],
       [-0.57438893,  0.31710735, -1.11439142, -1.2005542 , -0.86317797],
       [ 0.83219879, -0.12175142, -0.47450194,  0.86121623, -0.42754763]])

**Example**: selecting the test results of Bob of only the last two experiments of each series. Note that the slicing of the columns is performed in the same way as before.

In [118]:
scores[names == 'Bob', 3:]

array([[-1.31361691, -2.2368869 ],
       [-1.2005542 , -0.86317797],
       [ 0.86121623, -0.42754763]])

**Exercise 40**: create an array with the test results of Jane and Mary. (Hint: use `|` instead of `or` to combine NumPy arrays with an OR-statement.)

In [119]:
scores[(names == 'Jane') | (names == 'Mary')]

array([[ 0.18662465,  0.63174021, -1.077271  ,  2.00495003, -0.37815972],
       [-0.7315687 , -1.15492811,  0.60299109, -0.06701894, -0.82897168],
       [-1.11472649, -0.21447941,  1.29382017,  1.38507831, -0.66506746],
       [-0.50781158, -0.75948523,  0.01433336,  0.6313552 ,  0.86610849],
       [ 0.59750092, -0.35919801,  0.37153421, -0.46653961,  0.02362334],
       [ 0.3093329 ,  1.46629232, -1.15790123,  0.53418483,  0.28394924]])

**Exercise 41**: it turns out that something went wrong during Will's experiments. Replace his values by 0.0 (a float) and print the resulting `scores` ndarray. (Hint: use broadcasting)

In [120]:
scores[names == 'Will'] = 0.0

print(scores)

[[-1.10177079 -0.55125814 -0.01047893 -1.31361691 -2.2368869 ]
 [ 0.18662465  0.63174021 -1.077271    2.00495003 -0.37815972]
 [ 0.          0.          0.          0.          0.        ]
 [-0.57438893  0.31710735 -1.11439142 -1.2005542  -0.86317797]
 [-0.7315687  -1.15492811  0.60299109 -0.06701894 -0.82897168]
 [-1.11472649 -0.21447941  1.29382017  1.38507831 -0.66506746]
 [-0.50781158 -0.75948523  0.01433336  0.6313552   0.86610849]
 [ 0.59750092 -0.35919801  0.37153421 -0.46653961  0.02362334]
 [ 0.3093329   1.46629232 -1.15790123  0.53418483  0.28394924]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.        ]
 [ 0.83219879 -0.12175142 -0.47450194  0.86121623 -0.42754763]]


**Exercise 42**: instead of putting Will's scores to 0.0, it makes more sense to remove Will's data completely. Use boolean indexing to remove Will's name from `names` and his test results from `scores`. Make sure the variables `scores` and `names` refer to the newly created Python objects (without Will's data in it). 

In [121]:
scores = scores[names != 'Will']

names = names[names != 'Will']

**Example:** NumPy's `where` function is the ndarray-version of the expression 'x if condition else y' (see pp.109-111 of McKinney). For NumPy arrays, this becomes `np.where(condition, xarr, yarr)`. It is often very useful. For example, replacing all negative test results by their positive counterparts:

In [122]:
np.where(scores >= 0, scores, -scores)

array([[1.10177079, 0.55125814, 0.01047893, 1.31361691, 2.2368869 ],
       [0.18662465, 0.63174021, 1.077271  , 2.00495003, 0.37815972],
       [0.57438893, 0.31710735, 1.11439142, 1.2005542 , 0.86317797],
       [0.7315687 , 1.15492811, 0.60299109, 0.06701894, 0.82897168],
       [1.11472649, 0.21447941, 1.29382017, 1.38507831, 0.66506746],
       [0.50781158, 0.75948523, 0.01433336, 0.6313552 , 0.86610849],
       [0.59750092, 0.35919801, 0.37153421, 0.46653961, 0.02362334],
       [0.3093329 , 1.46629232, 1.15790123, 0.53418483, 0.28394924],
       [0.83219879, 0.12175142, 0.47450194, 0.86121623, 0.42754763]])

**Exercise 43**: in the cell below two 2-dimensional arrays are defined. Create a new ndarray of the same dimensions. For each element in the new object, choose between the corresponding elements of `arr1` and `arr2` and select the one with the largest absolute value.

In [123]:
arr1 = np.random.randn(3,5)
arr2 = np.random.randn(3,5)

np.where(np.abs(arr2) > np.abs(arr1), arr2, arr1)

array([[-1.356709  , -1.580627  ,  0.2815455 , -1.08171872,  0.78412611],
       [ 1.26984242,  1.23671774, -0.64659995,  0.5312546 , -1.42375614],
       [-1.70112229,  0.13919316,  2.17867432, -0.12129034, -0.31524272]])

**Exercise 44**: in the cell below a third 2-dimensional array is defined. As in the previous exercise, create a new ndarray by choosing between elements of `arr1` and `arr2`. But now `arr3` decides, element-by-element, whether you should pick the element with the largest or smallest absolute value. If the element in `arr3` is positive, select the element with the largest absolute value. If the element in `arr3` is negative, pick the element with the smallest absolute value.

In [124]:
arr3 = np.random.randn(3,5)

np.where(arr3 > 0, np.where(arr2 > arr1, arr2, arr1), np.where(arr2 < arr1, arr2, arr1))

array([[-1.356709  ,  0.8034287 ,  0.2815455 ,  0.87148533,  0.31593656],
       [ 1.26984242, -0.64296147, -0.64659995,  0.5312546 ,  0.6708724 ],
       [-1.70112229, -0.07835931,  0.09600079, -0.12129034,  0.05719836]])

## Universal functions

A NumPy universal function (or "ufunc") performs element-by-element operations on data in ndarray's.

**Example:** calculating the exponential of the elements of `arr`. Note that one of the elements is NumPy's NaN (Not a Number). This is a float, representing the absence of a value. A NumPy ufunc does not give an error if it encounters NaN. Instead, it simply propagates the value as missing

In [125]:
arr = np.array([1.0, -.5, 2.0, 5.9,-2.0, 0.4, -3.1,4.7])
arr[np.random.randint(1,8)] = np.nan

print('arr: {}'.format(arr))

print('exponentials: {0}'.format(np.exp(arr)))

arr: [ 1.  -0.5  2.   nan -2.   0.4 -3.1  4.7]
exponentials: [2.71828183e+00 6.06530660e-01 7.38905610e+00            nan
 1.35335283e-01 1.49182470e+00 4.50492024e-02 1.09947172e+02]


**Exercise 45:** replace the NaN entry in `arr` by 0.0 and print the resulting array. (Hint: use the universal function `isnan`)

In [126]:
arr[np.isnan(arr)] = 0.0

arr

array([ 1. , -0.5,  2. ,  0. , -2. ,  0.4, -3.1,  4.7])

**Example:** creating a boolean ndarray that shows for which experiments Jane scored higher than Mary. `greater` is an example of a binary universal function, because it performs an operation on two input arrays element-by-element.

In [127]:
np.greater(scores[names == 'Jane'], scores[names == 'Mary'])

array([[ True,  True, False,  True,  True],
       [ True, False, False, False,  True],
       [ True,  True, False, False, False]])

## Reductions

Aggregations (often called reductions) are mathematical functions that compute statistics about an entire array, or about data along an axis of an array.

**Example:** two equivalent syntactical expressions to call a reduction. Let's go back to the three participants in `names` (Bob, Jane and Mary) and their test results in `scores`. Here, we compute the mean score for all experiments.

In [128]:
print(np.mean(scores))

print(scores.mean())

-0.13910023857368217
-0.13910023857368217


**Example:** with the keyword `axis` we specify the axis over which the aggregation should take place. Here, we compute the total of all scores per experiment.

In [129]:
np.sum(scores, axis=0)

array([-2.10460923, -0.74596044, -1.5518657 ,  2.36905494, -4.2261303 ])

**Example:** the total of all scores per series of 5 experiments.

In [130]:
np.sum(scores, axis=1)

array([-5.21401168,  1.36788417, -3.43540518, -2.17949635,  0.68462512,
        0.24450024,  0.16692085,  1.43585806,  0.66961403])

**Exercise 46:** for each series of 5 experiments, one can calculate the standard deviation of the test results using the NumPy function `std`. Use this to create a dictionary, with as keys the names of the participants (Bob, Jane and Mary in `names`) and as values the average of the 3 standard deviations of the 3 series of experiments of each participant. (Hint: using a dict comprehension, you only need one line of code.)

In [131]:
{name : np.mean(scores.std(axis=1)[names == name]) for name in set(names)}

{'Mary': 0.7596814466947043,
 'Bob': 0.630538614823004,
 'Jane': 0.762703226089366}

**Exercise 47:** reductions interprete boolean values in arrays as 1 (`True`) and 0 (`False`). Use this to count for how many of the 5 experiments Jane had the highest maximum score.

In [132]:
np.sum(scores[names == 'Jane'].max(axis=0) > scores[names != 'Jane'].max(axis=0))

2

## Linear algebra

NumPy contains standard operations for linear algebra. Most of them are located in the submodule `linalg`. Note that one cannot use `*` for multiplication between matrices and/or vectors, since NumPy has reserved this symbol for element-by-element multiplication. Instead, one can use the NumPy-function `dot`.

**Example**: using the NumPy-function `dot` to perform matrix-vector multiplications between a matrix $A$ and a vector $b$.

In [133]:
A = np.random.randint(1,10, size=9).reshape(3,3)
b = np.arange(1,4)

print('matrix A:\n {0}'.format(A) + '\n')
print('vector b:\n {0}'.format(b) + '\n')

print(A.dot(b)) # Computing A*b. Equivalently, one can write "np.dot(A,b)".

print(np.dot(b, A)) # Computing b^T*A. Equivalently, one can write "b.dot(A)".

matrix A:
 [[5 7 8]
 [1 2 8]
 [7 4 3]]

vector b:
 [1 2 3]

[43 29 24]
[28 23 33]


**Example**: instead of using `dot`, one can use the abbreviated ``@``-notation:

In [134]:
print(A @ b)
print(b @ A)

[43 29 24]
[28 23 33]


**Example:** one can use the NumPy function `outer` to compute the outer product of a column vector with a row vector, resulting in a 2-dimensional ndarray.

In [135]:
c = np.arange(5,7)

print(np.outer(b, c))
print('')
print(np.outer(c, b)) # Changing the order is equivalent to transposing the resulting matrix.

[[ 5  6]
 [10 12]
 [15 18]]

[[ 5 10 15]
 [ 6 12 18]]


**Exercise 48:** the `numpy.linalg` module has a standard set of matrix decompositions and functions calculating things like inverse, trace and determinant. Solve the set of linear equations $A x = b$ for the vector $x$ in two ways:
* use the `linalg`-function `inv()` to calculate the inverse and perform matrix multiplication.
* use the `linalg`-function `solve().`

Verify the solutions are the same (a neat way of verifying this would be to use NumPy's binary boolean reduction `allclose()`).

In [136]:
sol1 = np.linalg.inv(A).dot(b)

sol2 = np.linalg.solve(A, b)

print('x = {0}'.format(sol1))

np.allclose(sol1,sol2)

x = [ 0.72049689 -0.77639752  0.35403727]


True

**Exercise 49:** write a function called `adjoint` that computes the adjoint of a square matrix and returns it as an ndarray. The (i,j)-th element of the adjoint of a matrix $B$ can be computed with $$(\text{adj}(B))_{ij} = (-1)^{i+j} M_{ji},$$ where $M_{ji}$ is the determinant of the matrix that you get when you remove row $j$ and column $i$ from $B$ (called the $(j,i)$-minor of $B$).

(Hint: this is a hard exercise with many valid solutions. One valid solution uses a nested list comprehension to create the adjoint matrix as a list object, that subsequently can be turned into an ndarray. To keep your code readable, you may want to use a nested function (see below for an explanation) to compute the minors.)

In [137]:
def adjoint(matrix):
    
    dim_matrix = len(matrix) # Get the dimension of the square matrix.
    
    def minor(mat, row, column):
        return np.linalg.det(np.delete(np.delete(mat, row, axis=0), column, axis=1))
    
    adj_list = [[(-1)**(i+j)*minor(matrix, j, i) for j in range(dim_matrix)] for i  in range(dim_matrix)]
    
    return np.array(adj_list)

adjoint(A)

array([[-26.,  11.,  40.],
       [ 53., -41., -32.],
       [-10.,  29.,   3.]])

**Exercise 50:** run the cell below to test your `adjoint` function for the matrix $A$, making use of the general identity 
$$A^{-1} = \frac{1}{\text{det}(A)}\text{adj}(A).$$ The cell output should be `True`.

In [138]:
np.allclose(np.linalg.inv(A), adjoint(A)/np.linalg.det(A))

True

# [OPTIONAL] Advanced material about functions

The concepts below are not necessary for this course, but could be useful.

## Nested Functions

Functions can be defined inside other functions. They will only be visible to the enclosing function. Nested functions can see variables defined in the enclosing function.

In [139]:
def mypower(x, y):
    
    def helper():     # No need to pass in x and y:
        return x**y   # The nested function can see them!   
    
    a = helper()
    return a


mypower(2, 3)

8

## Splatting and Slurping

Splatting: passing the elements of a sequence into a function as positional arguments, one by one.

In [140]:
def mypower(x, y): 
    return x**y 

args = [2, 3]  # A list or a tuple
mypower(*args)  # Splat (unpack) args into mypower as positional arguments.

8

We can splat keyword arguments too, but we need to use a `dict` (key-value store).

In [141]:
kwargs = {'y': 3, 'x': 2}

mypower(**kwargs)  # Splat keyword arguments

8

Slurping allows us to create *vararg* functions: functions that can be called with any number of positional and/or keyword arguments. In the example below, the asterisk means "collect all (remaining) positional arguments into a tuple". The double asterisk means "collect all (remaining) keyword arguments into a dict".

In [142]:
def myfunc(*myargs, **mykwargs):
    
    for (i, a) in enumerate(myargs): 
        print("The %sth positional argument was %s." % (i, a))
    for a in mykwargs: 
        print("Got keyword argument %s=%s." % (a, mykwargs[a]))    
        
myfunc(0, 1, x=2, y=3)

The 0th positional argument was 0.
The 1th positional argument was 1.
Got keyword argument x=2.
Got keyword argument y=3.


## Anonymous (lambda) functions

Anonymous functions are functions without a name attribute and whose function body is a single expression. They are often useful for functions that are needed only once (e.g., to return from a function, or to pass to a function).

To illustrate this, the closure example can be written as:

In [143]:
def makemultiplier(factor):

    return lambda x: x*factor

timesfive = makemultiplier(5)
timesfive(3)

15

Suppose you want to order a list of strings by the number of distinct letters in each string, with the largest number of distinct letters first:

In [144]:
cities = ["amsterdam", "tokio", "honolulu", "york", "paris"]

cities.sort(key=lambda x: len(set(list(x))), reverse=True)

cities

['amsterdam', 'honolulu', 'paris', 'tokio', 'york']

## Applying a function element-wise to a list

If you want to apply a function element-wise to the objects of a list, one can use the built-in `map` function of the Python Standard Library.

In [145]:
list(map(makemultiplier(3), range(10)))

[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]

In [146]:
print(list(map(lambda x: x**(1/2), range(0,101,10))))

[0.0, 3.1622776601683795, 4.47213595499958, 5.477225575051661, 6.324555320336759, 7.0710678118654755, 7.745966692414834, 8.366600265340756, 8.94427190999916, 9.486832980505138, 10.0]
