# Introduction to Python - Basics

In [1]:
# Author: Alex Schmitt (schmitt@ifo.de)

import datetime
print('Last update: ' + str(datetime.datetime.today()))

Last update: 2017-04-05 11:49:35.467006


"*Some readers may disagree, but to me **computers and mathematics are like beer and potato chips: two fine tastes that are best enjoyed together**. Mathematics provide the foundations of our models and of the algorithms we use to solve them. Computers are the engines that run these algorithms. They are also invaluable for simulation and visualization. Simulation and visualization build intuition, and intuition completes the loop by feeding into better mathematics*" (Stachurski, *Economic Dynamics*, 2009)

This is a primer on the Python programming language. It is based on, and draws heavily upon, the excellent introduction to Python on the Quant-Econ website. To get there, execute the following cell (press *Shift + Enter* or click the *Play* button in the toolbar above).

In [2]:
import webbrowser
# Test
# store the URL as a string
url = 'http://quant-econ.net/py/index.html'
webbrowser.open(url)

True

## Documentation

Documentation for the Python 3 standard library can be found here:

In [3]:
url = 'https://docs.python.org/3/library/'
webbrowser.open(url)

True

Documentation for external packages such as Numpy or Matplotlib is separate, but can be found easily by googling the name of the package. In general, most (if not all) problems you may run into when programming in Python have already been encountered by someone else, so Google should be the first place to go when you are stuck somewhere. 

In case you want to apply and practice your Python skills in other areas, MOOC (*massive open online courses*) sites like Coursera or Udacity have great free-of-charge courses on Python, both for beginners and more advanced programmers. 

## Jupyter Notebook
This environment is a *Jupyter notebook*. It provides a *browser-based* interface to Python, allowing you to run all your Python code and see its output in a browser window. What's more, you can add text cells like this one, and even include mathematical formulas (as you will see below) written in LaTeX syntax. You can also incorporate images. This makes a notebook a great tool not only for *writing and documenting code*, but also for teaching. That's why we will make heavy use of Jupyter notebooks in this course.

Moreover, note that you don't even need to have Python and Jupyter installed on your computer to read the notebooks that I have created for this course. Jupyter notebooks that are stored on GitHub can be viewed through the website nbviewer.jupyter.org. Run the following piece of code to see what notebooks are available for this course (if it does not work, make sure to first run the code above that imports the webbrowser module):

In [4]:
url = 'http://nbviewer.jupyter.org/github/Moony2D/Python-Intro-Ifo'
webbrowser.open(url)

True

To start a Jupyter notebook, open up a terminal and type "jupyter notebook". This should open up a new window in your default browser (recommended: Chrome or Firefox). You should see the Jupyter dashboard, which contains a list of all files in your current directory. If there is a Jupyter notebook in your directory (with the file extension .ipynb), you can open it by clicking on it. Otherwise, you can open an empty notebook by clicking on *New* at the top right and selecting Python 3.

As you may have already realized, a Jupyter notebook has two types of cells. A cell can be made into a *text cell* like this one by choosing "Markdown" in the drop-down menu in the toolbar above (*Markdown* is a type of text format). A new cell is by default a *code cell*. In contrast to a text cell, it has some blue writing ("In [ ]:") to the left of it. Once you run it, a number appears in the brackets. You can run any cell by either clicking the *Play* button in the toolbar above or by pressing *Shift + Enter*. Running a cell processes its content: for a text cell, it just formats the text; for a code cell, it executes the code and prints the output below. Jupyter then either creates a new code cell below or (if one is already there) jumps to the next cell.

## "Vanilla Python"

The core package (or "Vanilla Python") contains the Python Standard Library, a collection of many basic *built-in* modules and functions. In other words, it comprises all the functionalities in Python that you can use without installing any external packages (more on that in the next lecture).

Functions in Python are used by calling their name, followed by their argument(s) in parentheses. A frequently used built-in function is **print()**. As the name indicates, it displays output on screen; in Jupyter, the output appears below the code cell:

In [5]:
print("Hello ifo")

Hello ifo


In [6]:
print(2 + 2)

4


Note that Jupyter also displays output from the last line in a code cell. Compare the following examples:

In [7]:
"Hello"
print("Hello Westeros")
"Westeros"

Hello Westeros


'Westeros'

In [8]:
1 + 1
print(2 + 2)
3 + 3

4


6

As a general rule, use **print** whenever you want to see some output shown on screen.

## Assigning a name to an object

Since we don't just want to use Python as a glorified calculator that prints calculations to the screen, we typically work with *variables* when using programming languages. A variable in Python is essentially a *name* or a *label* that refers to an *object*. An object in Python is a collection of data stored in computer memory that consists of
- a type
- some content
- a unique identity
- (zero or more methods)

To be more concrete, let's look at an example:

In [9]:
S = "Hello ifo"

In this statement, we assign the name **S** to the object **"Hello ifo"**. The type of this object is *string*, that is, a sequence of characters. The content of the object is a sequence of nine characters (note that the space is also a character). Its identity is just an internal index that Python uses to access the object in computer memory. It can be checked using the **id** function:

In [10]:
print(id(S))

4420002544
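
As a small sketch of the three ingredients listed above, the following cell inspects the type and the content of the same string. The names `kind` and `n_chars` are just labels I made up for this illustration; **type()** and **len()** are built-in functions that are discussed in more detail further below.

```python
S = "Hello ifo"

kind = type(S)      # the type of the object that S refers to
n_chars = len(S)    # the length of its content, counting the space

print(kind)         # <class 'str'>
print(n_chars)      # 9
```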


Consider another example. Below I assign the name **A** to the *integer* 2. Whenever I call **A** later on, it will refer to this object.

In [11]:
A = 2
print(A)

2


Internally, Python uses some type of registry, where it keeps track of the names we have defined and the objects they point to. Note that more than one name can point to the same object. In the following, I assign the name **B** to the object that is already referred to by the name **A**. Hence, calling **B** prints out the same value; moreover, we can use the **id** function to verify that they really refer to the same object:

In [12]:
B = A
print(B)
print(id(A))
print(id(B))

2
4375677504
4375677504
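
Comparing the two **id** values by eye works, but Python also has the **is** operator, which checks directly whether two names refer to the same object. A minimal sketch (the names `same_object` and `same_id` are only used for illustration):

```python
A = 2
B = A

same_object = A is B          # True: both names refer to one object
same_id = id(A) == id(B)      # the same check, written with id()

print(same_object)
print(same_id)
```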


It is straightforward to reassign a name to another object, as seen below. **A** now refers to a different object (the integer 3), while **B** still points to the same object as before (the integer 2):

In [13]:
A = 3
print(A)
print(B)
print(id(A))
print(id(B))

3
2
4375677536
4375677504


Finally, note that you can assign names not only to integers and strings, but to various types of objects. The next section will go through the most important ones.

## Object Types

The most important data types in Vanilla Python are:
- integers ('int') and floats ('float') for numbers
- strings ('str') for text
- booleans, which can have two values, *True* or *False*
- arrays or containers, such as lists, sets, and dictionaries

In addition, external packages (such as Pandas and NumPy, which we will use later on) often use their own object types.

To check the type of an object, you can use the **type()** function: 

In [14]:
a = 2
print(type(a))

<class 'int'>


The type of an object matters for what operations can be used with that type. If you try to use an operation on a type for which it is not defined, Python will return an error message. For example, you can use the standard arithmetic operations (+, -, *, /) on integers and floats. Trying to use division on strings, however, will not work. Moreover, some operations do different things for different types. Adding two numbers returns the sum, while adding two strings concatenates them. 
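
To see what such an error looks like without stopping the notebook, here is a minimal sketch. It uses a **try/except** block, which is not covered in this lecture; I only use it here to catch the error and print its message. The names `result`, `message`, `total` and `joined` are made up for this example.

```python
# division is not defined for two strings, so Python raises a TypeError
try:
    result = "T" / "yrion"
except TypeError as err:
    result = None
    message = str(err)
    print("TypeError:", message)

# the + operator, by contrast, is defined for both numbers and strings
total = 2 + 1.5             # numbers: returns the sum
joined = "T" + "yrion"      # strings: returns the concatenation
print(total)                # 3.5
print(joined)               # Tyrion
```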

### Integers and Floats

In [15]:
a = 2
b = 1.5
print(type(a))
print(type(b))

# you can use the standard arithmetic operations on integers and floats and assign a new name to the result
c = a + b
print(c)
print(a * b)
print(a / b)
# to take b to the power of a, use '**'
print(b**a)

<class 'int'>
<class 'float'>
3.5
3.0
1.3333333333333333
2.25


Note: if you use Python 2.7 (rather than Python 3.5), division of two *integers* returns only the integer part! 

In [127]:
# in Python 2.7, the following would return 1 instead of 1.5
print(3 / 2)

1.5
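
If you actually want only the integer part in Python 3, you can use the floor division operator `//`; the remainder is given by `%`. A short sketch (the variable names are only for illustration):

```python
quotient = 3 / 2       # true division
floored = 3 // 2       # floor division: rounds down to the nearest integer
remainder = 3 % 2      # remainder of the division

print(quotient)        # 1.5
print(floored)         # 1
print(remainder)       # 1

# note that floor division rounds down, not towards zero
negative = -3 // 2
print(negative)        # -2
```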


### Strings

To distinguish strings from assigned object names, they must be set in quotation marks, either single or double:

In [16]:
# strings
c = "T"
d = 'yrion'
print(type(c))
# using + on two strings concatenates them 
print(c + d)
print(type(c + d))

# division on two strings (or a number and a string) throws an error
# print(c / d)

<class 'str'>
Tyrion
<class 'str'>


### Booleans

In [17]:
# booleans: can either be true or false
print(4 > 3)
print(4 > 5)
e = (4 == 5)
f = (6 > 5)
print(e)
print(type(e))
print(f)
# using arithmetic operations on two booleans treats True as 1 and False as 0
print(e + f)
print(e * f)
type(f)

True
False
False
<class 'bool'>
True
1
0


bool

### Lists
Vanilla Python has different types of arrays or "containers". The most important are probably lists, sets and dictionaries (we will get to the latter two in the next lecture). Lists are defined similarly to row vectors in Matlab. However, note that they behave differently. In particular, vectorized operations (e.g. elementwise summation) do not work with lists (later on, we will see a different type of array which you can use for vectorized operations). Instead, "summing up" two lists concatenates them. All types of arrays can be used with the **len()** function, which gives the length of the array.

In [18]:
# lists
a = [1,2,3,4,5]
b = [6,7,8]
print(a)
print(type(b))
print(a + b)

print(len(b))

[1, 2, 3, 4, 5]
<class 'list'>
[1, 2, 3, 4, 5, 6, 7, 8]
3
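
If you do need an elementwise sum of two plain lists of equal length, you have to build it element by element. The sketch below is one possible way, not the only one; it uses a *list comprehension* together with **range()** and indexing, all of which are previews of material that comes later. The lists are shortened versions of the ones above.

```python
a = [1, 2, 3]
b = [6, 7, 8]

concatenated = a + b                                   # + glues the lists together
elementwise = [a[i] + b[i] for i in range(len(a))]     # sum position by position

print(concatenated)    # [1, 2, 3, 6, 7, 8]
print(elementwise)     # [7, 9, 11]
```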


Indexing works differently from Matlab in two ways. First, it starts at zero; in other words, the first element of a list Q is Q[0]. Second, when you want to access multiple elements, say the second element (indexed by [1]) and the third (indexed by [2]), the notation is Q[1:3]. The colon here stands for "from 1 to 3, but excluding 3". In other words, the range starts at the first index and stops just before the second index, *which is not included*. This is referred to as *slicing* a list. In addition, the index [-1] refers to the last element of an array.

In [19]:
print(a[0])   # accesses the first element of list a
print(a[1:3]) # indexes the second and third element of list
print(a[1:])  # indexes all elements starting with the second
print(a[:-1]) # indexes all elements except the last
print(a[::-1])# indexes all elements in backwards order

1
[2, 3]
[2, 3, 4, 5]
[1, 2, 3, 4]
[5, 4, 3, 2, 1]


One thing that can be a bit confusing about Python is that different variable names can refer to the same object. In the following example, both b and c refer to the same list. Changing an element in b also changes c.

In [21]:
b = [6,7,8]
c = b
print(c)
print(id(b) == id(c))

b[0] = 9
print(c)
print(id(b) == id(c))


[6, 7, 8]
True
[9, 7, 8]
True
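
If you want an independent copy rather than a second name for the same list, you have to create one explicitly. A minimal sketch using the list method **copy()** (writing `list(b)` or `b[:]` has the same effect); the name `d` is just chosen for this example:

```python
b = [6, 7, 8]
c = b             # c is a second name for the same list
d = b.copy()      # d refers to a new list with the same elements

b[0] = 9
print(c)          # [9, 7, 8]  (changed along with b)
print(d)          # [6, 7, 8]  (unaffected)
print(c is b)     # True
print(d is b)     # False
```

Note that this is a so-called shallow copy: it is enough for lists of numbers or strings, but for lists that contain other lists the inner lists would still be shared.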


To get a list of integers, you can also use the *list* and the *range* functions. *list(range(x))* creates a list of all integers from 0 to x-1, hence again excluding the last element x. *list(range(x,y))* creates a list of all integers from x to y-1.

In [133]:
print(list(range(10)))   # list from 0 to 9
print(list(range(1,10))) # list from 1 to 9
print(list(range(10,1,-1))) # list from 10 to 2 (going backwards)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 9, 8, 7, 6, 5, 4, 3, 2]


### Tuples

Note that lists are "mutable", which means they can be changed, as we have seen above. The "immutable" equivalent of a list is called a *tuple*. We can use the same index notation as for lists in order to access the elements of a tuple. However, trying to assign a new value to one of them throws an error.

In [139]:
d = (9, 10, 11)
print(len(d)) 
print(d[0])  # accesses the first element of d
# d[0] = 12  # will throw an error

3
9
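
The following sketch shows what happens when the commented-out line is actually run. As before, the **try/except** block is only there to catch the error so that the cell keeps running; it is not part of this lecture. If you need a changeable version of a tuple, you can convert it with **list()**; the name `as_list` is made up for this example.

```python
d = (9, 10, 11)

try:
    d[0] = 12                   # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

as_list = list(d)               # a new, mutable list with the same elements
as_list[0] = 12
print(as_list)                  # [12, 10, 11]
print(d)                        # (9, 10, 11), the tuple itself is unchanged
```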


A side note: a string behaves similarly to a tuple of characters, in the sense that you can access each character by an index and that it is immutable.

In [142]:
string = 'Tyrion'
print(string[1])  # accesses the second letter of string
print(len(string))
# string[1] = 'x'  # will throw an error

y
6


## Loops
Iterating - applying the same action to a sequence of data - is an extremely important task in computation. Therefore, *loops* are an essential feature of every programming language. In Python, we will mainly use the **for** loop.

In [28]:
# iterating over a list of strings
text = ['Daenerys', 'Tyrion', 'Bran']
for item in text:
    print(item)  

# iterating over a list of integers
values = list(range(1,11))
for item in values:
    print(item**2)

# alternative: use range function (cf. above)  
for index in range(1,11):
    A = index**2
    print(A)

Daenerys
Tyrion
Bran
1
4
9
16
25
36
49
64
81
100
1
4
9
16
25
36
49
64
81
100


Some comments about the syntax of a for-loop:
1. A for-loop starts with the keyword **for**, followed by the name for what I will call an *index*. This is followed by **in** and a *sequence of data*. Often, this sequence is a list or another type of array. 
2. The first line ends with a colon (**:**). This is mandatory and will cause an error message if omitted. In case you are used to other programming languages like MATLAB which do not use colons in analogous expressions, expect this to happen often in the beginning :).
3. The line(s) following the colon comprise the *code block* that we are looping over. As you can see above, these lines are indented. This is very important, since Python knows the extent of the code block only from indentation, unlike other languages like MATLAB, which mark the end of a code block by an "end" statement. If you do not indent the lines in a for-loop or if the number of spaces you indent by is not the same for all the lines in a code block, you will get an error message. 

Note the following about indentation:
- It is a convention among Python programmers to indent lines in a code block by 4 spaces. In fact, many programs used to write Python code (such as Jupyter or many text editors) will automatically indent the line by 4 spaces when you press Enter after a colon. Moreover, in Jupyter you can also use the tab key to indent by 4 spaces.
- If you have a code block within a code block, you need to indent by 8 spaces, and so on. In other words, indent by 4 more spaces after every colon. 
- Why use indentation? While it can take some time to get used to (in particular when you have experience in languages which do not use this concept), clean and consistent indentation improves readability and avoids clutter, such as the brackets or end statements used in other languages. 
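
As an illustration of a code block within a code block, here is a small sketch with one for-loop nested inside another. The names `i`, `j` and `total` are arbitrary; the comments indicate which block each line belongs to.

```python
total = 0
for i in range(1, 4):            # outer loop
    for j in range(1, 3):        # 4 spaces: part of the outer block
        total = total + i * j    # 8 spaces: part of the inner block
    print(i, total)              # 4 spaces again: runs once per outer iteration
print(total)                     # no indentation: runs once, after both loops
```

The last line prints 18, the sum of all products `i * j` for `i` in 1, 2, 3 and `j` in 1, 2.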

A side note: in Python, "readability" of code is extremely important and a dominant principle that guides both the design of the language and the way Python programmers should write code. There are countless style guides and guidelines that I encourage you to read. Just as an illustration of the philosophy underlying Python, you can read the "Zen of Python":  

In [149]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Two useful functions when iterating in Python are *enumerate()* and *zip()*. *enumerate()* loops through a list while returning an index for each element. *zip()* is useful when stepping through pairs from two sequences.

In [29]:
letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print("letter_list[{0}] = '{1}'".format(index, letter))
    
# use enumerate to get a dictionary from two lists
names = ['Daenerys', 'Tyrion', 'Arya', 'Samwell']
houses = ['Targaryen', 'Lannister', 'Stark', 'Tarly']
D = dict()
for (index, name) in enumerate(names):
    D[name] = houses[index]
print(D)    


# the same can be done using zip()
E = dict()
for (name, house) in zip(names, houses):
    E[name] = house
print(E)    

# in fact, we don't even have to use a loop
F = dict(zip(names, houses))
print(F)


letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
{'Tyrion': 'Lannister', 'Samwell': 'Tarly', 'Arya': 'Stark', 'Daenerys': 'Targaryen'}
{'Tyrion': 'Lannister', 'Samwell': 'Tarly', 'Arya': 'Stark', 'Daenerys': 'Targaryen'}
{'Tyrion': 'Lannister', 'Samwell': 'Tarly', 'Arya': 'Stark', 'Daenerys': 'Targaryen'}


## Comparisons and *if*-statements

We have already seen comparisons above when talking about booleans. A reminder:

In [20]:
e = (4 == 5)   # e is a boolean with value False, since 4 does not equal 5
print(e)
print(type(e))

f = 1
g = 2
print(f <= g)
print(f > g)

# chain inequalities 
print(1 < 2 < 3)

False
<class 'bool'>
True
False
True


Comparisons include **<, >, <=, >=, ==, !=**. The last one (**!=**) stands for 'not equal'. As seen above, you can also compare more than two elements. Moreover, you can combine different comparisons or booleans by using the keywords **and** and **or**. Expressions linked by **and** will only be evaluated as **True** if *all* expressions are true. Expressions linked by **or** will be evaluated as **True** if at least one expression is **True**:

In [29]:
print(4 == 5 and 5 < 6)  # False, since first expression is False
print(4 == 5 or 5 < 6)   # True, since second expression is True

A = True  # boolean       
print(A and 5 < 6)    # True, since both expressions are True

B = False
print(A or B)         # True, since A is True           

False
True
True
True
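
Chained comparisons and **and** are closely related: a chain such as `0 < x < 10` gives the same result as linking the two individual comparisons with **and**. A short sketch with an arbitrary value for `x` (the other names are only for illustration):

```python
x = 5

chained = 0 < x < 10                    # chained comparison
spelled_out = (0 < x) and (x < 10)      # the same condition written with and
outside = (x < 0) or (x > 10)           # True only if x lies outside [0, 10]

print(chained)        # True
print(spelled_out)    # True
print(outside)        # False
```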


Comparisons and Booleans are frequently used for *conditional statements* aka *if-statements*. The idea is that a code block is executed only if a given condition is true. This condition can consist of a comparison or a Boolean - in other words, of anything that can be evaluated as True or False. 

In [33]:
x = 3
if x > 0:                   # condition using a comparison (here True)
    print('x is positive')  # code block that is executed only if the condition is met  

B = False    
if B:                                # condition using a Boolean (here False)
    print('Programming is boring!')  # code block will not be executed if B is False
    

x is positive


As was the case for for-loops, the first line of an if-statement ends with a colon and the code block must be indented. That being said, sometimes if-statements can also be expressed in one line.
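
As a sketch of what "in one line" can look like: a short code block may be written directly after the colon, and Python also offers a *conditional expression* that picks one of two values depending on a condition. The name `sign` is made up for this example.

```python
x = 3

# the whole if-statement on a single line
if x > 0: print('x is positive')

# conditional expression: value_if_true if condition else value_if_false
sign = 'positive' if x > 0 else 'not positive'
print(sign)    # positive
```

For anything longer than a single short statement, the indented form above is usually easier to read.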

You can also specify code to be executed if the condition does not hold, using **else**. If there are more than two alternatives, you can distinguish the different cases with **elif**:

In [24]:
## two alternatives
x = - 5
if x > 0:                   # condition
    print('x is positive')  # code block that is executed only if the condition is met
else:
    print('x is negative')

## three alternatives  
s = 'Arya'
if type(s) == int:
    print('s is an integer')
elif type(s) == float:
    print('s is a float')
else:
    print('s is not a (real) number')

x is negative
s is not a (real) number


The last expression could also have been written without the **elif** part, by combining the two conditions using **or**:

In [32]:
s = 4
if (type(s) == int) or (type(s) == float):
    print('s is a (real) number')
else:
    print('s is not a (real) number')

s is a (real) number
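
An alternative way to write this kind of type check, which you will often see in Python code, is the built-in function **isinstance()**. It accepts a tuple of types and returns True if the object belongs to any of them. This is only a sketch of the same check as above; the name `kind` is made up for the example.

```python
s = 4

if isinstance(s, (int, float)):     # True if s is an int or a float
    kind = 's is a (real) number'
else:
    kind = 's is not a (real) number'

print(kind)    # s is a (real) number
```

One difference to be aware of: since booleans count as a special kind of integer in Python, `isinstance(True, int)` is also True, whereas `type(True) == int` is False.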
