# Introduction to Python
Python is an interpreted light-weight object-oriented programming language (see Wikipedia article [here](https://en.wikipedia.org/wiki/Python_(programming_language))). Let's go over what those words mean:
1. Interpreted: An interpreted language loosely speaking means the code can be executed directly, without needing to be compiled, i.e. translated into a machine language. This means, you can write `2+2` without having to define a program, compiler etc. This makes it akin to MATLAB.
2. Light-weight: However, very much unlike MATLAB, Python does not actually need a lot of computing power to run. Of course, more computing power typically makes things run faster. But you can still develop sensible code on your standard machine without needing to beef it up just because you are running Python.
3. Object-oriented: This is the most difficult one to explain for anybody coming from e.g. a MATLAB environment. The easiest way to explain it is probably with an example. Say you want to calculate the area of a rectangle ($A = wl$, where $w$ is the width and $l$ is the length). Coming from MATLAB, you would probably write a plain function, which in Python looks like this:

## Object orientation

In [1]:
def rectangle_area(width, length):
    return width * length

This means, you would simply write a function that takes two numbers and returns the result. While this is a great way to do things for some tasks, more complex programming often becomes cumbersome this way. But it does not have to be this way. Some very smart people (see [here](http://web.eecs.utk.edu/~huangj/CS302S04/notes/oo-intro.html)) came up with the idea of classes and objects. Let's see what that means:

In [2]:
class Rectangle:
    def __init__(self, width, length):
        self.width = width
        self.length = length
    
    def area(self):
        return self.width * self.length

Before explaining in detail what is happening here, let's see that these two things do exactly the same:

In [3]:
width = 3
length = 5

# Function evaluation
print(f'Function version: {rectangle_area(width, length)}')

# Object-based version:
the_rectangle = Rectangle(width, length)
print(f'Object-based version: {the_rectangle.area()}')


Function version: 15
Object-based version: 15


So what are we doing here? Instead of simply writing down an equation, like in `rectangle_area`, we create an **instance** or a specific rectangle, which has a given width and length. And then we use these **properties** to evaluate the area.
+ This means if you want to "do" a lot of things to this rectangle, you can have all of this information contained in one class. This makes it safe and easily readable.
+ You can use classes in other classes. So for example you could make a `Point` class, which is then used in a `Line` class, which is then used in a `Triangle` class etc. This makes for intuitively readable code.
- Side-effects: you basically say `do something` with this object, but you don't know precisely what is happening. In the function version above, you put numbers in and get numbers out. There are no other actions going on except that.
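
The idea of using classes inside other classes can be sketched roughly like this. The `Point` and `Line` classes below are hypothetical names made up for illustration; they are not part of any library.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Line:
    # A line segment is described by its two end points
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def length(self):
        # Euclidean distance between the two end points
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)

segment = Line(Point(0, 0), Point(3, 4))
print(segment.length())  # 5.0
```

Note how `Line` never deals with raw coordinates in its constructor: it just holds two `Point` objects and asks them for `x` and `y` when needed.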

In this course, we will limit the use of objects, just so that it does not become too cumbersome to follow. However, as we will see at the very end of the course, object orientation and mathematical programming naturally go hand in hand!

## Now let's look at Python
So basically Python is a lot like MATLAB. But there are some key differences that I quickly want to touch upon:
- Some basic syntax
- Packages
- Indenting
- Zero-indexing
- Mutability
- Data representations

To do this, let's look at a simple example of calculating the area of a triangle given three points (see math [here](https://www.mathopenref.com/coordtrianglearea.html)):

In [None]:
import math

def triangle_area(p1, p2, p3):
    term1 = p1[0]*(p2[1] - p3[1])
    term2 = p2[0]*(p3[1] - p1[1])
    term3 = p3[0]*(p1[1] - p2[1])
    
    # The math module has no `abs`; math.fabs returns the absolute value as a float
    return math.fabs((term1 + term2 + term3)/2)
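
As a quick sanity check of the formula, here is a minimal sketch that assumes each point is given as an `(x, y)` tuple. It is a compact variant of the function above that uses the built-in `abs`, and the test triangle is simply one I picked because its area is easy to verify by hand (base 4, height 3).

```python
def triangle_area(p1, p2, p3):
    # Same coordinate formula as above, with the points unpacked into x and y
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2)) / 2

area = triangle_area((0, 0), (4, 0), (0, 3))
print(area)  # 6.0
```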

### Some basic syntax
Most of the syntax in Python is fairly straight-forward to understand, however here is a list that you can use as a cheatsheet to get you started:

| Operator | MATLAB | Python | Comments |
|----------|--------|--------|----------|
|Addition  | +      | +      |          |
|Subtraction | -      | -      |          |
|Multiplication | *      | *      |          |
|Division | /      | /      |          |
|Exponential | ^      | **     |          |
|Square root | `sqrt`  | `math.sqrt` | For arrays, use `numpy` |
|Absolute value | `abs` | `abs` | Built in, no import needed (`math.fabs` always returns a float). For arrays, use `numpy` |
|Range (e.g. 1 to 10) | `1:10` | `range(1,11)` | The end value is excluded. The start defaults to zero, so `0:10` would be `range(11)`. |
|Length of array | `length(p)` | `len(p)` | |

For the rest of it, check out [this awesome cheatsheet](http://mathesaurus.sourceforge.net/matlab-python-xref.pdf)! In general, Python syntax is very similar to MATLAB.
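
To see a few of the table entries in action, here is a small sketch you can paste into a cell. The list `p` is just an arbitrary example.

```python
import math

p = [4, 2, 1, 4]

print(2 ** 3)               # exponentiation: 8
print(math.sqrt(16))        # square root: 4.0
print(abs(-3))              # absolute value (built in): 3
print(list(range(1, 11)))   # 1 to 10; the end value 11 is excluded
print(list(range(11)))      # 0 to 10; the start defaults to zero
print(len(p))               # number of elements: 4
```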

### Packages
Let's go line by line: the first line tells Python to use the `math` package. This is fundamentally different from MATLAB, where all the packages are automatically in place. Instead, the user has to set what to use. Why? Well, you should only include those packages that are needed to keep the program as light-weight as possible. If you include hundreds of packages, this will make your code slow and require all those packages to be stored somewhere.

You can, in principle, get rid of the `math.` by importing *everything* from the `math` package directly into your namespace. Syntax would then resemble MATLAB's a little more...

In [None]:
from math import *

# Can now use math functions without explicitly writing the package name
x = sqrt(4)
y = fabs(-3)  # fabs comes from math (plain abs is built in and needs no import)
# etc.

... but in general, this is considered bad practice, because this pollutes your namespace and may lead to namespace collisions causing headaches: 

In [None]:
from math import *
from brakingsystems import *
x = abs(-4)  
# Uh-oh, what is x? Is it 4, or an anti-lock braking system for 4 wheels?
# Let me read the documentation...
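
A collision like this can also happen with real packages. As a small sketch using only the standard library: both `math` and `cmath` (the complex-number version of `math`) define a function called `sqrt`, and whichever is imported last wins.

```python
from math import sqrt
print(sqrt(4))   # 2.0, the real-valued square root

# Importing the same name from another module silently replaces the first one
from cmath import sqrt
print(sqrt(4))   # (2+0j), now a complex number

# Keeping the package prefix makes it obvious which function is meant
import math
import cmath
print(math.sqrt(4), cmath.sqrt(4))
```

No error or warning is raised at the second import, which is exactly why this kind of bug can be hard to spot.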

Since developers are lazy, some of the most popular packages' names are commonly abbreviated upon import. The most common idioms are:

In [None]:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt

The immediate follow-up question to this is: but how do I know which packages to use? Well, there are some standard packages which you will almost always need (`numpy`, `scipy`, `math`, `matplotlib` etc.), but beyond that it will always be stated that you need library `xlib` to run function `yfunc`. 


#### Installing packages

If `xlib` isn't already installed in your environment, you can install it by typing at the command line:

```
conda install xlib
```

and Conda will do all the magic for you. But how does the magic work? In short, there are repositories online with thousands of libraries (e.g. the Anaconda one [here](https://anaconda.org/anaconda/repo)) that Conda (a tool in your Anaconda Python distribution) goes and checks out and fetches the package from. If you have something more exotic, you may have to tell Conda to look in a specific repository or channel. But this is beyond the scope of this introduction.

Once the `xlib` package is installed, you simply type `import xlib` in your Python code and voilà, you can use all of its contents. 
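
If you are unsure whether a package is available in your environment, one possible way to check from within Python is sketched below. The helper name `is_installed` is my own invention for this example; `importlib.util.find_spec` is part of the standard library.

```python
import importlib.util

def is_installed(package_name):
    # find_spec returns None when no top-level package of that name can be found
    return importlib.util.find_spec(package_name) is not None

print(is_installed("math"))   # True, math ships with Python
print(is_installed("xlib"))   # False unless a package called xlib is installed
```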

### Indenting
The next line is pretty self-explanatory: we define a function that takes three things as input. But what then? Well, you could try to remove the indentation from the lines that follow, but then Python would throw an error, because Python decides whether something is part of a function or not by its **indentation**. Let's try it out:

In [18]:
t = 5
  u = 5

IndentationError: unexpected indent (<ipython-input-18-85ecaaca66ef>, line 2)

So when do you indent? Whenever you open a new function, class, `if` statement or `for`/`while` loop. Just like in MATLAB, except that in Python it's a must.

As a colleague once said, this forces us to write beautiful code, but it can also be pretty painful to work with, especially in the beginning. Fortunately, most editors have auto-indent functionality built in and will help you indent your code properly.
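
To make the nesting concrete, here is a small sketch with three levels of indentation. The function `count_even` is a made-up example, not something used elsewhere in the course.

```python
def count_even(numbers):
    count = 0                 # one level: inside the function
    for n in numbers:
        if n % 2 == 0:        # two levels: inside the loop
            count += 1        # three levels: inside the if
    return count              # back to one level: runs after the loop ends

print(count_even([1, 2, 3, 4]))  # 2
```

There is no `end` keyword as in MATLAB: a block stops where the indentation goes back to the previous level.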

### Zero-indexing
This is probably the biggest conceptual change when you come from MATLAB: in Python (like in ANY other self-respecting programming language) we have zero indexing, i.e. for the array `p = [4, 2, 1, 4]` we have `p[0] = 4` and `p[1] = 2`. Especially when you implement models with running indices like $1,...,10$, this is something to be really aware of. Fortunately enough, any mistakes you make will most likely be caught by the fact that if you try e.g. `p[4]` it will tell you "list index out of range":

In [23]:
p = [4, 2, 1, 4]
p[4]

IndexError: list index out of range
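
When a model uses running indices starting at one, a common pattern is to shift by one when accessing the list. The following is a minimal sketch of two ways to do that, using the same list `p` as above.

```python
p = [4, 2, 1, 4]

# Model indices 1..4 map to Python positions 0..3
for i in range(1, len(p) + 1):
    print(i, p[i - 1])

# enumerate can do the shifting for you
for i, value in enumerate(p, start=1):
    print(i, value)

print(p[-1])  # a negative index counts from the end: 4
```

Be aware that negative indices are valid in Python, so an accidental `p[-1]` will not raise an error but quietly return the last element.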

### Mutability
What does `=` mean? So for example:

In [11]:
a = 5
b = a
a = 3
print(b)

5


This is what we would also be expecting from MATLAB: when we set `a` to something, `b` is not affected. However, let's look at the following line:

In [12]:
a = [5]
b = a
a.append(3)
print(b)

[5, 3]


Here, `a` and `b` point to the same location in memory, so when we append something to `a`, it appears in `b` as well. The reason for this is that the `list` object that is created with the `[]` brackets is *[mutable](https://en.wikipedia.org/wiki/Immutable_object)*, i.e. once it is created, you can still add things to it (in fact, that's the point). This is however not the case for numerics (`float`, `int` and `bool`) as well as `str`, `tuple` and `frozenset` objects, because they are *immutable*, i.e. once they have been created, you cannot change them in place; an assignment like `a = 3` simply makes the name `a` refer to a different object and leaves `b` alone (see [here](https://stackoverflow.com/questions/6158907/what-does-python-treat-as-reference-types)).
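
You can check whether two names refer to the same object with the `is` operator. The sketch below contrasts changing an object in place with assigning a new object to a name.

```python
a = [5]
b = a
print(a is b)        # True: both names refer to the same list object

a.append(3)          # mutates that one object
print(b)             # [5, 3]

a = [7]              # rebinds the name a to a new list; b is untouched
print(a is b)        # False
print(b)             # [5, 3]
```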

So what if you want to have the same behaviour in e.g. `list`s? Then you use the list's `copy` method:

In [13]:
a = [5]
b = a.copy()
a.append(3)
print(b)

[5]
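
One caveat worth knowing, added here as a general Python fact rather than something the example above shows: `copy` only copies the outer list. If the list contains other lists, those inner lists are still shared. A minimal sketch using `copy.deepcopy` from the standard library:

```python
import copy

a = [[1, 2], [3, 4]]
shallow = a.copy()          # new outer list, but the inner lists are shared
deep = copy.deepcopy(a)     # inner lists are copied as well

a[0].append(99)
print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```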


### Data representations
Although not explicitly part of the example above, it is worth spending a minute on what data types are and how data is represented in Python.

A data type is basically a way to classify what type your data is in (duh!). So for example, `p = True` makes `p` a boolean, while `p = 5.5` makes `p` a float (Python's `float` is a double-precision number, i.e. what many other languages call a `double`). And of course `u = "Test"` makes `u` a string. There are a bunch of other types, but they are not important here (see [here](https://realpython.com/python-data-types/) if you are interested).
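
If you ever wonder what type a variable has, the built-in `type` function tells you. A quick sketch with the examples from the paragraph above (the variable `q` is just an extra name so that `p` is not overwritten):

```python
p = True
print(type(p))        # <class 'bool'>

q = 5.5
print(type(q))        # <class 'float'>

u = "Test"
print(type(u))        # <class 'str'>

print(type(5))               # <class 'int'>
print(isinstance(q, float))  # True
```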

But then there is the question of how data is represented in Python. At this point, I want to mention three ways (there are tons more):
- Lists
- Dictionaries
- Data frames

#### Lists
A list is just what it says: `p = [4, 2, 1, 4]`. You can access the individual items of a list by a running index, i.e. `p[2]` gets you the 3rd element of the list. Nothing more than that.
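
A short sketch of the most common list operations, using the same list as above:

```python
p = [4, 2, 1, 4]

print(p[2])      # third element: 1
print(p[1:3])    # slice from position 1 up to (not including) 3: [2, 1]

p.append(7)      # lists can grow after they are created
print(p)         # [4, 2, 1, 4, 7]
print(len(p))    # 5
```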

#### Dictionaries
A dictionary is effectively an extension of a list. But instead of having a simple running index, you can define yourself what the index should be. Let's look at an example: say you want to assign a height to a name for a bunch of people:

In [7]:
# Define the data
people = ["Martin", "Anne", "Jesper", "Maria"]
height = [170, 165, 164, 171]

# Define the dictionary using a "dictionary comprehension"
heights = {people[i] : height[i] for i in range(len(people))}

# Index into the dictionary
heights["Martin"]

170

Or, defining the dictionary directly:

In [6]:
heights = {}  # or = dict()   Both give an empty dictionary
heights['Martin'] = 170
heights['Anne'] = 165
heights['Jesper'] = 164
heights['Maria'] = 171

heights["Maria"]

171
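
As an alternative sketch, the same dictionary can be built with the built-in `zip` function, which pairs up the two lists element by element. The example also shows how to loop over a dictionary and how to look up a key that may not exist ("Peter" is a made-up name that is deliberately missing).

```python
people = ["Martin", "Anne", "Jesper", "Maria"]
height = [170, 165, 164, 171]

# zip pairs the two lists; dict turns the pairs into key/value entries
heights = dict(zip(people, height))

for name, h in heights.items():
    print(name, h)

print("Anne" in heights)        # True: membership tests check the keys
print(heights.get("Peter"))     # None: get avoids a KeyError for a missing key
```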

A couple of comments on dictionaries:
- The `key` (in our case `people`) has to be unique. You cannot have "Martin" appear twice in `people`.
- Dictionaries are typically a bit slower than lists, but make up for it by making code much, much easier to read.
- You can have tuples as keys, so for example:

In [33]:
# Define new index
city = ["London", "London", "New York", "Copenhagen"]

# Define new dictionary
heights_by_city = {(people[i], city[i]) : height[i] for i in range(len(people))}

# Index into it
heights_by_city["Martin", "London"]

170
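
To illustrate the first comment about unique keys, here is a small sketch of what actually happens when a key is used twice. The numbers are arbitrary.

```python
heights = {"Martin": 170, "Anne": 165}

# Assigning to an existing key overwrites the value; no second entry is created
heights["Martin"] = 172
print(heights)        # {'Martin': 172, 'Anne': 165}
print(len(heights))   # 2

# A key that appears twice in the literal keeps only the last value
duplicates = {"Martin": 170, "Martin": 180}
print(duplicates)     # {'Martin': 180}
```

So Python does not complain about a repeated key; it silently keeps the last value, which is worth remembering when you build dictionaries from lists that may contain duplicates.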

#### Data frames
Data frames are part of the `pandas` package and are an extremely versatile way of representing your data. According to the description:
> Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects.

What this means is that you can have tables, with whatever you want as cells, and the rows and columns also as individual indices chosen by you! Let's look at a simple example:

In [43]:
import pandas as pd

# Let's define a dictionary with two values:
weight = [56, 65, 71, 68]
heights_and_weights = {people[i] : [height[i], weight[i]] for i in range(len(people))}

# Define our data frame
df = pd.DataFrame(data = heights_and_weights, index = ['Height','Weight'])

In [44]:
df

|        | Martin | Anne | Jesper | Maria |
|--------|--------|------|--------|-------|
| Height | 170    | 165  | 164    | 171   |
| Weight | 56     | 65   | 71     | 68    |


In [45]:
df["Martin"]

Height    170
Weight     56
Name: Martin, dtype: int64

In [46]:
df["Martin"]["Weight"]

56
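
Chaining two sets of brackets works for reading values, but `pandas` also offers the `.loc` accessor, which takes the row and column labels in one go. The sketch below rebuilds the same data frame directly from a dictionary so that it runs on its own; it assumes `pandas` is installed.

```python
import pandas as pd

df = pd.DataFrame(
    data={"Martin": [170, 56], "Anne": [165, 65], "Jesper": [164, 71], "Maria": [171, 68]},
    index=["Height", "Weight"],
)

# .loc takes the row label first, then the column label
print(df.loc["Weight", "Martin"])   # 56

# A whole row, across all people
print(df.loc["Height"])

# Transposing gives one row per person, which is often easier to work with
print(df.T)
```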

#### Dataclasses
In the beginning of this tutorial, I showed you classes. Now, we will look at a specific type of class called `dataclass` that was released in [Python 3.7](https://docs.python.org/3/library/dataclasses.html). In essence, it is a very convenient way to represent data in classes (hence the name). To show you how it works, let's look at the `Rectangle` class from earlier:

In [3]:
from dataclasses import dataclass

@dataclass
class Rectangle:
    width: float
    length: float
    
    def area(self):
        return self.width * self.length

This `Rectangle` class and the one at the top behave very similarly, except that the new version has a lot of boilerplate code taken away (the decorator generates `__init__`, `__repr__` and `__eq__` for you), which is why I find dataclasses very useful for optimization problems. However, because of the generated `__eq__`, instances are by default no longer *[hashable](https://stackoverflow.com/questions/14535730/what-does-hashable-mean-in-python)*, so you cannot simply use them as dictionary keys or put them in a `set`. To make them hashable again, we need to add `frozen=True` to the definition, which also makes the instances immutable:

In [6]:
@dataclass(frozen=True)
class Rectangle:
    width: float
    length: float
    
    def area(self):
        return self.width * self.length
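
Here is a minimal sketch of what the frozen dataclass gives you in practice: a readable printout, comparison by value, use as a dictionary key, and protection against accidental changes. The dictionary `areas` is just a made-up example of such a lookup.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Rectangle:
    width: float
    length: float

    def area(self):
        return self.width * self.length

r = Rectangle(3, 5)
print(r)                        # Rectangle(width=3, length=5), generated __repr__
print(r == Rectangle(3, 5))     # True, generated __eq__ compares the fields

# Frozen instances are hashable, so they work as dictionary keys and set members
areas = {r: r.area()}
print(areas[Rectangle(3, 5)])   # 15

# The price: fields can no longer be changed after creation
try:
    r.width = 10
except FrozenInstanceError as err:
    print("Cannot modify:", err)
```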