# First steps
This lecture is partially inspired by a past [Zeuthen Data Science Seminar](https://indico.desy.de/event/32700/) held by Jakob van Santen (DESY).

## 1. Getting to know python
Python is a programming language which is:
- **interpreted**: the code of a python program is not compiled into machine language ahead of execution; instead, the interpreter translates and runs it statement by statement. We already have an example of this: the fact that we can have interactive notebooks! There is no sharp distinction between *compile time* and *runtime* as there is in compiled languages. A consequence is that computation-intensive operations written in pure python are comparatively slow (but we have libraries to get around this).
- **strongly but dynamically typed**: every object has a well-defined type, but the type associated with a variable is only determined at *runtime*.
- **multi-paradigm**: python supports imperative, procedural, object-oriented and functional programming. Important to remember: every algorithm can be written in the form of sequences, selections (if) and loops!
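
As a small illustration of the typing bullet above, the sketch below rebinds a name to objects of different types and shows that mixing incompatible types raises an error instead of silently converting. The variable names are chosen here purely for illustration.

```python
x = 1
first_type = type(x).__name__      # the name x is bound to an int
x = "one"
second_type = type(x).__name__     # the same name is now bound to a str
print(first_type, second_type)

# Strong typing: python refuses to add an int and a str
try:
    result = 1 + "1"
except TypeError as err:
    result = None
    print("TypeError:", err)
```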

### Running python code
- running the interpreter on a source file (script): `python3 script.py`
- using an interactive prompt (also known as a REPL, read-eval-print loop). `ipython` is an example; Jupyter notebooks are just an improved way of doing it!

## 2. Variables (actually, names and bindings)
In traditional compiled programming languages, a variable has an *l-value* (a memory address) and an *r-value* (its actual value), so when we write an *assignment*:
```C
int a = 1; /* a little detour into the realm of C language */
```
it means that the binary representation of `1` is stored at a memory address statically associated to `a`. 

Python has a simpler syntax, partly because it is a more abstract language:
```python
a = 1
```
where `a` is a *name* and `1` represents in general an *object*. This operation, strictly speaking, is a *name binding*.

From now on, we will speak of *variables* and *assignments* for the sake of simplicity, but keep in mind that conceptually `python` is doing a different thing.
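
To make the idea of *name binding* a bit more tangible, here is a minimal sketch in which two names are bound to the same object. It uses a list (a mutable collection introduced later in this lecture) only because the effect is easy to see; the names are illustrative.

```python
a = [1, 2, 3]   # bind the name a to a list object
b = a           # bind a second name to the *same* object (no copy is made)
b.append(4)     # modify the object through the name b

print(a)        # the change is visible through a as well
same_object = a is b
print(same_object)

a = "something else"  # rebinding a does not touch the object b refers to
print(b)
```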

### Let's practice...
We will now show practical examples of the ideas we have just introduced!

In [None]:
# first assignment
a = 1
print(a, type(a))

In [None]:
# second assignment
a = "hello world!"
print(a, type(a))

Note that:
- the type of `a` is automatically determined by the value we have assigned;
- even simple data types are represented as instances of a class (objects)

Values written directly in the code, such as `1` and `"hello world!"`, are called *literals*.

### What about constants?
There are no actual *constants* in python. This is an inconvenience we have to live with, although it sometimes gets in the way of writing solid code. Some people like to write the names of constant values in capital letters, for example `PI = 3.14`, to avoid accidentally mixing them up with variables. If you think about what we said on variables being *names*, the reason for the lack of constants should be clear: a name can always be bound to a new object.
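
A very short sketch of the point above: the capital letters are only a convention for human readers, and nothing stops the name from being bound again.

```python
PI = 3.14      # capital letters signal "please do not modify"
print(PI)
PI = 3         # ...but the interpreter accepts the rebinding without complaint
print(PI)
```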

## 3. Types
Summary of native types:
- string (`str`): contains characters, supports Unicode; there is no separate type for individual characters;
- numeric types: integers (`int`) have arbitrary length, which means they have no minimum or maximum value (they are limited only by the available memory). Floating point numbers (`float`) are double precision (64 bit). Important: **floating point** implies **variable absolute precision**. This means that the resolution of your variable (i.e. the minimum difference between two representable values) depends on the order of magnitude of the number. Most of the time you will have enough precision for all practical purposes, but be aware that some numbers (especially decimal fractions such as `0.1`) do not have an exact representation!
- booleans (`bool`): `True` and `False`;
- collections: `tuple` (immutable sequence), `list` (mutable sequence), `set` (set of unique items), `dict` (key-value mapping);
- none: `None` is a special object of type `NoneType`, typically used to signal the absence of a value.

Let's illustrate how to write the corresponding literals:


In [None]:
"python", "🐍"                      # str
b"\xf0\x9f\x90\x8d"                 # bytes
42                                  # int
42., 42.0, 4.2e1                    # float
(1, 42., "🐍")                      # tuple
[1, 42., "🐍"]                      # list
{1, 42., "🐍"}                      # set
{1: "foo", 42.: "bar", "🐍": "baz"} # dict
None                                # NoneType
True, False                         # bool

**Notes**
- more than one variable can be written or assigned on a single line, for example `a, b = 0, 1`; this works by implicitly creating a `tuple` and unpacking it. Most of the time you can use it to make your code more readable;
- in a notebook, a cell ending with a bare expression will display a *representation* of the corresponding object; however, as you have just seen, this only works for the last expression of the cell, so use `print()` calls to properly control your output.
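
The multiple assignment mentioned in the first note can be sketched as follows. The swap in the second line is a common idiom that relies on the same implicit tuple; the name `pair` is just an example.

```python
a, b = 0, 1      # the right-hand side is packed into the tuple (0, 1), then unpacked
a, b = b, a      # swap two values without a temporary variable
pair = a, b      # parentheses are optional when building a tuple
print(a, b, pair, type(pair))
```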

You can check the type of a variable with `isinstance()`:

In [None]:
print(isinstance("python", int))
print(isinstance("python", str))
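
A few more ways of checking types, as a hedged sketch: `isinstance()` also accepts a tuple of types, and it takes class inheritance into account, which is why a `bool` also counts as an `int`. The variable names are illustrative.

```python
x = 42.0
is_number = isinstance(x, (int, float))   # True if x is an int or a float
is_exact_float = type(x) is float         # compares the exact type only
bool_is_int = isinstance(True, int)       # bool is a subclass of int
print(is_number, is_exact_float, bool_is_int)
```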

## 4. Basic operations
We will now introduce some basic operations on native types.

Typical arithmetic operations are represented by the usual symbols: `+`, `-`, `*`, `/`. 

In [None]:
a = 1 # int
b = 0.2 # float
c = a + b # will be a float!
print(c) 

As in other languages, an operation such as `a = a + 1` can be abbreviated with `a += 1`. While it can be tempting, and sometimes convenient, to use this shorthand notation to prepare a variable that has to be used later on, **avoid** using the same name for different meanings in the same block of code: it will quickly lead to confusion.

Some operators also work on strings:

In [None]:
a = "Hello"
b = " "
c = "World"
print(a + b + c)

This property is called *overloading*, which is a special case of *polymorphism*. In practice, the same operator can behave differently depending on the type of its arguments.
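
The `*` operator is another example of overloading. The sketch below applies it to three different combinations of types; the values are arbitrary.

```python
product = 3 * 2            # int * int  -> arithmetic product
repeated = "ab" * 3        # str * int  -> repetition of the string
doubled = [0, 1] * 2       # list * int -> repetition of the elements
print(product, repeated, doubled)
```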

### Division is special

In [None]:
a, b = 5, 2
c = a / b
print(c)

The above statement reads very intuitively for a human, but from a computer's perspective it is awkward: an operation between two integers actually returns a float!

We can perform a Euclidean division (with remainder) using the `//` and `%` (modulus) operators:

In [None]:
a, b = 10, 8
d = a // b
e = a % b
print(d , e)

In `python`, the `//` operator is called *floor division*, and together with `%` it is also defined for floats:

In [None]:
a = 3.5
b = 1.2
print(a // b, a % b)

One can interpret `//` between floats as a normal division `/` followed by a *floor function* returning the largest integer not greater than the result (still stored as a float). Strictly speaking, `//` between integers is a different operation altogether, but the two provide consistent results across integers and floats.
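
The "floor" in floor division matters as soon as negative numbers are involved: the quotient is rounded towards minus infinity, and the remainder takes the sign of the divisor. A minimal sketch with arbitrary values:

```python
a, b = -7, 2
q, r = a // b, a % b          # quotient rounded down, not towards zero
print(q, r)                   # -4 1
pair = divmod(a, b)           # the same two numbers, computed in one call
print(pair)
identity_holds = (q * b + r == a)
print(identity_holds)         # True: quotient and remainder are always consistent
```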

*Legacy note*: in python 2, the `/` operator applied to `int` values would return the result of the floor division; the current behaviour could be enabled with enigmatic statements such as `from __future__ import division`. Hopefully, you will not have to deal with this anymore, as almost all code should be python 3 by now, but you may still run into some outdated code.

### Booleans
Let's quickly look at boolean operations.

In [None]:
a = True
b = not a
print(b)

In [None]:
c = a and b
d = a or b
print(c, d)

In `python`, as in other languages, you can find *bitwise* operators, which work like `not`, `and`, `or` but at the level of the individual bits of an integer. These are `&` (and), `|` (or), `~` (not). We will not go deeper into this for now.
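
Just to give an idea of what "at the bit level" means, here is a tiny sketch using binary literals (the `0b` prefix) and the built-in `bin()` function; the two numbers are arbitrary.

```python
x, y = 0b1100, 0b1010      # 12 and 10 written as binary literals
both = x & y               # bits set in both numbers
either = x | y             # bits set in at least one number
inverted = ~x              # all bits inverted, which equals -(x + 1)
print(bin(both), bin(either), inverted)   # 0b1000 0b1110 -13
```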

### Comparisons
Comparison operators compare two values and return a boolean. You can either print the result directly or store it in another variable.

In [None]:
a = 2
b = 1
print(a == b) # are they equal?
c = (a != b) # are they different?
print(c)

Don't forget the usual arithmetic comparisons: `>` (greater than), `>=` (greater than or equal to), `<` (less than), `<=` (less than or equal to).

#### Floating point pitfalls

In [None]:
a = 10
b = 9 + 0.5 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
print(a == b)
# print(a , b)

Can you guess what is happening?
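
One possible way to investigate, shown here on a simpler and well-known case: decimal fractions such as `0.1` are stored with a tiny rounding error, so exact equality between floats is fragile. The sketch below uses `math.isclose()` from the standard library, which compares within a tolerance, and also shows that the spacing between neighbouring floats grows with their magnitude.

```python
import math

x = 0.1 + 0.2
exactly_equal = (x == 0.3)             # False: 0.1 and 0.2 are not exactly representable
print(exactly_equal)
print(f"{x:.20f}")                     # reveal digits hidden by the default printing
nearly_equal = math.isclose(x, 0.3)    # True: comparison within a relative tolerance
print(nearly_equal)

big = 1e16
absorbed = (big + 1 == big)            # True: at this magnitude floats are more than 1 apart
print(absorbed)
```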

## 5. String formatting

In `python`, there are several ways of building strings incorporating different types of variables.

### String interpolation (legacy)
The oldest style is *[string interpolation](https://docs.python.org/3/library/stdtypes.html#old-string-formatting)*:

In [None]:
a = 1.2345
b = 42
print("a = %d, b = %d" % (a, b)) # d -> integer
print("a = %02d" % a)
print("a = %f" % a) # f -> float
print("a = %.2f" % a)

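The `%d` and similar codes are called *format specifiers*, and the whole mechanism is similar to `printf` in the C language. This style has many pitfalls (note how `%d` silently truncated `a` in the first line above); it is not formally deprecated, but it is considered legacy and we recommend against using it in new code!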

### f-strings and format() method
*f-strings* are *formatted string literals* that make it easy to incorporate python variables and expressions in strings. An alternative and less compact notation uses the `format()` method.

In [None]:
a, b = 1.2345, 42
print(f"a = {a}, b = {b}") # this is simple
print(f"{a=}, {b=}") # this is even more compact, although less flexible
print("a = {}, b = {}".format(a, b)) # this is an alternative standard, can be more or less readable depending on the circumstances


You can control the spacing, number of zeros, number of decimals etc. with specific format strings. 

In [None]:
a, b = 42, 1042
print(f"b = {b:4d}")
print(f"a = {a:4d}") # this pads with spaces to a minimum width of 4 characters
print(f"a = {a:04d}") # this will fill with zeros instead


In [None]:
a = 123.456
print(f"a = {a}") # default
print(f"a = {a:.2f}") # only print two decimals
print(f"a = {a:.2e}") # exponential notation!
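
A few more format specifiers that tend to be useful when printing tables, as a non-exhaustive sketch. The name and the value (a proton mass in MeV) are example data chosen for illustration.

```python
name, mass = "proton", 938.272
left = f"[{name:<10}]"       # left-aligned in a field of width 10
right = f"[{name:>10}]"      # right-aligned
centred = f"[{name:^10}]"    # centred
print(left, right, centred)

fixed = f"{mass:8.1f}"       # total width 8, one decimal
grouped = f"{1234567:,}"     # thousands separator
print(fixed, grouped)
```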

### Multiline strings
You can build a multiline string using the newline (`\n`) escape sequence. What's an escape sequence? It's a sequence of characters that starts with a special character (`\`) and is subject to a special treatment.

In [None]:
print("Line 1\nLine 2\nLine 3")

You could get the same with three `print()` statements, however in some cases you may want to use a single one. For better readability, you could compose the string as follows:

In [None]:
s = "Line 1\n"
s += "Line 2\n"
s += "Line 3"
print(s)

Code using this style can easily get very cluttered, so use it sparingly!
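
A possible alternative, sketched below, is a triple-quoted string literal, in which the line breaks typed in the source become part of the string. The comparison at the end only checks that the result is the same as with the explicit `\n`.

```python
s = """Line 1
Line 2
Line 3"""
print(s)

same = (s == "Line 1\nLine 2\nLine 3")
print(same)
```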

## 6. Collections

### Tuples
Tuples are immutable sequences of values. Once constructed, they cannot be modified.

In [None]:
a = (1,2,3)
print(a[0], a[2])
# a[0] = 1 # try this!

In [None]:
a, b, c = 1, 2, 3
t = (a, b, c)
print(t)
# t[0] = 4 # this cannot work
a = 4 # maybe this will work?

In [None]:
print(t)

So be careful, the tuple has stored the values of `a`, `b`, `c` and assigning a new value to `a` will not change what's in the tuple!
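
For completeness, this is what happens when we do try to modify a tuple, and one way around it: build a new tuple and rebind the name. This is only a sketch; `t[1:]` is a *slice*, a notation explained in the section on lists.

```python
t = (1, 2, 3)
try:
    t[0] = 4                 # item assignment is not supported by tuples
except TypeError as err:
    print("TypeError:", err)

t = (4,) + t[1:]             # build a *new* tuple and rebind the name t
print(t)
```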

### Lists
Lists are the simplest form of collection that can be modified.

In [None]:
a = list() # create an empty list using the list() constructor
b = [] # create an empty list using the `[]` literal
b.append(1) # add an element to the list
b.append("hello")
print(b)

Collections can be non-homogeneous, but this is rarely a good practice to adopt!

You can create lists from tuples:

In [None]:
c = list((1,2,3))
print(c)
c[0] = 4 # now we can modify the list!
print(c)

We will show a few examples of list *slicing*. Slicing is a very powerful syntactic tool that lets you manipulate collections with a very compact notation. Spend a bit of time learning it: you will use it all the time!

In [None]:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(l[2:]) # start at index 2
print(l[2:9]) # select between indices 2 and 9-1 (upper limits are exclusive)
print(l[:9]) # stop at index 9-1
print(l[-1], l[-2], "...") # negative indices count from the end of the list
print(l[::-1]) # reverse the entire list
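
Some further slicing patterns, as a sketch on a separate list (called `nums` here so that the list `l` above is left untouched): the optional third number is a step, and for lists a slice can also appear on the left-hand side of an assignment.

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens_idx = nums[::2]    # every second element, starting from index 0
odds_idx = nums[1::2]    # every second element, starting from index 1
last_three = nums[-3:]   # the last three elements
print(evens_idx, odds_idx, last_three)

nums[:2] = [0, 0]        # replace the first two elements in place
print(nums)
```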

We can check the number of elements in a collection with the `len()` function:

In [None]:
print(len(l))

Note that lists are not the same as arrays or vectors! For example:

In [None]:
a = [1,2,3]
b = [4,5,6]
c = a + b # this concatenates the lists, does not add their values!
print(c)
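
If an element-wise sum is what you need, one way to get it with plain lists is sketched below. It uses the built-in `zip()` to pair up the elements and a *list comprehension*, a compact loop notation that has not been introduced yet; numerical libraries offer more convenient tools for this.

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = [x + y for x, y in zip(a, b)]   # pair the elements up, then add each pair
print(c)
```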

You can have nested lists:

In [None]:
a = [1,2,3]
b = [4,5,6]
c = [7,8,9]
m = [a, b, c]
print(m)
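
Elements of a nested list are accessed by chaining indices: the first index selects the inner list, the second the element inside it. A minimal sketch, with the list written out explicitly:

```python
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
second_row = m[1]                  # the second inner list
element = m[1][2]                  # third element of the second inner list
first_column = [row[0] for row in m]   # first element of each inner list
print(second_row, element, first_column)
```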

### Sets
Sets are similar to lists, but they are unordered and hold each value at most once.

In [None]:
a = {"a", "b"} # note that {} is not an empty set but an empty dictionary!
a.add("a")
print(a)

# We can also build it from a list
l = [1,2,2,3,4,5]
s = set(l) # repeated elements will not be preserved!
print(s)
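
Sets also support the usual operations from set theory. The sketch below shows union, intersection and difference on two arbitrary sets, together with the way to create an empty set.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}
union = s1 | s2          # elements in s1 or s2
common = s1 & s2         # elements in both
only_s1 = s1 - s2        # elements in s1 but not in s2
print(union, common, only_s1)

empty = set()            # set() creates an empty set, {} would be a dictionary
print(type(empty), len(empty))
```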

### Dictionaries
Dictionaries map a key to a value. Values can be of any type, keys of any *hashable* type (for example numbers and strings, but not lists), and neither has to be homogeneous in general (but again, there is a difference between what you *can* and what you *should* do).

In [None]:
d = dict() # create an empty dictionary with the dict() constructor
d[1] = 'a'
d['b'] = 2


"""
As mixed as it gets (almost).
A bit confusing.
Also not very useful?    
"""
print(d)

In [None]:
d = { 'a' : 1, 'b' : 2, 'c' : 3}
print(d)

- Dictionaries are one-way maps: you can get a value given its key, you can have repeated values but not repeated keys!
- You can use integers as keys, but this does not turn a dictionary into a vector.
- Dictionaries are not *sorted*. Since python 3.7 they are guaranteed to preserve the order in which the elements have been inserted, but there is no built-in concept of "sort by" and you should not rely on the idea that such a collection is sorted.
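
The points above can be sketched with a few common dictionary operations. The keys and values are arbitrary; `sorted()` returns a new list and leaves the dictionary itself unchanged.

```python
d = {"a": 1, "b": 2, "c": 3}
value_b = d["b"]             # look up a value by its key
missing = d.get("z", 0)      # get() returns a default instead of raising KeyError
d["a"] = 10                  # assigning to an existing key overwrites its value
d["d"] = 4                   # assigning to a new key adds an entry at the end

keys = list(d)               # the keys, in insertion order
by_value = sorted(d, key=d.get)   # the keys ordered by their value, as a new list
print(value_b, missing, keys, by_value)
```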

### A simple structured dataset

A common situation in science is having a table with labeled data. Python does not provide a native table or matrix format, but you can achieve something similar with a dictionary of lists, for example:

In [None]:
names = ['proton', 'neutron', 'electron'] # three separate strings, one per particle
symbols = ['p', 'n', 'e']
masses = [938, 939, 0.511] # approximate masses in MeV/c^2

particles = { "name" : names, "symbol" : symbols, "mass" : masses}

print(particles)

Now, if you access a given index on each list, you will get all the properties of a particle. This is still a crude way to build a structured dataset, but one that can be easily converted into the formats used by popular libraries.
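
Accessing "a given index on each list" can be sketched as follows. The dataset is repeated so that the example runs on its own; the names `i` and `row` are illustrative.

```python
names = ["proton", "neutron", "electron"]
symbols = ["p", "n", "e"]
masses = [938, 939, 0.511]
particles = {"name": names, "symbol": symbols, "mass": masses}

i = 2  # index of the particle we are interested in
row = (particles["name"][i], particles["symbol"][i], particles["mass"][i])
print(row)
```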

### `in` operator
The `in` operator has two main use cases:
- check if a string is part of another string;
- check if a value is present in a collection.

In [None]:
a = "hello" in "hello world"
print(a)

b = 3 in [2,4,5]
print(b)
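
Two details worth knowing, shown in a short sketch: on a dictionary `in` looks at the keys only, and `not in` is the negated form of the operator.

```python
d = {"a": 1, "b": 2}
key_found = "a" in d                # True: `in` looks at the keys
value_as_key = 1 in d               # False: the values are not searched
value_found = 1 in d.values()       # True: search the values explicitly
absent = "x" not in "hello"         # True: `not in` is the negated form
print(key_found, value_as_key, value_found, absent)
```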

### From strings to collections, and vice-versa
A string can be turned into a list of strings representing its characters:

In [None]:
l = list("hello")
print(l)

And a list of strings can be joined to form a string:

In [None]:
s = "".join(l) # use the empty string "" as separator to join the strings in l
print(s)
t = " ".join([s, "world"]) 
print(t)

We can also do the opposite and split a string at every occurrence of a given separator:

In [None]:
l = t.split(" ")
print(l)
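
A typical use of `split()` is reading values from a line of text. In the sketch below the content of `line` is made up for the example; calling `split()` without arguments splits on any run of whitespace and ignores leading and trailing spaces.

```python
line = "938.27,939.57,0.511"
fields = line.split(",")               # a list of strings
values = [float(f) for f in fields]    # convert each field to a number
print(fields, values)

words = "  hello   world ".split()     # no argument: split on whitespace
print(words)
```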

## 7. Conditional statements
We have talked about booleans for quite some time, but what are they used for? Conditional statements are one of the building blocks of computer programming. A conditional controls the execution of a sequence of instructions based on a boolean value, which can be the result of a comparison operation. Let's introduce the `if-else` construct:


In [None]:
a = 1
ref_value = 1
if a > ref_value:
    print(f"{a=} is greater than {ref_value=}")
else:
    print(f"{a=} is less than or equal to {ref_value=}")
# Change the value of a and run this cell again!

We could have been tempted to write the condition directly as `a > 1` instead of using an auxiliary variable `ref_value`. However, this form allows us to avoid repetitions of `1` in our string and makes our code more easily reusable. When possible, make your code depend on *parameters* rather than literals. 

We can have cascaded selections using `elif`:

In [None]:
a = 1
ref_value = 1
if a == ref_value:
    print(f"{a=} is equal to {ref_value=}")
elif a > ref_value:
    print(f"{a=} is greater than {ref_value=}")
else:
    print(f"{a=} is less than {ref_value=}")
# Change the value of a and run this cell again!

### match-case (only since python 3.10!)
This is similar to the *switch-case* statement that has been part of other programming languages for ages. Surprisingly, in python it has only been available since version 3.10. You can surely make good use of it!

The `match` statement selects among different code blocks depending on the value of a variable:

In [None]:
a = 1
match a:
    case 1:
        print("one")
    case 2:
        print("two")
    case 3:
        print("three")
    case _:
        print("I don't know how to write this number!")
# change the value of a and see how the construct behaves...

You can rewrite this using `if` and `elif`, but it will be much less nice to read!
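
For comparison, this is roughly what the equivalent `if`/`elif` chain could look like. It is only a sketch; the variable `word` is introduced here to collect the result.

```python
a = 1
if a == 1:
    word = "one"
elif a == 2:
    word = "two"
elif a == 3:
    word = "three"
else:
    word = "I don't know how to write this number!"
print(word)
```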

This feature is actually more powerful than we have shown here, as the patterns following `case` can be much more sophisticated than simple literals (this is known as *structural pattern matching*). For the time being, let's just take note of its existence.

## 8. Our first loop

### Very brief history of structured programming
We said at the beginning that every algorithm can be built from a combination of three constructs:
- sequence
- conditionals
- cycles (or loops).

This result was rigorously proven in the 1960s (Böhm & Jacopini) and constitutes the foundation of what is known as *structured programming*. While it is the natural way of teaching computing today, it has not always been the most popular paradigm. In the early days, it was common to introduce arbitrary *jumps* in the code with the now infamous `GOTO` (go to) statement. One of the founding fathers of modern computing, Edsger W. Dijkstra, wrote an open letter in 1968 advocating against the use of `GOTO`.

It goes without saying that python does not have a `goto`!

### Looping over a collection
A loop is the repetition of a sequence of instructions controlled by a condition: as long as the condition is true, the instructions are repeated. In `python`, loops can be a bit more abstract, such as "repeat a sequence of instructions for each element of a collection", for example:

In [None]:
l = [1, 2, 3, 4, 5]
for n in l:
    print(n)
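
The condition-controlled form described above corresponds to the `while` statement. The sketch below shows it next to two common helpers for `for` loops, the built-ins `range()` and `enumerate()`; all variable names are illustrative.

```python
# condition-controlled loop: repeat as long as the condition is true
n = 1
while n <= 5:
    print(n)
    n += 1

# range(5) produces the integers 0, 1, 2, 3, 4
total = 0
for i in range(5):
    total += i
print(total)

# enumerate() yields the index together with each element
for index, letter in enumerate(["a", "b", "c"]):
    print(index, letter)
```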