# Session 02: 2024-09-27

## Matteo Poggi, PhD

# Homework Solutions

## Create `venv` and bla bla...

If you can read this, I must have done it :)  
We did it yesterday in class.

## `pip freeze`

* The `jupyterlab` package does not come alone. It depends on several other packages, so you see all these dependencies in the `requirements.txt` file.
* The number next to the library name is the version of the library; `==` means that exactly that version is required.

## Favorite text editor

It is `neovim`, for the moment.

## Prime Number Test

In [None]:
from math import sqrt

def is_prime(n):
    if n <= 1:
        return False

    # computed after the check above: sqrt raises ValueError for negative n
    largest_possible_divisor = int(sqrt(n)) + 1
    for i in range(2, largest_possible_divisor):
        if n % i == 0:
            return False
            
    return True

NOTE:
  * early `return`: instead of a lot of nested branches (= `if` blocks), it is better to return the result as soon as it is known. The code after that statement will not be executed!
  * the function `int` converts the result of the square root `sqrt` into an integer. That's ok... But wait a minute, **where are the types here?** Follow the lecture!

# Lecture

## Anatomy of a Python Program

In [None]:
#!/usr/bin/env python
# the line above (called shebang) is optional

# imports
from math import sqrt

# function definition
def is_prime(n):
    if n <= 1:
        return False

    # computed after the check above: sqrt raises ValueError for negative n
    largest_possible_divisor = int(sqrt(n)) + 1
    for i in range(2, largest_possible_divisor):
        if n % i == 0:
            return False
            
    return True
   
# other function definition
def print_prime(n, prime):
    print(f"The number {n}", "is" if prime else "is not", "prime.")

# yet another function
def test_prime(n):
    print_prime(n, is_prime(n))

# this block runs only when the file is executed as a program, not when it is imported
if __name__ == "__main__":
    # local constant
    MAX = 10

    for n in range(1, MAX + 1):
        test_prime(n)

Some comments are in order:
  * there is a strange string preceded by `f`;
  * there is a strange `if` inside the call to `print`;
  * there is a strange variable `__name__` that, apparently, *may* be equal to `"__main__"`.

### `f`-string

The name stands for *formatted string*. An expression in curly braces is *not* taken literally: it is replaced by its value (we will see this more precisely later).

In [None]:
a = 6
print(f"The value of a is {a}")     # with `f`-string
print("The value of a is {a}")      # without `f`-string

f-strings support many formatting options ([learn more](https://docs.python.org/3/reference/lexical_analysis.html#f-strings)). They are very handy.
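
A few of the formatting options, as a minimal sketch (the variable names `value` and `name` are chosen only for this example):

```python
value = 3.14159
name = "pi"

# A format specifier follows a colon inside the braces.
print(f"{name} is about {value:.2f}")   # pi is about 3.14
print(f"{value:10.3f}|")                # right-aligned in a field of 10 characters

# Any expression is allowed, not only a variable name.
print(f"{2 * value:.1f}")               # 6.3

# Since Python 3.8 a trailing `=` prints the expression together with its value.
print(f"{value=}")                      # value=3.14159
```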

### inline `if`

The syntax is
```Python
true_value if condition else false_value
```
It evaluates to `true_value` if `condition` is true, otherwise to `false_value`.  
(It is similar to the ternary operator `? :` in C.)
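
As a small illustration, the two hypothetical functions below do the same thing, once with a conditional expression and once with a regular `if`/`else` statement:

```python
def parity(n):
    # Conditional expression: evaluates to one of the two strings.
    return "even" if n % 2 == 0 else "odd"


def parity_verbose(n):
    # The equivalent if/else statement.
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


for n in (3, 4):
    print(n, parity(n), parity_verbose(n))
```

The expression form is handy inside a function call or an f-string, where a statement is not allowed.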

### Attributes

`__name__` is an attribute, that is a variable, of the current module (= file). It is set automatically by Python.

In [None]:
__name__

Its value depends on how the current module is executed. The other names defined in the current module can be listed with the function `dir()`:

In [None]:
dir()

Called with an argument, this function lists the attributes of the given object:

In [None]:
def add_one(x):
    return x + 1

dir(add_one)

However, we will use `__name__` as in the example above, to mark the "entry point" of a program. (Think of `main()` in a C program.)
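
To see the two possible kinds of value side by side, one can compare the `__name__` of an imported module with that of the code being run. This sketch uses the standard-library module `math` only as a convenient example:

```python
import math

# An imported module carries its own name...
print(math.__name__)   # math

# ...while the code that is run directly (a script or a notebook cell)
# is usually named "__main__".
print(__name__)
```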

## Running Python

* If you want to open a Python shell interactively

```bash
(venv)$ python
(venv)$ ipython  # requires installation
```
  Then you can close it with `Ctrl+D`. Once it is closed, if you reopen it you have to type your whole program again.

* If you want a Jupyter notebook (we did this yesterday)

```bash
(venv)$ jupyter notebook  # requires installation
```

* If you want to run a Python program.

```bash
(venv)$ python <filename>
```
  When the program reaches the end, the Python interpreter exits.

* If you want to keep the shell open after the program reaches the end add the option `-i`

```bash
(venv)$ python -i <filename>
```

## Hands on:

* save the above program to a file `prime.py`;
* execute it standalone;
* execute it interactively (with the `-i` option) and test whether 117 is prime or not.

SEE: `examples/prime/prime.py`

## `import`s

Any Python file can be `import`ed as a *module*.

Let us see some very simple examples, `examples/prime/{main_1.py,main_2.py,main_3.py,main_4.py,main_5.py}`: in each of them we put a `main_n.py` in the same folder as `prime.py`. Let us analyse what is going on.

### `main_1.py`

```Python
import prime                          # import the whole module.

if __name__ == "__main__":
    prime.test_prime(1117)            # functions of the module should be prefixed with `prime.`
    print(prime.is_prime(1117))       # ditto.
```

Here we import the whole module. Every function must be prefixed with the module name followed by a `.`.  
Notice that the code under `if __name__ == "__main__"` in `prime.py` is *not* executed. Can you spot what the value of `__name__` is in this case?

### `main_2.py`

```Python
from prime import *                   # import all the public names defined in the `prime` module.

if __name__ == "__main__":
    test_prime(1117)                  # no `prime` required here.
    print(is_prime(1117))             # ditto.
```

However, this is not good practice. The guiding principle is to import only the names that you use.

### `main_3.py`

```Python
from prime import test_prime, is_prime  # import only `test_prime` and `is_prime`.

if __name__ == "__main__":
    test_prime(1117)                    # no prime required here.
    print(is_prime(1117))               # ditto.
```

In this case we limited our import. So far, so good. Now suppose that the current module defines a function whose name clashes with one imported from the module `prime`. How can we solve the name clash?

### `main_4.py`

```Python
from prime import test_prime                   # this function is imported with the same name as defined in `prime`.
from prime import is_prime as is_prime_number  # the function `is_prime` from `prime` in this module will be referred to as `is_prime_number`

def is_prime(string):                          # in this module, the name `is_prime` refers to this function.
    return string == "prime"

if __name__ == "__main__":
    test_prime(1117)
    print(is_prime_number(1117))
    print(is_prime("prime"))
```

You can use `as` to create an alias in the current module. This is useful to solve name clashes and to have shorthands. It can also be used with modules.

### `main_5.py`

```Python
import prime as prm

if __name__ == "__main__":
    prm.test_prime(1117)
    print(prm.is_prime(1117))
```

Here `prm` is used instead of `prime`.
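
The same two forms of aliasing can be tried without the `prime` module. The sketch below uses the standard-library module `math` instead, and the local function `sqrt` is a hypothetical stand-in for a clashing name:

```python
import math as m                        # module alias
from math import sqrt as square_root    # function alias


def sqrt(x):
    # A local function named like `math.sqrt`: the alias avoids the clash.
    return f"sqrt({x})"


print(m.pi)
print(square_root(16))   # 4.0
print(sqrt(16))          # sqrt(16)
```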

## Packages

Any folder containing a magic `__init__.py` is a *package*.

A typical package has the following directory structure:


```bash
mypkg  # <- package root
├── __init__.py
├── subpkgA  # <- a subpackage
│  ├── __init__.py
│  ├── spkgA_mod0.py  # <- a module
│  └── spkgA_mod1.py  # <- another module
└── subpkgB
   ├── __init__.py
   └── spkgB_mod0.py
```

The environment variable `$PYTHONPATH` adds directories to the list that Python searches for modules and packages. The full search path, in search order, is available from within Python as `sys.path`.

In [None]:
import sys
sys.path

Reminder: try to run this inside and outside the `venv`. What do you see?
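
A rough way to check from within Python whether a virtual environment is active is to compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes an environment created with the standard `venv` module:

```python
import sys

# Inside a virtual environment `sys.prefix` points to the venv folder,
# while `sys.base_prefix` points to the installation it was created from.
in_venv = sys.prefix != sys.base_prefix
print("Running inside a venv:", in_venv)

# The search path is an ordinary list of directory names.
for entry in sys.path:
    print(repr(entry))
```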

### Relative `import`s

* very useful when developing a large library or project AND the directory structure is consolidated
* should be **avoided** in all other instances

```bash
mypkg  # <- package root
├── __init__.py
├── subpkgA  # <- a subpackage
│  ├── __init__.py
│  ├── spkgA_mod0.py  # <- a module
│  └── spkgA_mod1.py  # <- another module
└── subpkgB
   ├── __init__.py
   └── spkgB_mod0.py
```

```python
# file mypkg/subpkgA/__init__.py

from .spkgA_mod0 import item0, item1
from ..subpkgB.spkgB_mod0 import other_item as alias_name
```

Ref: [Realpython: Absolute vs relative imports](https://realpython.com/absolute-vs-relative-python-imports/)
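
To experiment without touching a real project, the following sketch builds a throwaway package in a temporary folder and imports it. All names (`demo_pkg`, `sub_a`, `sub_b`, `mod_a`, `mod_b`) are hypothetical and only mirror the layout shown above:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "demo_pkg"
    (root / "sub_a").mkdir(parents=True)
    (root / "sub_b").mkdir()

    # Package markers and modules.
    (root / "__init__.py").write_text("")
    (root / "sub_b" / "__init__.py").write_text("")
    (root / "sub_b" / "mod_b.py").write_text(
        "def other_item():\n    return 'from sub_b'\n"
    )
    (root / "sub_a" / "mod_a.py").write_text("item0 = 'from sub_a'\n")

    # `.` is the current package (sub_a), `..` is its parent (demo_pkg).
    (root / "sub_a" / "__init__.py").write_text(
        "from .mod_a import item0\n"
        "from ..sub_b.mod_b import other_item as alias_name\n"
    )

    sys.path.insert(0, tmp)          # make the temporary folder importable
    importlib.invalidate_caches()
    try:
        sub_a = importlib.import_module("demo_pkg.sub_a")
        result = (sub_a.item0, sub_a.alias_name())
    finally:
        sys.path.remove(tmp)

print(result)   # ('from sub_a', 'from sub_b')
```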

### `__init__.py` file

* Can contain actual code and definitions
* Often used to define the public interface to more complex packages
  * imports the public definitions from private modules
  * often contains a variable `__all__`
    * it lists the names available with the syntax `from <module> import *`

```python
# file __init__.py

from .private_module import public_definition
from .other_public_module import *

# ("public_definition") is just a string, not a tuple, so build a list instead
__all__ = ["public_definition", *other_public_module.__all__]
```
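
The effect of `__all__` on a star-import can be observed with a module built in memory. This is only a sketch: the module name `demo_mod` and its two functions are hypothetical, and `exec` is used here just to keep the example in a single file:

```python
import sys
import types

# Build a throwaway module in memory.
demo_mod = types.ModuleType("demo_mod")
source = '''
__all__ = ["public_definition"]

def public_definition():
    return "public"

def helper():
    return "not exported by *"
'''
exec(source, demo_mod.__dict__)
sys.modules["demo_mod"] = demo_mod

# Run a star-import in a fresh namespace and look at what arrived.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)            # ['public_definition']

# Names left out of __all__ are still reachable explicitly.
print(demo_mod.helper())   # not exported by *
```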

## Types and typing

Typing discipline:
 * strong
 * dynamic

Builtin types:

* boolean: `bool`
* numeric: `int, float, complex`
* byte arrays: `bytes, bytearray`
* strings: `str`  ("immutable sequences of Unicode code points")

* array-like: `list, tuple; set, frozenset; range`
* mappings: `dict`

**Everything** is an object.

Moreover, every object has three properties:
  * identity, returned by the built-in function `id`
  * type, returned by the built-in function `type`
  * value

In [None]:
a = True
print(id(a))    # id
print(type(a))  # type
print(a)        # value

a = -19.65
print(id(a))    # id has changed
print(type(a))  # type has changed
print(a)

The operator `is` tests whether two names refer to the same object, i.e. whether their ids are equal.

In [None]:
a = 42.56
b = a
print(a is b)
print(id(a), id(b))

Notice also that CPython caches some simple objects (e.g. small integers)

In [None]:
a = 4
b = 4
print(a is b)

... but only up to a certain value (256 in CPython; this is an implementation detail, so do not rely on it)

In [None]:
a = 1000
b = 1000
print(a is b)
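
Because of such caching, `is` should not be used to compare values. A minimal sketch of the difference between `==` (same value) and `is` (same object):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with the same content
c = a           # a second name for the first list

print(a == b)   # True: same value
print(a is b)   # False: two distinct objects
print(a is c)   # True: one object, two names

# `is` is the right tool for singletons such as None.
result = None
print(result is None)   # True
```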

### Mutability

An object is mutable if you can change its content without changing its id. Lists are mutable, integers are not: compare the ids in the cells below.

In [None]:
obj = []
id(obj)

In [None]:
obj.append(42)
id(obj)

In [None]:
obj = 3
id(obj)

In [None]:
obj += 4
id(obj)
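
The same comparison can be made between a list and a tuple, which hold the same kind of content but differ in mutability. A small sketch:

```python
items_list = [1, 2]
items_tuple = (1, 2)

id_list_before = id(items_list)
id_tuple_before = id(items_tuple)

items_list += [3]     # a list is extended in place
items_tuple += (3,)   # a tuple cannot change: a new tuple is created

print(id(items_list) == id_list_before)     # True
print(id(items_tuple) == id_tuple_before)   # False

try:
    items_tuple[0] = 99   # item assignment is not supported by tuples
except TypeError as error:
    print(f"TypeError: {error}")
```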

### Pass by value or pass by reference?

**Neither!**

#### Pass by assignment:
  * mutable objects are aliased and can be modified in place, so the caller sees the change (~reference):
    - `bytearray`
    - `list`
    - `set`
    - `dict`
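  * immutable objects are aliased too, but they cannot be modified in place: any "change" creates a new object bound to the local name only (~value):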
    - `bool`
    - `int`
    - `float`
    - `complex`
    - `bytes`
    - `str`
    - `tuple`
    - `range`
    - `frozenset` 

**BEWARE**: side effects ahead

This is what happens for a mutable type.

In [None]:
def foo(mylist):
    mylist.append(42)

x = []
foo(x)      # this modifies the list in the outer scope
print(x)

And this is what happens for an immutable type:

In [None]:
def bar(y):
    y += 5

x = 6
bar(x)
print(x)
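
What matters is whether the function mutates the object it received or rebinds its local name. The sketch below (function names are hypothetical) shows both cases, plus the usual way to get a new value out of a function, which is to return it:

```python
def mutate(mylist):
    mylist.append(42)        # changes the object the caller also sees


def rebind(mylist):
    mylist = mylist + [42]   # builds a new list, bound to the local name only


def add_five(y):
    return y + 5             # return the new value instead of relying on side effects


x = []
mutate(x)
print(x)   # [42]

x = []
rebind(x)
print(x)   # []

n = 6
n = add_five(n)
print(n)   # 11
```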

**BEWARE**: buggy code

In [None]:
def bar(mylist=[]):
    mylist.append(42)
    print(mylist)
    
bar()
bar()    # the default value, being mutable, is updated every time

A possible solution:

In [None]:
def bar(mylist=None):
    if mylist is None:
        mylist = []
        
    mylist.append(42)
    print(mylist)
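
The reason is that default values are evaluated once, when the function is defined, and stored on the function object. The sketch below makes this visible through the attribute `__defaults__`; the names `buggy` and `fixed` are chosen only for this example:

```python
def buggy(mylist=[]):
    mylist.append(42)
    return mylist


def fixed(mylist=None):
    if mylist is None:
        mylist = []          # a fresh list at every call
    mylist.append(42)
    return mylist


print(buggy())              # [42]
print(buggy())              # [42, 42]
print(buggy.__defaults__)   # ([42, 42],)

print(fixed())              # [42]
print(fixed())              # [42]
```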

### Copying mutable objects

In [None]:
sublist = ["sub", "list"]
mylist = [sublist]

mylist_copy = mylist.copy()
mylist_copy[0] is sublist     # reference to the original list

Let us try to append an element to the sublist through the copy...

In [None]:
mylist_copy[0].append("bye")
mylist_copy[0]

Let us check what happened to the "original" sublist...

In [None]:
mylist[0]

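The new element is there too! The method `copy()` makes a *shallow* copy: a new outer list whose items are the very same objects.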

What we need is a **deep copy**:

In [None]:
from copy import deepcopy

mylist_deepcopy = deepcopy(mylist)
mylist_deepcopy[0] is sublist  # a different object!

Let us append an element to the sublist deep copy:

In [None]:
mylist_deepcopy[0].append("Hello")

In [None]:
mylist_deepcopy[0]

And check whether it has any effect on the "original":

In [None]:
mylist[0]

It does not. Original and deep-copied sublist are independent.

## An appetizer of classes

*more on classes later*

In [None]:
class MyClass(object):
    """
    class docstring
    """
    
    def __init__(self, variable):
        """
        Initialization or constructor
        """
        self._variable = variable
    
    def print_variable_n_times(self, n):
        """
        Method
        """
        for i in range(n):
            print(self._variable)
        
instance = MyClass(42)
instance.print_variable_n_times(5)

One can also access the instance variable directly (the leading underscore marks it as private by convention only):

In [None]:
instance._variable

## Exceptions

* **NOTE** not all exceptions are errors!
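* almost all exceptions inherit from `Exception`; a few (such as `KeyboardInterrupt` and `SystemExit`) inherit directly from its parent `BaseException`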
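
The hierarchy of the built-in exceptions can be explored with `issubclass`. A quick check, using only built-in classes:

```python
# StopIteration signals the end of an iterator: an exception, but not an error.
print(issubclass(StopIteration, Exception))           # True

# Ctrl+C and sys.exit() use exceptions that sit outside `Exception`,
# so that a generic `except Exception` does not swallow them.
print(issubclass(KeyboardInterrupt, Exception))       # False
print(issubclass(KeyboardInterrupt, BaseException))   # True
print(issubclass(SystemExit, Exception))              # False

# The chain of base classes of a concrete error.
print([cls.__name__ for cls in IndexError.__mro__])
# ['IndexError', 'LookupError', 'Exception', 'BaseException', 'object']
```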

### `try`, `except`,  [`else`, `finally`]

```python
try:
    exception_prone_logic()
except MyException as e:
    # the `as e` above is optional
    exception_handling(e)
else:
    execute_if_no_exception()
finally:
    always_execute()
```

In [None]:
def foo(x):
    try:
        fourth_element = x[3]
    except (TypeError, IndexError) as e:
        print(f"This error was raised: {e}")
        x = [40, 2]
    except Exception as e:
        print("Unexpected exception")
        raise e
    else:
        print(f"fourth element is {fourth_element}")
    finally:
        print(sum(x))

foo(5)
print("----")
foo([3,4,5])
print("----")
foo([1,2,3,4])

### Is `finally` needed?

In [None]:
def bar_finally(x):
    try:
        fourth_element = x[3]
    except (TypeError, IndexError) as e:
        print(f"This error was raised: {e}")
        return 0
    else:
        return sum(x) - fourth_element
    finally:
        print("Here I'm")

In [None]:
def bar_no_finally(x):
    try:
        fourth_element = x[3]
    except (TypeError, IndexError) as e:
        print(f"This error was raised: {e}")
        return 0
    else:
        return sum(x) - fourth_element

    print("Here I'm")

In [None]:
bar_finally(5)

In [None]:
bar_no_finally(5)

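Yes, sometimes it is needed: the `finally` block is executed even when the function returns from inside `try`, `except` or `else`, whereas the last line of `bar_no_finally` is never reached.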
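
A typical use is cleanup that must happen on every path. In the sketch below the "resource" is simulated with a list of events, and the function name `read_resource` is hypothetical:

```python
events = []


def read_resource(fail):
    events.append("open")
    try:
        if fail:
            raise ValueError("something went wrong")
        return "data"
    finally:
        # Runs on both the normal return and the exception path.
        events.append("close")


print(read_resource(fail=False))   # data

try:
    read_resource(fail=True)
except ValueError as error:
    print(f"caught: {error}")      # caught: something went wrong

print(events)   # ['open', 'close', 'open', 'close']
```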

### to `raise` an Exception

When you are writing a program you may want to propagate/emit an exception. In Python this is done with the keyword `raise`:

```Python
raise RuntimeError("A runtime error has occurred.")
```
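
One can also define a custom exception by subclassing an existing one. The class `NegativeNumberError` and the function `checked_sqrt` below are hypothetical names used only for illustration:

```python
class NegativeNumberError(ValueError):
    """Hypothetical custom exception used only in this example."""


def checked_sqrt(x):
    if x < 0:
        raise NegativeNumberError(f"cannot take the square root of {x}")
    return x ** 0.5


print(checked_sqrt(9))   # 3.0

try:
    checked_sqrt(-1)
except NegativeNumberError as error:
    print(f"caught: {error}")   # caught: cannot take the square root of -1
```

Since `NegativeNumberError` derives from `ValueError`, code that already handles `ValueError` will catch it as well.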

# Intermezzo: Lab Session

Here we will try to retrieve some *data* from the website [Our World in Data](https://ourworldindata.org/).

This website comes with a Python API (the package [owid-catalog](https://pypi.org/project/owid-catalog/)) that allows you to retrieve data (see the [docs](https://docs.owid.io/projects/etl/api/covid/)).

## What is an API?

* It stands for Application Programming Interface. It is a library, a collection of utilities (functions, classes, etc.) that you can use from your Python code.
* It is not a program because it does not contain an "entry point".
* One can wrap a library in a program allowing the non-programmer user to use it:
  - CLI (Command Line Interface)
  - TUI (Text User Interface)
  - GUI (Graphical User Interface) 

## Step 0

* set up a clean virtual environment
* install the package `owid-catalog` using `pip`

## Step 1

* open a Python shell
* import the `owid-catalog` package following the docs
* use the `find` function of the package to query `'covid'` in the namespace `'owid'`. (If you do not know how to do it, ask for help.)

  In particular, check out the `URI` (aka `path`) of the dataset we want to retrieve.

## Step 2

* install `jupyterlab` package with `pip`
* run `jupyter lab`
* repeat Step 1 within Jupyter Lab

### Solution

In [None]:
from owid import catalog

catalog.find("covid", namespace="owid")

## Step 3

* initialize a `RemoteCatalog` object and call it `rc`
* query `rc` with the `URI` retrieved in Steps 1 and 2 and call the result `data`.

### Solution

In [None]:
rc = catalog.RemoteCatalog()
uri = "garden/owid/latest/covid/covid"
data = rc[uri]

## Step 4

Write a Python program `owid-covid.py` with the following requirements:
  * it has to retrieve the data as we did above
  * it has to contain a function like
  ```Python
  def save_data(uri, path):
      pass
  ```
  that is able to save the `data` retrieved above as `csv`. Call the file `owid-covid.csv`. (Hint: use the method `.to_csv` on the variable `data`.)

### Solution

See `examples/owid-covid`. In any case, the function is:

In [None]:
def save_data(uri, path):
    rc = catalog.RemoteCatalog()
    data = rc[uri]
    data.to_csv(path)

## Step 5

* Refactor the code above, extracting a package named `covidlib` that contains the module `retriever.py`, which in turn contains the function `save_data`.
* Write a `main.py` program using that library.

### Solution

see `examples/covidlib`.

## Step 6

* What happens if one makes a mistake in the `URI`?
* Wrap the body of the `save_data` function in a `try`-`except` block. If an exception of a certain type is raised, raise a `RuntimeError` informing the user that the `URI` does not exist.
* In the main program, run the function `save_data` in another `try`-`except` block. If the `RuntimeError` is caught, handle it by terminating the program gracefully and printing the error message.

### Solution

In [None]:
# library
def save_data(uri, path):
    rc = catalog.RemoteCatalog()
    try:
        data = rc[uri]
    except KeyError as error:
        # chain the original exception so that the traceback keeps the root cause
        raise RuntimeError(f"The URI {uri} does not exist.") from error
    data.to_csv(path)

In [None]:
# main program
try:
    save_data("pippo", "pippo.csv")
except RuntimeError as error:
    print(error)
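
In a standalone script, terminating "gently" usually also means reporting failure through the exit status. The sketch below is one possible arrangement, not the course solution: `save_data` is replaced by a stand-in that always fails, so the example runs without the `owid-catalog` package.

```python
import sys


def save_data(uri, path):
    # Stand-in for the library function: it always fails, to exercise the handler.
    raise RuntimeError(f"The URI {uri} does not exist.")


def main():
    try:
        save_data("pippo", "pippo.csv")
    except RuntimeError as error:
        print(error, file=sys.stderr)   # report the problem on standard error
        return 1                        # a non-zero exit status signals failure
    return 0


exit_code = main()
print("exit code:", exit_code)   # exit code: 1
# In a script one would typically end with: sys.exit(main())
```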