In [1]:
%%html
<style>
h1,h2,h3 {
    text-align: center;
}

.term {
    text-align: center;
    margin-top: 1em;
    margin-bottom: 1em;
}

.organizers {
    text-align: center;
    margin-left: 20%;
    margin-right: 20%;
    margin-bottom: 1em;
}

.presenter {
    text-decoration: underline;
}
</style>


$
\newcommand{\nc}{\newcommand} 
\nc{\t}{\text}
\nc{\tb}{\textbf}
\nc{\ti}{\textit}
$

# Python Programming for Machine Learning

<div class="term">Winter Term 2024</div>

<div class="organizers">
    Jannik Wolff, Stefaan Hessmann, Ping Xiong, Max Eißler, Nathan Lee, Johannes Maeß, Ryan "Wilbur" Gelston, <span class="presenter">Eike Middell</span>
</div>

<center><img src='images/python-logo-only.svg' width=250> </center>

## Why should you learn Python?

- [pypl.github.io](https://pypl.github.io/) calculates a programming language popularity index based on Google searches
- Python has been the most popular language in this index since 2019
- de facto it is **the** programming language for Data Science and Machine Learning

In [2]:
import pandas as pd
import matplotlib.pyplot as p

url = ("https://gist.githubusercontent.com/emiddell/7dea88e36a739f070cb020877b4e3be6"
       "/raw/0bfd1477d649ac0d79ec9ae1a8a942a2cc9070c8/pypl.csv")

data = pd.read_csv(url, parse_dates=["Date"])

p.figure(figsize=(12,6))
p.rc("font", size=16)

for language in ["Python", "Java", "C/C++", "JavaScript"]:
    p.plot(data["Date"], data[language], label=language)

p.legend()
p.xlabel("Date")
p.ylabel("PYPL Popularity Index");


## Some Facts about Python

* Python is a high-level language, implemented in C through CPython

* Python is an **interpreted** language (C/C++/Julia are compiled)

* Python offers paradigms such as object oriented programming (Java, C++) and functional programming (Haskell)

* Python is dynamically typed, but type-annotations are available (C/C++/Java are statically typed)

* Python uses mandatory indentation levels to identify code blocks (C/C++/Java use `{}`)

* Python offers a vast standard library (similar to Java/C++, in contrast to C or Lua)

* Python is slow, much slower than C/C++ or Java, but allows simple cross-language binding with C (and others)

## Why should you learn Python? (cont'd)

- expressiveness
- readability
- portability
- 'batteries included' philosophy of the Python Standard Library
- extensibility
- package ecosystem


### Expressiveness
- audience: computer
- How easy is it to express complex things in a compact way? How many details do you have to provide?
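
As a rough illustration of expressiveness, the following sketch counts word frequencies in a handful of lines. The sample sentence is made up for this example.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# Split on whitespace and count every word in a single expression
word_counts = Counter(text.split())

# The most frequent word together with its count
print(word_counts.most_common(1))
```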

### Readability

- audience: coworkers (or your future self)
- code is read much more often than it is written
- Python uses whitespace indentation to identify code blocks
- English keywords

In [7]:
def func(x):
    if x < 0:
        print(x)
        print("x < 0")
    else:
        print(x)
        print("x >= 0")

## Portability
- python code gets interpreted at runtime
- if equivalent Python environments are available on different machines, Python code can run without modification on different operating systems

## The Python Standard Library
- "batteries included" philosophy
- scroll the [Python Standard library](https://docs.python.org/3/library/index.html)

## Extensibility
- Python's import mechanism
  ```python
  import foo
    
  foo.func()
  ```
- search mechanism that looks at specified paths for python code in `foo.py` or a shared library `foo.so` (or `foo.dll`)
- cross-language bindings: possibility to write Python extensions in other languages (e.g. [C](https://docs.python.org/3/extending/extending.html), [C++](https://pybind11.readthedocs.io/en/stable/index.html), [Rust](https://pyo3.rs/))
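
The search mechanism described above can be inspected from Python itself. This is a small sketch that uses the standard-library module `json` as an arbitrary example; the printed paths depend on your installation.

```python
import importlib.util
import sys

# Directories (and archives) that are searched, in order, on import
for path in sys.path[:3]:
    print(path)

# Ask the import system where it would load the module "json" from
spec = importlib.util.find_spec("json")
print(spec.origin)
```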

## Package Ecosystem
- [numpy](https://numpy.org/)
    - efficient n-dimensional arrays on the CPU
- [matplotlib](https://matplotlib.org/)
    - plotting 
- [pytorch](https://pytorch.org/)
    - n-dimensional arrays on the CPU & GPU, Autograd, neural nets
- [pandas](https://pandas.pydata.org)
    - data analysis on tabular data
- [scikit-learn](https://scikit-learn.org/)
    - machine learning in Python
- [SciPy](https://scipy.org/)
    - fundamental algorithms for scientific computing in Python

## Python Interpreters - CPython

- python scripts are interpreted at runtime by a Python interpreter
- CPython is the main Python implementation (written in C)
- e.g. hello.py: 
    ```python
    #!/usr/bin/env python
    print("Hello World!")
    ```
- execute with:
  ```
  % chmod +x hello.py
  % ./hello.py
  Hello World!
  ```

- interactive use:
    ```
    % python
    Python 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("Hello World!")
    Hello World!
    ```

## Python Interpreters - IPython

- enhanced interactive Python shell (e.g. tab-completion, object introspection, system shell access, history)
  ![IPython](images/ipython_screenshot.png)
- special commands
    ```ipython
    In [1]: %timeit 1+1
    7.88 ns ± 0.0494 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
    ```
- full list of shell features in the [documentation](https://ipython.readthedocs.io/en/stable/overview.html#enhanced-interactive-python-shell)

## Python Interpreters - Jupyter Notebooks

- [Jupyter Notebooks](https://jupyter.org/): web application for creating and sharing computational documents 
- notebooks mix code, computation results, documentation
- two process architecture: clients connecting to computing kernels
- computing kernels constrained neither to the local machine nor to Python

- this presentation is a Jupyter Notebook:

In [4]:
1 + 1

2

## Installing Python

* Ubuntu/Debian
```shell
 % sudo apt install python3-pip python3-venv
```

* macOS (install Homebrew first)
```shell
 % brew install python
```

* Windows/others from [python.org](https://www.python.org/downloads/)

## Creating a virtual environment

* isolation: folders with Python executables and packages
* use a separate environment for every project
<p>

* Open a terminal and check whether you have `pip` and `venv` available

```shell
  % python3 -m pip --version
  % python3 -m venv --help
```


* Create a virtual environment to install new Python packages


```shell
 % python3 -m venv venv
```

* Activate the environment by typing:

```shell
 % source venv/bin/activate
```
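
To check from within Python whether a virtual environment is active, one possible approach is to compare `sys.prefix` with `sys.base_prefix`. The variable name `in_venv` is chosen for this sketch only.

```python
import sys

# In a virtual environment, sys.prefix points to the environment folder,
# while sys.base_prefix points to the Python installation it was created from
in_venv = sys.prefix != sys.base_prefix

print(f"interpreter: {sys.executable}")
print(f"inside a virtual environment: {in_venv}")
```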

 ## Install Python Packages (pip)
 
 * Make sure that your Python version is 3.8 or later with 

```shell
 % python3 --version
```

* Install the following packages by:

```shell
 % python3 -m pip install numpy matplotlib jupyter pandas
```

## Alternative: Installing Conda

 * Get miniconda from [here](https://docs.conda.io/projects/miniconda/en/latest/). Run the installer and follow the prompts:
 ```shell
 % bash Miniconda3-latest-Linux-x86_64.sh
 ```

 * Update conda:
 ```shell
 % conda update conda
 ```

 * Create and activate a new conda environment:
 ```shell
 % conda create --name pyml
 % conda activate pyml
 ```

 * Install packages:
 ```shell
 % conda install jupyter numpy matplotlib pandas
 ```

## Development Environment

* Using one of the installation approaches, you can run a local instance of jupyter with:
```bash
% jupyter notebook
```

* For Python development beyond Jupyter notebooks, we recommend using an Integrated Development Environment (IDE)
    - [PyCharm](https://www.jetbrains.com/pycharm/)
    - [Visual Studio Code](https://code.visualstudio.com/) (using the [Python extension](https://code.visualstudio.com/docs/languages/python) and the [Ruff extension](https://code.visualstudio.com/docs/python/linting) for linting and code formatting)

## Starting out with Python

* Python features the **print** *function* to write text to standard output:

In [3]:
print('Hello world')  # Shift+Enter to execute a cell

Hello world


In [None]:
print('Hello')
print('world' + ' is here') # string concatenation by '+' sign

'Hello world' # last line within a cell will be printed automatically (Notebook only)

### Math operators

 * Addition and subtraction: $\quad 2 + 3, \> 3 - 5\>$ Multiplication and division: $\quad 4 \cdot 2 ,\>  \frac{4}{3}$

In [6]:
2 + 3, 3 - 5, 4 * 2 , 4 / 3 

(5, -2, 8, 1.3333333333333333)

 * Powers and roots: $\quad  3^2, \> \sqrt9, \> \frac{1}{\sqrt{4}}, \> \sqrt[3]{9^2}$

In [None]:
3 ** 2, 9 ** 0.5,  4 ** -0.5, 9 ** (2 / 3) # power of two, square root, one over square root, etc...   

 * Floor division, modulo,  bit-wise shift

In [None]:
10 // 3, 5 % 3, 1 << 8 # (binary: 1 -> 1 0000 0000)

All available operators:
```
+       -       *       **      /       //      %      @
<<      >>      &       |       ^       ~       :=
<       >       <=      >=      ==      !=
```
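
Floor division and modulo behave in a way that may surprise with negative numbers. The sketch below, with arbitrarily chosen values, shows how the two results fit together and also demonstrates the assignment expression `:=`.

```python
a, b = -7, 2

quotient = a // b    # floor division rounds towards negative infinity
remainder = a % b    # the remainder takes the sign of the divisor

# The two results always satisfy this identity
assert a == quotient * b + remainder

# divmod returns both values at once
print(divmod(a, b))

# := assigns a name inside an expression
if (n := len("hello")) > 3:
    print(n)
```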

### Variables

 * Variables in Python are typed **dynamically**

In [5]:
result = 'hello'
print(result)

result = 6 / 3 * (1 + 2)
print(result)

hello
6.0


- in-place operators may be used on variables

In [None]:
result = 3
result *= 3
print(result)

message = 'hi'
message += '!'
print(message)


All in-place operators are:
```
+=      -=      *=      /=      //=     %=      @=
&=      |=      ^=      >>=     <<=     **=
```

### Variables (cont'd)

- all data in a Python program is represented by objects or by relations between objects
- every object has an identity, a type and a value
- assigning variables binds objects to names (similar to pointers in C/C++)
![IPython](images/name_binding.svg)

In [6]:
x = "Hello World!"
print(id(x))
print(type(x))

y = "Hello World!"
print(id(y))
print(type(y))

z = x
print(id(z))
print(type(z))

1703421574896
<class 'str'>
1703422924528
<class 'str'>
1703421574896
<class 'str'>


## Mutability
- Python distinguishes between mutable and immutable types
    - e.g. numeric types (`int`, `float`) and `str` are immutable, i.e. they never change their value
    - e.g. `list` is mutable



In [None]:
x = 1; y = x
print(x, id(x), y, id(y))
x += 1
print(x, id(x), y, id(y))

In [None]:
x = "Hello World"; y = x
print(x, id(x), y, id(y))
x += "!"
print(x, id(x), y, id(y))

In [None]:
x = [1,2,3]; y = x
print(x, id(x), y, id(y))
x += [4]
print(x, id(x), y, id(y))
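
The list case deserves a closer look: `+=` modifies the list object in place, whereas `x = x + [...]` creates a new list and rebinds the name. A minimal sketch of the difference:

```python
x = [1, 2, 3]
y = x                # y is a second name for the same list object

x += [4]             # in place: mutates the shared object
same_after_inplace = x is y
print(y)             # y sees the change

x = x + [5]          # creates a new list and rebinds the name x
same_after_rebind = x is y
print(x, y)

z = y.copy()         # an independent (shallow) copy
z.append(99)
print(y, z)
```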

## Types

* the numeric types are `int`, `float` and `complex`; `bool` is a subtype of `int`
* we can use the `type` built-in to get the type of any item, e.g. literals:

In [None]:
type(4), type(2.4), type(3.3j), type(True)

* or results (note the implicit conversion)

In [4]:
type(2 + 3), type(4 / 3), type(4**0.5j), type(3**2), type(5.4 % 3), type(not 1.3)

(int, float, complex, int, float, bool)

* the sequence types are `list`, `tuple`, `range` and `str`

In [5]:
type([1, 'x', 3.3, True]), type((1, True, 'hi')), type(range(5)), type('wow')

(list, tuple, range, str)

## Explicit Type Conversion

In [10]:
int("45"), bool(0), bool(23), int(4 / 3), float(2 + 2), float('42')

(45, False, True, 1, 4.0, 42.0)

In [9]:
 int(1.99), round(1.59), bool(""), bool(" "), bool([]), bool([1])

(1, 2, False, True, False, True)

* convert a tuple to a list

In [11]:
a = (1, 2, 3)
type(a), type(list(a))

(tuple, list)

* or to a string

In [None]:
type(a), str(a), type(str(a))
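
Not every conversion succeeds. The following sketch shows a failing conversion and the difference between truncating with `int` and rounding with `round`; the values are arbitrary.

```python
# int() accepts strings that look like integers, but not decimal strings
try:
    int("3.7")
except ValueError as error:
    print(f"conversion failed: {error}")

# Going through float works; int() then truncates towards zero
truncated = int(float("3.7"))
negative = int(-3.7)

# round() rounds to the nearest integer instead
rounded = round(3.7)

print(truncated, negative, rounded)
```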

## Booleans

- in Python, any object can be evaluated for truth value testing, i.e. *considered a bool*; the `bool` constants are `True` and `False`

In [None]:
a = True
not a

- 0, 0.0, empty strings and empty containers evaluate to `False`, everything else evaluates to `True`

In [None]:
print(bool(0), bool(0.0), bool(''), bool([]), bool({}), bool(()))
print(bool(1), bool(0.01), bool(' '), bool([1]), bool({3}), bool((6,)))

- `and` and `or` work as expected on bools; on other objects they return the operand that decides the result (`or`: the first truthy operand, otherwise the last operand; `and`: the first falsy operand, otherwise the last operand), while `not` always returns a bool

In [None]:
0 or '', '' or 0, 1 and 'a', '' and 1, 0 or 'tree' or 5, 6 and [] or [5], a and 6
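
A common use of this behaviour is supplying fallback values and guarding expressions. The function `greet` below is a hypothetical helper for illustration.

```python
def greet(name=None):
    # `or` returns the first truthy operand, so the fallback is used
    # whenever name is None or an empty string
    return "Hello " + (name or "stranger")

print(greet("Ada"))
print(greet(""))

# `and` stops at the first falsy operand, so the division is never evaluated
divisor = 0
safe = divisor and 10 / divisor
print(safe)
```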

## Comparison Operators

- comparison operators evaluate to bools for the built-in types

In [None]:
2 == 2, 2 == 4, 2 != 4, 2 >= 4, 3 <= 3, 43 < 3457

In [None]:
"hello" == "world", "hello ".strip() == "hello"

- the `is` operator checks for identity

In [None]:
[1] == [1], [1] is [1]

In [None]:
x = 1; y = x
x is y
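
In practice, `==` is used to compare values, while `is` is mostly reserved for singletons such as `None`. A short sketch:

```python
a = [1, 2]
b = [1, 2]

equal = a == b        # both lists hold the same values
identical = a is b    # but they are two distinct objects

result = None
# None is a singleton, so identity is the customary test
if result is None:
    print("no result yet")
```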

### Lists

- lists can be created with the literal notation `[]` or with the built-in `list` type (one trailing comma is okay)

In [None]:
[], [1, 2, 'a'], [1, 2,], list()

- the `list` built-in will attempt to iterate over its (optional) input and create a list of the elements

In [None]:
list('hello')

* the `+` operator creates a new, concatenated list

In [None]:
[1, 2, 3] + [2, 3, 4]

* the `*` operator repeats lists given an int (also works with strings)

In [None]:
3 * [1, 2], 10 * '+='

* list elements may be accessed using the index notation (zero-based indexing)

In [None]:
lst = [4, 2, 1, 5, 3]
lst[2], lst[0]

### Lists (cont'd)

* indexing with negative integers will count backwards starting from the back of the list (imagine omitting the `length` in `length - 1`)

In [None]:
lst = [4, 2, 1, 5, 3]
lst[-1], lst[-2], lst[-3]

* lists can be indexed using the slicing notation `lst[start:stop:step]` to create slices of lists (slicing is *exclusive*, i.e. the element with index `stop` will not be included)

In [None]:
lst[1:3], lst[:2], lst[2:], lst[:-1], lst[2:2], lst[:]

* omitting `start` defaults to the beginning of the list, `stop` to the end and `step` defaults to `1`
* step can be used to skip elements (i.e. obtain only every `i`th element)

In [None]:
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lst[::2], lst[1::3]

## Lists (cont'd)

* using a negative step will iterate in *reverse* from `start` to `stop` (defaults are the end and start of the list respectively)

In [13]:
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lst[2::2], lst[8::-2], lst[7:3:-2]

([2, 4, 6, 8, 10], [8, 6, 4, 2, 0], [7, 5])

* the `len` built-in returns the number of elements in a container

In [14]:
len(lst)

11
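
Two further properties of slicing are worth knowing: a slice is a new list, and slice bounds outside the list are clipped silently. A small sketch with made-up values:

```python
lst = [0, 1, 2, 3, 4]

copy = lst[:]            # full slice: a new list with the same elements
copy[0] = 100            # does not affect lst

head = lst[:2]
reversed_lst = lst[::-1]

# Out-of-range slice bounds are clipped, whereas lst[10] would raise IndexError
clipped = lst[3:10]

print(lst, copy, head, reversed_lst, clipped)
```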

### Tuples

- tuples can be instantiated with the literal `()` notation, or the `tuple` built-in (note the single element notation `(1,)` with the trailing comma to differentiate between the tuple literal and parentheses)

In [15]:
(), (1,), (1, 2, 3)

((), (1,), (1, 2, 3))

- the `tuple` built-in will attempt to iterate over its input and create a tuple of the elements

In [None]:
tuple([3, 4, 5]), tuple('hello')

* tuples can be indexed, sliced, concatenated, repeated, etc.

In [None]:
tup = (1, 2, 3)
tup[1], tup[::-1], (1,) + (2,), (1,) * 3, len((1, 2, 3))

 * tuples are *immutable*, i.e. they cannot be modified

In [None]:
tup = (2, 4, 6)
tup[-1] = 3

 * use lists for *mutable* containers or create a new tuple

## Lists vs. Tuples: What's the deal with immutability?


* At first glance, `list` and `tuple` seem to fill the same role

* The important difference is their *mutability*

* As `list` is mutable, passing it to, e.g., a function may change its contents.

* In contrast, `tuple` is immutable, thus it is *guaranteed* to keep its state constant.

* This will also be relevant with respect to their *identification* and *hashing* in the context of `dict` and `set`
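
The points above can be sketched as follows. `normalize` is a hypothetical helper that modifies its argument in place; it is not part of any library.

```python
def normalize(values):
    # Hypothetical helper: divides every element by the total, in place
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v / total

scores = [1, 1, 2]
normalize(scores)
print(scores)        # the caller's list has changed

point = (1, 1, 2)
try:
    normalize(point)
except TypeError as error:
    print(f"tuples refuse modification: {error}")

# Being hashable, a tuple can serve as a dict key
grid = {(0, 0): "origin"}
```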

## Strings

* strings can be created with the `''` or `""` literals (single or double quotes; stick to one!), or triple quotations to span multiple lines `''' '''` `""" """`

In [None]:
"Doctor's" + ' ' + 'office' + '''
Open'''

* the `+` operator creates concatenations of strings

In [None]:
'Doctor\'s' + ' ' + 'office' 

* strings can be indexed, sliced, concatenated, repeated, etc.

In [None]:
msg = 'hello'
msg[1], msg[::-1], 'hi' + 'lo', 'ya' * 3, len('house')

 * strings are also *immutable*, allowing the assumption of constant states of strings.

In [None]:
msg[2] = 'e'

## Membership Operator

* container types and strings support the membership operator `in`
* for strings, this indicates whether a *substring* is part of the string

In [None]:
('ell' in 'hello', 'hi' in 'hello')

* for container types, this indicates whether *any* element is equal to the given object

In [None]:
print([2, 3] in [[2], 3, 4, [2,3]])
print('bear' in ['monkey', 'snake', 'bear'])
print(() in [[], '', False])

## String Utility Functions

 * be aware of whitespace

In [None]:
'   the story \r\n\r\n '.strip()  # strip leading and trailing whitespaces/newlines

In [None]:
txt = 'Hello world !'
first = txt.find('wo')  # index of the first occurrence of the substring: 6
first, txt[first]

In [None]:
'Apples,Lettuce,Bread'.split(',')  # split a string into a list of strings given a delimiter

In [None]:
' and '.join(['Huey', 'Dewey', 'Louie'])  # joining a list of strings using a delimiter

* The full list of string methods is [here](https://docs.python.org/3/library/stdtypes.html#string-methods)
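
These methods are often combined. The following sketch cleans up a comma-separated line; the input string is invented for the example.

```python
line = "  Apples , Lettuce,Bread \n"

# Split on commas, then strip the whitespace around each item
items = [item.strip() for item in line.split(",")]

# Join the cleaned items with a different delimiter
joined = "; ".join(items)

# find returns -1 when the substring is absent
missing = joined.find("Milk")

print(items, joined, missing)
```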

### String formatting

 * Using f-strings (recommended)

In [None]:
day = 17
suffix = 'th'
day_of_week = 'Monday'
month = 'October'

f'Today is {day_of_week} {day}{suffix} of {month}'

* Formatting options

In [None]:
f'Result: {3 ** 0.5:0.3f}' # expression inside the braces + 3 digits after the decimal point

In [None]:
f'Value:{15:05d}', f'Value:{15:5d}' # output of width 5, padded with zeros/spaces

* Other ways to format strings: `str.format` and the `%` operator:

In [None]:
template = 'Today is {}, {}{} of {}'
print(template.format('Tuesday', 2 * 8 + 1, 'th', 'October'))
print(template.format('Monday', 2, 'nd', 'November'))
print('The weight of %d %s is approximately %.1f g' % (3, 'Apples', 85 * 3))
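
A few more format specifiers that tend to be useful, shown as a sketch with arbitrary values:

```python
value = 1234567.891

thousands = f"{value:,.2f}"    # thousands separator, two decimals
percent = f"{0.256:.1%}"       # multiply by 100 and append a percent sign
left = f"[{'hi':<6}]"          # left-aligned in a field of width 6
right = f"[{'hi':>6}]"         # right-aligned
center = f"[{'hi':^6}]"        # centered

print(thousands, percent, left, right, center)
```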

## Functions

* Definition, write default values with `key=value`

In [57]:
def func(x, y, coeff=1):
    z = (x ** 2 + y ** 2) ** 0.5 
    z *= coeff
    return z, z ** 2  # multiple outputs possible (returned type is a tuple)

* Function call by positional arguments

In [None]:
print(func(3, 4, 2))
print(func(3, 4))  # default coeff=1 is used

* Function call by keyword arguments

In [None]:
func(y=4, coeff=2, x=3)

## Function docstrings

* You should provide functions with a docstring. 

In [None]:
def add_c(first, second):
    """ Adds two complex numbers
        
        Args: 
            first (tuple)  : the first complex number
            second (tuple) : the second complex number
        
        Returns:
            complex : sum of the two complex numbers
    """
    
    assert type(first) == tuple, 'not a tuple'
    assert type(second) == tuple, 'not a tuple'
    
    re = first[0] + second[0]
    im = first[1] + second[1]
    
    return complex(re, im)

add_c((2,3), (2,-1))


In [None]:
# or using the built-in help function
help(add_c)

In [62]:
# in Jupyter/IPython you may use a question mark
add_c?

## Type Annotations
- you can annotate variables, function parameters and return values with type information
- ignored by the interpreter but provides further documentation and can be evaluated by external tools (IDEs, type checkers)

In [63]:
def add_c(first : tuple, second : tuple) -> complex:
    """ Adds two complex numbers
        
        Args: 
            first: the first complex number
            second: the second complex number
        
        Returns:
            sum of the two complex numbers
    """
    
    assert type(first) == tuple, 'not a tuple'
    assert type(second) == tuple, 'not a tuple'
    
    re = first[0] + second[0]
    im = first[1] + second[1]
    
    return complex(re, im)
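
That annotations are ignored at runtime can be seen in a small experiment. `scale` is a hypothetical function written for this sketch.

```python
def scale(value: float, factor: float = 2.0) -> float:
    return value * factor

# Annotations are stored on the function object ...
print(scale.__annotations__)

# ... but not checked at runtime: a str is accepted without complaint
# (a static type checker would flag this call)
repeated = scale("ab", 2)
print(repeated)
```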

## Sequence unpacking

In [64]:
values = (1, 2, 3)

* you can unpack sequences into multiple values

In [None]:
val1, val2, val3 = values
print(val1, val2, val3)

* the number of variables must match the length of the sequence, use this to your advantage when a certain length is required

In [None]:
val1, val2, val3, val4 = values

In [None]:
val1, val2 = values

## Sequence unpacking (cont'd)

* you can unpack sequences in place using the `*` operator, e.g. to fill new sequences

In [None]:
print([5, *values, 19])

* you can also unpack into function arguments

In [None]:
func(*values)

In [None]:
print(4, 5, *values, 3, 3, 3)
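
The `*` operator can also appear on the left-hand side of an assignment, where it collects the remaining elements. A brief sketch with made-up values:

```python
values = (1, 2, 3, 4, 5)

# A starred name collects all remaining elements into a list
first, *rest = values
head, *middle, last = values

# Swapping two variables is tuple packing followed by unpacking
a, b = 1, 2
a, b = b, a

print(first, rest, head, middle, last, a, b)
```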

## Lambda Expressions

* `lambda` can be used to define functions inside expressions (inline)

In [None]:
# definition and execution without a name   
(lambda x, y, coeff=1: coeff * (x ** 2 + y ** 2) ** 0.5) (3, 4)

* you can assign the lambda expression to a variable to achieve an effect similar to using `def`

* ... but good style is to **always prefer** `def` when assigning a function to a variable

In [None]:
def f(x, y, coeff=1):
    return coeff * (x ** 2 + y ** 2) ** 0.5

f = lambda x, y, coeff=1: coeff * (x ** 2 + y ** 2) ** 0.5
f(3, 4)

## Passing Around Functions

* Functions, whether defined with `def` or `lambda`, are objects and can be assigned to names like any other value

In [None]:
my_function = f
my_function(3, 4, 5)

* Naturally, you can pass functions to other functions

In [None]:
def square_fn(x: float):
    return x ** 2

def do_twice(func, x):
    return func(func(x))

do_twice(square_fn, 2)

In [None]:
do_twice(lambda x : x**2, 2)
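
Several built-ins accept functions as arguments in the same way. As an illustration, here is a sketch using `sorted` with a `key` function and `map`; the word list is invented.

```python
words = ["banana", "kiwi", "apple", "fig"]

# `key` receives a function that is called on every element
by_length = sorted(words, key=len)
by_last_letter = sorted(words, key=lambda w: w[-1])

# map applies a function lazily to every element
lengths = list(map(len, words))

print(by_length, by_last_letter, lengths)
```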

##  Dictionaries (HashMaps/Key-Value-Stores/...) 

* `dict`s can be instantiated using the literal notation `{}` or the `dict()` function. Let's create a data point using a `dict`.

In [17]:
fruit = {
    'color': 'green',
    'taste': 'sweet',
    'size 3d': [1, 3, 2]
}

fruit2 = dict(color='green', taste='sweet', size3d=[1, 2, 3])

type(fruit), fruit

(dict, {'color': 'green', 'taste': 'sweet', 'size 3d': [1, 3, 2]})

* `dict`s support indexing notation to assign/look up values given a *key*

In [18]:
fruit['size 3d'] = [2, 3, 1]
fruit['size 3d']

[2, 3, 1]

* we can use the `.get` method to return a default value in case the key is not found

In [None]:
print(fruit.get('price', 23))

## Dictionaries (cont'd)

* iterating dicts will return the keys

In [None]:
list(fruit)

In [None]:
list(fruit.values())
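
Keys and values can also be iterated together, and membership tests operate on keys. A minimal sketch, using a reduced version of the `fruit` dict:

```python
fruit = {"color": "green", "taste": "sweet"}

# items() yields (key, value) pairs, which can be unpacked in the loop
for key, value in fruit.items():
    print(f"{key}: {value}")

# `in` tests for keys; indexing a missing key raises KeyError
has_price = "price" in fruit
try:
    fruit["price"]
except KeyError as error:
    print(f"missing key: {error}")

# get falls back to None (or a given default) instead of raising
price = fruit.get("price")
```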

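* only *hashable* types may be used as keys; the immutable built-in types (e.g. `int`, `str`, and tuples of hashable elements) are hashable, mutable containers such as `list` are not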

In [None]:
{(1, 2): 3, 'hi': 'cool', 1: 'wow'}

In [None]:
{[1, 2]: 3}

## Dictionaries (cont'd)
* dictionaries preserve insertion order (guaranteed since Python 3.7); assigning to an existing key keeps its position

In [None]:
dict_a = {'a': 1, 'b': 2, 'c': 3}
dict_b = {'b': 2, 'a': 1, 'c': 3}
print(dict_a)
print(dict_b)
print(list(dict_a))
print(list(dict_b))
dict_a['a'] = 0
print(dict_a)

### Combining Dicts

In [None]:
# copy a dict
cool_fruit = dict(fruit)
cool_fruit is fruit, cool_fruit == fruit

In [85]:
# some new dict for combining
properties = {
    'best before': 7,
}

In [None]:
cool_fruit.update(properties)  # cool_fruit is changed in-place
cool_fruit

### Dictionary Unpacking

* similar to sequence unpacking with `*`, we can use *dictionary unpacking* using the `**` operator

In [None]:
print({**fruit, 'price': 3})
print({**fruit, **properties})

* use it to unpack into function keyword arguments

In [None]:
def f(a, b):
    return a + b

kwargs = {"b": 1, "a": 2}

f(a=1, b=2)
f(**kwargs)

## Sets: unordered, unique elements

* `set`s may be instantiated with the literal `{}` notation, or the `set()` type. Note that `{}` (empty braces) will evaluate to an empty dict

In [19]:
{1, 2, 2, 'apples'}, set()

({1, 2, 'apples'}, set())

* as `set`s are unordered (i.e. not a sequence), they do not support indexing

In [None]:
my_set = {'cool', 'alright'}
my_set[0]

* add elements using `.add`, or use an in-place union using `.update` (adding elements that already exist has no effect)

In [None]:
my_set = {'cool', 'alright'}
my_set.add('good')
print(my_set)
my_set.update({'cool', 'awesome'})
print(my_set)
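
Sets also support the usual operations from set theory through operators. A short sketch with arbitrary numbers:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b
intersection = a & b
difference = a - b
symmetric = a ^ b

# Converting to a set is a common way to drop duplicates (order is not preserved)
unique = set([3, 1, 3, 2, 1])

print(union, intersection, difference, symmetric, unique)
```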

### Sets (cont'd)

* sets are *mutable* like lists (you can add elements in place), thus they are unhashable and cannot be used as dict keys or as elements of other sets

In [None]:
{{1,2,3}: 'numbers'}

* the `set` constructor iterates over its input to build a new set; sets can in turn be iterated, e.g. to create a sequence

In [20]:
print(set([3, 3, 1, 4]))
print(list({5, 4, 1}))

{1, 3, 4}
[1, 4, 5]


### Conditional expressions

* $\t{Example: Let's try to classify the following fruits: }$

<br><br>

<center>
    <img src='./images/fruits.png' width='1200'/>
</center>

### Conditional expressions (cont'd)

In [94]:
def classify(x: dict) -> str:
    if x['color'] == 'green':

        if x['size'] == 'big':
            decision = 'watermelon'

        elif x['size'] == 'medium':
            decision = 'apple'                

        else:
            decision = 'other'
    else:
        decision = 'other'
    
    return decision

* we can use the `dict` type to represent fruit properties

In [None]:
fruit_1 = {'color': 'green', 'size': 'big'}

fruit_2 = {'color': 'green', 'size': 'medium'}

fruit_3 = {'color': 'red', 'size': 'small'}

classify(fruit_1), classify(fruit_2), classify(fruit_3)


### Ternary condition operator

* the in-line ternary operator `true_value if condition else false_value` can be used inside expressions for some syntactic sugar, or when we are only allowed to specify a single expression

In [None]:
def compare(x: dict, y: dict) -> str: 
    """ 
        Checks whether two fruits are the same
        
        Args:
            x (dict) : first fruit
            y (dict) : second fruit
            
        Returns:
            str : either 'same' or 'different'  
    """
    
    return "same" if x == y else "different" # short if-else form 

compare(fruit_1, fruit_2), compare(fruit_1, fruit_1)

### Range Type

* the `range` type represents a range of integers, use the `range(start, stop, step)` built-in to instantiate ranges

In [None]:
print(range(5))
print(range(0, 5))
print(range(0, 5, 1))

* like tuples, ranges are immutable: they support indexing and slicing, but not item assignment

In [None]:
print(range(2, 20, 3)[4])
print(range(20, 1, -2)[2:])

In [None]:
my_range = range(3, 7)
my_range[2] = 2

* pass a range to a sequence type such as `list` to obtain a list of its members

In [None]:
list(range(3, 1, -1))

### $$\textbf{Iterables, Iterators and for-loops}$$
<hr>  

 * iterable types can be iterated manually: `iter` creates an iterator that tracks the state of the iteration, and `next` returns the next element; once the elements are exhausted, `next` raises `StopIteration` (or returns the default value, if one is given)

In [None]:
my_iter = iter([1, 2, 3])
print(my_iter)
print(next(my_iter))
print(next(my_iter))
print(next(my_iter))
print(next(my_iter, 'no more elements here!'))

 * an easier way to iterate iterable types is to use the `for` loop

In [None]:
for i in range(2, 13, 4):
    print(i)

In [None]:
for i in {1, 2, 3}:
    print(i)

In [None]:
for i in 'hello':
    print(i)

### For, Continue, Break

 * `break` can be used to immediately exit a loop, `continue` jumps to the next iteration

In [None]:
for i in range(5):
    if i > 3:
        break
    if i == 1:
        continue
    print(i)

 * the extended definition of `for` allows an `else`, which is only triggered when no `break` was hit

In [None]:
for i in 'hello':
    if i == 'a':
        break
else:
    print('No a encountered!')
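
A typical use of `for`/`else` is a search with a fallback for the case that nothing was found. `first_negative` is a hypothetical helper written for this sketch.

```python
def first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Runs only when the loop finished without hitting break
        result = None
    return result

print(first_negative([3, -1, -5]))
print(first_negative([1, 2]))
```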

### Enumerators

 * `enumerate` wraps an iterable and yields a tuple of the index and the element for each element of the iterable

In [None]:
for n, tag in enumerate(['yes', 'no', 'maybe']):
    print(f'{n} -> {tag}')
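
`enumerate` accepts a `start` argument, and the related built-in `zip` pairs up the elements of several iterables. A sketch with invented counts:

```python
tags = ["yes", "no", "maybe"]
counts = [12, 7, 3]

# start changes the first index
numbered = list(enumerate(tags, start=1))

# zip pairs up the elements of several iterables
paired = list(zip(tags, counts))
lookup = dict(zip(tags, counts))

print(numbered, paired, lookup)
```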

## Iterators are Lazy

* Iterators produce their elements lazily, one at a time, whereas a list stores all of its elements in memory. Compare their memory sizes:

In [None]:
import sys

myiter = iter(range(0, 1000000))
sys.getsizeof(myiter), sys.getsizeof(list(myiter))
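
One consequence of laziness is that an iterator can be consumed only once, as this small sketch shows:

```python
numbers = [1, 2, 3]
it = iter(numbers)

first_pass = list(it)     # consumes the iterator
second_pass = list(it)    # nothing is left

# The list itself can be iterated again; each iter() call starts afresh
again = list(iter(numbers))

print(first_pass, second_pass, again)
```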

## Generators

* Generators are functions that return lazy iterators
* the keyword `yield` bears some similarity to `return`, but it does not terminate the function.
  The computation resumes after the `yield` when `next` is called on the generator object returned by the function.
* useful to define very large iterables without the need to store all data in memory

In [None]:
def counter():
    n = 0
    while True:
        yield n
        n += 1
        

lazy_iter = counter()
print(next(lazy_iter))
print(next(lazy_iter))
print(next(lazy_iter))
#...
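
To take a finite number of elements from an infinite generator, `itertools.islice` is one option. The generator `squares` is a made-up example in the spirit of `counter`.

```python
from itertools import islice

def squares():
    """Yield 0, 1, 4, 9, ... without ever storing the whole sequence."""
    n = 0
    while True:
        yield n * n
        n += 1

# islice takes a finite window out of the infinite generator
first_five = list(islice(squares(), 5))

# A generator expression is the lazy sibling of a list comprehension
total = sum(n * n for n in range(5))

print(first_five, total)
```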

### Example: Implementing a Classifier

* We define some data using a list of dicts

In [None]:
data = [
  {'color': 'green', 'size': 'big'},
  {'color': 'yellow', 'shape': 'round', 'size': 'big'},
  {'color': 'red', 'size': 'medium'},
  {'color': 'green', 'size': 'big'},
  {'color': 'red', 'size': 'small', 'taste': 'sour'},
  {'color': 'green', 'size': 'small'}
]

type(data), type(data[0])

### Example: Implementing a Classifier (cont'd)

In [None]:
results = []

for x in data:
    res = classify(x)
    
    print(f'Fruit: {x} \nClass: {res} \n') # \n is the newline character, it starts a new line
    
    results.append(res) # equiv.to  results += [classify(x)]
    
"All:", results # tuple object is printed

### List Comprehensions

- List comprehensions are the *pythonic* counterpart to the `map` and `filter` operations of functional programming

In [112]:
results = [classify(x) for x in data]

- filtering is done by adding an `if` clause at the end of the comprehension

In [None]:
results = [x for x in data if x['color'] == 'green']
results

 * this can be combined with any expression

In [None]:
[classify(x) if x['color'] == 'green' else 'other' for x in data]

Similarly, there are dict- and set-comprehensions:

In [None]:
print("set:", {i for i in [1, 2, 3]})
print("dict:", {i: 2 * i for i in [1, 2, 3]})

## Counting "watermelon" objects in the data 

In [None]:
result = [classify(x) for x in data]
print(result)

obj = "watermelon"
count = 0
for res in result:
    if res == obj:
        count += 1

f'Total number of {obj}s is {count}'

* *Pythonic* way using a list comprehension and `list.count`:

In [None]:
# alternative: sum(res == obj for res in lst), since True counts as 1 and False as 0

lst = [classify(x) for x in data] 
cnt = lst.count('watermelon')

f'Total number of {obj}s is {cnt}'
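
When the counts of all classes are of interest, `collections.Counter` from the standard library is a further option. The list `labels` below is a hypothetical stand-in for the classifier output.

```python
from collections import Counter

# Hypothetical labels, standing in for the output of a classifier
labels = ["watermelon", "other", "other", "watermelon", "other", "other"]

counts = Counter(labels)
print(counts["watermelon"])
print(counts.most_common(1))

# Equivalent count with sum: True counts as 1, False as 0
n_watermelon = sum(label == "watermelon" for label in labels)
```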

## Reading Data from a file

Content of file _scores.txt_ that lists the performance of players at a certain game:

<br>

`80,55,16,26,37,62,49,13,28,56`

`43,45,47,63,43,65,10,52,30,18`

`63,71,69,24,54,29,79,83,38,56`

`46,42,39,14,47,40,72,43,57,47`

`61,49,65,31,79,62,9,90,65,44`

`10,28,16,6,61,72,78,55,54,48`

- The following program reads a file and stores scores into a list
- The `with` statement takes care of closing the file again, even if an error occurs inside the block.

In [None]:
with open('./scores.txt', 'r') as fd: # fd is open only inside this block
    
    data = []
    
    for line in fd:
        line_entries = line.strip().split(',')
        
        print(line_entries)
        lst = [float(x) for x in line_entries]
        
        #data.append(lst)
        data.extend(lst)
    
    
print(f'Data length: {len(data)}')
f'File content: {data}'
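
For comma-separated files, the standard-library `csv` module can take over the splitting. The sketch below uses an in-memory `io.StringIO` object with two shortened lines as a stand-in for the real file, so it runs without `scores.txt`.

```python
import csv
import io

# In-memory stand-in for the beginning of scores.txt
fake_file = io.StringIO("80,55,16\n43,45,47\n")

scores = []
for row in csv.reader(fake_file):
    # each row is a list of strings, one per comma-separated field
    scores.extend(float(field) for field in row)

print(scores)
```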


### Training and Test data separation

In [None]:
N = len(data)

ratio = 0.8
split = int(ratio * N) # 80 % of length

train_data = data[:split]
test_data  = data[split:]

print(f"Train len: {len(train_data)} \nTest len: {len(test_data)}")
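
The split above keeps the file order. If the order carries information, shuffling before splitting may be preferable. The following is a sketch under that assumption, with placeholder data and an arbitrary seed.

```python
import random

data = list(range(10))       # placeholder data
rng = random.Random(0)       # fixed seed for reproducibility (arbitrary value)

shuffled = data[:]           # copy, so the original order is kept
rng.shuffle(shuffled)

split = int(0.8 * len(shuffled))
train_data, test_data = shuffled[:split], shuffled[split:]

print(len(train_data), len(test_data))
```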

### Writing results back into a file with exception handling

In [120]:
import os # imports package for file and dir handling


def write(data, outfile='outputs.txt', folder='./data'):
    
    os.makedirs(folder, exist_ok=True)
    filepath = os.path.join(folder, outfile)
    
    try:

        # Make sure not to overwrite an existing file
        if os.path.exists(filepath):
            
            raise RuntimeError(f"File '{filepath}' already exists.")

        with open(filepath, 'w') as f:  # use 'a' instead of 'w' to append
            
            f.write(str(data))
            
            print(f'Successfully written to {filepath}')

    except RuntimeError as e:   
        
        #recreate_file(data, outfile)
        print(f"Exception occurred: {e}")

In [None]:
write(train_data, outfile='train_scores.txt')
write(test_data, outfile='te_scores.txt')
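
An alternative to checking for the file by hand is the open mode `'x'`, which fails with `FileExistsError` if the file already exists. `write_once` is a hypothetical variant for illustration; it writes into a temporary directory so that the sketch leaves no files behind.

```python
import os
import tempfile

def write_once(text, filepath):
    """Write text to filepath, refusing to overwrite an existing file."""
    try:
        # mode 'x' creates the file and fails if it already exists
        with open(filepath, "x") as f:
            f.write(text)
        return True
    except FileExistsError:
        print(f"File '{filepath}' already exists.")
        return False

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "outputs.txt")
    first = write_once("1, 2, 3", path)     # the file is created
    second = write_once("4, 5, 6", path)    # refused, content unchanged
    with open(path) as f:
        content = f.read()
```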

## Classes

$\text{Classes are useful for modeling anything that has an internal state, for example, machine learning models.}$

In [122]:
class Classifier:
    """Classifies whether a score is above/below the average."""
    
    def __init__(self, name='Score'): # constructor (special method)
        self.avg = 0
        self.name = name
        
    # special method that defines the string representation of an object
    def __repr__(self):
        return f'{self.name} classifier with avg: {self.avg:0.3f}'
    
    # methods
    def train(self, data): 
        self.avg = sum(data) / len(data)
        return self
        
    def predict(self, data):
        return ['above' if x > self.avg else 'below' for x in data]
    

### Creation of a new classifier object

In [None]:
c = Classifier(name='Custom')
print(c.avg, c.name)

 # __repr__ function is called
print(c)              

d = Classifier(name='Temp')
print(d)  

c is d

 * Train the classifier and inspect what the classifier has learned:

In [None]:
c.train(train_data)
print(c)

### Applying the model to the test data

In [None]:
print(f"Test data len: {len(test_data)}")

test_preds = c.predict(test_data)

print(f"Avg: {c.avg:0.3f}")
print("Test: ", test_data)
print("Pred: ", test_preds)

# $$\textbf{Thank you for your attention.}$$