<h1 style="color: #0f49c4 ;"><center>Correlaid - Machine Learning Spring School</center></h1>
<h1  style="color: #0f49c4 ;"><center>Introduction to Python</center></h1>

<h2>Content</h2>
<ol>
    <li> Python and Jupyter </li>
    <li> Basics </li>
    <li> Control Flow and Functions </li>
    <li> Import and Export </li>
    <li> NumPy and Pandas </li>
    <li> Visualization </li>
</ol>

### Literature and further sources
Joel Grus: *Data Science from Scratch*, O'Reilly, 2015  
Of particular interest is Chapter 2 *A Crash Course in Python*, which forms the basis for this notebook:  
http://proquest.tech.safaribooksonline.de/book/databases/9781491901410/2dot-a-crash-course-in-python/python_html

J.R. Johansson: *Scientific Python Lectures*  
https://github.com/jrjohansson/scientific-python-lectures/blob/master/Lecture-1-Introduction-to-Python-Programming.ipynb

Python Tutorials  
http://www.tutorialspoint.com/python/  
https://www.python-kurs.eu  

<h1><center>1 Python and Jupyter</center></h1>

### 1.1 Getting started with Jupyter

Jupyter is based on [IPython](https://ipython.org), an interactive Python shell. Jupyter itself is a notebook application in which executable Python code and explanations (like this one) can be mixed in documents called notebooks. The individual blocks in a notebook are called cells. Blocks containing explanations are written in [Markdown](https://de.wikipedia.org/wiki/Markdown).
Code blocks contain executable Python code. The type of a block can be changed using the menu bar or with `Esc-M` (Markdown) and `Esc-Y` (Code).

Each block is executable. To execute a block, the following options are available:

- `Ctrl-Enter`: Execute
- `Shift-Enter`: Execute and jump to the next block
- Run-Cell command in the menu bar: like `Shift-Enter`.

<div class="alert alert-info"><b>Try it!</b> Double-click once in this text and then press Shift-Enter.</div>

Other shortcuts can be found by clicking the keyboard icon in the menu bar or on a [cheat sheet](https://www.google.de/search?q=jupyter+cheat+sheet).


### 1.2 The first Python program

Python is an interpreted language. That is, every command is passed to the interpreter (running in the background) and executed immediately.

<div class="alert alert-info"><b>Try it!</b> Press Shift-Enter in the next code cell.</div>

In [None]:
print("Hello World!")

In [None]:
print(3+2)

In [None]:
# Variable assignment
x = 3+5

In [None]:
# Displaying a value within a Python session even without print
x

In [None]:
# Calculator: 2 to the power of 8 (written 2**8, not 2^8)
2**8

### 1.3 The Zen of Python

As you will soon realize, Python has its own peculiarities and it seems a bit unusual at first glance. At second glance, you'll notice that many things in Java or C# are very awkward to formulate compared to Python. Good Python code is described as "Pythonic". Some basic principles of Python programming are these:

*Beautiful is better than ugly.  
Explicit is better than implicit.  
Simple is better than complex.  
Complex is better than complicated.  
Readability counts.  
There should be one-- and preferably only one--obvious way to do it.*

All 19 aphorisms can be found at [python.org](http://legacy.python.org/dev/peps/pep-0020/). You can also display them via `import this` in the Python interpreter.

<div class="alert alert-info"><b>Try it!</b> Create a new code cell and execute the above mentioned command.</div>

### 1.4 Python 2.7 vs. Python 3.x

The [history of Python](https://en.wikipedia.org/wiki/History_of_Python) dates back to the early 1990s. Over the years, a lot of clutter accumulated, which was cleaned up with Python 3.0, released in December 2008. The developers consciously accepted that Python 3 programs are incompatible with Python 2.7, the last 2.x version. Because many older libraries and code examples were written for Python 2.7, you will still encounter this version, although it has not been maintained since 2020.

We work with Python 3, so when copying 2.7 code examples from the Internet, adjustments may be necessary. The most important differences are briefly presented below. More information can be found [here](https://docs.python.org/3.0/whatsnew/3.0.html). 

#### Print Function 

An important difference between Python 2.7 and Python 3.x: in Python 3.x, `print` is a function and must be written with parentheses:

    print "Hello" # allowed in Python 2.7
    print ("Hello") # Python 3.x
       
#### Type conversions

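Python is strict about mixing types: a number must be cast explicitly with `str()` before it can be concatenated with a string. Python 3 tightened type checking further; for example, comparing a number with a string (`3 < "5"`) now raises a `TypeError`.

    total = 3+5
    print "Sum:", total # Python 2.7 (print statement)
    print("Sum: "+str(total)) # Python 3.x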

#### Generator-like objects instead of lists, dictionaries etc.

In Python 3, many functions such as `range` (which used to create a list of numbers) were changed so that they return lazy, generator-like objects instead of lists. These objects are more memory-efficient because they produce their values only when they are needed.
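
The lazy behavior can be observed directly. The following is a minimal sketch for Python 3; the variable names `r` and `squares` are chosen only for illustration:

```python
# range() returns a lazy range object, not a list
r = range(5)
print(r)          # range(0, 5)
print(list(r))    # [0, 1, 2, 3, 4]

# A generator expression also produces its values only on demand
squares = (n * n for n in range(5))
print(next(squares))   # 0
print(next(squares))   # 1
print(list(squares))   # the remaining values: [4, 9, 16]
```

Converting with `list()` forces all values to be generated at once, which is useful for inspecting the content.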

### 1.5 Indentation and line breaks

Code blocks in Python are not defined by curly braces, but by indentation alone. The head of a code block (branch, loop, function, etc.) is always terminated with a colon `:`, after which the body of the block must be indented.

An `if` statement, which in C-based languages looks like this:

      if (a < b)
      {
          statement;
          statement;
      }

is thus formulated in Python like this:

      if a < b:
          statement
          statement

Be careful: The number of spaces must be consistent within a block, and tabs should be avoided. Jupyter takes care of the latter automatically.

A line break always starts a new command. However, inside parentheses, brackets, and braces, spaces and line breaks are ignored.

    list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

    easier_to_read_list_of_lists = [ [1, 2, 3],
                                     [4, 5, 6],
                                     [7, 8, 9] ]

If a long command is to be wrapped, a backslash can be used to continue it on the next line:

    very_long_variable_name = module_a.very_long_function_name() \
                                      .concatenated_function_call()
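
Because line breaks inside parentheses are ignored, a long command can also be wrapped without a backslash. A small sketch with made-up values:

```python
# Wrapping the expression in parentheses lets it span several lines
# without any backslash
total = (1 + 2 + 3
         + 4 + 5)
print(total)  # 15

# The same works for chained method calls
text = ("  Hello World  "
        .strip()
        .lower())
print(text)  # hello world
```

Many style guides prefer the parenthesized form, since a stray space after a backslash causes a syntax error.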

### 1.6 Comments and Docstrings
In Python a **comment** starts with <code>#</code>. The rest of the line is ignored. In case of a multi-line comment, each line must be marked with a <code>#</code>. 

In [None]:
# This is a comment.

# This is a 
# multi-line comment.

A **docstring** is a string literal that forms the first statement of a Python module, class, method, or function. It is written in triple quotes: <code>"""This is my documentation"""</code>.
The <code>help</code> function displays this documentation.

In [None]:
def myFunction():
    """Documentation of the function"""

In [None]:
help(myFunction)

In [None]:
help(print)

<h1><center>2 Basics</center></h1>

### 2.1 Operators

**Arithmetic Operators**

Like a calculator, Python provides arithmetic operators, functions, and constants. 
<div class="alert alert-danger"><strong>Attention:</strong> The decimal separator is a point and not a comma!</div>

| **Operator, function, constant**              | **description**                         |
|:---------------------------|:------------------------------------------|
| `+`, `-`, `*`, `/`, `//`  | addition, subtraction, multiplication, division, integer (floor) division |
| `**`, `%`                 | power, remainder (modulo)               |
| `abs()`                   | absolute value                          |
| `round()`                 | rounding                                |

<div class="alert alert-info"><b>Try it!</b> Execute the following cells.</div>

In [None]:
1.1 + 1.1

In [None]:
5*3

In [None]:
print(10/3)
print(10//3)

In [None]:
abs(-3.33334)

The module `math` provides more functions and constants. It is included by default in every Python installation. Therefore, we do not need to install it through the Anaconda Navigator. 

In [None]:
import math as m

| **Operator, function, constant**              | **description**                     |
|:---------------------------|:--------------------------------------|
| m.exp()                   | exponential function                  |
| m.log(), m.log10()        | natural logarithm, base-10 logarithm |
| m.sin(), m.cos(), m.tan() | trigonometric functions          |
| m.sqrt()                  | square root                        |
| m.pi                    | π                          |

<div class="alert alert-info"><b>Try it!</b> Execute the following cells.</div>

In [None]:
m.exp(4)

In [None]:
m.sqrt(4)

In [None]:
m.pi

**Assignment Operator**

With the assignment operator `=` it is possible to name values or results in order to reuse them.

In [None]:
a = 2+3
print(a)

In [None]:
A # NameError: Python is case sensitive, so A is not the same as a

In [None]:
a, b = 1, 2 # Multiple assignment:
# a and b are assigned simultaneously
print(a, b)

**Comparison and logical operators**\
Comparison operators compare two values. A comparison returns either `True` or `False`. Logical operators combine or negate such truth values.

| **Operator** | **description**                               |
|:---------------------------|:------------------------------------------------|
| `>`, `>=`, `<`, `<=`      | greater than, greater than or equal to, less than, less than or equal to |
| `==`, `!=`                | equal, not equal                               |
| `not`                     | NOT (negation)                     |
| `&`, `and`                | AND                                  |
| <code>&#124;</code>, `or` | OR                                 |

<div class="alert alert-info"><b>Try it!</b> Execute the following cells.</div>

In [None]:
4 >= 3

In [None]:
a = 3 # Assignment
a == 10

In [None]:
a != 10

In [None]:
(3 < 2) & (4 == (3 + 1))

<div class="alert alert-success"><b>Exercise:</b> Assign <code>a</code> the value 10 and <code>b</code> the value 5. Is <code>b</code> greater than or equal to <code>a</code>?</div>

### 2.2 Data types

Each variable has a data type. In this chapter, we look at the four most common types:
* **strings**: Sequences of characters, spaces, or other symbols such as special characters, digits, etc. Strings are written inside single or double quotes, e.g. "hello", 'house'. 
* **integers**: Positive or negative whole numbers, e.g. -1, 4, 7, -10.
* **floats**: Decimal numbers (floating point numbers), e.g. 1.34, -3.33.
* **booleans**: Truth values, which can be either `True` or `False`.

In [None]:
print(9)
print(3.76)
print(3.00)
print("Hello world!")
print(True)
print(False)

The data type of a value is returned by the function `type()`. The output `int` stands for an integer object, `float` for a decimal number, `str` for a string and `bool` for a truth value.

In [None]:
print(9, type(9))
print(3.76, type(3.76))
print(3.00, type(3.00))
print("Hello world!", type("Hello world!"))
print(True, type(True))
print(False, type(False))

<div class="alert alert-danger"><strong>Attention:</strong> We cannot calculate with strings: adding a number and a string raises a <code>TypeError</code>.</div>

In [None]:
3 + "2" # Integer + String

In [None]:
3.76 + "2" # Float + String

#### 2.2.1 Typecasting
The data type of a variable can be converted to another data type. This conversion is called *typecasting*. For example, an integer object can be converted to a float object: 

In [None]:
float(100)
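
Typecasting also resolves the `TypeError` from the previous section. The following sketch uses made-up values and shows both directions, from string to number and from number to string:

```python
# Converting the string "2" to a number makes the addition possible
print(3 + int("2"))        # 5
print(1.5 + float("2"))    # 3.5

# Converting the number to a string concatenates the characters instead
print(str(3) + "2")        # 32
```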

<div class="alert alert-success"><b>Exercise:</b> What happens if we typecast a float into an integer? And what happens if we typecast a boolean into an integer or float?</div>

#### 2.2.2 String Operations
We can access individual elements (characters) of a string. This is called indexing. An index denotes a certain position and is specified with *square* brackets and the index number.
<div class="alert alert-danger"><strong>Attention:</strong> The index position count starts with value 0. </div>
Several elements can be selected at once with a slice of the form <code>[start:stop:step]</code>. All three parameters are optional.

In [None]:
alphabet = "abcdefg"

| **a** | **b** | **c** | **d** | **e** | **f** | **g** | 
|-------|-------|-------|-------|-------|-------|-------|
| 0     | 1     | 2     | 3     | 4     | 5     | 6     | 

In [None]:
alphabet[0]

In [None]:
alphabet[0:4]

<div class="alert alert-danger"><strong>Attention:</strong> The interval excludes the right endpoint, i.e. the slice contains the element at position 0, but not the element at position 4: [0, 4)</div>

In [None]:
alphabet[4:]

In [None]:
alphabet[::2]

In [None]:
alphabet[-1] # last element

In [None]:
len(alphabet) # number of elements

In [None]:
len("Hello World") # the blank counts as a character

| **H** | **e** | **l** | **l** | **o** | &nbsp; | **W** | **o** | **r** | **l** | **d** |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|------|
| 0     | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     | 9     |10     |

<div class="alert alert-success"><b>Exercise:</b> You are given the string <b>s1="house"</b>. Which expressions return the last character <b>e</b> of <b>s1</b>? 
<ol>
    <li> s1[len(s1)-1]</li>
    <li> s1[len(s1)+1]</li>
    <li> s1[-1]</li>
    <li> s1[len(s1)]</li>
    </ol>
</div>

### 2.3 Data collections

#### 2.3.1 Tuples
Tuples can contain objects of different data types. They are special because they are immutable: while you can overwrite values in other data containers, this is not possible for tuples. Tuples are instantiated with **round brackets** and the elements are separated by commas.

In [None]:
tuple1 = ("Hello", "World", 123)

| **"Hello"** | **"World"** | **123** |
|-------------|-------------|---------|
| <center>0</center>  | <center>1</center>           | <center>2</center>       |

In [None]:
type(tuple1)

In [None]:
tuple1[0]

In [None]:
type(tuple1[0])

In [None]:
print(tuple1[2])
print(type(tuple1[2])) 

In [None]:
# Immutable
tuple1[0] = 3 # TypeError: tuples do not support item assignment

In [None]:
# Concatenate
tuple2 = tuple1 + ("456", 789)
tuple2

In [None]:
# Nest
tuple_nest = (1, 4, ("House", "Bread"), "Disco", (4, 9))
len(tuple_nest)

| **1** | **4** | **("House", "Bread")** | **"Disco"** | **(4, 9)** |
|-------|-------|----------------------|-------------|------------|
| 0     | 1     | <center>2</center>   | <center>3</center>           | <center>4</center>          |

In [None]:
print("3rd element: ", tuple_nest[2])

In [None]:
print("First element of the third element: ", tuple_nest[2][0])

#### 2.3.2 Lists
Lists are frequently used data containers that can hold objects of different data types and can also be empty. Lists can be instantiated with **square brackets** or with the function `list()`.

In [None]:
# Empty list
list1 = []

In [None]:
list1

In [None]:
type(list1)

In [None]:
list2 = [1, 4, ("House", "Bread"), "Disco", [4, 9]]

In [None]:
print("3rd element of list: ", list2[2])

In [None]:
print("Last element of list: ", list2[-1])

In [None]:
# Mutable
list2[0] = 1.43
print(list2)

Lists provide methods for data operations. 
<div class="alert alert-danger"><strong>Attention:</strong> Both functions and methods are called with their respective names and <b>round</b> brackets. However, functions and methods differ: Methods are functions that are bound to an object. Functions (e.g. print()) can be called directly, whereas methods can only be called on an object (e.g. a.append()).</div> 

In [None]:
list3 = [4, 3, 1]
list3.append(2)
list3

In [None]:
list2.extend(list3)
print(list2)

In [None]:
list3.pop()
print(list3)

In [None]:
list3.sort()
list3

In [None]:
list3.remove(3) 
list3
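
The difference between `append` and `extend` is easy to overlook. The following sketch uses a hypothetical list `nums` to show it, together with the return value of `pop`:

```python
nums = [4, 3, 1]
nums.append([2, 5])    # append adds its argument as ONE element
print(nums)            # [4, 3, 1, [2, 5]]

nums = [4, 3, 1]
nums.extend([2, 5])    # extend adds every element of its argument
print(nums)            # [4, 3, 1, 2, 5]

last = nums.pop()      # pop removes the last element and returns it
print(last, nums)      # 5 [4, 3, 1, 2]
```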

#### 2.3.3 Dictionaries
A dictionary is created with **curly braces** or with the function `dict()`. It consists of an arbitrary number of *key*-*value* pairs. Instead of numeric indexes, as is the case with lists, a dictionary is indexed by its *keys*. 

| Key 1   	| Key 2   	| Key 3   	|
|---------	|---------	|---------	|
| Value 1 	| Value 2 	| Value 3 	|

In [None]:
dict1 = {"strongly disagree": 1, "disagree": 2, "neutral": 3, "agree": 4, "strongly agree": 5}
dict1

In [None]:
dict1["neutral"]

In [None]:
dict1["neutral"] = -1
dict1["neutral"]

In [None]:
dict1.keys()

In [None]:
dict1.values()
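
Besides `keys()` and `values()`, two further dictionary methods are often useful: `get()` and `items()`. A minimal sketch with a hypothetical dictionary `scale`:

```python
scale = {"disagree": 2, "neutral": 3, "agree": 4}

# get() returns a fallback value instead of raising a KeyError for unknown keys
print(scale.get("agree"))            # 4
print(scale.get("do not know", 0))   # 0

# items() returns the key-value pairs as tuples
print(list(scale.items()))   # [('disagree', 2), ('neutral', 3), ('agree', 4)]
```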

In [None]:
dict1['do not know'] = 0
dict1

In [None]:
del dict1['do not know']
dict1

In [None]:
"strongly agree" in dict1

In [None]:
"do not know" in dict1

#### 2.3.4 Sets
In a set, each element occurs exactly once. Sets are *mutable* like dictionaries or lists, which means that elements can be added or removed. A set is created like a dictionary with **curly braces**, but without *keys*, or with `set()`. An empty set must be created with `set()`, because `{}` is an empty dictionary.

In [None]:
set1 = {"tomato", "zucchini", "potato", "bell pepper"}
set1 # sets are unordered: the display order may differ from the input order

In [None]:
# convert list to set
list6 = [3, 3,  4, 5, 5, 6]
set2 = set(list6)
print(set2) # each element occurs once

In [None]:
set1.add("apple")
set1

In [None]:
set1.remove("potato")
set1

In [None]:
"apple" in set1

set1 | apple | bell pepper  | tomato  | zucchini
---- | ----- | -------- | ------- | -------
**set3** | **pear** | **raspberry** | **bell pepper** | **tomato**


In [None]:
# intersection
set3 = {"pear", "raspberry", "bell pepper", "tomato"}
set1 & set3

In [None]:
set1.intersection(set3)

In [None]:
# difference set
set1.difference(set3)

In [None]:
set3.difference(set1)

In [None]:
set1.union(set3)

In [None]:
set4 = {"apple", "zucchini"}
set4.issubset(set1)

In [None]:
set1.issuperset(set4)

In [None]:
set1.isdisjoint(set4)
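
Most of the set methods above also have an operator form, as already seen with `&` for the intersection. A short sketch with two hypothetical sets:

```python
vegetables = {"tomato", "zucchini", "bell pepper"}
salad = {"tomato", "bell pepper", "pear"}

# | is the union, - is the difference
print((vegetables | salad) == vegetables.union(salad))        # True
print((vegetables - salad) == vegetables.difference(salad))   # True

# ^ is the symmetric difference: elements that occur in exactly one set
print((vegetables ^ salad) == {"zucchini", "pear"})           # True

# <= tests for a subset, like issubset()
print(vegetables <= salad)                                    # False
```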

<div class="alert alert-success"><b>Exercise:</b> <code>a = (3, 1, 2)</code> is given. Sort the numbers using the function <b>sorted</b>. How does the data type change?</div>

<div class="alert alert-success"><b>Exercise:</b> The tuple <code>b = ("a", "1", 2, ("b", 3, "c"), ("4", "d"))</code> is given. Index both nested tuples <code>("b", 3, "c"), ("4", "d")</code> and the character <b>c</b>.</div>

### 2.4 Counter and Errors
[Counters](https://docs.python.org/3/library/collections.html?highlight=counter#counter-objects) are used to quickly determine how often each element occurs in a list. `Counter` must be imported from the standard-library module **collections**.

In [None]:
from collections import Counter
num_list = [0, 1, 2, 0, 1, 1, 1, 2, 2, 3]
c = Counter(num_list) 
print(c)

In [None]:
doc = ['I', 'I', 'I', 'am', 'am', 'here', 'who', 'who', 'who', 'who', 'else']
word_count = Counter(doc)

print(word_count)
print(word_count.most_common(3))

We already know two different error types: 
* **TypeError**: An operation is applied to a value of the wrong data type
* **NameError**: A variable name is used that has not been assigned

However, many more error types may occur.\
Python reports not only the error type, but also the line in which the error occurred. In this example we see that the error happened in line 6: 

In [None]:
int1 = 5
int2 = 8
string1 = "3"

print(int1 + int2)
print(int1 + string1)
print(int1 * int2)

<div class="alert alert-danger"><strong>Attention</strong>: Python executes a script line by line. Once an error occurs, the remaining lines are no longer executed. In the previous example, Python does not execute the seventh line (int1 * int2).</div>

**Dealing with errors**\
Errors cannot always be avoided. However, we can create scripts that are prepared for errors and can react to them. \
A **try-except** statement can be helpful: 

In [None]:
# example 1
try:
    print(3 / 0)
except:
    print("Do not divide by zero!")

In [None]:
# example 2
try:
    print(3 / 1)
except:
    print("Do not divide by zero!")

First the **try** block is executed. If no error occurs, the **except** block is skipped (example 2). If an error occurs during the execution of the **try** block, the rest of the block is skipped and the **except** block is executed.

In [None]:
# example 3: only NameError is handled, so the ZeroDivisionError still stops the cell
x = 0

try: 
    print(3 / x)
except NameError:
    print("What is x?")

In [None]:
# example 4
try: 
    print(3 / x)
except NameError:
    print("What is x?")
except ZeroDivisionError:
    print("Do not divide by zero!")

In [None]:
# example 5
try: 
    print(3 / x)
except (NameError, ZeroDivisionError):
    print("Not assigned or divided by zero!")

In [None]:
# example 6
try: 
    print(3 / x)
except:
    pass

The **pass** instruction does not result in any operation. This means that nothing happens when **pass** is executed. While comments (#) are ignored and not executed, the **pass** instruction is executed, but nothing happens.  
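
A **try-except** statement can additionally have an **else** and a **finally** block, and the exception object can be bound to a name with `as`. The following is a minimal sketch; the names `result` and `err` are chosen only for illustration:

```python
x = 0

try:
    result = 3 / x
except ZeroDivisionError as err:
    # err holds the exception object, including Python's own error message
    print("Error:", err)
    result = None
else:
    # runs only if the try block raised no error
    print("No error occurred.")
finally:
    # runs in every case, with or without an error
    print("Done.")

print(result)  # None
```

Unlike a bare `except: pass`, this keeps the error visible, which usually makes mistakes easier to find.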

<h1><center>3 Control Flow and Functions</center></h1>

Control flow denotes the order in which the commands of our program are executed. In this chapter we will get to know three control structures: case distinctions (`if`, `elif`, `else`), condition-controlled loops (`while`), and collection-controlled loops (`for`). They allow us to deviate from the sequential processing of commands and to make the order of execution dependent on individual expressions. In addition to control structures, we will also familiarize ourselves with functions.

### 3.1 Conditions

**if-statement**

In [None]:
num = 1
if num > 0:
    print("The number is positive.")

In [None]:
if num > 0:
print("The number is positive.") # IndentationError: the body is not indented

In [None]:
if num > 0: print("The number is positive.")

In [None]:
num = -1
if num > 0:
    print("The number is positive.")

**if-else-statement**

In [None]:
if num > 0:
    print("The number is positive.")
else:
    print("The number is not positive.")

**elif-statement**

The **elif** condition is only checked if the condition of the preceding **if** (or **elif**) evaluated to `False`. If the **elif** condition also evaluates to `False`, either the next **elif** condition is checked or the **else** block is executed.  <br>
There can be multiple **elif** blocks, but only one **else** block, which must be placed at the end of the statement. Additional **elif** blocks after the **else** block are not allowed.

In [None]:
num = 3
if num > 5:
    print("{} is a positive number greater than 5.".format(num))
elif 0 < num <= 5:
    print("{} is a positive number not greater than 5.".format(num))
elif num == 0:
    print("{} is zero.".format(num))
else:
    print("{} is a negative number".format(num))

<div class="alert alert-success"><b>Exercise:</b> Write a statement that for all numbers greater than 0 and less than or equal to 10 outputs the sentence: The number {x} is greater than 0 and less than or equal to 10. <br>
Test your statement using the number 11.</div>

### 3.2 While loops

Loops allow us to execute code repeatedly while changing variables systematically. `while` loops are executed as long as a certain condition is met.<br>
<div class="alert alert-danger"><strong>Attention</strong>: If the condition can never become False, the loop runs forever and has to be stopped manually (in Jupyter via <i>Kernel &rarr; Interrupt</i>).</div>

In [None]:
idx = 1 # Define an index
while idx < 5: # The loop is executed until idx is no longer less than 5
    print(idx)
    idx += 1 # Short for idx = idx + 1

In [None]:
idx = 1
while idx < 5: 
    print(idx)
    idx += 1
else:
    print(f"The variable takes the value {idx}. The condition is no longer met.")

### 3.3 For loops

Before we look at `for` loops, we will learn about the `range()` function. It dynamically generates a sequence of numbers within the desired parameters. The general syntax is:

    range(start, stop, step)
    
If no start value is given, Python starts at 0. If no step is given, consecutive numbers are generated (step size 1). The stop value must always be specified and is **not** included in the generated sequence.

Examples:

    range(5) # generates the values 0, 1, 2, 3, 4
    range(3, 10) # generates the values 3, 4, 5, 6, 7, 8, 9
    range(4, 10, 2) # generates the values 4, 6, 8

In [None]:
a = range(10)
list(a)

In [None]:
list(range(3, 10))

In [None]:
list(range(-3, 10))

In [None]:
list(range(0, 10, 2)) # Steps

In [None]:
# Unpacking operator *: inserts the values of the range into the list
[1, 2, *range(7, 15)]

`for` loops visit the elements of a data container one after another and thus make it possible to operate on every element in the container.

In [None]:
# very tedious: one line per element
x = [1, 2, 3, 4, 5]
print(x[0])
print(x[1])
print(x[2])
print(x[3])
print(x[4])

In [None]:
# better
for i in x: 
    print(i)

In [None]:
for x in "banana":
    print(x)

In [None]:
k = range(100, 200, 10) 
for i, x in enumerate(k): # enumerate gives back the values + indexes of the data container
    print(i, x)

<div class="alert alert-success"><b>Exercise:</b> The list <code>l = list(range(1, 13))*4</code> is given. Assign the value 2 to even-numbered entries that are also divisible by 5, the value 1 to all other even-numbered entries, and the value 0 to odd-numbered entries. <br>
<b>Hint</b>: Use a for-loop!</div>

<div class="alert alert-success"><b>Exercise:</b> Display all values of the Fibonacci series up to 1000. <br>
<b>Hint</b>: Use a while-loop and multiple assignments!</div>

<div class="alert alert-success"><b>Exercise:</b> The list <code>col = ["red", "blue", "yellow", "red", "blue"]</code> is given. Copy all elements of col to col_new using a while-loop. Stop the loop if the color is neither blue nor red. <br>
    <b>Hint</b>: Use the method append! </div>

**List comprehension**\
List comprehensions allow us to generate lists in Python easily and efficiently. Every list comprehension can be written as a for-loop; however, not every for-loop can be written as a list comprehension. We distinguish between conditional (containing an if-statement) and unconditional (not containing an if-statement) list comprehensions. The syntax looks like this: \
`[expression for item in iterable if condition]`

In [None]:
numbers = [4, 7, 23, 76, 103]
[i + 1 for i in numbers] # New list

In [None]:
[i + 1 for i in numbers if i < 100]

In [None]:
[i + 1 if i < 100 else i *10 for i in numbers] # If-else-Statement

In [None]:
[(i, i**2) for i in numbers] # Tuples
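
To illustrate that a list comprehension can always be written as a for-loop, the following sketch builds the same list in both ways. The names `short` and `long` are chosen only for this example:

```python
numbers = [4, 7, 23, 76, 103]

# Conditional list comprehension
short = [i + 1 for i in numbers if i < 100]

# Equivalent for-loop
long = []
for i in numbers:
    if i < 100:
        long.append(i + 1)

print(short)           # [5, 8, 24, 77]
print(short == long)   # True
```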

<div class="alert alert-success"><b>Exercise:</b> Compute the logarithm of all values in <code>numbers = [15, 100, 30, 43, 80]</code> as concisely as possible using the function <b>log</b> from the package <b>math</b>.</div>

### 3.4 Functions

In [None]:
# First function
def hello(name):
    print(f"Hello, {name}!") 

In [None]:
hello("Johanna")

**Local and global variables** \
Each variable that is defined **inside** a function is only valid locally, i.e. within that function.

In [None]:
def ex_function():
    print(a) # a is local here (assigned below) but has no value yet
    a = 500
    print("within function:", a)

a = 3
print("before function:", a)
ex_function() # UnboundLocalError (a subclass of NameError)
print("after function:", a)

In [None]:
def ex_function():
    a = 500
    print("within function:", a)

a = 3
print("before function:", a)
ex_function() # a = 500 only within the function
print("after function:", a) # a is 3 globally

In [None]:
def ex_function():
    global a # variable is defined globally 
    a = 500
    print("function:", a)

a = 3
print("before function:", a)
ex_function()
print("after function:", a) # a = 500 globally
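
A common alternative to `global` is to hand the value back to the caller with `return`, which is covered under "Print and Return". The function `doubled` below is a hypothetical example and not part of the course material:

```python
def doubled(value):
    # value and result are local; nothing outside the function is changed
    result = value * 2
    return result

a = 3
print("before function:", a)   # 3
a = doubled(a)                 # the caller decides to overwrite a
print("after function:", a)    # 6
```

This way it remains visible at the call site that `a` changes, which is harder to see when a function modifies a global variable.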

**Print and Return**

In [None]:
def ex_print(a):
    b = a * 2
    print(b)

In [None]:
output_print = ex_print(5)

In [None]:
print(output_print)
output_print*2 # TypeError: ex_print returns None

In [None]:
def ex_return(a):
    b = a * 2
    return b

In [None]:
output_return = ex_return(5)

In [None]:
print(output_return)
output_return*2

**Parameters** 
* Positional: Are required and have no default value
* Keyword: Have a default value and are therefore optional

In [None]:
def hello(name, weather="sunny"):
    return f"Hello, {name}! The weather is {weather}."

print(hello("Johanna"))
print(hello("Felix", "rainy"))
print(hello("rainy", "Felix")) # positional arguments are assigned in order
print(hello(weather="rainy", name="Felix")) # keyword arguments may come in any order

<div class="alert alert-success"><b>Exercise:</b> Write the function <b>max_two</b>, which displays the maximum value of two numbers. Test your function with random numbers.</div>

<h1><center>4 Import and Export with Pandas</center></h1>

**.csv**

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("Datensatz_Herzinfarkt.csv", sep=",")

In [None]:
df.dtypes # pandas infers the data types of the columns

In [None]:
pd.read_csv("Datensatz_Herzinfarkt.csv", header=None) # treat the first row as data, not as column names

In [None]:
pd.read_csv("Datensatz_Herzinfarkt.csv", header=None, names=["A", "B", "C", "D", "E", "F", "G", "H"]) # set custom column names

In [None]:
pd.read_csv("Datensatz_Herzinfarkt.csv", usecols=["Blutdruck", "Cholesterin", "Herzinfarkt"])

In [None]:
pd.read_csv("Datensatz_Herzinfarkt.csv", nrows=100)
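
The parameters used above can also be tried without the course file. The following sketch reads hypothetical inline data through `io.StringIO`; the column names and values are invented for illustration:

```python
import io
import pandas as pd

# Hypothetical inline data standing in for a CSV file
csv_text = "age;income;city\n21;1000;Berlin\n22;2000;Hamburg\n23;1500;Munich\n"

df_demo = pd.read_csv(
    io.StringIO(csv_text),        # a file-like object instead of a file path
    sep=";",                      # column separator used in the data
    usecols=["age", "income"],    # read only these columns
    nrows=2,                      # read only the first two data rows
)
print(df_demo)
print(df_demo.shape)  # (2, 2)
```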

**From a URL (e.g. GitHub)**

In [None]:
url = "https://raw.githubusercontent.com/datasets/rio2016/master/athletes.csv"
df2 = pd.read_csv(url)
df2

**Example datasets from libraries**

In [None]:
import seaborn as sns

In [None]:
sns.get_dataset_names()

In [None]:
df_iris = sns.load_dataset("iris")
df_iris

**Export to .csv**

In [None]:
df_iris.to_csv("df_neu.csv")

<h1><center>5 NumPy and Pandas</center></h1>

### 5.1 NumPy

Numerical Python (`NumPy`) is a widely used library for mathematical and numerical functions in Python. With `NumPy` you can create efficient multidimensional arrays, perform fast arithmetic operations (without using loops) and generate random numbers.

In [None]:
import numpy as np

**Generate Arrays**\
NumPy extends Python with additional data structures like the NumPy array. This looks like a list at first sight, but only data of the same data type can be stored in a `NumPy` array. 

In [None]:
x = np.array([1, 2, 3, 4, 5, 6])
print(x, type(x))
print(x.dtype) # Integers

In [None]:
list1 = [1, 2, 3, 4, 5, 6]
print(list1, type(list1))

In [None]:
print(x.ndim) 
print(x.shape) 
print(x.size) 

In [None]:
# arange() function
np.arange(1, 14)

In [None]:
# reshape() function
print(x)
print(x.shape)
print("---------------")
print(x.reshape(2,3))
print(x.reshape(2,3).shape)

In [None]:
np.zeros([3, 5, 2])

In [None]:
np.full([3,5], 10)

In [None]:
np.eye(5) # 5x5 identity matrix

In [None]:
np.diag([7, 8, 9])

In [None]:
np.linspace(start=0, stop=20, num=5)

In [None]:
np.linspace(start=0, stop=20, num=5, dtype=int)

**Indexing**\
Indexing works as with lists. In addition, arrays can be indexed with lists of positions and with lists of booleans.

In [None]:
print(x[0])
print(x[0:4])

In [None]:
list2 = [2, 3]
print(x[list2])

In [None]:
list3 = [True, False, False, False, False, True]
print(x[list3])
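
In practice, a boolean index is rarely typed by hand. It is usually produced by a comparison on the array itself. A minimal sketch with a hypothetical array `values`:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

mask = values > 3        # element-wise comparison yields a boolean array
print(mask)
print(values[mask])      # [4 5 6]

# Conditions are combined with & (and) and | (or); the parentheses are required
print(values[(values > 1) & (values % 2 == 0)])   # [2 4 6]
```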

In [None]:
# mutable
x[0] = 100.3
print(x) # the float 100.3 is truncated to the integer 100

In [None]:
b = np.arange(10, 19).reshape(3,3)
print(b)

|         | j = 0   | j = 1   | j = 2   |
|---------|---------|---------|---------|
|**i = 0**| b[0, 0] | b[0, 1] | b[0, 2] |
|**i = 1**| b[1, 0] | b[1, 1] | b[1, 2] |
|**i = 2**| b[2, 0] | b[2, 1] | b[2, 2] |

In [None]:
print(b[1, 2])
print(b[1][2])

In [None]:
print(b[1, 1:3]) 
print(b[1][1:3])

**Arithmetic operations are straightforward, in contrast to lists**

In [None]:
print(b + 3)

In [None]:
list2 + 3 # TypeError

In [None]:
print([j + 3 for j in list2])

<div class="alert alert-success"><b>Exercise:</b> Create the array <b>a</b> with values -15, -14, -13, ... 13, 14. Select the elements $a_i$ from the vector <b>a</b> which fulfill the following conditions: 
    <ol>
        <li> $a_i$ is less than 10.</li>
        <li> $a_i$ is a negative number. </li>
        <li> $a_i$ is less than 10 and greater than -10.</li>
        <li> $a_i$ is a multiple of 3. </li>
    </ol>
</div>

### 5.2 Pandas
Pandas (the name is derived from "panel data" and is also a play on "Python data analysis") provides additional functions and data structures for data analysis. The two most important data structures are called `DataFrame` and `Series`. The first denotes a data table, the second a column of a data table (i.e. a `DataFrame` consists of `Series`). 

In [None]:
#import pandas as pd 

#### 5.2.1 Series
Series objects contain an index in addition to the values. By default, this index is numeric and starts at 0. However, it can also be customized.

In [None]:
a = pd.Series([15, 16, 17, 18])
print(a)

In [None]:
print(a.index) 
print(a.values) 

In [None]:
b = pd.Series([15, 16, 17, 18], index=["a", "b", "c", "d"])
b

In [None]:
print(b["c"]) # Indexing using string
print(b.c) # dot notation
print(b.iloc[2]) # Indexing with position

#### 5.2.2 DataFrames
A DataFrame contains multiple Series objects and is the "default" object when analyzing multidimensional data. Usually, the columns contain different types of data (our *variables*) and the rows contain the corresponding values (our *observations*).

In [None]:
a = [21, 22, 23, 24]
b = [1000, 2000, 1500, 1700]
table1 = pd.DataFrame({"a":a, "b":b}) # key is column name
table1

In [None]:
table2 = pd.DataFrame({
    "age":a, 
    "income":b, 
    "faculty": "economics", 
    "studies": pd.Categorical(["A", "A", "B", "B"])}, 
    index=("Person1", "Person2", "Person3", "Person4"))
table2

In [None]:
table2.head(2)

In [None]:
table2.tail(n = 2)

In [None]:
table2.index

In [None]:
table2.columns

In [None]:
table2.describe() 

In [None]:
table2.describe(include = "all")

**Indexing**
* `[]`: indexing with NumPy-style notation
* `.`: attribute access operator
* `.loc`: label-based
* `.iloc`: integer-position-based
* `.at`: access to a single value

In [None]:
table2[0:1]

In [None]:
table2.age

In [None]:
table2.loc["Person1"]

In [None]:
table2.iloc[3]

In [None]:
table2.at["Person2", "income"]
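
The indexers can also select rows and columns at the same time. One detail worth knowing is that a label slice with `.loc` includes its end point, whereas a position slice with `.iloc` excludes it. A minimal sketch, assuming a hypothetical table `people` (not the `table2` defined above):

```python
import pandas as pd

# Hypothetical table with string row labels
people = pd.DataFrame(
    {"age": [21, 22, 23], "income": [1000, 2000, 1500]},
    index=["P1", "P2", "P3"],
)

by_label = people.loc["P1":"P2", "income"]  # label slice: "P2" is included
by_position = people.iloc[0:2, 1]           # position slice: row 2 is excluded
single = people.at["P2", "income"]          # exactly one value

print(by_label)
print(by_position)
print(single)  # 2000
```

Both slices return the same two rows here, although their end points are written differently.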

<div class="alert alert-success"><b>Exercise:</b> The dataframe <b>e</b> is given. Get an overview of <b>e</b> by 
    <ol>
        <li> looking at the index.</li>
        <li> outputting the column names. </li>
        <li> outputting the first two rows.</li>
        <li> sorting <b>e</b> in descending order by <i>Ranking</i> using the method <b>sort_values</b>. </li>
    </ol>
</div>

In [None]:
e = pd.DataFrame({
    "Title":["Pulp Fiction", "The Shawshank Redemption", "The Godfather", "Fight Club", "The Dark Knight"], 
    "Year": [1994, 1994, 1972, 1999, 2008], 
    "Ranking": [1, 5, 3, 2, 4]})
e

<h1><center> 6 Visualization </center></h1>
Python offers many types of plots. It is crucial to choose a plot that suits the data. First, we need to ask ourselves what type of data we are dealing with: 
* categorical
* numeric
* categorical + numeric
* spatial data
* etc.

The website https://www.data-to-viz.com/index.html provides an overview of different visualization techniques.
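
One way to find out which type of data a table contains is to inspect the column data types. The sketch below assumes a small hypothetical table `books` with made-up values and splits its columns into numeric and non-numeric ones using `select_dtypes`:

```python
import pandas as pd

# Hypothetical table with made-up values, used only for illustration
books = pd.DataFrame({
    "Genre": ["Fiction", "Non Fiction", "Fiction"],
    "Reviews": [120, 45, 300],
    "Price": [8.0, 12.5, 15.0],
})

print(books.dtypes)

# Split the columns by data type
numeric_cols = books.select_dtypes(include="number").columns.tolist()
categorical_cols = books.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['Reviews', 'Price']
print(categorical_cols)  # ['Genre']
```

Note that the data type is only a first hint: a numeric column can still encode categories, for example a year or an ID.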

### 6.1 First graphic

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
y = [1, 4, 2, 3]
plt.plot(y)
plt.show()

**High-Level-Parameters:**

| Argument   | Value                            | Description                            |
|:------------|:---------------------------------|:-----------------------------------------|
| linestyle | "--", "-.", "", ...             | Line style                               |
| marker     | "o", "v", "1", ...              | Symbol used for the data points          |
| color      | name, RGB tuple, hex code       | Color of the line and the data points    |
| linewidth  | number                         | Line width |
| label      | "text"        | Label shown in the legend |

In [None]:
x = [0, 3, 4, 5]
plt.plot(x, y, color="green", marker="o", linestyle="--")
plt.show()
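
The arguments `linewidth` and `label` from the table can be combined with the others in the same call. A minimal sketch with hypothetical values; the label only becomes visible once `plt.legend()` is called:

```python
import matplotlib.pyplot as plt

# Hypothetical example data
days = [0, 3, 4, 5]
values = [1, 4, 2, 3]

# plt.plot returns a list of line objects; unpack the single line
line, = plt.plot(
    days, values,
    linestyle="-.",      # dash-dot line
    marker="v",          # triangle markers
    color="#1f77b4",     # hex color code
    linewidth=2.5,
    label="series A",    # text shown in the legend
)
plt.legend()
plt.show()
```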

**Low-Level-Functions:**

| Function                                                                          | Description                       |
|:---------------------------------------------------------------------------------|:------------------------------------|
| plt.title("Text")                                                             | Adds a title             |
| plt.xlabel("Text")                                                            | Adds a label to the x-axis    |
| plt.ylabel("Text")                                                            | Adds a label to the y-axis    |
| plt.legend(loc="location", fontsize=number, ...)                            | Controls the legend          |
| plt.grid(visible=bool, ...)                                                   | Controls the grid lines |
| plt.axis(xmin=number, xmax=number, ymin=number, ymax=number, option=bool,...) | Controls the axes                |
| plt.axvline()                                                         | Adds a vertical line                  |
| plt.axhline()                                                         | Adds a horizontal line                   |

In [None]:
# Plot
plt.plot(x, y, label="min.Temp")

# Layout
plt.title(r"Graphic 1 $\sim N(\mu, \sigma^2)$", loc="left", fontsize=20)  # raw string keeps the LaTeX backslashes intact
plt.xlabel("days", style="italic")
plt.ylabel("temp", color="red")
plt.xlim([1,9])
plt.ylim([0,5])
plt.legend(loc = "upper right")

# Show plot
plt.show()
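
The functions `plt.grid`, `plt.axhline` and `plt.axvline` from the table are not used in the cell above. The following sketch shows one possible use with hypothetical data; the positions of the two reference lines are arbitrary example values:

```python
import matplotlib.pyplot as plt

# Hypothetical example data
x_vals = [0, 3, 4, 5]
y_vals = [1, 4, 2, 3]

plt.plot(x_vals, y_vals, marker="o")
plt.grid(True)  # switch the grid lines on

# Reference lines at arbitrary example positions
hline = plt.axhline(y=2.5, color="red", linestyle="--")  # horizontal line at y = 2.5
vline = plt.axvline(x=3, color="gray", linestyle=":")    # vertical line at x = 3

plt.show()
```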

### 6.2 Visualization of one variable

In [None]:
df = pd.read_csv("Bestsellers.csv")
df.head()

**Barplot**

In [None]:
# absolute values
data4bar = df["Genre"].value_counts()
data4bar
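
Besides absolute counts, `value_counts` can also return relative frequencies via `normalize=True`. A minimal sketch, assuming a hypothetical Series `genres` with made-up entries instead of the Bestsellers data:

```python
import pandas as pd

# Hypothetical genre column with made-up entries
genres = pd.Series(
    ["Fiction", "Non Fiction", "Non Fiction", "Fiction", "Non Fiction"]
)

absolute = genres.value_counts()                # counts per category
relative = genres.value_counts(normalize=True)  # shares that sum to 1

print(absolute)
print(relative)
```

Relative frequencies are useful when bar charts of groups with different sizes should be comparable.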

In [None]:
# Bars
plt.bar(data4bar.index, data4bar.values)

# Layout
plt.title("Bar chart")
plt.xlabel("Genre")
plt.ylabel("Count")

# Show plot
plt.show()

In [None]:
plt.bar(data4bar.index, data4bar.values, color=["Red", "Yellow"])
plt.show()

In [None]:
plt.bar(data4bar.index, data4bar.values, color=(0.2, 0.4, 0.6, 0.2), edgecolor = "red")
plt.show()

In [None]:
sns.barplot(x = data4bar.index, y = data4bar.values, color="#69b3a2")
plt.show()

**Histogram**

In [None]:
plt.hist(df["Reviews"], density = True)
plt.show()
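
With `density=True` the bar heights are scaled so that the total area of the bars equals 1, which means the heights are densities and not counts. This can be checked with `np.histogram`, which applies the same normalization. A sketch with hypothetical values (not taken from the Bestsellers data):

```python
import numpy as np

# Hypothetical review counts, used only for illustration
reviews = np.array([5, 12, 18, 25, 31, 40, 44, 58, 73, 90])

heights, edges = np.histogram(reviews, bins=5, density=True)
widths = np.diff(edges)  # width of each bin

# Area of each bar = height * width; the areas add up to 1
total_area = (heights * widths).sum()
print(total_area)
```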

In [None]:
sns.set(style="white")
sns.displot(df["Reviews"])
plt.show()

In [None]:
sns.histplot(data=df["Reviews"], bins=10)
plt.show()

**Boxplot**

In [None]:
sns.boxplot(x=df["Reviews"])
plt.show()
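
The elements of a boxplot can be reproduced numerically: the box spans the first to the third quartile, the line inside marks the median, and points further than 1.5 times the interquartile range from the box are commonly drawn as outliers. The sketch below assumes this common 1.5 rule and uses hypothetical values:

```python
import numpy as np

# Hypothetical sample with one extreme value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range = height of the box

# Limits beyond which points count as outliers (assumed 1.5 * IQR rule)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)  # 3.0 5.0 7.0
print(outliers)        # [100]
```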

**Kernel density estimate**

In [None]:
sns.kdeplot(df["Reviews"])
plt.show()

In [None]:
sns.kdeplot(df["Reviews"], fill=True, color="olive")  # fill replaces the deprecated shade argument
plt.show()
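
A kernel density estimate places a small smooth curve (the kernel) on every observation and averages these curves. The following sketch computes such an estimate by hand with NumPy to illustrate the idea. It assumes a Gaussian kernel, hypothetical sample values and a fixed, arbitrarily chosen bandwidth, and it is not how seaborn is implemented internally:

```python
import numpy as np

# Hypothetical sample and an arbitrarily chosen bandwidth
samples = np.array([1.0, 1.5, 2.0, 4.0, 4.5])
bandwidth = 0.5

# Points at which the density is evaluated
grid = np.linspace(-3, 9, 1201)

# One Gaussian curve per observation, then the average over all observations
z = (grid[:, None] - samples[None, :]) / bandwidth
kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
density = kernels.mean(axis=1)

# A density integrates to (approximately) 1
area = (density * (grid[1] - grid[0])).sum()
print(round(area, 3))
```

A smaller bandwidth produces a more wiggly curve, a larger one a smoother curve.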

<div class="alert alert-success"><b>Exercise:</b> Import the dataset <b>company</b>. Create a line chart of <i>total_profit</i> against <i>month_number</i>. The line color should be green, the title <i>Profit per month</i>, the x-axis label <i>Month</i>, the y-axis label <i>Profit in US dollars</i>, the data point symbols triangles, and the line style dash-dot ("-.-.-.-.-.-.-.-."). In addition, a red horizontal line should mark the arithmetic mean of <i>total_profit</i>.<br>
    <b>Hint:</b> Use the function np.mean!</div>

### 6.3 Visualization of two variables

In [None]:
# Scatterplot
plt.plot(df["Reviews"], df["Price"], marker="o", linestyle="")

# Layout
plt.title("Scatterplot")
plt.xlabel("Reviews")
plt.ylabel("Price")

# Show plot
plt.show()

In [None]:
# Scatterplot
plt.plot(df["Reviews"], df["Price"], marker="o", alpha=0.2, linestyle="")

# Layout
plt.title("Scatterplot")
plt.xlabel("Reviews")
plt.ylabel("Price")

# Show plot
plt.show()

In [None]:
sns.boxplot(y=df["Reviews"], x=df["Genre"])
plt.show()
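
The numbers behind such a grouped boxplot can be obtained with `groupby` and `describe`, which return the quartiles per category. A minimal sketch, assuming a hypothetical table `books` with made-up values instead of the Bestsellers data:

```python
import pandas as pd

# Hypothetical table with made-up values
books = pd.DataFrame({
    "Genre": ["Fiction", "Fiction", "Fiction",
              "Non Fiction", "Non Fiction", "Non Fiction"],
    "Reviews": [100, 200, 300, 10, 20, 60],
})

# Summary statistics of Reviews, computed separately for each genre
summary = books.groupby("Genre")["Reviews"].describe()
print(summary[["min", "25%", "50%", "75%", "max"]])
```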

<div class="alert alert-success"><b>Exercise:</b> Import the dataset <b>HappinessReport2019</b>. Create a scatterplot of <i>Overall rank</i> and <i>Social support</i> using green dots. </div>