**What you learn:**

In this notebook you will learn about functions in Python. This includes named functions, lambda (anonymous) functions, generators, and function libraries. 

Originally based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.

Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

This notebook is available on https://github.com/BigDataAnalyticsGroup/python.

### Functions

#### Calling functions

We have already used many built-in functions in earlier notebooks. A function is a block of code with (optional) input arguments and (optional) return values. In Python (and many other languages), a function can be called as follows:

```python
output = foo(input_argument1, input_argument2)
```
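
As a concrete, runnable instance of this call pattern, the sketch below uses the built-in `divmod`, which takes two input arguments and returns two values. The variable names are chosen for illustration only.

```python
# divmod(a, b) returns the pair (a // b, a % b)
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```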

We already called several functions in the [loops](https://github.com/BigDataAnalyticsGroup/python/blob/master/05%20Control%20Logics.ipynb) part of this tutorial, for example:

In [1]:
a = range(5)
a

range(0, 5)

We can nest function calls. Here, the output of `range(5)` is the input to `list(..)`:

In [2]:
list(range(5))

[0, 1, 2, 3, 4]

Another example:

In [3]:
abs(-3.5)

3.5

Often we need more than one input argument. For example:

In [4]:
list(range(5, 0, -1))

[5, 4, 3, 2, 1]

A second example: given a dictionary, produce a list of its keys sorted by their associated values (not by the keys themselves!):

In [5]:
d = {'a': 100, 'c': 50, 'b': 70}
# output keys available in this dictionary:
list(d.keys())

['a', 'c', 'b']

In [6]:
type(d), d

(dict, {'a': 100, 'c': 50, 'b': 70})

In [7]:
# sorted() applied to a dictionary returns a list of its keys, sorted by key:
l = sorted(d)
type(l), l

(list, ['a', 'b', 'c'])

In [8]:
# show the value of key 'a':
d['a']

100

In [9]:
def values(key):
    return d[key]

In [10]:
# sort the keys of the dictionary by their associated values:
sorted(d, key=values)

['c', 'b', 'a']

In [11]:
# sort the keys of the dictionary by their associated values:
sorted(d, key=lambda k: d[k])

['c', 'b', 'a']

In [12]:
# sort the keys of the dictionary by their associated values in reverse (aka descending) order:
sorted(d, key=lambda k: d[k], reverse=True)

['a', 'b', 'c']
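
If the values are needed together with the keys, one possible variation is to sort the (key, value) pairs from `d.items()`. The sketch redefines the dictionary `d` from above so that it runs on its own; the name `pairs_by_value` is made up for this example.

```python
d = {'a': 100, 'c': 50, 'b': 70}

# each item is a (key, value) tuple; sort by the value at index 1
pairs_by_value = sorted(d.items(), key=lambda item: item[1])
print(pairs_by_value)  # [('c', 50), ('b', 70), ('a', 100)]
```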

#### Lambda functions

Aha, we just saw something different: `lambda`!

Lambda functions are just functions, except that they are anonymous, i.e., they have no name. See [here](https://stackoverflow.com/questions/890128/why-are-python-lambdas-useful) for many good discussions. In short, anything you can do with `lambda` can also be achieved with a regular function. Yet, `lambda` is handy because it is lightweight and can be written right where it is needed.

The example above is actually a good example of when to use `lambda`:

In [13]:
sorted(d, key=lambda k: d[k])

['c', 'b', 'a']

The body of a `lambda` function consists of exactly one expression. In this case, the input parameter is `k`, which is expected to be an existing key of the dictionary `d`. The output of the lambda function is `d[k]`. Therefore, the keys are sorted by their associated values rather than by the keys themselves.
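
To make the equivalence with a named function concrete, here is a small sketch. The name `value_of` is hypothetical; `d.get` is the dictionary's own lookup method and can serve as the key function directly.

```python
d = {'a': 100, 'c': 50, 'b': 70}

def value_of(k):
    # named equivalent of: lambda k: d[k]
    return d[k]

by_lambda = sorted(d, key=lambda k: d[k])
by_named = sorted(d, key=value_of)
by_method = sorted(d, key=d.get)  # d.get(k) returns d[k] for existing keys

print(by_lambda == by_named == by_method)  # True
print(by_lambda)  # ['c', 'b', 'a']
```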

#### Define our own functions

Note that we are not limited to built-in or lambda functions. Let's now try to write our own functions. Before that, we need to be clear on the structure of a function:
```python
def func_name(arg1, arg2, arg3, ...):
    # start of the function body: every line must be indented
    # Do something here # <-- any number of code lines, all with the same indentation
    # end of the function body
    return output
```

\* *`return output` and all arguments are optional*

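So again: each line of the function body must be indented consistently (by convention four spaces per level; a tab also works as long as tabs and spaces are not mixed). This is in contrast to Java/C++, which use `{}` syntax to delimit blocks.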
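
A minimal sketch of the optional parts: the first function below takes no arguments and has no `return`, so calling it evaluates to `None`; the second uses a default value for one of its arguments. Both function names are made up for this example.

```python
def greet():
    # no arguments, no return statement
    print('hello')

def power(base, exponent=2):
    # exponent is optional and defaults to 2
    return base ** exponent

result = greet()                   # prints: hello
print(result)                      # None
print(power(3))                    # 9
print(power(3, 3))                 # 27
print(power(base=2, exponent=10))  # 1024
```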

In the following example, we make use of `sum`, a built-in function to sum up numeric iterables:

In [14]:
def mySum(list_to_sum):
    print('mySum was called.')
    return sum(list_to_sum)

In [15]:
mySum(range(5))

mySum was called.


10

In [16]:
# in this case the output is no different from calling sum() directly (other than the print statement above):
sum(range(5))

10

The same sum function, this time using a for loop to add up the values in the input list:

In [17]:
def mySumUsingLoop(list_to_sum):
    runningSum = 0
    for item in list_to_sum:
        print(item)
        runningSum += item
        print('current runningSum:', runningSum)
    return runningSum

In [18]:
mySumUsingLoop(range(5))

0
current runningSum: 0
1
current runningSum: 1
2
current runningSum: 3
3
current runningSum: 6
4
current runningSum: 10


10

*These two example functions do nothing interesting; they merely illustrate how to build your own functions.*

#### Functions without side effects:

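Sometimes functions have surprising side effects, that is, they change state outside the function, for example by modifying an object that was passed in as an argument.

As a rule, a function should **not** have side effects that the caller does not expect.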

In [19]:
# no side effect:
def func1(mylist):
    print ("  inner1 ", mylist)
    mylist = [47,11] # this creates a new list object and assigns it to the local variable mylist!
    print ("  inner2 ", mylist)

fib = [0,1,1,2,3,5,8]
print("outer1   ", fib)
func1(fib)
print("outer2   ", fib)

outer1    [0, 1, 1, 2, 3, 5, 8]
  inner1  [0, 1, 1, 2, 3, 5, 8]
  inner2  [47, 11]
outer2    [0, 1, 1, 2, 3, 5, 8]


In [20]:
# show the python-internal id of object fib:
print(id(fib))
a = 42
id(a)

140091908619472


93899897223072

From the [Python documentation](https://docs.python.org/3/library/functions.html#id):

**id(object)**
> Return the “identity” of an object. This is an integer which is guaranteed
> to be unique and constant for this object during its lifetime. Two objects
> with non-overlapping lifetimes may have the same id() value.
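
As a short sketch of what identity means in practice, the example below contrasts a second name for the same object with a copy that merely has equal contents. The names `alias` and `fib_copy` are illustrative; `is` compares identity, `==` compares contents.

```python
fib = [0, 1, 1, 2, 3, 5, 8]
alias = fib            # a second name for the same list object
fib_copy = list(fib)   # a new list object with equal contents

print(alias is fib, id(alias) == id(fib))  # True True
print(fib_copy == fib)                     # True: same contents
print(fib_copy is fib)                     # False: different objects
```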

In [21]:
# same as above, just additionally printing the id of the list
def func1(mylist):
    print ("  inner1 ", id(mylist), mylist)
    mylist = [47,11] # this creates a new list object and assigns it to the local variable mylist!
    print ("  inner2 ", id(mylist), mylist)

fib = [0,1,1,2,3,5,8]
print("outer1   ", id(fib), fib)
func1(fib)
print("outer2   ", id(fib), fib)

outer1    140091934768976 [0, 1, 1, 2, 3, 5, 8]
  inner1  140091934768976 [0, 1, 1, 2, 3, 5, 8]
  inner2  140091908619472 [47, 11]
outer2    140091934768976 [0, 1, 1, 2, 3, 5, 8]


#### Functions **with** side effects:

In [22]:
# mylist and fib refer to the same object, as shown by calling id():
def func2(mylist):
    print ("  inner1 ", id(mylist), mylist)
    mylist += [47,11] # appends two elements to the list pointed to
    print ("  inner2 ", id(mylist), mylist)
    
fib = [0,1,1,2,3,5,8]
print("outer1   ", id(fib), fib)
func2(fib)
print("outer2   ", id(fib), fib)

outer1    140091934668032 [0, 1, 1, 2, 3, 5, 8]
  inner1  140091934668032 [0, 1, 1, 2, 3, 5, 8]
  inner2  140091934668032 [0, 1, 1, 2, 3, 5, 8, 47, 11]
outer2    140091934668032 [0, 1, 1, 2, 3, 5, 8, 47, 11]
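
One way to avoid this side effect, sketched under the assumption that the caller wants the original list left untouched, is to build and return a new list. The function name `extended` is hypothetical.

```python
def extended(mylist):
    # mylist + [...] builds a new list; the argument is not modified
    return mylist + [47, 11]

fib = [0, 1, 1, 2, 3, 5, 8]
longer = extended(fib)

print(fib)     # [0, 1, 1, 2, 3, 5, 8]
print(longer)  # [0, 1, 1, 2, 3, 5, 8, 47, 11]
```

The difference from `func2` is that `+=` on a list extends the existing object in place, whereas `+` creates a new one.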


#### Functions **without** side effects:

In [23]:
# no side effect:
def func3(stlocal):
    print ("  inner1 ", id(stlocal), stlocal)
    stlocal += 'blub' # this creates a new string object and assigns it to stlocal!
    print ("  inner2 ", id(stlocal), stlocal)
    
st = 'bla'
print("outer1   ", id(st), st)
func3(st)
print("outer2   ", id(st), st)

outer1    140091908525424 bla
  inner1  140091908525424 bla
  inner2  140091908487792 blablub
outer2    140091908525424 bla
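
The same behaviour can be checked without a function call. This sketch compares `+=` on a string (immutable) with `+=` on a list (mutable), using `is` to test object identity; the variable names are illustrative.

```python
st = 'bla'
st_before = st
st += 'blub'                 # builds a new string and rebinds st
print(st_before)             # bla
print(st is st_before)       # False

nums = [1, 2]
nums_before = nums
nums += [3]                  # extends the existing list in place
print(nums_before)           # [1, 2, 3]
print(nums is nums_before)   # True
```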


#### Generator function:

Generator functions are useful when we need to iterate over a sequence of items without building the whole sequence first. A prominent use is in for-loops:

In [24]:
# generator function:
def square(n):
    for i in range(n):
        print ("yield:, ", i**2)
        yield i**2

for i in square(10):
    print("loop: ",i)

yield:,  0
loop:  0
yield:,  1
loop:  1
yield:,  4
loop:  4
yield:,  9
loop:  9
yield:,  16
loop:  16
yield:,  25
loop:  25
yield:,  36
loop:  36
yield:,  49
loop:  49
yield:,  64
loop:  64
yield:,  81
loop:  81
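
To see that a generator function produces its values lazily, the sketch below drives one by hand with the built-in `next`. The generator `squares` is a print-free variant of `square` above, named differently to avoid confusion.

```python
def squares(n):
    for i in range(n):
        yield i**2

gen = squares(3)      # nothing is computed yet
first = next(gen)     # runs until the first yield
second = next(gen)
third = next(gen)
print(first, second, third)  # 0 1 4

try:
    next(gen)         # the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True
    print('done')

# a generator can also be consumed by list() or sum()
print(list(squares(4)))  # [0, 1, 4, 9]
print(sum(squares(4)))   # 14
```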


In [25]:
import random
import string

# generator function for words consisting of random characters:
def blabla(n, length):
    for i in range(n):
        ret = ''
        for y in range(length):
            ret += random.choice(string.ascii_letters)
        yield ret.lower()

for bla in blabla(10,7):
    print(bla+", ", end="")
    

pfzlmvn, iurgepa, thhsgjq, yejstoy, oohvyxh, flzgskr, prumydt, keakmnn, icvmiuw, cxrrbfm, 

Here, a regular function is easier (and more elegant) than a lambda function, because the body needs loops and several statements.
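
The inner loop of `blabla` can also be written with `str.join`. The sketch below is one possible variant, not the notebook's version; `random_words` is a hypothetical name, and it draws from `string.ascii_lowercase` directly instead of lowercasing afterwards.

```python
import random
import string

def random_words(n, length):
    for _ in range(n):
        # random.choices draws `length` characters with replacement
        yield ''.join(random.choices(string.ascii_lowercase, k=length))

words = list(random_words(10, 7))
print(', '.join(words))
```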

### Libraries

Often we need help with complicated computation tasks, either from Python's own libraries or from external ones. On these occasions, we need to _import libraries_, which are basically collections of existing functions.

One strength of Python is the **incredible universe of available libraries**. You can find libraries for almost anything.

Good places to start:

[Python Standard Library](https://docs.python.org/3/library/)

[Python Package Index](https://pypi.org/search/)

#### Built-in libraries

We will use the __math__-library as an example.

In [26]:
import math # use import to load a library

To use a function from the library, write `library_name.function_name`. For example, to calculate the (natural) logarithm using the `math` library, we call `math.log`:

In [27]:
x = 5
print(f"e^{x} = {math.exp(x):f}")
print(f"log({x}) = {math.log(x):f}")

e^5 = 148.413159
log(5) = 1.609438
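
Note that `math.log` with a single argument computes the natural logarithm. As a short sketch, other bases are available through an optional second argument or through `math.log2` and `math.log10`:

```python
import math

print(math.log(math.e))   # natural logarithm of e
print(math.log(8, 2))     # base-2 logarithm via the second argument
print(math.log2(8))       # 3.0
print(math.log10(1000))   # 3.0
```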


You can also import one specific function only:

In [28]:
from math import exp # You can import a specific function
print(exp(x)) # This way, you don't need to use math.exp but just exp

148.4131591025766


Or import all functions from a library:

In [29]:
from math import * # Import all functions

In [30]:
print(exp(x))
print(log(x)) # before the wildcard import above, calling a bare `log` would raise a NameError

148.4131591025766
1.6094379124341003


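Depending on what you want to achieve, you can import the whole library, a few specific functions, or all functions (via `*`). Be aware that a wildcard import can silently replace names that already exist, which is why the Python style guide (PEP 8) recommends avoiding it.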
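
To illustrate why the choice matters, the following sketch lists the public names of `math` that also exist as built-ins; `from math import *` would rebind them. It uses only the standard `builtins` module and `dir`; the variable names are made up for this example.

```python
import builtins
import math

# the built-in pow keeps integers, math.pow always returns a float
print(builtins.pow(2, 3))  # 8
print(math.pow(2, 3))      # 8.0

# public names in math that collide with built-in names
public_math_names = {name for name in dir(math) if not name.startswith('_')}
collisions = sorted(public_math_names & set(dir(builtins)))
print(collisions)
```

After a wildcard import, a bare `pow(2, 3)` would therefore call `math.pow` and return a float.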

#### External libraries

There are times when you'll want advanced functionality that is not part of Python's standard library. Many useful packages are provided by third-party developers.

We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important to you for data analysis.)

Installing packages for Python from the command line is easy with [pip](https://packaging.python.org/installing/):

```bash
~$ pip install numpy scipy pandas
```
This assumes that `pip` belongs to your Python 3 installation, i.e., that it behaves like `pip3`. On my machine I have to call:

```bash
~$ pip3 install numpy scipy pandas
```
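
Whether a package is already available can be checked from Python itself. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `is_installed` is made up for this example, and the printed results depend on your environment.

```python
import importlib.util

def is_installed(package_name):
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(package_name) is not None

for name in ['math', 'numpy', 'scipy', 'pandas']:
    print(name, is_installed(name))
```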

For this lecture, you do not have to install libraries yourself. All necessary libraries are preinstalled through vagrant. If we need more, we will update [the vagrant file](https://github.com/BigDataAnalyticsGroup/python/blob/master/Vagrantfile). Also make sure you do not miss [our instructions](https://github.com/BigDataAnalyticsGroup/python/blob/master/Instructions.md) on how to use vagrant.