# Functional programming

## 1. introduction

* functions are a way to modularize and reuse code
* a function is a block of code that performs a task; it usually takes input(s) and returns output(s)
* as a beginner, it is easy to fall into a scripting (procedural) style
* finding the right level of abstraction when writing functions is relatively hard and requires a programmer's mindset

## 2. writing functions

let's start by writing a simple function that assembles a phrase from a list of words...

In [82]:
def list2string(str_lst):
    string = ' '.join(str_lst)
    return string

In [83]:
a = list2string(['a', 'b', 'c'])

In [3]:
to_string_list = ['my', 'name', 'is', 'David']

list2string(to_string_list)

'my name is David'

there are other ways to pass the arguments to the function...

In [67]:
# directly in the function call

list2string(['my', 'name', 'is', 'Pepe'])

'my name is Pepe'

In [68]:
# as a keyword argument

list2string(str_lst=to_string_list)

'my name is David'

In [86]:
# this will raise a TypeError

list2string(['my', 'name', 'is', 1])

TypeError: sequence item 3: expected str instance, int found

a possible fix is to convert every element to a string before joining:

In [88]:
# functions can contain any computation, there are no limits...

def list2string(str_lst):
    str_lst_fixed = [str(e) for e in str_lst]
    string = ' '.join(str_lst_fixed)
    return string

list2string(['my', 'name', 'is', 1])

'my name is 1'

two useful concepts when passing arguments to functions are `*args` and `**kwargs`...

args:

In [103]:
# this function accepts an arbitrary number of positional arguments

def multiply(*args):
    mult = 1
    for arg in args:
        mult *= arg
    return mult

In [104]:
multiply(1, 2, 3, 4)

24

In [107]:
# the arguments can even come from a list, unpacked with '*'

numbers_to_multiply = [1, 2, 3, 4]
multiply(*numbers_to_multiply)

24

kwargs:

In [129]:
# a function can accept arbitrary keyword arguments, collected into a dict

def keyword_arguments(**kwargs):
    print(kwargs)
    return 1

In [135]:
a = keyword_arguments(a=5, b=2, c=3, david=45, asdf='b')

{'a': 5, 'b': 2, 'c': 3, 'david': 45, 'asdf': 'b'}
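both forms can be combined in one signature. The sketch below is only an illustration (the name `describe_call` is made up for this example): positional arguments arrive as a tuple and keyword arguments as a dict.

```python
def describe_call(*args, **kwargs):
    """Return the positional and keyword arguments exactly as received."""
    # args is a tuple, kwargs is a dict
    return args, kwargs

positional, named = describe_call(1, 2, sep='-', end='!')
print(positional)  # (1, 2)
print(named)       # {'sep': '-', 'end': '!'}

# a dict can be unpacked into keyword arguments with '**'
options = {'sep': '-', 'end': '!'}
print(describe_call(*[1, 2], **options) == (positional, named))  # True
```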


## 3. global vs local variables

this concept is tricky; do not get frustrated, but be careful...

In [136]:
a = 'this variable is global'

# global variables can be read inside functions, but not reassigned there without the `global` statement

def random_function_1():
    print(a)  # what does this function return?

In [137]:
# functions can take no arguments; this one reads a global variable without trying to modify it

random_function_1()

this variable is global


In [138]:
# this function tries to reassign the global variable; because of that assignment, Python treats a as local to the function...

a = 1

def random_function_2(b):
    print(a)
    a = a + b
    return a

In [139]:
# so it is going to fail...

random_function_2(9)

UnboundLocalError: local variable 'a' referenced before assignment

a possible fix:

In [142]:
# declare the variable as global inside the function

a = 1

def random_function_2(b):
    global a  # this is considered a very bad practice btw
    print(a)
    a = 9
    print(a, b)

In [143]:
random_function_2(8)

1
9 8


In [144]:
a

9
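a common alternative to `global` is to pass the value in and return the new one, so that the caller decides what gets rebound. A minimal sketch of that idea (the function name `add_to` is made up for this example):

```python
a = 1

def add_to(value, b):
    """Return value + b without touching any global variable."""
    return value + b

# the caller rebinds the global explicitly, so the change is visible at the call site
a = add_to(a, 9)
print(a)  # 10
```

the function itself has no side effects, which makes it easier to test and reuse.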

In [231]:
import random
a = []

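def append_random(l=None):
    # a tuple default has no .append(), so build a fresh list on each call instead
    if l is None:
        l = [3, 4, 5]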
    print(a)
    random_int = random.randint(1, 20)
    l.append(random_int)
    print(l)
    print('ok')
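lists are mutable, so a function that appends to a list it receives also changes the caller's list, with no `global` needed. The sketch below illustrates this with made-up names (`append_item`, `risky_append`, `safe_append`, `shopping`); it also shows the related pitfall of a mutable default argument, which is created once and shared between calls.

```python
def append_item(target, item):
    # mutates the list object the caller passed in
    target.append(item)

shopping = []
append_item(shopping, 'milk')
print(shopping)  # ['milk']

def risky_append(item, target=[]):
    # the default list is created once, when the function is defined
    target.append(item)
    return target

print(risky_append(1))  # [1]
print(risky_append(2))  # [1, 2]  (same list as the previous call)

def safe_append(item, target=None):
    # build a fresh list on every call when none is given
    if target is None:
        target = []
    target.append(item)
    return target

print(safe_append(1))  # [1]
print(safe_append(2))  # [2]
```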

local variables are not available outside the function:

In [235]:
# this function defines a variable c in its local scope

a = 1

def random_function_3(b):
    c = a + b
    print(c)

In [236]:
random_function_3(5)

6


In [237]:
# this will raise a NameError as c is not available outside

print(c)

NameError: name 'c' is not defined
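to use a value computed inside a function, return it and bind the result to a name in the calling scope. A small sketch, with `sum_with_global` and `result` as hypothetical names:

```python
a = 1

def sum_with_global(b):
    c = a + b  # c is local to the function
    return c   # returning hands the value back to the caller

result = sum_with_global(5)
print(result)  # 6
```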

## 4. pandas apply (why is this here?)

<font color=red>WARNING</font>: be extremely careful when manipulating pandas DataFrames inside functions: some operations modify the original object in place, while others return a copy. This often leads to confusion.

In [240]:
import pandas as pd

In [241]:
data = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/carData/Arrests.csv',  # dataset about Arrests for Marijuana Possession 
                   index_col=0)

In [243]:
data

Unnamed: 0,released,colour,year,age,sex,employed,citizen,checks
1,Yes,White,2002,21,Male,Yes,Yes,3
2,No,Black,1999,17,Male,Yes,Yes,3
3,Yes,White,2000,24,Male,Yes,Yes,3
4,No,Black,2000,46,Male,Yes,Yes,1
5,Yes,Black,1999,27,Female,Yes,Yes,1
...,...,...,...,...,...,...,...,...
5222,Yes,White,2000,17,Male,Yes,Yes,0
5223,Yes,White,2000,21,Female,Yes,Yes,0
5224,Yes,Black,1999,21,Female,Yes,Yes,1
5225,No,Black,1998,24,Male,Yes,Yes,4


In [280]:
def get_means(df):
    df.columns = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
    numeric = df.select_dtypes('number')
    means_df = pd.DataFrame(numeric.mean()).reset_index()
    means_df.columns = ['colname', 'mean']
    return means_df

In [281]:
get_means(data)

Unnamed: 0,colname,mean
0,c,1999.509376
1,d,23.846537
2,h,1.636433


In [282]:
data

Unnamed: 0,a,b,c,d,e,f,g,h
1,Yes,White,2002,21,Male,Yes,Yes,3
2,No,Black,1999,17,Male,Yes,Yes,3
3,Yes,White,2000,24,Male,Yes,Yes,3
4,No,Black,2000,46,Male,Yes,Yes,1
5,Yes,Black,1999,27,Female,Yes,Yes,1
...,...,...,...,...,...,...,...,...
5222,Yes,White,2000,17,Male,Yes,Yes,0
5223,Yes,White,2000,21,Female,Yes,Yes,0
5224,Yes,Black,1999,21,Female,Yes,Yes,1
5225,No,Black,1998,24,Male,Yes,Yes,4
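the column names changed because `get_means` assigned to `df.columns` on the very object it was given. One way to avoid the side effect is to work on a copy. The sketch below uses a tiny made-up DataFrame rather than the Arrests data; `get_means_safe` and the sample values are assumptions for illustration.

```python
import pandas as pd

def get_means_safe(df):
    """Return column means without modifying the caller's DataFrame."""
    df = df.copy()  # work on a copy so the original keeps its column names
    df.columns = [f'col_{i}' for i in range(df.shape[1])]
    numeric = df.select_dtypes('number')
    means_df = numeric.mean().reset_index()
    means_df.columns = ['colname', 'mean']
    return means_df

sample = pd.DataFrame({'sex': ['Male', 'Female'], 'age': [21, 27]})
print(get_means_safe(sample))
print(list(sample.columns))  # ['sex', 'age']
```

the copy costs some memory, but the caller's DataFrame is left as it was.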


easier (in pandas 2.0 and later, `numeric_only=True` is needed to skip the text columns):

In [122]:
data.mean(numeric_only=True)

year      1999.509376
age         23.846537
checks       1.636433
dtype: float64

In [283]:
%%timeit 

data.mean()

47.4 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [284]:
%%timeit

get_means(data)

2.66 ms ± 5.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


(OPTIONAL) DataFrame modification example...

In [58]:
# Here

(OPTIONAL) function inception:

In [285]:
def a(b):
    v = 5  # local to a(); not touched by c() below
    def c(d):
        global v  # refers to a module-level v, not the v defined in a()
        v = 0
        return v ** 2 + d
    return 5 * c(b)

In [286]:
a(2)

10
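in the nested example, `global v` points at a module-level `v`, so the `v = 5` inside the outer function is never changed. To rebind a variable of the enclosing function instead, Python provides `nonlocal`. A minimal sketch with hypothetical names (`make_counter`, `increment`):

```python
def make_counter():
    count = 0  # lives in the enclosing function's scope

    def increment():
        nonlocal count  # rebind the enclosing variable, not a global one
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```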