<center><img src="https://i.imgur.com/zRrFdsf.png" width="700"></center> 

_____


# Creating Functions

In [31]:
# for some cells to run R
%load_ext rpy2.ipython 

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


Functions in **Python** are created with **def**:

In [65]:
# function name: 'byCommaP1'

def byCommaP1(some_String): # function input: 'some_String'
    import re              # importing a library locally ('re' is not visible outside the function)
    some_String_changed=re.sub(",","",some_String) # removing the commas from the string
    someNumber=float(some_String_changed)          # converting the previous result into a number type
    return someNumber            # function output

In [66]:
# use function name, and input a string
byCommaP1("1,200")

1200.0
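Because the pattern here is a fixed character rather than a true regular expression, the same conversion can also be sketched with the built-in `str.replace`. The name `by_comma_plain` is hypothetical and is not part of the original notebook.

```python
def by_comma_plain(some_string: str) -> float:
    """Remove thousands separators (commas) and return a float."""
    # str.replace handles a literal comma; no regular expression is needed
    cleaned = some_string.replace(",", "")
    return float(cleaned)


print(by_comma_plain("1,200"))  # 1200.0
```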

When functions are very simple, you may use the **lambda style**:

In [67]:
import re

byCommaP2=lambda some_String:float(re.sub(",","",some_String))

byCommaP2('1,250.9')

1250.9
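A lambda is often most convenient when it is passed directly as an argument instead of being stored under a name. As a small sketch (the list `amounts` is made-up illustration data), it can serve as a sorting key so that the strings are ordered by their numeric value:

```python
# illustrative values only; not taken from the original data
amounts = ["1,250.9", "300", "2,000"]

# the lambda is used inline as the sorting key
sorted_amounts = sorted(amounts, key=lambda s: float(s.replace(",", "")))
print(sorted_amounts)  # ['300', '1,250.9', '2,000']
```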

Functions in **R** are created with **function**:

In [68]:
%%R
byCommaR=function(some_String){
    some_String_changed=gsub(",","",some_String)
    someNumber=as.numeric(some_String_changed)
    return(someNumber)
}

In [69]:
%%R
byCommaR('2,400')

[1] 2400


# Applying Functions to Simple Data Structures

Most of the time, you need to apply a function to several values. Let's use the previously defined **byCommaP2** function:

In [70]:
# convert values in a list
valuesInList=['1','2','3','5,500']
byCommaP2(valuesInList)

TypeError: expected string or bytes-like object, got 'list'

In [71]:
# traverse the list
[byCommaP2(x) for x in valuesInList]

[1.0, 2.0, 3.0, 5500.0]

In [72]:
# apply function to list
list(map(byCommaP2,valuesInList))

[1.0, 2.0, 3.0, 5500.0]
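The earlier `TypeError` arises because `re.sub` expects a single string. One way to make a function tolerant of both cases is to check the input type first. The sketch below uses a hypothetical helper, `by_comma_any`, which is not defined in the original notebook.

```python
import re


def by_comma_any(values):
    """Convert one string, or an iterable of strings, to float(s)."""
    def convert(text):
        return float(re.sub(",", "", text))

    # a single string is converted directly
    if isinstance(values, str):
        return convert(values)
    # anything else is assumed to be an iterable of strings
    return [convert(v) for v in values]


print(by_comma_any("1,200"))                    # 1200.0
print(by_comma_any(["1", "2", "3", "5,500"]))   # [1.0, 2.0, 3.0, 5500.0]
```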

It is very similar in R: 

In [73]:
%%R
valuesInList=list('1','2','3','5,500')

Notice that R applies the function to every element of the structure, raising neither an error nor a warning:

In [74]:
%%R
byCommaR(valuesInList)

[1]    1    2    3 5500


# Applying Functions to Data Frames


In [99]:
import pandas as pd

linkGH="https://github.com/CienciaDeDatosEspacial/code_and_data/raw/main/data/dataFrame_example.csv"
testDataP=pd.read_csv(linkGH)
testDataP

   code measure1 measure2
0    a1     12,1    28,31
1    a2     13,3    31,12
2    a3     11,1    25,97
3    a4     10,5    24,57
4    a5    15,66    36,64
5    a6     13,2    30,89
6    a7     12,2    28,55
7    a8    13,01    30,44
8    a9    13,44    31,45
9   a10    14,65    34,28
10  a11     15,5    36,27
11  a12     10,3     24,1
12  a13    10,44    24,43
13  a14     11,4    26,68


In [76]:
testDataP.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   code      14 non-null     object
 1   measure1  14 non-null     object
 2   measure2  14 non-null     object
dtypes: object(3)
memory usage: 464.0+ bytes


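The measures were read as text (object) because they use a comma as the decimal mark. Here, instead of deleting the comma, we need to replace it with a dot: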

In [101]:
commaToDot=lambda some_String:float(re.sub(",",".",some_String))

In [102]:
#would this work?

commaToDot(testDataP.measure1)

TypeError: expected string or bytes-like object, got 'Series'

Let's apply the function to one column:

In [103]:
testDataP.measure1.apply(commaToDot) # 'apply' belongs to 'pandas'

0     12.10
1     13.30
2     11.10
3     10.50
4     15.66
5     13.20
6     12.20
7     13.01
8     13.44
9     14.65
10    15.50
11    10.30
12    10.44
13    11.40
Name: measure1, dtype: float64
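For a purely textual replacement like this one, pandas string methods offer an alternative to `apply`. The sketch below assumes pandas is installed and uses a small hand-built frame with the first three rows of the data, so it runs without downloading the file.

```python
import pandas as pd

# first three rows of the example data, typed by hand for illustration
sample = pd.DataFrame({"code": ["a1", "a2", "a3"],
                       "measure1": ["12,1", "13,3", "11,1"]})

# replace the decimal comma on the whole column, then cast to float
converted = sample["measure1"].str.replace(",", ".", regex=False).astype(float)
print(converted.tolist())  # [12.1, 13.3, 11.1]
```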

Now, apply it to two columns:

In [104]:
testDataP.loc[:,['measure1','measure2']].apply(commaToDot)

TypeError: expected string or bytes-like object, got 'Series'
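The call fails because `DataFrame.apply` hands the function one whole column (a `Series`) at a time, not one cell. One possible workaround, sketched here, is to apply the scalar function inside each column. The frame `sample` is hand-built from the first two rows of the data.

```python
import re
import pandas as pd

commaToDot = lambda some_String: float(re.sub(",", ".", some_String))

sample = pd.DataFrame({"measure1": ["12,1", "13,3"],
                       "measure2": ["28,31", "31,12"]})

# the outer apply receives each column; the inner apply visits each cell
result = sample.apply(lambda column: column.apply(commaToDot))
print(result)
```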

When the function works cell by cell, you will need **applymap**:

In [105]:
testDataP.loc[:,['measure1','measure2']].applymap(commaToDot)

    measure1  measure2
0       12.1     28.31
1       13.3     31.12
2       11.1     25.97
3       10.5     24.57
4      15.66     36.64
5       13.2     30.89
6       12.2     28.55
7      13.01     30.44
8      13.44     31.45
9      14.65     34.28
10      15.5     36.27
11      10.3      24.1
12     10.44     24.43
13      11.4     26.68
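Recent pandas releases (2.1 and later) provide the same element-wise behaviour under the name `DataFrame.map` and deprecate `applymap`. A hedged sketch that runs on either version picks whichever method exists. The helper name `elementwise` is hypothetical, and `sample` is hand-built from the first two rows of the data.

```python
import re
import pandas as pd


def elementwise(frame, func):
    """Apply func to every cell, whatever the pandas version."""
    # DataFrame.map exists from pandas 2.1; older versions only have applymap
    method = frame.map if hasattr(frame, "map") else frame.applymap
    return method(func)


sample = pd.DataFrame({"measure1": ["12,1", "13,3"],
                       "measure2": ["28,31", "31,12"]})
result = elementwise(sample, lambda s: float(re.sub(",", ".", s)))
print(result)
```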


In [106]:
# notice no changes yet:
testDataP

   code measure1 measure2
0    a1     12,1    28,31
1    a2     13,3    31,12
2    a3     11,1    25,97
3    a4     10,5    24,57
4    a5    15,66    36,64
5    a6     13,2    30,89
6    a7     12,2    28,55
7    a8    13,01    30,44
8    a9    13,44    31,45
9   a10    14,65    34,28
10  a11     15,5    36,27
11  a12     10,3     24,1
12  a13    10,44    24,43
13  a14     11,4    26,68


In [107]:
# now:
testDataP.loc[:,['measure1','measure2']]=testDataP.loc[:,['measure1','measure2']].applymap(commaToDot)
testDataP

   code measure1 measure2
0    a1     12.1    28.31
1    a2     13.3    31.12
2    a3     11.1    25.97
3    a4     10.5    24.57
4    a5    15.66    36.64
5    a6     13.2    30.89
6    a7     12.2    28.55
7    a8    13.01    30.44
8    a9    13.44    31.45
9   a10    14.65    34.28
10  a11     15.5    36.27
11  a12     10.3     24.1
12  a13    10.44    24.43
13  a14     11.4    26.68
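The conversion can also happen at load time: `pd.read_csv` accepts a `decimal` argument naming the character used as the decimal mark. The sketch below reads from an in-memory string rather than the real file, and it assumes a semicolon delimiter purely for illustration; the actual file's delimiter may differ.

```python
import io
import pandas as pd

# stand-in for the CSV file; the ';' delimiter is an assumption
raw = "code;measure1;measure2\na1;12,1;28,31\na2;13,3;31,12\n"

# decimal="," tells the parser that a comma marks the decimal point
frame = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(frame.dtypes)
```

Converting after loading, as done above, remains useful when the data are already in memory.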


It would be similar in R:

In [115]:
%%R
linkGH="https://github.com/CienciaDeDatosEspacial/code_and_data/raw/main/data/dataFrame_example.csv"
testDataR=read.csv(linkGH)
testDataR

   code measure1 measure2
1    a1     12,1    28,31
2    a2     13,3    31,12
3    a3     11,1    25,97
4    a4     10,5    24,57
5    a5    15,66    36,64
6    a6     13,2    30,89
7    a7     12,2    28,55
8    a8    13,01    30,44
9    a9    13,44    31,45
10  a10    14,65    34,28
11  a11     15,5    36,27
12  a12     10,3     24,1
13  a13    10,44    24,43
14  a14     11,4    26,68


In [109]:
%%R
str(testDataR)

'data.frame':	14 obs. of  3 variables:
 $ code    : chr  "a1" "a2" "a3" "a4" ...
 $ measure1: chr  "12,1" "13,3" "11,1" "10,5" ...
 $ measure2: chr  "28,31" "31,12" "25,97" "24,57" ...


In [110]:
%%R
commaToDot=function(some_String){
    some_String_changed=gsub(",",".",some_String)
    someNumber=as.numeric(some_String_changed)
    return(someNumber)
}

In [111]:
%%R
commaToDot(testDataR$measure1)

 [1] 12.10 13.30 11.10 10.50 15.66 13.20 12.20 13.01 13.44 14.65 15.50 10.30
[13] 10.44 11.40


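This will not work on two columns at once, because `gsub` coerces each column of the data frame into a single string, which cannot be converted into a number: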

In [112]:
%%R
commaToDot(testDataR[,c('measure1','measure2')])

[1] NA NA


To apply the function column by column, you may need **lapply**:

In [113]:
%%R

lapply(testDataR[,c('measure1','measure2')],commaToDot)

$measure1
 [1] 12.10 13.30 11.10 10.50 15.66 13.20 12.20 13.01 13.44 14.65 15.50 10.30
[13] 10.44 11.40

$measure2
 [1] 28.31 31.12 25.97 24.57 36.64 30.89 28.55 30.44 31.45 34.28 36.27 24.10
[13] 24.43 26.68



In [116]:
%%R
# no changes
testDataR

   code measure1 measure2
1    a1     12,1    28,31
2    a2     13,3    31,12
3    a3     11,1    25,97
4    a4     10,5    24,57
5    a5    15,66    36,64
6    a6     13,2    30,89
7    a7     12,2    28,55
8    a8    13,01    30,44
9    a9    13,44    31,45
10  a10    14,65    34,28
11  a11     15,5    36,27
12  a12     10,3     24,1
13  a13    10,44    24,43
14  a14     11,4    26,68
