
# Functions in R
* Author: Johannes Maucher
* Last Update: 2017-03-13

![Data Science R Overview](../../../Pics/DSRfunctions.PNG)

R is primarily a functional language. Functions are treated like any other data type: for example, they can be assigned to variables and passed as arguments to other functions. Even simple operators such as `+` are functions. The conventional notation `x+y` is just shorthand for `"+"(x,y)`:

```r
9+6        # 15
"+"(9,6)   # the same addition written as a function call: 15
```
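Because functions are ordinary objects, they can be stored in variables and handed to other functions. The following is a minimal sketch; `applyTwice()` is a hypothetical helper introduced only for illustration:

```r
# Assign an existing function to a new variable
f <- sqrt
f(16)                  # 4

# Pass a function as an argument (applyTwice is a hypothetical helper)
applyTwice <- function(fun, x) {
  fun(fun(x))
}
applyTwice(sqrt, 16)   # sqrt(sqrt(16)) = 2

# Operators are functions too, so they can be passed around as well;
# backticks are an alternative to quotes for referring to them
Reduce(`+`, 1:4)       # ((1 + 2) + 3) + 4 = 10
```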

One of the most useful properties of many R functions is that they are *vectorized*: they can be applied to an individual element as well as elementwise to a collection of elements, e.g. vectors or matrices:

```r
a<-1:10
b<-11:20
"+"(a,b)   # elementwise addition: 12 14 16 ... 30
a+b        # the same result in operator notation
```

In many other programming languages, such operations are defined only for single elements, and an elementwise calculation on vectors is typically implemented by calling the operation repeatedly within a for-loop.
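For comparison, a minimal sketch of both styles in R: an explicit for-loop that adds two vectors element by element, and the equivalent vectorized expression. Both yield the same integer vector; the vectorized form is shorter and leaves the iteration to R's internal code.

```r
a <- 1:10
b <- 11:20

# Loop-based elementwise addition, as it would be written without vectorization
loopSum <- integer(length(a))      # pre-allocate the result vector
for (i in seq_along(a)) {
  loopSum[i] <- a[i] + b[i]
}

# Vectorized addition: a single expression operates on all elements
vecSum <- a + b

identical(loopSum, vecSum)         # TRUE
```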

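## Built-in functions
R provides a huge number of built-in functions. These are functions that are available in a standard R installation, i.e. in the *base* package and in the other packages that R attaches by default (such as *stats* and *utils*). They can be used whenever needed and do not have to be loaded explicitly.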
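To check which package a built-in function actually comes from, the `find()` function (from the default-attached *utils* package) can be used. A short sketch; note that some of the functions used below, such as `sd()`, live in *stats* rather than in *base*:

```r
# Packages (and other environments) attached in the current session
search()

# Locate the attached package that provides a function
find("sqrt")   # "package:base"
find("sd")     # "package:stats"
```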

### Examples for mathematical built-in functions
Some basic statistics, such as maximum, minimum, mean, standard deviation and variance, can be calculated with the following built-in functions:

```r
max(a)    # 10
min(a)    # 1
mean(a)   # 5.5
sd(a)     # standard deviation, approx. 3.03
var(a)    # variance, approx. 9.17
```
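`var()` and `sd()` compute the *sample* variance and standard deviation, i.e. the sum of squared deviations from the mean is divided by n-1, not by n. A minimal sketch that reproduces both values by hand for `a <- 1:10`:

```r
a <- 1:10
n <- length(a)

# Sample variance: squared deviations from the mean, divided by n - 1
manualVar <- sum((a - mean(a))^2) / (n - 1)
manualSd  <- sqrt(manualVar)

all.equal(manualVar, var(a))   # TRUE
all.equal(manualSd, sd(a))     # TRUE
manualVar                      # 82.5 / 9 = 9.166667
```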

Some other mathematical built-in functions are:

```r
sqrt(25)           # 5
cos(pi)            # -1
seq(1,100,by=20)   # 1 21 41 61 81
```

### Examples for built-in character functions
Character functions operate on textual data. A small but important subset of the built-in character functions is:

* `nchar(x)`: Returns the number of characters in *x*,
* `substr(x,u,v)`: Returns the substring of *x* that starts at position *u* and ends at position *v* (both inclusive),
* `strsplit(x,split,fixed=FALSE)`: Splits the character variable *x* at every match of the pattern *split* and returns the pieces as a list. If *fixed=TRUE*, *split* is matched literally as a plain string; if *fixed=FALSE* (the default), *split* is interpreted as a regular expression,
* `grep(pattern,x,ignore.case=FALSE,fixed=FALSE)`: Searches for *pattern* in the elements of *x* and returns the indices of the matching elements. If *fixed=FALSE* (the default), *pattern* is a regular expression; if *fixed=TRUE*, *pattern* is a plain text string,
* `sub(pattern,replacement,x,ignore.case=FALSE,fixed=FALSE)`: Finds *pattern* in *x* and substitutes the *replacement* text. If *fixed=FALSE*, *pattern* is a regular expression; if *fixed=TRUE*, *pattern* is a plain text string. Note that `sub()` replaces only the first occurrence of *pattern* in each element; to replace all occurrences, use `gsub()`,
* `paste(...,sep=" ")`: Converts its arguments to character strings and concatenates them elementwise, separated by the *sep* string (a single space by default); with the additional argument *collapse*, the resulting vector is joined into a single string,
* `toupper(x)`: Converts all characters in *x* to uppercase,
* `tolower(x)`: Converts all characters in *x* to lowercase.

These functions are demonstrated in the following lines of code:

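```r
# German example text: "This is a simple sentence. And here comes another sentence."
myCharVar="Das ist ein einfacher Satz. Und hier kommt nochmal ein Satz."
nchar(myCharVar)                         # number of characters
substr(myCharVar,5,8)                    # characters 5 to 8: "ist "
strsplit(myCharVar,'.',fixed=TRUE)       # split at each literal '.'
strsplit(myCharVar,'\\s')                # split at all whitespaces (regex \s)
# "This is sentence 1.", "Here is the second sentence.", "And here the third one."
seqChars=c("Das ist Satz 1.","Hier ist der zweite Satz.","Und hier der dritte.")
grep('der',seqChars,fixed=TRUE)          # elements containing "der": 2 3
grep('\\si',seqChars,fixed=FALSE)        # whitespace followed by "i": 1 2
grep('\\d',seqChars,fixed=FALSE)         # elements containing a digit: 1
sub('zweite','2.',seqChars,fixed=TRUE)   # replace "zweite" ("second") by "2."
paste("Feature",1:5,sep="-")             # "Feature-1" ... "Feature-5"
curval=10
paste("The value is",curval,sep=": ")    # "The value is: 10"
paste("Today is",date(),sep=": ")        # current date and time
toupper('abCD')                          # "ABCD"
tolower('EFgh')                          # "efgh"
```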
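Two details from the list above are easy to mix up: `sub()` versus `gsub()`, and `sep` versus `collapse` in `paste()`. A short sketch on made-up example strings:

```r
s <- "banana"
sub("a", "A", s)     # "bAnana": only the first match is replaced
gsub("a", "A", s)    # "bAnAnA": all matches are replaced

# sep joins the arguments elementwise; collapse joins the resulting
# vector into one single string
paste("Feature", 1:3, sep = "-")                   # "Feature-1" "Feature-2" "Feature-3"
paste("Feature", 1:3, sep = "-", collapse = ", ")  # "Feature-1, Feature-2, Feature-3"

# strsplit() returns a list; take its first element to get a plain vector
strsplit("a,b,c", ",", fixed = TRUE)[[1]]          # "a" "b" "c"
```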

### Examples for other useful built-in functions
Besides a vast variety of mathematical functions, R provides many other useful built-in functions. Here is just a small subset of such helpers:

* `length(x)`: Returns the length of an object *x*. E.g. *length(c(4,2,19))* returns 3.
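* `seq(x,y,z)`: Generates a sequence of numbers from *x* up to at most *y* with step size *z*.
* `rep(x,n)`: Generates a vector that contains *n* copies of *x* (*x* need not be a single number; it can also be, e.g., a vector).
* `cut(x,n)`: Divides the range of the continuous variable *x* into *n* intervals of equal width and returns a factor with *n* levels that indicates, for each element of *x*, the interval it falls into.
* `pretty(x,n)`: Computes a sequence of about *n+1* equally spaced "round" values that cover the range of *x*; these can serve as break points for dividing *x* into about *n* intervals.
* `cat(A)`: Concatenates the objects in *A* and prints the result (to the console by default).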

**Examples:**

```r
x<-seq(13,40,5)   # 13 18 23 28 33 38 (40 itself is not reached)
x
y<-rep(x,2)       # the whole vector x, twice
y
length(x)         # 6
length(y)         # 12
```
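`seq()` and `rep()` accept further arguments that are often handy. A short sketch using the `length.out`, `times` and `each` arguments:

```r
seq(1, 10, length.out = 4)   # 1 4 7 10: a fixed number of values instead of a step size
rep(c(1, 2), times = 2)      # 1 2 1 2: repeat the whole vector
rep(c(1, 2), each = 2)       # 1 1 2 2: repeat each element
```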

```r
cut(c(8,4,16),3)   # 3 equal-width intervals over the range 4..16
x2<-c(45,2,82,22,4)
x2
cut(x2,4)          # 4 equal-width intervals over the range 2..82
```
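Instead of a number of intervals, `cut()` also accepts explicit break points. A minimal sketch with break points chosen arbitrarily for illustration; by default the intervals are closed on the right, e.g. `(10,50]`, and `table()` counts how many values fall into each interval:

```r
x2 <- c(45, 2, 82, 22, 4)

# Explicit break points instead of a number of equal-width intervals
f <- cut(x2, breaks = c(0, 10, 50, 100))
f
levels(f)   # "(0,10]" "(10,50]" "(50,100]"
table(f)    # number of values per interval: 2 2 1
```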

```r
pretty(x,5)   # about 6 round, equally spaced values covering the range of x
```

```r
m=3
j=5
cat(" Value of m is:\t", m, "\n","Value of j is\t",j)   # \t = tab, \n = newline
```

```
 Value of m is:	 3 
 Value of j is	 5
```

## Functions from external R packages
There exist more than 10000 R packages, which provide solutions for all kinds of problems. External R packages can be downloaded, e.g., from [https://cran.r-project.org/](https://cran.r-project.org/). The list of all packages installed in your current environment can be obtained with the following statement:

```r
library()
```

Functions of external packages that are installed in your environment must be loaded before they can be used. The package *X* is loaded with `library(X)`. For example, the following statement loads the package *ggplot2*:

```r
library(ggplot2)
```

External packages that are not yet installed in your environment can be downloaded and installed with

```r
install.packages("NameOfPackage")
```

For example, the following statement downloads and installs the package *tagcloud*:

```r
install.packages("tagcloud")
```

```
Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
```


As the error message shows, `install.packages()` only succeeds if a CRAN mirror is configured and the package is available on it. A list of all available CRAN mirrors can be obtained with the `getCRANmirrors()` function. A particular mirror can then be set by

`options(repos=structure(c(CRAN="http://cloud.r-project.org/")))`

Example:

```r
getCRANmirrors(all = FALSE, local.only = FALSE)   # data frame describing the mirrors
options(repos=structure(c(CRAN="http://cloud.r-project.org/")))
```

| | Name | Country | City | URL | Host | Maintainer | OK | CountryCode | Comment |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0-Cloud [https] | 0-Cloud | 0-Cloud | https://cloud.r-project.org/ | Automatic redirection to servers worldwide, currently sponsored by Rstudio | winston # stdout.org | 1 | us | secure_mirror_from_master |
| 2 | 0-Cloud | 0-Cloud | 0-Cloud | http://cloud.r-project.org/ | Automatic redirection to servers worldwide, currently sponsored by Rstudio | winston # stdout.org | 1 | us | secure_mirror_from_master |
| 3 | Algeria [https] | Algeria | Algiers | https://cran.usthb.dz/ | University of Science and Technology Houari Boumediene | Boukala m c \<mboukala # usthb.dz\> | 1 | dz | secure_mirror_from_master |
| 4 | Algeria | Algeria | Algiers | http://cran.usthb.dz/ | University of Science and Technology Houari Boumediene | Boukala m c \<mboukala # usthb.dz\> | 1 | dz | secure_mirror_from_master |
| 5 | Argentina (La Plata) | Argentina | La Plata | http://mirror.fcaglp.unlp.edu.ar/CRAN/ | Universidad Nacional de La Plata | esuarez # Fcaglp.unlp.edu.ar | 1 | ar | |
| 6 | Australia (Canberra) [https] | Australia | Canberra | https://cran.csiro.au/ | CSIRO | Bill.Venables # CSIRO.au, ServiceDesk2 # CSIRO.au | 1 | au | secure_mirror_from_master |
| 7 | Australia (Canberra) | Australia | Canberra | http://cran.csiro.au/ | CSIRO | Bill.Venables # CSIRO.au, ServiceDesk2 # CSIRO.au | 1 | au | secure_mirror_from_master |
| 8 | Australia (Melbourne 1) [https] | Australia | Melbourne | https://mirror.aarnet.edu.au/pub/CRAN/ | AARNET | \<sysadmin # aarnet.edu.au\> | 1 | au | secure_mirror_from_master |
| 9 | Australia (Melbourne 2) [https] | Australia | Melbourne | https://cran.ms.unimelb.edu.au/ | School of Mathematics and Statistics, University of Melbourne | unix-ops # lists.unimelb.edu.au | 1 | au | secure_mirror_from_master |
| 10 | Australia (Perth) [https] | Australia | Perth | https://cran.curtin.edu.au/ | Curtin University of Technology | unix # curtin.edu.au | 1 | au | secure_mirror_from_master |


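> **Note:** The CRAN mirror for downloading R packages can be set in the current R script as shown above. However, a CRAN mirror can also be set permanently by inserting the line `options(repos=structure(c(CRAN="YOUR FAVORITE MIRROR")))` into the *Rprofile* configuration file. This file is located in the subdirectory *library/base/R* of the directory where R is installed; in my case it is `C:\Users\xxx\Anaconda2\envs\condatascience\R\library\base\R`. Alternatively, R also reads the site-wide file *Rprofile.site* in `R_HOME/etc` and a user-specific *.Rprofile* file in the home directory at startup (see `?Startup`); these are the more common places for such personal settings.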
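Checking for, installing and loading a package can also be combined in a small helper. The following is a minimal sketch; `usePackage()` is a hypothetical name chosen for illustration, and its call to `install.packages()` only succeeds if a CRAN mirror is configured as described above:

```r
# Hypothetical helper: install a package only if it is not yet available,
# then attach it to the search path
usePackage <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                 # requires a configured CRAN mirror
  }
  library(pkg, character.only = TRUE)     # pkg is a string, not a bare name
  invisible(pkg)
}

usePackage("stats")   # already installed: nothing is downloaded
```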

## User-defined functions

Users can define their own functions. Encapsulating code in functions makes it more structured and readable. The most important advantage, however, is that routines which are required more than once need not be implemented repeatedly: a function is defined only once and can then be used wherever it is required.

The general syntax for functions in R is:

```r
functionName<-function(listOfParameters){
  statements       # placeholder: arbitrary R code that computes result
  return(result)   # the value handed back to the caller
}
```

The parameters listed in the parentheses after the keyword *function* are the arguments that are passed as input to the function. Within the function body (inside the curly braces), arbitrarily complex statements can be executed. The result of this computation is returned by the function via `return()`; if no explicit `return()` is used, the value of the last evaluated expression is returned. The function itself is assigned to the variable *functionName* and can be called via this variable name, as shown below.
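A minimal sketch of the implicit return value; `squareSum()` is a hypothetical function used only for illustration:

```r
# Hypothetical example function: there is no explicit return(), so the value
# of the last evaluated expression is returned
squareSum <- function(x, y) {
  x^2 + y^2
}

squareSum(3, 4)     # 25
squareSum(1:3, 1)   # the arithmetic inside is vectorized: 2 5 10
```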

For example, the following code snippet defines a function that normalizes the values of the vector passed to it as argument: each value x is mapped to (x - min)/(max - min), so that the smallest value becomes 0 and the largest becomes 1. The function returns the normalized values.

```r
myNormalizer<-function(rawdata){
    maximum<-max(rawdata)
    minimum<-min(rawdata)
    normeddata<-(rawdata-minimum)/(maximum-minimum)
    return(normeddata)
}
```
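One property of this normalizer is worth knowing: if all values are equal, max - min is 0 and the division yields `NaN`. A minimal guarded variant follows; `safeNormalizer()` and its choice of returning zeros for constant input are assumptions made for illustration, not part of the original function:

```r
myNormalizer <- function(rawdata) {
  maximum <- max(rawdata)
  minimum <- min(rawdata)
  (rawdata - minimum) / (maximum - minimum)
}
myNormalizer(c(5, 5, 5))        # NaN NaN NaN, because 0/0

# Hypothetical variant: map constant input to zeros instead of NaN
safeNormalizer <- function(rawdata) {
  span <- max(rawdata) - min(rawdata)
  if (span == 0) {
    return(rep(0, length(rawdata)))
  }
  (rawdata - min(rawdata)) / span
}
safeNormalizer(c(5, 5, 5))      # 0 0 0
safeNormalizer(c(10, 15, 20))   # 0.0 0.5 1.0
```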

This function can now be called by its name wherever it is required:

```r
(a<-10:20)            # parentheses around an assignment also print the value
A<-myNormalizer(a)
print(A)
```

```
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
```


```r
(b<-c(33,34,20,52,60,71))
B<-myNormalizer(b)
print(B)
```

```
[1] 0.2549020 0.2745098 0.0000000 0.6274510 0.7843137 1.0000000
```


### Passing optional arguments on to an inner function
The function *myNormalizer()* shall now be extended so that it can also return normalized values rounded to a configurable number of digits. The standard R function *round()* already has the parameter *digits*, which sets the number of decimal places (default 0). The *round()* function shall therefore be called inside the new function *myRoundNormalizer()*, and a value for its *digits* parameter must be passed through *myRoundNormalizer()*. Passing an arbitrary set of additional arguments on to an inner function can be realized with the `...` argument ("dot-dot-dot"): all arguments that do not match a named parameter of the outer function are collected in `...` and can be forwarded to another function call. This is demonstrated in the following code cells.

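```r
myRoundNormalizer<-function(rawdata,round=FALSE,...){
    maximum<-max(rawdata)
    minimum<-min(rawdata)
    normeddata<-(rawdata-minimum)/(maximum-minimum)
    if (!round){
        return(normeddata)
    }else{
        # forward all additional arguments (e.g. digits) to round();
        # R still finds the function round() here, because when looking up
        # a function call it skips the non-function argument named round
        return(round(normeddata,...))
    }
}
```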

```r
B<-myRoundNormalizer(b)                       # no rounding
print(B)
B<-myRoundNormalizer(b,round=TRUE)            # round() with its default digits=0
print(B)
B<-myRoundNormalizer(b,round=TRUE,digits=2)   # digits=2 is passed on via ...
print(B)
```

```
[1] 0.2549020 0.2745098 0.0000000 0.6274510 0.7843137 1.0000000
[1] 0 0 0 1 1 1
[1] 0.25 0.27 0.00 0.63 0.78 1.00
```
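The same forwarding mechanism works with any inner function. A minimal sketch; `myMean()` and `dotNames()` are hypothetical helpers for illustration only:

```r
# Hypothetical wrapper: every argument that does not match x is collected
# in ... and forwarded unchanged to mean()
myMean <- function(x, ...) {
  mean(x, ...)
}
myMean(c(1, NA, 3))                 # NA: mean() keeps missing values by default
myMean(c(1, NA, 3), na.rm = TRUE)   # 2: na.rm is passed on via ...

# Hypothetical helper that only reports which extra arguments it received
dotNames <- function(...) {
  names(list(...))
}
dotNames(digits = 2, na.rm = TRUE)  # "digits" "na.rm"
```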


## Efficient evaluation of functions: apply, lapply, sapply
In the case that there are many sequences of numeric values (such as *a* and *b* above) and each of them shall be normalized by *myNormalizer()*, one could implement a loop that invokes *myNormalizer()* for one sequence in each iteration. Such an implementation works, but it is verbose: the container for the results has to be created and filled by hand. It is more concise to use the built-in function *lapply(X, FUN)*, which applies the function *FUN* to every element of the list *X* and returns the results as a list. In R, the main advantage of *lapply()* over a well-written loop is conciseness and readability rather than a large gain in speed.

As the following code snippet shows, no explicit loop is required in this way:

```r
columnlist<-list(l1=a,l2=b)   # named list holding both vectors
print(columnlist)
```

```
$l1
 [1] 10 11 12 13 14 15 16 17 18 19 20

$l2
[1] 33 34 20 52 60 71
```



```r
columnlistNormed<-lapply(columnlist,myNormalizer)   # apply myNormalizer() to every list element
```

```r
columnlistNormed
```
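For comparison, a minimal sketch of the explicit loop that `lapply()` replaces; both produce the same named list:

```r
myNormalizer <- function(rawdata) {
  (rawdata - min(rawdata)) / (max(rawdata) - min(rawdata))
}
columnlist <- list(l1 = 10:20, l2 = c(33, 34, 20, 52, 60, 71))

# Loop version: the result list must be created, named and filled by hand
loopResult <- vector("list", length(columnlist))
names(loopResult) <- names(columnlist)
for (nm in names(columnlist)) {
  loopResult[[nm]] <- myNormalizer(columnlist[[nm]])
}

# lapply() version: one call, the names are kept automatically
lapplyResult <- lapply(columnlist, myNormalizer)

identical(loopResult, lapplyResult)   # TRUE
```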

Note that the first argument of *lapply()* is a list that contains the objects on which the function (*myNormalizer()* in the example above) shall be executed. If a function is to be applied not to the elements of a list but to the rows or columns of a matrix or array, the *apply()* function can be used. It has an additional parameter, *MARGIN*, which determines along which dimension of the multidimensional object the function is applied. This is demonstrated in the example below: here *myNormalizer()* is first applied rowwise (*MARGIN* = 1) and then columnwise (*MARGIN* = 2).

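```r
# 28 random integers between 0 and 19, arranged in a 4 x 7 matrix
# (no seed is set, so the values differ on every run)
mymat=matrix(floor(runif(28)*20),nrow=4,ncol=7)
```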

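```r
mymat
```

| | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] |
|---|---|---|---|---|---|---|---|
| [1,] | 1 | 18 | 5 | 15 | 7 | 16 | 15 |
| [2,] | 5 | 3 | 7 | 14 | 17 | 18 | 18 |
| [3,] | 12 | 0 | 15 | 6 | 5 | 14 | 3 |
| [4,] | 13 | 12 | 3 | 1 | 11 | 16 | 18 |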


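Rowwise normalization. Note that `apply()` returns the result computed for each row as a *column* of its output, so for the 4 × 7 input matrix the result is a 7 × 4 matrix (it can be transposed back with `t()`):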

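```r
mymatNormed=apply(mymat,1,myNormalizer)
mymatNormed
```

| | [,1] | [,2] | [,3] | [,4] |
|---|---|---|---|---|
| [1,] | 0.0 | 0.1333333 | 0.8 | 0.7058824 |
| [2,] | 1.0 | 0.0 | 0.0 | 0.6470588 |
| [3,] | 0.2352941 | 0.2666667 | 1.0 | 0.1176471 |
| [4,] | 0.8235294 | 0.7333333 | 0.4 | 0.0 |
| [5,] | 0.3529412 | 0.9333333 | 0.3333333 | 0.5882353 |
| [6,] | 0.8823529 | 1.0 | 0.9333333 | 0.8823529 |
| [7,] | 0.8235294 | 1.0 | 0.2 | 1.0 |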


Columnwise normalization:

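```r
mymatNormed=apply(mymat,2,myNormalizer)
mymatNormed
```

| | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] |
|---|---|---|---|---|---|---|---|
| [1,] | 0.0 | 1.0 | 0.1666667 | 1.0 | 0.1666667 | 0.5 | 0.8 |
| [2,] | 0.3333333 | 0.1666667 | 0.3333333 | 0.9285714 | 1.0 | 1.0 | 1.0 |
| [3,] | 0.9166667 | 0.0 | 1.0 | 0.3571429 | 0.0 | 0.0 | 0.0 |
| [4,] | 1.0 | 0.6666667 | 0.0 | 0.0 | 0.5 | 0.5 | 1.0 |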
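Because `apply()` with `MARGIN = 1` returns each row's result as a column, the rowwise result can be transposed back with `t()` to match the orientation of the input. A minimal sketch, using a fixed copy of the matrix shown above instead of random numbers so that the result is reproducible:

```r
myNormalizer <- function(rawdata) {
  (rawdata - min(rawdata)) / (max(rawdata) - min(rawdata))
}

# Fixed copy of the 4 x 7 matrix displayed above
mymat <- matrix(c( 1, 18,  5, 15,  7, 16, 15,
                   5,  3,  7, 14, 17, 18, 18,
                  12,  0, 15,  6,  5, 14,  3,
                  13, 12,  3,  1, 11, 16, 18),
                nrow = 4, byrow = TRUE)

rowNormed <- apply(mymat, 1, myNormalizer)
dim(rowNormed)       # 7 4: one column per original row

rowNormedT <- t(rowNormed)
dim(rowNormedT)      # 4 7: same orientation as mymat
rowNormedT[1, ]      # first row normalized: (mymat[1, ] - 1) / 17
```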


*sapply()* is similar to *lapply()*; however, it simplifies the result to a vector or a matrix where possible instead of returning a list:

```r
columnmean<-lapply(columnlist,mean)
columnmean
class(columnmean)   # "list"
```

```r
columnmean<-sapply(columnlist,mean)
columnmean
class(columnmean)   # "numeric": a named numeric vector
```
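If the applied function returns more than one value per element, `sapply()` arranges the results as columns of a matrix. A short sketch with `range()`, which returns the minimum and the maximum:

```r
columnlist <- list(l1 = 10:20, l2 = c(33, 34, 20, 52, 60, 71))

# range() returns two values (min, max) per list element,
# so sapply() builds a 2 x 2 matrix with one column per element
rangeMat <- sapply(columnlist, range)
rangeMat
is.matrix(rangeMat)   # TRUE
dim(rangeMat)         # 2 2
```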

## Exercises
[Exercise on functions in R](../../Assignments/Ass05FunctionsR.ipynb)