# Functions and Controls
Mark Klik & Misja Mikkers

# Packages

For this notebook we need `tidyverse`.

In [None]:
library(tidyverse)

# Introduction

In this notebook we will explore the use of control statements and functions. These concepts are part of more or less all programming languages.

# Control statements


`R` knows a limited number of control statements:

* `if`, `else` (or  `ifelse`)
* `for`
* `repeat` / `while`
* `break`
* `next`
* `return`

With these statements you can run a specific part of your code depending on a condition, or run so-called _loops_ in your notebook.

We will start with the `if` statement. The `if` statement determines whether a condition is `TRUE` or `FALSE` and executes a part of the code depending on the outcome.

For example,

In [1]:
x <- 6

if (x < 4) {
  print("smaller")  # first part of the code
} else {
  print("larger")  # second part of the code
}

[1] "larger"


Because the condition `x < 4` is `FALSE` for `x = 6`, the `else` branch (the second part of the code) is executed.
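
The list above also mentions `ifelse`. As a minimal sketch of the difference (the vector `values` is an illustrative assumption, not part of the notebook): `if` expects a single `TRUE` or `FALSE`, while `ifelse` evaluates the condition for every element of a vector.

```r
# `if` works on one value at a time
x <- 6
if (x < 4) "smaller" else "larger"

# `ifelse` checks the condition for each element of a vector
values <- c(2, 6, 3, 9)   # illustrative values
labels <- ifelse(values < 4, "smaller", "larger")
print(labels)
```
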
To use a part of the code multiple times, you can use a `for` loop:




In [2]:
for (counter in 1:5) {
  print(counter)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


The same code snippet is executed 5 times, because the counter takes the values 1 to 5 in turn
(`1:5` is equivalent to `c(1, 2, 3, 4, 5)`).
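
A `for` loop is not limited to a range of numbers. The sketch below, which uses a made-up vector `fruits`, loops once over the elements themselves and once over their positions with `seq_along`.

```r
fruits <- c("apple", "banana", "cherry")   # illustrative vector

# loop over the elements directly
for (fruit in fruits) {
  print(fruit)
}

# loop over the positions, which is handy when the index itself is needed
numbered <- character(length(fruits))
for (i in seq_along(fruits)) {
  numbered[i] <- paste(i, fruits[i])
}
print(numbered)
```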
 
Each time the code in the loop is executed is called an iteration. You can skip the rest of an iteration with the command `next`.

In [3]:
for (counter in 1:5) {
  if (counter == 3) next
  print(counter)
}

[1] 1
[1] 2
[1] 4
[1] 5


For `counter == 3` the rest of the iteration is skipped, so 3 is not printed. If you want to stop the _for loop_ completely, you can use `break`:

In [5]:
for (counter in 1:5) {
  if (counter == 3) break
  print(counter)
}

[1] 1
[1] 2


The loop stops as soon as `counter == 3`, so no iterations after `counter == 2` are completed.

Similar code can be produced with `while` and `repeat`. 

`while` allows you to run code as long as a condition is fulfilled:

In [6]:
counter <- 1

while (counter <= 5) {
  print(counter)
  counter <- counter + 1
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5


and `repeat` runs code until it reaches a `break`:

In [7]:
repeat {
  number <- sample(1:10, 1)
  print(number)
  if (number > 8) break
}

[1] 7
[1] 6
[1] 5
[1] 6
[1] 8
[1] 5
[1] 9


This loop prints random numbers until it draws a number larger than 8.
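
Because `sample` draws random numbers, every run of this loop prints a different sequence. A minimal sketch, assuming you want the same sequence on every run: `set.seed` fixes the random number generator. The seed value 123 and the vector `draws` are illustrative choices, not part of the notebook.

```r
set.seed(123)   # arbitrary seed, chosen only for illustration

draws <- c()    # collects the numbers that were drawn
repeat {
  number <- sample(1:10, 1)
  draws <- c(draws, number)
  if (number > 8) break
}
print(draws)
```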

Most people prefer `for` loops, so you will rarely see `while` and `repeat`.

> **Assignment**:
Construct a `for` loop that runs from 1 to 100 and tests for each number whether it can be divided by 2, 3 or 5. Count the numbers that are **not** divisible by 2, 3 or 5 and print these numbers.
> _Hint: use the function `round` for the division test in combination with the `if` statement and the `next` statement._
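
The division test from the hint can be sketched for a single number as follows. The number 10 is only an illustration; the modulo operator `%%` is shown as an alternative that the hint does not mention.

```r
n <- 10   # illustrative number

# division test with round(): the quotient equals its rounded value
n / 5 == round(n / 5)   # TRUE, 10 is divisible by 5
n / 3 == round(n / 3)   # FALSE, 10 / 3 is not a whole number

# the same test with the modulo operator (remainder after division)
n %% 5 == 0
n %% 3 == 0
```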



In [None]:
counter <- 0

for 

print(counter)

If you print the numbers inside the loop, the function `cat` is a good choice: `print` puts every value on a new line, while `cat` writes the values next to each other on one row.
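
A small sketch of that difference, using an illustrative vector `numbers` that is not part of the assignment:

```r
numbers <- c(1, 7, 11)   # illustrative values

# print() shows each value on its own line when called inside a loop
for (n in numbers) {
  print(n)
}

# cat() writes the values one after another on the same line
for (n in numbers) {
  cat(n, "")
}
cat("\n")   # finish the line
```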

# Functions

R is built on functions. By using commands from packages, you use functions, often without knowing it.
Functions are used to automate the boring stuff...

Our first function:

In [9]:
squared <- function(x) {
  x * x
}

The function `squared` returns the squared values of an input vector:

In [10]:
squared(c(1,3))

A function may have more arguments:

In [11]:
squared_plus <- function(x, y) {
  x * x + y
}

squared_plus(3, 5)

This function uses two input variables and returns one output.
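
As a short sketch of how the two arguments can be passed (the calls below are illustrative additions): arguments are matched by position unless you name them, and `x` can also be a vector.

```r
squared_plus <- function(x, y) {
  x * x + y
}

squared_plus(3, 5)           # arguments matched by position: 3 * 3 + 5
squared_plus(y = 5, x = 3)   # arguments matched by name, same result
squared_plus(c(1, 2), 1)     # x can also be a vector
```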



> **Assignment**: Create a function with 2 'inputs' that pastes the two inputs together with the function `paste`, and run an example with your function that pastes "house number" to the number 110.

In [None]:
Paste <- 

Paste("house number", 110)  # paste two variables

Paste("letter", LETTERS)  # the function also works with vectors

# Dataframes

For the next function, we first need to create a dataset. We will use information that resembles the risk adjustment system in the Netherlands.

We want to create a dataframe with 10,000 people. Therefore, we first create the variable `number_of_persons` and assign the number 10000 to this variable.

Then we will create a dataframe with the name `dt`, containing the following variables:

* _ID_: runs from 1 to 10000
* _Group_: number from 1 to 40
* _FKG_: FKG score (drugs)
* _DKG_: DKG score (diagnosis)

Group is a random number from 1 to 40. We set `replace = TRUE`, because otherwise we cannot assign the 40 group numbers to 10,000 people. We do the same for FKG and DKG.

`FKG == 0` means that a person did not use drugs, while `FKG == 1` means that a person used drugs. The same logic applies to DKG.
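
The need for `replace = TRUE` can be sketched on a smaller scale. The sample size of 50 is an illustrative choice; `tryCatch` is used here only to show the error message without stopping the notebook.

```r
# Drawing more values than the population holds fails without replacement
result <- tryCatch(
  sample(1:40, 50),                        # replace = FALSE is the default
  error = function(e) conditionMessage(e)  # keep the error message as text
)
print(result)

# With replacement the same call works, and values may repeat
groups <- sample(1:40, 50, replace = TRUE)
length(groups)
```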


In [13]:
number_of_persons <- 10000

dt <- data.frame(
  ID = 1:number_of_persons,
  Group = sample(1:40, number_of_persons, replace = TRUE),
  FKG = sample(0:1, number_of_persons, replace = TRUE),
  DKG = sample(0:1, number_of_persons, replace = TRUE))

head(dt) 

ID,Group,FKG,DKG
1,40,0,1
2,8,1,1
3,31,0,0
4,12,0,0
5,10,0,1
6,26,1,1


## What is the meaning of the variable _Group_ in this dataframe?

We now have a dataframe with 10000 persons. Each person has an _ID_ (number from 1 to 10000).

For each person we know the group they belong to (number from 1 to 40).
Each group is a combination of age and gender according to the following rules:

* Groups 1 to 20 are males
* Groups 21 to 40 are females
* Group 1 is the age category from 0 to 4 years, group 2 from 5 to 9 years, etc.
* The same holds for females, starting from group 21


## The conditional statement ifelse


We created a dataset similar to a part of the Dutch risk adjustment scheme. The column `Group` does not give much insight by itself. Therefore, we would like to add a new column "Gender", and a column that indicates whether a person is younger or older than 65.
Males are in groups 1-20, females in groups 21-40.

We need the following functions:
* `%>%`
* `mutate`
* `ifelse` 

The pipe operator `%>%` and the function `mutate` will be part of the DataCamp `dplyr` course.

`%>%` is the pipe operator. You can read the pipe operator as "then". The result is assigned to `dt1`: first we take `dt` and then...

`mutate` is the `dplyr` command to create a new variable (column).

`mutate(Gender= ...)` creates a new variable with the name "Gender". The `=` indicates how this variable is defined.

With `ifelse` we can create the condition male or female:

In [14]:
dt1 <- dt %>%
  mutate(Gender = ifelse(Group <= 20, "male", "female")) %>%
  mutate(Elderly = ifelse(
    (Group >= 14 & Group <= 20) | (Group >= 34 & Group <= 40), "65+", "65-"))

"package 'bindrcpp' was built under R version 3.3.3"

The first part of the code reads as follows:

1.  Create a dataframe `dt1`, use dataframe `dt`, then
2.  create a new column `Gender`. If the group number (i.e. 1 - 40) is smaller than or equal to twenty, give the value "male", else return the value "female", then

We add a new variable "Elderly". This is slightly more complex, because a male counts as elderly if he is in groups 14-20, while females count as elderly if they are in groups 34-40.

 `|` is the operator 'or'

So the code continues:

3.  create a new variable `Elderly`. If the group number is larger than or equal to 14 and smaller than or equal to 20, OR larger than or equal to 34 and smaller than or equal to 40, return the value "65+", else return the value "65-"
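
The boundaries of this rule can be checked on a few group numbers in base R. The vector `groups` is illustrative, and the `%in%` variant is an alternative way to write the same condition, not the notebook's method.

```r
groups <- c(13, 14, 20, 21, 33, 34, 40)   # illustrative group numbers

# the condition from the notebook, written with comparisons and `|`
elderly_a <- ifelse(
  (groups >= 14 & groups <= 20) | (groups >= 34 & groups <= 40), "65+", "65-")

# the same rule written with %in%
elderly_b <- ifelse(groups %in% c(14:20, 34:40), "65+", "65-")

print(elderly_a)
identical(elderly_a, elderly_b)
```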

With `str(dt1)` we can inspect the structure of the data.


In [15]:
str(dt1)

'data.frame':	10000 obs. of  6 variables:
 $ ID     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Group  : int  40 8 31 12 10 26 25 4 29 28 ...
 $ FKG    : int  0 1 0 0 0 1 0 1 1 1 ...
 $ DKG    : int  1 1 0 0 1 1 0 1 1 0 ...
 $ Gender : chr  "female" "male" "female" "male" ...
 $ Elderly: chr  "65+" "65-" "65-" "65-" ...


With `summary(dt1)` we can get a summary of the data:

In [16]:
summary(dt1)

       ID            Group            FKG              DKG        
 Min.   :    1   Min.   : 1.00   Min.   :0.0000   Min.   :0.0000  
 1st Qu.: 2501   1st Qu.:11.00   1st Qu.:0.0000   1st Qu.:0.0000  
 Median : 5000   Median :21.00   Median :1.0000   Median :0.0000  
 Mean   : 5000   Mean   :20.64   Mean   :0.5061   Mean   :0.4854  
 3rd Qu.: 7500   3rd Qu.:31.00   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :10000   Max.   :40.00   Max.   :1.0000   Max.   :1.0000  
    Gender            Elderly         
 Length:10000       Length:10000      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      

## Using a function

It is possible to classify people in age groups of 5 years with `ifelse()`. This could look something like this:


In [17]:
dt1a <- dt1 %>%
  mutate(age_class = ifelse(Group== 1 | Group==21, "00-04",
                            ifelse(Group== 2 | Group==22, "05-09", 
                             ifelse(Group==3 | Group ==23, "10-14",
                                   "old"))))
head(dt1a)

ID,Group,FKG,DKG,Gender,Elderly,age_class
1,40,0,1,female,65+,old
2,8,1,1,male,65-,old
3,31,0,0,female,65-,old
4,12,0,0,male,65-,old
5,10,0,1,male,65-,old
6,26,1,1,female,65-,old



This is very time-consuming. If you need the same code multiple times, you can use a function instead.



> **Assignment**: Create a function `age_category` which translates a group into an age category (i.e. age_category is a function of group).
Example: we would like to change the group number 3 into the category "10-14".
Therefore, we need to turn values into text. We can do this with the existing function `paste0()`
(see the help files or [here](https://www.r-bloggers.com/difference-between-paste-and-paste0/))





In [None]:
age_category <- function(group) {
  age_start <- group * 5 - 5   # start of the age range
  age_end <- group * 5 - 1     # end of the age range

  age_label <- paste0(age_start, "-", age_end)   # text label
  age_label   # return the label visibly
}

1. Define the variable `age_start`. This should be equal to the group number multiplied by 5, minus 5.
2. The same holds for `age_end`, but with minus 1 instead of minus 5.
3. With `paste0` we can paste `age_start` and `age_end` together with a hyphen between them. For group 1 we should get the label '0-4', and group 14 should be translated into '65-69'.

a.  Now you can print the age category of group 3 and
b.  a vector of the age categories of groups 1:20

In [19]:
# a single group
print(age_category(3))

# a vector of groups
print(age_category(1:20))

[1] "10-14"
 [1] "0-4"   "5-9"   "10-14" "15-19" "20-24" "25-29" "30-34" "35-39" "40-44"
[10] "45-49" "50-54" "55-59" "60-64" "65-69" "70-74" "75-79" "80-84" "85-89"
[19] "90-94" "95-99"


Now we are ready to apply our function to our dataframe and create a new column:

In [20]:
dt2 <- dt1 %>%
  mutate(Age = ifelse(Group <= 20, age_category(Group), age_category(Group - 20)))

head(dt2)

ID,Group,FKG,DKG,Gender,Elderly,Age
1,40,0,1,female,65+,95-99
2,8,1,1,male,65-,35-39
3,31,0,0,female,65-,50-54
4,12,0,0,male,65-,55-59
5,10,0,1,male,65-,45-49
6,26,1,1,female,65-,25-29


The code reads:

1. Create a new dataframe `dt2`, which is equal to `dt1`, then....
2. Create a new column `Age` with `mutate`. For males (Group is smaller than or equal to 20) we can use our function `age_category(Group)` directly. For females we use the age category for the group number minus 20.
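
As a hedged alternative to the `ifelse` step, groups 21 to 40 can be mapped onto 1 to 20 with the modulo operator. The sketch below uses a compact version of `age_category` and the illustrative vector `groups`, and checks that both approaches give the same labels.

```r
# compact version of the function from the notebook
age_category <- function(group) {
  paste0(group * 5 - 5, "-", group * 5 - 1)
}

groups <- 1:40

# approach from the notebook: subtract 20 for the female groups
age_a <- ifelse(groups <= 20, age_category(groups), age_category(groups - 20))

# alternative: map groups 21 to 40 onto 1 to 20 with the modulo operator
age_b <- age_category((groups - 1) %% 20 + 1)

identical(age_a, age_b)
```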

For graded assignment 2 we need `dt2`. Therefore, we store the dataframe as a csv file in the folder `Sourcedata`.


In [None]:
write.csv2(dt2, "../Sourcedata/graded_assignment_2.csv", row.names = FALSE)
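
`write.csv2` separates columns with semicolons and uses a comma as the decimal mark, so the matching function to read the file back is `read.csv2`. A minimal round-trip sketch, assuming a small illustrative dataframe `small` and a temporary file instead of the `Sourcedata` folder:

```r
small <- data.frame(ID = 1:3, Age = c("0-4", "5-9", "10-14"))   # illustrative data

path <- tempfile(fileext = ".csv")   # temporary file, not the notebook's path
write.csv2(small, path, row.names = FALSE)

readLines(path)   # the columns are separated by semicolons

back <- read.csv2(path, stringsAsFactors = FALSE)
str(back)
```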

End notebook