# Lab week 8 - Outcomes


In this lab you should read through and run the code in the lab sheet and complete the lab assessment. By the end of this lab you should be able to use R to:


* work with iterations
* understand Boolean values and basic logical comparisons
* work with conditionals
* understand and calculate conditional probabilities





This week, we will improve our programming skills in R. We will start with "iterations". You already encountered an iteration at the end of last week's lab sheet, when we repeatedly sampled from a specific distribution. You may remember the following piece of code from last week:

In [None]:
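n = 1000                                   # sample size
m = 100                                    # number of repetitions
sample_means = c()                         # empty vector to collect the results

for (i in 1:m) {
    sample_means[i] = sum(runif(n,0,5))/n  # mean of n draws from Uniform(0, 5)
}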

As you can see, in R we can use `for` to repeat a task multiple times; this is referred to as a **'for loop'**. Let's start with a simple example of a for loop to get a better understanding of its structure.

In [None]:
for (i in 1:10) {
    print(i)
}

The first line (the header) names the loop variable `i` and the vector of values it runs through, here `1:10`, i.e. the numbers 1 to 10. The code between the curly braces is called "the body"; it is executed once for each element of that vector. So here `print(i)` is executed 10 times, and before each execution `i` takes the next value of `1:10`, which is why it increases by 1 each time.
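The loop variable does not have to count 1, 2, 3, ...: it runs through whatever vector follows `in`. A short sketch (the vectors and variable names are chosen only for illustration):

```r
# The loop variable takes each element of the vector in turn;
# the values do not have to be consecutive integers.
squares <- c()
for (value in c(2, 5, 11)) {
    squares <- c(squares, value^2)   # append value^2 to the vector
}
squares                              # 4 25 121

# seq() builds other sequences, e.g. every second number from 1 to 9
odd <- c()
for (i in seq(1, 9, by = 2)) {
    odd <- c(odd, i)                 # append the current value of i
}
odd                                  # 1 3 5 7 9
```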

# Exercise 1.1: For Loop Iteration
The next code cells show two ways of using a for loop to calculate large sums (here: $\sum_{i=1}^{100}\, i$) or to approximate series. A series is an infinite sum (for example: $\sum_{i=1}^{\infty}\,i$). Try to understand the two methods in the code cells below. Can you see how they differ? Then use this technique to calculate the sums and approximate the series in your assessment.

In [None]:
results <- c()                    # creates an empty vector
for (i in 1:100){
    results[i] <- i               # i is stored at i-th position in the vector
}
sum(results)                      # all vector entries are being summed up
    

In [None]:
result <- 0                      # sets initial value to 0
for (i in 1:100){
    result <- result + i        # the initial value is replaced by (itself + i)
}
result                          # the value of "result" is printed
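R can also check the loop result without a loop, and the second method extends directly to series. A computer cannot add infinitely many terms, so a series is approximated by a partial sum with many terms; this only makes sense for a convergent series (the example $\sum_{i=1}^{\infty} i$ grows without bound). A minimal sketch, using $\sum_{i=1}^{\infty} 1/i^2 = \pi^2/6$ as an illustration chosen here (not necessarily one from your assessment) and 10000 terms as an arbitrary cut-off:

```r
# Loop-free check of the sum computed above
sum(1:100)              # 5050
100 * 101 / 2           # 5050, from the formula n(n + 1)/2 with n = 100

# Partial sum of the convergent series 1/1^2 + 1/2^2 + 1/3^2 + ...
partial.sum <- 0
for (i in 1:10000) {
    partial.sum <- partial.sum + 1 / i^2   # add the i-th term
}
partial.sum             # close to the limit below
pi^2 / 6                # exact value of the infinite series
```

Increasing the number of terms moves the partial sum closer to $\pi^2/6$.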

In [None]:
# Use this block for your own iterations to solve the assignment questions


# Exercise 1.2

Do you remember how we determined the warmest year (within the years 2000-2009) based on the average temperature over all months?

In [None]:
weather <- read.csv("week8-perth_weather.csv")
head(weather)

Use the template below to write a for loop that stores the average temperature (over all 12 months) of each year in a vector called 'vec'.

In [None]:
number.years <- 2009-1950+1 
vec <- rep(0,number.years)                                  # creates a vector of the right size with zeros only.
                                                            
for(i in 1:number.years){
    jan <- 12*i-11
    dec <- ...1...
    vec[...2...] <- mean(weather$average.temp[jan:dec])           # to be completed by you
}

vec

mean(weather$average.temp[1:12])                            # the first entry in 'vec' should be equal to this number.

# Exercise 2: Booleans

Another important feature of R is the "conditional" statement. To properly understand how conditionals work, we first need a better understanding of "Booleans". Boolean values can either be `TRUE` or `FALSE`. We get Boolean values when using comparison operators, among which are `<` (less than), `>` (greater than), and `==` (equal to).\
You have already seen Booleans a few times in past labs (remember that `==` was helpful for separating the Maths from the Physics students), but let's take a closer look.

Run the cells below to see examples of comparison operators in action.

In [None]:
5 < 4-6
5 < 4+6
5 == 4-6
5 == 3+2
5 != 3+2
3 < 4 
(3:10)==5

Note the `!=` operator in the fifth line of the previous code block, which is the opposite of `==` and reads as "is not equal to". We can also assign the result of a comparison to a variable.

In [None]:
bool <- 20 - 15 == 15/3
bool
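Two details are worth knowing when comparing numbers. Arithmetic is evaluated before comparison, so the line above compares `20 - 15` with `15/3`. Also, numbers with decimals are stored with tiny rounding errors, so `==` can give a surprising `FALSE`; the base R functions `all.equal()` and `isTRUE()` compare up to a small tolerance instead. A short sketch:

```r
# Comparison binds more loosely than arithmetic: this is (20 - 15) == (15 / 3)
(20 - 15) == (15 / 3)               # TRUE

# Floating-point rounding can make "obviously equal" numbers compare unequal
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal up to a small tolerance
```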

Comparison operators also work on vectors: each element is compared separately, and the output is a vector of Boolean values.

In [None]:
c(1, 5, 7, 8, 3, -1) > 3
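Logical vectors are useful beyond printing: in arithmetic `TRUE` counts as 1 and `FALSE` as 0, and a logical vector can select elements from another vector (as with the Maths and Physics students). A sketch for the vector above, stored under the illustrative name `v`:

```r
v <- c(1, 5, 7, 8, 3, -1)
v > 3            # FALSE TRUE TRUE TRUE FALSE FALSE
sum(v > 3)       # 3: the number of elements greater than 3
which(v > 3)     # 2 3 4: positions of the TRUE entries
v[v > 3]         # 5 7 8: logical indexing keeps the matching elements
```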

Also remember from the lecture that in R the `&&` symbol represents the "AND" condition: an AND condition is only `TRUE` if both sides of `&&` are `TRUE`.

In [None]:
5 < 7 && 4 > 3
5 < 7 && 4 > 5

From the code below, can you guess which condition the `||` symbol represents?

In [None]:
5 < 7 || 4 > 3
5 < 7 || 4 > 5
5 < 3 || 2 > 3


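If we combine vectors of Booleans, we have to use `&` and `|`, which compare the vectors element by element. `&&` and `||` are meant for single `TRUE`/`FALSE` values: in older versions of R they silently compared only the first element of each vector, and from R 4.3.0 on they throw an error if either side has more than one element. For example: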

In [None]:
x <- c(1,2,3,4)
y <- c(0,2,4,-1)

x<y
x>y

try(x<y || x>y)    # error in R >= 4.3.0; older versions only compare the first elements

x<y | x>y
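Often we want a single `TRUE` or `FALSE` that summarises a whole vector, for example as an `if` condition or as one side of `&&`. The base R functions `any()` and `all()` do this. A sketch reusing `x` and `y` from above:

```r
x <- c(1,2,3,4)
y <- c(0,2,4,-1)
x < y | x > y               # element-wise: TRUE FALSE TRUE TRUE
any(x < y | x > y)          # TRUE: at least one element is TRUE
all(x < y | x > y)          # FALSE: the second pair (2, 2) is equal
# any() and all() return a single value, so they can be combined with && and ||
any(x > y) && all(y < 5)    # TRUE
```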

Take some time to work through the corresponding assignment question before you proceed to the next section.

In [None]:
# Use this block to solve your assignment questions.

# Conditionals

Let us now return to conditionals. A conditional statement is made up of multiple lines that allow R to choose between different alternatives based on whether some condition is 'TRUE' or 'FALSE'. It always begins with an "if header", which is a single line consisting of `if` and a condition in parentheses. This line is followed by the "body" in curly braces.
The body is only executed if the condition of the "if header" evaluates to 'TRUE'. If the condition evaluates to 'FALSE', the body is skipped and, if an `else` clause follows, the body of the `else` clause is executed instead. See the following code for clarification.


In [None]:
if (3 < 5){
    print ("3 is smaller than 5")
} else{
    print ("3 is not smaller than 5")
}


if (7 < 4){
    print ("7 is smaller than 4")
} else{
    print ("7 is not smaller than 4")
}
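An `if` condition must be a single `TRUE` or `FALSE` (from R 4.2.0 on, a condition of length greater than one is an error). To choose between two values for every element of a vector, base R provides the vectorised `ifelse()`. A sketch with made-up temperatures:

```r
temps <- c(12, 25, 31, 18)             # hypothetical temperatures
# one result per element: "warm" where temps > 20 is TRUE, otherwise "cool"
ifelse(temps > 20, "warm", "cool")     # "cool" "warm" "warm" "cool"
```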

# Functions

Conditional statements often appear within the body of a function. 

We have already seen many examples of functions, like `summary()`, `print()` and `mean()`, to name a few. These are in-built functions which we can readily access. However, we can also build our own functions. In general, a function performs a task based on its arguments. The `mean()` function, for instance, calculates and returns the mean of its argument, which has to be a vector of numerical values.
To create our own function, we use this basic syntax:

```
function_name <- function(arg_1, arg_2, ...) {
   function body 
}
```

As you can see, a function is created with the keyword `function`, followed by its arguments in parentheses and its body in curly braces; the resulting function is then assigned to the name `function_name`. If we want to call the function later, we simply type

```
function_name(arg_1, arg_2,...)
```

The code `mean(c(2,5,4))` is an example of that where 'mean' is the function name and 'c(2,5,4)' is its first (and only) argument. 

The following lines of code present a function which accepts a vector as input and calculates the sum of the squared elements of that vector. 

In [None]:
sum.of.squares <- function(x){
    return(sum(x^2))
}
   
sum.of.squares(c(1,2,3))
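Arguments can also have default values, which are used when the argument is left out of the call. In R, a function returns the value of its last evaluated expression, so `return()` is optional. A sketch with a hypothetical function `sum.of.powers`:

```r
# Hypothetical example: sum of the p-th powers of x, with p = 2 by default
sum.of.powers <- function(x, p = 2) {
    sum(x^p)   # the value of the last expression is returned automatically
}

sum.of.powers(c(1, 2, 3))          # 14, the same as sum.of.squares(c(1, 2, 3))
sum.of.powers(c(1, 2, 3), p = 3)   # 36
```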

As mentioned above, conditionals often appear within the body of a function to express different behavior depending on the function's arguments. Study the following example. Also notice the `else if` clause, which adds another condition to our code: its body is only executed if the original condition (here `x < y`) is 'FALSE' and its own condition (here `x > y`) is 'TRUE'. The final `else` is only reached if all previous conditions evaluate to 'FALSE'.

In [None]:
fn.1 = function(x,y) {
    if( x < y) {
        cat(x,"is smaller than", y, "\n")
    } else if (x > y) {
        cat(y,"is smaller than", x, "\n")   
    } else {
        cat(x,"is equal to", y, "\n")
    }
}

fn.1(3,5)
fn.1(4,2)
fn.1(6,6)


The general format of a multi-clause conditional statement looks like the one above.
There is always exactly one "if header", but there can be any number of "else if" clauses. R evaluates the "if condition" first. If it is 'FALSE', the next "else if condition" is evaluated, and so on, until one is found that is 'TRUE'. R then executes the corresponding "body" and skips all remaining clauses.

The "else clause" is optional and often serves as a collection of all remaining cases that were not covered by any "if" or "else if" condition. Hence, when an "else" clause is provided, its "body" is executed only if none of the conditions of the previous clauses is 'TRUE'. The "else" clause is always the last clause or doesn't appear at all. Keep each `else` on the same line as the closing brace `}` of the previous clause: outside of a function body, R considers an `if` statement complete at the end of that line, so an `else` starting on a new line causes an error. A code template looks like this:

```
if (condition 1) {
    if-body
} else if (condition 2) {
    else-if-body
} else if (condition 3) {
    else-if-body
} else {          # any number of "else if" clauses may come before this
    else-body
}
```
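Because R stops at the first condition that is 'TRUE', the order of the clauses matters. A sketch with a hypothetical function `grade()` and made-up score bands: a score of 85 also satisfies `score >= 70`, but only the body of the first matching clause runs.

```r
# Hypothetical grading bands, for illustration only
grade <- function(score) {
    if (score >= 80) {
        "HD"
    } else if (score >= 70) {
        "D"
    } else if (score >= 60) {
        "C"
    } else if (score >= 50) {
        "P"
    } else {
        "N"                 # every remaining case
    }
}

grade(85)                           # "HD", although 85 >= 70 is also TRUE
sapply(c(72, 64, 50, 10), grade)    # "D" "C" "P" "N"
```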

# Exercise 3

Imagine betting on a monkey throwing a dart. A dartboard consists of 20 equally sized panels labeled 1 to 20 (for the purpose of this exercise, we ignore the two additional panels at the centre of the board). The rules of the game are:

1) if the monkey hits the panel '1', you lose 1 dollar\
2) if the monkey hits an even number nothing happens\
3) if the monkey hits an odd number between 3 and 9 (including both), you win 2 dollars\
4) if the monkey hits an odd number between 11 and 17 (including both), you win 4 dollars\
5) if the monkey hits 19, you win 10 dollars


Your task now is to write a function that returns your winnings. Hint: in R, `a %% b` returns the remainder when the integer $a$ is divided by $b$, so $a$ is divisible by $b$ exactly when `a %% b == 0`. See the examples below.

In [None]:
24 %% 2
47 %% 5
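The same remainder test works for any divisor and, applied to a vector, produces a logical vector that can be used for indexing. A short sketch with divisor 3:

```r
13 %% 3 == 0                 # FALSE: the remainder is 1
12 %% 3 == 0                 # TRUE: 12 is divisible by 3
# Combined with logical indexing: all multiples of 3 between 1 and 20
(1:20)[(1:20) %% 3 == 0]     # 3 6 9 12 15 18
```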

Now insert the missing code pieces in the code block below to create a function called "monkey.bet", which returns your winnings for one throw of a dart. To avoid confusion: **Your assignment questions will only require you to type in the current blanks.**

In [None]:
# THIS CODE BLOCK WILL HAVE TO BE MANIPULATED BY YOU TO WORK!
# To do so, replace all sections which include "...j..." with the missing code for j=1,2,3,4.

# Complete all if-, else if- and else clauses.


monkey.bet = function(x) {
# Returns payout if the dart lands on panel x.
    if (...1...) {
        return (-1)
    }
    else if (...2...) {
        return (0)
    }
    else if (...3...) {
        return (2)
    }
    else if (...4...) {
        return (4)  
    }
    else  {
        return (10)   
    }
}

Use the code below to check your "monkey.bet" function.

In [None]:
results <- c()
for (i in 1:20) {
    results[i] <- monkey.bet(i)
}
results

* What happens when you call your function with an argument of "21" instead?
* Why does this happen?

To ensure that the function only accepts arguments that are elements of the sample space, add the line below to your monkey.bet function as the first "if header".

In [None]:
if (x > 20 || x < 1) {stop("number is not a dart panel")}

If you now call the function with "21" again, you should receive an error message.
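An error raised by `stop()` halts the code that called the function. If you want a cell to keep running anyway, the base R function `tryCatch()` can catch the error. A sketch with a hypothetical function `check.panel()` that uses the same check:

```r
# Hypothetical function with the same kind of input check as above
check.panel <- function(x) {
    if (x > 20 || x < 1) {
        stop("number is not a dart panel")
    }
    x                       # valid panels are returned unchanged
}

check.panel(7)              # 7
# tryCatch() returns the error message instead of stopping the cell
tryCatch(check.panel(21), error = function(e) conditionMessage(e))
```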

# Exercise 4
We can use our knowledge of conditionals to find how many summands are needed for a sum to exceed a certain threshold. For example, consider the partial sums $\sum_{i=1}^{n} i^2$ and the number of summands needed to exceed the threshold of 1000. Since the sum starts at $i = 1$, finding the number of summands is equivalent to finding the smallest value of $n$ for which $\sum_{i=1}^{n} i^2 > 1000$. (Note that this changes if the sum does not start at $i = 1$.)
To find out, study the following lines of code. Notice that the `break` command exits the for loop.

In [None]:
result <- 0                     # sets initial value to 0
counter <- 0                    # sets the summand counter to 0

for (i in 1:100){
    result <- result + i^2      # the initial value is replaced by (itself + i^2)
    counter <- counter + 1      # the summand counter increases by 1.
    if (result > 1000) {        # conditional
        print(counter)          # prints the number of summands that were needed to get past the threshold
        break                   # exits the for loop
    }
}


As you can see from the output, 14 summands are needed (i.e. $1^2 + 2^2 + 3^2 + \dots + 14^2 = 1015$) until the sum is greater than 1000; with 13 summands the sum is only 819. Now use the knowledge you have gathered throughout this lab sheet to modify the code above so that it helps you answer your assignment questions.
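As a cross-check of the loop, the base R function `cumsum()` computes all partial sums at once, and `which()` finds the first one above the threshold. A minimal sketch for the same example (the name `partial.sums` is chosen here):

```r
partial.sums <- cumsum((1:100)^2)   # 1, 5, 14, 30, ...
which(partial.sums > 1000)[1]       # 14: first n whose partial sum exceeds 1000
partial.sums[13:14]                 # 819 1015
```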

In [None]:
# Use this cell to manipulate the code above for your own calculations regarding your assignment questions.


# Conditional Probability

First, revisit the Monty Hall problem from the lecture, in which it was statistically smarter to switch doors after being shown one empty door. Revealing that one of the doors is empty changes our initial knowledge of the situation and hence imposes a condition.
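The effect of this condition can be checked by simulation. A minimal sketch, assuming the standard rules (the host always opens an empty door that the contestant did not pick, choosing at random if there are two such doors); the variable names, the seed and the number of games are choices made here:

```r
set.seed(2024)                      # reproducible random numbers
n.games <- 10000                    # number of simulated games
stay.wins <- 0
switch.wins <- 0

for (g in 1:n.games) {
    car  <- sample(1:3, 1)          # door hiding the car
    pick <- sample(1:3, 1)          # contestant's first choice
    # doors the host may open: not picked and not hiding the car
    host.options <- setdiff(1:3, c(pick, car))
    # careful: sample(v, 1) with a single number v samples from 1:v,
    # so only call sample() when there are two options
    if (length(host.options) == 1) {
        opened <- host.options
    } else {
        opened <- sample(host.options, 1)
    }
    switched <- setdiff(1:3, c(pick, opened))   # the remaining closed door
    if (pick == car)     stay.wins   <- stay.wins + 1
    if (switched == car) switch.wins <- switch.wins + 1
}

stay.wins / n.games      # close to 1/3
switch.wins / n.games    # close to 2/3
```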


Conditional probabilities are often needed in medical testing. To illustrate this, we will now calculate conditional probabilities for a very current matter.\
The state of Victoria has a total population of roughly 8.166 million people. So far, there have been 17,173 confirmed cases of COVID-19. Hence, the probability for a resident of the state to be or have been infected with COVID-19 is $P(C) = 17{,}173/8{,}166{,}000 \approx 0.002$, where $C$ stands for the event of a resident having (had) the virus. \
Let's now assume that we have a fictitious medical test that correctly identifies 95% of people who have COVID-19, i.e. the test is positive for 95% of patients with COVID-19. We denote by $+$ the event of a positive test result, so $P(+|C)=0.95$. The test furthermore correctly identifies 90% of people without COVID-19, i.e. the test is negative for 90% of patients who do not have COVID-19, so $P(-|\bar{C})= 0.9$, where $-$ represents the event of a negative test result.\
Note: $P(A|B)$ denotes the conditional probability of $A$ given $B$, and $\bar{A}$ denotes the complement of $A$.

The following tree diagram illustrates the situation.



                         / \
                        /   \
          P(C)= 0.002  /     \  ?
                      /       \
                  CoVid      no CoVid
                   / \         / \
     P(+|C)=0.95  /   \ ?   ? /   \ P(-| no Covid)=0.9
                 /     \     /     \
                +       -   +       -

# Exercise 5

First of all, remember that the probabilities above are entirely fictitious.

Now use the multiplication and addition rules from the lecture, as well as the conditional probability rule stated below, to calculate the following probabilities:

* $P(\bar{C})$
* $P(-)$
* $P(C|+)$
* $P(C|-)$

Note: For the situation above, conditional probabilities can be calculated as follows: $P(A|B)= \frac{P(B|A)\,P(A)}{P(B)} =\frac{P(B|A)\,P(A)}{P(B \cap A) +P(B \cap \bar{A})} = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|\bar{A})\,P(\bar{A})} $
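The last formula translates directly into a small R function. A minimal sketch with a hypothetical function name `posterior` and made-up numbers ($P(A) = 0.01$, $P(B|A) = 0.9$, $P(B|\bar{A}) = 0.05$), deliberately different from the exercise so that you can still work through Exercise 5 yourself:

```r
# Hypothetical helper: P(A|B) from P(A), P(B|A) and P(B|not A),
# using P(B) = P(B|A) P(A) + P(B|not A) P(not A) for the denominator.
posterior <- function(p.A, p.B.given.A, p.B.given.notA) {
    p.notA <- 1 - p.A                                     # complement rule
    p.B <- p.B.given.A * p.A + p.B.given.notA * p.notA    # P(B and A) + P(B and not A)
    p.B.given.A * p.A / p.B
}

# Made-up numbers, not those of Exercise 5
p <- posterior(p.A = 0.01, p.B.given.A = 0.9, p.B.given.notA = 0.05)
p            # about 0.1538
round(p, 3)  # 0.154
```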

Please finalise this week's assignment now. Remember to round all results to 3 decimal places. As always, trailing zeros can be omitted; for example, state 2.5 as 2.5 and not as 2.500. Do NOT round intermediate results; round only the final answer.
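Rounding an intermediate result can change the final answer. A small made-up illustration with `round()`; note that R also drops trailing zeros when printing:

```r
x <- 1/7                     # full precision: 0.1428571...
round(100 * x, 3)            # 14.286: round only the final answer
round(100 * round(x, 3), 3)  # 14.3: rounding x first changes the result
```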

Good luck!