<img src="./intro_images/MIE.PNG" alt="notebook banner image" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Senior Lecturer Health Data Science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.PNG" alt="Alan Davies photo" width="30%" />
         </td>
     </tr>
</table>

# 6.0 Iteration
****

#### About this Notebook
This notebook introduces the concept of iteration: using loops to repeat a block of code a set number of times or until a certain condition is met. Iteration is often used with data structures to apply operations to the data held within them.

<div class="alert alert-block alert-warning"><b>Learning Objectives:</b> 
<br/> At the end of this notebook you will be able to:
    
- Investigate key loops and methods of iteration available in R

- Explore how these can be used to iterate over data structures

</div> 

<a id="top"></a>

<b>Table of contents</b><br>

6.1 [For loops](#for)

6.2 [While loops](#while)

6.3 [Apply functions](#apply)

R has many ways of applying computations to its data structures without using <code>loops</code> to access and change individual elements. We will see some of these later. We cover loops here because the concept is key to many different programming languages.

Now that we have seen data structures such as vectors and lists, it makes sense to look at iteration, as the two are often used together. Iteration means doing something over and over again, or in a <code>loop</code>. This is useful when we want to repeat an operation a number of times or traverse data structures such as lists or vectors. Let's look at an example. Let's say we have a vector of medical procedures that we offer in our hospital cardiac catheterization lab:

In [1]:
cathlab_procedures <- c('angiogram', 
                      'pacemaker insertion', 
                      'electrophysiological studies', 
                      'transoesophageal echocardiogram',
                      'Percutaneous Coronary Intervention',
                      'AICD insertion',
                      'reveal monitor insertion')

We can access them one by one, as before, using each element's index:

In [4]:
cathlab_procedures[6]

Doing this one item at a time would be extremely time consuming, especially if we offered 250 procedures. Instead we can use a <code>loop</code> to go through each item and print the result.

<a id="for"></a>
#### 6.1 For loops

In [5]:
print("The procedures we offer are:")

for(procedure in cathlab_procedures)
{
    print(procedure)
}

[1] "The procedures we offer are:"
[1] "angiogram"
[1] "pacemaker insertion"
[1] "electrophysiological studies"
[1] "transoesophageal echocardiogram"
[1] "Percutaneous Coronary Intervention"
[1] "AICD insertion"
[1] "reveal monitor insertion"


Here we use the <code>for</code> keyword to create a loop that runs once for every procedure in our vector. All the code within the opening and closing braces <code>{}</code> is contained within the loop. We can also use loops to repeat an action a fixed number of times. Here is an example of saying hello 5 times.

In [7]:
for(i in 1:5)
{
    print("Hello")
}

[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"
[1] "Hello"


Each time the loop repeats, the loop variable (in this case called <code>i</code>) is automatically set to the next value in the sequence <code>1:5</code>, so it goes up by one on each pass. Increasing a counter like this is called <code>incrementing</code> (the opposite is <code>decrementing</code>). We can see this in action if we print the value of <code>i</code> in the loop:

In [10]:
for(i in 1:5)
{
    cat("i =", i, "\n")
}

i = 1 
i = 2 
i = 3 
i = 4 
i = 5 
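
The loop variable does not have to count upwards by one. As a minimal sketch (the name `step_values` is illustrative and not part of the notebook), the sequence `5:1` counts down, and the base R `seq()` function can step by other amounts:

```r
# Count down: i takes each value of the sequence 5:1 in turn
for (i in 5:1) {
    cat("i =", i, "\n")
}

# Step upwards in twos using seq()
step_values <- seq(2, 10, by = 2)
for (i in step_values) {
    cat("i =", i, "\n")
}
```

In both cases the loop simply visits each value of the sequence it is given, in order.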


<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
Write a loop that prints your name 10 times.
</div>

In [11]:
for(i in 1:10)
{
    print("Your name")
}

[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"
[1] "Your name"


We have 7 items in our vector of procedures above, so we could also write:

In [12]:
for(i in 1:7)
{
    print(cathlab_procedures[i])
}

[1] "angiogram"
[1] "pacemaker insertion"
[1] "electrophysiological studies"
[1] "transoesophageal echocardiogram"
[1] "Percutaneous Coronary Intervention"
[1] "AICD insertion"
[1] "reveal monitor insertion"


<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
1. Can you think of a reason why putting the number 7 in the range might not be best practice? <br />
2. How might we resolve this?
</div>

**1.** This would not be good practice because we could add or remove items from our procedures. If we removed items, indexing past the end of the vector would return <code>NA</code> rather than a procedure. If we added extra items (above 7) we wouldn't see them in the output, as the loop would stop at 7.

**2.** We could use the <code>length()</code> function to work out the exact length of the vector so we only loop over items that actually exist. Alternatively we could use the <code>for ... in</code> method shown above.

Let's add another item to our procedures and output the values again:

In [13]:
cathlab_procedures <- append(cathlab_procedures, "cox maze")

In [14]:
for(i in 1:7)
{
    print(cathlab_procedures[i])
}

[1] "angiogram"
[1] "pacemaker insertion"
[1] "electrophysiological studies"
[1] "transoesophageal echocardiogram"
[1] "Percutaneous Coronary Intervention"
[1] "AICD insertion"
[1] "reveal monitor insertion"


Here we are missing the last item. It is therefore good practice to loop over the vector directly, as we did in the first instance, or to use the <code>length()</code> function.

In [15]:
for(i in 1:length(cathlab_procedures))
{
    print(cathlab_procedures[i])
}

[1] "angiogram"
[1] "pacemaker insertion"
[1] "electrophysiological studies"
[1] "transoesophageal echocardiogram"
[1] "Percutaneous Coronary Intervention"
[1] "AICD insertion"
[1] "reveal monitor insertion"
[1] "cox maze"
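
One caveat with `1:length(x)`: if the vector is empty, `1:0` evaluates to the sequence 1, 0, so the loop body still runs twice. A sketch of the safer base R alternative `seq_along()` follows (the `empty_procedures` vector and the counter names are made up for illustration):

```r
empty_procedures <- character(0)   # hypothetical vector with no procedures

# 1:length() counts DOWN from 1 to 0 when the vector is empty
runs_with_length <- 0
for (i in 1:length(empty_procedures)) {
    runs_with_length <- runs_with_length + 1
}

# seq_along() returns an empty sequence, so the body never runs
runs_with_seq_along <- 0
for (i in seq_along(empty_procedures)) {
    runs_with_seq_along <- runs_with_seq_along + 1
}

cat("1:length() iterations:", runs_with_length, "\n")
cat("seq_along() iterations:", runs_with_seq_along, "\n")
```

For a non-empty vector the two forms visit exactly the same index values.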


A shorthand way of ensuring that you only output values that exist is:

In [16]:
for(procedure in cathlab_procedures)
{
    print(procedure)
}

[1] "angiogram"
[1] "pacemaker insertion"
[1] "electrophysiological studies"
[1] "transoesophageal echocardiogram"
[1] "Percutaneous Coronary Intervention"
[1] "AICD insertion"
[1] "reveal monitor insertion"
[1] "cox maze"


<div class="alert alert-success">
<b>Note:</b> Using <code>for ... in</code> directly over the vector is best practice in R. Looping over a range of index values, as above, is closer to how loops work in many other programming languages. These tend to have an initialisation statement, a condition that must be met and an incrementation statement, for example in the C language:<br> <code>for(i=0; i&lt;10; i++){ }</code>.
</div>

Using loops allows us to perform operations on entire vectors (and other data structures). Let's say we had some working hours and we needed to reduce everyone's working hours by a single hour. We can use iteration to loop over the vector and carry out this operation on each element:

In [18]:
hours_worked <- c(8.5, 9, 12, 6, 6.5, 8.5, 12, 12, 9)
for(i in 1:length(hours_worked))
{
    hours_worked[i] <- hours_worked[i] - 1
}   
print(hours_worked)

[1]  7.5  8.0 11.0  5.0  5.5  7.5 11.0 11.0  8.0


In fact, this kind of thing is much easier to accomplish in R than in many other programming languages, as R is built around vectorised operations. For example, we could do the same thing as above far more simply.

In [19]:
hours_worked <- c(8.5, 9, 12, 6, 6.5, 8.5, 12, 12, 9)
print(hours_worked-1)

[1]  7.5  8.0 11.0  5.0  5.5  7.5 11.0 11.0  8.0


Here R applies the <code>-1</code> to every element of the vector automatically, removing the need for a loop, which would be required in most other languages.
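
The same element-wise behaviour applies to comparison operators. A brief sketch, where the threshold of 8 hours for a "long" shift is an assumption made purely for illustration:

```r
hours_worked <- c(8.5, 9, 12, 6, 6.5, 8.5, 12, 12, 9)

# Comparison is vectorised too: one TRUE/FALSE per element
long_shift <- hours_worked > 8      # assumed threshold, not from the notebook
print(long_shift)

# Logical vectors can be summed (TRUE counts as 1) or used to filter
num_long_shifts <- sum(long_shift)
print(num_long_shifts)
print(hours_worked[long_shift])
```

No explicit loop is needed for the comparison, the count or the filtering.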

It is also possible to <code>nest</code> one loop inside another like so:

In [24]:
for(x in 1:3)
{
    for(y in 1:3)
    {
        cat("x =", x, ", y =", y, "\n")
    }
}

x = 1 , y = 1 
x = 1 , y = 2 
x = 1 , y = 3 
x = 2 , y = 1 
x = 2 , y = 2 
x = 2 , y = 3 
x = 3 , y = 1 
x = 3 , y = 2 
x = 3 , y = 3 


If the body of a loop is a single statement we can omit the braces and produce the same output, e.g.

In [25]:
for(x in 1:3)
    for(y in 1:3)
        cat("x =", x, ", y =", y, "\n")

x = 1 , y = 1 
x = 1 , y = 2 
x = 1 , y = 3 
x = 2 , y = 1 
x = 2 , y = 2 
x = 2 , y = 3 
x = 3 , y = 1 
x = 3 , y = 2 
x = 3 , y = 3 
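
Without braces, only the single statement immediately following the `for` belongs to the loop, whatever the indentation suggests. A small sketch (the `count` variable is illustrative):

```r
count <- 0
for (i in 1:3)
    count <- count + 1                       # this statement is the loop body
cat("Loop finished, count =", count, "\n")   # runs once, after the loop ends
```

Keeping the braces, even for one-line bodies, avoids accidentally leaving a statement outside the loop.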


<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> 
    Use the <code>readline()</code> function to read in the maximum number of stars (<code>*</code>), then use 2 nested loops (a loop inside a loop) to output the following pattern: <br />
*<br />
* *<br />
* * *<br />
* * * *<br />
The above pattern was produced with an input of <code>4</code>. Hint: you will need to use the <code>cat</code> function to output the stars and to add a newline using <code>cat("\n")</code> after each line of stars.
</div>

In [20]:
num_stars <- as.integer(readline(prompt="Enter number of stars:"))
for(i in 1:num_stars){
    for(j in 1:i)
    {
        cat("* ")
    }
    cat("\n")
}

Enter number of stars:6
* 
* * 
* * * 
* * * * 
* * * * * 
* * * * * * 


<a id="while"></a>
#### 6.2 While loops

Sometimes we don't want to loop through something or repeat something a set number of times (or we might not know how many times). Instead we want to keep looping until a certain condition is met. This is where we can use a different type of loop that works with the logical operators we saw earlier. Let's say we want to read in a password from a user. To read input we can use the <code>readline()</code> function like so:

In [26]:
readline(prompt="Type something: ")

Type something: hi


We might want to keep prompting a user for a login until they enter the correct login details. In this case when you run the code below it will keep prompting you to enter your username until you type in <code>letmein</code>. Give it a go. Try entering something else first before the required login.

In [27]:
login <- ""
while(login != "letmein")
{
    login <- readline(prompt="Enter username: ")
}

Enter username: hi
Enter username: letmein


Here we are saying keep repeating everything between the braces (<code>{}</code>) <code>while</code> (or as long as) the variable <code>login</code> does not equal (<code>!=</code>) the string <code>letmein</code>.

<div class="alert alert-danger">
<b>Note:</b> You should use caution with <code>while</code> (and other) loops because you can inadvertently trap your code in an infinite loop if the exit condition is never met. This will cause your program to lock up and freeze.
</div>
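
One common way to guard against an infinite loop is to add a second exit condition, such as a maximum number of attempts. The sketch below is an illustration under stated assumptions: the limit of 3 attempts and the `typed_entries` vector (which stands in for what a user would type at `readline()`) are invented so that the example runs without prompting.

```r
# Simulated user input so the example runs without prompting
# (in an interactive session these would come from readline())
typed_entries <- c("hi", "guest", "admin", "letmein")

max_attempts <- 3     # assumed limit, chosen for illustration
attempts <- 0
login <- ""

# Keep going only while the login is wrong AND attempts remain
while (login != "letmein" && attempts < max_attempts) {
    attempts <- attempts + 1
    login <- typed_entries[attempts]
    cat("Attempt", attempts, ":", login, "\n")
}

if (login == "letmein") {
    cat("Login accepted\n")
} else {
    cat("Too many attempts\n")
}
```

Because `attempts` increases on every pass, the loop is guaranteed to stop even if the correct login is never entered.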

<a id="apply"></a>
#### 6.3 Apply functions

There are a number of built-in <code>apply</code> functions in R that can apply functions to various data structures such as matrices, lists, dataframes and so on. These include <code>apply()</code>, <code>lapply()</code>, <code>sapply()</code>, <code>vapply()</code>, <code>mapply()</code>, <code>rapply()</code>, and <code>tapply()</code>.

Let's look at a couple of examples. First let's create a matrix with some values.

In [28]:
M <- matrix(c(2,4,2,1,6,2), nrow=2, ncol=3, byrow=TRUE)

In [30]:
print(M)

     [,1] [,2] [,3]
[1,]    2    4    2
[2,]    1    6    2


If we wanted to sum the columns of the matrix we could use loops, but this would be overly complex as the apply functions take care of this kind of thing for us. The first argument to <code>apply()</code> is the matrix <code>M</code>, the second is the <code>MARGIN</code>, where a value of <code>1</code> is for rows and <code>2</code> is for columns. The final argument is the function that we want to apply. In this case we use the built-in <code>sum</code> function to sum by column.

In [31]:
apply(M, 2, sum)

<div class="alert alert-success">
<b>Note:</b> Passing <code>MARGIN=c(1,2)</code> applies the function over both rows and columns, i.e. to each individual element of the matrix.
</div>
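
To make the effect of `MARGIN` concrete, here is a short sketch using the same matrix. The names `col_totals` and `doubled`, and the doubling function, are illustrative choices rather than part of the notebook; `colSums()` is a base R function.

```r
M <- matrix(c(2, 4, 2, 1, 6, 2), nrow = 2, ncol = 3, byrow = TRUE)

# MARGIN = 2: one sum per column
col_totals <- apply(M, 2, sum)
print(col_totals)                # column sums: 3, 10 and 4

# colSums() is a base R shortcut for this particular case
print(colSums(M))

# MARGIN = c(1, 2) calls the function on every individual cell,
# so the result keeps the shape of the original matrix
doubled <- apply(M, c(1, 2), function(value) value * 2)
print(doubled)
```

For simple element-wise arithmetic like doubling, `M * 2` gives the same result directly; the `apply()` form is shown only to illustrate how `MARGIN` works.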

<div class="alert alert-block alert-info">
<b>Task 4:</b>
<br> 
    Using <code>apply()</code> find the average value (<code>mean</code>) for each of the rows in <code>M</code>.
</div>

In [32]:
apply(M, 1, mean)

The <code>lapply</code> function applies a function to each element of a list or vector and always returns a list. In the example below we use it to apply the <code>toupper</code> function, which makes text upper case, to all the items in the vector.

In [33]:
conditions <- c("Diabetes", "Heart failure", "Angina", "Stroke")
print(lapply(conditions, toupper))

[[1]]
[1] "DIABETES"

[[2]]
[1] "HEART FAILURE"

[[3]]
[1] "ANGINA"

[[4]]
[1] "STROKE"
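
If a plain vector is preferred over a list, `sapply()` simplifies the result where it can. The following is a minimal sketch; the variable names `upper_sapply` and `upper_direct` are illustrative.

```r
conditions <- c("Diabetes", "Heart failure", "Angina", "Stroke")

# sapply() simplifies the result to a character vector;
# USE.NAMES = FALSE stops it labelling each element with the input string
upper_sapply <- sapply(conditions, toupper, USE.NAMES = FALSE)
print(upper_sapply)

# toupper() is itself vectorised, so it can also be called on the vector directly
upper_direct <- toupper(conditions)
print(upper_direct)
```

Both approaches give the same character vector; the apply functions become more useful when the function being applied is not already vectorised.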



Whenever you want to do something to an entire data structure in R, consider using the apply functions, as they provide a quick and easy way of applying a function to each element of a structure without complex looping.

In the next notebook we will take a deeper look at functions. You have already been using a range of inbuilt functions for various purposes like <code>print()</code> to output messages, <code>length()</code> for the length of vectors and <code>subset()</code> to retrieve a subset of data from a data frame. We will look at how we can produce our own functions to break up tasks into smaller manageable and reusable chunks of code.

### Notebook details
<br>
<i>Notebook created by <strong>Dr. Alan Davies</strong>.
<br>
&copy; Alan Davies 2022</i>

## Notes: