### Iterating (Looping) in R

In R, when we have had to do something *iteratively* to a data table (e.g., `mutate()` or `summarize()`), we've taken advantage of how R is set up to do this very efficiently ('vectorizing').

However, there is another way to iterate or 'loop' through a table or a list: the `for` loop.

In this prelab, we'll introduce you to this main type of loop. In general, loops do pretty much exactly what you'd expect based on the name -- they let you "loop" over a piece of code over and over until a certain condition has been met, or we run out of things to compute on.

Much of the below is inspired by and summarized from [Hadley Wickham and Garrett Grolemund's *R for Data Science*](https://r4ds.had.co.nz/iteration.html), which gives a great overview of the basics and, if you are interested, more detail than we have time to go through here. Let's start by executing the commands below.

In [None]:
library(tidyverse)

In [None]:
df <- tibble(
    x = runif(5),
    y = runif(5),
    z = runif(5)
)

Let's say you wanted to calculate the mean of x, y, and z. You *could* do:

In [None]:
mean(df$x)
mean(df$y)
mean(df$z)

But that seems suboptimal because the code is duplicated. This becomes a real problem as the table gets bigger: eventually it is infeasible to write out every call explicitly. Imagine what you'd have to do if `df` had thousands of columns! In general, you want to avoid copying and pasting the same code *twice* or more.

Instead, you could make a loop and do the following:

In [None]:
output <- vector("double", ncol(df))  # 1. output
for (i in seq_along(df)) {            # 2. sequence
  output[[i]] <- mean(df[[i]])        # 3. body
}
output

Every for loop has three components:

**The output**: `output <- vector("double", ncol(df))`. Before you start the loop, you must always allocate sufficient space for the output. This is very important for efficiency: if you grow the output at each iteration using `c()` (for example), your for loop will be very slow.

A general way of creating an empty vector of given length is the vector() function. It has two arguments: the type of the vector (“logical”, “integer”, “double”, “character”, etc) and the length of the vector.
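
As a small illustration of preallocating with `vector()`, here is a minimal sketch. The name `squares` is made up for this example and is not used elsewhere in the prelab.

```r
# vector() returns a vector of the requested type and length,
# filled with that type's "empty" value
vector("logical", 3)    # FALSE FALSE FALSE
vector("integer", 3)    # 0 0 0
vector("double", 3)     # 0 0 0
vector("character", 3)  # "" "" ""

# Preallocate the output first, then fill it in by position
squares <- vector("double", 5)
for (i in seq_along(squares)) {
    squares[[i]] <- i^2
}
squares
```

Because `squares` already has its final length before the loop starts, R does not have to copy and extend the vector at every iteration.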

**The sequence**: `i in seq_along(df)`. This determines what to loop over: each run of the for loop will assign `i` to a different value from `seq_along(df)`. It’s useful to think of i as a pronoun, like “it”.

You might not have seen `seq_along()` before. It’s a safe version of the familiar `1:length(l)`, with an important difference: if you have a zero-length vector, `seq_along()` does the right thing:

    y <- vector("double", 0)
    seq_along(y)
    #> integer(0)
    1:length(y)
    #> [1] 1 0

You probably won’t create a zero-length vector deliberately, but it’s easy to create them accidentally. If you use `1:length(x)` instead of `seq_along(x)`, you’re likely to get a confusing error message.
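
To see why this matters inside a loop, here is a short sketch that counts how many times the body runs for a zero-length vector. The names `empty`, `n_safe`, and `n_unsafe` are made up for this illustration.

```r
empty <- vector("double", 0)

# With seq_along(), the sequence is empty, so the body never runs
n_safe <- 0
for (i in seq_along(empty)) {
    n_safe <- n_safe + 1
}

# With 1:length(), the sequence is c(1, 0), so the body runs twice
n_unsafe <- 0
for (i in 1:length(empty)) {
    n_unsafe <- n_unsafe + 1
}

n_safe    # 0
n_unsafe  # 2
```

In a real loop, those two unexpected iterations would try to work with elements of `empty` that do not exist.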

**The body**: `output[[i]] <- mean(df[[i]])`. This is the code that does the work. It’s run repeatedly, each time with a different value for `i`. The first iteration will run `output[[1]] <- mean(df[[1]])`, the second will run `output[[2]] <- mean(df[[2]])`, and so on.

Let's look at another example:

In [None]:
seq <- c(1,2,3,4,5)
for (i in seq_along(seq)) {
    print(paste("The number is:", seq[i]))
}

Here, again, you have the *sequence* you want to iterate through and the *body* of the code. This time there is no preallocated *output* vector: the output is simply printed at each iteration.

Let's look at another example.

In [None]:
total <- 0
seq_too <- c(1,1,2,3,5)
for (i in seq_along(seq_too)) {
    total <- total+seq_too[i]
}
print(total)

You can see here that the goal of this bit of code is to sum (accumulate) the total. Of course, you could have gotten the same thing with `sum(seq_too)`, so it's always important to remember that in R, if you can use a vectorized function rather than a loop, you should, because it's faster.
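
To make the comparison concrete, here is a sketch that puts the loop and a vectorized call side by side. The names `fib` and `small_df` are made up for this illustration; `fib` holds the same values as `seq_too`.

```r
fib <- c(1, 1, 2, 3, 5)

# Loop version: accumulate one element at a time
loop_total <- 0
for (i in seq_along(fib)) {
    loop_total <- loop_total + fib[i]
}

# Vectorized version: one function call
vec_total <- sum(fib)

loop_total  # 12
vec_total   # 12

# The same idea applies to the column means from earlier:
# sapply() applies mean() to each column of a data frame
small_df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
col_means <- sapply(small_df, mean)
col_means
```

Both versions give the same answer; the vectorized one is shorter and leaves less room for indexing mistakes.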

Let's look at yet another example:

In [None]:
total <- 0
for (i in seq_along(seq)) {
    for (j in seq_along(seq_too)) {
        total <- total+seq[i]*seq_too[j]
    }
}
print(total)

You can see here that we've *nested* a second loop within the first, and we are now accumulating the sum of the products of each element in `seq` and `seq_too`.
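
One way to check a nested loop like this is to compare it against a vectorized calculation. The sketch below uses the base R function `outer()`, which builds the full table of pairwise products. The names `a`, `b`, `nested_total`, and `outer_total` are made up for this illustration; `a` and `b` hold the same values as `seq` and `seq_too`.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(1, 1, 2, 3, 5)

# Nested loop: visit every (i, j) pair and add up the products
nested_total <- 0
for (i in seq_along(a)) {
    for (j in seq_along(b)) {
        nested_total <- nested_total + a[i] * b[j]
    }
}

# Vectorized check: outer(a, b) is the 5 x 5 matrix of a[i] * b[j]
outer_total <- sum(outer(a, b))

nested_total     # 180
outer_total      # 180
sum(a) * sum(b)  # 180, since the sum of all pairwise products factors
```

The inner loop runs in full for each step of the outer loop, so the body executes 5 x 5 = 25 times here.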

You can also use loops to *modify* content. Let's say we wanted to center `seq_too` by subtracting its mean from each element. You could do that via:

In [None]:
seq_too <- c(1,1,2,3,5)
seq_mean <- mean(seq_too)  # compute once, before the loop changes seq_too
for (i in seq_along(seq_too)) {
    seq_too[i] <- seq_too[i] - seq_mean
}
}
seq_too
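
For comparison, here is a sketch of the same centering done both with a loop and with vectorized arithmetic. The names `vals`, `vals_mean`, `centered_loop`, and `centered_vec` are made up for this illustration.

```r
vals <- c(1, 1, 2, 3, 5)
vals_mean <- mean(vals)  # 2.4, computed once from the original values

# Loop version: modify a copy one element at a time
centered_loop <- vals
for (i in seq_along(centered_loop)) {
    centered_loop[i] <- centered_loop[i] - vals_mean
}

# Vectorized version: subtract the mean from every element at once
centered_vec <- vals - mean(vals)

all.equal(centered_loop, centered_vec)  # TRUE
```

Note that the mean is stored before the loop starts. Calling `mean()` on the vector inside the loop body would use partially modified values and give a different result.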

Take a look at the code below. Is there something wrong? If so, what?

In [None]:
seq_foo <- c(1,2,3,4)
seq_too <- c(1,1,2,3,5)
for (i in seq_along(seq_too)) {
    print(paste("My dog has", seq_foo[i], "fleas!"))
}

Your Answer Below: