# Control flow Part 1
---

Welcome back! Today we will talk about control flow in R. So, what is control flow, and why do we need it?

Usually, R executes our code sequentially, from the top line to the bottom line. But sometimes we want to run some code only if particular conditions are met, or we want to do the same operation for each element of a vector, matrix, or other data structure. In these cases, we need control flow.

Today, we will discuss the `for` loop, which some people call iteration.

Here is the syntax.

```R
for (value in sequence) {
  statement1
  statement2
  ...
}
```
***Tips***: If you use RStudio or another IDE, you probably do not need to worry about spacing. If not, remember to put a space after `for` and a space between `)` and `{`, and indent every statement inside the curly braces by two spaces. R itself does not require these spaces, but they make your code much easier to read.

Let's see some examples.

```R
for (i in 1:5) {
  print("Hello!")
}
```

```
[1] "Hello!"
[1] "Hello!"
[1] "Hello!"
[1] "Hello!"
[1] "Hello!"
```


By using `for`, we printed "Hello!" five times. You might think this loop is boring, and you are right: it is boring.
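In the loop above, the loop variable `i` is never used, so the body simply repeats. As a small sketch (the names `greetings` and `fruits` are just illustrative), here the loop variable is used inside the body, and the second loop shows that the sequence can be any vector, not only `1:5`:

```r
# i takes the values 1, 2, 3, 4, 5 in turn
greetings <- character(5)  # pre-allocate an empty character vector of length 5
for (i in 1:5) {
  greetings[i] <- paste("Hello number", i)  # use i to build and store each string
}
print(greetings)

# The sequence can be any vector, for example a character vector
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(toupper(fruit))  # fruit holds one element of fruits on each pass
}
```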

We can use a `for` loop to do something more useful. First, we need to build a data frame.

```R
# Set the seed first so that our random numbers are the same every time
set.seed(8888)
a <- rnorm(5)
b <- rnorm(5)
c <- rnorm(5)
d <- rnorm(5)
df <- data.frame(a, b, c, d)
```

```R
df
```

| a | b | c | d |
|---|---|---|---|
| 0.19575208 | -0.7085771 | -0.51737215 | 1.0634396 |
| 0.97601196 | -0.9257923 | -0.39114343 | 0.3614407 |
| -0.03688087 | 1.4530496 | -0.63996059 | -0.9772762 |
| -0.02436063 | -0.156897 | 0.09916379 | 0.8798711 |
| 0.02211168 | -0.4370462 | 0.08867925 | 0.2600235 |


We created a data frame with five rows and four columns. Each column is drawn from the standard normal distribution. If you are not familiar with the normal distribution, do not worry; we will cover it when we talk about statistics.
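A quick sanity check, sketched below: `rnorm()` defaults to `mean = 0` and `sd = 1`, which is exactly the standard normal distribution, and `dim()`, `nrow()`, and `ncol()` confirm the shape of the data frame. The variable names `x1` and `x2` are only for illustration.

```r
set.seed(8888)
x1 <- rnorm(5)                    # uses the defaults mean = 0, sd = 1
set.seed(8888)
x2 <- rnorm(5, mean = 0, sd = 1)  # the same call with the defaults written out
identical(x1, x2)                 # TRUE: same seed, same distribution

set.seed(8888)
df <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))
dim(df)   # number of rows, then number of columns: 5 4
nrow(df)  # 5
ncol(df)  # 4
```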

Now, what if you want to know the mean of each column? We have already learned how to select a column from a data frame, so you could do it like this:

```R
# Calculate the mean of each column, one column at a time
mean(df$a)
mean(df$b)
```

Or, you may know that the built-in function `colMeans()` can solve this problem for every column at once.

```R
colMeans(df)
```

***Tips***: Try the following functions by yourself: `colMeans()`, `colSums()`, `rowMeans()`, and `rowSums()`.
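Here is a quick sketch of those four functions on a small made-up data frame `m`, chosen so the results are easy to check by hand. Watch the spelling: the sum functions are `colSums()` and `rowSums()`. Base R's lowercase `rowsum()` is a different function that sums rows within groups.

```r
# Made-up example data: 3 rows, 2 columns
m <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

colMeans(m)  # mean of each column: x = 2, y = 5
colSums(m)   # sum of each column:  x = 6, y = 15
rowMeans(m)  # mean of each row:    2.5, 3.5, 4.5
rowSums(m)   # sum of each row:     5, 7, 9
```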

Calculating the mean is easy, but what about the median, standard deviation, or kurtosis?

You could still calculate them column by column, but when your data frame has 100 columns, this approach becomes tedious. This is where a `for` loop helps.


```R
# seq_along(df) gives 1, 2, ..., ncol(df); df[[i]] extracts column i as a vector
for (i in seq_along(df)) {
  print(median(df[[i]]))
}
```

```
[1] 0.02211168
[1] -0.4370462
[1] -0.3911434
[1] 0.3614407
```


See, this `for` loop calculates the median of each column automatically. Imagine how easy it is to calculate the median of every column in a data frame with 100 or more columns.
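The loop above only prints its results. A common next step is to store them. As a sketch (the helper `kurt()` and the result table `stats` are illustrative names, not part of base R), this loop fills in the median, standard deviation, and kurtosis of every column. Base R has `median()` and `sd()` but no kurtosis function, so `kurt()` assumes the simple moment-based formula m4 / m2^2; other definitions exist, such as excess kurtosis, which subtracts 3.

```r
# Hypothetical helper: moment-based sample kurtosis, m4 / m2^2
# (excess kurtosis would subtract 3 from this value)
kurt <- function(x) {
  dev <- x - mean(x)
  mean(dev^4) / mean(dev^2)^2
}

# Same kind of data as in the tutorial: four standard normal columns
set.seed(8888)
df <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

# Pre-allocate one result row per column of df, then fill it in the loop
stats <- data.frame(column   = names(df),
                    median   = NA_real_,
                    sd       = NA_real_,
                    kurtosis = NA_real_)

for (i in seq_along(df)) {
  x <- df[[i]]                      # column i as a numeric vector
  stats$median[i]   <- median(x)
  stats$sd[i]       <- sd(x)
  stats$kurtosis[i] <- kurt(x)
}

print(stats)
```

Because the loop runs over `seq_along(df)`, the same code works unchanged whether `df` has 4 columns or 100.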

There is a trick in this `for` loop. Try the following code on your own.

```R
# Check the warning message.
# Can you figure out the difference among df[[1]], df$a, and df[1]?
for (i in seq_along(df)) {
  print(mean(df[i]))
}
```

```
“argument is not numeric or logical: returning NA”
[1] NA
“argument is not numeric or logical: returning NA”
[1] NA
“argument is not numeric or logical: returning NA”
[1] NA
“argument is not numeric or logical: returning NA”
[1] NA
```
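If you get stuck on the exercise, this small sketch with a made-up data frame shows the difference: `[[ ]]` and `$` return the column itself as a vector, while single brackets `[ ]` return a one-column data frame, which `mean()` does not accept.

```r
# Made-up data frame for illustration
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

class(df[[1]])  # "numeric": the column as a vector
class(df$a)     # "numeric": the same vector, selected by name
class(df[1])    # "data.frame": a data frame with one column

identical(df[[1]], df$a)  # TRUE
mean(df[[1]])             # 2
mean(df[1])               # NA, with the same warning as in the exercise

# Fixed loop: use [[i]] to get each column as a vector
for (i in seq_along(df)) {
  print(mean(df[[i]]))
}
```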


OK, that's it for the `for` loop. Next time, we will talk about the `while` loop.

See you guys next time!