In the previous files, we wrote code that ran sequentially, one line after the other. This is the "default" mode in which code is executed in R and other programming languages.

![image.png](attachment:image.png)

As data scientists, however, we won't always want our code to run sequentially. We'll want to perform a calculation only if a condition is met, or to repeatedly apply a function to elements of an object.

In this file, we'll use [control structures](https://en.wikipedia.org/wiki/Control_flow), methods for directing the order in which operations are performed, to manipulate and analyze [FiveThirtyEight's data](https://www.kaggle.com/fivethirtyeight/world-cup) on the [2014 FIFA World Cup](https://en.wikipedia.org/wiki/2014_FIFA_World_Cup).

The data set provides information about when the matches were scheduled, the countries that played in each match, the goals scored, and the teams that won.

Have you ever heard of ["home-field advantage"](https://en.wikipedia.org/wiki/Home_advantage), a common excuse when a friend's favorite sports team loses when playing an away match? In this file, we'll explore the 2014 World Cup data to see if home teams had an advantage over away teams.

First, let's start by importing the data into R using the `read_csv()` function from the `readr` package. We can store the imported data as a **data frame**, a data structure consisting of vectors of equal lengths, by assigning it to a variable name like below:

`new_data_frame <- read_csv("filename.csv")`

When you use the `read_csv()` function, R will return a message letting you know the data types it assigned to each column. This is not an error message, so don't be concerned when you see it!

Next, let's import the 2014 World Cup data set into R as a data frame.

`library(readr)
scores <- read_csv("scores.csv")`

We imported data on the 2014 FIFA World Cup into a data frame, the most common data structure we'll use for analyzing data in R. 
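To make the idea of a data frame as a set of equal-length vectors concrete, here is a minimal sketch that builds a tiny one by hand with base R's `data.frame()`. The name `toy_scores`, the team labels, and the goal counts are made up for illustration; they are not the World Cup data.

```r
# Hypothetical stand-in data: three matches, one vector per column.
home_country <- c("Team A", "Team C", "Team E")
away_country <- c("Team B", "Team D", "Team F")
home_goals <- c(3, 1, 0)
away_goals <- c(1, 0, 2)

# Every vector has the same length, so each one becomes a column.
toy_scores <- data.frame(home_country, away_country, home_goals, away_goals)

print(nrow(toy_scores))  # number of rows (matches)
print(ncol(toy_scores))  # number of columns (variables)
```

A data frame created by `read_csv()` can be inspected the same way with `nrow()` and `ncol()`.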

Our goal is to add a new variable to the data frame that provides information about whether the home team won each match. To get some practice first, let's work with just the first two rows of the `scores` data frame and follow the steps below:

1. Analyze the data to determine whether the home team won each match.
2. Create a vector containing either "home team won" or "home team did not win" for each row based on the analysis.
3. Add this vector to the data frame as a new column.

First, let's assign the first two rows of `scores` to a data frame named `scores_two`:

`scores_two <- scores[1:2,]`

![image.png](attachment:image.png)

In the first row, the value in the `home_goals` column equals 3 and the value in the `away_goals` column equals 1, so we can determine that the home team won. Likewise, in the second row, the `home_goals` value equals 1 and the `away_goals` value equals 0, so the home team also won this match. Let's create a new vector containing two "home team won" values indicating that the home team won both matches in the `scores_two` data frame:

`home_team_won <- c("home team won", "home team won")`

Finally, let's add the `home_team_won` vector to the `scores_two` data frame as a new column. Recall that we can use the `mutate()` function from the `dplyr` package (loaded with `library(dplyr)`) to add the new column as follows:

`scores_two <- scores_two %>% mutate(home_team_won = home_team_won)`
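For comparison, the same column can also be added without `dplyr`, using base R's `$` assignment. The sketch below swaps in that base R technique and uses a small stand-in data frame named `toy_two` (a hypothetical name) holding only the goal counts described above for the first two matches.

```r
# Hypothetical stand-in for the first two matches (goal counts from the text above).
toy_two <- data.frame(home_goals = c(3, 1), away_goals = c(1, 0))
home_team_won <- c("home team won", "home team won")

# Base R alternative to mutate(): assign the vector to a new column name.
toy_two$home_team_won <- home_team_won

print(toy_two)
```

Either approach requires the new vector to have one value per row of the data frame.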

Next, we'll perform the same exercise with the first five rows of the `scores` data frame below:

`scores_five <- scores[1:5,]
home_team_won <- c("home team won","home team won","home team did not win","home team won","home team won")
scores_five <- scores_five %>% mutate(home_team_won = home_team_won)`

In the above exercise, we visually compared the home team and away team goals, created a new vector containing information on whether the home team won each match, and added the vector to the data frame. However, not only is this approach time-consuming, but it's also prone to mistakes.

Instead, we can automate the comparison of home team and away team goals by including a type of selection control structure called a **conditional statement** in our code. That way, our code will return "home team won" only if the home team has won the match.

![image.png](attachment:image.png)

In this case, the **condition** will be the home team scoring more goals than the away team. A condition is a statement that resolves to a logical value, either `TRUE` or `FALSE`. We can express conditions using comparison operators such as `>`, `<`, and `==`. To express the condition in this case, we would write `home_goals > away_goals`.
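As a quick illustration of a condition resolving to a logical value, the sketch below compares two standalone variables. The values are illustrative and are not read from the data set.

```r
home_goals <- 3  # illustrative values, not read from the data set
away_goals <- 1

# The comparison itself produces a logical value that can be stored.
home_win <- home_goals > away_goals
print(home_win)
print(class(home_win))

# Swapping the two sides flips the result.
print(away_goals > home_goals)
```

Here `home_win` holds `TRUE`, its class is `"logical"`, and the reversed comparison gives `FALSE`.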

The **action** will be printing "home team won". In this file, we'll use the `print()` function to display output generated by our code. This will allow us to make sure our code is performing the operations we want it to. If we type the following code:

`print("this is some text")`

The following will appear (R prefixes printed values with the index of the first element shown, here `[1]`):

`[1] "this is some text"`

To return "home team won" if `home_goals > away_goals`, we can use a type of conditional statement called an **if statement**. An if statement is used to write code to perform an operation only if the specified condition is `TRUE`.

Let's write an if statement to return "home team won" if `home_goals > away_goals` during the first match.

Remember, we can index a data frame to specify the column (`$`) and the element of the column (`[]`).

`if (scores$home_goals[1] > scores$away_goals[1]) {
  print("home team won")
}`

`[1] "home team won"`

Since the number of goals scored in the first match by Brazil, the home team, is larger than the number scored by Croatia, the away team, the code returns `"home team won"`.

What about when the home team loses, though? In the example above, if the away team scored more goals than the home team, nothing would be returned.
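To see this behavior in isolation, here is a small sketch with illustrative values in which the condition is `FALSE`. The variable `message_printed` is a hypothetical flag added only to make the skipped body visible.

```r
home_goals <- 0  # illustrative values where the home team scores fewer goals
away_goals <- 2

message_printed <- FALSE
if (home_goals > away_goals) {
  print("home team won")
  message_printed <- TRUE
}

# The body of the if statement was skipped, so the flag is unchanged.
print(message_printed)
```

Nothing is printed by the if statement itself; only the final `print()` call produces output.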

We can incorporate into our code an alternative operation to be performed if the condition, `scores$home_goals[1] > scores$away_goals[1]`, is not met.

![image.png](attachment:image.png)

To return "home team did not win" if the home team did not score more goals than the away team, we can add a type of conditional statement called an **else statement** to our code. Conditional statements that include both if and else statements are referred to as **if-else statements**.

`if (scores$home_goals[1] > scores$away_goals[1]) {
  print("home team won")
} else {
  print("home team did not win")
}`

`[1] "home team won"`

Notice the way we indented the two print statements above, but not the other lines of code. With control structures, it is [considered good style](https://google.github.io/styleguide/Rguide.html) to write blocks of code that will be executed together at the same indentation level, because this keeps the code readable. R executes all lines of code between curly braces together, but visually matching up each opening brace with its closing brace can be tricky. Indents, on the other hand, do not affect how the code is executed but are easy to keep track of visually. For that reason, indenting all code that appears between braces as a block helps keep everything understandable.
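The else branch can be exercised with a small sketch that uses illustrative values rather than the data set. A draw is chosen here on purpose: the condition `home_goals > away_goals` is `FALSE` for a draw, so it falls into the else branch just like a loss. Storing the text in `result` (a hypothetical variable) before printing is only a convenience for checking.

```r
home_goals <- 1  # illustrative values: a draw, so the home team did not win
away_goals <- 1

if (home_goals > away_goals) {
  result <- "home team won"
} else {
  result <- "home team did not win"
}

print(result)
```

This is why the second message is worded "home team did not win" rather than "home team lost".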

Let's write an if-else statement to return a result telling us whether the home team won or lost the third match.

**Task**

* Write an if-else statement to return:
 1. "home team won" if the number of goals scored by the home team is greater than the number of goals scored by the away team in the third match
 2. "home team did not win" if the condition specified above is not met
 
**Answer**

`if (scores$home_goals[3] > scores$away_goals[3]) {
  print("home team won")
} else {
  print("home team did not win")
}`

We've now successfully used selection control structures in our code.

Let's return to our initial goal of adding a new variable to the World Cup data set that provides information about whether the home team won or lost each match. To return "home team won" or "home team did not win" for each match, we could write an if-else expression for each match:

![image.png](attachment:image.png)

However, this would be horribly inefficient. Generally, in programming, it's best to minimize repetition in our code as much as possible. If we find ourselves copying and pasting blocks of code several times, it's time to seek a better solution.

To address the problem of inefficiently copying and pasting blocks of code, let's learn about a type of control structure for repetition: [for-loops](https://en.wikipedia.org/wiki/For_loop).

For-loops perform an operation a given number of times, enabling us to execute a piece of code repeatedly on elements in a sequence.

![image.png](attachment:image.png)

First, let's look at a few examples to understand how for-loops work and practice writing them. Then, we'll learn to use a for-loop to create a new variable containing information about the home team's performance in each match.

As a first example, let's write a for-loop to print every number in a sequence of numbers from one to 10.

`for (i in 1:10) {
  print(i)
}`

The index variable `i` represents an element of a sequence. We can read the code above as, "For every element in the sequence of the numbers one to 10, print the element".

![image.png](attachment:image.png)

We can use any variable name we want instead of `i` for the index, and should consider using a name that describes what the variable actually represents to make complex code more readable.

The output is:

![image.png](attachment:image.png)
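As a sketch of the naming advice above, the loop below iterates over a made-up character vector and uses `team` instead of `i` as the index variable. The vector `teams` and the helper vector `visited` are hypothetical and exist only for this illustration.

```r
# Hypothetical vector of team labels, used only for illustration.
teams <- c("Team A", "Team B", "Team C")

visited <- character(0)
for (team in teams) {
  print(team)
  visited <- c(visited, team)  # record the order of iteration
}
```

The loop visits the elements in order, so `visited` ends up identical to `teams`.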

Next, let's write a for-loop to print the date of each match in the World Cup `scores` data frame:

`for (d in scores$match_date) {
  print(d)
}`

We can read this statement as, "for each element in the `match_date` column of the `scores` data frame, print the element".

Notice we used the variable name `d` instead of `i` to remind ourselves it's a date. We could have used the word `date` as the variable, but be careful about using common words as variable names; if R already has a variable or function with that name, it can lead to problems later on. For example, assigning the maximum of a vector to the variable `max` is a bad idea: the name would then refer both to our value and to the built-in function `max()`, which makes the code harder to read and can cause confusing behavior later on.

The first few rows of the output are:

![image.png](attachment:image.png)

Let's practice writing a for-loop to print elements of the `scores` data frame.

**Task**

* Write a for-loop to print each element in the `home_country` column of the `scores` data frame.

**Answer**

`for (i in scores$home_country) {
  print(i)
}`

When we write a for-loop, the elements we specify can be values, vectors, lists, or other data structures. Since we are working with a data frame (`scores`), let's write a for-loop to execute an operation on elements that are rows of the data frame.

The for-loop will calculate the total number of goals (`away_goals + home_goals`) for each match.

In the `scores` data frame, each match has its own row. Since we want to perform the addition operation for each row of the data frame, the first part of the for-loop will consist of defining `i` as an element of the sequence of numbers from one to the number of rows in the data frame. Here's how we'll specify the sequence:

`for (i in 1:nrow(scores))`

In the code above, `nrow(scores)` returns the number of rows in the `scores` data frame. Since `scores` has 59 rows, we can read it as, "for each element in the sequence one to 59".

We could achieve the same result by writing `for (i in 1:59)`. However, in programming, it's good practice to refer to data objects instead of including numbers in our code. This ensures that our code makes sense to us and others when we look at it later on, and that the code will be useful in the future if changes to data structures are made.

To write the rest of the for-loop, we'll index `home_goals` and `away_goals` by `i` to add the two values together for each row:
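The benefit of referring to the data object can be sketched with a small stand-in data frame. The name `toy` and its values are hypothetical, and the two counters exist only to show how many times each loop runs.

```r
# Hypothetical stand-in data frame with two matches.
toy <- data.frame(home_goals = c(3, 1), away_goals = c(1, 0))

iterations_before <- 0
for (i in 1:nrow(toy)) {
  iterations_before <- iterations_before + 1
}

# Add a third match; the loop header below does not need to change.
toy <- rbind(toy, data.frame(home_goals = 0, away_goals = 2))

iterations_after <- 0
for (i in 1:nrow(toy)) {
  iterations_after <- iterations_after + 1
}

print(iterations_before)
print(iterations_after)
```

With a hard-coded `1:2`, the second loop would have silently skipped the new row.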

`for (i in 1:nrow(scores)) {
  print(scores$home_goals[i] + scores$away_goals[i])
}`

![image.png](attachment:image.png)

Using the `print()` function to display the results is a useful tool as we learn to write for-loops so we can make sure the loop is performing the way we want it to.

The first few lines of output are:

![image.png](attachment:image.png)
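Because the real output depends on `scores.csv`, it can help to trace the same loop on a tiny data frame where the totals are easy to check by hand. In this sketch, `toy_scores` and its values are made up, and storing each total in the vector `total_goals` is an addition for checking the results, not part of the loop shown earlier.

```r
# Hypothetical stand-in data: three matches with made-up goal counts.
toy_scores <- data.frame(home_goals = c(3, 1, 0), away_goals = c(1, 0, 2))

total_goals <- numeric(nrow(toy_scores))  # one slot per match
for (i in 1:nrow(toy_scores)) {
  total_goals[i] <- toy_scores$home_goals[i] + toy_scores$away_goals[i]
  print(total_goals[i])
}
```

The printed totals are 4, 1, and 2, one per row, which matches adding the two columns by hand.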

**Task**

Write a for-loop to subtract `away_goals` from `home_goals` for each match in `scores`. In our for-loop, include code to print the results of our calculations.

**Answer**

`for (i in 1:nrow(scores)) {
  print(scores$home_goals[i] - scores$away_goals[i])
}`

Control structures are powerful tools, and can do even more when combined. For example, the actions repeated by a for-loop don't need to be sequential code; they could be conditional statements. This pattern of executing one or more control structures inside another one is called **nesting**.

Originally, we wanted to create a new variable that provides information about whether the home team won or lost each match.

We've already written an expression, using an if-else statement, that produces different output depending on whether the home team won or lost a specific match:

![image.png](attachment:image.png)

Now, we'll perform that operation for each match in the `scores` data frame, which we already know how to do: Use a for-loop! To do this, we'll use an index variable that ranges from one to the number of rows in the data frame, as we did earlier. Then, we'll apply the if-else statement to each element of that sequence.

The syntax is similar to what we've seen so far: we'll place the if-else statement between the braces (`{}`) in the for-loop:

![image.png](attachment:image.png)

The first few lines of the output are:

![image.png](attachment:image.png)

Nested control structures are very powerful, but can be tricky to get used to and can make finding errors in your code harder. The workflow in the previous examples is a good way to avoid problems when writing nested control structures: Start by writing the inner operation and make sure it works correctly, and then wrap the outer operation around it.
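The inside-out workflow described above can be sketched on a small stand-in data frame. The name `toy_scores` and its values are hypothetical, and collecting the messages in the vector `outcomes` is an addition that makes the results easy to check; it is not part of the original loop.

```r
# Hypothetical stand-in data: three matches with made-up goal counts.
toy_scores <- data.frame(home_goals = c(3, 1, 0), away_goals = c(1, 0, 2))

# Step 1: write the inner if-else and test it on a single row.
i <- 1
if (toy_scores$home_goals[i] > toy_scores$away_goals[i]) {
  print("home team won")
} else {
  print("home team did not win")
}

# Step 2: wrap the same if-else in a for-loop over every row.
outcomes <- character(nrow(toy_scores))
for (i in 1:nrow(toy_scores)) {
  if (toy_scores$home_goals[i] > toy_scores$away_goals[i]) {
    outcomes[i] <- "home team won"
  } else {
    outcomes[i] <- "home team did not win"
  }
  print(outcomes[i])
}
```

Only the third made-up match has fewer home goals than away goals, so it is the only row that reaches the else branch.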

**Task**

Write a for-loop that, for each row in the `scores` data frame, prints `TRUE` if `home_goals` is greater than `away_goals` and `FALSE` if not.

**Answer**

`for (i in 1:nrow(scores)) {
  if (scores$home_goals[i] > scores$away_goals[i]) {
    print(TRUE)
  } else {
    print(FALSE)
  }
}`
