A **conditional statement** executes one block of code if a condition is true and a different block of code if the condition is false.

A conditional statement requires a condition: an expression that evaluates to either `TRUE` or `FALSE`. Two commonly used constructs for writing conditional statements are:
 - `if(){}`, an if statement, means "execute this R code when the condition is met".
 - `if(){} else{}`, an if/else statement, means "execute the first block of R code when the condition is met; otherwise, execute the second block".
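
As a minimal sketch of both forms, the example below uses a hypothetical `bmi` value that is not taken from the dataset. Note that `if` expects a single `TRUE`/`FALSE` value as its condition.

```r
# Hypothetical value, used only for illustration
bmi <- 27.3

# if statement: the code in braces runs only when the condition is TRUE
if (bmi >= 25) {
    print("BMI is at least 25")
}

# if/else statement: exactly one of the two blocks runs
if (bmi < 25) {
    bmi_group <- "Normal"
} else {
    bmi_group <- "Overweight"
}

print(bmi_group)
```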

There are six comparison operators that are used to create these boolean values.
- `==` means "equals".
- `!=` means "not equal".
- `<` means "less than".
- `>` means "greater than".
- `<=` means "less than or equal to".
- `>=` means "greater than or equal to".
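
To illustrate, here is a short sketch that applies a few of these operators to a hypothetical vector of ages (the `ages` vector is made up for this example). Each comparison is applied element by element and returns a boolean vector of the same length.

```r
# Hypothetical ages, used only for illustration
ages <- c(28, 33, 25, 26)

is_25       <- ages == 25   # FALSE FALSE  TRUE FALSE
not_25      <- ages != 25   #  TRUE  TRUE FALSE  TRUE
under_26    <- ages < 26    # FALSE FALSE  TRUE FALSE
at_least_28 <- ages >= 28   #  TRUE  TRUE FALSE FALSE

print(is_25)
print(at_least_28)
```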

There are also three logical operators that are used to create these boolean values.
- `&` means "and".
- `|` means "or".
- `!` means "not".
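
The logical operators combine or invert boolean values. The sketch below assumes two hypothetical vectors, `age` and `bmi`, that are not read from the dataset.

```r
# Hypothetical values, used only for illustration
age <- c(28, 33, 25, 42)
bmi <- c(37.67, 35.0, 18.7, 34.6)

# TRUE only where both conditions are TRUE
over_30_and_high_bmi <- age > 30 & bmi >= 25   # FALSE  TRUE FALSE  TRUE

# TRUE where at least one condition is TRUE
over_30_or_high_bmi <- age > 30 | bmi >= 25    #  TRUE  TRUE FALSE  TRUE

# ! flips TRUE to FALSE and FALSE to TRUE
not_high_bmi <- !(bmi >= 25)                   # FALSE FALSE  TRUE FALSE

print(over_30_and_high_bmi)
```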

We'll be using a dataset of subject demographic data that includes each subject's race. Statistical tests are often run across racial groups, but there are frequently too few subjects in the non-White racial groups to run these tests. Therefore, race will be dichotomized into White and non-White subjects to increase statistical power.

1. How many White and non-White subjects are there?

In [5]:
# Let's start by reviewing a conditional statement: Is the subject's race White?
subject_data$Race == 'W'

We can see that R output a boolean vector (i.e., a vector containing TRUE/FALSE values), where TRUE values correspond to subjects who are White and FALSE values correspond to subjects who are non-White.

In [9]:
# Number of White subjects
sum(subject_data$Race == 'W')

In [10]:
# Number of non-White subjects
length(subject_data$Race) - sum(subject_data$Race == 'W')
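
The counts above work because `sum()` treats `TRUE` as 1 and `FALSE` as 0. As an alternative sketch, the dichotomized variable can be built with `ifelse()` and tabulated with `table()`. The `race` vector and the "White"/"non-White" labels below are assumptions chosen for illustration; they are not columns that exist in the dataset.

```r
# Hypothetical race codes, used only for illustration
race <- c("W", "O", "W", "W", "As", "AA")

# Counting non-White subjects directly with the "not equal" operator
n_non_white <- sum(race != "W")

# Creating a dichotomized variable and counting both groups at once
race_dichotomized <- ifelse(race == "W", "White", "non-White")
race_counts <- table(race_dichotomized)

print(n_non_white)
print(race_counts)
```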

# Improving Coding Efficiency

This training module was developed by Alexis Payton, Dr. Kyle R. Roell, and Dr. Julia E. Rager.

Spring 2023

## Introduction to Training Module

Coding efficiency involves performing a task in as few lines of code as possible, which can:
- eliminate redundancies
- reduce the number of typos
- help other coders understand the script

In this module, we'll explore how functions and loops are often used to make code more succinct. As a brief overview, a **function** contains a block of code organized together to perform one specific task, while a **loop** is employed when we want to perform a repetitive task.
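
To make the idea of a function concrete, here is a minimal sketch. The function name `count_in_group` and its arguments are hypothetical and chosen only for illustration; the function wraps one specific task (counting how many values match a target) so that it can be reused without retyping the code.

```r
# A hypothetical helper that counts how many elements of `values` equal `target`
count_in_group <- function(values, target) {
    sum(values == target)
}

# Reusing the same function for different targets
example_race <- c("W", "O", "W", "AA")
n_w <- count_in_group(example_race, "W")
n_o <- count_in_group(example_race, "O")

print(n_w)
print(n_o)
```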

Let's start with loops. There are three main types of loops in R: `for`, `while`, and `repeat`. However, we're only going to discuss the `for` loop in this module. For more information on the others and on loops in general, click [here](https://intro2r.com/loops.html).

A `for` loop is used when we want to specify the number of times we'd like R to repeat a task. We'll load in our data and then explore how a `for` loop works.


### Installing required R packages
If you already have the required package installed, you can skip this step, or you can run the code below, which checks the installation status for you.

In [1]:
if (!requireNamespace("readxl"))
  install.packages("readxl");

Loading required namespace: readxl



### Loading required R packages

In [2]:
library(readxl)

### Set your working directory

In [None]:
setwd("/filepath to where your input files are")

### Importing example dataset
This example dataset contains subject demographic data, including serum cotinine concentrations used to confirm a subject's tobacco product use group (i.e., non-smoker, e-cigarette user, or cigarette smoker). Let's load and view these data:

In [25]:
# Load the data
subject_data <- data.frame(read_excel("Input/ModuleData1.xlsx"))

# Creating a smaller dataframe for easier viewing
smaller_subject_data <- subject_data[1:10,]

# View the top of the dataset
head(subject_data) 

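| | SubjectNo | Group | SubjectID | Race | Ethnicity | Sex | Age | BMI | Serum_Cotinine |
|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 1 | NS | NS_1 | W | NH | F | 28 | 37.67 | 0 |
| 2 | 2 | NS | NS_2 | O | H | F | 33 | 35.0 | 0 |
| 3 | 3 | NS | NS_3 | W | NH | F | 25 | 18.7 | 0 |
| 4 | 4 | NS | NS_4 | W | NH | F | 26 | 23.0 | 0 |
| 5 | 5 | NS | NS_5 | As | NH | F | 25 | 24.7 | 0 |
| 6 | 6 | NS | NS_6 | AA | NH | F | 42 | 34.6 | 0 |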


## Training Module's Environmental Health Questions 
This training module was specifically developed to answer the following environmental health questions or tasks:
1. Convert serum cotinine concentrations from picograms per milliliter (pg/mL) to grams per milliliter (g/mL).
2. Dichotomize subjects into 2 groups: those with a normal BMI (less than 25) and those with an overweight BMI (at least 25).

In [11]:
# Basic structure of a for loop
for (i in 1:4){
    print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4


In the above code block, the loop starts with `for`, followed by a statement in parentheses. The statement `i in 1:4` tells R to repeat the code in the curly brackets 4 times, once for each value in `1:4`. On each iteration, R prints the value of the iterator `i`, which takes the values 1, 2, 3, and then 4.
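
The iterator does not have to be a number. As a small sketch, a `for` loop can step directly through the elements of any vector; the `groups` vector below is hypothetical and used only for illustration.

```r
# Hypothetical group labels, used only for illustration
groups <- c("A", "B", "C")

# Keeping track of the values the iterator takes
visited <- character(0)

for (group in groups) {
    # `group` holds the current element itself, not its position
    print(paste("Current group:", group))
    visited <- c(visited, group)
}
```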

We can also have loops iterate through the values in a column of a dataset. Let's use a `for` loop to print the age of each subject.

In [26]:
# Finding the total number of rows or subjects in the dataset
number_of_rows <- nrow(smaller_subject_data)

# Creating a for loop to iterate from 1 to the last row
for (i in 1:number_of_rows){
    # Printing each subject age
    # Need to put `[i]` to index the correct value
    print(smaller_subject_data$Age[i])
}

[1] 28
[1] 33
[1] 25
[1] 26
[1] 25
[1] 42
[1] 33
[1] 21
[1] 25
[1] 34
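
One caveat worth knowing: a range written as `1:n` counts backwards when `n` is 0, so a loop over an empty dataset would still run twice. The sketch below, which uses a hypothetical empty vector, compares `1:n` with `seq_len(n)`, a base R function that returns an empty sequence when `n` is 0.

```r
# Hypothetical empty vector, e.g. a column from a dataset with no rows
empty_ages <- numeric(0)
n <- length(empty_ages)

# 1:0 evaluates to c(1, 0), so this loop body runs twice
runs_with_colon <- 0
for (i in 1:n) {
    runs_with_colon <- runs_with_colon + 1
}

# seq_len(0) is empty, so this loop body never runs
runs_with_seq_len <- 0
for (i in seq_len(n)) {
    runs_with_seq_len <- runs_with_seq_len + 1
}

print(runs_with_colon)
print(runs_with_seq_len)
```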


Now that we've reviewed the structure of a `for` loop, we can use it to dichotomize BMI with the help of an if/else statement.

In [27]:
for (i in 1:number_of_rows){
    
    # if BMI is < 25 
    if (smaller_subject_data$BMI[i] < 25){
        # then classify the subject as having a normal BMI
        smaller_subject_data$Dichotomized_BMI[i] <- "Normal"
        
    # otherwise (the BMI is >= 25)
    } else {
        # then classify the subject as having an overweight BMI
        smaller_subject_data$Dichotomized_BMI[i] <- "Overweight"
    }
}

# Viewing data
smaller_subject_data

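| | SubjectNo | Group | SubjectID | Race | Ethnicity | Sex | Age | BMI | Serum_Cotinine | Dichotomized_BMI |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| 1 | 1 | NS | NS_1 | W | NH | F | 28 | 37.67 | 0 | Overweight |
| 2 | 2 | NS | NS_2 | O | H | F | 33 | 35.0 | 0 | Overweight |
| 3 | 3 | NS | NS_3 | W | NH | F | 25 | 18.7 | 0 | Normal |
| 4 | 4 | NS | NS_4 | W | NH | F | 26 | 23.0 | 0 | Normal |
| 5 | 5 | NS | NS_5 | As | NH | F | 25 | 24.7 | 0 | Normal |
| 6 | 6 | NS | NS_6 | AA | NH | F | 42 | 34.6 | 0 | Overweight |
| 7 | 7 | NS | NS_7 | AA | NH | F | 33 | 33.5 | 0 | Overweight |
| 8 | 8 | NS | NS_8 | W | H | M | 21 | 24.2 | 0 | Normal |
| 9 | 9 | NS | NS_9 | W | NH | M | 25 | 23.4 | 0 | Normal |
| 10 | 10 | NS | NS_10 | W | NH | M | 34 | 19.8 | 0 | Normal |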
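
The same dichotomization can also be written without a loop. As a sketch of an alternative approach (not the method used in the cell above), base R's `ifelse()` evaluates the condition for every element of a vector at once. The `bmi` vector below is hypothetical and simply repeats a few of the values shown in the table.

```r
# Hypothetical BMI values, used only for illustration
bmi <- c(37.67, 35.0, 18.7, 23.0, 24.7)

# ifelse(condition, value if TRUE, value if FALSE), applied element by element
dichotomized_bmi <- ifelse(bmi < 25, "Normal", "Overweight")

print(dichotomized_bmi)
```

With a dataframe, the result could be assigned directly to a new column in one line, which avoids indexing each row with `[i]`.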


## Additional Resources
- https://intro2r.com/prog_r.html