Dates

Writing Custom Functions

Control Flow

# Nullable Data Types
There are only a few special values for null, missing, or undefined data that we have to consider in R:
- NULL

Don't worry about this one too much.  It's just a special keyword for "there is no data here".  That doesn't necessarily mean the data are missing.  It means "this variable name isn't tied to any data (yet)".

- NA

This stands for "Not Available".  This will be our stand-in value for missing data, and you may encounter this quite frequently in real medical data.

- Inf and -Inf

Infinite values are usually the result of an explicit mathematical error on your part, or of a numerical instability that implicitly leads to one.  Don't worry too much about what the implicit part means.  Focus on the explicit part, and don't divide by 0.

- NaN

This stands for "Not a Number".  Like Inf, it is usually the result of a mathematical error or numerical instability, such as computing 0/0.

In [None]:
0/0                # NaN
1/0                # Inf
-1/0               # -Inf
sum(3, 2, NA, 5)   # NA

Recall that "na.rm = FALSE" is the default for this named argument, which is why the sum above is NA: a single missing value makes the whole result missing.  If we set the argument to TRUE, the function removes all NA and NaN values before computing the sum:

In [None]:
sum(3, 2, NA, 5, na.rm = TRUE)
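Because these special values behave differently, R gives each one its own test function. A minimal sketch of how they differ (the vector "vals" is just made-up example data):

```r
# made-up example vector containing the special values discussed above
vals <- c(1, NA, NaN, Inf, -Inf)

is.na(vals)        # TRUE for both NA and NaN
is.nan(vals)       # TRUE only for NaN
is.infinite(vals)  # TRUE for Inf and -Inf

# NULL cannot be an element of a vector, so test it on its own
is.null(NULL)      # TRUE
length(NULL)       # 0

# comparing with == does not work: anything compared to NA gives NA
vals == NA
```

This is why missing values should always be checked with is.na() rather than "== NA".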

# Basic Character/String Operations
We will cover 4 types of string operations:
- capitalizations
- substring selection and replacement
- string search
- concatenations

But first, a handy function!

In [2]:
nchar("I am a looooooooooooooooonnnnnnngggggggg boi")  # number of characters, spaces included

### Capitalizations

In [3]:
toupper("im not screaming. ur screaming")
tolower("WOAH CHILL OUT DUDE")

These are often useful if you get a dataset where someone has annoyingly decided to capitalize everything, or a variable with mixed capitalization that you need to make uniform.  For example,

In [4]:
sex1 <- 'female'
sex2 <- 'Female'
sex1 == sex2

but

In [5]:
tolower(sex1) == tolower(sex2)
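The same trick works on a whole vector at once, because tolower() is vectorized. A short sketch with a made-up "sex" variable, using table() to count the categories before and after normalizing:

```r
# made-up example variable with inconsistent capitalization
sex <- c("female", "Female", "FEMALE", "male", "Male")

table(sex)                 # 5 separate categories, one per spelling
counts <- table(tolower(sex))
counts                     # female = 3, male = 2 after normalizing
```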

### Substring Selection & Replacement
Substring selection is fairly simple: just specify the start and stop positions of the substring you want to extract.  Positions start at 1, and both endpoints are included.

In [None]:
substr("I never really understood this candlejack meme or whatever. It must be from like 2007 or something, idk.", 1, 53)
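substr() is vectorized, so it can pull the same positions out of every element of a character vector at once. As a sketch, assuming dates stored as made-up "YYYY-MM-DD" strings, characters 1 through 4 are the year:

```r
# made-up date strings in YYYY-MM-DD format
visit.dates <- c("2011-10-03", "2005-11-05", "2007-03-28")

# characters 1 through 4 (inclusive) of each string
years <- substr(visit.dates, 1, 4)
years               # "2011" "2005" "2007"
as.integer(years)   # convert to numbers if you need to do arithmetic
```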

Substring replacement works the same way, except that the substring now goes on the *left* side of the assignment operator.  R replaces only the substring you specify with whatever string is on the *right* side of the assignment operator.

In [None]:
x <- "yoooo dudeee - that was dummyyyyy sickkkkkkk"
substr(x, 1, 12) <- "good day sir"
substr(x, 25, nchar(x)) <- 'adept and proficient'
x
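Assigning into substr() never changes the length of the string. A quick sketch of what happens when the replacement's length does not match the selected span (y1 and y2 are made-up strings):

```r
# replacement shorter than the 3-character span: only 2 characters are replaced
y1 <- "abcdef"
substr(y1, 1, 3) <- "XY"
y1          # "XYcdef"

# replacement longer than the span: it is cut off after 3 characters
y2 <- "abcdef"
substr(y2, 1, 3) <- "WXYZ"
y2          # "WXYdef"
nchar(y2)   # still 6
```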

However, a big limitation of doing string replacement this way is that it can never change the length of the string.  The replacement on the right side should have exactly the same length as the substring on the left (you can check with the "nchar" function): a longer replacement gets cut off, and a shorter one replaces only as many characters as it contains.  You can get around this by using the "gsub" function instead.  It performs a **search** and replaces every match with a string of any length that you provide.

In [None]:
gsub("good day sir", "my guy", x)
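gsub() replaces every match it finds; its sibling sub() replaces only the first one. A minimal comparison on a made-up string:

```r
s <- "so good, so so good"

sub("so", "very", s)    # only the first "so" is replaced
gsub("so", "very", s)   # every "so" is replaced
```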

### String Search
The "gsub" function provides one way to do string search, but then automatically does a string replacement after.  What if we only want to do the search and just get a logical returned to us if we have a match?  Then we use the "grepl" function.

In [None]:
bob.quote <- "A lot of people are turned off by the phrase ‘flat earth’ ... but there’s no way u can see all the evidence and not know... grow up... No matter how high in elevation you are... the horizon is always eye level ... sorry cadets... I didn’t wanna believe it either...Have u been to the edge ? or is that what your science book told you?"
bob.quote

In [None]:
smart.rapper <- !grepl("flat earth", bob.quote)
smart.rapper

Note that "grepl" can take regular expressions as its search argument.  If you don't know what this means, just ignore it for now.  In short, regular expressions are a pattern syntax, supported with minor variations by most programming languages, that allows for extremely general and flexible search and matching conditions.  They're often very useful for parsing or cleaning clinical notes or other medical free text.
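As a small taste of regular expressions, here is a sketch that flags and extracts blood-pressure-like readings ("number/number") from made-up note snippets. The pattern "[0-9]+/[0-9]+" means one or more digits, a slash, then one or more digits. Separately, ignore.case = TRUE makes a search case-insensitive.

```r
# made-up clinical note snippets
notes <- c("BP 120/80 at intake", "no vitals recorded", "bp: 135/90, HR 72")

# which notes contain a reading like 120/80?
has.bp <- grepl("[0-9]+/[0-9]+", notes)
has.bp                                   # TRUE FALSE TRUE

# case-insensitive search for the abbreviation "bp"
grepl("bp", notes, ignore.case = TRUE)   # TRUE FALSE TRUE

# pull out the first matching reading from each note that has one
readings <- regmatches(notes, regexpr("[0-9]+/[0-9]+", notes))
readings                                 # "120/80" "135/90"
```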

### Sequences & Repetitions
A sequence vector is a set of ordered numbers with a fixed spacing between them.  The default spacing is 1.

In [None]:
seq(3, 11)

R provides some syntactical sugar for this, since it is so common.

In [None]:
3:11

However, we can also specify the spacing manually.  Alternatively, we can specify how many elements we want the vector to be and have R automatically infer the correct spacing.

In [None]:
seq(3, 11, by = 2)
seq(3, 11, length.out = 5)
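Two related helpers, seq_len() and seq_along(), build the sequence 1, 2, ..., n directly. They are safer than 1:n when n might be 0, because 1:0 counts *down* and gives c(1, 0). A sketch (the "visits" vector is made up):

```r
n <- 0
1:n          # 1 0, counting down, usually not what you want
seq_len(n)   # integer(0): an empty sequence

visits <- c("baseline", "week 4", "week 12")   # made-up example vector
seq_along(visits)   # 1 2 3, one index per element
```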

We can also repeat a single datum or a vector of data multiple times to form a larger vector.

In [None]:
rep(2022, 10)
rep(c("low", "medium", "high"), 5)
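rep() also accepts the named arguments times and each, which control whether the whole vector is repeated or each element is repeated in place. A sketch:

```r
levels3 <- c("low", "medium", "high")

rep(levels3, times = 2)   # "low" "medium" "high" "low" "medium" "high"
rep(levels3, each = 2)    # "low" "low" "medium" "medium" "high" "high"
```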

### Sorting

Sorting is one of the most common operations in all of programming.  The vector of indices that would sort a vector in ascending order is given by

In [None]:
x <- c(1.2, 0.1, 0.7)
order(x)

We can then use this vector to index "x".  First, note that the index vector is exactly as long as "x":

In [None]:
length(order(x)) == length(x)

so the output will have the same length as the original "x".  Only the order of the elements changes:

In [None]:
x[order(x)] # x[c(2, 3, 1)]
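For a single vector, sort() is a shortcut for x[order(x)], and both accept decreasing = TRUE. order() earns its keep when you want to rearrange *another* vector (or the rows of a table) by the values of x. A sketch with made-up patient IDs:

```r
x <- c(1.2, 0.1, 0.7)
ids <- c("pt-A", "pt-B", "pt-C")   # made-up IDs, one per value of x

sort(x)                       # 0.1 0.7 1.2, same as x[order(x)]
order(x, decreasing = TRUE)   # 1 3 2
ids[order(x)]                 # "pt-B" "pt-C" "pt-A": IDs arranged by x
```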

Another example of why typed data is so important is that it allows us to sort dates correctly.  Suppose we had the following character vector.

In [None]:
x <- c('10/03/2011', '11/05/2005', '03/28/2007')
x[order(x)]

They were sorted in alphanumeric order, character by character!  But we wanted chronological order.  Luckily, once x's elements are converted to the Date type, R knows to sort them chronologically.

In [None]:
x <- as.Date(x, format = "%m/%d/%Y")
x[order(x)]
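If you ever need dates back as text, writing them in the ISO 8601 "YYYY-MM-DD" layout keeps alphanumeric and chronological order in agreement, because the most significant part (the year) comes first. A quick check on the same three dates:

```r
x <- as.Date(c("10/03/2011", "11/05/2005", "03/28/2007"), format = "%m/%d/%Y")

iso <- format(x, format = "%Y-%m-%d")   # "2011-10-03" "2005-11-05" "2007-03-28"
sort(iso)                               # a plain text sort is now chronological
identical(order(iso), order(x))         # TRUE
```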

### Concatenation & Insertion

It is extremely easy to concatenate two vectors or extend them.  Just wrap them together using the "c" function.

In [None]:
x <- 1:3
y <- 4:7
c(x, y)

Since atomic vectors can only hold elements of a basic data type (not other vectors), R knows that you are not trying to create a length-2 vector with x as element 1 and y as element 2.  Instead, it flattens x and y out into one single vector.

You can even append elements directly rather than having to join two vectors.

In [None]:
c(0.5, x, 3.5)

Pop quiz: what do you think the "class" function would return for the vector output above?  Recall that "x" by itself is an integer vector.

If you'd like to insert a new element (or vector) in the middle of another vector, use the append function.

In [None]:
append(x, c(2.25, 2.5, 2.75), after = 2)
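The after argument counts elements from the start: after = 0 puts the new values at the front, and leaving it out appends them to the end, just like c(x, ...). A sketch:

```r
x <- 1:3
append(x, 0L, after = 0)   # 0 1 2 3
append(x, 4L)              # 1 2 3 4 (the default is after = length(x))
```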

### Vector Initialization
We showed how you could initialize a vector directly with data, such as

In [None]:
x <- c(1, 4, 9, 16, 25)
x

However, sometimes you'd like to *preallocate* space for the data you're going to need and fill in the data later.  We can initialize our vectors with placeholder values and replace them once we are ready to fill in the appropriate data.  The "vector" function takes 2 arguments: the data type ("mode") and the length.  Each element starts out as that type's default value (FALSE for logical, 0 for numeric, "" for character).

In [None]:
x <- vector(mode = "logical", length = 6)
x

As syntactical sugar, R gives us some extra functions for vector initialization where you only need to provide the length:

In [None]:
character(3)
numeric(3)
integer(3)
logical(3)
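Preallocation pays off when you fill a vector inside a loop. A sketch that rebuilds the vector of squares from above by writing into a preallocated numeric vector (the loop is only for illustration; (1:5)^2 would do the same in one step):

```r
squares <- numeric(5)            # five 0s, to be overwritten
for (i in seq_along(squares)) {
  squares[i] <- i^2              # fill in slot i
}
squares                          # 1 4 9 16 25
```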

Get/Set syntactical sugar:

An indexing expression such as death.dates[3] is really just shorthand for some sort of **get** function that dives into the vector you give it, extracts the 3rd element, and returns it to you.  In functional notation, you might imagine it looking something like get(death.dates, index = 3).  (This is hypothetical notation: R's real get() function looks up an object by name and has no index argument.)  The takeaway here is that indexing is really just a type of function call.
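R actually makes this literal: square-bracket indexing calls a real function named `[`, and assigning into an index calls `[<-`. A sketch using a copy of the death.dates vector that appears in the Data Replacement section:

```r
death.dates <- c(Wanda = 2022L, Cosmo = 2015L, Timmy = 2017L,
                 Vicky = 2020L, "Mr. Crocker" = 2019L)

death.dates[3]          # ordinary "get" indexing
`[`(death.dates, 3)     # the same call written as a function

# "set" is a function too: this does what death.dates[3] <- 2016L does
death.dates <- `[<-`(death.dates, 3, value = 2016L)
death.dates[3]
```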

R is quite nice and provides several ways of specifying which elements we want to extract.
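A sketch of the most common options, again on a copy of death.dates: positive positions, negative positions (which drop those elements), a logical condition, and names.

```r
death.dates <- c(Wanda = 2022L, Cosmo = 2015L, Timmy = 2017L,
                 Vicky = 2020L, "Mr. Crocker" = 2019L)

death.dates[c(1, 3)]               # by position: Wanda and Timmy
death.dates[-1]                    # everything except the 1st element
death.dates[death.dates > 2018]    # by logical condition
death.dates[c("Cosmo", "Vicky")]   # by name
```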

### Data Replacement
Once you know how to specify which elements of a vector you want to home in on, you can modify those elements however you please.  Say the data was recorded incorrectly for Vicky.  We can fix this as follows:

In [1]:
death.dates <- c(Wanda = 2022L, Cosmo = 2015L, Timmy = 2017L, 
                 Vicky = 2020L, "Mr. Crocker" = 2019L)
death.dates['Vicky'] <- 2017L
death.dates['Vicky']

We could similarly modify multiple values at a time.  

In [2]:
# names must match exactly: "Mr.Crocker" (no space) would silently add a new element
death.dates[c("Timmy", "Mr. Crocker")] <- c(2016L, 2014L)
death.dates[c("Timmy", "Mr. Crocker")]

Lists can be concatenated using the same syntax as with vectors.

In [None]:
adi.extra <- list(fav.sports = c("swimming", "soccer", "squash"), age = 28)
c(adi, adi.extra)

If we wanted to make nested lists, we would use the following syntax:

In [None]:
list(adi, adi.extra)
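Since "adi" is not defined in this section, here is a self-contained sketch with hypothetical stand-in lists ("person" and "person.extra", with a made-up fav.color field) showing the difference in structure: c() merges the elements into one flat list, while list() keeps each input list as its own element.

```r
# hypothetical stand-ins for adi and adi.extra
person <- list(name = "Adi", fav.color = "blue")
person.extra <- list(fav.sports = c("swimming", "soccer", "squash"), age = 28)

merged <- c(person, person.extra)
length(merged)      # 4: the elements of both lists, side by side
names(merged)

nested <- list(person, person.extra)
length(nested)      # 2: each original list is one element
nested[[2]]$age     # 28, reached by going one level deeper
```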

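Lists can also be preallocated using the same syntax as vectors.  However, note that the syntax is a little deceptive: a list is a generic vector whose elements can be any R object (including other vectors or lists), which makes it a fundamentally different data structure from the atomic vectors we have used so far.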

In [None]:
x <- vector(mode = "list", length = 5)
x 
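Every slot of a preallocated list starts out as NULL. A sketch of filling one in a loop, where each element is a vector of a different length (something an atomic vector could not hold); double brackets access a single element:

```r
x <- vector(mode = "list", length = 3)
all(sapply(x, is.null))   # TRUE: every slot starts out as NULL

for (i in seq_along(x)) {
  x[[i]] <- seq_len(i)    # element i is the sequence 1, ..., i
}
x[[3]]                    # 1 2 3
lengths(x)                # 1 2 3
```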