In [3]:
source("setup.R")

------------

> ### Learning Objectives
>
> * Define the following terms as they relate to R: object, assign, call,
>   function, arguments, options.
> * Assign values to objects in R.
> * Learn how to _name_ objects.
> * Use comments to annotate scripts.
> * Solve simple arithmetic operations in R.
> * Call functions and use arguments to change their default options.
> * Inspect the content of vectors and manipulate their content.
> * Subset and extract values from vectors.
> * Analyze vectors with missing data.

------------

## Creating objects in R

You can get output from R simply by typing math in the console. Try the
examples below, then add two more math expressions of your own.

In [4]:
3 + 5
12 / 7

However, to do useful and interesting things, we need to assign _values_ to
_objects_. To create an object, we need to give it a name followed by the
assignment operator `<-`, and the value we want to give it:

* Add your own variable called `media_ml` and assign it a value.

In [10]:
FBS_ml <- 1000
FBS_ml

 `<-` is the assignment operator. It assigns values on the right to objects on
the left. So, after executing `x <- 3`, the value of `x` is `3`. The arrow can
be read as 3 **goes into** `x`.  

In RStudio, typing <kbd>Alt</kbd> + <kbd>-</kbd> (push <kbd>Alt</kbd> at the
same time as the <kbd>-</kbd> key) will write ` <- ` in a single keystroke on a PC, while typing <kbd>Option</kbd> + <kbd>-</kbd> (push <kbd>Option</kbd> at the
same time as the <kbd>-</kbd> key) does the same on a Mac.

Objects can be given any name such as `x`, `current_temperature`, or
`subject_id`. You want your object names to be explicit and not too long. They
cannot start with a number (`2x` is not valid, but `x2` is). R is case sensitive
(e.g., `weight_kg` is different from `Weight_kg`). There are some names that
cannot be used because they are reserved words in R (e.g.,
`if`, `else`, `for`).
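
As a quick illustration of these naming rules, here is a minimal sketch. The object names and values are made up for the example.

```r
# Names may contain letters, numbers, and underscores, but cannot start with a number.
x2 <- 10          # valid
# 2x <- 10        # not valid: uncommenting this line gives a syntax error

# R is case sensitive, so these are two separate objects.
weight_kg <- 55
Weight_kg <- 60

weight_kg == Weight_kg   # FALSE: the two objects hold different values
```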

There are several styles for naming multi-word variables:

* camelCase
* PascalCase
* snake_case

**Pick one and be consistent.**

It's important to be consistent in the styling of your
code (where you put spaces, how you name objects, etc.). Using a consistent
coding style makes your code clearer to read for your future self and your
collaborators. 

A popular style guide is the [tidyverse's](http://style.tidyverse.org/). The tidyverse's is very
comprehensive and may seem overwhelming at first. You can install the
[**`lintr`**](https://github.com/jimhester/lintr) package to automatically check
for issues in the styling of your code.

> ### Objects vs. variables
>
> What are known as `objects` in `R` are known as `variables` in many other
> programming languages. Depending on the context, `object` and `variable` can
> have drastically different meanings. However, in this lesson, the two words
> are used synonymously. For more information see:
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects


Now that R has `FBS_ml` in memory, we can do arithmetic with it. For
instance, we may want to convert this volume into liters (volume in liters is 0.001 times the volume in ml):


In [12]:
.001 * FBS_ml

We can also change an object's value by assigning it a new one.

In [15]:
FBS_ml <- 2000
.001 * FBS_ml

This means that assigning a value to one object does not change the values of
other objects. For example, let's store the FBS volume in liters in a new
object, `FBS_L`:

In [19]:
FBS_L <- .001 * FBS_ml

FBS_L

and then change `FBS_ml` to 100.

In [7]:
FBS_ml <- 100

What do you think is the current content of the object `FBS_L`? 0.1 or 2?

Once you make a guess, display the value below to check your answer.

### Comments

The comment character in R is `#`; anything to the right of a `#` in a script
will be ignored by R. It is essential to annotate your scripts with notes and explanations.

I also use comments to break up big chunks of my code. When writing functions, it's very useful to comment on what the function is used for.

Commenting out pieces of code that you don't want to run at the moment is also useful if you just want to try something out. Be sure to go back and clean it up, though: you don't want lots of messy commented-out code.

You want to make sure that someone other than you can understand what you did and why you did it.

Comments are also useful as reminders to go back and add something at a particular section of the code.

* Keep comments short.
* Make them concise.
* Comment often, but don't get crazy.
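
To make these points concrete, here is a short sketch of a commented script. The object names (`media_ml`, `FBS_fraction`, `FBS_needed_ml`) and values are hypothetical.

```r
## ---- Media preparation ----
# Section headers like the one above help break up long scripts.

media_ml <- 500        # total volume of media, in ml
FBS_fraction <- 0.1    # 10% serum

# Volume of FBS needed for this batch
FBS_needed_ml <- media_ml * FBS_fraction

# media_ml <- 1000     # commented out: alternative batch size to try later

# TODO: add the antibiotic volume here
FBS_needed_ml
```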




In [30]:
### Challenge ##

## What are the values after each statement in the following?

mean <- 55          # mean?
SD <- 12            # SD?
mean <- mean + 5    # mean?
SD <- SD + 1        # SD?
CV <- SD / mean     # CV?



In [90]:
## use this space to check your guesses ## 
mean
SD
CV

[1] 60


### Functions and their arguments

* Functions are "canned scripts" that automate more complicated sets of commands.

* Many functions are predefined, or can be made available by importing R *packages* (more on that later). 

* A function usually takes one or more inputs called *arguments*. 

* Functions often (but not always) return a *value*. 

The template of a function call with arguments could look like this:

`variable <- generic_function(arg_name_1 = argument1, arg_name_2 = argument2, arg_name_3 = argument3)`

**Has anyone used the "/" in Slack?**


### How many arguments does `sqrt()` take? How would we find out if we don't know?



In [32]:
sqrt(mean)

In [None]:
## let's assign this to a new variable ##

mean_root <- sqrt(mean)

* Here, the value of `mean` is given to the `sqrt()` function, the `sqrt()` function calculates the square root, and returns the value which is then assigned to the object `mean_root`. 

* The return 'value' of a function can be almost anything: a number, text, a data frame, an image.

* Arguments can be anything, not only numbers or filenames, but also other objects. Exactly what each argument means differs per function, and must be looked up in the documentation. 

* Some functions take arguments which may either be specified by the user, or, if left out, take on a *default* value: these are called *options*. Options are typically used to alter the way the function operates, such as whether it ignores 'bad values', or what symbol to use in a plot.  However, if you want something specific, you can specify a value of your choice which will be used instead of the default.

* Let's try a function that can take multiple arguments: `round()`.


In [40]:
## what arguments does round take? hint: we can use args(round) or ?round ##

args(round)

It's best practice to explicitly name each argument in a function call. `round` takes two arguments, `x` and `digits`.
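
To see the difference between relying on a default and setting an option yourself, here is a small sketch with `round()`. The input value is arbitrary and chosen only for illustration.

```r
value <- 3.14159

round(x = value)               # digits is left out, so its default of 0 is used: 3
round(x = value, digits = 2)   # digits is set explicitly: 3.14

# When arguments are named, their order does not matter.
round(digits = 2, x = value)   # 3.14

# Without names, R matches arguments by position, so the order must be right.
round(value, 2)                # 3.14
```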

In [None]:
# execute round: the x argument should be mean_root, and digits should equal 2. assign the result to `round_root`



### Let's make our own function!

Let's make a function to calculate the standard error of the mean. We'll call it `SEM`. To make a function, we start with the keyword `function` and then specify its arguments: `SEM <- function(sample){ }`

In [92]:
SEM <- function(sample){
    # standard deviation divided by the square root of the number of values
    stder <- sd(sample) / sqrt(length(sample))
    return(stder)
}

confluence  <- c(60,50,80,100,100,60,90)

SEM(sample = confluence)



In the function above there is an example of nested functions: `length()` is called inside `sqrt()`. You can execute a function within a function within a function, etc.
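
As a sketch of what nesting means, the two versions below compute the same thing: once step by step with intermediate objects, and once with the calls nested inside each other. The object names other than `confluence` are made up for the example.

```r
confluence <- c(60, 50, 80, 100, 100, 60, 90)

# Step by step, storing each intermediate result
n <- length(confluence)                      # number of values: 7
root_n <- sqrt(n)                            # square root of 7
rounded <- round(x = root_n, digits = 2)

# The same calculation with nested calls: R evaluates the innermost call first
nested <- round(x = sqrt(length(confluence)), digits = 2)

rounded == nested                            # TRUE
```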


# Vectors, data types, and some things that you can do to them

A vector is the most common and basic data type in R. A vector is composed of a series of values, which can be either numbers or characters. The most common basic data types you'll use are:

* "Doubles" aka numeric aka numbers including decimals
* "Integers" aka numeric aka only whole numbers
* "Strings" aka character aka text
* "Logical" aka boolean aka true/false

We can assign a series of values to a vector using the `c()` function. What we did in our previous example was create a vector of whole numbers called `confluence`. (R stores numbers typed this way as doubles; a true integer vector needs the `L` suffix, e.g. `60L`.)
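
If you are unsure which type a vector has, `typeof()` and `class()` will tell you. The sketch below uses small made-up vectors; note that whole numbers typed without the `L` suffix are stored as doubles.

```r
doubles  <- c(10.2, 6.78, 9)
whole    <- c(60, 50, 80)        # whole numbers, but still stored as doubles
integers <- c(60L, 50L, 80L)     # the L suffix makes true integers
strings  <- c("HeLa", "CHO")
logicals <- c(TRUE, FALSE)

typeof(doubles)    # "double"
typeof(whole)      # "double"
typeof(integers)   # "integer"
typeof(strings)    # "character"
typeof(logicals)   # "logical"

class(doubles)     # "numeric"
class(integers)    # "integer"
```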

In [49]:
# let's do that again here: this is a vector of whole numbers

confluence  <- c(60,50,80,100,100,60,90)
masses  <- c(10.2, 6.78, 11.54675634657, 9, 8.01)

# vectors can also be strings

cell_lines <- c("HeLa", "MCF-7", "MCF-7", "HeLa", "CHO", "HEK293", "MCF-7")

# vectors can also be logical (TRUE/FALSE)

BC_lines  <- c(FALSE,  TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)


## Subsetting vectors 

Say that we now only want to extract a subset of the information in our vectors. For example, say that I only want the MCF-7 cell lines. There are several ways to do this:



In [57]:
# We can explicitly extract elements by their positions. We do this with square brackets
cell_lines[c(2,3,7)]   # this applies a numeric vector to the vector "cell_lines"


# We can also apply a logical vector to the original vector "cell_lines" 
cell_lines[c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)] # does this look familiar?

# what's another way we can write this? 

cell_lines[BC_lines]


# We can generate logical vectors using operators and conditional subsetting

confluence > 80
confluence == 100 # be careful: '==' means 'equal to', but '=' is an assignment operator much like '<-' 
masses >= 9 
cell_lines == "MCF-7"

# the expressions above generate logical vectors
# we can then apply these to their original vectors to get the values themselves.

confluence[confluence > 80]
confluence[confluence == 100]
masses[masses >= 9] 
cell_lines[cell_lines == "MCF-7"]
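
Because a logical vector is just a set of TRUE/FALSE flags, one built from one vector can also be used to subset another vector of the same length. A minimal sketch, assuming `confluence` and `cell_lines` describe the same seven samples in the same order (the variable `is_mcf7` is introduced only for this example):

```r
confluence <- c(60, 50, 80, 100, 100, 60, 90)
cell_lines <- c("HeLa", "MCF-7", "MCF-7", "HeLa", "CHO", "HEK293", "MCF-7")

is_mcf7 <- cell_lines == "MCF-7"   # logical vector: TRUE at positions 2, 3, and 7

confluence[is_mcf7]                # confluence of the MCF-7 samples: 50 80 90
sum(is_mcf7)                       # TRUE counts as 1, so this counts the matches: 3
```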




# Missing data 

As R was designed to analyze datasets, it includes the concept of missing data. Missing data are represented in vectors as NA.

If your datasets include NAs (missing values), most functions will return NA. To avoid this, we usually have to remove the NA values. Some functions include an option to ignore the missing values, often as `na.rm = TRUE`.
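
A minimal sketch of this behaviour, using a small made-up vector (`readings` is a hypothetical name):

```r
readings <- c(4, NA, 8, 6)

mean(readings)                 # NA, because one value is missing
mean(readings, na.rm = TRUE)   # 6: the NA is dropped before averaging
max(readings, na.rm = TRUE)    # 8

sum(is.na(readings))           # 1: how many values are missing
```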

Let's say we are trying to count the number of colonies on multi-well plates, but not all of the wells have colonies.

In [67]:
num_colonies  <- c(22, NA, 57, NA, NA, 103, 73, NA, 23, 36, NA, NA)

mean(num_colonies)
max(num_colonies)

#try adding ', na.rm = TRUE'
mean(num_colonies)
max(num_colonies)

is.na(num_colonies)   # is.na() returns a logical vector that is TRUE where a value is NA
!is.na(num_colonies)  # the `!` is the negation operator, which means 'not'
num_colonies[!is.na(num_colonies)]  # applies the logical vector to extract the non-missing values

na.omit(num_colonies) # returns the object with incomplete cases removed (for a data frame, this works on rows)
num_colonies[complete.cases(num_colonies)]  # extracts the elements that are complete cases (for a data frame, this works on rows)

# The final trial

Let's use what we have learned today to write a function that removes missing data from a numerical vector, keeps the values that are above a threshold we define, and returns the median of that subset. Once you have created the function, create a variable that will be the input argument for the threshold number.

Outline your algorithm in plain English below. This is an essential part of writing a good algorithm.

In [89]:
# use this vector as input
num_cells  <- c(22, NA, 57, NA, NA, 103, 73, NA, 23, 36, NA, 62, 13, NA, 78, NA, NA, 114, 110, 64, 21, 23, NA, 54)


thresh_med  <- function(vector, thresh){      # 1. name and initialize your function, name your (2) arguments
    cleaned  <-  vector[!is.na(vector)]       # 2. remove NAs
    threshed  <-  cleaned[cleaned > thresh]   # 3. extract values above the threshold value
    med  <- median(threshed)                  # 4. get the median of those extracted values 
    return(med)                               # 5. return the value
}                 

threshold  <- 50
thresh_med(num_cells, threshold)              # execute your code on an example
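
One way to check a function like this is to compare it with a second route to the same answer. The sketch below repeats the function so that it runs on its own, then skips the explicit NA removal and lets `median()` drop the missing values through its `na.rm` option. The name `alt_med` is made up for this example.

```r
num_cells <- c(22, NA, 57, NA, NA, 103, 73, NA, 23, 36, NA, 62, 13, NA, 78,
               NA, NA, 114, 110, 64, 21, 23, NA, 54)

thresh_med <- function(vector, thresh) {
    cleaned <- vector[!is.na(vector)]       # remove NAs
    threshed <- cleaned[cleaned > thresh]   # keep values above the threshold
    return(median(threshed))                # median of what is left
}

threshold <- 50

# Alternative: subsetting with a logical vector that contains NA keeps NA in
# the result, so we ask median() to ignore them.
alt_med <- median(num_cells[num_cells > threshold], na.rm = TRUE)

thresh_med(num_cells, threshold)   # 73
alt_med                            # 73
```

Both routes agree here, which is a useful sanity check before trusting the function on new data.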
