# Functions in R

Functions can feel strange at first, but they're important to understand. **Any time** you see a name followed by parentheses (e.g., `mean()`), it's a function! R also lets you write your own functions, which is useful when you need to perform the same operation many times on different data.

The basic logic is as follows:  
input -> operations -> output

Whenever you're working with functions, it's crucial to understand all three steps: (1) what format your input needs to be in, (2) what operations are performed on your inputs, and (3) what format your output will be in. When *using* functions, (1) and (3) matter most, but when *writing* functions you need a good grasp of all three steps.
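To make steps (1) and (3) concrete, here's a small sketch that inspects what goes into and comes out of the built-in `mean()`. The variable names `scores` and `avg` are made up for illustration.

```r
## (1) input: mean() expects a numeric vector
scores <- c(2, 4, 6, 8)
print(class(scores))    # what type of data is going in?
print(length(scores))   # how many elements does it have?

## (2) operations: these happen inside mean(), we don't need to see them to use it
avg <- mean(scores)

## (3) output: a single number (a numeric vector of length 1)
print(class(avg))
print(length(avg))
print(avg)
```

Checking `class()` and `length()` like this is a quick way to get a feel for an unfamiliar function before relying on it.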

Defining a function takes the following form:

```
myFunction <- function(arg1, arg2) {
    operations
    return(output)
}

## then when going to use the function:
myFunction(arg1, arg2)
```
What's going on here:  
* `myFunction` is just some arbitrary name you give to the function.
* `arg1` and `arg2` are the *inputs* to your function (i.e., the data coming in that you'll be working with inside the function).
* `operations` just represents literally anything you want to perform on your inputs. Any valid R code will work.
* `return()` is where the output goes: put whatever you want the function's *output* to be inside these parentheses.
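As a concrete instance of the template above, here's a minimal sketch with the placeholders filled in. The name `addTwo` and the variable `result` are hypothetical, chosen only for this example.

```r
## hypothetical example: a function with two inputs
addTwo <- function(arg1, arg2) {
    ## operations: add the two inputs together
    output <- arg1 + arg2
    ## output: hand the result back to whoever called the function
    return(output)
}

## using the function: the values 3 and 4 are passed in as arg1 and arg2
result <- addTwo(3, 4)
print(result)
```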

This basic format will get you a long way. Let's say we want to create a function that returns the sum of an input vector:

```
## define the function name and inputs; notice the input here is a VECTOR!!
mySum <- function(arg1) {
    ## initialize an output variable
    ## notice I only care about returning one number, so I don't need to worry about initializing an object with a certain size
    out <- 0
    ## start a for loop where each 'element' is one element in the input vector
    for (element in arg1) {
        ## update the output variable by adding the new element of the input vector to the existing total
        out <- out + element
    }
    ## return the 'out' variable, which in this case is just a single number
    return(out)
}

## call the function
print(mySum(c(5,7,9,6)))
```

```
[1] 27
```


Notice how even though the function only takes one argument, I'm still passing a whole vector to that argument. That's what the function expects to happen: one input, and that input is a vector of many elements. What happens if I don't use the `c()` notation?

```
print(mySum(5,7,9,6))
```

```
ERROR: Error in mySum(5, 7, 9, 6): unused arguments (7, 9, 6)
```


`mySum()` is only expecting one argument, so if I don't explicitly tell it that all those numbers should be treated as a single vector, it won't know what to do.

But the nice thing about R is that it already *has* many of the functions we need for simple (and complex) statistical operations. We could've used the built-in `sum()` function:

```
print(sum(c(5,7,9,6)))
```

```
[1] 27
```
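When a built-in equivalent exists, it also makes a handy check on your own function. The sketch below redefines `mySum()` so it runs on its own, then compares it against `sum()`. The test vectors `testVec` and `decimals` are made up for illustration.

```r
## same logic as mySum() from earlier, repeated so this cell is self-contained
mySum <- function(arg1) {
    out <- 0
    for (element in arg1) {
        out <- out + element
    }
    return(out)
}

## whole numbers: an exact comparison is fine here
testVec <- c(5, 7, 9, 6)
print(mySum(testVec) == sum(testVec))

## decimals: use all.equal(), which tolerates tiny floating-point differences
decimals <- c(0.1, 0.2, 0.3)
print(all.equal(mySum(decimals), sum(decimals)))
```

`all.equal()` is the safer comparison for decimal values, because two mathematically equal results can differ by a tiny rounding error.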


**Functions and Your Environment**  
One last thing to note is that you need to be careful about how the operations in your function interact with the variables in your environment. Write your functions as if the operations inside them *only* have access to the variables you passed in as inputs, *not* to variables that exist globally in your environment. (Technically, functions *can* see global variables, but it's best practice to code as if they can't.)

On that same note, any variables that you create inside a function *will not* be accessible from outside the function unless you explicitly return them as output.  

I'll demonstrate both cases.

```
## accessing global vars from within a function
x <- 'hello'
demo <- function(){
    print(x)
}
demo()
```

```
[1] "hello"
```


Like I said, this will work, but try to avoid coding like this.
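The safer pattern is to pass the value in as an input. Here's a minimal sketch. The function name `demoWithInput` and the argument name `msg` are hypothetical.

```r
## passing the value in explicitly instead of relying on a global variable
x <- 'hello'
demoWithInput <- function(msg) {
    ## the function only touches its own input, 'msg'
    print(msg)
}
demoWithInput(x)
```

Written this way, the function still works if `x` gets renamed or removed from your environment, as long as you give it something to print.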

```
## accessing vars that were created inside a function but not explicitly returned as output
x <- 'hello'
demo <- function(){
    y <- 5
    print(x)
}
demo()
print(y)
```

```
[1] "hello"

ERROR: Error in print(y): object 'y' not found
```


Because the variable `y` was created inside the function `demo()` and we didn't explicitly return it, it's not accessible when we try to get that variable from outside the function. Contrast against:

```
## accessing vars that were created inside a function and explicitly returned as output
x <- 'hello'
demo <- function(){
    y <- 5
    print(x)
    return(y)
}
y <- demo()
print(y)
```

```
[1] "hello"
[1] 5
```


Notice how you have to assign the output to a variable. You can always keep an eye on the "Environment" window in RStudio to get a feel for when and why certain vars enter your environment.
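The same separation works in the other direction. In the sketch below, assigning to `x` with `<-` inside a function creates a new local variable and leaves the global `x` alone. The function name `changeX` is hypothetical.

```r
## assigning inside a function does not overwrite the global variable
x <- 'hello'
changeX <- function() {
    ## this creates a NEW x that only exists inside the function
    x <- 'goodbye'
    return(x)
}

## the function returns its local x...
print(changeX())
## ...but the global x is unchanged
print(x)
```

To actually update the global `x`, you'd assign the output back to it yourself, e.g. `x <- changeX()`.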

Try writing your own mean function:

```
myMean <- function(arg1){
    ### your code here

    return(out)
}

x <- c(5,4,1,9,4,2)
print(myMean(x))
```