In [1]:
library('ggplot2')

### What is R?

* R is a language and environment for statistical computing and graphics. 


* R provides a wide variety of statistical techniques
  * E.g.: linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc. 

* Outstanding, publication-quality plots
  * Including mathematical symbols and formulae where needed. 
  * Good defaults for the minor design choices in graphics
  * Extremely customizable plots

* Runs on macOS, Windows, and Linux. 

* R is available as Free Software
  * Distributed in source code form under the terms of the Free Software Foundation's GNU General Public License. 


### What is R?

* _" Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented."_ 

* A programming environment that can do a lot but that is often used for doing stats

* R can be extended (easily) via packages
  * Want to do something complex? Download and install a package that does it. 
  
* A vibrant community of users
  * See the [following](https://medium.com/@ODSC/data-driven-exploration-of-the-r-user-community-worldwide-57416018e958) for a data science and R-based exploration of the R community:
  ![](https://www.dropbox.com/s/1hj6e0mownt0amf/R_community.png?dl=1)

### R IDE

* R has many Integrated Development Environments
  * An IDE is simply a graphical user interface (GUI) software that provides tools for developing software in a language. 
  * At the least, it provides tools for writing, executing, and debugging code.

* The most commonly used IDE with R is RStudio

  ![](https://www.dropbox.com/s/g3gsv50xta7ka89/r_ide.png?dl=1)

### R IDE - Continued

* Many other IDEs exist, including:
  * [JetBrains's PyCharm](https://www.jetbrains.com/pycharm/)
      * My preferred tool because I can also use it to write code in other languages
  * [RBrain](https://r-brain.io/)
  * [Architect](https://www.getarchitect.io/)
  * Jupyter Notebooks and [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/)
* Here, we will be using the Jupyter environment:
  * [Jupyter Notebooks]() and [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/)

### Jupyter Environment

* One of the most popular environments for data science
* Ideal in a classroom setting
    * Weave text, links, images, videos and command outputs
    
* Easy to install and use
  * You can even run it in Google Colab, the equivalent of Google Docs for notebooks 😲
  
* We will occasionally use RStudio to get familiar with the interface
    * It's fairly straightforward to use if you need to transition to it

### Jupyter Environment -- Cont'd

* Collection of notebooks as tabs in the same environment
   * as opposed to separate pages
* Each notebook is a collection of cells
* Each cell can hold: 
  * Code: you can type your R code and execute it using the run button or keyboard shortcuts
  * Markdown: you can type text and format it with a markup language
  * Also supports `HTML` and `JavaScript`, but those are beyond our scope
  
![](https://www.dropbox.com/s/hkyjln74prbqi3r/code_cell.png?dl=1)

### Jupyter Notebook Demo

<u>We will cover:</u>
1. Starting a Notebook
2. Creating a cell
3. Running or rendering a cell
 * The output section versus the rendered cell
4. `In [ ]` vs. `In [*]` vs. `In [1]`
5. Add cells at different locations
6. Editing a cell (pencil icon)
7. Cell types
8. Writing Markdown
9. Keyboard shortcuts
10. File extension
11. About the kernel menu
12. Mixing up HTML with Markdown

In [None]:
### Jupyter Lab Demo 

In [4]:
i <- 0
for(j in 1:100000000) {
}

### Markdown Syntax:

* Markdown uses special characters to format text
    * E.g.: the number of `#` characters defines the heading level
      * ex. `## Some Title` defines a level-2 heading (the second largest, `H2` in HTML)
* Rules for character usage are well-defined. 
    * Breaking the rules can lead to errors in display
    * ex. Some applications (ex. Jupyter) cannot interpret `##Some Title` as a level-2 heading.
      * `##` followed by a space is valid heading syntax, but `##Title` (no space) is not
      * Some applications make exceptions that are meant to be helpful but instead create inconsistency
* For more info, see the [Markdown Guide's Basic Syntax](https://www.markdownguide.org/basic-syntax/)

### Introduction to R Syntax

* As discussed, R is a "scripting language"

  * We will be using it to write programs, as opposed to only doing stats
  
* Programming languages need to provide
  * Ways to store data: those are called variables (and variable types)
  * Ways to structure data: those are called data structures
  * Ways to interact with or manipulate the data: those are typically operators or functions
  * Ways to define new manipulations, i.e., to write novel functions or create new operators  

### Introduction to R Syntax - Variable

* A variable is simply an alias (name) that allows you to refer to some data.

  * Think of it as a labeled folder that contains some data.
  * The data is stored somewhere in the computer's memory.

* We create a variable to store some data using the following syntax
```
x <- 2
```

* We use `x` every time we need to interact with that value

* We can interact with the data contained in `x`, for example by adding 1 to it

```
x + 1 
```

### Introduction to R Syntax -- Variable names

* There are a few rules you need to follow to name a variable
  * Syntactic rule 1: a variable name cannot start with a number
    * e.g.: `1x` is not a valid variable name. `x1` is a valid variable name
  * Syntactic rule 2: a variable cannot start with special characters like: `^`, `!`, `$`, `@`, `+`, `-`, `/`, or `*`
  * Stylistic rule 3: Variables should avoid short meaningless names, like `x`. 🥺
    * More on what makes a good variable name when we cover R style 
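
One way to check the two syntactic rules is sketched below. It relies on base R's `make.names()`, which rewrites a string into a syntactically valid name, so a candidate that comes back unchanged was already valid. The candidate names are made up for illustration.

```r
# Hypothetical candidate names, chosen only to illustrate the rules
candidates <- c("x1", "1x", "total_price", "^rate")

# make.names() rewrites invalid names; valid names come back unchanged
is_valid <- make.names(candidates) == candidates

is_valid  # TRUE FALSE TRUE FALSE
```
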

In [1]:
x <- 2

In [2]:
x + 1 

### Introduction to R Syntax - Cont'd

* For some operations, `R` returns (gives us back) something.
  * Think of it as the result of the operation

* For other operations, R doesn't give anything back to display
  * `x <- 2` puts (or assigns, in technical terms) the value 2 into a variable named `x`
    * Nothing is printed (strictly speaking, R returns the assigned value invisibly)
  * `x + 1` computes the sum and returns the value.
    * As a convenience, Jupyter prints the returned values in the outputs section (below the code)
  
* In R, everything from a `#` to the end of the line is a comment and is ignored by the R interpreter

```
# We create a variable and store the value 2 in it
x <- 2
# We compute the sum x + x
x + x
```


### Introduction to R Syntax - Cont'd

* It helps to think of operations as being of the `give me/get me` type or the `do for me` type:
  * the former is called an expression, the latter is called a statement.

* E.g.: the expression `x + x - 2` will return a value

* On the other hand, the statement `x <- 17` assigns the value 17 to `x`. 
    * It does the work but does not give you anything back to display.
     
* For convenience, Jupyter prints in the output section the value returned by the `give me` type of operations
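
A minimal sketch of the two kinds of operations, using the same values as above. The variable name `result` is introduced here only for illustration.

```r
x <- 17               # "do for me": stores 17 in x, nothing is printed
x + x - 2             # "give me": returns 32, which Jupyter prints

result <- x + x - 2   # the returned value can also be stored in a variable
result                # asking for the variable prints 32
```
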

     

### Functions in R 

* Data can be operated on using operators `+`, `/`, `%%`, etc...
* Data can also be operated on using functions
  * A much more common and powerful way
* You can think of an R function as a sort of mathematical function `f(x)`
    * In math, `f(x)` means that the function `f` takes an input `x` and returns some value
* There are thousands (personal guess) of functions in R.
* Examples: `log` (base e by default), `log10`, `sqrt`, `lm`, etc.  
* Functions don't have to be mathematical

In [3]:
log10(100)
sqrt(9)

In [24]:
x <- log10(100)
# paste() already inserts a space between its arguments
paste("Log base 10 of 100 is", x)


### Jupyter And Return Values

* You can create as many variables as you need and invoke as many functions on them as you need
* Stick to one statement or expression per line

```
x<-2
sqrt(x)
y<-3
log(y)
```

In [27]:
x<-2
sqrt(x)
y<-3
log(y)

### R Arithmetic Operators


* Most basic types of operators in R:

* Addition: `+`
* Subtraction: `-`
* Multiplication: `*`
* Division: `/`
* Exponentiation: `^`
* Modulo: `%%`

* Note these operators return a value
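
As a quick illustration, the sketch below applies each operator listed above to two made-up values, `a` and `b`, which are not part of the original notebook.

```r
a <- 7   # hypothetical values used only for this illustration
b <- 2

a + b    # 9
a - b    # 5
a * b    # 14
a / b    # 3.5
a ^ b    # 49
a %% b   # 1, the remainder of 7 divided by 2
```
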


In [4]:
x + 3


In [6]:
y <- 3
y - y
x / y
x ^ y

### The Assignment Operator

* Recall we can assign a value to a variable (an alias)
    * E.g.: `x <- 2`

* We can, in fact, assign the return value of an expression into a variable

`z <- x + y`

* R evaluates the right-hand expression (`x + y`), which returns a value

* It then takes that right-hand value and assigns it to the left-hand alias

* If you want to see the value in `z`, use the expression
  
  `z`
*  i.e., "Interpreter, give me the value of `z`" 

In [7]:
z <- x + y


In [8]:
z

### Data types in R

* Programming languages tend to classify the data they can handle into a small set of categories
  * Those are called data types (or variable types)
  
* A data type lets R know what sort of functions or operations can be applied to the data

* For example, if the data contained in a variable `z` is a city name, then R cannot use the division operator on it.
  * In R it's called a `character` data type (a collection of characters). In most other languages, `z` would be called a `string`
* Attempting an arithmetic operation on `z` would yield an error
  * An error simply means that something did not follow R's rules 
  * It helps to remember that an error simply means R saw something it was not expecting
 
  

In [12]:
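# Without quotes, R looks for a variable named Honolulu and fails
z <- Honolulu

ERROR: Error in eval(expr, envir, enclos): object 'Honolulu' not found


In [10]:
# With quotes, z holds a character value, so dividing it is an error
z <- "Honolulu"
z / 2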

ERROR: Error in z/2: non-numeric argument to binary operator


### Data types in R - Cont'd


* R has some fundamental data types:

  * `character`: "a", collection of one or more characters (called a string in CS)
  * `numeric`: 2, 15.5     
  * `logical`: `TRUE`, `FALSE`. Typically called boolean in CS.
  * `complex`: 1+4i (complex numbers with real and imaginary parts)
    * Won't be used in this course
    
* A type (R calls it the mode) can also have classes. For example:
  * A numeric can be a `double`, ex. 3.12 or 7.0
  * A numeric can also be an `integer`, ex. 2
    
    * An `integer` is declared by appending `L` to the number:
        `x <- 1L`
     * Integers have no fractional part, so R can store them in less memory than doubles
     * `1L` explicitly stores the value in a way that uses less RAM 

* We will cover classes later in the course 
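
The fundamental types listed above can be inspected with base R's `typeof()` and `mode()`. This is a small sketch using the same example values as the list.

```r
typeof("a")     # "character"
typeof(15.5)    # "double"
typeof(2L)      # "integer"
typeof(TRUE)    # "logical"
typeof(1+4i)    # "complex"

# Both doubles and integers report the same mode
mode(15.5)      # "numeric"
mode(2L)        # "numeric"
```
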

In [46]:
# instantiate a variable x with the value 1L (the integer 1)
x <- 1L

# then we use the typeof() function to
# print the type of the variable x
typeof(x)

In [47]:
y <- 200.12

mode(y)

typeof(y)

In [48]:
z <- FALSE

typeof(z)

### Operations on Basic Data Types

* R offers dozens of operations on the basic data types

* Most operations are data type specific
  * For instance:
    * `round` function rounds a decimal number to the closest integer.
    * sometimes I will write "`x` function" simply as `x()`
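
To illustrate type-specific operations, the sketch below pairs a few base R functions with the type they expect. The variable `city` and its value are made up for illustration.

```r
city <- "Honolulu"   # hypothetical character value

# Functions that expect character data
nchar(city)          # 8, the number of characters
toupper(city)        # "HONOLULU"

# A function that expects numeric data
sqrt(16)             # 4
```
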

In [51]:
round(3.12)

In [52]:
round(3.99)

In [53]:
# the following will generate an error
# as `round()` expects a numeric argument

round("Hello")

ERROR: Error in round("Hello"): non-numeric argument to mathematical function


### R Data Structures 

* In computer science, a data structure is an abstract construct that provides the user a convenient way to store and access the data
   * Data structures abstract the mechanism used to store the data 
* Depending on the type of data we need to structure, various data structures can be used.
  * ex. a list of elements or a table with multiple columns and rows (a la Excel)
* The data structures built into R are: vectors, lists, matrices, data frames, and factors
* Many additional complex and specialized data structures are available through packages.
* We focus on (atomic) vectors first and cover other data structures in future modules.

### R Data Structures -- Atomic Vectors

* An atomic vector (or simply vector) is an ordered collection of elements of the same type
  * I could have just said list, but `list` is its own data structure in R.
* Atomic, here, simply means all the elements are of the same type (class)
  * E.g.: all elements are of type `character`.
  * Different types cannot be mixed
* The easiest way to create a vector of elements is by using the `c()` function (`c` stands for combine)
  * It takes the elements to combine, ex. `c(1,2,3)`, and returns a constructed atomic vector
* Operators and functions apply slightly differently on vectors
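
A small sketch of building a vector with `c()` and inspecting it. The city names and variable names are made up for illustration.

```r
# A character vector: every element has the same type
cities <- c("Honolulu", "Hilo")
typeof(cities)        # "character"
length(cities)        # 2

# c() can also combine an existing vector with new elements
more_cities <- c(cities, "Kailua")
length(more_cities)   # 3
```
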

In [15]:
x <- c(10, 10, 20)
print(x)

[1] 10 10 20


In [16]:
y <- c(3, 2, 1)
x
y
x + y

In [17]:
x * 3

In [18]:
x * y

In [19]:
y <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
length(y)

In [92]:
# The following will generate a warning and an ambiguous result since the vectors have different lengths
# Argh, R!
x <- c(1, 2, 1)
y <- c(5, 6, 7, 8)
x + y

“longer object length is not a multiple of shorter object length”


### Operations on Atomic Vectors

* Functions are the workhorses of R
  
  * They make manipulating numbers seamless
  
    * Ex. computing the mean, standard deviation, variance, etc. can be easily done with `mean()`, `sd()`, `var()`, etc.  

In [96]:
x <- c(1, 2, 3, 4, 5, 6)
mean(x)
sd(x)
sqrt(var(x))

### Functions 

* A function is a piece of code that implements some logic (an algorithm)

* For example, `log`, `sqrt`, and `round` are all R functions that we have used (invoked).
	* R has more than 1,000 functions installed by default 
	* There are thousands of other functions that you can download and install as part of packages (personal guess)

* It helps to think of functions in their mathematical form.
  $f(x) = x^2 + 2x + 3$
  
* Here `f(x)` is a math function that takes a value of `x` and returns the computed value of `x^2 + 2x + 3`

  ex. $f(3) = 3^2 + 2\times 3 + 3 = 18$

* Math functions can have more than one variable, ex. `f(x,y,z) = 3xy + 2yx + 2z`
  * So can R functions

* Computation in a math function can be fixed for a variable. E.g.: 

`f(x,y, z=3) = 3xy + 2yx + 2*3`
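
The two math functions above can be sketched as R functions. Defining your own functions has not been covered yet, so treat this only as a preview; the name `g` for the three-variable function is chosen here for illustration.

```r
# f(x) = x^2 + 2x + 3
f <- function(x) {
  x^2 + 2 * x + 3
}
f(3)            # 18

# f(x, y, z = 3) = 3xy + 2yx + 2z, with z fixed to 3 unless stated otherwise
g <- function(x, y, z = 3) {
  3 * x * y + 2 * y * x + 2 * z
}
g(1, 2)         # 16, uses z = 3
g(1, 2, z = 0)  # 10, overrides the fixed value
```
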

  

### Functions Cont'd

* Functions in programming languages operate similarly.
  * Ex. the function `round` takes some value (called an argument) and returns the nearest integer (whole number) value.
    * ex. `round(3.14)` returns 3
  * Arguments in R are also called parameters in CS or other programming languages 

* Also like math functions, R functions can take multiple arguments. 
  * ex. the `seq` function can take two arguments x and y, where x < y, and returns the vector of values `c(x, x+1, ..., y)`
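
A short sketch of calling `seq` with two arguments. In base R the two arguments are named `from` and `to`, so they can be passed by position or by name.

```r
# Arguments matched by position
seq(2, 6)               # 2 3 4 5 6

# The same call with the arguments named explicitly
seq(from = 2, to = 6)   # 2 3 4 5 6

# With names, the order no longer matters
seq(to = 6, from = 2)   # 2 3 4 5 6
```
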



### Functions -- Cont'd


* In R, functions have some default behaviour that can be changed using optional parameters
  * For example, `seq(1, 5)` will return the sequence of integers from 1 to 5, with an increment of 1.
  * To change this default behaviour we can use the optional parameter `by`
  `seq(1, 5, by=0.5)`

* So, in short, some functions have a default "behavior" that is often good enough for most uses
  * Default behavior can be changed via optional parameters

* The behavior of the function `round` can be changed via the optional parameter `digits`.
  `round(3.1415, digits=3)`
  * So, the `digits` parameter is optional, whereas the value `x` to round is required

* The default parameter is equivalent to conditioning the execution on that default value
  * In analogy to `f(x,y, z=3) = 3xy + 2yx + 2*3`

In [20]:
seq(1, 5)

In [106]:
seq(1, 5, by=0.1)

In [107]:
round(3.9999, digits=3)

### Functions -- Cont'd

* Just like a math function, you can use function composition

* Given `f(x) = 2x + 1` and `g(x) = x + 10`, you can compute:

`f(g(x)) = 2(x + 10) + 1 = 2x + 20 + 1  = 2x + 21 `

* Similarly, you can take a function that returns a value and use that value as input to another function
  * For example, given:

    `x <- 2.1`

    running the following:

    `y <- round(x) # returns 2`

    `log(y) # returns log(2), or 0.693...`

    is equivalent to:

    `log(round(x)) # returns 0.693...`


In [164]:
log(2)

### Invoking a Function's Help

* Most functions in R have a good help section that explains what the function does and lists its required and optional parameters.
  * Understanding help pages is an acquired skill 🙄

* We can open a function's help page by prefixing its name with `?`.
* One can also see a function's arguments simply by using the `args` function
  * `args(round)`

* The last part of a help page typically contains an example of how to use that function.


* Using `??` will search for help pages using a keyword

  * e.g.: `??lo`

In [22]:
?seq

In [21]:
args(seq)

In [159]:
args(round)

### Using Libraries

* As mentioned, R has thousands of publicly available packages that you can download and just start using.

* One very popular package is `ggplot2`, which is often used to produce publication-quality plots.

* A package can be installed using `install.packages`

  e.g.: `install.packages("ggplot2")`
 
* `R` does not load in memory every package you install. 
  * Avoids wasting your RAM.
  * Instead, every time you need to use a library, you need to explicitly load it via the `library` function
  
e.g.: `library("ggplot2")`
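
A hedged sketch of the install-then-load workflow: it checks whether a package is installed before loading it, using base R's `requireNamespace()`. The variable names `pkg` and `installed` are chosen here for illustration, and the installation step is left as a message so the code runs without a network connection.

```r
pkg <- "ggplot2"

# TRUE if the package is installed, FALSE otherwise (nothing is attached yet)
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  # character.only = TRUE lets library() accept the name stored in a variable
  library(pkg, character.only = TRUE)
} else {
  message("Package not found; run install.packages(\"", pkg, "\") first")
}
```
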

In [None]:
## Lab Session


### Question 1

*  Draw a concepts diagram that uses all the following R and programming terminology 
   * Include any missing keywords that will simplify the concepts diagrams 


Code, Variable, Data, Function, Call, Invoke, Type, Expression, Assign, Return value, Character, Numeric, Integer, Logical, Data structure, Arguments, Parameters, Default values, Vector, Data Type



### Question 2

* Open Jupyter Notebook and create a new notebook called `Lab 1`

* Make sure the notebook is set to use the `R` kernel (instead of Python by default)
  * This is called the runtime

* You can test the runtime by instantiating a vector `x` with 5 values and computing its mean

* Practice creating a Markdown cell by reproducing the third cell of this notebook 
  * The cell has the title: `R IDE`
  

### Question 3

* Recall that `c` creates atomic vectors

* What does the following create?

`c(1, 2, 3, "Hi")`

* Would the following be valid?

`c(1, 2, 3, "Hi") + 1 `

* How about 

`c(1, 2, 3, FALSE) + 1 `

* Before answering, we should know that the function `class()` returns the atomic data type stored in a vector


### Question 4: 

* We will be plotting the following plot

![](https://www.dropbox.com/s/c4nf3n96np3i7nm/simple_qplot_example.png?dl=1)





### Create `x-axis` values

* Create a variable called `x_axis` that is a `vector` of numerical values between 0 and 10 with a step of 0.5. 
    * i.e., `x_axis` will contain the values 0, 0.5, 1, 1.5, 2 .... 10
    * hint: you need a function that returns a sequence of values as a vector

In [None]:
# Write your code in this cell



### Create  `y-axis` values

* Create a variable called `y_axis` that is a vector of the values of $x^2 + 2x + 3$. 
* i.e., each position in `y_axis` is computed as $x^2 + 2x + 3$, where x is the value at the same position in `x_axis`
* For example: 
  * The value at the first position of `y_axis` is 0^2 + 2*0 + 3 = 3
  * The value at the second position of `y_axis` is 0.5^2 + 2*0.5 + 3 = 4.25
  * etc...

In [None]:
# Write your code in this cell




### Generate the plot of `x_axis` versus `y_axis`

* Plot the values of `x_axis` and `y_axis`
* Use the `qplot` function, which is part of the `ggplot2` package, to plot `x_axis` against `y_axis`.
    * If in doubt consult `qplot`'s documentation or check its arguments
 
* Change the behavior of your plot so that it has:
  * A label for the `x-axis`. Mine says "My  x_axis"
  * A label for the `y-axis`. Mine says "My  y_axis"
  * A title. Mine says "My amazing plot of x_axis versus y_axis"  
  * Dots that are bigger than those produced by default.
 
* Hint: We know that we can change the default behavior of a function by setting its optional parameters.
  * Which optional parameters (params) control the `x-axis`, `y-axis`, and plot labels?
  * Which optional param controls the size of the symbol (a dot here)?


In [25]:
# Write your code in this cell



### Reading Next Week

* What is R: https://www.r-project.org/about.html
* R basics + writing functions: https://rstudio-education.github.io/hopr/basics.html
  * Section 2.6 does not apply since we're using Jupyter Notebooks
* What are packages and libraries: https://rstudio-education.github.io/hopr/packages2.html
* Vectors, lists, dataFrames, factors + reading and writing data: https://rstudio-education.github.io/hopr/r-objects.html#lists

* About Github: 
  * Git: what and why  https://happygitwithr.com/big-picture.html
 

