<H1> Basic R in the Jupyter Notebook and RStudio

R is a programming environment created specifically for statistics. It is a scripting language (if you don't know what that means, don't worry for now). R can be used interactively (as we will see in this notebook), or it can be told to execute a list of commands stored in a plain text file (called a 'script').

From within the Jupyter notebook, we can access the R 'kernel' (the program that interprets R code and returns results). This is just one way to use R. We will also learn to use a program called RStudio.

<H2> The Notebook

As you can see, the notebook is browser based (it opens in a browser window) and works a lot like a web page served by a web server. This notebook is running an R kernel, but we could choose a Python kernel, a bash kernel (Unix shell), or another one from a long and growing list:

https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

Be careful, though. Much of this is under development and considered 'beta' (or even alpha) - the tools can be buggy. We will be careful to use only the better developed parts of the Jupyter universe.

The notebook is made up of 'cells'. A cell can be either 'code' or 'markdown'. Code cells are for writing R commands. Markdown cells are for text. Markdown is a lightweight markup language that is converted to HTML (raw HTML tags also work inside it). This cell is a markdown cell.

More about markdown here:

https://en.wikipedia.org/wiki/Markdown

You can type anything you want in markdown (though there are some special characters that will be interpreted as commands). 

Code cells require R syntax. The following cell is an R cell:

In [2]:
# This is an R cell. The '#' tells R this is a comment
3+1

When I run the code cell, it executes the R code (3+1) and displays the result (4) in an output cell. To run a code cell, you can press Shift-Enter, or click the run button at the top of the screen.

<H3> R Basics - Data Types

In any programming language, we have the notion of 'data modes' and 'data structures'. This is because programs manipulate data, and different kinds of data require different manipulations. For example, numbers are treated differently than characters (or strings of characters) and single numbers are treated differently than lists of numbers (vectors) or arrays of numbers (matrices).

The following are some simple R data modes:

* numeric
* character
* logical (TRUE or FALSE)
* complex (we won't worry about these!)
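As a quick check, each of these modes can be inspected with the built-in `mode()` function. This is only a small sketch; the variable names are made up for illustration.

```r
# One value of each mode (names are illustrative only)
a_number  <- 8.4
a_string  <- "hello"
a_logical <- TRUE
a_complex <- 2 + 3i

mode(a_number)   # "numeric"
mode(a_string)   # "character"
mode(a_logical)  # "logical"
mode(a_complex)  # "complex"
```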

Modes can be combined to form data structures:

* Vectors
* Matrices
* Strings (in R, a string is a character vector of length one)
* Data Frames


<H4> Examples

In [47]:
class(c(1,3,2,8.4)) # This is a numeric vector; class() reports "numeric"

In [48]:
class(matrix(c(1,3,2,4,5,6),nrow=2,ncol=3)) #This is a matrix

In [54]:
"This is a string!"

In [58]:
data.frame(c("This is a string","This is another string"),matrix(c(1:6),nrow=2,ncol=3))

  c..This.is.a.string....This.is.another.string.. X1 X2 X3
1                                This is a string  1  3  5
2                          This is another string  2  4  6


The important thing to note above is the combination of both character and numeric data into one object! That is what is special about data frames.
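To see why this matters, here is a hedged sketch contrasting a plain vector with a data frame. The column name `label` and the variable names are assumptions made for illustration, and `stringsAsFactors = FALSE` is set explicitly so the text column stays character on older R versions too.

```r
# A plain vector can hold only one mode, so the number is coerced to text
mixed <- c(1, "a")
class(mixed)        # "character"

# A data frame keeps each column's own type
df <- data.frame(
  label = c("This is a string", "This is another string"),
  matrix(1:6, nrow = 2, ncol = 3),
  stringsAsFactors = FALSE
)

names(df)           # "label" "X1" "X2" "X3"
sapply(df, class)   # label is "character"; X1, X2, X3 are "integer"
```

Naming the first column also avoids the long automatically generated column name seen in the output above.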

<H3> R Basics - Creating Objects

<H4> c - Concatenate

We have just seen this command in action. The 'c' command combines objects by concatenation. For example:

In [30]:
c(5,6,7)

creates a vector of length 3. We can append to that vector, like so:

In [32]:
c(c(5,6,7),8)

Of course, we would usually have named the first vector something:

In [50]:
v1<-c(5,6,7)
v2<-c(v1,8)
print(v1)
print(v2)

[1] 5 6 7
[1] 5 6 7 8
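A small sketch of the same idea, using `length()` to confirm what `c()` built. The name `v3` is made up for illustration.

```r
v1 <- c(5, 6, 7)
v2 <- c(v1, 8)       # append a single value
v3 <- c(0, v1, v2)   # c() flattens all of its arguments into one vector

length(v1)  # 3
length(v2)  # 4
length(v3)  # 8
v3          # 0 5 6 7 5 6 7 8
```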


<H4> rbind

Now, if we would like to create a matrix, we could use the matrix command as above:

In [51]:
matrix(c(1,3,2,4,5,6),nrow=2,ncol=3)

     [,1] [,2] [,3]
[1,]    1    2    5
[2,]    3    4    6


Or, we could create two vectors and combine them:

In [43]:
v1<-c(1,2,5)
v2<-c(3,4,6)
m1<-rbind(v1,v2)
m1
class(m1)

   [,1] [,2] [,3]
v1    1    2    5
v2    3    4    6


Notice that R has automatically assigned row names for us, taken from the names of the vectors. Thank you, R! There is also a column-based counterpart, cbind ('column bind'; rbind means 'row bind'), which we can use to append a column to a matrix:

<H4> cbind

In [52]:
m1<-matrix(c(1,2,3,4),nrow=2,ncol=2)
m1

     [,1] [,2]
[1,]    1    3
[2,]    2    4


In [53]:
m2<-cbind(m1,c(5,6))
m2

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
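The three ways of building a matrix seen so far can be compared side by side. This sketch relies on `matrix()` filling column by column by default; the variable names are illustrative only.

```r
# matrix() fills column by column unless told otherwise
m_fill <- matrix(c(1, 3, 2, 4, 5, 6), nrow = 2, ncol = 3)

# The same values built row by row with rbind()
m_rows <- rbind(c(1, 2, 5), c(3, 4, 6))

# ...or column by column with cbind()
m_cols <- cbind(c(1, 3), c(2, 4), c(5, 6))

dim(m_fill)              # 2 3 (rows, then columns)
all(m_fill == m_rows)    # TRUE
all(m_fill == m_cols)    # TRUE

# byrow = TRUE switches matrix() to row-wise filling
m_byrow <- matrix(c(1, 2, 5, 3, 4, 6), nrow = 2, ncol = 3, byrow = TRUE)
all(m_byrow == m_fill)   # TRUE
```

This is why the values passed to `matrix()` earlier were given in the order 1, 3, 2, 4, 5, 6 rather than row by row.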


Now, the notebook is a great environment, especially for doing reproducible research and for documenting all your steps and exploratory analysis. As an intro to R, though, RStudio has a few more features. Most notably, it has browsable help files, so let's go over there for a moment.



<H2> RStudio

Your VM has been set up as an RStudio 'server'. This means that you can connect to it in a similar manner as you did to the notebook server. Use the following URL:

http://colab-sbx-XXX.oit.duke.edu:8787

RStudio Server uses the system login authentication, so type in your VM username (bitnami) and the password you were assigned by Duke's OIT.

Your window should look like so:

<img src="RstudioScreen.png" style="width: 98%; height: 98%"/>

In the upper left corner, we have a 'script' window. This is where you type code that you would like to save into a file. In the lower left is a console window. This is where you type code to execute. It is just like the command line in Unix: you type code, press Enter, and it gets executed.

On the right-hand side is a window with tabs for Files, Plots, Packages, Help and Viewer. As we are beginners, the 'Help' tab will be the most relevant. Here, we can find information on syntax, what functions do what, tutorials, etc.
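Once code has been saved to a file, one way to run the whole file from the console is the built-in `source()` function. The sketch below is only illustrative: it writes a tiny script to a temporary location (the file name `demo_script.R` and its contents are made up) instead of using the script window.

```r
# Hypothetical script contents; in RStudio you would type these in the script window
script_lines <- c(
  "x <- c(2, 4, 6)",
  "total <- sum(x)"
)

# Save the lines to a plain text file in R's temporary directory
script_path <- file.path(tempdir(), "demo_script.R")
writeLines(script_lines, script_path)

# source() runs every command in the file, as if typed at the console
source(script_path)
total   # 12
```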

<H3> Work!

* Plot a histogram of 100 numbers generated from the standard normal distribution. Hint: Use the 'Search Engine' feature under 'Help' to find out how to generate random numbers from the standard normal, then search for 'histogram'. Don't use the ggplot histogram. We'll cover graphics grammar later on!

* Use the script window to compute the mean, variance and median of the following list:
(1,2,5,5,2,5,6,8,1,10)
and save the script under the name 'Example1.R'