# Primer for R

![](banner_R.jpg)

## Setup

In [1]:
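# Look for setup.R here, then in successive parent directories (the loop runs at most 10 times), and source it.
f = "setup.R"
for (i in 1:10) { if (file.exists(f)) break else f = paste0("../", f) }
source(f)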

## Key Concepts

Here are a few key concepts for working with R.  Later sections elaborate on each.

R is a programming language well-suited to data analysis.

R makes use of **dataframes** and **vectors** to hold collections of values:
- **Dataframe:** Holds a 2-dimensional collection of values, indexed by row and column.  Each row has a row number.  Each column has a column number and a name, and holds values of one type (e.g., numeric values, character string values, or categorical values).  All values in a single column must be of the same type, but different columns may hold different types.  A dataframe is sometimes loosely referred to as a table.

- **Vector:** Holds a 1-dimensional collection of values, like a single column of values, indexed by position.  Each position has a position number, starting at 1.  A vector holds values of one type (e.g., numeric values, character string values, or categorical values).  All values in a vector must be of the same type.
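The type rules described above can be checked directly. The following is a minimal sketch; the names `scores`, `mixed`, and `people` and their values are hypothetical, chosen only for illustration. `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version's default.

```r
# Hypothetical example data; names and values are illustrative only.
scores = c(90, 85, 77)            # a numeric vector
class(scores)                     # "numeric"

mixed = c(1, "two", 3)            # mixing types forces one common type
class(mixed)                      # "character": the numbers became strings

people = data.frame(name = c("Ana", "Ben", "Cy"),
                    score = scores,
                    stringsAsFactors = FALSE)
sapply(people, class)             # each column reports its own single type
dim(people)                       # number of rows, then number of columns
```

Note that R does not reject the mixed vector; it silently converts every value to a common type, which is worth checking with `class(...)` when results look odd.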

R makes use of **functions** to make calculations and/or perform actions (e.g., `max(...)`, `min(...)`).  Each function generally takes some arguments and returns a dataframe, vector, or single value.  A function distinguishes arguments by their order and/or by their names.  A function's specific behavior can depend on the types of its arguments.
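As a rough illustration of argument matching, the sketch below calls the base R function `seq(...)` three ways. The variable names (`by_position`, `by_name`, `reordered`) are hypothetical. It also calls `summary(...)` on two different types to show type-dependent behavior.

```r
# Arguments matched by order: from, to, by.
by_position = seq(1, 9, 2)

# The same arguments matched by name.
by_name = seq(from = 1, to = 9, by = 2)

# Named arguments may appear in any order.
reordered = seq(by = 2, to = 9, from = 1)

identical(by_position, by_name)     # TRUE
identical(by_position, reordered)   # TRUE

# The same function can behave differently for different argument types.
summary(c(3, 4, 2, 3, 1))           # numeric input: quartiles, mean, etc.
summary(c("hello", "goodbye"))      # character input: length, class, mode
```

Naming arguments is usually the safer choice when a function takes more than two or three of them.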

R provides a special function, `data.frame(...)`, to **create a dataframe**.

R provides a special function, `c(...)`, to **create a vector** or **concatenate** multiple vectors into a single vector.
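A minimal sketch of both functions working together, using hypothetical names (`first`, `second`, `combined`, `labels`, `df`) that do not appear elsewhere in this primer:

```r
first = c(1, 2)
second = c(3, 4, 5)

# Concatenation produces one flat vector, not a vector of vectors.
combined = c(first, second)
length(combined)                  # 5

# Vectors of equal length can serve as the columns of a dataframe.
labels = c("a", "b", "c", "d", "e")
df = data.frame(value = combined, label = labels)
nrow(df)                          # 5
names(df)                         # "value" "label"
```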

## Examples

### Create It

A numeric value:

In [2]:
5

A character string value:

In [3]:
"hello"

A vector of numeric values:

In [4]:
c(3,4,2,3,1)

A vector of character string values:

In [5]:
c("hello","goodbye","data","analytics","statistics")

A dataframe:

In [6]:
data.frame(x1=c(3,4,2,3,1), x2=c(6,7,5,1,3), x3=c("hello","goodbye","data","analytics","statistics"))

| x1 | x2 | x3 |
|---|---|---|
| 3 | 6 | hello |
| 4 | 7 | goodbye |
| 2 | 5 | data |
| 3 | 1 | analytics |
| 1 | 3 | statistics |


### Name It

Assign a value to a name with `=` (or `<-`).  Entering the name on its own line displays the value.

In [7]:
valn = 5
valn

In [8]:
vals = "hello"
vals

In [9]:
vecn = c(3,4,2,3,1)
vecn

In [10]:
vecs = c("hello","goodbye","data","analytics","statistics")
vecs

In [11]:
data = data.frame(x1=c(3,4,2,3,1), x2=c(6,7,5,1,3), x3=c("hello","goodbye","data","analytics","statistics"))
data

| x1 | x2 | x3 |
|---|---|---|
| 3 | 6 | hello |
| 4 | 7 | goodbye |
| 2 | 5 | data |
| 3 | 1 | analytics |
| 1 | 3 | statistics |


### Select It

A numeric value indicated by position within a numeric vector: 

In [12]:
vecn[5] # first position is 1

A character string value indicated by position within a character string vector: 

In [13]:
vecs[3] # first position is 1

A numeric value indicated by row position and column position within a dataframe:

In [14]:
data[4, 2]

A numeric vector indicated by a column name within a dataframe:

In [15]:
data$x2

A dataframe indicated by row positions and column positions within another dataframe:

In [16]:
data[c(2,3,4),c(1,2)]

| | x1 | x2 |
|---|---|---|
| 2 | 4 | 7 |
| 3 | 2 | 5 |
| 4 | 3 | 1 |
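Selection also works by column name, by logical condition, and by negative position. The sketch below rebuilds `data` and `vecn` so that it runs on its own; the names `cell`, `big`, `subset.cols`, and `rest` are hypothetical.

```r
vecn = c(3,4,2,3,1)
data = data.frame(x1=c(3,4,2,3,1), x2=c(6,7,5,1,3),
                  x3=c("hello","goodbye","data","analytics","statistics"))

# A column can be given by name instead of by position.
cell = data[4, "x2"]              # same cell as data[4, 2]

# A logical condition selects the rows where it is TRUE.
big = data[data$x1 >= 3, ]        # rows 1, 2, and 4

# Leaving the row index empty selects all rows.
subset.cols = data[, c("x1", "x3")]

# A negative position drops that element from a vector.
rest = vecn[-1]                   # everything except the first value
```

Leaving an index empty (as in `data[, c("x1", "x3")]`) means "all of them" along that dimension.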


### Apply It

In [21]:
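mean(vecn) # arithmetic mean of the values

In [22]:
rev(vecn) # the same values in reverse order

In [23]:
cor(vecn, rev(vecn)) # correlation between the vector and its reverse

In [24]:
c(vecn, rev(vecn)) # concatenate two vectors into one

In [25]:
vec.boolean = c(TRUE, TRUE, FALSE, FALSE)
vec.boolean
as.numeric(vec.boolean) # TRUE becomes 1, FALSE becomes 0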
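Because TRUE converts to 1 and FALSE to 0, arithmetic functions can count or average a logical vector directly. A small sketch, in which `is.big` is a hypothetical name and `vecn` is redefined so the block runs on its own:

```r
vecn = c(3,4,2,3,1)

is.big = vecn >= 3     # TRUE TRUE FALSE TRUE FALSE
sum(is.big)            # 3: how many values are at least 3
mean(is.big)           # 0.6: the proportion of values that are at least 3
```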

## Further Reading

* http://www.rdatamining.com/
* http://yanchang.rdatamining.com/
* https://www.datacamp.com/community/tutorials/r-packages-guide

<font size=1;>
<p style="text-align: left;">
Copyright (c) Berkeley Data Analytics Group, LLC
<span style="float: right;">
Document revised January 7, 2021
</span>
</p>
</font>