# R tutorial

### Packages 

In [1]:
.libPaths() # get library location
library()   # see all packages installed
search()    # see packages currently loaded

### Getting Help 

In [None]:
help.start()   # general help
help(foo)      # help about function foo
?foo           # same thing
apropos("foo") # list all functions containing string foo
example(foo)   # show an example of function foo

## Basics 

Assign a value to a variable:
x <- 42

In [1]:
# Create the data frame first; assigning to mydata$x1 fails if mydata does not exist
mydata <- data.frame(x1 = 10)
mydata$x2 <- 20


### Get the data type 

In [3]:
# Declare variables of different types
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE 

# Check class of my_numeric
class(my_numeric)

# Check class of my_character
class(my_character)

# Check class of my_logical
class(my_logical)
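
As a complement to `class()`, the following sketch shows a few other base R tools for inspecting and converting types: `typeof()`, the `is.*()` predicates, and the `as.*()` converters. The variable `my_integer` is an illustrative addition that does not appear in the cell above.

```r
# Variables mirroring the cell above; my_integer is an illustrative addition
my_numeric <- 42
my_character <- "universe"
my_logical <- FALSE
my_integer <- 42L   # the L suffix creates an integer rather than a double

# class() reports the high-level class, typeof() the internal storage type
class(my_numeric)    # "numeric"
typeof(my_numeric)   # "double"
class(my_integer)    # "integer"

# is.*() predicates return TRUE or FALSE
is.numeric(my_numeric)
is.character(my_character)
is.logical(my_logical)

# as.*() functions convert between types
as.character(my_numeric)   # "42"
as.numeric("3.14")         # 3.14
```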

In [2]:
# Add a column with the row-wise sum
# (assumes a data frame `mydata` with columns x1 and x2 already exists)
mydata$sum <- mydata$x1 + mydata$x2


In [None]:
mydata$mean <- (mydata$x1 + mydata$x2)/2

### Matrix 

matrix(data, nrow, ncol, byrow = FALSE)

data: The collection of elements that R will arrange into the rows and columns of the matrix 

nrow: Number of rows 

ncol: Number of columns

byrow: Whether the matrix is filled row by row. With `byrow = TRUE`, the values fill each row from left to right. With `byrow = FALSE` (the default), the values fill each column from top to bottom.
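
To make the effect of `byrow` concrete, here is a minimal sketch that arranges the same six values in both fill orders. The names `by_row` and `by_col` are illustrative only.

```r
# Same six values, two fill orders
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, byrow = FALSE)

by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Omitting byrow gives the column-wise fill, since FALSE is the default
identical(by_col, matrix(1:6, nrow = 2))   # TRUE
```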

In [4]:
# Construct a matrix with 5 rows that contains the numbers 1 to 10, filled with byrow = TRUE
matrix_a <- matrix(1:10, byrow = TRUE, nrow = 5)
matrix_a

     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8
[5,]    9   10


In [5]:
# Print dimension of the matrix with dim()
dim(matrix_a)

In [6]:
# Construct a matrix with 5 rows that contains the numbers 1 to 10, filled with byrow = FALSE
matrix_b <- matrix(1:10, byrow = FALSE, nrow = 5)
matrix_b

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10


In [7]:
# Append the vector 1:5 to matrix_a as a third column
matrix_a1 <- cbind(matrix_a, 1:5)
# Check the dimensions
dim(matrix_a1)

In [8]:
matrix_a1

     [,1] [,2] [,3]
[1,]    1    2    1
[2,]    3    4    2
[3,]    5    6    3
[4,]    7    8    4
[5,]    9   10    5


In [9]:
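# Construct a matrix with 3 columns that contains the numbers 13 to 24, filled column by column
matrix_a2 <- matrix(13:24, byrow = FALSE, ncol = 3)
matrix_a2

     [,1] [,2] [,3]
[1,]   13   17   21
[2,]   14   18   22
[3,]   15   19   23
[4,]   16   20   24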
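
The notebook builds a 5 x 3 and a 4 x 3 matrix but does not combine them. One plausible next step, shown here as an assumption rather than something the original states, is to stack them with `rbind()`, the row-wise counterpart of `cbind()`. The name `matrix_stacked` is hypothetical.

```r
# Rebuild the matrices from the cells above so the example runs on its own
matrix_a  <- matrix(1:10, byrow = TRUE, nrow = 5)
matrix_a1 <- cbind(matrix_a, 1:5)                    # 5 rows, 3 columns
matrix_a2 <- matrix(13:24, byrow = FALSE, ncol = 3)  # 4 rows, 3 columns

# rbind() stacks matrices vertically; the column counts must match
matrix_stacked <- rbind(matrix_a1, matrix_a2)
dim(matrix_stacked)   # 9 3
```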


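Square brackets select elements of a matrix. In the lines below, `matrix_c` stands for any matrix with at least three rows and three columns.

matrix_c[1,2] selects the element at the first row and second column.

matrix_c[1:3,2:3] returns a sub-matrix with the data in rows 1, 2, 3 and columns 2, 3.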

matrix_c[,1] selects all elements of the first column.

matrix_c[1,] selects all elements of the first row.
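
The indexing forms above can be tried on a small matrix. Since `matrix_c` is not defined elsewhere in this tutorial, the 4 x 3 matrix below is an assumed example.

```r
# matrix_c is not defined earlier, so this 4 x 3 matrix is an assumption
matrix_c <- matrix(1:12, nrow = 4, byrow = TRUE)

matrix_c[1, 2]       # single element in row 1, column 2: 2
matrix_c[1:3, 2:3]   # sub-matrix with 3 rows and 2 columns
matrix_c[, 1]        # first column as a vector: 1 4 7 10
matrix_c[1, ]        # first row as a vector: 1 2 3

# drop = FALSE keeps the result a matrix instead of simplifying to a vector
matrix_c[, 1, drop = FALSE]
```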

## Import a CSV

In [None]:
# The first row contains the variable names and the comma is the separator
# The variable id is used for the row names
# Note the / instead of \ in the path, even on Windows systems

mydata <- read.table("c:/mydata.csv", header=TRUE,
   sep=",", row.names="id")
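
Because `c:/mydata.csv` may not exist on your machine, here is a self-contained sketch that writes a small CSV to a temporary file and reads it back. It uses `read.csv()`, which wraps `read.table()` with `header = TRUE` and `sep = ","` among its defaults. The columns `id`, `x1` and `x2` and their values are made up for illustration.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# The columns id, x1 and x2 are illustrative, not taken from a real data set.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,x1,x2",
             "a,10,20",
             "b,30,40"), csv_path)

# Use the id column for the row names, as in the cell above
mydata <- read.csv(csv_path, row.names = "id")

str(mydata)        # structure: 2 observations of 2 variables
rownames(mydata)   # "a" "b"
```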

### Data Frame 

In [12]:
# Create a, b, c, d variables
a <- c(10,20,30,40)
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(TRUE,FALSE,TRUE,FALSE)
d <- c(2.5, 8, 10, 7)
# Join the variables to create a data frame
df <- data.frame(a,b,c,d)
df

   a           b     c    d
1 10        book  TRUE  2.5
2 20         pen FALSE  8.0
3 30    textbook  TRUE 10.0
4 40 pencil_case FALSE  7.0


In [13]:
# Rename the columns of the data frame
names(df) <- c('ID', 'items', 'store', 'price')
df

  ID       items store price
1 10        book  TRUE   2.5
2 20         pen FALSE   8.0
3 30    textbook  TRUE  10.0
4 40 pencil_case FALSE   7.0


In [14]:
# Select the element in row 1, column 2
df[1,2]

In [15]:
# Select rows 1 to 2
df[1:2,]

  ID items store price
1 10  book  TRUE   2.5
2 20   pen FALSE   8.0


In [16]:
# Select column 1
df[,1]

In [17]:
# Select rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]

  store price
1  TRUE   2.5
2 FALSE   8.0
3  TRUE  10.0


In [18]:
# Select columns by name
df[, c('ID', 'store')]

  ID store
1 10  TRUE
2 20 FALSE
3 30  TRUE
4 40 FALSE


#### Append a Column to Data Frame 

In [19]:
# Create a new vector
quantity <- c(10, 35, 40, 5)

# Add `quantity` to the `df` data frame
df$quantity <- quantity
df

  ID       items store price quantity
1 10        book  TRUE   2.5       10
2 20         pen FALSE   8.0       35
3 30    textbook  TRUE  10.0       40
4 40 pencil_case FALSE   7.0        5
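
A new column can also be computed from existing ones. The sketch below rebuilds the data frame and adds a derived column. The column names `total` and `in_stock` and the variable `df2` are illustrative choices, not part of the original notebook.

```r
# Rebuild the data frame from the cells above so the example runs on its own
df <- data.frame(ID = c(10, 20, 30, 40),
                 items = c("book", "pen", "textbook", "pencil_case"),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7),
                 quantity = c(10, 35, 40, 5))

# Arithmetic on columns is vectorised, so this multiplies row by row
df$total <- df$price * df$quantity
df$total   # 25 280 400 35

# cbind() is another way to append a column
df2 <- cbind(df, in_stock = df$quantity > 0)
ncol(df2)  # 7
```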


#### Select a Column 

In [20]:
# Select the column ID
df$ID

In [21]:
# Select the rows where price is above 5
subset(df, subset = price > 5)

  ID       items store price quantity
2 20         pen FALSE     8       35
3 30    textbook  TRUE    10       40
4 40 pencil_case FALSE     7        5
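
The same filter can be written with logical indexing, which is what `subset()` does under the hood for simple conditions. This is a minimal sketch; the names `expensive` and `in_store` are hypothetical.

```r
# Rebuild the data frame from the cells above so the example runs on its own
df <- data.frame(ID = c(10, 20, 30, 40),
                 items = c("book", "pen", "textbook", "pencil_case"),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7),
                 quantity = c(10, 35, 40, 5))

# Logical indexing: keep the rows where the condition is TRUE, keep all columns
expensive <- df[df$price > 5, ]

# The result matches subset() for this condition
all.equal(expensive, subset(df, subset = price > 5))   # TRUE

# subset() can also combine conditions and restrict columns via select
in_store <- subset(df, subset = price > 5 & store, select = c(items, price))
in_store   # one row: textbook, 10
```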


### Descriptive Statistics 

R provides a wide range of functions for obtaining summary statistics. One way to get descriptive statistics is to apply a summary function, such as `mean`, to every column of a data frame with `sapply()`.

In [None]:
# get means for variables in data frame mydata
# excluding missing values
sapply(mydata, mean, na.rm=TRUE)
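
To see what `na.rm = TRUE` changes, the following sketch runs the same call on a small data frame with missing values. The data are invented for illustration and `col_means` is a hypothetical name.

```r
# Small illustrative data frame with missing values (not from the original data)
mydata <- data.frame(x1 = c(10, 20, NA, 40),
                     x2 = c(1.5, NA, 3.5, 4.5))

# Without na.rm = TRUE, any NA in a column makes its mean NA
sapply(mydata, mean)

# With na.rm = TRUE, missing values are dropped before averaging
col_means <- sapply(mydata, mean, na.rm = TRUE)
col_means

# Other base R summaries follow the same pattern
sapply(mydata, sd, na.rm = TRUE)
summary(mydata)
```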

### Plotting in R 

In [None]:
# Creating a Graph
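# Pass the data frame through `data =` instead of attach(),
# which avoids leaving mtcars on the search path
plot(mpg ~ wt, data = mtcars)
abline(lm(mpg ~ wt, data = mtcars))
title("Regression of MPG on Weight")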
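
When running outside an interactive session, it can help to send the plot to a file. The sketch below draws the same scatter plot and regression line into a PDF using base R's `pdf()` device. The temporary file path, the axis labels, the red line colour and the names `plot_path` and `fit` are illustrative assumptions.

```r
# Open a PDF graphics device; the temporary file path is an illustrative choice
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)

# Scatter plot of fuel economy against weight from the built-in mtcars data
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Regression of MPG on Weight")

# Fit the linear model once, then draw its line on the open plot
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")

# Close the device so the file is written to disk
dev.off()

coef(fit)   # intercept and slope of the fitted line
```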
