# R
## Introduction, Variables, Data Types

## Brief History
- R was developed initially as an alternative implementation of a language known as S
  - S first came out in 1975 and was originally developed at Bell Labs
- Work on R began in 1993, the first paper was published in 1996, and the language reached version 1.0 in 2000
    - Originally led by a team at the University of Auckland in New Zealand
- Designed originally for statisticians, not for programmers

## Running R
- R can be run
    - From the command line, by using the command `R`
    - Using the shebang line `#!/usr/bin/Rscript`
    - In Jupyter, using the IR kernel
    - From inside the RStudio IDE
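
As a rough sketch of the shebang approach, the script below could be saved to a file (the name `hello.R` is made up for this example), marked executable, and run directly. It assumes `Rscript` is installed at `/usr/bin/Rscript`, which varies between systems.

```R
#!/usr/bin/Rscript
# Hypothetical file hello.R: run as `Rscript hello.R a b`
# or, once marked executable, as `./hello.R a b`

# commandArgs(trailingOnly = TRUE) keeps only the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)
greeting <- paste("Hello from R, received", length(args), "argument(s)")
cat(greeting, "\n")
```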


## Limitations of R
- Code generally runs slower than equivalent code in other languages
    - This was considered an acceptable trade-off given the ease of use
- Uses a lot of memory
    - No easy way to perform calculations in chunks, although some packages are starting to provide support for this
    - Is potentially a poor choice for big data

## Assignment
- `R` supports two assignment operators: `<-` and `=`
- Both work, but most style guides and books prefer `<-`
    - Many people argue the exact opposite, however
- `<-` can be reversed and written as `->`, but this is not normally done

In [1]:
a <- 1
b = 1
1 -> c

In [2]:
a == b

In [3]:
b == c
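
One place where the two operators behave differently is inside a function call. The sketch below uses a made-up helper, `square`, and made-up variable names to illustrate the difference. It is not meant to settle the style debate.

```R
# Hypothetical helper used only for this illustration
square <- function(side.len) side.len^2

# Inside a call, `=` names an argument; no variable called side.len is created
area.eq <- square(side.len = 3)
print(area.eq)              # 9
print(exists("side.len"))   # FALSE

# Inside a call, `<-` assigns in the calling environment, then passes the value
area.arrow <- square(len.arrow <- 4)
print(len.arrow)            # 4
print(area.arrow)           # 16
```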

## Variable Names
- Variable names can contain letters, numbers, underscores, and the dot symbol
    - For historical reasons, dots often appear in `R` names where other languages would use underscores
```R
a.long.name <- "String"
```
- The following names should not be used
    ```
    c, q, s, t, C, D, F, I, T
    ```

In [4]:
aLongName <- 0
a_long_name <- 0
a.long.name <- 0

In [5]:
print(aLongName)

[1] 0


In [6]:
print(a_long_name)

[1] 0


In [7]:
print(a.long.name)

[1] 0
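
To see why single-letter names such as `T` and `F` are risky, here is a small sketch. It relies on the fact that `T` and `F` are ordinary variables in base R, while `TRUE` and `FALSE` are reserved words.

```R
# T starts out bound to TRUE
print(T)          # TRUE

# Reassigning T is legal, but now T no longer means TRUE
T <- 0
print(isTRUE(T))  # FALSE

# Removing our variable uncovers the built-in value again
rm(T)
print(isTRUE(T))  # TRUE
```

Code that spells out `TRUE` and `FALSE` is not affected by this kind of masking.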


## Data Types and Data Structures
- `R` has data types, and they are important, but they take a back seat to the data structures
    - A variable cannot be scalar in `R`
- The simplest data structure is the `vector`
    - Every assignment that seems like a single number, string, etc. is actually a single element vector

In [8]:
num <- 1
print(num)

[1] 1


In [9]:
string <- "String"
print(string)

[1] "String"


In [10]:
bool <- TRUE
print(bool)

[1] TRUE
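
A quick way to check that a "single value" really is a one-element vector is sketched below. It uses only base functions.

```R
num <- 1

print(length(num))           # 1
print(is.vector(num))        # TRUE

# Wrapping the value in c() changes nothing
print(identical(num, c(1)))  # TRUE

# Indexing works on it just as on any longer vector
print(num[1])                # 1
```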


## Data Types
- The data types supported by `R` are:
    - integer
    - double
    - complex (Uses "i" rather than "j" as seen in Python)
    - character (This can hold strings of any length)
    - logical 

In [11]:
#Integers must be denoted by appending "L" to the number
#Otherwise they will be interpreted as a double by default
int <- 1L

#typeof() function returns the type as a string
print(typeof(1L))
print(typeof(1))

[1] "integer"
[1] "double"


In [12]:
float.a <- 1
float.b <- 1.01

print(typeof(float.a))
print(typeof(float.b))

[1] "double"
[1] "double"


In [13]:
#Infinity and Not-a-Number are both represented as doubles
float.c <- NaN
float.d <- Inf
float.e <- -Inf

print(typeof(float.c))
print(typeof(float.d))
print(typeof(float.e))

[1] "double"
[1] "double"
[1] "double"


In [14]:
imaginary.a <- 1 + 1i
imaginary.b <- 1 + 0i

print(typeof(imaginary.a))
print(typeof(imaginary.b))

[1] "complex"
[1] "complex"


In [16]:
string.example.1 <- "String"
string.example.2 <- 'String'

print(typeof(string.example.1))
print(typeof(string.example.2))

string.example.2 <- 1
print(typeof(string.example.2))

[1] "character"
[1] "character"
[1] "double"


In [17]:
#Logical values are typed in all uppercase letters
logic.t <- TRUE
logic.f <- FALSE

print(typeof(logic.t))
print(typeof(logic.f))

[1] "logical"
[1] "logical"


## Testing Data Types
- `R` has numerous predicate functions relating to data types
- There is one for each data type
   - `is.DATA_TYPE_NAME(x)`
   - e.g. `is.integer(x)`
- There is also a generic number predicate
    - `is.numeric(x)`

In [19]:
print(int)
print(is.integer(int))
print(is.double(int))
print(is.numeric(int))
print(is.numeric("1"))

[1] 1
[1] TRUE
[1] FALSE
[1] TRUE
[1] FALSE
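
The remaining predicates follow the same pattern. The short sketch below also shows that `is.numeric` is true only for integers and doubles, not for logical or complex values.

```R
print(is.character("1"))   # TRUE
print(is.logical(TRUE))    # TRUE
print(is.complex(1 + 0i))  # TRUE

# is.numeric covers integer and double only
print(is.numeric(1L))      # TRUE
print(is.numeric(1.5))     # TRUE
print(is.numeric(TRUE))    # FALSE
print(is.numeric(1 + 0i))  # FALSE
```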


## Type Casting
- While data types will automatically be coerced in some situations, to explicitly cast use variations of the `as` function
    - `as.DATA_TYPE_NAME(x)`
    - e.g. `as.integer(1.003)`
- This pattern is used throughout `R`, not just with primitive data types

In [20]:
print(as.character(1L))
print(as.integer(1.0004))
print(as.integer(Inf))
print(as.double(1L))
print(as.complex(1))
print(as.numeric(TRUE))

[1] "1"
[1] 1


In print(as.integer(Inf)): NAs introduced by coercion to integer range

[1] NA
[1] 1
[1] 1+0i
[1] 1
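
A few more conversions, sketched under the assumption that only base R is available. The warning produced by the failed conversion is suppressed here just to keep the output short.

```R
# as.integer drops the fractional part (truncates toward zero)
print(as.integer(2.9))     # 2
print(as.integer(-2.9))    # -2

# Strings that look like numbers convert cleanly
print(as.numeric("3.14"))  # 3.14

# Strings that cannot be parsed become NA (normally with a warning)
bad <- suppressWarnings(as.integer("abc"))
print(is.na(bad))          # TRUE

# For numbers, zero becomes FALSE and everything else TRUE
print(as.logical(c(0, 1, 5)))  # FALSE TRUE TRUE
```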


## Data Structures
- Basic Data Structures in `R` can be described by the number of dimensions supported, and the data types allowed
- From "Advanced R" by Hadley Wickham

| | Homogeneous | Heterogeneous |
| - | - | - |
| 1-D | Vector | List |
| 2-D | Matrix | Data Frame |
| N-D | Array | |
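
As a minimal sketch, one object of each kind in the table can be built and checked like this. The variable names and values are made up for illustration.

```R
v <- c(1, 2, 3)                            # 1-D, homogeneous
l <- list(1, "a", TRUE)                    # 1-D, heterogeneous
m <- matrix(1:6, nrow = 2)                 # 2-D, homogeneous
d <- data.frame(x = 1:2, y = c("a", "b"))  # 2-D, heterogeneous
a <- array(1:24, dim = c(2, 3, 4))         # N-D, homogeneous

print(is.vector(v))      # TRUE
print(is.list(l))        # TRUE
print(is.matrix(m))      # TRUE
print(is.data.frame(d))  # TRUE
print(is.array(a))       # TRUE

# dim() reports the size along each dimension
print(dim(m))            # 2 3
print(dim(a))            # 2 3 4
```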

## Vectors
- A vector can be created by using the `c` function
```R
a.vector <- c(1,2,3,4)
```
- All elements of a vector must be the same type. If multiple types are passed to the `c` function, they will be coerced to a common type

In [21]:
a.vector <- c(1,2,3,4)
print(a.vector)

[1] 1 2 3 4


In [23]:
a.vector <- c(1.001,2,3,4)
print(a.vector)

[1] 1.001 2.000 3.000 4.000


In [24]:
a.vector <- c(1.01,TRUE,3,4)
print(a.vector)

[1] 1.01 1.00 3.00 4.00


In [25]:
a.vector <- c(TRUE,"a",3,4)
print(a.vector)

[1] "TRUE" "a"    "3"    "4"   
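
The coercion seen above always moves toward the more general type. A short sketch using `typeof` makes the order visible:

```R
# logical -> integer -> double -> complex -> character
print(typeof(c(TRUE, 1L)))    # "integer"
print(typeof(c(1L, 1.5)))     # "double"
print(typeof(c(1, 2 + 0i)))   # "complex"
print(typeof(c(1.5, "a")))    # "character"
```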


## Factors
- Factors are vectors that are limited to certain values
    - Represent categorical data
    - Helpful in statistical analysis
- A factor can be created using the `factor` function, or by converting an existing vector with `as.factor`

In [26]:
factor.1 <- factor(c("UMBC","UMCP","UMUC","UMB","UB"))
print(factor.1)
cat("\n")
factor.2 <- factor(c("Senior","Junior","Senior",
                     "Junior","Sophmore"))
print(factor.2)

[1] UMBC UMCP UMUC UMB  UB  
Levels: UB UMB UMBC UMCP UMUC

[1] Senior   Junior   Senior   Junior   Sophmore
Levels: Junior Senior Sophmore


In [28]:
# Can use the levels keyword to specify all possible values
factor.3 <- factor(c("Senior","Junior","Senior",
                     "Junior","Sophmore"),
                    levels=c("Senior","Junior",
                             "Sophmore",'Freshman'))
print(factor.3)
cat("\n")
factor.4 <- as.factor(c("Senior","Junior",
                        "Senior","Junior","Sophmore"))
print(factor.4)

[1] Senior   Junior   Senior   Junior   Sophmore
Levels: Senior Junior Sophmore Freshman

[1] Senior   Junior   Senior   Junior   Sophmore
Levels: Junior Senior Sophmore
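
Internally, a factor stores integer codes that point into its levels. The sketch below, with made-up data, shows the codes and what happens to a value that is not among the levels.

```R
years <- factor(c("Senior", "Junior", "Senior"),
                levels = c("Junior", "Senior"))

print(levels(years))      # "Junior" "Senior"
print(nlevels(years))     # 2
print(as.integer(years))  # 2 1 2

# table() counts how often each level occurs
print(table(years))

# A value outside the allowed levels becomes NA
other <- factor(c("Senior", "Alumni"), levels = c("Junior", "Senior"))
print(is.na(other))       # FALSE TRUE
```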


## Lists
- A list is a one dimensional (technically) data structure
    - It can hold a mixture of any data types
    - It can recursively hold other lists and vectors
- Created using the `list` function
```R
a.list <- list("a",2,3.14,FALSE)
```

In [29]:
a.list <- list("a", 2, 3.14, FALSE)

#The str function will show the structure of a variable
#str DOES NOT stand for string, it stands for structure
str(a.list)
print(a.list)

List of 4
 $ : chr "a"
 $ : num 2
 $ : num 3.14
 $ : logi FALSE
[[1]]
[1] "a"

[[2]]
[1] 2

[[3]]
[1] 3.14

[[4]]
[1] FALSE



In [30]:
recursive.list <- list("a", 2, 3.14, list("re","cursive"))
str(recursive.list)

List of 4
 $ : chr "a"
 $ : num 2
 $ : num 3.14
 $ :List of 2
  ..$ : chr "re"
  ..$ : chr "cursive"


In [31]:
# If you try to use c recursively, there is no error
# Everything is just flattened
a.vector <- c(1,2,3,c(4,5))
str(a.vector)

#Applying c to arguments that include at least one list
#coerces the entire structure to a list
coerced.list <- c(1,2,3,list(4,5),list(6,7))
str(coerced.list)

 num [1:5] 1 2 3 4 5
List of 7
 $ : num 1
 $ : num 2
 $ : num 3
 $ : num 4
 $ : num 5
 $ : num 6
 $ : num 7
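
Going the other way, `unlist` flattens a list into a vector, and the usual coercion rules then apply. A small sketch with made-up values:

```R
nested <- list("a", 2, list(3.14, TRUE))

# The inner list counts as a single element
print(length(nested))   # 3

# unlist flattens everything and coerces to a common type
flat <- unlist(nested)
print(flat)             # "a" "2" "3.14" "TRUE"
print(typeof(flat))     # "character"
```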


## Attributes
- Under the surface, R is a very object-oriented language
    - We will talk more about creating user-defined objects in a later lecture
- All data structures we will discuss today have attributes that can be assigned values
- The general syntax is 
```R
attr(OBJECT, "ATTRIBUTE_NAME") <- ATTRIBUTE_VALUE
```

In [32]:
obj <- c(3,4,5,6)
print(attr(obj,"time_created"))
attr(obj,"time_created") <- date()
print(attr(obj,"time_created"))
cat("\n")
print(attributes(obj))

NULL
[1] "Tue Oct  3 12:00:22 2017"

$time_created
[1] "Tue Oct  3 12:00:22 2017"
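
Custom attributes are easy to lose. The sketch below uses `structure`, which sets attributes while creating the object, and a made-up attribute called `units`.

```R
# structure() builds the vector and attaches the attribute in one call
obj <- structure(c(3, 4, 5, 6), units = "cm")
print(attr(obj, "units"))     # "cm"

# Subsetting returns a new vector without the custom attribute
print(attributes(obj[1:2]))   # NULL

# Assigning NULL removes an attribute
attr(obj, "units") <- NULL
print(attributes(obj))        # NULL
```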



## Special Attributes
- While an attribute name can be anything, a few special attributes exist that modify the behavior of the object
    - Names
    - Dimensions
    - Class
- These attributes are so important that they have dedicated functions (`names`, `dim`, and `class`), which should be used to access them instead of the `attr` function

## Naming Indexes
- An existing list or vector can be given named indices by setting the names attribute
- Just as before, we assign into what looks like a function call
```R
names(OBJECT) <- c(SERIES OF CHARACTERS)
```
- A list or vector can also be created using named indices
```R
VARIABLE <- c(a = 1, b = 2)
```

In [33]:
scores <-  c(80,75,80,100,95,85)
names(scores) <- c("Regex HW","Regex Quiz",
                   "Shell HW","Shell Quiz", 
                   "R HW", "R Quiz")
print(scores)

  Regex HW Regex Quiz   Shell HW Shell Quiz       R HW     R Quiz 
        80         75         80        100         95         85 
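
Once names are set, they can be used in place of positions. A minimal sketch with a shortened version of the scores above and a made-up list:

```R
# Names can also be given at creation time
scores <- c("Regex HW" = 80, "Regex Quiz" = 75, "R HW" = 95)

print(names(scores))                 # "Regex HW" "Regex Quiz" "R HW"
print(scores["R HW"])                # keeps the name
print(unname(scores["Regex Quiz"]))  # 75

# The same works for lists
info <- list(course = "R", week = 1)
print(names(info))                   # "course" "week"
```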


## Matrices
- A matrix is a 2-d data structure that is homogeneous in type
    - Usually numbers, but could be boolean or characters too
- Can be created by
    - Using the `matrix` function
    - Adding dimensions to an already existing vector
    - Using the `cbind` or `rbind` functions

In [36]:
# Using the Matrix Function
m <- matrix( c(1,2,3,4,5,6,7,8,9,10,11,12), 
            nrow=3, ncol=4 )
print(m)
cat("\n")
m2 <- matrix(1:12,ncol=4)
print(m2)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12


In [37]:
#Creating a matrix of zeros
zeros <- matrix(0,nrow=3,ncol=4)
print(zeros)
cat("\n")
print(dim(zeros))

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0

[1] 3 4


In [38]:
#Adding Dimensions to an existing Vector
vec <- 1:12
print(vec)
print(dim(vec))
cat("\n")
dim(vec) <- c(3,4)
print(vec)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12
NULL

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12


In [39]:
#Using cbind
m3 <- cbind(c(1,2,3),c(4,5,6),c(7,8,9),c(10,11,12))
print(m3)
cat("\n")
m4 <- rbind(c(1,4,7,10),c(2,5,8,11),c(3,6,9,12))
print(m4)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
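
All of the examples above fill the matrix column by column, which is the default for `matrix`. To fill row by row instead, the `byrow` argument can be used, as in this sketch:

```R
by.col <- matrix(1:6, nrow = 2)                # default: fill down the columns
by.row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill across the rows

print(by.col)
print(by.row)

print(by.col[1, ])   # 1 3 5
print(by.row[1, ])   # 1 2 3
```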


## Data Frames
- Data Frames are 2-d data structures in which every value in a given column must have the same type, but different columns may have different types
- Each row is like a record in a simple database
- Is generally the most common data structure encountered in R

## Creating a Data Frame
- While Data Frames are often created by reading directly from a file, it is also possible to create them programmatically. 
- The general syntax is 
```R
df <- data.frame(COL1 = c(VALUES FOR COL 1),
                 COL2 = c(VALUES FOR COL2), ...,
                 COL_N = c(VALUES FOR COL_N))
```

In [40]:
df <- data.frame(name=c("UMBC","UMCP","Towson"),
                 zipcode=c(21250,20742,21252),
                 undergrad=c(11142,28472,19596),
                 graduate=c(2498,10611,3109))
print(df)

    name zipcode undergrad graduate
1   UMBC   21250     11142     2498
2   UMCP   20742     28472    10611
3 Towson   21252     19596     3109
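
To confirm that each column keeps its own type, the columns can be inspected with `sapply` or `str`. In this sketch `stringsAsFactors = FALSE` is passed explicitly, because the default for that argument changed in R 4.0.0; whether strings become factors otherwise depends on the installed version.

```R
df <- data.frame(name = c("UMBC", "UMCP", "Towson"),
                 zipcode = c(21250, 20742, 21252),
                 stringsAsFactors = FALSE)

# One type per column
col.types <- sapply(df, typeof)
print(col.types)

# str() gives a compact summary of the whole data frame
str(df)
```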


## Common Functions on a Data Frame
- The function `nrow` returns the number of rows in the data frame
- The functions `ncol` and `length` both return the number of columns
- The names of the rows can be accessed and changed using the `row.names` function

In [41]:
print(nrow(df))
print(ncol(df))
row.names(df) <- c('A','B','C')
print(df)

[1] 3
[1] 4
    name zipcode undergrad graduate
A   UMBC   21250     11142     2498
B   UMCP   20742     28472    10611
C Towson   21252     19596     3109
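
A few related functions, sketched on a smaller made-up data frame:

```R
df <- data.frame(name = c("UMBC", "UMCP", "Towson"),
                 undergrad = c(11142, 28472, 19596))

print(dim(df))      # 3 2 (rows, then columns)
print(length(df))   # 2, the number of columns
print(names(df))    # "name" "undergrad"

# head() returns the first few rows
print(head(df, 2))
```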


## Reading Data
- `R` has many built-in functions to read data files into data frames
    - `read.table` reads a whitespace-separated file by default, and is the basis for many other functions
    - `read.csv` reads a comma-separated values file, and is actually just a call to `read.table`
- `R` supports many other formats through various libraries
    - One of the most common libraries is `foreign`, which reads data saved by other statistical software such as SAS, SPSS, and Stata

In [44]:
usm <- read.table("usm.tsv",sep="\t",header=TRUE)
print(usm)

          Name ZIP.Code Undergraduate.Enrollment Graduate.Enrollment
1         UMBC    21250                    11142                2498
2         UMCP    20742                    28472               10611
3       Towson    21252                    19596                3109
4         UMUC    20774                    67434               19382
5 Morgan State    21251                     6362                1327


In [45]:
usm2 <- read.csv("usm.csv",row.names=1)
print(usm2)

             ZIP.Code Undergraduate.Enrollment Graduate.Enrollment
UMBC            21250                    11142                2498
UMCP            20742                    28472               10611
Towson          21252                    19596                3109
UMUC            20774                    67434               19382
Morgan State    21251                     6362                1327
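
When no file is at hand, the same functions can parse a string through the `text` argument of `read.table`. The data in this sketch are made up purely for illustration.

```R
# Made-up CSV content held in a string instead of a file
csv.text <- "Name,Enrollment
Alpha,100
Beta,250"

demo <- read.csv(text = csv.text)
print(demo)
print(nrow(demo))        # 2
print(demo$Enrollment)   # 100 250

# row.names = 1 uses the first column as row names
demo2 <- read.csv(text = csv.text, row.names = 1)
print(row.names(demo2))  # "Alpha" "Beta"
```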


## Writing Data
- `R` similarly supports many different formats in which to write data to a file
    - `write.table`
    - `write.csv`
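- By default, column and row names are written to the file; to omit them, set `col.names` or `row.names` to `FALSE`
    - `write.csv` ignores `col.names` (with a warning), so use `write.table` when the header must be left out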

In [46]:
write.csv(usm2,'usm2.csv')

In [47]:
write.csv(usm2,'usm2.csv',append=TRUE,col.names=FALSE)

In write.csv(usm2, "usm2.csv", append = TRUE, col.names = FALSE): attempt to set 'col.names' ignored

In [48]:
write.table(usm2,'usm2.csv',sep=","
          ,append=TRUE,col.names=FALSE)
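
The sketch below repeats the write-then-append pattern on a temporary file, so nothing in the working directory is touched, and reads the lines back to check the result. The data frame is a made-up two-row example.

```R
out.file <- tempfile(fileext = ".csv")
df <- data.frame(name = c("UMBC", "UMCP"), undergrad = c(11142, 28472))

# First write: header plus two rows, no row names
write.csv(df, out.file, row.names = FALSE)

# Append the same rows again, this time without a header
write.table(df, out.file, sep = ",", append = TRUE,
            col.names = FALSE, row.names = FALSE)

written <- readLines(out.file)
print(written)
print(length(written))   # 5: one header line and four data lines
```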

## Math
- Standard operations of +,-,\*,/, and ^
- Modulus operator is %%
- Integer division is %/%
- Square root and absolute value are part of R's base package

In [49]:
#Addition
print(1 + 1)
print(1 + 1.0)
print(1 + 1i + 2)
print(2 + 1 + 3i)
print(2 + 3i + 4 + 5i)

[1] 2
[1] 2
[1] 3+1i
[1] 3+3i
[1] 6+8i


In [50]:
#Subtraction
print(3-2)
print(0-3)

[1] 1
[1] -3


In [51]:
#Multiplication
print(3 * 4)
print(3 * .12)

[1] 12
[1] 0.36


In [52]:
#Division
print(3/4)
print(0/4)
print(0/0)
print(3/0)
print(-3/0)

[1] 0.75
[1] 0
[1] NaN
[1] Inf
[1] -Inf


In [53]:
# Integer Division
print(3 %/% 4)
print(12 %/% 5)
print(3 %/% 0)
print(0 %/% 0)

[1] 0
[1] 2
[1] Inf
[1] NaN


In [54]:
#Modulus

print(3 %% 3)
print(10 %% 3)
print(0 %% 0)
print(3 %% 0)

[1] 0
[1] 1
[1] NaN
[1] NaN


In [55]:
print(3 ^ 3)
print(9 ^ 0.5)
print(10 ^ -2)

[1] 27
[1] 3
[1] 0.01
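
A short sketch of the base functions mentioned above, plus the behavior of `%/%` and `%%` with negative numbers. Integer division rounds down, and the remainder takes the sign of the divisor.

```R
print(sqrt(16))      # 4
print(abs(-3.5))     # 3.5

# Integer division rounds toward negative infinity
print((-7) %/% 3)    # -3

# The remainder has the same sign as the divisor
print((-7) %% 3)     # 2
print(7 %% (-3))     # -2

# The two operators always satisfy x == (x %/% y) * y + x %% y
print(((-7) %/% 3) * 3 + (-7) %% 3)   # -7
```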


## High-Dimensional Math
- Mathematical operations on higher-dimensional data structures are natively part of `R`
- For scalar operations, like multiplying every value by 2, the dimensionality doesn't matter
    - For operations involving two data frames, two matrices, etc., the sizes should match to prevent unintended outcomes
- In addition, both matrices and data frames can be transposed using the `t` function

In [56]:
#Vector / Scalar Math
vec <- 1:5
print(vec * 2)
print(vec / 10)
print(vec + 1)

[1]  2  4  6  8 10
[1] 0.1 0.2 0.3 0.4 0.5
[1] 2 3 4 5 6


In [57]:
#Vector addition
vec2 <- 10:15
print(vec + vec2)
vec2 <- 11:15
print(vec + vec2)

In vec + vec2: longer object length is not a multiple of shorter object length

[1] 11 13 15 17 19 16
[1] 12 14 16 18 20


In [60]:
#Element-wise multiplication
print(vec * vec2)
cat("\n")
#Dot Product
print(vec %*% vec2)
#print(cvec,vec2))

[1] 11 24 39 56 75

     [,1]
[1,]  205


In [61]:
#Matrix / Vector Operations
mat <- matrix(1:20,nrow=5)
print(mat)
print(mat / vec)

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
     [,1]     [,2]      [,3]  [,4]
[1,]    1 6.000000 11.000000 16.00
[2,]    1 3.500000  6.000000  8.50
[3,]    1 2.666667  4.333333  6.00
[4,]    1 2.250000  3.500000  4.75
[5,]    1 2.000000  3.000000  4.00


In [62]:
#Matrix / Vector Operations
mat2 <- matrix(1:20,nrow=4)
print(mat2)
print(mat2 / vec)

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
     [,1]     [,2]  [,3]      [,4] [,5]
[1,]    1 1.000000  2.25  4.333333 8.50
[2,]    1 6.000000  2.00  3.500000 6.00
[3,]    1 3.500000 11.00  3.000000 4.75
[4,]    1 2.666667  6.00 16.000000 4.00


In [63]:
#DataFrame Operations
print(usm)
cat("\n")
print(usm * 2)

          Name ZIP.Code Undergraduate.Enrollment Graduate.Enrollment
1         UMBC    21250                    11142                2498
2         UMCP    20742                    28472               10611
3       Towson    21252                    19596                3109
4         UMUC    20774                    67434               19382
5 Morgan State    21251                     6362                1327



In Ops.factor(left, right): ‘*’ not meaningful for factors

  Name ZIP.Code Undergraduate.Enrollment Graduate.Enrollment
1   NA    42500                    22284                4996
2   NA    41484                    56944               21222
3   NA    42504                    39192                6218
4   NA    41548                   134868               38764
5   NA    42502                    12724                2654


In [66]:
#Transposition
print(t(mat))
cat("\n")
#What is the data structure returned by this function?
print(t(usm))
print(as.data.frame(t(usm)))

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20

                         [,1]    [,2]    [,3]     [,4]    [,5]          
Name                     "UMBC"  "UMCP"  "Towson" "UMUC"  "Morgan State"
ZIP.Code                 "21250" "20742" "21252"  "20774" "21251"       
Undergraduate.Enrollment "11142" "28472" "19596"  "67434" " 6362"       
Graduate.Enrollment      " 2498" "10611" " 3109"  "19382" " 1327"       
                            V1    V2     V3    V4           V5
Name                      UMBC  UMCP Towson  UMUC Morgan State
ZIP.Code                 21250 20742  21252 20774        21251
Undergraduate.Enrollment 11142 28472  19596 67434         6362
Graduate.Enrollment       2498 10611   3109 19382         1327
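
The `mat2 / vec` result shown earlier can look surprising. The sketch below spells out what recycling does: the shorter vector is reused while walking down the columns of the matrix. It then shows `sweep` as one possible way to divide each column by its own value instead.

```R
vec <- 1:5
mat2 <- matrix(1:20, nrow = 4)

# Recycling: vec is repeated to the length of mat2, filled column by column
recycled <- matrix(rep_len(vec, length(mat2)), nrow = 4)
print(isTRUE(all.equal(mat2 / vec, mat2 / recycled)))   # TRUE

# sweep() applies vec along the columns (MARGIN = 2),
# so column j is divided by vec[j]
by.col <- sweep(mat2, 2, vec, "/")
print(by.col[, 5])   # 3.4 3.6 3.8 4.0
```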


## Boolean Comparison
- `R` supports the standard boolean comparison operators `<`, `>`, `<=`, `>=`, `==`, and `!=`
    - The and and or operators are `&` and `|` respectively
- When used between vectors or matrices, a comparison returns an object of the same size filled with boolean values

In [67]:
##Standard Scalar Comparison
print(3 == 4)
print(3 < 4)
print(3 < 4 & 5 < 10)
print(3 == 4 | 4 != 4)

[1] FALSE
[1] TRUE
[1] TRUE
[1] FALSE


In [71]:
## Comparing Data Structures
print(vec)
print(vec2)
cat("\n")
print(vec == vec2)
print(vec < vec2)

[1] 1 2 3 4 5
[1] 11 12 13 14 15

[1] FALSE FALSE FALSE FALSE FALSE
[1] TRUE TRUE TRUE TRUE TRUE


In [74]:
#Vector and Matrix Comparison
print(vec)
print(mat)
cat("\n")
print(vec == mat)

[1] 1 2 3 4 5
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

     [,1]  [,2]  [,3]  [,4]
[1,] TRUE FALSE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE
[3,] TRUE FALSE FALSE FALSE
[4,] TRUE FALSE FALSE FALSE
[5,] TRUE FALSE FALSE FALSE
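
Vectors of boolean values are often reduced to a single answer. A minimal sketch using base functions; `&&` is the single-value, short-circuiting counterpart of `&`.

```R
vec <- 1:5

print(any(vec > 4))           # TRUE
print(all(vec > 4))           # FALSE

# which() returns the positions that are TRUE
print(which(vec %% 2 == 0))   # 2 4

# TRUE counts as 1 when summed
print(sum(vec > 2))           # 3

# && compares single values and stops as soon as the result is known
print(length(vec) > 0 && vec[1] == 1)   # TRUE
```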


## Subsetting Vectors
- **Indexing starts at 1!**
- Subsetting is done using square brackets (**[ ]**)
- Subsetting is most commonly done with a vector of
    - Positive Integers
    - Negative Integers
    - Boolean Values

## Positive Integer Subsetting
- Positive integers denote which values to return

In [75]:
print(vec)
print(vec[1])
print(vec[2:3])
print(vec[c(1,5)])
#Can repeat indices
print(vec[c(2,2)])

[1] 1 2 3 4 5
[1] 1
[1] 2 3
[1] 1 5
[1] 2 2


## Negative Integer Subsetting
- Negative integers denote which values to *not* return

In [78]:
print(vec)
print(vec[-1])
print(vec[-2:-3])
print(vec[c(-1,-5)])

[1] 1 2 3 4 5
[1] 2 3 4 5
[1] 1 4 5
[1] 2 3 4
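
Two details are worth a quick sketch: an out-of-range positive index returns `NA`, and positive and negative indices cannot be mixed in one subset. The error is caught with `tryCatch` so the example keeps running.

```R
vec <- 1:5

# -(2:3) is the same as -2:-3 and is often easier to read
print(vec[-(2:3)])   # 1 4 5

# Asking for a position past the end gives NA rather than an error
print(vec[10])       # NA

# Mixing positive and negative indices is an error
mixed <- tryCatch(vec[c(-1, 2)], error = function(e) "error")
print(mixed)         # "error"
```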


## Boolean Value Subsetting
- Values are returned when the subsetting vector contains TRUE
- To prevent unexpected results, the vector used to subset should be the same length as the vector being indexed into
    - If the index vector is shorter than the vector being indexed, its values are recycled as many times as necessary

In [79]:
# Explicit Boolean Subsetting
print(vec)
print(vec[c(TRUE,FALSE,TRUE,FALSE,TRUE)])
cat("\n")
#Using an expression
print(vec[vec %% 2 == 0])

[1] 1 2 3 4 5
[1] 1 3 5

[1] 2 4
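
The recycling rule can be seen directly with a short index vector. The last line of this sketch shows what happens when the index is too long instead.

```R
vec <- 1:5

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE TRUE
print(vec[c(TRUE, FALSE)])   # 1 3 5

# A single TRUE is recycled to select everything
print(vec[TRUE])             # 1 2 3 4 5

# An index that is too long produces NA for the extra TRUE
print(vec[c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)])   # 1 3 5 NA
```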


## Subsetting Lists
- Subsetting a list with the **[]** operator will return another list
    - To return a specific value (as a vector) use **[[]]**
- The `$` operator is shorthand for **[[]]** with a literal name, but only **[[]]** can use a variable to do the subsetting

In [84]:
#Returns a list
li <- list(a=1,b=2,c=3,d=4,e=5)
print(li[2])
print(li[[2]])
print(li[['b']])
print(li$b)
idx <- 'b'
cat("\n")
print(li[[idx]])
print(li$idx)


$b
[1] 2

[1] 2
[1] 2
[1] 2

[1] 2
NULL
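
One more difference between `$` and **[[]]**, sketched with a made-up list: `$` accepts an abbreviated name, while **[[]]** requires an exact match unless told otherwise.

```R
li <- list(alpha = 1, beta = 2)

# [] keeps the list wrapper, [[]] extracts the element itself
print(class(li["alpha"]))          # "list"
print(class(li[["alpha"]]))        # "numeric"

# $ does partial matching on names
print(li$al)                       # 1

# [[]] matches exactly by default
print(li[["al"]])                  # NULL
print(li[["al", exact = FALSE]])   # 1
```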


## Subsetting Matrices
- Matrices can also be subset using the **[]** operator
    - With matrices, two indices can be provided, in the order of row,column
    - If just one is provided, it treats the matrix like a vector


In [91]:
print(mat)
cat("\n")
print(mat[5])
print(mat[5,])
print(mat[,4])
print(mat[5,4])
print(mat[c(5,4),])

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

[1] 5
[1]  5 10 15 20
[1] 16 17 18 19 20
[1] 20
     [,1] [,2] [,3] [,4]
[1,]    5   10   15   20
[2,]    4    9   14   19
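
Selecting a single row or column returns a plain vector, as seen above. When a matrix is needed instead, `drop = FALSE` keeps the dimensions. The sketch also selects rows with a boolean condition.

```R
mat <- matrix(1:20, nrow = 5)

row.vec <- mat[5, ]
row.mat <- mat[5, , drop = FALSE]
print(dim(row.vec))   # NULL, a plain vector
print(dim(row.mat))   # 1 4, still a matrix

# Rows whose first column is greater than 3
big.rows <- mat[mat[, 1] > 3, ]
print(big.rows)
```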


## Subsetting Data Frames
- Subsetting Data Frames is very similar to subsetting matrices, but a single index is treated as a column selection
    - The $ operator as used with lists can also be used to refer to a specific column
- Rows (or observations) are selected by adding a comma after the row indices

In [92]:
print(usm[1])
cat("\n")
print(usm['Name'])
#This is a vector rather than a one column DF
print(usm$Name)

          Name
1         UMBC
2         UMCP
3       Towson
4         UMUC
5 Morgan State

          Name
1         UMBC
2         UMCP
3       Towson
4         UMUC
5 Morgan State
[1] UMBC         UMCP         Towson       UMUC         Morgan State
Levels: Morgan State Towson UMBC UMCP UMUC


In [95]:
print(usm[usm['Undergraduate.Enrollment'] > 10000,])
cat("\n")
print(usm[usm['Undergraduate.Enrollment'] > 10000,'Name'])
usm['total'] <- usm[3] + usm[4]
print(usm)

    Name ZIP.Code Undergraduate.Enrollment Graduate.Enrollment
1   UMBC    21250                    11142                2498
2   UMCP    20742                    28472               10611
3 Towson    21252                    19596                3109
4   UMUC    20774                    67434               19382

[1] UMBC   UMCP   Towson UMUC  
Levels: Morgan State Towson UMBC UMCP UMUC
          Name ZIP.Code Undergraduate.Enrollment Graduate.Enrollment total
1         UMBC    21250                    11142                2498 13640
2         UMCP    20742                    28472               10611 39083
3       Towson    21252                    19596                3109 22705
4         UMUC    20774                    67434               19382 86816
5 Morgan State    21251                     6362                1327  7689
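
The same ideas on a small stand-alone data frame, as a sketch. The column names are made up, and `stringsAsFactors = FALSE` is set so the result does not depend on the R version.

```R
df <- data.frame(name = c("UMBC", "UMCP", "Towson"),
                 undergrad = c(11142, 28472, 19596),
                 stringsAsFactors = FALSE)

# [[ ]] and $ both return the column as a vector
print(df[["name"]])                  # "UMBC" "UMCP" "Towson"

# Row condition before the comma, column after it
large <- df[df$undergrad > 15000, "name"]
print(large)                         # "UMCP" "Towson"

# A single cell
print(df[2, "undergrad"])            # 28472

# Reorder the rows by a column
sorted <- df[order(df$undergrad), ]
print(sorted)
```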


## R's built-in help system
- `R` has excellent built-in help capabilities
    - To access the documentation for a specific function, type `?FUNCTION_NAME`
    - To search all help files for a keyword, use the `??` operator, e.g. `??KEYWORD`
- Typing a function's name without any arguments or parentheses will at a minimum show you the signature of the function
    - If the function is written in `R` rather than compiled code, its source will be displayed too
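
Besides the interactive help pages, a function's signature can be inspected from code. The sketch below uses `formals`, `args`, and `apropos` from base R; the variable names are made up.

```R
# Names of the arguments that read.table accepts
arg.names <- names(formals(read.table))
print(head(arg.names, 3))   # "file" "header" "sep"

# args() prints the signature without the function body
print(args(read.csv))

# apropos() lists objects whose names match a pattern
read.funcs <- apropos("^read\\.")
print(read.funcs)
```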

In [96]:
?read.table

read.table {utils} (R Documentation)

| Argument | Description |
| - | - |
| file | the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory, getwd(). Tilde-expansion is performed where supported. This can be a compressed file (see file). Alternatively, file can be a readable text-mode connection (which will be opened for reading if necessary, and if so closed (and hence destroyed) at the end of the function call). (If stdin() is used, the prompts for lines may be somewhat confusing. Terminate input with a blank line or an EOF signal, Ctrl-D on Unix and Ctrl-Z on Windows. Any pushback on stdin() will be cleared before return.) file can also be a complete URL. (For the supported URL schemes, see the ‘URLs’ section of the help for url.) |
| header | a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns. |
| sep | the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns. |
| quote | the set of quoting characters. To disable quoting altogether, use quote = "". See scan for the behaviour on quotes embedded in quotes. Quoting is only considered for columns read as character, which is all of them unless colClasses is specified. |
| dec | the character used in the file for decimal points. |
| numerals | string indicating how to convert numbers whose conversion to double precision would lose accuracy, see type.convert. Can be abbreviated. (Applies also to complex-number inputs.) |
| row.names | a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names. If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise if row.names is missing, the rows are numbered. Using row.names = NULL forces row numbering. Missing or NULL row.names generate row names that are considered to be ‘automatic’ (and not preserved by as.matrix). |
| col.names | a vector of optional names for the variables. The default is to use "V" followed by the column number. |
| as.is | the default behavior of read.table is to convert character variables (which are not converted to logical, numeric or complex) to factors. The variable as.is controls the conversion of columns not otherwise specified by colClasses. Its value is either a vector of logicals (values are recycled if necessary), or a vector of numeric or character indices which specify which columns should not be converted to factors. Note: to suppress all conversions including those of numeric columns, set colClasses = "character". Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped. |
| na.strings | a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields. Note that the test happens after white space is stripped from the input, so na.strings values may need their own white space stripped in advance. |


In [97]:
read.table