<center><h1>The <code>data.frame</code> Object in R</h1></center>

# 1. What is a `data.frame`?

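  - Tabular data structure with rows and named columns (similar to an Excel spreadsheet)
  - Canonical data structure for data analysis in R
  - Can store heterogeneous data: each column holds a single type, but different columns may hold different types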
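
To illustrate the point about heterogeneous data, here is a minimal sketch. The `pets` data frame and its column names are hypothetical and not part of the running example; `stringsAsFactors = FALSE` is passed explicitly so that the text column stays a character vector on R versions before 4.0 as well.

```r
# Hypothetical data: three columns, each with a different type
pets <- data.frame(
  name = c("rex", "tom"),       # character
  age = c(3L, 5L),              # integer
  indoor = c(FALSE, TRUE),      # logical
  stringsAsFactors = FALSE
)

# Report the class of every column
col_classes <- sapply(pets, class)
print(col_classes)
```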

## 1.1 What does it look like?

In [1]:
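idx <- 1:4
score <- rnorm(4)                       # four random draws; values differ on every run
vocals <- c(TRUE, TRUE, FALSE, FALSE)
firstname <- c("john", "paul", "george", "ringo")

# combine the four equal-length vectors into the columns of a data.frame
dat <- data.frame(idx, firstname, score, vocals)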

dat

| idx | firstname | score | vocals |
|---|---|---|---|
| `<int>` | `<chr>` | `<dbl>` | `<lgl>` |
| 1 | john | -1.2070035 | TRUE |
| 2 | paul | 0.8243397 | TRUE |
| 3 | george | 0.9136364 | FALSE |
| 4 | ringo | 1.8258713 | FALSE |
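
Beyond printing the whole object, a few base R helpers summarize its shape. The sketch below rebuilds `dat` with the scores hard-coded from the output above (an assumption made only so the example is reproducible; the original draws them with `rnorm`).

```r
# Rebuild the example with fixed scores so results are reproducible
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

shape <- dim(dat)        # number of rows, then number of columns
col_names <- names(dat)  # column names, in order

print(shape)
print(col_names)
str(dat)                 # compact summary: one line per column with its type
```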


## 1.2 Indexing and Slicing a `data.frame`

  - Uses the same `[row, column]` bracket syntax as `matrix` objects; indices start at 1, as with `vector` and `array` objects

In [2]:
dat[1, 2]           # get element in first row, second column

In [3]:
dat[, 2]            # get all of second column

In [4]:
dat[3, ]            # get third row

|   | idx | firstname | score | vocals |
|---|---|---|---|---|
|   | `<int>` | `<chr>` | `<dbl>` | `<lgl>` |
| 3 | 3 | george | 0.9136364 | FALSE |
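
One detail worth knowing about the cells above: selecting a single column with `[ , j]` returns a plain vector, while selecting a row returns a `data.frame`. The sketch below shows this, along with the `drop = FALSE` argument that keeps a single column as a `data.frame`. Scores are hard-coded from the earlier output as an assumption for reproducibility.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

element <- dat[1, 2]                # single value: "john"
col_vec <- dat[, 2]                 # character vector of length 4
col_df  <- dat[, 2, drop = FALSE]   # one-column data.frame
row_df  <- dat[3, ]                 # one-row data.frame

print(class(col_vec))
print(class(col_df))
print(class(row_df))
```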


### 1.2.1 Indexing using Column Names

In [5]:
dat[3, "score"]          # element from row 3 and "score" column

In [6]:
dat[2:4, "firstname"]    # get elements 2, 3, and 4 from "firstname" column
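
Column names can also be combined into a character vector to pick several columns at once. The following is a small sketch along those lines, with scores hard-coded from the earlier output (an assumption for reproducibility); the variable names `sub` and `third_score` are illustrative.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Rows 2 to 4 of two named columns: the result is a data.frame
sub <- dat[2:4, c("firstname", "score")]
print(sub)

# Extract a column by name with [[ ]], then index into the vector;
# this gives the same value as dat[3, "score"]
third_score <- dat[["score"]][3]
print(third_score)
```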

## 1.3 The `$` Operator and `data.frame` Objects 

In [7]:
dat$firstname            # get the "firstname" column
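
As a rough sketch of how `$` relates to `[[ ]]`: both extract a column as a vector, but `$` takes the name literally, so it cannot be used with a column name stored in a variable. The variable `col` below is hypothetical, and the scores are hard-coded from the earlier output for reproducibility.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

by_dollar   <- dat$firstname        # column as a character vector
by_brackets <- dat[["firstname"]]   # same column via [[ ]]

col <- "score"                      # column name held in a variable
from_variable <- dat[[col]]         # works: looks up the "score" column
literal <- dat$col                  # NULL: there is no column named "col"

print(identical(by_dollar, by_brackets))
print(is.null(literal))
```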

# 2. Filter `data.frame` using Logical Indexing

In [8]:
dat

| idx | firstname | score | vocals |
|---|---|---|---|
| `<int>` | `<chr>` | `<dbl>` | `<lgl>` |
| 1 | john | -1.2070035 | TRUE |
| 2 | paul | 0.8243397 | TRUE |
| 3 | george | 0.9136364 | FALSE |
| 4 | ringo | 1.8258713 | FALSE |


In [9]:
dat[dat$vocals, ]

|   | idx | firstname | score | vocals |
|---|---|---|---|---|
|   | `<int>` | `<chr>` | `<dbl>` | `<lgl>` |
| 1 | 1 | john | -1.2070035 | TRUE |
| 2 | 2 | paul | 0.8243397 | TRUE |
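
The row filter does not have to be an existing logical column; any expression that yields one `TRUE`/`FALSE` per row works. The sketch below shows a comparison, a negation, and a combined condition. Scores are hard-coded from the output above (an assumption for reproducibility), and the result names are illustrative.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

positive   <- dat[dat$score > 0, ]               # rows with a positive score
non_vocal  <- dat[!dat$vocals, ]                 # rows where vocals is FALSE
both       <- dat[dat$vocals & dat$score > 0, ]  # both conditions must hold

print(positive$firstname)
print(non_vocal$firstname)
print(both$firstname)
```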


## 2.1 Create New `data.frame` from Another

In [10]:
dat2 <- dat[dat$vocals, ]       # new data.frame holding only the rows where vocals is TRUE

head(dat2)

|   | idx | firstname | score | vocals |
|---|---|---|---|---|
|   | `<int>` | `<chr>` | `<dbl>` | `<lgl>` |
| 1 | 1 | john | -1.2070035 | TRUE |
| 2 | 2 | paul | 0.8243397 | TRUE |


### 2.1.1 Take Subset of `data.frame` Columns

In [11]:
cols <- c("firstname", "score")    # columns we care about

dat_namescore <- dat2[, cols]      # new data.frame with only those columns

head(dat_namescore)

|   | firstname | score |
|---|---|---|
|   | `<chr>` | `<dbl>` |
| 1 | john | -1.2070035 |
| 2 | paul | 0.8243397 |
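
The row filter and the column selection can also be done in a single step. The sketch below shows that, along with base R's `subset()` as an alternative that selects the same rows and columns. Scores are hard-coded from the output above for reproducibility, and the names `one_step` and `via_subset` are illustrative.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Filter rows and pick columns in one bracket expression
one_step <- dat[dat$vocals, c("firstname", "score")]

# Same selection with subset(); column names are given unquoted
via_subset <- subset(dat, vocals, select = c(firstname, score))

print(one_step)
print(via_subset)
```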


# 3. Adding Columns to a `data.frame`

In [12]:
dat

| idx | firstname | score | vocals |
|---|---|---|---|
| `<int>` | `<chr>` | `<dbl>` | `<lgl>` |
| 1 | john | -1.2070035 | TRUE |
| 2 | paul | 0.8243397 | TRUE |
| 3 | george | 0.9136364 | FALSE |
| 4 | ringo | 1.8258713 | FALSE |


In [13]:
dat$food <- c("steak", "chicken", "potato", "rice")

dat

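| idx | firstname | score | vocals | food |
|---|---|---|---|---|
| `<int>` | `<chr>` | `<dbl>` | `<lgl>` | `<chr>` |
| 1 | john | -1.2070035 | TRUE | steak |
| 2 | paul | 0.8243397 | TRUE | chicken |
| 3 | george | 0.9136364 | FALSE | potato |
| 4 | ringo | 1.8258713 | FALSE | rice |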
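
A new column can also be computed from existing ones, and a single value is recycled to every row. A vector whose length does not divide the number of rows is rejected with an error. The sketch below demonstrates all three cases; the column names `positive`, `band`, and `bad` are hypothetical, and the scores are hard-coded from the output above for reproducibility.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

dat$positive <- dat$score > 0   # derived logical column, one value per row
dat$band <- "beatles"           # length-1 value is repeated for all 4 rows

# A length-3 vector cannot fill 4 rows, so the assignment raises an error
outcome <- tryCatch(
  {
    dat$bad <- 1:3
    "assigned"
  },
  error = function(e) "error"
)

print(dat)
print(outcome)
```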


## 3.1 Adding Columns (cont.)

In [14]:
dat[, "drink"] <- c("water", "milk", "beer", "scotch")

dat

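| idx | firstname | score | vocals | food | drink |
|---|---|---|---|---|---|
| `<int>` | `<chr>` | `<dbl>` | `<lgl>` | `<chr>` | `<chr>` |
| 1 | john | -1.2070035 | TRUE | steak | water |
| 2 | paul | 0.8243397 | TRUE | chicken | milk |
| 3 | george | 0.9136364 | FALSE | potato | beer |
| 4 | ringo | 1.8258713 | FALSE | rice | scotch |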
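
The `$` form and the `[, "name"]` form shown so far, plus the `[[ ]]` form, all add the same column. The sketch below applies each to its own copy of the data and compares the results; the copies `d1`, `d2`, and `d3` are illustrative names, and the scores are hard-coded from the earlier output for reproducibility.

```r
dat <- data.frame(
  idx = 1:4,
  firstname = c("john", "paul", "george", "ringo"),
  score = c(-1.2070035, 0.8243397, 0.9136364, 1.8258713),
  vocals = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)
drinks <- c("water", "milk", "beer", "scotch")

d1 <- dat
d1$drink <- drinks           # dollar assignment

d2 <- dat
d2[, "drink"] <- drinks      # bracket assignment by column name

d3 <- dat
d3[["drink"]] <- drinks      # double-bracket assignment

# All three copies end up with the same new column
print(names(d1))
print(identical(d1$drink, d2$drink) && identical(d2$drink, d3$drink))
```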


