# Lists, data frames and environments

Lists, data frames and environments are the most versatile data structures in R. In this section we introduce them and outline their basic uses.

## Lists

Unlike atomic vectors, matrices and arrays, lists are not atomic. A list is a generic vector whose items can be of any type, including other lists. Lists can be created using the `vector` function or, more commonly, the `list` function.

In [1]:
x <- vector(mode = "list", length = 4)
print(x)

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL



In [50]:
y <- list(1:10, letters[1:3], runif(6))
print(y)

[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[2]]
[1] "a" "b" "c"

[[3]]
[1] 0.3604316 0.5651189 0.2421092 0.7588877 0.6213195 0.1777223
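
The lists above hold only atomic vectors. As a small sketch of the "any type" claim, the hypothetical list `nested` below mixes an integer vector, another list and a function; the names `inner` and `nested` are illustrative only.

```r
# An inner list holding two items of different types
inner <- list("a", TRUE)

# A list can hold a vector, another list and even a function
nested <- list(1:3, inner, sum)

# str() prints a compact summary of the structure
str(nested)

# The inner list counts as a single item of the outer list
print(length(nested))  # 3
```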



Lists can have names too:

In [51]:
ll <- list(name = "Bryan", "age" = 44, occupation = "Dentist")
print(ll)

$name
[1] "Bryan"

$age
[1] 44

$occupation
[1] "Dentist"



In [52]:
# Name insertion:
names(y) <- c("nums", "letters", "random")
print(y)

$nums
 [1]  1  2  3  4  5  6  7  8  9 10

$letters
[1] "a" "b" "c"

$random
[1] 0.3604316 0.5651189 0.2421092 0.7588877 0.6213195 0.1777223



### Selecting items in lists

A single item in a list can be selected using the `"[["` operator, with either an integer index or a name, or using `"$"` for an individual name. The `"["` operator can select more than one list item and will always return a list:

In [53]:
# Returns a single item as a list
print(y[1])

$nums
 [1]  1  2  3  4  5  6  7  8  9 10



In [54]:
# Returns the first two items as a list
print(y[1:2])

$nums
 [1]  1  2  3  4  5  6  7  8  9 10

$letters
[1] "a" "b" "c"



In [55]:
# Alternatively
print(y[c("nums", "letters")])

$nums
 [1]  1  2  3  4  5  6  7  8  9 10

$letters
[1] "a" "b" "c"



In [56]:
# Returns the first item
print(y[[1]])

 [1]  1  2  3  4  5  6  7  8  9 10


In [57]:
# Returns the item named random
print(y$random)

[1] 0.3604316 0.5651189 0.2421092 0.7588877 0.6213195 0.1777223


In [58]:
# Returns the item named letters
print(y[["letters"]])

[1] "a" "b" "c"


In [59]:
# Returns the item named nums
print(y$`nums`)

 [1]  1  2  3  4  5  6  7  8  9 10


In [60]:
# Selection using a logical vector
print(y[c(TRUE, FALSE, TRUE)])

$nums
 [1]  1  2  3  4  5  6  7  8  9 10

$random
[1] 0.3604316 0.5651189 0.2421092 0.7588877 0.6213195 0.1777223
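
The practical difference between `"["` and `"[["` is the type of the result. The sketch below uses a small stand-in list `z` (a hypothetical name with illustrative values) rather than `y`, so that it can be run on its own.

```r
# A small stand-in list with two named items
z <- list(nums = 1:3, letters = c("a", "b"))

sub_list <- z["nums"]    # single bracket: a list of length 1
item     <- z[["nums"]]  # double bracket: the integer vector itself

print(class(sub_list))   # "list"
print(class(item))       # "integer"

# Arithmetic works on the extracted vector
print(sum(item))         # 6
# sum(sub_list) would raise an error because sub_list is still a list
```

Use `"[["` or `"$"` when the contents are needed for computation, and `"["` when a smaller list is wanted.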



### Replacing items in lists

All the methods used to select items from lists can be used to replace items in lists by using the `"<-"` operator:

In [61]:
# Replace an item in a vector in the list
y$nums[2] <- 100
print(y)

$nums
 [1]   1 100   3   4   5   6   7   8   9  10

$letters
[1] "a" "b" "c"

$random
[1] 0.3604316 0.5651189 0.2421092 0.7588877 0.6213195 0.1777223



In [62]:
# Replace a whole item in the list, selected using a character name
y[["letters"]] <- letters[4:7]
print(y)

$nums
 [1]   1 100   3   4   5   6   7   8   9  10

$letters
[1] "d" "e" "f" "g"

$random
[1] 0.3604316 0.5651189 0.2421092 0.7588877 0.6213195 0.1777223



In [63]:
# Replace an item in the list selected with a logical vector
y[c(FALSE, FALSE, TRUE)] <- list(rnorm(3))
print(y)

$nums
 [1]   1 100   3   4   5   6   7   8   9  10

$letters
[1] "d" "e" "f" "g"

$random
[1]  1.1076052 -0.2597612 -0.6168398



In [64]:
# Access items with just index numbers
y[[1]][3]

In [67]:
y[[1]][5] <- 50
print(y)

$nums
 [1]   1 100   3   4  50   6   7   8   9  10

$letters
[1] "d" "e" "f" "g"

$random
[1]  1.1076052 -0.2597612 -0.6168398
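
The same assignment syntax can also add and remove items. The sketch below uses a hypothetical list `cfg`: assigning to a name that does not exist appends an item, and assigning `NULL` removes one.

```r
# A hypothetical list used only for illustration
cfg <- list(id = 7, tags = c("new", "vip"))

# Assigning to a name that does not exist yet appends a new item
cfg$city <- "Leeds"
print(names(cfg))    # "id" "tags" "city"

# Assigning NULL removes an item
cfg[["tags"]] <- NULL
print(names(cfg))    # "id" "city"
print(length(cfg))   # 2
```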



Coercing a vector to a list:

In [17]:
print(as.list(1:3))

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3
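
Going the other way, `unlist` flattens a list back into an atomic vector. This is a minimal sketch; the variable names are illustrative.

```r
v <- 1:3
as_list <- as.list(v)      # a list of three length-one integer vectors
back <- unlist(as_list)    # flattened back to an atomic vector
print(identical(back, v))  # TRUE

# Mixed types are coerced to a common type when flattened
mixed <- unlist(list(1, "a", TRUE))
print(mixed)               # "1" "a" "TRUE"
```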



# Exercise 1.10

**Question 1**

Create a list for a customer entity containing his name "Greg Marks", age 34, height 180, location "Luton", occupation "Dentist" and address "1 Greek Street".

**Question 2**

You discover that the address is actually "12 Dickson Road". Update the customer details in the list to reflect this.

**Question 3**

Select the name and occupation from the list.

## Data frames

Data frames are tables whose columns are vectors. They can be thought of as a special type of list in which all entries are vectors of equal length. Data frames can be created using the `"data.frame"` function, or by coercing a list using `"as.data.frame"`.
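
The list-like nature of a data frame can be checked directly. The sketch below uses a small hypothetical data frame `df` and shows that it passes `is.list`, that its length is its number of columns, and that columns of incompatible lengths are rejected.

```r
# A hypothetical two-column data frame
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

print(is.list(df))     # TRUE: a data frame is built on a list
print(length(df))      # 2: one list item per column
print(df[["score"]])   # list-style selection returns the column vector

# Columns of incompatible lengths raise an error, caught here for illustration
res <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "error: differing number of rows"
)
print(res)
```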

In [18]:
n <- 15
dat <- data.frame(product = sample(c("productA", "productB", "productC"), n, replace = TRUE, prob = c(3, 1, 2)),
          count = sample(100:200, n), grade = sample(letters[1:5], n, replace = TRUE))
dat

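| product | count | grade |
|---|---|---|
| productA | 128 | a |
| productA | 176 | a |
| productB | 173 | c |
| productA | 106 | a |
| productA | 115 | e |
| productC | 142 | b |
| productA | 180 | e |
| productC | 166 | d |
| productC | 192 | d |
| productC | 120 | d |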


In [19]:
# The head function returns the top rows of the table
head(dat, 4)

| product | count | grade |
|---|---|---|
| productA | 128 | a |
| productA | 176 | a |
| productB | 173 | c |
| productA | 106 | a |


In [20]:
# The tail function returns the bottom rows of the table
tail(dat, 4)

|  | product | count | grade |
|---|---|---|---|
| 12 | productB | 196 | e |
| 13 | productB | 143 | d |
| 14 | productC | 150 | a |
| 15 | productC | 131 | d |


In [21]:
# Appending columns to a data.frame
dat$price <- sample(10:20, n, replace = TRUE)
dat[1:4,]

| product | count | grade | price |
|---|---|---|---|
| productA | 128 | a | 18 |
| productA | 176 | a | 18 |
| productB | 173 | c | 17 |
| productA | 106 | a | 15 |


In [22]:
# The number of rows and columns in a data frame
dim(dat)

In [23]:
# The number of rows in a data frame
nrow(dat)

In [24]:
# The number of columns in a data frame
ncol(dat)

Creating a data frame from a list:

In [25]:
n <- 10
dat2 <- list(product = sample(c("productA", "productB", "productC"), n, replace = TRUE, prob = c(3, 1, 2)),
          count = sample(100:200, n), grade = sample(letters[1:5], n, replace = TRUE))
print(dat2)

$product
 [1] "productA" "productA" "productA" "productA" "productC" "productA"
 [7] "productA" "productA" "productA" "productC"

$count
 [1] 132 176 137 179 184 103 192 169 129 164

$grade
 [1] "d" "b" "b" "e" "b" "c" "d" "c" "d" "c"



In [26]:
# Coercing a list to a data frame
dat2 <- as.data.frame(dat2)
dat2

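| product | count | grade |
|---|---|---|
| productA | 132 | d |
| productA | 176 | b |
| productA | 137 | b |
| productA | 179 | e |
| productC | 184 | b |
| productA | 103 | c |
| productA | 192 | d |
| productA | 169 | c |
| productA | 129 | d |
| productC | 164 | c |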


Subsetting a data frame:

In [27]:
# Selection using integers (same as for matrices)
dat[1:5, 1:2]

| product | count |
|---|---|
| productA | 128 |
| productA | 176 |
| productB | 173 |
| productA | 106 |
| productA | 115 |


In [28]:
# Selection using column names and integer for rows
dat[1:5, c("product", "grade")]

| product | grade |
|---|---|
| productA | a |
| productA | a |
| productB | c |
| productA | a |
| productA | e |


Dollar notation for selecting a column:

In [29]:
# Select the grade column; note that it is a factor here.
# Before R 4.0.0, data.frame() converted character vectors to factors
#     by default; since 4.0.0 the default is stringsAsFactors = FALSE
print(dat$grade)

 [1] a a c a e b e d d d d e d a d
Levels: a b c d e
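
The conversion to a factor is controlled by the `stringsAsFactors` argument of `data.frame`. As a small sketch with hypothetical data, setting it explicitly gives the same result on any R version:

```r
# Setting stringsAsFactors explicitly makes the result independent of the R version
as_factor <- data.frame(grade = c("a", "b", "a"), stringsAsFactors = TRUE)
as_char   <- data.frame(grade = c("a", "b", "a"), stringsAsFactors = FALSE)

print(class(as_factor$grade))   # "factor"
print(levels(as_factor$grade))  # "a" "b"
print(class(as_char$grade))     # "character"
```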


In [30]:
# Selection using logical vectors
dat[(dat$grade %in% c("a", "b", "c")) & (dat$count < 150),]

|  | product | count | grade | price |
|---|---|---|---|---|
| 1 | productA | 128 | a | 18 |
| 4 | productA | 106 | a | 15 |
| 6 | productC | 142 | b | 10 |
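
It can help to build the logical row filter as a separate step. The sketch below uses a small hypothetical table `toy` with fixed values, so the filter can be checked by eye. It also shows `subset` as one alternative way of writing the same selection.

```r
# A small hypothetical table, not the random data used above
toy <- data.frame(
  product = c("productA", "productB", "productC", "productA"),
  count   = c(120, 160, 140, 180),
  grade   = c("a", "b", "d", "c")
)

# Build the row filter first: one TRUE/FALSE value per row
keep <- (toy$grade %in% c("a", "b", "c")) & (toy$count < 150)
print(keep)   # TRUE FALSE FALSE FALSE

# Use it in the row position; the empty column position keeps all columns
print(toy[keep, ])

# subset() expresses the same filter without repeating the data frame name
same <- subset(toy, grade %in% c("a", "b", "c") & count < 150)
print(same)
```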


# Exercise 1.11

**Question 1**

Use the `"head"` and `"tail"` functions to look at the `"iris"` dataset. What is the range of values of each of the numeric columns? Hint: use the `"range"` function. Use the `"summary"` function to summarize the `"iris"` dataset.

**Question 2**

Subset the `"iris"` dataset by selecting the rows where `"Species"` is `"setosa"` and `"Sepal.Length"` is less than `4.5`.

**Question 3**

Select the first 4 rows and 3 columns of the `"iris"` data set.

## Introduction to environments

Environments are the scopes in which R's program data exist. So far you have been using environments implicitly; in this section we will use them explicitly. Syntactically, environments are used in much the same way as lists and data frames.

The variables you create at R's console exist in the global environment, `".GlobalEnv"`. The `"ls"` function lists all the variables in a given environment; called without arguments at the console, it lists the global environment.

In [31]:
# Clear the Global environment
rm(list = ls())

In [32]:
# Nothing in the .GlobalEnv
ls()

In [33]:
# Create a variable
x <- 1:10

In [34]:
# Now x exists in the global environment
ls()

In [35]:
# Selecting items from environments using "$" notation
print(.GlobalEnv$x)

 [1]  1  2  3  4  5  6  7  8  9 10


In [36]:
# Selecting items with `[[` notation
print(.GlobalEnv[["x"]])

 [1]  1  2  3  4  5  6  7  8  9 10


In [37]:
# Assigning to environment
.GlobalEnv$y <- runif(5)
print(.GlobalEnv$y)

[1] 0.03130489 0.90469475 0.36439059 0.17350653 0.59770837


In [38]:
# Create environments
env1 <- new.env()
env1

<environment: 0x2d3a3c8>

In [39]:
# Assign an item to the new environment and return it
env1$z <- rnorm(4)
print(env1$z)

[1] -1.8585510 -0.5712807 -1.4818808  1.0021203
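
Besides `"$"` and `"[["`, the `assign`, `get` and `exists` functions work with a variable name given as a string. The sketch below uses a hypothetical environment `e` and also shows one respect in which environments differ from lists: they are not copied on assignment.

```r
# A hypothetical environment used only for illustration
e <- new.env()

# assign() takes the variable name as a string and the target environment
assign("total", 42, envir = e)

# exists() checks for the name; get() looks up its value
print(exists("total", envir = e))   # TRUE
print(get("total", envir = e))      # 42
print(ls(e))                        # "total"

# Unlike lists, environments are not copied on assignment:
# both names refer to the same environment
alias <- e
alias$total <- 0
print(e$total)                      # 0
```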


R packages are essentially locked environments loaded into R:

In [40]:
# The "search" function shows all the attached items
search()

In [41]:
# Showing that packages are actually environments with attributes
as.environment("package:stats")

<environment: package:stats>
attr(,"name")
[1] "package:stats"
attr(,"path")
[1] "/usr/lib/R/library/stats"

In [42]:
# Attaching the iris data.frame
attach(iris)

In [43]:
# It is now attached and appears on the search path
search()

In [44]:
# The names in iris can be directly accessed
Species[1:5]

In [45]:
# Detach the iris data.frame
detach(iris)

In [46]:
# iris is no longer attached
search()
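
Lists can be attached in the same way as data frames. The sketch below attaches a hypothetical list `settings` under the explicit name `"settings_demo"`, so that its presence on the search path can be tested before and after detaching; all names here are illustrative.

```r
# A hypothetical list of settings, used only for illustration
settings <- list(demo_rate = 0.05, demo_label = "demo")

# Attach it under an explicit name so it is easy to find in search()
attach(settings, name = "settings_demo")
on_path <- "settings_demo" %in% search()   # TRUE while attached
value <- demo_rate                          # found via the search path

# Detach by name; character.only = TRUE tells detach the name is a string
detach("settings_demo", character.only = TRUE)
still_on_path <- "settings_demo" %in% search()  # FALSE after detaching

print(c(on_path, still_on_path))
print(value)
```

Attaching makes a copy of the data, so later changes to `settings` would not be visible through the attached names.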

# Exercise 1.12

**Question 1**

Create an environment called `env2`. Use the `"assign"` function to assign 100 items sampled from `"letters"` to the variable `lett` in `env2`, then sample 200 numbers from the standard normal distribution and assign them to a variable `x` in `env2`.

**Question 2**

Use the `"get"` function to get `x` and `lett` from `env2`. Use `"$"` notation to replace `x` in `env2` with 200 numbers from the uniform [0,1] distribution, and `lett` with the `"Species"` column from the `iris` dataset.

**Question 3**

Attach `env2` and show that it has been attached by using the `"search"` function. Access the values in `env2` from the console. Now detach `env2` and delete it.