# **7. R Data Structures**

## **7.1 Vectors**

Recall that vectors may have mode logical, numeric or character.

**7.1.1 Subsets of Vectors**

Remember how we can extract subsets of vectors (as discussed in section 2.6.2):
- We can specify the indices of the elements to be extracted, where negative indices omit elements.
- We can use a logical vector to specify which elements to extract based on their logical value.
- Additionally, for vectors with named elements, there's a third option:

In [None]:
data <- c(Andreas = 178, John = 185, Jeff = 183)[c("John", "Jeff")]
data


A vector of names has been used to extract the elements.

**7.1.2 Patterned Data**

Use 5:15 to generate the numbers 5, 6, …, 15. Entering 15:5 will generate the sequence in the reverse order.
To repeat the sequence (2, 3, 5) four times over, enter rep(c(2,3,5), 4) thus:

In [None]:
 rep(c(2,3,5),4)

If instead one wants four 2s, then four 3s, then four 5s, enter rep(c(2,3,5), c(4,4,4)).

In [None]:
 rep(c(2,3,5),c(4,4,4)) # An alternative is rep(c(2,3,5), each=4)

The vector of repeat counts can itself be generated with `rep()`: `c(4, 4, 4)` is the same as `rep(4, 3)`, so `rep(c(2, 3, 5), c(4, 4, 4))` can be written as `rep(c(2, 3, 5), rep(4, 3))`.

Also, remember that `rep()` has a `length.out` argument, which repeats the sequence until it reaches the specified length.
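As a brief illustration of the `length.out` and `each` arguments, the following sketch reuses the sequence (2, 3, 5). The target length of 8 is an arbitrary choice for the example.

```r
# Repeat the sequence until the result has exactly 8 elements
cycled <- rep(c(2, 3, 5), length.out = 8)
cycled    # 2 3 5 2 3 5 2 3

# each = 4 repeats every element four times before moving to the next
blocked <- rep(c(2, 3, 5), each = 4)
blocked   # 2 2 2 2 3 3 3 3 5 5 5 5
```

Note that `length.out` truncates the final cycle when the requested length is not a multiple of the sequence length.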

## **7.2 Missing Values**

In R, NA represents a missing value. Any arithmetic operation or comparison involving NA returns NA. This includes the relational operators <, <=, >, >=, ==, and !=. The first four compare magnitudes, == checks for equality, and != checks for inequality. In particular, x == NA yields NA for every element, so it tells you nothing about x; use is.na(x) to find which values of x are missing.


In [None]:
x <- c(1,6,2,NA)
is.na(x) # TRUE for when NA appears, and otherwise FALSE

In [None]:
x==NA  #All elements are set to NA

In [None]:
NA==NA

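*Missing values in subscripts*: In R, an assignment whose logical subscript contains an NA fails when more than one value is being assigned:

In [None]:
y <- numeric(length(x))   # y must exist before its elements can be assigned
y[x>2]  <- x[x>2]    # generates the error message "NAs are not allowed in subscripted assignments"

ERROR: Error in y[x > 2] <- x[x > 2]: NAs are not allowed in subscripted assignments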


Users are advised to use !is.na(x) to limit the selection, on one or both sides as necessary, to those elements of x that are not NAs. We will have more to say on missing values in the section on data frames that now follows.
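A minimal sketch of this advice, using the same vector `x` as above; the target vector `y` and the helper name `keep` are introduced here purely for illustration.

```r
x <- c(1, 6, 2, NA)
y <- numeric(length(x))      # target vector, initially all zeros

# Combine the NA check with the condition so the subscript has no NAs
keep <- !is.na(x) & x > 2
keep                         # FALSE TRUE FALSE FALSE

# The same subscript is used on both sides of the assignment
y[keep] <- x[keep]
y                            # 0 6 0 0
```

Because `FALSE & NA` evaluates to `FALSE`, the combined subscript is free of missing values and the assignment succeeds.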

## **7.3 Data frames**


Data frames are like tables in R that store data. They're more flexible than matrices because each column can have a different type of data, like numbers, text, or categories. However, all values in a column must be of the same type. If you have a data frame with only numbers in each column, it's similar to a matrix, but not exactly the same. You can convert a data frame to a matrix using the as.matrix() function. A data frame is in fact a special kind of list, a structure discussed in section 7.7.
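The difference between a data frame and a matrix can be seen in a small sketch. The data frame `df` and its column names are made up for the example.

```r
# A small data frame whose columns have different types (illustrative data)
df <- data.frame(id = 1:3,
                 score = c(2.5, 3.0, 4.5),
                 label = c("a", "b", "c"))

sapply(df, class)            # one class per column

# Converting only the numeric columns gives a numeric matrix
num_mat <- as.matrix(df[, c("id", "score")])
mode(num_mat)                # "numeric"

# Converting every column forces a single common type, here character
all_mat <- as.matrix(df)
mode(all_mat)                # "character"
```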

**7.3.1 Extraction of Component Parts of Data frames**

Consider the data frame barley that accompanies the lattice package:

In [None]:
barley <- read.csv("/content/barley.csv")
names(barley)

In [None]:
# Check the structure of the barley dataset
str(barley)


'data.frame':	120 obs. of  5 variables:
 $ rownames: int  1 2 3 4 5 6 7 8 9 10 ...
 $ yield   : num  27 48.9 27.4 39.9 33 ...
 $ variety : chr  "Manchuria" "Manchuria" "Manchuria" "Manchuria" ...
 $ year    : int  1931 1931 1931 1931 1931 1931 1931 1931 1931 1931 ...
 $ site    : chr  "University Farm" "Waseca" "Morris" "Crookston" ...


In [None]:
# If the "site" variable is not a factor, convert it to factor
barley$site <- factor(barley$site)

# Check the levels of the "site" variable
levels(barley$site)


In [None]:
# year is stored as an integer, so compare it with the number 1932
Duluth1932  <- barley[barley$year == 1932  & barley$site == "Duluth", c("variety", "yield")]
Duluth1932

             variety    yield
66         Manchuria 22.56667
72           Glabron 25.86667
78          Svansota 22.23333
84            Velvet 22.46667
90             Trebi 30.60000
96           No. 457 22.70000
102          No. 462 22.50000
108         Peatland 31.36667
114          No. 475 27.36667
120 Wisconsin No. 38 29.33333


The first column holds the row labels, which in this case are the numbers of the rows that have been extracted. In place of c("variety","yield") we could have written c(3,2), since in the data frame as read here variety is the third column and yield is the second.

**7.3.2 Data Sets that Accompany R Packages**

To see a list of available datasets, just type data() in your R console. If you want to know what datasets are included in a specific package like 'datasets', use data(package='datasets').

For most packages, datasets are automatically accessible once the package is loaded. For example, you don't need to load the 'airquality' dataset separately; it's available once you load the 'datasets' package.

When you install R, it comes with many commonly used packages already included. However, some packages need to be installed explicitly. In these notes, we'll use the MASS package occasionally, which is included in the default distribution.

At the beginning of your session, R automatically loads some packages including the base package. You can load other installed packages using the library() command.
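The following sketch shows these commands in use. It relies only on the `datasets` package that ships with R; the variable name `available` is chosen for the example.

```r
# data() returns an object whose $results matrix has one row per data set
available <- data(package = "datasets")$results[, "Item"]
head(available)
"airquality" %in% available    # TRUE

# library() attaches an installed package; datasets is attached by default,
# so this call is harmless but not strictly needed
library(datasets)
dim(airquality)                # 153 rows and 6 columns
```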

## **7.4 Data Entry Issues**

**7.4.1 Idiosyncrasies**

The read.table() function is well suited to reading data arranged in rows and columns. In versions of R before 4.0.0, any column containing text was automatically turned into a 'factor', which is like a category with different levels for each unique text string. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so such columns are kept as character vectors.

Sometimes, R may misunderstand your data, especially if there are small mistakes like using 'O' instead of '0' or 'l' instead of '1'. If you use symbols like '*', '.', or blank spaces, R might also think the column contains text instead of numbers.

To keep text columns as character vectors regardless of the R version, you can use the as.is = TRUE parameter with read.table(). This tells R to keep the columns as they are without turning them into factors. Later, if you need to use a column as a factor, you can convert it manually with factor(), or some functions will do it automatically for you.

**7.4.2 Missing values when using read.table()**

When using read.table(), remember to code missing values as NA. If your data file comes from SAS, you'll likely need to set na.strings = c("."). Sometimes, there might be several different codes indicating missing values, like "NA", ".", "*", or just empty cells. You can handle multiple indicators by setting na.strings = c("NA", ".", "*", ""). The empty string "" ensures that empty cells are also treated as NAs.

**7.4.3 Separators when using read.table()**

With data from spreadsheets, it is sometimes necessary to use tab ("\t") or comma as the separator. The default separator is white space. To set tab as the separator, specify sep="\t".
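The data entry options discussed in this section can be tried without an external file by passing the data through the `text` argument of read.table(). The tab-separated content below is invented for illustration; it uses ".", "*" and an empty cell as missing value codes.

```r
# Four rows of made-up, tab-separated data with three styles of missing value
raw <- "id\tgroup\tvalue\n1\ta\t3.5\n2\tb\t.\n3\ta\t*\n4\t\t2.0\n"

dat <- read.table(text = raw,
                  header = TRUE,
                  sep = "\t",                          # tab as the separator
                  as.is = TRUE,                        # keep text as character
                  na.strings = c("NA", ".", "*", ""))  # all codes become NA

str(dat)
is.na(dat$value)    # FALSE TRUE TRUE FALSE
```

Because every non-numeric entry in `value` is declared as a missing value code, the column is read as numeric rather than as text.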

## **7.5 Factors and Ordered Factors**


In our earlier discussion, we learned about factors (section 2.6.4). They're great for efficiently storing character strings when there are many repeats of the same strings. They're especially useful for including qualitative effects in model and graphics formulas.

Factors act like two things in one. They're stored as integer vectors, but each value is interpreted based on a table of levels.
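A short sketch of this dual nature, using a made-up vector of stress labels:

```r
f <- factor(c("low", "high", "low", "medium"))

levels(f)                   # "high" "low" "medium" (alphabetical by default)
typeof(f)                   # "integer": the underlying storage
as.integer(f)               # 2 1 2 3: positions in the table of levels

# Looking the codes up in the table of levels recovers the strings
levels(f)[as.integer(f)]    # "low" "high" "low" "medium"
```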

For example, let's look at the islandcities dataset that comes with these notes. It contains the populations of 19 island nation cities with urban center populations of 1.4 million or more in 1995. The row names are city names, the first column is the country name, and the second column is the urban center population in millions. Here's a table showing how many times each country appears:

- Australia: 3
- Cuba: 1
- Indonesia: 4
- Japan: 6
- Philippines: 2
- Taiwan: 1
- United Kingdom: 2


When you print the column named "country," you see the names, not the underlying integer values. R performs this translation automatically for most factor operations, but there are some exceptions that can be tricky. To be sure of getting the country names, convert explicitly with as.character().


In [None]:
islandcities <- read.csv("/content/islandcities.csv")
as.character(islandcities$country)

In [None]:
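# To get the integer codes, unclass the factor. read.csv() in R 4.0.0 or later
# returns character columns, so convert to a factor first.
unclass(factor(islandcities$country))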

By default, R sorts the level names in alphabetical order. If we form a table that has the number of times that
each country appears, this is the order that is used:

In [None]:
table(islandcities$country)


AFG AGO ALB ALD AND ARE ARG ARM ATA ATG AUS AUT AZE BDI BEL BEN BFA BGD BGR BHR 
  4   7   1   1   1   2  20   1  40   1  36   1   1   1   1   3   2   4   1   1 
BHS BIH BLR BLZ BOL BRA BRB BRN BTN BWA CAF CAN CHE CHL CHN CIV CMR COD COG COL 
  2   2   2   1   6  46   1   1   1   4   4  45   3  13 100   3   6  15   3   9 
COM CPV CRI CUB CYP CZE DEU DJI DMA DNK DOM DZA ECU EGY ERI ESP EST ETH FIN FJI 
  1   1   2   2   1   1   5   1   1   6   2   6   4   8   2   8   1   5   4   1 
FRA FSM GAB GBR GEO GHA GIN GMB GNB GNQ GRC GRD GTM GUY HND HRV HTI HUN IDN IND 
 30   1   3  13   3   1   3   1   1   1   2   1   2   1   1   1   1   1  23  69 
IRL IRN IRQ ISL ISR ITA JAM JOR JPN KAZ KEN KGZ KHM KIR KNA KOR KOS KWT LAO LBN 
  1   9   5   1   2  21   1   1  18  13   4   1   3   1   1   5   1   1   1   1 
LBR LBY LCA LIE LKA LSO LTU LUX LVA MAR MCO MDA MDG MDV MEX MHL MKD MLI MLT MMR 
  1   7   1   1   3   1   1   1   1   7   1   1   5   1  27   1   1   6   1   4 
MNE MNG MOZ MRT MUS MWI MYS

This order of the level names is purely a convenience. We might prefer countries to appear in order of latitude,
from North to South. We can change the order of the level names to reflect this desired order:

In [None]:
# Convert country column to factor
islandcities$country <- as.factor(islandcities$country)

lev <- levels(islandcities$country)

lev

In [None]:
lev[c(7,4,6,2,5,3,1)]

In [None]:
 country <- factor(islandcities$country, levels=lev[c(7,4,6,2,5,3,1)])

In [None]:
 table(country)

country
ARG ALD ARE AGO AND ALB AFG 
 20   1   2   7   1   1   4 

In ordered factors, the levels have a specific order, so there are inequalities between them.

Factors can lead to unexpected results, so it's important to keep these two things in mind:
1. In versions of R before 4.0.0, a character vector that became a column in a data frame was automatically converted into a factor; from R 4.0.0 onward this happens only when `stringsAsFactors = TRUE` is set. In either case, wrapping the vector in the `I()` function keeps it as character.
2. Sometimes factors are treated as numeric vectors. To ensure you get the character vector, use `as.character(country)`. If you want the integer codes (1, 2, 3, ...), use `as.numeric(country)`.
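Both points can be checked with a small sketch. The values are invented, and `stringsAsFactors = TRUE` is set explicitly so that the behaviour does not depend on the R version.

```r
# Point 2: as.numeric() on a factor returns the integer codes, not the values
f <- factor(c("10", "20", "10", "5"))
levels(f)                       # "10" "20" "5" (sorted as text)
as.numeric(f)                   # 1 2 1 3
as.numeric(as.character(f))     # 10 20 10 5

# Point 1: I() protects a character vector from conversion to a factor
df <- data.frame(a = c("x", "y"),
                 b = I(c("x", "y")),
                 stringsAsFactors = TRUE)
is.factor(df$a)                 # TRUE
is.factor(df$b)                 # FALSE
```

Converting through `as.character()` first is the usual way to recover numbers that have been stored as factor levels.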

## **7.6 Ordered Factors**

In ordered factors, it's the levels that have a specific order. You can create an ordered factor, or convert a regular factor into an ordered one, using the `ordered()` function. Ordered factors are used when the levels represent positions on a scale, like small, medium, large.

In [None]:
 stress.level<-rep(c("low","medium","high"),2)

In [None]:
 ordf.stress<-ordered(stress.level, levels=c("low","medium","high"))

In [None]:
 ordf.stress

In [None]:
ordf.stress<"medium"

In [None]:
ordf.stress>="medium"

Ordered factors inherit properties from regular factors, and they have an additional attribute for ordering. When you check the class of an object, you'll see both its own class and any classes it inherits from.

In [None]:
class(ordf.stress)

## **7.7 Lists**

Lists allow you to gather various R objects into one container. This could include vectors, matrices, functions, or any other type of data. They can be a mix of different types of objects. As an example, we'll look at the list created by R when you run a linear model (lm) calculation, like the elastic.lm object mentioned in sections 1.1.4 and 2.1.4.

In [77]:
elasticband <- data.frame(stretch=c(46,54,48,50,44,42,52),
 distance=c(148,182,173,166,109,141,166))
# It is readily verified that elastic.lm consists of a variety of different kinds of objects,
# stored as a list. You can get the names of these objects with names()
elastic.lm <- lm(distance~stretch, data=elasticband)
names(elastic.lm)

In [79]:
# The first list element is:
elastic.lm$coefficients

In [80]:
#Alternative ways to extract this first list element are:
elastic.lm[["coefficients"]]
elastic.lm[[1]]

To get the sublist containing only the vector `elastic.lm$coefficients`, you can use either `elastic.lm["coefficients"]` or `elastic.lm[1]`. These commands give slightly different outputs but essentially refer to the same sublist. The output will display information preceded by `$coefficients`, indicating the list element named `coefficients`.
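The distinction between single and double brackets can be seen on any list. The sketch below uses a small made-up list, `fit_like`, rather than the `elastic.lm` object itself.

```r
# An illustrative list with a named numeric vector and a single number
fit_like <- list(coefficients = c(a = 1.5, b = 2),
                 n = 7)

sub <- fit_like["coefficients"]     # single brackets: a list of length 1
class(sub)                          # "list"
names(sub)                          # "coefficients"

elt <- fit_like[["coefficients"]]   # double brackets: the element itself
class(elt)                          # "numeric"

identical(sub[[1]], elt)            # TRUE
```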

In [81]:
#The second list element is a vector of length 7
options(digits=3)
elastic.lm$residuals

In [82]:
# The tenth list element documents the function call:
elastic.lm$call

lm(formula = distance ~ stretch, data = elasticband)

In [83]:
mode(elastic.lm$call)

## **7.8 Matrices and Arrays**

Matrices are simpler than data frames because they contain elements of the same type throughout, such as all numbers or all characters. This makes them useful for mathematical operations that data frames can't handle. Matrices are especially important for users interested in creating new regression or multivariate methods. Matrices can also have more than two dimensions, in which case they are called arrays. It's important to remember that matrices are stored column by column.

In [89]:
xx <- matrix(1:6,ncol=3) # Equivalently, enter matrix(1:6,nrow=2)
xx

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6


In [90]:
# If xx is any matrix, the assignment
x <- as.vector(xx)
# places the columns of xx, in order, into the vector x. In the example above, we get back the elements 1, 2, ..., 6.

In [92]:
# Matrices have a "dim" (dimension) attribute.
dim(xx)

In [93]:
# In fact, a matrix is a vector whose dimension attribute has length 2.
x34 <- matrix(1:12, ncol=4)
x34

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12


In [94]:
# the extraction of a column or rows or submatrix
x34[2:3, c(1,4)]    # extract rows 2 & 3 & columns 1 & 4

     [,1] [,2]
[1,]    2   11
[2,]    3   12


In [95]:
x34[2,]    #extract the second row

In [96]:
x34[-2,]   #extract all rows except the second

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    3    6    9   12


In [97]:
x34[-2,-3]  # extract the matrix obtained by omitting row 2 and col 3

     [,1] [,2] [,3]
[1,]    1    4   10
[2,]    3    6   12


The dimnames() function helps manage row and column names in matrices. It returns a list containing row names as the first element and column names as the second. This concept extends straightforwardly to arrays, which we'll cover next.
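A minimal sketch of dimnames() in use; the row and column labels are arbitrary names chosen for the example.

```r
# Supply row and column names when the matrix is created
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

dimnames(m)          # a list: row names first, then column names
dimnames(m)[[2]]     # "a" "b" "c"

# Names can be used in place of numeric subscripts
m["r2", "b"]         # 4

# Assigning to dimnames() replaces the existing names
dimnames(m) <- list(c("first", "second"), c("x", "y", "z"))
colnames(m)          # "x" "y" "z"
```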

**7.8.1 Arrays**

Extending beyond matrices (which have two dimensions) leads to arrays, which can have more than two dimensions. A matrix is essentially a two-dimensional array. For example, if we have a numeric vector with 24 elements, we can organize them into an array to maintain their order.

In [98]:
x <- 1:24

In [100]:
dim(x) <- c(2,12)  # turn this into 2 * 12 matrix.
x

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,]    1    3    5    7    9   11   13   15   17    19    21    23
[2,]    2    4    6    8   10   12   14   16   18    20    22    24


In [101]:
dim(x) <- c(3,4,2)
x
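Elements of a three-dimensional array are addressed with three subscripts, in the same way that matrix elements are addressed with two. The following sketch rebuilds the 3 by 4 by 2 array and picks out a few parts of it.

```r
x <- 1:24
dim(x) <- c(3, 4, 2)    # 3 rows, 4 columns, 2 layers

x[2, 3, 1]              # row 2, column 3 of the first layer: 8
x[3, 4, 2]              # the last element: 24

layer2 <- x[, , 2]      # the whole second layer, holding 13 to 24
dim(layer2)             # 3 4
```

As with matrices, the first subscript varies fastest, so the first layer holds the elements 1 to 12 filled column by column.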

**7.8.2 Conversion of Numeric Data frames into Matrices**

You can perform certain operations on matrices that you can't do with data frames. If you need to convert between the two, you can use the as.matrix() function.
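Matrix multiplication is one such operation: `%*%` is not defined for data frames. The sketch below, with made-up numbers, converts a numeric data frame to a matrix, multiplies, and converts back with as.data.frame().

```r
num_df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# Convert to a matrix so that matrix algebra is available
m <- as.matrix(num_df)
xtx <- t(m) %*% m       # 2 x 2 matrix of sums of squares and cross-products
xtx

# as.data.frame() goes in the other direction
back <- as.data.frame(m)
names(back)             # "x" "y"
```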

## **7.9 Exercises**

1. Generate the numbers 101, 102, …, 112, and store the result in the vector x.

In [103]:
x <- 101:112
x

2. Generate four repeats of the sequence of numbers (4, 6, 3)

In [108]:
x <- rep(c(4,6,3),4)
x

3. Generate the sequence consisting of eight 4s, then seven 6s, and finally nine 3s. Store the numbers obtained, in order, in the columns of a 6 by 4 matrix.

In [109]:
# Create the sequence
sequence <- c(rep(4, 8), rep(6, 7), rep(3, 9))

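# Create a 6 by 4 matrix. matrix() fills column by column by default,
# so the sequence runs down each column in order, as the exercise asks.
matrix_sequence <- matrix(sequence, nrow = 6, ncol = 4)
matrix_sequence


     [,1] [,2] [,3] [,4]
[1,]    4    4    6    3
[2,]    4    4    6    3
[3,]    4    6    6    3
[4,]    4    6    3    3
[5,]    4    6    3    3
[6,]    4    6    3    3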


4. Create a vector consisting of one 1, then two 2’s, three 3’s, etc., and ending with nine 9’s.

In [110]:
# Generate the vector: rep() repeats the i-th value times[i] times
sequence <- rep(1:9, times = 1:9)

# Print the vector
sequence


5. For each of the following calculations, what would you expect? Check to see if you were right!

a)
```
answer <- c(2, 7, 1, 5, 12, 3, 4)
for (j in 2:length(answer)){ answer[j] <- max(answer[j],answer[j-1])}
```
b)
```
answer <- c(2, 7, 1, 5, 12, 3, 4)
for (j in 2:length(answer)){ answer[j] <- sum(answer[j],answer[j-1])}
```

In [111]:
# (a) Given vector
answer <- c(2, 7, 1, 5, 12, 3, 4)

# Perform the calculation
for (j in 2:length(answer)) {
  answer[j] <- max(answer[j], answer[j-1])
}

# Print the result
answer


In [112]:
# (b) Given vector
answer <- c(2, 7, 1, 5, 12, 3, 4)

# Perform the calculation
for (j in 2:length(answer)) {
  answer[j] <- sum(answer[j], answer[j-1])
}

# Print the result
answer


6. In the built-in data frame airquality (datasets package):

(a) Determine, for each of the columns of the data frame, the median, mean, upper and lower quartiles, and range;

(b) Extract the row or rows for which Ozone has its maximum value;

(c) Extract the vector of values of Wind for values of Ozone that are above the upper quartile.

In [114]:
# Load the dataset
airquality  <- read.csv("/content/airquality.csv")

# (a) Summary statistics for each column
summary_stats <- function(x) {
  med <- median(x, na.rm = TRUE)
  mean_val <- mean(x, na.rm = TRUE)
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  range_val <- range(x, na.rm = TRUE)

  return(c(median = med, mean = mean_val, Q1 = q1, Q3 = q3, Range = range_val))
}

# Apply the summary function to each column
summary_df <- sapply(airquality, summary_stats)

# Print the summary dataframe
print(summary_df)


       rownames Ozone Solar.R  Wind Temp Month  Day
median       77  31.5     205  9.70 79.0  7.00 16.0
mean         77  42.1     186  9.96 77.9  6.99 15.8
Q1.25%       39  18.0     116  7.40 72.0  6.00  8.0
Q3.75%      115  63.2     259 11.50 85.0  8.00 23.0
Range1        1   1.0       7  1.70 56.0  5.00  1.0
Range2      153 168.0     334 20.70 97.0  9.00 31.0


In [115]:
# (b) Row with maximum Ozone value
max_ozone_row <- airquality[which.max(airquality$Ozone), ]
print(max_ozone_row)



    rownames Ozone Solar.R Wind Temp Month Day
117      117   168     238  3.4   81     8  25


In [116]:
# (c) Values of Wind for Ozone > upper quartile
upper_quartile <- quantile(airquality$Ozone, 0.75, na.rm = TRUE)
# which() drops the NA comparisons that arise where Ozone is missing
wind_above_upper_quartile <- airquality$Wind[which(airquality$Ozone > upper_quartile)]
print(wind_above_upper_quartile)

 [1]  5.7 13.8  4.1  4.6  5.1  6.3  5.7  7.4  5.1  8.6  8.0  7.4  7.4  6.9  4.6
[16]  4.0 10.3  8.0  9.7  3.4  8.0  9.7  2.3  6.3  6.3  6.9  5.1  2.8  4.6


7. Refer to the Eurasian snow data that is given in Exercise 1.6.

Find the mean of the snow cover

(a) for the odd-numbered years and

(b) for the even-numbered years.

In [117]:
# Eurasian snow dataset, entered here directly
years <- 1970:1979
snow_cover <- c(6.5, 12.0, 14.9, 10.0, 10.7, 7.9, 21.9, 12.5, 14.5, 9.2)
# Combine years and snow_cover into a single data frame
snow_data <- data.frame(year = years, snow_cover = snow_cover)
snow_data
# Extracting odd-numbered and even-numbered years
odd_years <- snow_data$snow_cover[snow_data$year %% 2 != 0]
even_years <- snow_data$snow_cover[snow_data$year %% 2 == 0]

# Calculating mean snow cover for odd-numbered and even-numbered years
mean_snow_cover_odd <- mean(odd_years, na.rm = TRUE)
mean_snow_cover_even <- mean(even_years, na.rm = TRUE)

# Printing the results
cat("Mean snow cover for odd-numbered years:", mean_snow_cover_odd, "\n")
cat("Mean snow cover for even-numbered years:", mean_snow_cover_even, "\n")


year,snow_cover
<int>,<dbl>
1970,6.5
1971,12.0
1972,14.9
1973,10.0
1974,10.7
1975,7.9
1976,21.9
1977,12.5
1978,14.5
1979,9.2


Mean snow cover for odd-numbered years: 10.3 
Mean snow cover for even-numbered years: 13.7 


8. Determine which columns of the data frame Cars93 (MASS package) are factors. For each of these factor
columns, print out the levels vector. Which of these are ordered factors?

In [118]:
# Load the MASS package
library(MASS)

# Load the Cars93 dataset
data(Cars93)

# Identify factor columns
factor_columns <- sapply(Cars93, is.factor)

# Print factor columns and their levels
for (column_name in names(Cars93[factor_columns])) {
  cat("Column:", column_name, "\n")
  cat("Levels:", levels(Cars93[[column_name]]), "\n")

  # Check if the factor is ordered
  if (is.ordered(Cars93[[column_name]])) {
    cat("Ordered factor:", column_name, "\n")
  }

  cat("\n")
}


Column: Manufacturer 
Levels: Acura Audi BMW Buick Cadillac Chevrolet Chrylser Chrysler Dodge Eagle Ford Geo Honda Hyundai Infiniti Lexus Lincoln Mazda Mercedes-Benz Mercury Mitsubishi Nissan Oldsmobile Plymouth Pontiac Saab Saturn Subaru Suzuki Toyota Volkswagen Volvo 

Column: Model 
Levels: 100 190E 240 300E 323 535i 626 850 90 900 Accord Achieva Aerostar Altima Astro Bonneville Camaro Camry Capri Caprice Caravan Cavalier Celica Century Civic Colt Concorde Continental Corrado Corsica Corvette Cougar Crown_Victoria Cutlass_Ciera DeVille Diamante Dynasty ES300 Eighty-Eight Elantra Escort Eurovan Excel Festiva Firebird Fox Grand_Prix Imperial Integra Justy Laser LeBaron LeMans LeSabre Legacy Legend Loyale Lumina Lumina_APV MPV Maxima Metro Mirage Mustang Passat Prelude Previa Probe Protege Q45 Quest RX-7 Riviera Roadmaster SC300 SL Scoupe Sentra Seville Shadow Silhouette Sonata Spirit Stealth Storm Summit Sunbird Swift Taurus Tempo Tercel Town_Car Vision 

Column: Type 
Levels: Compact

9. Use summary() to get information about data in the data frames attitude (datasets package) and cpus (MASS package). Write brief notes, for each of these data sets, on what this reveals.

In [119]:
# Load the datasets package
library(datasets)

# Load the MASS package
library(MASS)

# Summary of the attitude dataset
cat("Summary of attitude dataset:\n")
summary(attitude)
cat("\n")

# Summary of the cpus dataset
cat("Summary of cpus dataset:\n")
summary(cpus)
cat("\n")


Summary of attitude dataset:


     rating       complaints     privileges      learning        raises    
 Min.   :40.0   Min.   :37.0   Min.   :30.0   Min.   :34.0   Min.   :43.0  
 1st Qu.:58.8   1st Qu.:58.5   1st Qu.:45.0   1st Qu.:47.0   1st Qu.:58.2  
 Median :65.5   Median :65.0   Median :51.5   Median :56.5   Median :63.5  
 Mean   :64.6   Mean   :66.6   Mean   :53.1   Mean   :56.4   Mean   :64.6  
 3rd Qu.:71.8   3rd Qu.:77.0   3rd Qu.:62.5   3rd Qu.:66.8   3rd Qu.:71.0  
 Max.   :85.0   Max.   :90.0   Max.   :83.0   Max.   :75.0   Max.   :88.0  
    critical       advance    
 Min.   :49.0   Min.   :25.0  
 1st Qu.:69.2   1st Qu.:35.0  
 Median :77.5   Median :41.0  
 Mean   :74.8   Mean   :42.9  
 3rd Qu.:80.0   3rd Qu.:47.8  
 Max.   :92.0   Max.   :72.0  


Summary of cpus dataset:


             name          syct           mmin            mmax      
 ADVISOR 32/60 :  1   Min.   :  17   Min.   :   64   Min.   :   64  
 AMDAHL 470/7A :  1   1st Qu.:  50   1st Qu.:  768   1st Qu.: 4000  
 AMDAHL 470V/7 :  1   Median : 110   Median : 2000   Median : 8000  
 AMDAHL 470V/7B:  1   Mean   : 204   Mean   : 2868   Mean   :11796  
 AMDAHL 470V/7C:  1   3rd Qu.: 225   3rd Qu.: 4000   3rd Qu.:16000  
 AMDAHL 470V/8 :  1   Max.   :1500   Max.   :32000   Max.   :64000  
 (Other)       :203                                                 
      cach           chmin          chmax            perf         estperf    
 Min.   :  0.0   Min.   : 0.0   Min.   :  0.0   Min.   :   6   Min.   :  15  
 1st Qu.:  0.0   1st Qu.: 1.0   1st Qu.:  5.0   1st Qu.:  27   1st Qu.:  28  
 Median :  8.0   Median : 2.0   Median :  8.0   Median :  50   Median :  45  
 Mean   : 25.2   Mean   : 4.7   Mean   : 18.3   Mean   : 106   Mean   :  99  
 3rd Qu.: 32.0   3rd Qu.: 6.0   3rd Qu.: 24.0   3rd Qu.: 1




10. From the data frame mtcars (datasets package) extract a data frame mtcars6 that holds only the information for cars with 6 cylinders.

In [120]:
# mtcars ships with the datasets package, which R attaches by default,
# so no library() call is needed

# Extract cars with 6 cylinders
mtcars6 <- mtcars[mtcars$cyl == 6, ]

# View the first few rows of mtcars6
head(mtcars6)


                mpg cyl disp  hp drat   wt qsec vs am gear carb
Mazda RX4      21.0   6  160 110 3.90 2.62 16.5  0  1    4    4
Mazda RX4 Wag  21.0   6  160 110 3.90 2.88 17.0  0  1    4    4
Hornet 4 Drive 21.4   6  258 110 3.08 3.21 19.4  1  0    3    1
Valiant        18.1   6  225 105 2.76 3.46 20.2  1  0    3    1
Merc 280       19.2   6  168 123 3.92 3.44 18.3  1  0    4    4
Merc 280C      17.8   6  168 123 3.92 3.44 18.9  1  0    4    4


11. From the data frame Cars93 (MASS package), extract a data frame which holds only information for small and
sporty cars.




In [121]:
# Load the MASS package
library(MASS)

# Extract small and sporty cars
small_sporty_cars <- subset(Cars93, Type %in% c("Small", "Sporty"))

# View the first few rows of the extracted data frame
head(small_sporty_cars)


,Manufacturer,Model,Type,Min.Price,Price,Max.Price,MPG.city,MPG.highway,AirBags,DriveTrain,⋯,Passengers,Length,Wheelbase,Width,Turn.circle,Rear.seat.room,Luggage.room,Weight,Origin,Make
,<fct>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,<int>,<int>,<fct>,<fct>,⋯,<int>,<int>,<int>,<int>,<int>,<dbl>,<int>,<int>,<fct>,<fct>
1,Acura,Integra,Small,12.9,15.9,18.8,25,31,,Front,⋯,5,177,102,68,37,26.5,11.0,2705,non-USA,Acura Integra
14,Chevrolet,Camaro,Sporty,13.4,15.1,16.8,19,28,Driver & Passenger,Rear,⋯,4,193,101,74,43,25.0,13.0,3240,USA,Chevrolet Camaro
19,Chevrolet,Corvette,Sporty,34.6,38.0,41.5,17,25,Driver only,Rear,⋯,2,179,96,74,43,,,3380,USA,Chevrolet Corvette
23,Dodge,Colt,Small,7.9,9.2,10.6,29,33,,Front,⋯,5,174,98,66,32,26.5,11.0,2270,USA,Dodge Colt
24,Dodge,Shadow,Small,8.4,11.3,14.2,23,29,Driver only,Front,⋯,5,172,97,67,38,26.5,13.0,2670,USA,Dodge Shadow
28,Dodge,Stealth,Sporty,18.5,25.8,33.1,18,24,Driver only,4WD,⋯,4,180,97,72,40,20.0,11.0,3805,USA,Dodge Stealth
