# Hands-On Exercise 2.1: Working with Data in R

## Objectives

In this exercise, you will familiarize yourself with the R syntax for exploring data sets.

## Overview

In this exercise, you will use R commands to examine data structures and to query the rows, columns, and subsets of several data sets, often in more than one way. You will look at how to sample and simulate data within R. You will also look at data distributions and how visualization can aid in this process.

## Querying from data sets

In RStudio, create a new script (e.g. `Ex21.R`). Add commands to the file according to the instructions that follow in this exercise, and execute each command as you move through the steps.

Retrieve data from the first row and second column in the `mtcars` data frame.

#### <font color="green">Solution...</font>

In [None]:
mtcars[1, 2]

Examine the structure of the `iris` data frame, using the `str()` function.

#### <font color="green">Solution...</font>

In [None]:
str(iris)

What values can the `Species` factor in `iris` take (i.e., what are its levels)?

#### <font color="green">Solution...</font>

In [None]:
levels(iris$Species)
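
As a small addition to the solution, the sketch below counts the levels and tabulates how many rows fall under each one. `nlevels()` and `table()` are base R functions that the exercise does not ask for, and the variable names are illustrative only.

```r
# Number of distinct levels in the Species factor
species_count <- nlevels(iris$Species)
species_count

# Number of rows recorded for each level
species_table <- table(iris$Species)
species_table
```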

Check the data type of `iris` and of its attributes (columns). Use one or more of the following functions:

- `is.numeric()`
- `is.character()`
- `is.vector()`
- `is.matrix()`
- `is.data.frame()`
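
No solution is given for this step, so the following is one possible sketch rather than the intended answer. It applies the listed functions to the whole data frame and to individual columns. `sapply()` and `is.factor()` are base R functions that are not in the list above and are included here as an assumption about what is useful; the variable name `numeric_cols` is illustrative.

```r
# Check the container type of the whole object
is.data.frame(iris)
is.matrix(iris)

# Apply a type check to every column at once
numeric_cols <- sapply(iris, is.numeric)
numeric_cols

# Check individual columns
is.vector(iris$Sepal.Length)
is.character(iris$Species)

# Species is stored as a factor, not as character data
is.factor(iris$Species)
```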

Use row and column names instead of numeric coordinates to find out how many miles per gallon (`mpg`) a `Merc 280C` gets (query the `mtcars` data set).

#### <font color="green">Solution...</font>

In [None]:
mtcars["Merc 280C", "mpg"]
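
Row names have to match exactly, including case and spacing. The sketch below first confirms that the name exists before using it as an index. The variable names `car` and `merc_mpg` are illustrative and not part of the exercise.

```r
car <- "Merc 280C"

# Confirm the row name exists before using it as an index
car %in% rownames(mtcars)

# Look up the value by row name and column name
merc_mpg <- mtcars[car, "mpg"]
merc_mpg
```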

Find out how many rows are in the `mtcars` data frame.

#### <font color="green">Solution...</font>

In [None]:
nrow(mtcars)

Find out how many columns are in the data frame.

#### <font color="green">Solution...</font>

In [None]:
ncol(mtcars)
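
The two previous results can also be obtained in one call. The sketch below uses the base R function `dim()`, which the exercise does not mention; it returns the number of rows followed by the number of columns.

```r
# dim() returns c(number of rows, number of columns)
shape <- dim(mtcars)
shape

# The individual elements match nrow() and ncol()
shape[1] == nrow(mtcars)
shape[2] == ncol(mtcars)
```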

Preview the first few rows of the `mtcars` data frame.

#### <font color="green">Solution...</font>

In [None]:
head(mtcars)

Use the `c()` function to select the `mpg` and `gear` attributes from `mtcars`.

#### <font color="green">Solution...</font>

In [None]:
mtcars[c("mpg", "gear")]

Select the first and fifth through tenth variables of `mtcars`.

#### <font color="green">Solution...</font>

In [None]:
mtcars[c(1, 5:10)]

Exclude the variables `disp`, `hp`, and `drat` from the data set.

#### <font color="green">Solution...</font>

In [None]:
my_vars <- names(mtcars) %in% c("disp", "hp", "drat")
mtcars[!my_vars]

Exclude the third and fifth variables.

#### <font color="green">Solution...</font>

In [None]:
mtcars[c(-3,-5)]
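
Excluding by position and excluding by name are two routes to the same result. The sketch below checks this for the third and fifth variables of `mtcars`, which are `disp` and `drat`. The variable names and the use of `setdiff()` are illustrative choices, not part of the exercise.

```r
# Exclude by position with negative indices
by_position <- mtcars[c(-3, -5)]

# Exclude by name with a logical mask
drop_names <- c("disp", "drat")
by_mask <- mtcars[!(names(mtcars) %in% drop_names)]

# Exclude by name by keeping every other column name
by_setdiff <- mtcars[setdiff(names(mtcars), drop_names)]

names(by_position)
identical(by_position, by_mask)
identical(by_position, by_setdiff)
```

Excluding by name tends to be more robust than excluding by position if the column order of the data frame changes later.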

Delete the variables `qsec` and `vs` from a copy of `mtcars`.

#### <font color="green">Solution...</font>

In [None]:
mtcars_copy <- mtcars 
mtcars_copy$qsec <- mtcars_copy$vs <- NULL
mtcars_copy
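
To confirm that only the copy changed, the column names of the two data frames can be compared. This is a minimal sketch; `removed` is an illustrative variable name.

```r
# Work on a copy so the built-in data set stays intact
mtcars_copy <- mtcars
mtcars_copy$qsec <- mtcars_copy$vs <- NULL

# Columns present in the original but missing from the copy
removed <- setdiff(names(mtcars), names(mtcars_copy))
removed

# The original keeps all of its columns
ncol(mtcars)
ncol(mtcars_copy)
```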

Retrieve the ninth column vector of `mtcars` using the double square bracket (`[[]]`) operator.

#### <font color="green">Solution...</font>

In [None]:
mtcars[[9]]

Retrieve the same column vector by its name.

#### <font color="green">Solution...</font>

In [None]:
mtcars[["am"]]

Use the `$` operator instead of the double square bracket operator to retrieve `am`.

#### <font color="green">Solution...</font>

In [None]:
mtcars$am

Use the `[]` operator with an empty row index before the comma to retrieve all rows of the `am` column vector.

#### <font color="green">Solution...</font>

In [None]:
mtcars[, "am"]
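
The four forms used so far return the same plain vector. The sketch below checks this and contrasts it with two forms that keep the result as a one-column data frame. The variable names are illustrative, and the `drop = FALSE` argument is an addition that the exercise does not cover.

```r
# Four ways to extract the am column as a vector
by_position <- mtcars[[9]]
by_name     <- mtcars[["am"]]
by_dollar   <- mtcars$am
by_comma    <- mtcars[, "am"]

identical(by_position, by_name)
identical(by_name, by_dollar)
identical(by_dollar, by_comma)

# Single brackets without a comma keep the data frame structure
as_df_single <- mtcars["am"]
class(as_df_single)

# drop = FALSE does the same for the comma form
as_df_comma <- mtcars[, "am", drop = FALSE]
class(as_df_comma)
```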

Retrieve the first five complete rows of the data set.

#### <font color="green">Solution...</font>

In [None]:
mtcars[1:5,]

Load the `weather` data set from the `rattle` package.

#### <font color="green">Solution...</font>

In [None]:
library(rattle)
data(weather, package="rattle")

Retrieve the rows where `Location` is `Canberra` and `Rainfall` is greater than 16.

#### <font color="green">Solution...</font>

In [None]:
weather[which(weather$Location == "Canberra" & weather$Rainfall > 16), ]
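
One reason to wrap the condition in `which()` is the handling of missing values: a logical index that contains `NA` produces an all-`NA` row, whereas `which()` keeps only the positions that are `TRUE`. The sketch below illustrates this on a small made-up data frame (`obs` and its values are hypothetical, not taken from `weather`), so it runs without the `rattle` package.

```r
# Hypothetical observations with one missing Rainfall value
obs <- data.frame(
  Location = c("Canberra", "Canberra", "Sydney", "Canberra"),
  Rainfall = c(20, NA, 18, 3)
)

# The comparison yields NA where Rainfall is missing
keep <- obs$Location == "Canberra" & obs$Rainfall > 16
keep

# Logical indexing turns the NA into a row of NAs
with_na <- obs[keep, ]
nrow(with_na)

# which() returns only the positions that are TRUE
without_na <- obs[which(keep), ]
nrow(without_na)
```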

Use the `attach()` function to make the columns of the data frame accessible by name, then rerun the previous query with the shorter syntax.

#### <font color="green">Solution...</font>

In [None]:
attach(weather)
weather[which(Location == "Canberra" & Rainfall > 16), ]
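
`attach()` adds the data frame to the search path, where its column names can mask, or be masked by, other objects with the same name. A commonly used alternative is `with()`, which evaluates a single expression using the columns of a data frame and leaves the search path unchanged. The sketch below uses the built-in `mtcars` data set so that it runs without `rattle`; the filter values and variable names are arbitrary choices for illustration.

```r
# Explicit form: every column is prefixed with the data frame name
explicit <- mtcars[which(mtcars$cyl == 4 & mtcars$mpg > 30), ]

# with() evaluates the expression using the columns of mtcars,
# without changing the search path
scoped <- with(mtcars, mtcars[which(cyl == 4 & mpg > 30), ])

identical(explicit, scoped)

# If attach() was used, detach() removes the data frame
# from the search path again
attach(mtcars)
detach(mtcars)
```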

Select the `Location`, `Date`, and `Rainfall` columns for all rows where the `Rainfall` is greater than or equal to 15.

#### <font color="green">Solution...</font>

In [None]:
subset(weather, Rainfall >= 15, select=c(Location, Date, Rainfall))

Select all columns from `MinTemp` through `Sunshine` (inclusive) for the rows where `WindGustDir` is `NW` and rain is recorded for the following day (`RainTomorrow` is `Yes`).

#### <font color="green">Solution...</font>

In [None]:
subset(weather, WindGustDir == "NW" & RainTomorrow == "Yes", select=MinTemp:Sunshine)
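
For comparison, the same kind of query can be written with the bracket operator. The sketch below uses the built-in `mtcars` data set so that it runs without `rattle`; the filter (`am == 1 & gear == 5`) and the column range (`mpg:hp`) are arbitrary choices for illustration, and the variable names are not part of the exercise.

```r
# subset() accepts bare column names and a name range in select
via_subset <- subset(mtcars, am == 1 & gear == 5, select = mpg:hp)

# The bracket form needs the column positions worked out by hand
first_col <- which(names(mtcars) == "mpg")
last_col  <- which(names(mtcars) == "hp")
via_brackets <- mtcars[which(mtcars$am == 1 & mtcars$gear == 5),
                       first_col:last_col]

dim(via_subset)
names(via_subset)
all.equal(via_subset, via_brackets)
```

The `subset()` form is shorter to type interactively, while the bracket form makes each step explicit.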

## Congratulations!

You have successfully explored various data sets within R.