# Worksheet 06a: File Input/Output

_**Leader**: Victor Yuan **Reviewer:** Almas Khan **ASDA Assist**: David Kepplinger_

*Version 0*

This is the corresponding worksheet for Class 12 (Oct 20, 2020) & Class 13 (Oct 22, 2020).

There are 13 questions. To get 100% on this worksheet, you must answer 40% of them correctly: 0.4*13 = 5.2, i.e. __5 questions__.

Some notes:

- Remember to pay attention to the variable name to store your answer in, or else it will not be autograded correctly.
- To ensure everything works properly, remember to run all code cells, not just the ones with your answer.

If you want to use packages that are not yet installed, use the code cell below to install them. You might not have the R package **reprex** installed.

In [2]:
# Install additional packages, e.g.
# install.packages("here")
# install.packages("testthat")
# install.packages("reprex")

Installing package into ‘/home/jupyter/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)

Installing package into ‘/home/jupyter/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)



Use the following code cell to load any additional packages you want to use for this worksheet.

In [1]:
# Load packages, e.g.
 library(devtools)

Loading required package: usethis



Run the code cell below to load the packages.

In [2]:
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(here))
suppressPackageStartupMessages(library(testthat))
suppressPackageStartupMessages(library(reprex))

## TOPIC 1: Writing and reading data from disk

To write tabular data from R to your computer, and to read tabular data back into R, we can use the `tidyverse` package `readr`, which is loaded when running `library(tidyverse)`.

The tidyverse and the built-in gapminder dataset were already loaded in the cell above.

Next, let's keep only the rows from 2007 in the continent Asia and save the result to a variable.

In [3]:
gap_asia_2007 <- gapminder %>% filter(year == 2007, continent == "Asia")
gap_asia_2007 %>%
  head()

country,continent,year,lifeExp,pop,gdpPercap
<fct>,<fct>,<int>,<dbl>,<int>,<dbl>
Afghanistan,Asia,2007,43.828,31889923,974.5803
Bahrain,Asia,2007,75.635,708573,29796.0483
Bangladesh,Asia,2007,64.062,150448339,1391.2538
Cambodia,Asia,2007,59.723,14131858,1713.7787
China,Asia,2007,72.961,1318683096,4959.1149
"Hong Kong, China",Asia,2007,82.208,6980412,39724.9787


#### Question 1.1

Write `gap_asia_2007` to a comma-separated value (csv) file named `exported_file.csv` with just one command:

```
write_csv(FILL_THIS_IN, "exported_file.csv")
```

In [4]:
write_csv(gap_asia_2007, "exported_file.csv")

In [5]:
test_that("Question 1.1", {
    expect_true(file.exists('exported_file.csv'))
    with(read.table('exported_file.csv', header = TRUE, sep = ',', stringsAsFactors = FALSE), {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(as.integer(pop[ctr_order]), '8bb3c4cc0e3a3380ff82cbd9fe83b2cb')
    })
})
print("Success!")

[1] "Success!"


#### Question 1.2

Let's use the function `read_csv` to read `exported_file.csv` back into R and store the result in the variable `gap_asia_2007_in`.

```
gap_asia_2007_in <- read_csv("FILL_THIS_IN")

```

In [6]:
gap_asia_2007_in <- read_csv("exported_file.csv")

Parsed with column specification:
cols(
  country = col_character(),
  continent = col_character(),
  year = col_double(),
  lifeExp = col_double(),
  pop = col_double(),
  gdpPercap = col_double()
)



In [7]:
test_that("Question 1.2", {
    expect_known_hash(colnames(gap_asia_2007_in), 'cc76c54ddad925d63e472c77cd7bd7bf')
    expect_known_hash(sapply(gap_asia_2007_in, typeof), '68eb6593a9f582ea9b4aec7862df6be4')
    with(gap_asia_2007_in, {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(unique(continent), 'a500021b40bafb5d1ad20bed151aab68')
        expect_known_hash(round(lifeExp[ctr_order], 2), '9da5c364cf95548c95ea94de3193202b')
    })
})
print("Success!")

[1] "Success!"


Notice the output of running `read_csv`. This tells us about the types of variables that were read in. It's a good habit to check this every time you run a `read_` function. Sometimes we might want to change how these variable types are specified...
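As a minimal sketch of checking column types after a round trip (the data frame `toy` and the temporary file `tmp` are made up for illustration and are not part of the worksheet), `spec()` returns the column specification that `read_csv` used:

```r
library(readr)

# Hypothetical toy data: one factor column and one integer column
toy <- data.frame(name = factor(c("a", "b")), n = c(1L, 2L))

# Write to a throwaway file so nothing in the project is touched
tmp <- tempfile(fileext = ".csv")
write_csv(toy, tmp)

# col_types = cols() lets readr guess every column without printing the message
toy_in <- read_csv(tmp, col_types = cols())

spec(toy_in)           # the guessed column specification
sapply(toy_in, class)  # compare with sapply(toy, class)
```

The factor column comes back as a character column, because a csv file stores only text. Recovering the factor requires stating the column type explicitly when reading.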

#### Question 1.3

Let's use the function `read_csv` to read `exported_file.csv` back into R and store the result in the variable `gap_asia_2007_in_cspec`.

But! This time let's specify that we want the:

- columns `country` and `continent` to be factors
- all other column specifications to be determined automatically by `read_csv`

```
gap_asia_2007_in_cspec <- FILL_THIS_IN(
  "FILL_THIS_IN.csv", 
  col_types = cols(
    country = col_factor(),
    continent = FILL_THIS_IN
  ))
```

In [11]:
gap_asia_2007_in_cspec <- read_csv(
  "exported_file.csv", 
  col_types = cols(
    country = col_factor(),
    continent = col_factor()
  ))

In [12]:
test_that("Question 1.3", {
    expect_known_hash(sapply(gap_asia_2007_in_cspec, typeof), 'd3ed7d3a07fad8143eb7dd22d88d62a3')
    with(gap_asia_2007_in_cspec, {
        expect_known_hash(sort(enc2utf8(levels(country)), method = 'radix'), '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(as.integer(continent), 'ccdd4647040ccea8f1863ae5e101edf9')
    })
})
print("Success!")

[1] "Success!"


### Filepaths with `here::here()`

Up until now, we have always written files to, and read files from, our current working directory. If we wanted to use a different folder on our computer, we could specify something like:

- `Documents/STAT545/exported_file.csv` - Mac uses forward slashes
- `Documents\STAT545\exported_file.csv` - Windows uses backward slashes

However, if you want your Rproj to be portable across platforms (Mac, Unix, and Windows users), `here::here()` lets you build paths relative to the project root instead of spelling out every path explicitly.
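A minimal sketch of building paths from components rather than typing the separators by hand. The folder and file names are the ones used in this worksheet; the variable names `rel_path` and `abs_path` are only for illustration:

```r
# Base R: file.path() joins the components with "/", which R also accepts on Windows
rel_path <- file.path("Documents", "STAT545", "exported_file.csv")
rel_path

# here::here() builds the path starting from the project root instead
abs_path <- here::here("worksheet_06a_data", "exported_file.csv")
basename(abs_path)  # the last component of the path
```

Neither call touches the disk; both only build a character string that can then be passed to a function such as `write_csv` or `read_csv`.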

#### Question 1.4

First, run the function `here::here()`. Note where this location is on your computer.

Second, use your file browser to go to the location returned by `here::here()` and create a folder called **"worksheet_06a_data"**.

Lastly, filter `gap_asia_2007` to the rows where `country` is equal to `"Pakistan"`. Then, write the result to a `csv` file in your newly created *"worksheet_06a_data"* folder. The test below looks for a file named `here_exported_file.csv`.

```
here::here()
write_csv(gap_asia_2007, FILL_THIS_IN("worksheet_06a_data", "FILL_THIS_IN.csv"))
```

In [20]:
# Keep only the Pakistan row (this overwrites gap_asia_2007)
gap_asia_2007 <- gap_asia_2007 %>% filter(country == "Pakistan")
write_csv(gap_asia_2007, here("worksheet_06a_data", "gap_asia_2007.csv"))

In [21]:
test_that("Question 1.4", {
    expect_true(dir.exists(here::here('worksheet_06a_data')))
    expect_true(file.exists(here::here('worksheet_06a_data', 'here_exported_file.csv')))
    expect_setequal(tools::md5sum("exported_file.csv"), tools::md5sum(here::here('worksheet_06a_data', 'here_exported_file.csv')))
    with(read.table(here::here('worksheet_06a_data', 'here_exported_file.csv'), header = TRUE, sep = ',', stringsAsFactors = FALSE), {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(round(gdpPercap[ctr_order], 2), '78771a63570dc79433e9587793969a73')
    })
})
print("Success!")

ERROR: Error: Test failed: 'Question 1.4'
* <text>:3: file.exists(here::here("worksheet_06a_data", "here_exported_file.csv")) isn't true.
* <text>:4: tools::md5sum("exported_file.csv")[1] absent from tools::md5sum(here::here("worksheet_06a_data", "here_exported_file.csv"))
* <text>:4: tools::md5sum(here::here("worksheet_06a_data", "here_exported_file.csv"))[1] absent from tools::md5sum("exported_file.csv")
* <text>:5: cannot open the connection
Backtrace:
 1. base::with(...)
 2. utils::read.table(...)
 3. base::file(file, "rt")


## TOPIC 2: Base R

For this section, avoid using `tidyverse` functions when possible.

#### Question 2.1

First, let's assign the alphabet to the vector `alphabet`:

In [8]:
(alphabet <- LETTERS)

Use `[]` to subset the 3rd and 7th elements of the vector `alphabet`, and assign the result to an R object called `a2.1`.

```
a2.1 <- alphabet[FILL_THIS_IN]
```
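The questions in this topic rely on a few forms of bracket indexing. A small sketch on a made-up vector `v` (not the worksheet's `alphabet`):

```r
v <- c("a", "b", "c", "d", "e")  # hypothetical example vector

v[c(1, 3)]  # positive indices pick elements: "a" "c"
v[-2]       # a negative index drops an element: "a" "c" "d" "e"
v[2:4]      # `:` builds the sequence 2, 3, 4: "b" "c" "d"

v[5] <- "z" # assigning into an index changes v in place
v           # "a" "b" "c" "d" "z"
```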

In [7]:
a2.1 <- alphabet[c(3,7)]

In [9]:
test_that("Question 2.1", expect_known_hash(a2.1, '548ed661cd18d7e1c902348697727011'))
print("Success!")

[1] "Success!"


#### Question 2.2

Use `[]` to extract everything from the `alphabet` vector except the third entry. Assign this to `a2.2`.

```
a2.2 <- alphabet[FILL_THIS_IN]
```

In [13]:
a2.2 <- alphabet[-3]

In [14]:
test_that("Question 2.2", expect_known_hash(a2.2, 'e637ddc1874226525a2c12063956edfa'))
print("Success!")

[1] "Success!"


#### Question 2.3

Extract the 2nd to 19th entries of `alphabet`, using `:` to construct a sequential vector. Assign the result to the object `a2.3`.

```
a2.3 <- alphabet[FILL_THIS_IN]
```

In [15]:
a2.3 <- alphabet[2:19]

In [16]:
test_that("Question 2.3", expect_known_hash(a2.3, '716b4c1a6fce07eb1bae341b04999f22'))
print("Success!")

[1] "Success!"


#### Question 2.4

Replace the second entry of `alphabet` with the character string "This is where B is". Assign **in place** (i.e. do not create a new object, but change the existing `alphabet` vector).

```
alphabet[FILL_THIS_IN] <- FILL_THIS_IN
alphabet
```

In [17]:
alphabet[2] <-  "This is where B is"
alphabet

In [18]:
test_that("Question 2.4", expect_known_hash(enc2utf8(alphabet), 'a9acf5bcf0a199fd37259f8f1b56487a'))
print("Success!")

[1] "Success!"


#### Question 2.5

Using the altered `alphabet` vector from Q2.4, create a new vector called `a2.5` that contains the same vector repeated twice. Hint: use `c()` (`rep()` also works).

```
a2.5 <- FILL_THIS_IN(FILL_THIS_IN)
```
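A minimal sketch, on a made-up vector `v`, showing that concatenating a vector with itself and repeating it with `rep()` give the same result:

```r
v <- c("x", "y")                # hypothetical example vector

twice_c   <- c(v, v)            # concatenate the vector with itself
twice_rep <- rep(v, times = 2)  # repeat the whole vector two times

identical(twice_c, twice_rep)   # TRUE
```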

In [24]:
a2.5 <- rep(alphabet, times = 2)

In [25]:
test_that("Question 2.5", expect_known_hash(enc2utf8(a2.5), '34c18fc97d296e3c4f6dc28b56a62b91'))
print("Success!")

[1] "Success!"


#### Question 2.6

Load the mtcars dataset. 

1. Extract the vector of `mpg` values using the `$` operator 
2. Extract the 2nd to 24th elements, inclusively.
3. Assign this to the object `a2.6`. 

```
(a2.6 <- mtcars$FILL_THIS_IN[FILL_THIS_IN])
```

In [26]:
(a2.6 <- mtcars$mpg[2:24])

In [27]:
test_that("Question 2.6", expect_known_hash(a2.6, 'd065f0e9275a4ff106485010c3fd1c2c'))
print("Success!")

[1] "Success!"


#### Question 2.7

Using `mtcars` again,

1. Extract the vector of `wt` values using the `$` operator.
2. Replace each value with `TRUE` if the value is greater than 3.4, and `FALSE` otherwise.
3. Assign this logical vector to the R object `a2.7`

```
(a2.7 <- mtcars$FILL_THIS_IN) 
```
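Comparison operators in R work element by element, so comparing a numeric vector with a single number yields a logical vector of the same length. A small sketch with made-up values (`weights` and `is_heavy` are illustrative names, not taken from `mtcars`):

```r
weights <- c(2.6, 3.4, 3.5, 5.2)  # hypothetical values

is_heavy <- weights > 3.4  # compares every element; 3.4 itself is not > 3.4
is_heavy                   # FALSE FALSE TRUE TRUE
```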

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("Question 2.7", expect_known_hash(a2.7, 'c51a40941e57f8892fe413bee95fd8d5'))
print("Success!")

#### Question 2.8

Using `mtcars`,

1. For the rows where `mpg > 20`, replace the car weight entries (`wt`) with the number `1000`
2. Assign this new data frame to the object `a2.8`

```
a2.8 <- mtcars
a2.8$wt[FILL_THIS_IN] <- FILL_THIS_IN
a2.8
```
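A logical vector can also sit inside `[]` on the left-hand side of an assignment, which replaces only the matching entries. A minimal sketch on a made-up data frame `df` (not `mtcars`):

```r
df <- data.frame(x = c(1, 5, 10), y = c(0.1, 0.2, 0.3))  # hypothetical data

df$y[df$x > 4] <- 99  # overwrite y only in rows where x > 4
df$y                  # 0.1 99.0 99.0
```

Only the `y` column changes; the rows where the condition is `FALSE` keep their original values.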

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("Question 2.8", expect_known_hash(a2.8$wt, '4e56aa37c9e2888594cc7c360f784b70'))
print("Success!")

## TOPIC 3: Reprex

If you haven't done question 1.4, then do the following before attempting questions for this section:

- in your file browser go to the location returned by `here::here()` and create a folder called **"worksheet_06a_data"**.
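If you would rather create the folder from R than from the file browser, a minimal sketch using base R's `dir.create()` (the variable name `data_dir` is only for illustration):

```r
data_dir <- here::here("worksheet_06a_data")

# Create the folder only if it is missing
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

dir.exists(data_dir)
```

Writing a file into a folder that does not exist fails with a "cannot open the connection" error, so the folder has to be in place before any file is written to it.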

#### Question 3.1

Create a reprex for the code and output of `mean(rnorm(10))`. Specify the output folder to be **worksheet_06a_data**.

After doing this, I encourage you to open up the output files and take a look at your reprex!

```
reprex(FILL_THIS_IN, 
       outfile = here::here('FILL_THIS_IN', 'reprex.md'))
```


In [23]:
reprex(mean(rnorm(10)), 
       outfile = here::here('worksheet_06a_data', 'reprex.md'))


In [None]:
test_that("Question 3.1", {
    expect_true(file.exists(here::here('worksheet_06a_data', 'reprex_reprex.md')))
    expect_true(file.exists(here::here('worksheet_06a_data', 'reprex_reprex.R')))
    expect_known_hash(gsub('\\s', '', paste0(readLines(here::here('worksheet_06a_data', 'reprex_reprex.R')), collapse = '')), 
                      '41311107e8e35738e6e60c14e8d78a65')
})
print("Success!")