# Worksheet A-6: File Input/Output

## Getting Started

Load the requirements for this worksheet:

In [1]:
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(here))
suppressPackageStartupMessages(library(testthat))
suppressPackageStartupMessages(library(reprex))

The following code chunk has been unlocked, to give you the flexibility to start this document with some of your own code. Remember, it's bad manners to keep a call to `install.packages()` in your source code, so don't forget to delete these lines if you ever need to run them.

In [4]:
#install.packages("here")


# TOPIC 1: Writing and reading data from disk

To write R objects to disk and read tabular data into R, we can use the `tidyverse` package `readr`, which is loaded by `library(tidyverse)`.

Let's filter the `gapminder` data to keep only the rows from 2007 in Asia, and save the result to a variable.

In [19]:
gap_asia_2007 <- gapminder %>% 
  filter(year == 2007, continent == "Asia")
head(gap_asia_2007)

| country | continent | year | lifeExp | pop | gdpPercap |
|:--|:--|--:|--:|--:|--:|
| `<fct>` | `<fct>` | `<int>` | `<dbl>` | `<int>` | `<dbl>` |
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.5803 |
| Bahrain | Asia | 2007 | 75.635 | 708573 | 29796.0483 |
| Bangladesh | Asia | 2007 | 64.062 | 150448339 | 1391.2538 |
| Cambodia | Asia | 2007 | 59.723 | 14131858 | 1713.7787 |
| China | Asia | 2007 | 72.961 | 1318683096 | 4959.1149 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.9787 |


## Question 1.1

Write `gap_asia_2007` to a comma-separated value (csv) file named `exported_file.csv` with just one command:

```
write_csv(FILL_THIS_IN, "exported_file.csv")
```

Note: no need to make any variables for this question. Check out your files after executing this line!

In [20]:
write_csv(gap_asia_2007, "exported_file.csv")

In [4]:
test_that("Question 1.1", {
    expect_true(file.exists('exported_file.csv'))
    with(read.table('exported_file.csv', header = TRUE, sep = ',', stringsAsFactors = FALSE), {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(as.integer(pop[ctr_order]), '8bb3c4cc0e3a3380ff82cbd9fe83b2cb')
    })
})

Test passed


## Question 1.2

Let's use the function `read_csv` to read `exported_file.csv` back into R and store the result as the variable `gap_asia_2007_in`.

```
gap_asia_2007_in <- read_csv("FILL_THIS_IN")

```

In [21]:
gap_asia_2007_in <- read_csv("exported_file.csv")
head(gap_asia_2007_in)

[1m[1mRows: [1m[22m[34m[34m33[34m[39m [1m[1mColumns: [1m[22m[34m[34m6[34m[39m

[36m--[39m [1m[1mColumn specification[1m[22m [36m------------------------------------------------------------------------------------------------[39m
[1mDelimiter:[22m ","
[31mchr[39m (2): country, continent
[32mdbl[39m (4): year, lifeExp, pop, gdpPercap


[36mi[39m Use [30m[47m[30m[47m`spec()`[47m[30m[49m[39m to retrieve the full column specification for this data.
[36mi[39m Specify the column types or set [30m[47m[30m[47m`show_col_types = FALSE`[47m[30m[49m[39m to quiet this message.



| country | continent | year | lifeExp | pop | gdpPercap |
|:--|:--|--:|--:|--:|--:|
| `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.5803 |
| Bahrain | Asia | 2007 | 75.635 | 708573 | 29796.0483 |
| Bangladesh | Asia | 2007 | 64.062 | 150448339 | 1391.2538 |
| Cambodia | Asia | 2007 | 59.723 | 14131858 | 1713.7787 |
| China | Asia | 2007 | 72.961 | 1318683096 | 4959.1149 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.9787 |


In [22]:
test_that("Question 1.2", {
    expect_known_hash(colnames(gap_asia_2007_in), 'cc76c54ddad925d63e472c77cd7bd7bf')
    expect_known_hash(sapply(gap_asia_2007_in, typeof), '68eb6593a9f582ea9b4aec7862df6be4')
    with(gap_asia_2007_in, {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(unique(continent), 'a500021b40bafb5d1ad20bed151aab68')
        expect_known_hash(round(lifeExp[ctr_order], 2), '9da5c364cf95548c95ea94de3193202b')
    })
})

Test passed


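Notice the message printed by `read_csv`: it reports the type it guessed for each column. It's a good habit to check this every time you run a `read_*` function. Here, `country` and `continent` came back as character rather than factor, and `year` and `pop` as double rather than integer, because a csv file stores only text and the types have to be guessed again on reading. Sometimes we want to override these guesses and specify the column types ourselves.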
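To see why the types changed, here is a minimal sketch of the same round trip on a tiny made-up data frame (`df` and `path` are illustrative names; `show_col_types` assumes readr 2.0 or later, which the message above suggests you have). Because the file holds only text, `read_csv` guesses each type afresh: the factor comes back as character and the integer column as double.

```r
library(readr)

# A tiny illustrative data frame: one factor column, one integer column
df <- data.frame(
  name  = factor(c("a", "b")),
  count = c(1L, 2L)
)

# Write to a temporary file so nothing is left in your project folder
path <- tempfile(fileext = ".csv")
write_csv(df, path)

# Read it back; show_col_types = FALSE silences the column-specification message
df_in <- read_csv(path, show_col_types = FALSE)

sapply(df, class)     # name: "factor",    count: "integer"
sapply(df_in, class)  # name: "character", count: "numeric" (double)
```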

## Question 1.3

Let's use `read_csv` again to read `exported_file.csv` back into R, and store the result as the variable `gap_asia_2007_in_cspec`.

But this time, let's specify that:

- the columns `country` and `continent` should be read as factors;
- the types of all other columns should be determined automatically by `read_csv`.

```
gap_asia_2007_in_cspec <- FILL_THIS_IN(
  "FILL_THIS_IN.csv", 
  col_types = cols(
    country = col_factor(),
    continent = FILL_THIS_IN
  ))
```
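The `cols()` specification in the template names a type for some columns and lets `read_csv` guess the rest. A small sketch on a made-up file (the column names `name`, `group` and `count` are illustrative), assuming readr's optional `.default` argument to `cols()` and its compact one-letter string form, where `"f"` means factor and `"?"` means guess.

```r
library(readr)

# Write a tiny illustrative csv file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("name,group,count", "a,x,1", "b,y,2"), path)

# Long form: name the columns you care about. `.default = col_guess()` is
# optional, because columns not listed in cols() are guessed anyway
spec_long <- cols(
  name     = col_factor(),
  group    = col_factor(),
  .default = col_guess()
)
df1 <- read_csv(path, col_types = spec_long)

# Compact form: one letter per column, in order ("f" = factor, "?" = guess)
df2 <- read_csv(path, col_types = "ff?")

sapply(df1, class)  # name: "factor", group: "factor", count: "numeric"
sapply(df2, class)  # same as above
```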

In [23]:
gap_asia_2007_in_cspec <- read_csv(
  "exported_file.csv", 
  col_types = cols(
    country = col_factor(),
    continent = col_factor()
  ))
head(gap_asia_2007_in_cspec)

| country | continent | year | lifeExp | pop | gdpPercap |
|:--|:--|--:|--:|--:|--:|
| `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.5803 |
| Bahrain | Asia | 2007 | 75.635 | 708573 | 29796.0483 |
| Bangladesh | Asia | 2007 | 64.062 | 150448339 | 1391.2538 |
| Cambodia | Asia | 2007 | 59.723 | 14131858 | 1713.7787 |
| China | Asia | 2007 | 72.961 | 1318683096 | 4959.1149 |
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.9787 |


In [24]:
test_that("Question 1.3", {
    expect_known_hash(sapply(gap_asia_2007_in_cspec, typeof), 'd3ed7d3a07fad8143eb7dd22d88d62a3')
    with(gap_asia_2007_in_cspec, {
        expect_known_hash(sort(enc2utf8(levels(country)), method = 'radix'), '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(as.integer(continent), 'ccdd4647040ccea8f1863ae5e101edf9')
    })
})

Test passed


## Question 1.4

First, run the function `here::here()` and note the folder it returns on your computer.

In [25]:
here::here()

Second, in the location returned by `here::here()`, create a folder called **"worksheet_06a_data"**. You can do that manually using your file browser, or by executing the following code:

In [26]:
dir.create(here::here("worksheet_06a_data"))

"'C:\Users\41615\worksheet_a06.ipynb\worksheet_06a_data' already exists"
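The message above appears because the folder already existed from an earlier run. A base R sketch of two ways to make this step safe to re-run; `data_dir` is an illustrative folder inside R's temporary directory, standing in for `here::here("worksheet_06a_data")`.

```r
# Illustrative folder inside R's per-session temporary directory,
# standing in for here::here("worksheet_06a_data")
data_dir <- file.path(tempdir(), "worksheet_demo_data")

# Option 1: only create the folder if it does not exist yet
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Option 2: call dir.create() every time, but suppress the
# "already exists" warning (it then returns FALSE invisibly)
dir.create(data_dir, showWarnings = FALSE)

dir.exists(data_dir)  # TRUE
```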


Your task now is to write the tibble `gap_asia_2007` to a `csv` file in your newly created folder. Name your file `here_exported_file.csv`. 

```
write_csv(gap_asia_2007, FILL_THIS_IN("worksheet_06a_data", "FILL_THIS_IN.csv"))
```

In [27]:
write_csv(gap_asia_2007, here::here("worksheet_06a_data", "here_exported_file.csv"))
# View files in the worksheet_06a_data folder:
dir(here::here("worksheet_06a_data"))

In [29]:
test_that("Question 1.4", {
    expect_true(dir.exists(here::here('worksheet_06a_data')))
    expect_true(file.exists(here::here('worksheet_06a_data', 'here_exported_file.csv')))
    expect_setequal(
        unname(tools::md5sum("exported_file.csv")), 
        unname(tools::md5sum(here::here('worksheet_06a_data', 'here_exported_file.csv')))
    )
    with(read.table(here::here('worksheet_06a_data', 'here_exported_file.csv'), 
                    header = TRUE, sep = ',', stringsAsFactors = FALSE), {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(round(gdpPercap[ctr_order], 2), '78771a63570dc79433e9587793969a73')
    })
})

Test passed


# TOPIC 2: Base R

For this section, avoid using `tidyverse` functions when possible.

## Question 2.1

First, let's assign the uppercase alphabet (R's built-in constant `LETTERS`) to the vector `alphabet`:

In [30]:
(alphabet <- LETTERS)

Use `[]` to subset the 3rd and 7th elements of the vector `alphabet`, and assign the result to an R object called `a2.1`.

```
a2.1 <- alphabet[FILL_THIS_IN]
```
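The same idea on the built-in lowercase `letters` vector, as a quick sketch: a numeric vector of positions inside `[]` returns those elements, in the order the positions are listed.

```r
# Built-in vector of the 26 lowercase letters
x <- letters

x[c(1, 5)]  # "a" "e"
x[c(5, 1)]  # "e" "a": elements come back in the order requested
x[c(1, 1)]  # "a" "a": a position can be repeated
```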

In [31]:
a2.1 <- alphabet[c(3, 7)]
a2.1

In [32]:
test_that("Question 2.1", expect_known_hash(a2.1, '548ed661cd18d7e1c902348697727011'))

Test passed


## Question 2.2

Use `[]` to extract every element of the `alphabet` vector except the third one. Assign this to `a2.2`.

```
a2.2 <- alphabet[FILL_THIS_IN]
```
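A sketch of negative indexing on a short illustrative vector: a minus sign in front of a position drops that element instead of keeping it. Positive and negative positions cannot be mixed in the same subset.

```r
x <- letters[1:5]  # "a" "b" "c" "d" "e"

x[-1]        # drop the 1st element: "b" "c" "d" "e"
x[-c(2, 4)]  # drop the 2nd and 4th: "a" "c" "e"

# Mixing positive and negative positions is an error;
# try() captures it instead of stopping the script
res <- try(x[c(-1, 2)], silent = TRUE)
inherits(res, "try-error")  # TRUE
```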

In [33]:
a2.2 <- alphabet[-3]
a2.2

In [34]:
test_that("Question 2.2", expect_known_hash(a2.2, 'e637ddc1874226525a2c12063956edfa'))

Test passed


## Question 2.3

Extract the 2nd to 19th entries of `alphabet`, using `:` to construct a sequential vector of positions. Assign the result to the object `a2.3`.

```
a2.3 <- alphabet[FILL_THIS_IN]
```
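A sketch of how `:` builds sequences. One gotcha worth knowing: `:` has higher precedence than `+` and `-`, so `1:n - 1` is not the same as `1:(n - 1)`.

```r
2:5           # 2 3 4 5
5:2           # 5 4 3 2: counts down when the start is larger
letters[2:5]  # "b" "c" "d" "e"

# `:` binds more tightly than `-`, so the parentheses matter
n <- 5
1:n - 1       # (1:n) - 1, i.e. 0 1 2 3 4
1:(n - 1)     # 1 2 3 4
```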

In [35]:
a2.3 <- alphabet[2:19]
a2.3

In [36]:
test_that("Question 2.3", expect_known_hash(a2.3, '716b4c1a6fce07eb1bae341b04999f22'))

Test passed


## Question 2.4

Replace the second entry of `alphabet` with the character string "This is where B is". Assign **in place** (i.e. do not create a new object, but change the existing `alphabet` vector).

```
alphabet[FILL_THIS_IN] <- FILL_THIS_IN
```
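A sketch of replacement on a short illustrative vector: putting `x[...]` on the left of `<-` updates `x` itself, so no separately named object is needed. A single value on the right is recycled to fill every selected position.

```r
x <- letters[1:5]  # "a" "b" "c" "d" "e"

# Indexing on the left of <- updates x itself
x[2] <- "B"
x                  # "a" "B" "c" "d" "e"

# Several positions at once; the single value "-" is recycled
x[c(4, 5)] <- "-"
x                  # "a" "B" "c" "-" "-"
```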

In [37]:
alphabet[2] <- "This is where B is"
alphabet

In [38]:
test_that("Question 2.4", expect_known_hash(enc2utf8(alphabet), 'a9acf5bcf0a199fd37259f8f1b56487a'))

Test passed


## Question 2.5

Using the `alphabet` vector as modified in Question 2.4, create a new vector called `a2.5` that contains `alphabet` repeated twice, one copy after the other. Hint: use `c()`.

```
a2.5 <- FILL_THIS_IN(FILL_THIS_IN)
```
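`c()` concatenates its arguments, so passing the same vector twice repeats it; `rep()` with `times` gives the same result, while `each` repeats every element in turn. A sketch on a short illustrative vector:

```r
x <- c("a", "b", "c")

c(x, x)            # "a" "b" "c" "a" "b" "c"
rep(x, times = 2)  # same result as c(x, x)
rep(x, each = 2)   # "a" "a" "b" "b" "c" "c": each element repeated in turn
```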

In [39]:
a2.5 <- c(alphabet, alphabet)
a2.5

In [40]:
test_that("Question 2.5", expect_known_hash(enc2utf8(a2.5), '34c18fc97d296e3c4f6dc28b56a62b91'))
print("Success!")

Test passed
[1] "Success!"


## Question 2.6

Use the built-in `mtcars` dataset (it ships with R, so there is nothing to load).

1. Extract the vector of `mpg` values using the `$` operator.
2. Extract the 2nd to 24th elements, inclusive.
3. Assign this to the object `a2.6`.

```
a2.6 <- mtcars$FILL_THIS_IN[FILL_THIS_IN]
```
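A sketch comparing ways to pull a single column out of `mtcars`: `$` and `[[` both return a plain vector (and `[[` also works with a column name stored in a variable), while single brackets return a one-column data frame. The names `v1`, `v2`, `v3` and `col` are illustrative.

```r
# mtcars ships with R (datasets package), so there is nothing to load
v1 <- mtcars$mpg       # `$` returns the column as a numeric vector
v2 <- mtcars[["mpg"]]  # `[[` does the same ...
col <- "mpg"
v3 <- mtcars[[col]]    # ... and also accepts a name stored in a variable

identical(v1, v2) && identical(v2, v3)  # TRUE

class(mtcars["mpg"])   # "data.frame": single brackets keep the table structure

mtcars$mpg[1:3]        # 21.0 21.0 22.8
```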

In [41]:
a2.6 <- mtcars$mpg[2:24]
a2.6

In [48]:
test_that("Question 2.6", expect_known_hash(a2.6, 'd065f0e9275a4ff106485010c3fd1c2c'))
print("Success!")

Test passed
[1] "Success!"


## Question 2.7

Using `mtcars` again,

1. Extract the vector of `wt` values using the `$` operator.
2. Replace each value with `TRUE` if the value is greater than 3.4, and `FALSE` otherwise.
3. Assign this logical vector to the R object `a2.7`.

```
a2.7 <- mtcars$FILL_THIS_IN
```
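A comparison such as `x > 3.4` already returns a logical vector, so wrapping it in `ifelse(..., TRUE, FALSE)` is not needed. A sketch on a short made-up vector:

```r
x <- c(2.5, 3.4, 4.1)

x > 3.4                       # FALSE FALSE TRUE (3.4 is not greater than 3.4)
ifelse(x > 3.4, TRUE, FALSE)  # the same logical vector, with an extra step

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(x > 3.4)                  # 1
```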

In [50]:
a2.7 <- mtcars$wt > 3.4
a2.7

In [51]:
test_that("Question 2.7", expect_known_hash(a2.7, 'c51a40941e57f8892fe413bee95fd8d5'))
print("Success!")

Test passed
[1] "Success!"


## Question 2.8

Using `mtcars`,

1. For the rows where `mpg > 20`, replace the car weight entries (`wt`) with the number `1000`.
2. Assign this new data frame to the object `a2.8`.

```
a2.8 <- mtcars
a2.8$wt[FILL_THIS_IN] <- FILL_THIS_IN
```
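A sketch of the same pattern on a small made-up data frame (`df`, `df2`, `speed` and `weight` are illustrative names): the logical vector inside `[]` picks the rows to change, and only those entries are overwritten. The second form uses `[rows, column]` indexing on the data frame itself.

```r
# Small illustrative data frame
df <- data.frame(speed = c(10, 25, 30), weight = c(1.2, 2.0, 3.1))

# df$speed > 20 is FALSE TRUE TRUE, so only rows 2 and 3 change
df$weight[df$speed > 20] <- 0
df$weight  # 1.2 0.0 0.0

# Equivalent form using [rows, column] indexing on the data frame
df2 <- data.frame(speed = c(10, 25, 30), weight = c(1.2, 2.0, 3.1))
df2[df2$speed > 20, "weight"] <- 0
identical(df$weight, df2$weight)  # TRUE
```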

In [52]:
a2.8 <- mtcars
a2.8$wt[a2.8$mpg > 20] <- 1000
head(a2.8)

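| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 1000.0 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 1000.0 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 1000.0 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 1000.0 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |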


In [54]:
test_that("Question 2.8", expect_known_hash(a2.8$wt, '4e56aa37c9e2888594cc7c360f784b70'))
print("Success!")

Test passed
[1] "Success!"


# TOPIC 3: Reprex

If you haven't done Question 1.4, do the following before attempting the questions in this section:

- In your file browser, go to the location returned by `here::here()` and create a folder called **"worksheet_06a_data"**.

## Question 3.1

Create a reprex for the code and output of `mean(rnorm(10))`. Specify the output folder to be **worksheet_06a_data**.

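After doing this, I encourage you to open the output files and take a look at your reprex! Note that `outfile` is deprecated in recent versions of `reprex` (2.0 and later): the call still works, but, as the messages it prints show, only the folder part of the path is used and the file name is generated automatically. In those versions, the `wd` argument sets the output folder directly.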

```
reprex({
   FILL_THIS_IN
}, 
   outfile = here::here('FILL_THIS_IN', 'reprex.md'))
```


In [62]:
reprex(
  mean(rnorm(10)),
  outfile = here::here('worksheet_06a_data', 'reprex.md')
)

[31mx[39m Install the [34m[34mstyler[34m[39m package in order to use [30m[47m[30m[47m`style = TRUE`[47m[30m[49m[39m.

[36mi[39m Non-interactive session, setting [30m[47m[30m[47m`html_preview = FALSE`[47m[30m[49m[39m.

[33m![39m [30m[47m[30m[47m`outfile`[47m[30m[49m[39m is deprecated

[33m![39m To control output filename, provide a filepath to [30m[47m[30m[47m`input`[47m[30m[49m[39m

[33m![39m Only taking working directory from [30m[47m[30m[47m`outfile`[47m[30m[49m[39m

[32mv[39m Preparing reprex as [30m[47m[30m[47m`.R`[47m[30m[49m[39m file:

  [34m[34mC:/Users/41615/worksheet_a06.ipynb/worksheet_06a_data/ash-crane_reprex.R[34m[39m

[36mi[39m Rendering reprex...

[32mv[39m Writing reprex file:

  [34m[34mC:/Users/41615/worksheet_a06.ipynb/worksheet_06a_data/ash-crane_reprex.md[34m[39m



In [61]:
test_that("Question 3.1", {
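  # helper: name of the most recently modified file in worksheet_06a_data matching `pattern`
  # (list.files() sorts alphabetically, so pick by modification time instead)
  most_recent <- function(pattern) {
    files <- list.files(here::here('worksheet_06a_data'), pattern = pattern, full.names = TRUE)
    basename(files[which.max(file.mtime(files))])
  }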
  expect_true(file.exists(here::here('worksheet_06a_data', most_recent(pattern = "reprex.R"))))
  expect_true(file.exists(here::here('worksheet_06a_data', most_recent(pattern = "reprex.md"))))
  expect_known_hash(gsub('\\s', '', paste0(readLines(here::here('worksheet_06a_data', most_recent(pattern = "reprex.R"))), collapse = '')),
                  '06b26c23099d3f851b76985c13f20dcc')
})

Test passed


### Attribution

Assembled by Victor Yuan, reviewed by Almas Khan, and assisted by David Kepplinger.