# Worksheet A-6: File Input/Output

## Getting Started

Load the requirements for this worksheet:

In [None]:
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))
suppressPackageStartupMessages(library(here))
suppressPackageStartupMessages(library(testthat))
suppressPackageStartupMessages(library(reprex))

The following code chunk has been unlocked to give you the flexibility to start this document with some of your own code. Remember, it's bad manners to keep a call to `install.packages()` in your source code, so if you ever need to run one here, delete it once the package is installed.

In [None]:
# An unlocked code cell.

# Part 1: Writing and reading data from disk

For writing R objects to your computer, and reading tabular data into R, we can use the `tidyverse` package `readr`, which is loaded when running `library(tidyverse)`.
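As a rough illustration of the write-then-read round trip, here is a minimal sketch on a made-up tibble. The names `toy`, `toy_path`, and `toy_in` are hypothetical and not part of the worksheet, and `show_col_types` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical toy data, unrelated to the worksheet's gapminder subset.
toy <- tibble::tibble(city = c("Oslo", "Lima"), temp_c = c(4.5, 19.2))

# Write to a temporary file so nothing in the project folder is touched.
toy_path <- tempfile(fileext = ".csv")
write_csv(toy, toy_path)

# Read the file back; show_col_types = FALSE silences the column-type message.
toy_in <- read_csv(toy_path, show_col_types = FALSE)
toy_in
```

Writing to `tempfile()` keeps the sketch from interfering with the files the questions below ask you to create.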

Let's filter the `gapminder` data to keep only the rows from 2007 in the continent of Asia, and save the result to a variable.

In [None]:
gap_asia_2007 <- gapminder %>% 
  filter(year == 2007, continent == "Asia")
head(gap_asia_2007)

## Question 1.1

Write `gap_asia_2007` to a comma-separated values (csv) file named `exported_file.csv` with just one command:

```
write_csv(FILL_THIS_IN, "exported_file.csv")
```

Note: no need to make any variables for this question. Check out your files after executing this line!

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("Question 1.1", {
    expect_true(file.exists('exported_file.csv'))
    with(read.table('exported_file.csv', header = TRUE, sep = ',', stringsAsFactors = FALSE), {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(as.integer(pop[ctr_order]), '8bb3c4cc0e3a3380ff82cbd9fe83b2cb')
    })
})

## Question 1.2

Let's use the function `read_csv` to read `exported_file.csv` back into R and store the result as the variable `gap_asia_2007_in`.

```
gap_asia_2007_in <- read_csv("FILL_THIS_IN")
```

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
head(gap_asia_2007_in)

In [None]:
test_that("Question 1.2", {
    expect_known_hash(colnames(gap_asia_2007_in), 'cc76c54ddad925d63e472c77cd7bd7bf')
    expect_known_hash(sapply(gap_asia_2007_in, typeof), '68eb6593a9f582ea9b4aec7862df6be4')
    with(gap_asia_2007_in, {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(unique(continent), 'a500021b40bafb5d1ad20bed151aab68')
        expect_known_hash(round(lifeExp[ctr_order], 2), '9da5c364cf95548c95ea94de3193202b')
    })
})

Notice the output of running `read_csv`. This tells us about the types of variables that were read in. It's a good habit to check this every time you run a `read_` function. Sometimes we might want to change how these variable types are specified...
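If the message has scrolled away, the column specification can also be inspected after the fact. The sketch below uses a hypothetical toy file (the names `toy_path`, `toy_in`, and `col_classes` are made up for illustration) and assumes a recent readr version, where whole numbers in a csv file are guessed as doubles rather than integers.

```r
library(readr)

# Hypothetical toy file with a character, an integer, and a double column.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(name = c("a", "b"), n = c(1L, 2L), score = c(0.5, 1.5)),
  toy_path
)

toy_in <- read_csv(toy_path, show_col_types = FALSE)

# spec() returns the column specification readr used for this file.
spec(toy_in)

# The class of each column after reading.
col_classes <- sapply(toy_in, class)
col_classes
```

Note that `n` was written as an integer but comes back as a double under these assumptions, which is one reason you might want to specify column types yourself.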

## Question 1.3

Let's use the function `read_csv` to read `exported_file.csv` back into R again, this time storing the result as the variable `gap_asia_2007_in_cspec`.

But! This time let's specify that we want the:

- columns `country` and `continent` to be factors
- all other column specifications to be determined automatically by `read_csv`

```
gap_asia_2007_in_cspec <- FILL_THIS_IN(
  "FILL_THIS_IN.csv", 
  col_types = cols(
    country = col_factor(),
    continent = FILL_THIS_IN
  ))
```
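To show how a partial `cols()` specification behaves, here is a small sketch on a hypothetical toy file; the column names `group`, `id`, and `value` are invented for illustration. Columns named in `cols()` get the requested type, and any column left out is guessed as usual.

```r
library(readr)

# Hypothetical toy file, unrelated to the worksheet data.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(group = c("x", "y", "x"), id = c(1, 2, 3), value = c(0.1, 0.2, 0.3)),
  toy_path
)

toy_in <- read_csv(
  toy_path,
  col_types = cols(
    group = col_factor(),  # levels are taken from the data, in order of appearance
    id = col_integer()     # override the default guess for whole numbers
  )
)

# `value` was not mentioned in cols(), so its type was guessed.
sapply(toy_in, class)
```

Supplying `col_types` also suppresses the column-type message, since readr no longer has to guess everything.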

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
head(gap_asia_2007_in_cspec)

In [None]:
test_that("Question 1.3", {
    expect_known_hash(sapply(gap_asia_2007_in_cspec, typeof), 'd3ed7d3a07fad8143eb7dd22d88d62a3')
    with(gap_asia_2007_in_cspec, {
        expect_known_hash(sort(enc2utf8(levels(country)), method = 'radix'), '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(as.integer(continent), 'ccdd4647040ccea8f1863ae5e101edf9')
    })
})

## Question 1.4

First, run the function `here::here()`. Note where this location is on your computer.

In [None]:
here::here()
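Beyond reporting the project root, `here::here()` also accepts path components and joins them onto that root. A minimal sketch, using the hypothetical names `some_folder` and `some_file.csv` (nothing is created on disk):

```r
library(here)

# The project root that here has detected.
root <- here::here()

# Path components are appended to the root; the file does not need to exist.
nested <- here::here("some_folder", "some_file.csv")
nested

# The result is an ordinary character string holding a full path.
is.character(nested)
```

Because the path is built from the project root rather than the current working directory, the same call should resolve to the same file wherever within the project the code is run.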

Second, in the location returned by `here::here()`, create a folder called **"worksheet_06a_data"**. You can do that manually using your file browser, or by executing the following code:

In [None]:
dir.create(here::here("worksheet_06a_data"))
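Calling `dir.create()` on a folder that already exists returns `FALSE` with a warning, which can be distracting when you re-run the notebook. One possible guard, sketched here against a hypothetical folder inside `tempdir()` rather than the worksheet folder:

```r
# Hypothetical folder used only for this illustration.
data_dir <- file.path(tempdir(), "example_data")

# Create the folder only if it is not already there.
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Running the guarded call a second time does nothing and emits no warning.
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

dir.exists(data_dir)
```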

Your task now is to write the tibble `gap_asia_2007` to a `csv` file in your newly created folder. Name your file `here_exported_file.csv`. 

```
write_csv(gap_asia_2007, FILL_THIS_IN("worksheet_06a_data", "FILL_THIS_IN.csv"))
```

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
# View files in the worksheet_06a_data folder:
dir(here::here("worksheet_06a_data"))

In [None]:
test_that("Question 1.4", {
    expect_true(dir.exists(here::here('worksheet_06a_data')))
    expect_true(file.exists(here::here('worksheet_06a_data', 'here_exported_file.csv')))
    expect_setequal(
        unname(tools::md5sum("exported_file.csv")), 
        unname(tools::md5sum(here::here('worksheet_06a_data', 'here_exported_file.csv')))
    )
    with(read.table(here::here('worksheet_06a_data', 'here_exported_file.csv'), 
                    header = TRUE, sep = ',', stringsAsFactors = FALSE), {
        ctr_order <- order(enc2utf8(country), method = 'radix')
        expect_known_hash(country[ctr_order], '502e6665c327bdbc211f89c785ee853b')
        expect_known_hash(round(gdpPercap[ctr_order], 2), '78771a63570dc79433e9587793969a73')
    })
})

### Attribution

Assembled by Victor Yuan, reviewed by Almas Khan, and assisted by David Kepplinger.