# Data file I/O

R has native functions for reading and writing files: `read.table`, `write.table`, `read.csv`, `write.csv`, `save`, `load`, `readRDS`, `saveRDS`, `readLines`, `writeLines`, and `scan`.

The `read.table` and `write.table` functions are the workhorse text-file I/O functions in R; `read.csv` and `write.csv` are based on them.

In [25]:
# Write iris table in tab-separated format
write.table(iris, file = "iris.tsv", sep = "\t", 
            row.names = FALSE, col.names = TRUE)

In [26]:
# Read the file back in and keep the first six rows
ih <- read.table("iris.tsv", sep = "\t", header = TRUE)[1:6, ]
print(ih)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
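
To illustrate how `read.csv` builds on `read.table`, here is a minimal sketch that reads the same CSV file both ways and compares the results. The temporary file and the names `csv_path`, `via_csv` and `via_table` are illustrative assumptions, not part of the original notebook.

```r
# Write iris to a temporary CSV file (the path is illustrative)
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, file = csv_path, row.names = FALSE)

# read.csv is read.table with CSV-friendly defaults (sep = ",", header = TRUE)
via_csv   <- read.csv(csv_path)
via_table <- read.table(csv_path, sep = ",", header = TRUE)

# For a simple file like this one, both calls should give the same data frame
print(all.equal(via_csv, via_table))

# Remove the temporary file
unlink(csv_path)
```

The two functions differ in a few other defaults (for example quoting and comment characters), so the results can diverge on files that contain `#` or single quotes.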


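In R versions before 4.0.0, columns containing strings are automatically converted to `factor` variables when data is read in or a data frame is created (since R 4.0.0 the default is to keep them as `character`). In those versions, this behaviour can be adjusted using the `options` function.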

In [27]:
# Species is a factor
class(ih$Species)

In [28]:
# View the stringsAsFactors option:
print(options("stringsAsFactors"))

$stringsAsFactors
[1] FALSE



In [22]:
# Adjust to retain string format
options("stringsAsFactors" = FALSE)

In [29]:
# Now strings are read as character
class(read.table("iris.tsv", sep = "\t", header = TRUE)$Species)

In [30]:
# Reset to the default used before R 4.0.0
options("stringsAsFactors" = TRUE)
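
The conversion can also be controlled for a single call, without touching the global option, through the `stringsAsFactors` argument of `read.table`. The sketch below assumes a temporary file; the names `tsv_path`, `as_factor` and `as_char` are illustrative.

```r
# Write iris to a temporary tab-separated file (the path is illustrative)
tsv_path <- tempfile(fileext = ".tsv")
write.table(iris, file = tsv_path, sep = "\t", row.names = FALSE)

# Ask for factors in this call only
as_factor <- read.table(tsv_path, sep = "\t", header = TRUE,
                        stringsAsFactors = TRUE)

# Ask for plain character strings in this call only
as_char <- read.table(tsv_path, sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)

print(class(as_factor$Species))
print(class(as_char$Species))

# Remove the temporary file
unlink(tsv_path)
```

Setting the argument per call keeps the result independent of the session's options and of the R version in use.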

# Exercise 2.1

**Question 1**

`read.csv` and `write.csv` are functions for CSV file I/O and are based on `read.table` and `write.table`. Use the `write.csv` function to write the last 20 rows of the `iris` dataset to the file `"ir.csv"`, and then use the `read.csv` function to read them back into a data frame. Hint: use R's help to look up the functions' usage.

**Question 2**

The `save` and `load` functions are I/O functions for binary data. They save not only the data but also its name in the R environment, and they can save multiple objects at once.

Select rows 30 to 70 of the `iris` dataset and assign them to the variable `ir`. Now use the `cor` function to calculate the correlation matrix of the first four columns of `ir`, assign it to `ir_cor`, and print both objects to the console. Use the `save` function to write the `ir` and `ir_cor` objects to the file `"ir.RData"`. Now overwrite `ir` with rows 80 to 110 of the `iris` dataset and overwrite `ir_cor` with the correlation matrix of the first four columns of the new `ir` table. Print them both to the console.

Use the `load` function to load `"ir.RData"` and print `ir` and `ir_cor` to the console. What has happened? *Hint: Use R's help to look up the function's usage*


Use the `save` function to save your R session to a file `"r_sess.RData"`. <br/>*Hint: the `ls` function lists all the variables in your environment.*

Open a new R console and load the `"r_sess.RData"` data file. Make sure that all the objects have been loaded, then close the session. Delete the file using the `unlink` function.


**Question 3**

The `readRDS` and `saveRDS` functions also handle binary data I/O, but for single objects. Write `ir` from the previous question to a file `"ir.rds"` using the `saveRDS` function and delete it from your R session. Use `ls` to show that it's gone.

Use `readRDS` to load the `"ir.rds"` file. How is this different from the way `save` and `load` work?

**Question 4**

The `readLines` and `writeLines` functions read and write lines of text files. Use the `readLines` function to read the first 5 lines from the `ir.csv` file created in **Question 1**, assign them to a variable `ir5`, and print it to the console.

Now write the contents of `ir5` to a file `"ir5.txt"` using the `writeLines` function, then check the file to make sure that the data has been written.

Hint: use R's help to search for the `readLines` and `writeLines` functions.

**Question 5**

The `scan` function is a low-level function for reading text files. Use the `scan` function to read the `"ir.csv"` file written in **Question 1**. Hint: you will need to specify three parameters as well as the file name. The `sep` parameter is the column separator to use. The `what` parameter is a list of column types represented by example values, for instance `0` for a numeric column and `""` for a text column (`what` can be a named list where the names are the table headers). The `skip` parameter allows you to skip lines in the file; you'll want to skip the header line.
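
As a rough illustration of how `sep`, `what` and `skip` fit together, here is a minimal sketch on a small, hypothetical two-column file rather than the exercise data. The file contents and the names `demo_path` and `cols` are assumptions made for the example; `quiet = TRUE` is an extra argument that only suppresses the "Read N records" message.

```r
# Create a small, hypothetical CSV file with a header line
demo_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1.5", "b,2.5", "c,3.5"), demo_path)

# what: a named list with one example value per column
#   ""  marks a text column, 0 marks a numeric column
# skip = 1 jumps over the header line
cols <- scan(demo_path,
             what = list(name = "", score = 0),
             sep = ",", skip = 1, quiet = TRUE)

# scan returns a list with one vector per column
print(cols$name)
print(cols$score)

# The list can be converted to a data frame if needed
print(as.data.frame(cols))

# Remove the temporary file
unlink(demo_path)
```

A file with more columns needs one entry in `what` per column, in the order the columns appear in the file.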