
# **Data Input / Output**

- General structure of input/output functions
- Saving and loading files generated by R
- Data files from R packages
- Importing external files
- Importing data from relational databases
- Using SQL from R
- Data governance

In any data analysis and reporting project, the ability to seamlessly import and export data is essential. This chapter covers the foundational process of data input and output (I/O) in R.


In [1]:
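# Check for, install, and load the package-management package "pacman" if needed
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}
# Attach pacman unconditionally: requireNamespace() only loads the namespace,
# so p_load() would not be found on the search path without this call
library(pacman)

# p_load() installs any missing packages and then attaches them
p_load(forecast, tidyverse, readxl, fst)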


# **General structure of input/output functions**

In R, many I/O functions are intuitively named. Most of them include one of the following keywords:

- read or load, which indicates that the function imports objects into R
- write or save, which indicates that the function exports objects from R.
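
The naming pattern can be seen in a small round trip. The sketch below is only an illustration and assumes nothing beyond base R: it uses the built-in mtcars data set and a temporary file (the names tmp_csv and cars_back are made up for this example).

```r
# Throwaway path used only for this illustration
tmp_csv <- tempfile(fileext = ".csv")

# "write" family: export a data frame from R to disk
write.csv(head(mtcars), file = tmp_csv, row.names = FALSE)

# "read" family: import the file back into R
cars_back <- read.csv(tmp_csv)

# Inspect the dimensions of the imported copy
dim(cars_back)

# List functions on the search path whose names start with "read" or "write"
apropos("^(read|write)")
```

The same pairing shows up in other function families, such as saveRDS() and readRDS().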

When creating a folder, check whether it already exists first. Wrapping the creation in an if/else block means the folder is only created when it is missing, which reduces the chance of accidentally disturbing an existing folder.
- You can run the code block multiple times while keeping the existing folder and its contents intact.

Saving the location in an object, e.g. using data_path as the location for writing and reading files, saves significant typing in the following examples.

You can use any other location of your choice by changing the path or file arguments inside a function. You can also programmatically check the existence of a folder and create one if needed.

In [None]:
# Check what objects are available in the R global environment
ls()

# Clear the session / workspace
rm(list = ls())

# Check the current working directory
getwd()

# Check the existence of a specific folder
dir.exists("folder_name")

# Create a folder inside the project folder
dir.create("folder_name")

# Check if the folder exists, and create it if it doesn't
if (dir.exists(file.path(getwd(), "subfolder_name", "folder_name"))) {
  # If the folder exists, store its full path in data_path
  data_path <- file.path(getwd(), "subfolder_name", "folder_name")
} else {
  # If it does not exist, create it (including missing parent folders) and store its path
  dir.create(file.path(getwd(), "subfolder_name", "folder_name"),
             recursive = TRUE)
  data_path <- file.path(getwd(), "subfolder_name", "folder_name")
}
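
If the same check is needed for several folders, it can be wrapped in a small function. The sketch below is one possible way to do this, not part of base R: ensure_dir() and demo_path are hypothetical names, and the demonstration writes under tempdir() so that it does not touch the project folder.

```r
# Hypothetical helper: return `path`, creating the folder first if it is missing
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    # recursive = TRUE also creates any missing parent folders
    dir.create(path, recursive = TRUE)
  }
  path
}

# Demonstrated under tempdir() so nothing is written into the project folder
demo_path <- ensure_dir(file.path(tempdir(), "subfolder_name", "folder_name"))
dir.exists(demo_path)

# A second call does nothing to the existing folder and returns the same path
demo_path <- ensure_dir(demo_path)
```

Because the function returns the path, the result can be assigned directly to an object such as data_path.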

# **Saving and loading files generated by R**

The saveRDS() and readRDS() functions save and load a single R object, conventionally in a file with the .rds or .RDS extension.
- These functions are commonly used to save model objects and then load them later for further analysis.
- The saveRDS() function saves one object at a time; to store several objects in one file, wrap them in a list.

When multiple objects are loaded from the same RDS file as a list, you can access them in the standard way you extract elements from a list.

In [None]:
# Saving a single object
saveRDS(object_name, file = file.path(data_path, "object_name.rds"))

# Save multiple objects in one file by wrapping them in a list
saveRDS(list(object_name1, object_name2),
        file = file.path(data_path, "object_name12.rds"))

# Extract multiple objects that are loaded from the same RDS file
multiple_objects <- readRDS(file.path(data_path, "object_name12.rds"))
df1 <- multiple_objects[[1]]
df2 <- multiple_objects[[2]]
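
The placeholder names above can be swapped for real objects to try the round trip. The following sketch assumes only base R and its built-in mtcars and iris data sets. The file name two_objects.rds and the list names cars and flowers are made up for illustration. Naming the list elements is an optional variation that lets you extract them by name instead of by position.

```r
# Throwaway file for the illustration
rds_file <- file.path(tempdir(), "two_objects.rds")

# Bundle two built-in data sets in a named list and save them as one file
saveRDS(list(cars = mtcars, flowers = iris), file = rds_file)

# readRDS() returns the stored object, so assign it to any name you like
bundle <- readRDS(rds_file)

# Extract elements by name rather than by position
cars_df    <- bundle$cars
flowers_df <- bundle[["flowers"]]

# Compare the restored data frame with the original
all.equal(cars_df, mtcars)
```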

# **Data files from R packages**

Many R packages ship with data files. Those data files become available when the package is loaded in an R session.
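
As a brief illustration, the sketch below uses the datasets package, which is installed with R and attached by default. The object name available is made up for this example, and other packages may organise their data differently.

```r
# List the data sets bundled with a package
available <- data(package = "datasets")
head(available$results[, c("Item", "Title")])

# Data sets from attached packages can be used directly by name
head(mtcars)

# data() can also copy a named data set into the current environment
data("mtcars", package = "datasets")
```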