
# Import and Export
* Author: Johannes Maucher
* Last Update: 2017-08-09, some modifications by OK in 2019

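The **tidyverse** also provides functions for importing and exporting data, but this section uses the functions of Base R wherever possible. The tidyverse is loaded below because `glimpse()` is used to inspect the imported data.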


In [1]:
library(tidyverse)

-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.2.0     v purrr   0.3.2
v tibble  2.1.3     v dplyr   0.8.3
v tidyr   0.8.3     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()


## Import data from textfile

A text file can be read into a character vector with the `readLines()` function. Each element of the returned vector contains a single line of the text file.


In [2]:
textList <- readLines("../data/sampleTextfile.txt")
textList
cat("Number of rows in the file: ",length(textList))

Number of rows in the file:  3

The character vector can be combined into a single character string with the `paste()` function and its `collapse` argument:

In [3]:
text <- paste(textList, collapse=" ")
text
cat("Number of characters in the text: ", nchar(text))

Number of characters in the text:  245
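
The two steps above can be tried without the sample file. The following minimal sketch writes three made-up lines to a temporary file (the file and its content are assumptions for illustration, not the course data), reads them back with `readLines()`, and joins them with `paste()`.

```r
# Hypothetical three-line text, written to a temporary file
tmpFile <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), tmpFile)

# readLines() returns a character vector with one element per line
txtLines <- readLines(tmpFile)
is.character(txtLines)   # TRUE
length(txtLines)         # 3

# paste() with collapse joins all elements into one string
joined <- paste(txtLines, collapse = " ")
nchar(joined)            # 33
```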

## Import data from .csv

Read the contents of a .csv file into an R data frame with [`read.csv()`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html). Some of its arguments are:

* `file`: the path and name of the CSV file to read
* `header`: a logical value; if `TRUE`, the first row is used as column names
* `stringsAsFactors`: a logical value; if `TRUE`, character columns are converted to factors
* `sep`: the field separator character of the columns
* `dec`: the character used for the decimal point
* `col.names`: an optional vector of column names
* `row.names`: an optional vector of row names, or the number or name of the column that contains the row names
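
As a minimal sketch of how these arguments interact, the following example reads a small made-up file that uses `;` as separator and `,` as decimal mark. The file content and the name `demoData` are assumptions for illustration only.

```r
# Hypothetical CSV content with ";" as separator and "," as decimal mark
tmpCsv <- tempfile(fileext = ".csv")
writeLines(c("Country;Oil;Gas",
             "A;1,5;2,25",
             "B;3,0;4,75"), tmpCsv)

# row.names = 1 turns the first column (Country) into the row names
demoData <- read.csv(file = tmpCsv, header = TRUE, sep = ";", dec = ",",
                     row.names = 1, stringsAsFactors = FALSE)

row.names(demoData)   # "A" "B"
names(demoData)       # "Oil" "Gas"
sum(demoData$Oil)     # 4.5, the values were parsed as numbers
```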

In [4]:
energyData <- read.csv(file="../data/EnergyMixGeoClust.csv", 
                       header=TRUE, sep=",",row.names=1)
names(energyData)
row.names(energyData)

In [5]:
glimpse(energyData)

Observations: 65
Variables: 11
$ Country   <fct> US, Canada, Mexico, Argentina, Brazil, Chile, Colombia, E...
$ Oil       <dbl> 842.9, 97.0, 85.6, 22.3, 104.3, 15.4, 8.8, 9.9, 8.5, 27.4...
$ Gas       <dbl> 588.7, 85.2, 62.7, 38.8, 18.3, 3.0, 7.8, 0.4, 3.1, 26.8, ...
$ Coal      <dbl> 498.0, 26.5, 6.8, 1.1, 11.7, 4.1, 3.1, 0.0, 0.5, 0.0, 2.3...
$ Nuclear   <dbl> 190.2, 20.3, 2.2, 1.8, 2.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
$ Hydro     <dbl> 62.2, 90.2, 6.0, 9.2, 88.5, 5.6, 9.3, 2.1, 4.5, 19.5, 8.3...
$ Total2009 <dbl> 2182.0, 319.2, 163.2, 73.3, 225.7, 28.1, 29.0, 12.4, 16.6...
$ CO2Emm    <dbl> 5941.9, 602.7, 436.8, 164.2, 409.4, 70.3, 57.9, 31.3, 35....
$ Lat       <dbl> 37.090240, 56.130366, 23.634501, -38.416097, -14.235004, ...
$ Long      <dbl> -95.712891, -106.346771, -102.552784, -63.616672, 

Summary statistics of the data frame:

In [6]:
summary(energyData)

       Country        Oil              Gas              Coal        
 Algeria   : 1   Min.   :  1.00   Min.   :  0.00   Min.   :   0.00  
 Argentina : 1   1st Qu.:  9.70   1st Qu.:  4.00   1st Qu.:   0.70  
 Australia : 1   Median : 20.20   Median : 17.80   Median :   4.10  
 Austria   : 1   Mean   : 55.66   Mean   : 39.15   Mean   :  49.45  
 Azerbaijan: 1   3rd Qu.: 62.00   3rd Qu.: 38.40   3rd Qu.:  26.50  
 Bangladesh: 1   Max.   :842.90   Max.   :588.70   Max.   :1537.40  
 (Other)   :59                                                      
    Nuclear            Hydro          Total2009          CO2Emm      
 Min.   :  0.000   Min.   :  0.00   Min.   :   3.9   Min.   :   3.5  
 1st Qu.:  0.000   1st Qu.:  0.30   1st Qu.:  24.2   1st Qu.:  57.9  
 Median :  0.000   Median :  2.10   Median :  60.8   Median : 148.0  
 Mean   :  9.365   Mean   : 10.35   Mean   : 164.0   Mean   : 458.6  
 3rd Qu.:  5.400   3rd Qu.:  8.10   3rd Qu.: 128.2   3rd Qu.: 388.5  
 Max.   :190.200   Max.   :1

## Import data from Excel spreadsheet

In [9]:
#install.packages("XLConnect")
library(XLConnect)

objects("package:XLConnect")

In [10]:
hrvData <- readWorksheetFromFile("../data/spikeeHRV.xls", 
                            sheet=1)  # Options: startRow = 4, endCol = 2

In [11]:
glimpse(hrvData)

Observations: 602
Variables: 15
$ ID      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ...
$ Tag     <dttm> 2016-10-26 06:05:34, 2016-10-25 06:12:35, 2016-10-24 06:14...
$ Start   <dttm> 2016-10-26 06:05:34, 2016-10-25 06:12:35, 2016-10-24 06:14...
$ Dauer   <chr> "0:10:04", "0:10:02", "0:09:49", "0:10:33", "0:10:04", "0:1...
$ Avg..HR <dbl> 44.39380, 50.24032, 42.77373, 45.11753, 51.20371, 43.30317,...
$ RMSSD   <dbl> 102.66, 95.34, 159.54, 113.26, 81.84, 168.13, 206.20, 161.4...
$ SDNN    <dbl> 176.82, 120.90, 190.57, 127.25, 117.12, 157.29, 192.39, 155...
$ pNN50   <dbl> 47.86, 54.49, 75.96, 67.60, 45.97, 76.20, 78.66, 70.86, 76....
$ RRmin   <dbl> 933, 935, 955, 1028, 919, 999, 957, 934, 729, 913, 849, 910...
$ RRmean  <dbl> 1351.54, 1194.26, 1402.73, 1329.86, 1171.79, 1385.5

## Import sample data from R datasets
R already ships with a number of sample datasets, which can be accessed via the *datasets* package. The contained datasets are described, for example, in [http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html](http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html)
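
Because the *datasets* package is attached by default, its data frames can be used directly by name. The following short sketch inspects `USArrests` and lists a few dataset titles; the choice of functions (`dim()`, `head()`, `data()`) is just one possible way to explore the package.

```r
# USArrests is available without any import step
dim(USArrests)           # 50 rows (states) and 4 columns
names(USArrests)         # "Murder" "Assault" "UrbanPop" "Rape"
head(USArrests, n = 3)   # first three rows

# data() returns an index of the datasets in a package;
# the "results" matrix holds the item names and titles
dsIndex <- data(package = "datasets")$results
head(dsIndex[, c("Item", "Title")])
```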

In [12]:
library(help="datasets")

In [13]:
?USArrests

In [14]:
a <- USArrests

In [15]:
print(a)

               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
California        9.0     276       91 40.6
Colorado          7.9     204       78 38.7
Connecticut       3.3     110       77 11.1
Delaware          5.9     238       72 15.8
Florida          15.4     335       80 31.9
Georgia          17.4     211       60 25.8
Hawaii            5.3      46       83 20.2
Idaho             2.6     120       54 14.2
Illinois         10.4     249       83 24.0
Indiana           7.2     113       65 21.0
Iowa              2.2      56       57 11.3
Kansas            6.0     115       66 18.0
Kentucky          9.7     109       52 16.3
Louisiana        15.4     249       66 22.2
Maine             2.1      83       51  7.8
Maryland         11.3     300       67 27.8
Massachusetts     4.4     149       85 16.3
Michigan         12.1     255   

## Import data from JSON file

In [16]:
#install.packages("jsonlite")
library("jsonlite")

#json.file <- "http://api.worldbank.org/country?per_page=10&region=OED&lendingtype=LNX&format=json"
json.file <- "../data/exampleFile.json"

df.json <- fromJSON(paste(readLines(json.file), collapse=""))

#as data frame
df.json <- as.data.frame(df.json)

glimpse(df.json)


Attaching package: 'jsonlite'

The following object is masked from 'package:purrr':

    flatten

"incomplete final line found on '../data/exampleFile.json'"

Observations: 8
Variables: 5
$ ID        <fct> 1, 2, 3, 4, 5, 6, 7, 8
$ Name      <fct> Otto, Johannes, Oliver, Simone, Beate, Joachim, ROland, E...
$ Salary    <fct> 723.3, 215.2, 213, 349, 243.25, 379, 642.5, 424.3
$ StartDate <fct> 1/1/2014, 3/22/2011, 10/13/2012, 2/10/2017, 3/23/2013, 5/...
$ Dept      <fct> HR, Support, IT, IT, Organisation, IT, Operations, Finance


## Export data to textfile

In [17]:
textVector <- c("This is my first row", "Second Row", "And here is the last row")

write(textVector, file="../data/testoutput.txt")
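
To check what `write()` actually produces, the file can be read back with `readLines()`. The sketch below uses a temporary file instead of the `../data` path (an assumption made so that it runs anywhere) and also shows the `append` argument of `write()`.

```r
textVector <- c("This is my first row", "Second Row", "And here is the last row")

# Write to a temporary file; each element of a character vector becomes one line
tmpOut <- tempfile(fileext = ".txt")
write(textVector, file = tmpOut)

# Reading the file back returns the original vector
reread <- readLines(tmpOut)
identical(reread, textVector)   # TRUE

# append = TRUE adds lines instead of overwriting the file
write("One more row", file = tmpOut, append = TRUE)
length(readLines(tmpOut))       # 4
```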

## Export data frame to csv

Write the contents of an R data frame into a .csv file with [`write.csv()`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/write.table.html) or with the more general `write.table()`, which is used in the cell below. Some of the arguments are:

* `x`: the R data frame or matrix to write
* `file`: the path and name of the CSV file to write
* `sep`: the field separator character of the columns
* `dec`: the character used for the decimal point
* `col.names`: an optional vector of column names, or a logical value that controls whether the column names are exported as header
* `row.names`: an optional vector of row names, or a logical value that controls whether the row names are exported

Note that `write.csv()` sets `sep`, `dec` and `col.names` itself and ignores attempts to change them with a warning; use `write.table()` if these need to be controlled.

In [18]:
write.table(energyData, "../data/energyData.csv", sep = ",")
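
The difference between the two functions shows up in the header line. The following sketch uses a small made-up data frame (`demoData` and the temporary files are assumptions for illustration) to compare both and to read the result back with `read.csv()`.

```r
# Hypothetical data frame with row names
demoData <- data.frame(Oil = c(1.5, 3.0), Gas = c(2.25, 4.75),
                       row.names = c("A", "B"))

# write.csv() adds an empty header field for the row-name column
tmpCsv <- tempfile(fileext = ".csv")
write.csv(demoData, file = tmpCsv)
csvHeader <- readLines(tmpCsv)[1]

# write.table() with sep = "," writes a header without that field
tmpTab <- tempfile(fileext = ".csv")
write.table(demoData, file = tmpTab, sep = ",")
tabHeader <- readLines(tmpTab)[1]

cat(csvHeader, tabHeader, sep = "\n")

# Round trip: the first column of the file becomes the row names again
reread <- read.csv(tmpCsv, row.names = 1)
all.equal(reread, demoData)   # TRUE
```

Files written by `write.table()` with the default header can still be read by `read.csv()`, but other tools may expect one header field per column, which is what `write.csv()` (or `col.names = NA` in `write.table()`) produces.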

## Export data frame to Excel spreadsheet

In [19]:
library(XLConnect)

In [20]:
writeWorksheetToFile("../data/energyDataTmp.xlsx", energyData, sheet = "Data")

## Export data to JSON file

In [21]:
library("jsonlite")

In [22]:
#Convert to JSON structure with \n at the end of a line
json.file <- toJSON(df.json, collapse = "\n")

json.file

#Write JSON structure as normal text file
write(json.file,file="../data/exampleFileNew.json")

{"ID":"1","Name":"Otto","Salary":"723.3","StartDate":"1/1/2014","Dept":"HR"} {"ID":"2","Name":"Johannes","Salary":"215.2","StartDate":"3/22/2011","Dept":"Support"} {"ID":"3","Name":"Oliver","Salary":"213","StartDate":"10/13/2012","Dept":"IT"} {"ID":"4","Name":"Simone","Salary":"349","StartDate":"2/10/2017","Dept":"IT"} {"ID":"5","Name":"Beate","Salary":"243.25","StartDate":"3/23/2013","Dept":"Organisation"} {"ID":"6","Name":"Joachim","Salary":"379","StartDate":"5/30/2018","Dept":"IT"} {"ID":"7","Name":"ROland","Salary":"642.5","StartDate":"7/27/2018","Dept":"Operations"} {"ID":"8","Name":"Everest","Salary":"424.3","StartDate":"3/13/2014","Dept":"Finance"} 
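
As an alternative to combining `toJSON()` and `write()`, jsonlite also offers `write_json()`, and `fromJSON()` accepts a file path directly. The following is a minimal round-trip sketch, assuming the jsonlite package is installed; the data frame `demoData` and the temporary file are made up for illustration and are not part of the original notebook.

```r
library(jsonlite)

# Hypothetical data frame with two records
demoData <- data.frame(ID = 1:2, Name = c("Otto", "Johannes"),
                       stringsAsFactors = FALSE)
tmpJson <- tempfile(fileext = ".json")

# write_json() converts the data frame to JSON and writes it in one step
write_json(demoData, tmpJson, pretty = TRUE)

# fromJSON() reads the file and simplifies the array of records to a data frame
reread <- fromJSON(tmpJson)
is.data.frame(reread)   # TRUE
reread$Name             # "Otto" "Johannes"
```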

## Further import/export packages

For example:

* [xml2](https://github.com/r-lib/xml2) to read and write XML.

* [httr](https://github.com/r-lib/httr) for web APIs.

* [rvest](https://github.com/tidyverse/rvest) for scraping web resources.

* [DBI](https://github.com/r-dbi/DBI) to import and export from and into relational databases.