# Tidyverse-readr

In [1]:
library(tidyverse)

-- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.1 --

v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.6     v dplyr   1.0.7
v tidyr   1.2.0     v stringr 1.4.0
v readr   2.1.2     v forcats 0.5.1

-- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()



## readr  

Most of readr’s functions turn flat files into data frames:
- read_csv() reads comma-delimited files.
- read_csv2() reads semicolon-separated files (common where the comma is used as the decimal mark).
- read_tsv() reads tab-delimited files.
- read_delim() reads files with any delimiter.
- read_fwf() reads fixed-width files. Specify fields either by their widths with fwf_widths() or by their positions with fwf_positions().
- read_table() reads a common variation of fixed-width files in which columns are separated by white space.
- read_log() reads Apache-style log files.
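
A minimal sketch of three of the readers listed above, using inline strings instead of files. The sample data and the column names `code` and `num` are made up for illustration; `I()` marks a string as literal data in readr 2.x.

```r
library(readr)

# Semicolon-separated input; read_csv2() treats "," as the decimal mark.
semi <- read_csv2(I("a;b\n1,5;2\n3,0;4"), show_col_types = FALSE)

# Tab-separated input.
tabs <- read_tsv(I("a\tb\n1\t2"), show_col_types = FALSE)

# Fixed-width input: two characters for `code`, three for `num`.
# Both column names are hypothetical.
fixed <- read_fwf(
  I("AB123\nCD456"),
  col_positions = fwf_widths(c(2, 3), c("code", "num")),
  show_col_types = FALSE
)

semi
tabs
fixed
```

Each call returns a tibble, so the three results can be inspected and combined in the same way regardless of the original file layout.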

### read_*()

In [19]:
## default methods

read_csv("temp.csv")

read_csv("a,b,c
1,2,3
4,5,6")

read_delim("x|y\n1|'a,b'", delim = "|")

Rows: 3 Columns: 3

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
chr (1): Name
dbl (2): SLNo, Val


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



SLNo,Name,Val
<dbl>,<chr>,<dbl>
1,Ram,2.4
2,Sai,23.5
3,Krishna,6.8


Rows: 2 Columns: 3

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
dbl (3): a, b, c


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



a,b,c
<dbl>,<dbl>,<dbl>
1,2,3
4,5,6


Rows: 1 Columns: 2

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: "|"
chr (1): y
dbl (1): x


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



x,y
<dbl>,<chr>
1,"'a,b'"


In [11]:
## skip lines

read_csv("A, B, C
a,b,c
x,y,z
1,2,3", skip = 2)

## skip commented lines
read_csv("# A comment I want to skip
x,y,z
1,2,3", comment = "#")

Rows: 1 Columns: 3

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
dbl (3): x, y, z


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



x,y,z
<dbl>,<dbl>,<dbl>
1,2,3


Rows: 1 Columns: 3

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
dbl (3): x, y, z


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



x,y,z
<dbl>,<dbl>,<dbl>
1,2,3


In [15]:
## specify column names
read_csv("1,2,3\n4,5,6", col_names = FALSE)

## specify NA value
read_csv("a,b,c\n1,2,.\nx,y,z", na = ".")

Rows: 2 Columns: 3

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
dbl (3): X1, X2, X3


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



X1,X2,X3
<dbl>,<dbl>,<dbl>
1,2,3
4,5,6


Rows: 2 Columns: 3

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
chr (3): a, b, c


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



a,b,c
<chr>,<chr>,<chr>
1,2,
x,y,z
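
Besides FALSE, col_names also accepts a character vector of names, and na accepts several strings at once. A short sketch on inline data; the names `x`, `y`, `z` and the missing-value markers `.` and `-` are chosen only for illustration.

```r
library(readr)

# Supply column names explicitly instead of the generated X1, X2, X3.
named <- read_csv(
  I("1,2,3\n4,5,6"),
  col_names = c("x", "y", "z"),
  show_col_types = FALSE
)
names(named)

# Treat both "." and "-" as missing values.
with_na <- read_csv(
  I("a,b\n1,.\n-,4"),
  na = c(".", "-"),
  show_col_types = FALSE
)
with_na
```

Because the markers are turned into NA before type guessing, both columns can still be read as numbers rather than falling back to character.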


Compared with base R’s read.csv(), the readr functions have several advantages:
- They are typically much faster (~10x) than their base equivalents.
- Long-running jobs have a progress bar, so you can see what’s happening.
- They produce **tibbles**.
- They don’t convert character vectors to factors (the default in base R before version 4.0.0), use row names, or munge the column names.
- They are more reproducible. Base R functions inherit some behavior from your operating system and environment variables, so import code that works on your computer might not work on someone else’s.
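
Two of the differences above can be checked directly. This is a small sketch on made-up inline data; the column name `my col` is hypothetical and picked because it is not a syntactic R name.

```r
library(readr)

csv_text <- "my col,value\na,1\nb,2"

# Base R rewrites non-syntactic column names by default (check.names = TRUE).
base_df <- read.csv(text = csv_text)
names(base_df)

# readr keeps the names as written and returns a tibble.
readr_df <- read_csv(I(csv_text), show_col_types = FALSE)
names(readr_df)
class(readr_df)
```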

### parse_*()

In [29]:
str(parse_logical(c("TRUE", "FALSE", "NA")))
parse_integer(c("1", "2", "3"))
parse_integer(c("1", "231", ".", "456"), na = ".")
parse_double("1,23", locale = locale(decimal_mark = ","))

problems(parse_integer(c("123", "345", "abc", "123.45")))

 logi [1:3] TRUE FALSE NA


row,col,expected,actual
<int>,<int>,<chr>,<chr>
3,,no trailing characters,abc
4,,no trailing characters,123.45
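
When parsing fails, the parse_*() functions warn rather than stop: the failed elements become NA and the details are attached to the result. A minimal sketch using the same input as above, with the warning suppressed only to keep the output short:

```r
library(readr)

x <- suppressWarnings(parse_integer(c("123", "345", "abc", "123.45")))

# The two values that are not integers come back as NA.
x

# problems() returns a tibble with one row per failure.
failures <- problems(x)
failures$row
```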


In [49]:
parse_number("20%")
## under the default locale the comma is read as a grouping mark, not a decimal mark
parse_number("It cost $123,45")
x <- parse_number("1 234.56", locale = locale(decimal_mark = ".", grouping_mark = " "))

In [36]:
charToRaw("Hadley")

[1] 48 61 64 6c 65 79

In [43]:
parse_character("El Ni\xf1o was particularly bad this year", locale = locale(encoding = "ISO-8859-9"))
parse_character("\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd", locale = locale(encoding = "Shift-JIS"))

guess_encoding(charToRaw("El Ni\xf1o was particularly bad this year"))
guess_encoding(charToRaw("\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"))

encoding,confidence
<chr>,<dbl>
ISO-8859-1,0.46
ISO-8859-9,0.23


encoding,confidence
<chr>,<dbl>
KOI8-R,0.42
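
The reason the encoding matters is visible in the raw bytes. The sketch below, which assumes the text really is Latin-1, compares the bytes before and after parse_character() converts the string to UTF-8.

```r
library(readr)

# In Latin-1 the byte f1 is the single character "ñ".
latin1_bytes <- "El Ni\xf1o"
charToRaw(latin1_bytes)

# readr converts to UTF-8, where the same character takes two bytes (c3 b1).
utf8_text <- parse_character(latin1_bytes, locale = locale(encoding = "Latin1"))
charToRaw(utf8_text)
```

If the wrong encoding is declared, the bytes are still converted, but to the wrong characters, which is why guess_encoding() reports a confidence rather than a single answer.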


In [51]:
fruit <- c("apple", "banana")
parse_factor(c("apple", "banana", "banana"), levels = fruit)
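
parse_factor() also flags values that are not among the given levels. A short sketch with a deliberately misspelled value ("bananana", made up for illustration); the warning is suppressed only to keep the output short.

```r
library(readr)

fruit <- c("apple", "banana")
f <- suppressWarnings(
  parse_factor(c("apple", "banana", "bananana"), levels = fruit)
)

# The unexpected value becomes NA; the levels are unchanged.
is.na(f)
levels(f)

# The failure is recorded and can be inspected.
problems(f)
```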

In [57]:
parse_date(c("2010-01-01", "1979-10-14"))
parse_date("01/02/15", "%m/%d/%y")

parse_time("01:10 pm")
parse_time("20:10:01")
parse_datetime("2010-10-01T2010")
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))


13:10:00

20:10:01

[1] "2010-10-01 20:10:00 UTC"
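
The format string decides how an ambiguous date is read. As a small illustration, the same input from the cell above is parsed three ways:

```r
library(readr)

d_mdy <- parse_date("01/02/15", "%m/%d/%y")  # month/day/year
d_dmy <- parse_date("01/02/15", "%d/%m/%y")  # day/month/year
d_ymd <- parse_date("01/02/15", "%y/%m/%d")  # year/month/day

format(c(d_mdy, d_dmy, d_ymd))
# "2015-01-02" "2015-02-01" "2001-02-15"
```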

In [58]:
## Guessing
guess_parser("2010-10-01")
guess_parser("15:01")
guess_parser(c("TRUE", "FALSE"))
guess_parser(c("1", "5", "9"))
guess_parser(c("12,352,561"))
str(parse_guess("2010-10-10"))

 Date[1:1], format: "2010-10-10"


In [65]:
temp <- read_csv(
  readr_example("challenge.csv"),
  guess_max = 1001  ## default is 1000
)

challenge <- read_csv(
  readr_example("challenge.csv"),
  col_types = cols(
    x = col_double(),
    y = col_date()
  )
)

challenge2 <- read_csv(
  readr_example("challenge.csv"),
  col_types = cols(.default = col_character())
)

Rows: 2000 Columns: 2

-- Column specification ------------------------------------------------------------------------------------------------
Delimiter: ","
dbl  (1): x
date (1): y


i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.



In [66]:
write_csv(challenge, "temp.csv")
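
A CSV file stores only text, so the column types are not saved with it. The sketch below writes the data to a temporary file (the path is an assumption, used to avoid touching the working directory) and reads it back. It also uses the compact string form of col_types, where "d" stands for double and "D" for date.

```r
library(readr)

# Compact column specification: "d" = double, "D" = date.
challenge <- read_csv(readr_example("challenge.csv"), col_types = "dD")

# Write to a temporary file instead of the working directory.
tmp <- tempfile(fileext = ".csv")
write_csv(challenge, tmp)

# Reading back without col_types means the types are guessed again.
round_trip <- read_csv(tmp, show_col_types = FALSE)

# spec() shows which types were chosen on the way back in.
spec(round_trip)
```

To get the same types back regardless of guessing, pass the same col_types to the second read_csv() call.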