In [2]:
library(tidyverse)

# Read a delimited file (including csv & tsv) into a tibble

**`read_csv()`** and **`read_tsv()`** are special cases of the general **`read_delim()`**. They're useful for reading the most common types of flat file data, comma separated values and tab separated values, respectively. **`read_csv2()`** uses `;` as the field separator and `,` as the decimal point, a convention common in some European countries.

```r
read_delim(
  file,
  delim,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  skip_empty_rows = TRUE
)

read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  skip_empty_rows = TRUE
)

read_csv2(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  skip_empty_rows = TRUE
)

read_tsv(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  skip_empty_rows = TRUE
)
```

# Examples

<hr>

**input sources**

Read from a path

In [3]:
mtcars_path <- readr_example('mtcars.csv')
mtcars_path

In [4]:
read_csv(mtcars_path)


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [5]:
# Including remote paths
read_csv("https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv")


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1
14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4


In [6]:
# Or directly from a string (must contain a newline)
read_csv("x,y\n1,2\n3,4")

x,y
1,2
3,4


<hr>

**Column types**

By default, readr guesses the column types by looking at the first 1000 rows.
You can override with a compact specification:

In [8]:
read_csv("x,y\n1,2\n3,4", col_types = 'dc') %>% glimpse()

Rows: 2
Columns: 2
$ x <dbl> 1, 3
$ y <chr> "2", "4"


Or with a list of column types:

In [10]:
read_csv("x,y\n1,2\n3,4", col_types = list(col_double(), col_character())) %>% glimpse()

Rows: 2
Columns: 2
$ x <dbl> 1, 3
$ y <chr> "2", "4"


If there are parsing problems, you get a warning, and you can extract
more details with **`problems()`**:

In [12]:
df <- read_csv("x\n1\n2\nb", col_types = list(col_double()))

"1 parsing failure.
row col expected actual         file
  3   x a double      b literal data
"

In [13]:
problems(df)

row,col,expected,actual,file
3,x,a double,b,literal data


<hr>

**File types**

In [17]:
read_csv("a,b\n1.0,2.0")

a,b
1,2


In [19]:
# using ; to separate fields and using , for decimal point
read_csv2("a;b\n1,2;2,5")

i Using ',' as decimal and '.' as grouping mark. Use `read_delim()` for more control.


a,b
1.2,2.5


In [20]:
read_tsv("a\tb\n1.0\t2.0")

a,b
1,2


In [22]:

read_delim("a|b\n1.0|2.0", delim = '|')

a,b
1,2


# Arguments

### `file`	

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in `.gz`, `.bz2`, `.xz`, or `.zip` will be automatically uncompressed. Files starting with `http://`, `https://`, `ftp://`, or `ftps://` will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as data (instead of a path), it must contain at least one new line or be a vector of length greater than 1.

Using a value of **`clipboard()`** will read from the system clipboard.

<hr>

In [23]:
# a path to a file
path <- readr_example('mtcars.csv')

read_csv(path) %>% head()


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


In [24]:
# a remote URL (downloaded automatically)
read_csv("https://github.com/tidyverse/readr/raw/master/inst/extdata/mtcars.csv") %>% head()


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


In [27]:
# compressed files (bz2 and zip)
read_csv(readr_example('mtcars.csv.bz2')) %>% head()

read_csv(readr_example('mtcars.csv.zip')) %>% head()


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1



-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


In [30]:
# string literal

read_csv("a,b\n1.0,2.0")

a,b
1,2


### `delim`

Single character used to separate fields within a record.

<hr>

In [32]:
# comma separated values (csv)

read_delim(path, delim = ',') %>% head()
# equivalent
# read_csv(path)


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)



mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


In [35]:
# tab separated value
read_delim("a\tb\n1.0\t2.0", delim = '\t') 
# equivalent
# read_tsv("a\tb\n1.0\t2.0")

a,b
1,2


### `quote`	

Single character used to quote strings.

<hr>

Sometimes strings in a CSV file contain commas. To prevent them from causing problems, they need to be surrounded by a quoting character, like `"` or `'`.

In [5]:
read_csv("x,y\n1,'a,b'", quote = "'")

x,y
1,"a,b"


In [39]:
# default quote character is ", so the single quotes are kept as part of the values
read_csv("name,clan\n'VN Pikachu', 'VN Champions'", quote = '\"')

name,clan
'VN Pikachu','VN Champions'


In [38]:
read_csv("name,clan\n'VN Pikachu', 'VN Champions'", quote = '\'')

name,clan
VN Pikachu,VN Champions


### `escape_backslash`	

Does the file use backslashes to escape special characters? This is more general than `escape_double` as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like `\n`.
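As a minimal sketch of backslash escaping, the literal data below (invented for illustration) quotes a field and escapes the inner quotes with backslashes. `escape_double` is switched off here only so that backslashes are the single escape mechanism in play; that choice is an assumption of this example, not a requirement.

```r
library(readr)

# Hypothetical literal data: the y field contains quotes escaped as \"
csv_text <- 'x,y\n1,"a \\"quoted\\" word"'
cat(csv_text)

escaped <- read_delim(
  csv_text,
  delim = ",",
  escape_backslash = TRUE,
  escape_double = FALSE,
  col_types = "ic"  # x as integer, y as character
)

# Inspect the parsed value of the escaped field
escaped$y
```

Printing `escaped$y` is a quick way to check how the escaped quotes were interpreted by the installed readr version.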

### `escape_double`	

Does the file escape quotes by doubling them? That is, if this option is `TRUE`, the value `""""` represents a single quote, `"`.
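A small sketch of doubled-quote escaping, using invented literal data: inside a quoted field, each `""` pair should be read as one literal quote character. `escape_double = TRUE` is already the default of `read_delim()`; it is spelled out here only for clarity.

```r
library(readr)

# Hypothetical literal data: the y field contains doubled quotes
csv_text <- 'x,y\n1,"a ""quoted"" word"'
cat(csv_text)

doubled <- read_delim(
  csv_text,
  delim = ",",
  escape_double = TRUE,
  col_types = "ic"  # x as integer, y as character
)

# The doubled quotes collapse to single quotes in the parsed value
doubled$y
```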

### `col_names`


Either `TRUE`, `FALSE` or a character vector of column names.

<hr>

If `TRUE`, the first row of the input will be used as the column names and will not be included in the data frame.

In [41]:
file_data <- 'name,clan\nVN Pikachu,VN Champions'

In [42]:
read_csv(file_data, col_names = TRUE)

name,clan
VN Pikachu,VN Champions


If FALSE, column names will be generated automatically: X1, X2, X3 etc.

In [43]:
read_csv(file_data, col_names = FALSE)

X1,X2
name,clan
VN Pikachu,VN Champions


If `col_names` is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame.

In [44]:
read_csv(file_data, col_names = c('First column', 'Second column'))

First column,Second column
name,clan
VN Pikachu,VN Champions


Missing (`NA`) column names will generate a warning, and be filled in with dummy names X1, X2 etc. 

In [45]:
read_csv('name,\nVN Pikachu,VN Champions')

"Missing column names filled in: 'X2' [2]"

name,X2
VN Pikachu,VN Champions


Duplicate column names will generate a warning and be made unique with a numeric suffix.

In [46]:
read_csv('name,name\nVN Pikachu,VN Champions')

"Duplicated column names deduplicated: 'name' => 'name_1' [2]"

name,name_1
VN Pikachu,VN Champions


### `col_types`

One of `NULL`, a `cols()` specification, or a string.

<hr>

If `NULL`, all column types will be imputed from the first 1000 rows of the input. This is convenient (and fast), but not robust. If the imputation fails, you'll need to supply the correct types yourself.
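One way to supply the types yourself is sketched here with the `challenge.csv` file bundled with readr. It assumes that `x` is numeric and that `y` holds ISO 8601 dates, which is what the parsing failures and the `guess_max` output elsewhere in this notebook suggest; `challenge` is just an illustrative variable name.

```r
library(readr)

# Declare both column types explicitly instead of relying on guessing
challenge <- read_csv(
  readr_example("challenge.csv"),
  col_types = cols(
    x = col_double(),
    y = col_date()
  )
)

# With explicit types there should be no parsing problems left to report
nrow(problems(challenge))

# The dates sit towards the end of the file
tail(challenge)
```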

In [48]:
read_csv(readr_example('challenge.csv'))


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  x = col_double(),
  y = col_logical()
)

"1000 parsing failures.
 row col           expected     actual                                                                file
1001   y 1/0/T/F/TRUE/FALSE 2015-01-16 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1002   y 1/0/T/F/TRUE/FALSE 2018-05-18 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1003   y 1/0/T/F/TRUE/FALSE 2015-09-05 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1004   y 1/0/T/F/TRUE/FALSE 2012-11-28 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1005   y 1/0/T/F/TRUE/FALSE 2020-01-13 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
.... ... .................. .......... ...................................................................
See problems(...) for more details.
"

x,y
404,
4172,
3004,
787,
37,
2332,
2489,
1449,
3665,
3863,


If it is a column specification created by **`cols()`**, it must contain one column specification for each column. If you only want to read a subset of the columns, use **`cols_only()`**.
Alternatively, you can use a compact string representation where each character represents one column:

* c = character

* i = integer

* n = number

* d = double

* l = logical

* f = factor

* D = date

* T = date time

* t = time

* ? = guess

* _ or - = skip
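The three styles described above can be compared side by side. This is a minimal sketch on invented literal data (`csv_text` and the column names are hypothetical): a full `cols()` specification, a `cols_only()` subset, and the compact string form that skips the middle column.

```r
library(readr)

# Hypothetical literal data with three columns
csv_text <- "id,name,score\n1,a,9.5\n2,b,7.25"

# Full specification: one entry per column
full <- read_csv(
  csv_text,
  col_types = cols(
    id = col_integer(),
    name = col_character(),
    score = col_double()
  )
)

# Only the listed columns are read; `name` is dropped
subset_only <- read_csv(
  csv_text,
  col_types = cols_only(id = col_integer(), score = col_double())
)

# Compact string: integer, skip, double
compact <- read_csv(csv_text, col_types = "i_d")

names(full)
names(subset_only)
names(compact)
```

Both `subset_only` and `compact` should end up with just the `id` and `score` columns.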

By default, reading a file without a column specification will print a message showing the column types readr guessed. To remove this message, use `col_types = cols()`.

See <b><a href = '../Introduction.ipynb'>this notebook</a></b> for details and examples.

In [50]:
# remove `spec` message
read_csv(mtcars_path, col_types = cols()) %>% head()

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
21.0,6,160,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
18.1,6,225,105,2.76,3.46,20.22,1,0,3,1


### `locale`

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use `locale()` to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

<hr>

<b><a href = '../locales.ipynb'>Notebook detail and examples</a></b>
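As a brief sketch of a custom locale, the invented data below uses `;` between fields and `,` as the decimal mark. Passing `locale(decimal_mark = ",")` to `read_delim()` lets the numbers parse as doubles; the variable names are hypothetical.

```r
library(readr)

# Hypothetical data using ';' between fields and ',' as the decimal mark
euro_text <- "a;b\n1,5;2,5"

euro <- read_delim(
  euro_text,
  delim = ";",
  locale = locale(decimal_mark = ","),
  col_types = "dd"  # both columns as doubles
)

euro
```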

### `na`	

Character vector of strings to interpret as missing values. Set this option to `character()` to indicate no missing values.

<hr>

In [57]:
file_data <- 'name,clan\nVN Pikachu,missing\nETOGRUL,NA\nunknown,King Allool'

read_csv(file_data)

name,clan
VN Pikachu,missing
ETOGRUL,
unknown,King Allool


In [59]:
# specify 'NA', '', 'missing', 'unknown' as missing value

read_csv(file_data, na = c('', 'NA', 'missing', 'unknown'))

name,clan
VN Pikachu,
ETOGRUL,
,King Allool


### `quoted_na`	

Should missing values inside quotes be treated as missing values (the default) or as strings?

### `comment`	

A string used to identify comments. Any text after the comment characters will be silently ignored.

In [68]:
read_csv('name,clan\nVN Pikachu,VN Champions#ignore this part...', comment = '#')

name,clan
VN Pikachu,VN Champions


### `trim_ws`	

Should leading and trailing whitespace be trimmed from each field before parsing it?

<hr>

In [85]:
# default trim_ws = TRUE in read_csv

read_csv(' name ,   clan \n VN Pikachu,  VN Champions') %>% colnames()

In [84]:
read_csv(' name ,   clan \n VN Pikachu,  VN Champions', trim_ws = FALSE) %>% colnames()

### `skip`

Number of lines to skip before reading data.

<hr>

In [90]:
file_data <- 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e'
cat(file_data)

id,name
1,a
2,b
3,c
4,d
5,e

In [87]:
read_csv(file_data)

id,name
1,a
2,b
3,c
4,d
5,e


In [93]:
# skip the first 3 lines of the text (the header and two records), then parse
# If we want column names, we have to specify them manually
read_csv(file_data, skip = 3, col_names = c('id', 'name'))

id,name
3,c
4,d
5,e


### `n_max`

Maximum number of records to read.
<hr>

In [94]:
# read 2 records
read_csv(file_data, n_max = 2)

id,name
1,a
2,b


### `guess_max`

Maximum number of records to use for guessing column types.
<hr>

readr uses at most `guess_max` records to guess the type of each column. The default is 1000.

In [96]:
read_csv(readr_example('challenge.csv'), guess_max = 1000) # default


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  x = col_double(),
  y = col_logical()
)

"1000 parsing failures.
 row col           expected     actual                                                                file
1001   y 1/0/T/F/TRUE/FALSE 2015-01-16 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1002   y 1/0/T/F/TRUE/FALSE 2018-05-18 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1003   y 1/0/T/F/TRUE/FALSE 2015-09-05 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1004   y 1/0/T/F/TRUE/FALSE 2012-11-28 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
1005   y 1/0/T/F/TRUE/FALSE 2020-01-13 'C:/Users/dell/Anaconda3/Lib/R/library/readr/extdata/challenge.csv'
.... ... .................. .......... ...................................................................
See problems(...) for more details.
"

x,y
404,
4172,
3004,
787,
37,
2332,
2489,
1449,
3665,
3863,


In [99]:
# using 1500 values to guess the type 
read_csv(readr_example('challenge.csv'), guess_max = 1500) %>% head()


-- Column specification ------------------------------------------------------------------------------------------------
cols(
  x = col_double(),
  y = col_date(format = "")
)



x,y
404,
4172,
3004,
787,
37,
2332,


### `progress`

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The display is updated every 50,000 values and will only display if estimated reading time is 5 seconds or more. The automatic progress bar can be disabled by setting option `readr.show_progress` to FALSE.
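The option mentioned above can be set for a session and restored afterwards. This is a minimal sketch; `old_opts` and `progress_enabled` are illustrative names, and `show_progress()` is the helper that appears as the default of the `progress` argument in the usage section.

```r
library(readr)

# Turn the automatic progress bar off, remembering the previous setting
old_opts <- options(readr.show_progress = FALSE)

# show_progress() supplies the default of the `progress` argument
progress_enabled <- show_progress()
progress_enabled

df <- read_csv("x,y\n1,2\n3,4", col_types = "dd")

# Restore whatever was set before
options(old_opts)
```

Saving and restoring the old value keeps the change from leaking into the rest of the session.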

### `skip_empty_rows`

Should blank rows be ignored altogether? That is, if this option is `TRUE`, blank rows will not be represented at all; if it is `FALSE`, they will be represented by `NA` values in all the columns.
<hr>

In [104]:
file_data <- 'name,clan\n\nVN pikachu,VN Champions'

In [105]:
# default skip_empty_rows = TRUE

read_csv(file_data, skip_empty_rows = TRUE)

name,clan
VN pikachu,VN Champions


In [106]:
# skip_empty_rows = FALSE
read_csv(file_data, skip_empty_rows = FALSE)

"1 parsing failure.
row col  expected    actual         file
  1  -- 2 columns 1 columns literal data
"

name,clan
,
VN pikachu,VN Champions


# Value

A **`tibble()`**. If there are parsing problems, a warning tells you how many, and you can retrieve the details with `problems()`.
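A short sketch of checking the return value programmatically, using invented literal data with one unparseable value. The warning is suppressed here only to keep the output quiet; `issues` is an illustrative name.

```r
library(readr)

# Literal data with one value ("b") that cannot be parsed as a double
df <- suppressWarnings(
  read_csv("x\n1\n2\nb", col_types = cols(x = col_double()))
)

# The result is a tibble, and the failed value becomes NA
tibble::is_tibble(df)
df$x

# problems() returns one row per parsing failure
issues <- problems(df)
nrow(issues)
```

Checking `nrow(problems(df)) > 0` after a read is one simple way to stop a script early when the input does not match the expected types.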

