Fix CSV quote encoding.
BigQuery expects strings to be quoted in the "usual" manner, meaning that
quotes in string fields should be doubled, with the field itself in quotes.

This commit does so by tweaking the arguments to `encodeString` and
`write.table`, and (unfortunately) quotes all string fields as a side-effect.
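
As a rough illustration of the difference between the two encodings, the sketch below runs one made-up string through both argument combinations. The sample value and variable names are assumptions for the example and are not taken from `R/upload.r`.

```r
# Illustrative value only; not taken from the commit.
values <- data.frame(s = 'say "hi"', stringsAsFactors = FALSE)

# Earlier arguments: encodeString() wraps the field in quotes and
# backslash-escapes the embedded ones.
old_field <- encodeString(values$s, na.encode = FALSE, quote = '"')

# Arguments after this change: encodeString() leaves quotes alone, and
# write.table() quotes the field and doubles embedded quotes.
new_values <- values
new_values$s <- encodeString(new_values$s, na.encode = FALSE)
tmp <- tempfile(fileext = ".csv")
write.table(new_values, tmp, sep = ",", quote = TRUE, qmethod = "double",
  row.names = FALSE, col.names = FALSE, na = "")
new_field <- readLines(tmp)
unlink(tmp)

cat(old_field, "\n")  # "say \"hi\""
cat(new_field, "\n")  # "say ""hi"""
```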
craigcitro committed Nov 13, 2014
1 parent f818ea5 commit aed276e
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions R/upload.r
@@ -74,7 +74,7 @@ standard_csv <- function(values) {

# Encode special characters in strings
is_char <- vapply(values, is.character, logical(1))
- values[is_char] <- lapply(values[is_char], encodeString, na.encode = FALSE, quote = '"')
+ values[is_char] <- lapply(values[is_char], encodeString, na.encode = FALSE)

# Encode dates and times
is_time <- vapply(values, function(x) inherits(x, "POSIXct"), logical(1))
@@ -84,7 +84,7 @@ standard_csv <- function(values) {
values[is_date] <- lapply(values[is_date], function(x) as.numeric(as.POSIXct(x)))

tmp <- tempfile(fileext = ".csv")
- write.table(values, tmp, sep = ",", quote = FALSE, qmethod = "escape",
+ write.table(values, tmp, sep = ",", quote = TRUE, qmethod = "double",

hadley Nov 13, 2014

Are you sure you need quote = TRUE here?

craigcitro Nov 13, 2014

Author Owner

unfortunately, yes -- if quote = FALSE, then qmethod is completely ignored. (i confirmed that the docs weren't lying by trying it out a bit.)

i feel like it's a little noisier to always quote, but it works -- if we get motivated, we can write our own custom csv quoter. (c'mon, who doesn't want a write.csv3? ;) )
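
For anyone who wants to reproduce that check, here is a minimal sketch under made-up data. `csv_lines` is a hypothetical helper written for this example, not part of the package.

```r
# Made-up data with an embedded double quote.
df <- data.frame(s = 'say "hi"', stringsAsFactors = FALSE)

# Hypothetical helper: write df with the given settings, return the raw lines.
csv_lines <- function(quote, qmethod) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  write.table(df, tmp, sep = ",", quote = quote, qmethod = qmethod,
    row.names = FALSE, col.names = FALSE)
  readLines(tmp)
}

unquoted_escape <- csv_lines(FALSE, "escape")
unquoted_double <- csv_lines(FALSE, "double")
quoted_double   <- csv_lines(TRUE, "double")

# With quote = FALSE the two qmethod settings give the same output.
identical(unquoted_escape, unquoted_double)
cat(unquoted_double, "\n")  # say "hi"
cat(quoted_double, "\n")    # "say ""hi"""
```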

hadley Nov 13, 2014

What if you just omit quote?

craigcitro Nov 13, 2014

Author Owner

if you leave it out, it just picks up the default (TRUE). so i'll drop it, but code behaves the same.
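
A small sketch of that equivalence, again with made-up data; `write_csv_lines` is a hypothetical helper for this example only.

```r
# Made-up data with an embedded double quote.
df <- data.frame(s = 'say "hi"', stringsAsFactors = FALSE)

# Hypothetical helper: extra arguments are passed through to write.table().
write_csv_lines <- function(...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  write.table(df, tmp, sep = ",", qmethod = "double",
    row.names = FALSE, col.names = FALSE, ...)
  readLines(tmp)
}

default_quote <- formals(write.table)$quote  # the declared default for quote
explicit <- write_csv_lines(quote = TRUE)
omitted  <- write_csv_lines()

# Omitting quote gives the same file as passing quote = TRUE.
identical(explicit, omitted)
```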

hadley Nov 13, 2014

Hmmm, the docs say it should only quote factor and character columns. Maybe something further up is incorrectly coercing numeric to character?
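
The documented behavior can be checked with a small made-up data frame holding one numeric, one character, and one factor column. The column names and values are assumptions for illustration.

```r
# One column of each type; values are made up.
df <- data.frame(n = 1.5, s = "a", f = factor("b"), stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv")
write.table(df, tmp, sep = ",", qmethod = "double",
  row.names = FALSE, col.names = FALSE)
line <- readLines(tmp)
unlink(tmp)

# Only the character and factor columns come out quoted.
cat(line, "\n")  # 1.5,"a","b"
```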

craigcitro Nov 13, 2014

Author Owner

i think we crossed wires -- nothing is additionally quoting numeric fields. the silliness is just that we now always quote all string and factor columns, which we don't need to do. (we really only need to quote the particular entries that had double quotes in them.)
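
A minimal sketch of what such selective quoting might look like, purely as an illustration. `quote_if_needed` is a hypothetical function that does not exist in the package, and treating commas and line breaks as also requiring quotes is an assumption beyond what the comment says.

```r
# Hypothetical helper: quote only the entries that need it, doubling any
# embedded double quotes. NA entries are left untouched.
quote_if_needed <- function(x) {
  # Assumption: commas and line breaks also force quoting, not just quotes.
  needs_quote <- !is.na(x) & grepl('[",\n\r]', x)
  x[needs_quote] <- paste0(
    '"', gsub('"', '""', x[needs_quote], fixed = TRUE), '"'
  )
  x
}

fields <- quote_if_needed(c("plain", 'say "hi"', "a,b", NA))
print(fields)
```

Plain entries pass through unchanged, so the output would be less noisy than quoting every string column, at the cost of maintaining custom quoting logic.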

hadley Nov 13, 2014

Ah got it, that makes sense.

row.names = FALSE, col.names = FALSE, na = "")

# Don't read trailing nl
