Cope with missing columns converting from json to df #3

DASpringate · 2014-07-23T10:26:08Z

Hi,

Thanks for writing this package, it is really useful!

I was using it and found that in some cases where not all of the fields are present in every row of the data, the converter chokes as not all of the rows are the same length. e.g. try this:

url <- "http://data.undp.org/resource/wxub-qc5k.json"
read.socrata(url)

To deal with this I added a function content_to_df, which is called from getContentAsDataFrame and looks like:

content_to_df <- function(con){
    lengths <- sapply(con, length)
    if(all(lengths == length(con[[1]]))){
        data.frame(t(sapply(con, unlist)), stringsAsFactors = FALSE)
    } else {
        all_cols <- names(con[[which(sapply(con, length) == max(sapply(con, length)))[1]]])
        con <- lapply(con, function(x){
            r <- c(x, sapply(all_cols[!all_cols %in% names(x)], function(xx) NA, simplify = FALSE))
            r[all_cols]
        })
        data.frame(t(sapply(con, unlist)), stringsAsFactors = FALSE)
    }
}

If the rows are all the same length, it converts as before, but if they are not, it assumes that the first longest length row has all of the correct column names in it and adds empty fields to any rows that do not have all of them in, then it sorts the columns back to the original order and returns a dataframe from this as before.

I added a test for this condition to the test file as well.

I also changed the json parser to jsonlite (I think you have already done this in the CRAN version).

If you can bring these changes into the CRAN version as well I would be really grateful as I want to call it in in a package I am developing now.

Cheers,

David

DASpringate · 2014-07-23T15:36:02Z

Updated again, I left the imports out from the last one!

Also, I found a bug in the code for getResponse():

In some cases, if there was a non-200 response, there was not a response$headers$content-length element in the response, resulting in an error. I changed this by checking the length of the response$content slot instead.

I also added an option to not produce an error if there is a non-200 response. The default is exactly as it is now so it doesn't affect anything in the package as it stands, but I find it useful to have the option in my package https://github.com/rOpenHealth/rUN_development_reports/blob/master/R/rUNdp.R#L95

DASpringate added 4 commits July 23, 2014 11:02

Fixed to cope with missing columns in some rows, test added and passed

b0680dd

v1.5 readme edit

eda4b40

v1.5 readme edit

346ce0e

Fix imports and getResponse bug

cf8f114

dmpe mentioned this pull request Nov 16, 2015

PR for feature #58 (access private datasets and be able to upsert and full replace datasets) #64

Closed

tomschenkjr added the ready label Feb 14, 2016

tomschenkjr removed the ready label Oct 22, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cope with missing columns converting from json to df #3

Cope with missing columns converting from json to df #3

DASpringate commented Jul 23, 2014

DASpringate commented Jul 23, 2014

Cope with missing columns converting from json to df #3

Are you sure you want to change the base?

Cope with missing columns converting from json to df #3

Conversation

DASpringate commented Jul 23, 2014

DASpringate commented Jul 23, 2014