
Timeouts reading JSON? #62

Closed
natecobb opened this issue Oct 27, 2015 · 10 comments
@natecobb

I'm trying to use RSocrata to pull data from the CDC, e.g.:
http://dev.socrata.com/foundry/#/chronicdata.cdc.gov/ksds-npd6
or
http://www.cdc.gov/cdi/

I am unable to load any of the data sets I tried; each ultimately times out with an error in curl_fetch():

> read.socrata("https://chronicdata.cdc.gov/resource/h5w7-v8i7.json")
Error in curl::curl_fetch_memory(url, handle = handle) : 
   Failed writing received data to disk/application

Other example URLs that fail:
> read.socrata("https://data.cityofchicago.org/resource/xzkq-xp2w.json?$limit=500")
> read.socrata("https://sandbox.demo.socrata.com/resource/6cpn-3h7n.json")

Changing the json suffix to csv eliminates the timeout, but I assume that there are other ramifications of changing the returned data model.

The error occurs with the current version on CRAN on a Mac and Linux (Ubuntu 12.04); I tested a couple of weeks ago using the master branch from GitHub and got the same error. It also occurs with or without an application token. It's unclear to me whether this is a duplicate of other problems that have been reported, although I do see a mention of this error occurring randomly in #56.

@tomschenkjr
Contributor

Just identified the source of the error yesterday, and the dev branch contains a potential patch. Would you be able to install off the current dev branch and test it out?

I just ran a local test and was successful on the above examples. Current dev has a lot of changes besides this bug fix and is not yet stable. May release a bug-fix release on CRAN to fix this issue.
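For anyone following along, one way to install off the dev branch is with the devtools package (a sketch; devtools is an assumption, and any method of installing a GitHub branch works):

    # Install RSocrata from the dev branch on GitHub
    install.packages("devtools")
    devtools::install_github("Chicago/RSocrata", ref = "dev")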

@natecobb
Author

The dev branch seems to have a lot going on, and our Linux server is missing many of the new required libraries (V8, GDAL, GEOS). I was unable to get rgdal to build (missing proj_api.h) even after installing what seem to be the appropriate dev libraries. Would it be possible to backport the fix to the main branch?

@tomschenkjr
Contributor

Yeah, I’ll throw out a quick patch to the main branch.


@tomschenkjr
Contributor

Haven't thrown up the patch yet. I think this bug is a little different than I thought. Still planning on a patch release, just want to make sure I actually squash the issue.

@mnaminal

I'm having the same problem with Medicare.gov datasets, such as:
https://data.medicare.gov/resource/rbry-mqwu.json.

I get the same error message natecobb posted. I'm a novice to R, and to programming for that matter, so it's highly possible I'm missing the solution somewhere, but is this something I can fix, or should I start looking for alternative input methods?

@natecobb
Author

As a workaround you can pull the data directly as CSV, e.g.:

library(curl)
cdi_url <- "https://data.medicare.gov/resource/rbry-mqwu.csv"
read.csv(curl(cdi_url))

@stuagano

stuagano commented Feb 8, 2016

@cityofchicago @tomschenkjr does this have anything to do with NAs in the data? RJSONIO looks to require you to replace those NULLs yourself with sapply before binding.

Here's an example from a Stack Overflow answer:
json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'

library(RJSONIO)  # provides fromJSON()

json_file <- fromJSON(json_file)

# Replace NULL (missing) fields with NA so every record unlists to the same length
json_file <- lapply(json_file, function(x) {
    x[sapply(x, is.null)] <- NA
    unlist(x)
})

http://stackoverflow.com/questions/16947643/getting-imported-json-data-into-a-data-frame-in-r

It looks like when you pull the data as .csv, like @natecobb is doing, it has some default way of including the NAs.

@tomschenkjr
Contributor

Actually, when there is a missing value, the JSON just doesn't return the field. That is:

{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60},

So the records are vectors of different lengths. That will need to be reconciled before binding everything.
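One way to reconcile the ragged records before binding, sketched in base R (the sample records and all variable names here are illustrative, not RSocrata internals):

    # Two parsed records; "score" is simply absent from the second
    records <- list(
        list(name = "Doe, Jane", group = "Green", score = 500),
        list(name = "Smith, Joan", group = "Yellow")
    )

    # Collect the union of field names across all records
    all_fields <- unique(unlist(lapply(records, names)))

    # Pad each record with NA for any missing field, in a consistent order
    padded <- lapply(records, function(x) {
        x[setdiff(all_fields, names(x))] <- NA
        x[all_fields]
    })

    # Now every row has the same columns and can be bound safely
    do.call(rbind, lapply(padded, as.data.frame, stringsAsFactors = FALSE))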

@geneorama
Member

I think the problem emerges here:
https://github.com/Chicago/RSocrata/blob/dev/R/RSocrata.R#L248

    while (nrow(page) > 0) { # more to come maybe?
        query <- paste(validUrl, if(is.null(parsedUrl$query)) {'?'} else {"&"}, '$offset=', nrow(result), sep='')
        response <- getResponse(query, email, password)
        page <- getContentAsDataFrame(response)
        result <- rbind(result, page) # accumulate
    }

In my test of the JSON endpoint, the result only has 1 row, so the offset is only incremented by 1... which means that any large data set would time out.

The structure of result within the loop:
[screenshot of the structure of result omitted]
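A possible fix, sketched against the loop above (a sketch only, not the shipped patch; validUrl, parsedUrl, getResponse, and getContentAsDataFrame are the existing internals from the snippet, and the explicit $limit is an assumption): request a known page size and advance the offset by that size rather than by nrow(result).

    pageSize <- 1000                      # rows requested per page (assumed limit)
    offset <- 0
    result <- data.frame()
    repeat {
        sep <- if (is.null(parsedUrl$query)) "?" else "&"
        query <- paste0(validUrl, sep, "$limit=", pageSize, "&$offset=", offset)
        response <- getResponse(query, email, password)
        page <- getContentAsDataFrame(response)
        if (nrow(page) == 0) break        # no more rows
        result <- rbind(result, page)     # accumulate
        offset <- offset + pageSize       # advance by the requested page size
    }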

@tomschenkjr tomschenkjr modified the milestone: v1.7.1 Oct 4, 2016
@nicklucius
Contributor

This is the same underlying issue as #19 and closed by #102.
