read.socrata hanging on JSON format #96
Comments
@kevinsmgov Thanks for the nice example and suggestions. I was going to implement the third suggestion (I agree with your assessment). However, I'm unable to reproduce the exact error.
library(RSocrata)
read.socrata("https://data.smgov.net/resource/xx64-wi4x.json?$select=incident_number,incident_date,call_type,received_time,cleared_time,census_tract_2010_geoid&$where=incident_date=%272016-08-21%27")
results in this error:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
I just updated all my packages to see if I was missing something that you might have. Here's my session information:
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RSocrata_1.7.0-14
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.1.3 tools_3.3.1 curl_1.2 jsonlite_1.0 mime_0.5
Is yours similar?
Sorry, I mixed up some debugging information. The issue with the I'm still trying to figure out why that field is generating that error.
OK, the issue with that field (census_tract_2010_geoid) is that it is sometimes null. When Socrata serializes JSON, it leaves out null variables. When the rbind occurs at line 270, it errors out because some rows have a different number of values. There's probably not an easy answer for you here. In our SODA.NET library, we require the user to provide a target model to query into (we don't try to build a model from just their query result). So, for JSON users, we'll need to restrict our queries to make sure that no null values are returned (e.g. https://data.smgov.net/resource/xx64-wi4x.json?$select=incident_number,incident_date,call_type,received_time,cleared_time,census_tract_2010_geoid&$where=incident_date=%272016-08-27%27%20and%20census_tract_2010_geoid%20is%20not%20null). Additionally, you might enhance your CSV version to accept SoQL in the URL. That format guarantees a full tabular output regardless of null values.
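To illustrate the uneven-row failure described above, here is a minimal sketch in base R. It is not RSocrata's code: the helper `pad_row` and the sample values are hypothetical. It assumes each JSON record has already been parsed into a named list, and it pads every record to the full set of field names with NA before calling rbind.

```r
# Two hypothetical records as they might arrive from a JSON endpoint that
# omits null fields: the second record has no census_tract_2010_geoid.
rows <- list(
  list(incident_number = "A1", call_type = "Medical",
       census_tract_2010_geoid = "06037701801"),
  list(incident_number = "A2", call_type = "Fire")
)

# Collect every field name seen in any record, in first-seen order.
all_fields <- unique(unlist(lapply(rows, names)))

# Hypothetical helper: pad one record to the full field set,
# substituting NA for any field the server left out.
pad_row <- function(row, fields) {
  padded <- lapply(fields, function(f) {
    if (is.null(row[[f]])) NA_character_ else row[[f]]
  })
  names(padded) <- fields
  as.data.frame(padded, stringsAsFactors = FALSE)
}

# Every padded record now has the same columns, so rbind succeeds.
df <- do.call(rbind, lapply(rows, pad_row, fields = all_fields))
```

This only covers flat records. Nested JSON values, and the CSV/JSON column-name differences mentioned elsewhere in this thread, would need more than this padding step.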
The "uneven row length in json downloads" problem is an old one. It's documented in #19 and came up again in #33. That's a tough one to fix, which is why it's still outstanding. Part of the complication is that the dataset columns have different names depending on whether they're CSV or JSON. Also, because of nesting, the JSON columns don't map 1:1 to CSV columns, so it's not easy to map them using the metadata.
Thanks, this was very helpful in narrowing down the source of the bug. Marking as duplicate and closing this issue so the discussion can be consolidated into #19.
@tomschenkjr sorry for the confusion, a lot of our dialogue was addressing an error that @kevinsmgov accidentally introduced at the last minute (the JSON uneven row error). The problem of the infinite loop still occurs with his modified URL: I fixed this and I'm creating a pull request. All tests pass. I don't know how to add a test for infinite loops (I'm sure there's a way, but I wanted to get something pushed before I head out for the night).
Added a test for the URL above; the pull request is updated.
changed test for empty json in getContentAsDataFrame(). Closes #96
@geneorama - I've deleted the branch from GitHub since the fix was merged into
When I attempt to use read.socrata with the JSON format, the process hangs.
(example url https://data.smgov.net/resource/xx64-wi4x.json?$select=incident_number,incident_date,call_type,received_time,cleared_time,census_tract_2010_geoid&$where=incident_date=%272016-08-21%27)
Debugging through the process appears to show the problem in the getContentAsDataFrame function. When testing the JSON response for the end of a paged sequence:
if(httr::content(response, as = 'text') == "[ ]") # empty json?
(line 196)
but the string value I'm seeing at that point is
"[]\n"
So the comparison never matches and the loop runs forever. Perhaps Socrata is using a different JSON serializer now than when this logic was originally written. (Or I may be using the package incorrectly; let me know if this appears to be the case.)
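As a rough illustration of the mismatch, the sketch below shows one whitespace-tolerant way to test for an empty JSON array in base R. The function name `is_empty_json_array` is hypothetical, and this is not necessarily how the package was eventually fixed. It strips all whitespace from the response text before comparing.

```r
# Hypothetical helper (not part of RSocrata): TRUE when the response text
# is an empty JSON array, ignoring any spaces, tabs, or newlines.
is_empty_json_array <- function(txt) {
  gsub("[[:space:]]", "", txt) == "[]"
}

# The literal the original check expected, and the one actually observed.
old_style <- is_empty_json_array("[ ]")
new_style <- is_empty_json_array("[]\n")

# A non-empty array must not be mistaken for the end of a paged sequence.
non_empty <- is_empty_json_array("[{\"a\": 1}]")
```

Both `"[ ]"` and `"[]\n"` are treated as empty here, so a change in the serializer's whitespace would no longer keep a paging loop from terminating.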
Some possible suggestions: