Failing CRAN checks - fix by 4/21 to keep on CRAN #166

Closed
nicklucius opened this issue Apr 14, 2019 · 6 comments

Comments

@nicklucius
Contributor

See the errors here: https://cran.r-project.org/web/checks/check_results_RSocrata.html

@tomschenkjr
Contributor

Wow! Where did this come from? Any idea on why this popped-up?

@nicklucius
Contributor Author

nicklucius commented Apr 15, 2019 via email

@geneorama
Member

I didn't have to look far to find the problem. The very first test in tests\testthat\test-all.R is failing:

test_that("read Socrata CSV is compatible with posixify", {
  df <- read.socrata('http://soda.demo.socrata.com/resource/4334-bgaj.csv')
  dt <- posixify("09/14/2012 10:38:01 PM")
  expect_equal(dt, df$Datetime[1])  ## Check that download matches test
})

At first glance, it appears that Socrata has changed the column names in the CSV version to be consistent with the JSON version, and it also appears that they've made the datetime formats the same:

> readLines("http://soda.demo.socrata.com/resource/4334-bgaj.csv?$WHERE=earthquake_id='10555601'", n=2)
[1] "\"datetime\",\"depth\",\"earthquake_id\",\"location\",\"magnitude\",\"number_of_stations\",\"region\",\"source\",\"version\""
[2] "\"2012-09-10T13:16:13.000\",\"11.60\",\"10555601\",\"(63.1085, -151.4938)\",\"1.1\",\"10\",\"Central Alaska\",\"ak\",\"2\""  
> readLines("http://soda.demo.socrata.com/resource/9szf-fbd4.json?$WHERE=earthquake_id='10555601'", n=2)
[1] "[{\"datetime\":\"2012-09-10T13:16:13.000\",\"depth\":\"11.60\",\"earthquake_id\":\"10555601\",\"location\":{\"type\":\"Point\",\"coordinates\":[-151.4938,63.1085]},\"magnitude\":\"1.1\",\"number_of_stations\":\"10\",\"region\":\"Central Alaska\",\"source\":\"ak\",\"version\":\"2\"}]"

Normally the date times would not be in the same format, would they? 2012-09-10T13:16:13.000
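For reference, a timestamp in the format shown above can be parsed with base R alone. This is only a minimal sketch: `parse_iso8601` is a hypothetical helper, not part of RSocrata, and the UTC time zone is an assumption, since the value carries no offset.

```r
# Hypothetical helper (not an RSocrata function): parse a timestamp such as
# "2012-09-10T13:16:13.000". The time zone is an assumption; the value itself
# carries no offset.
parse_iso8601 <- function(x, tz = "UTC") {
  # %OS reads seconds including the fractional part on input
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = tz)
}

dt <- parse_iso8601("2012-09-10T13:16:13.000")
format(dt, "%Y-%m-%d %H:%M:%S")
```

If a test compares against a value built this way, the time zone on both sides of the comparison would need to match.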

@geneorama
Member

I should have said "I didn't have to look far to find the *first problem", because that was just the first test.

Looking at the next test, "read Socrata CSV as default", the problem seems to be that the column types have changed. We're expecting this:

> c("character", "character", "character", "POSIXct", "numeric", 
+                  "numeric", "integer", "character", "character")
[1] "character" "character" "character" "POSIXct"   "numeric"   "numeric"   "integer"   "character"
[9] "character"

but we're getting this:

> unname(sapply(sapply(df, class),`[`, 1))
[1] "POSIXct"   "numeric"   "character" "character" "numeric"   "integer"   "character" "character"
[9] "character"

The new classes look right based on the data returned:

> df <- read.socrata('https://soda.demo.socrata.com/resource/4334-bgaj.csv')
> head(df)
             datetime depth earthquake_id             location magnitude number_of_stations
1 2012-09-14 22:38:01   7.6      00388610 (41.1085, -117.6135)       2.7                 15
2 2012-09-14 22:14:45  10.6      15215753  (34.525, -118.1527)       1.5                 35
3 2012-09-14 22:14:21   0.0      71842370 (38.8023, -122.7685)       1.4                 21
4 2012-09-14 22:10:19   8.2      00388609 (36.9447, -117.6778)       1.5                 29
5 2012-09-14 22:06:11   6.4      00388607 (36.9417, -117.6903)       1.7                 29
6 2012-09-14 21:28:55  20.0      12258012  (19.7859, -64.0849)       3.1                  6
                       region source version
1                      Nevada     nn       9
2         Southern California     ci       0
3         Northern California     nc       0
4          Central California     nn       9
5          Central California     nn       9
6 north of the Virgin Islands     pr       0

I'm inclined to update the reference column classes to reflect the correct column types.
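A rough sketch of what that class check could look like, run against a stand-in data frame rather than a live download. The values are copied from the `head(df)` output above, only the first three columns are included, and `first_class` is a hypothetical helper name.

```r
# Stand-in for the first three columns of the downloaded data
# (values copied from the head(df) output; nothing is fetched from Socrata).
df <- data.frame(
  datetime = as.POSIXct("2012-09-14 22:38:01", tz = "UTC"),
  depth = 7.6,
  earthquake_id = "00388610",
  stringsAsFactors = FALSE
)

# Hypothetical helper: first class of each column.
# POSIXct columns have class c("POSIXct", "POSIXt"), hence the [1].
first_class <- function(d) {
  unname(vapply(d, function(col) class(col)[1], character(1)))
}

actual <- first_class(df)
expected <- c("POSIXct", "numeric", "character")
stopifnot(identical(actual, expected))
```

Using `vapply` instead of the nested `sapply` guarantees a character vector even when a column has more than one class.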

The decisions on how JSON was parsed in RSocrata go back to several other issues:

@nicklucius
Contributor Author

Thanks, @geneorama!

I pushed a few changes to fix the remaining errors:

bcc9164
35e16bd

It looks like this was all caused by two apparent changes to the Socrata API:

  1. Field name format
  2. Field order

I believe I've seen a recent ticket with Socrata regarding the field order for the API not matching the order in the dataset. Could item 2 above be related to this? In any case, I think at this point we can do a PR to master and a new release.
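If the field order keeps shifting, one option is to compare column classes by name rather than by position. The following is only a sketch of that idea, not what the commits above do; `same_classes` is a hypothetical helper and the data frame is a local stand-in.

```r
# Expected classes keyed by column name, so position does not matter
expected <- c(datetime = "POSIXct", depth = "numeric", magnitude = "numeric")

# Stand-in data frame with the same columns in a different order
df <- data.frame(
  magnitude = 2.7,
  datetime = as.POSIXct("2012-09-14 22:38:01", tz = "UTC"),
  depth = 7.6
)

# Named character vector: column name -> first class
actual <- vapply(df, function(col) class(col)[1], character(1))

# Hypothetical helper: TRUE when both vectors hold the same names
# with the same class for each name, regardless of order
same_classes <- function(actual, expected) {
  length(actual) == length(expected) &&
    setequal(names(actual), names(expected)) &&
    identical(actual[names(expected)], expected)
}

stopifnot(same_classes(actual, expected))
```

The trade-off is that a test written this way would no longer notice a change in field order, which may or may not be desirable.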

@geneorama
Member

@nicklucius thank you for the additional fixes. It's strange, I don't think those tests were failing when I put in my changes the other day. I wonder if they're making more changes on the back end. Also, I too wonder if this is related to the ticket I opened.
