Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timestamp stripped off COLLECTION_START and COLLECTION_END in historic db #79

Closed
ateucher opened this issue Aug 11, 2023 · 12 comments
Closed

Comments

@ateucher
Copy link
Contributor

ateucher commented Aug 11, 2023

library(rems)
library(dplyr)

con <- connect_historic_db()
#> Please remember to use 'disconnect_historic_db()' when you are finished querying the historic database.

tbl <- attach_historic_data(con)

filter(tbl, EMS_ID == "0400203", PARAMETER_CODE %in% c("0102", "0110")) |> 
  select(EMS_ID, COLLECTION_START, LOCATION_PURPOSE, PARAMETER) |> 
  collect() |> 
  as.data.frame() |> 
  head()
#>    EMS_ID COLLECTION_START LOCATION_PURPOSE                       PARAMETER
#> 1 0400203       1985-06-12            TREND Alkalinity Total 4.5 (as CaCO3)
#> 2 0400203       1983-06-01            TREND Alkalinity Total 4.5 (as CaCO3)
#> 3 0400203       1986-10-09            TREND Alkalinity Total 4.5 (as CaCO3)
#> 4 0400203       1987-08-11            TREND Alkalinity Total 4.5 (as CaCO3)
#> 5 0400203       1988-03-08            TREND Alkalinity Total 4.5 (as CaCO3)
#> 6 0400203       1984-11-20            TREND Alkalinity Total 4.5 (as CaCO3)

disconnect_historic_db()

Created on 2023-08-11 with reprex v2.0.2

This is caused by duckdb/duckdb#8547 (which might actually be an R bug... I've made an inquiry - TBD). It should be fixed by should be fixed by duckdb/duckdb#8548.

Since this is out of our hands I suggest we do a 0.8.1 release (now that we've fixed #76), and hold our noses and once it's fixed we bump the duckdb dependency version...

@KarHarker what do you think?

@ateucher
Copy link
Contributor Author

ateucher commented Aug 14, 2023

duckdb/duckdb#8548 has been merged, so once they do another release we can close this.

@KarHarker
Copy link
Collaborator

Hi @ateucher - sorry for the delay. Do you think it makes sense to wait for the latest duckdb release to do ours?

@ateucher
Copy link
Contributor Author

I guess the two options are:

  1. Leave as-is, where most people can't download and store the historic database, OR
  2. Release, so people can get the db but times will be missing from the two datetime columns.

What do you think would be best for users?

@KarHarker
Copy link
Collaborator

Hm. Yeah, I think we should release it and maybe add a caveat to the release that mentions the datetime issue.

@ateucher
Copy link
Contributor Author

ateucher commented Aug 17, 2023

We could add a message to the connect_historic_db() for now? Conditional on the duckdb and R version?

@ateucher
Copy link
Contributor Author

ateucher commented Aug 17, 2023

Like this:

  if (getRversion() >= numeric_version("4.3") && 
      packageVersion("duckdb") <= numeric_version("0.8.1.1")) {
    warning("This version of rems running under R 4.3 causes the time component of COLLECTION_START and COLLECTION_END to be omitted when query results are returned. This will be fixed soon via the next release of the duckdb package (See https://github.com/bcgov/rems/issues/79).")
  }

@KarHarker
Copy link
Collaborator

Yep, I think that sounds like a great approach!

@KarHarker
Copy link
Collaborator

KarHarker commented Aug 17, 2023

I can implement that change and do the release since you have done all the heavy lifting :)

@ateucher
Copy link
Contributor Author

Awesome, thanks!

KarHarker added a commit that referenced this issue Aug 17, 2023
KarHarker added a commit that referenced this issue Aug 17, 2023
added information about temporary warning message for issue #79
@ateucher
Copy link
Contributor Author

ateucher commented Sep 5, 2023

This looks to be fixed in duckdb 0.8.1-3:

library(rems)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

con <- connect_historic_db()
#> Please remember to use 'disconnect_historic_db()' when you are finished querying the historic database.

tbl <- attach_historic_data(con)

filter(tbl, EMS_ID == "0400203", PARAMETER_CODE %in% c("0102", "0110")) |> 
  select(EMS_ID, COLLECTION_START, COLLECTION_END, PARAMETER) |> 
  collect() |> 
  as.data.frame() |> 
  head()
#>    EMS_ID    COLLECTION_START      COLLECTION_END
#> 1 0400203 1985-06-12 10:30:00 1985-06-12 10:30:00
#> 2 0400203 1983-06-01 10:00:00 1983-06-01 10:00:00
#> 3 0400203 1986-10-09 10:15:00 1986-10-09 10:15:00
#> 4 0400203 1987-08-11 14:00:00 1987-08-11 14:00:00
#> 5 0400203 1988-03-08 10:30:00 1988-03-08 10:30:00
#> 6 0400203 1984-11-20 00:00:00 1984-11-20 00:00:00
#>                         PARAMETER
#> 1 Alkalinity Total 4.5 (as CaCO3)
#> 2 Alkalinity Total 4.5 (as CaCO3)
#> 3 Alkalinity Total 4.5 (as CaCO3)
#> 4 Alkalinity Total 4.5 (as CaCO3)
#> 5 Alkalinity Total 4.5 (as CaCO3)
#> 6 Alkalinity Total 4.5 (as CaCO3)

Created on 2023-09-05 with reprex v2.0.2

@KarHarker
Copy link
Collaborator

Ah, sorry I missed your note - @ateucher, it looks updated on my end as well. Thanks for keeping your eye on this!!

I modified warning message to alert users that a duckdb update may be required (acbfc43)

@ateucher
Copy link
Contributor Author

ateucher commented Oct 12, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants