Package review, updates and addition of translation and cleaning functions #16
Thanks, Amy. I think we are going to try another function, e.g., get_cases_questionnaire(), with an API call that requests the output as CSV. We think this should take care of all of the unnesting. Hopefully that works.
-James
From: AmyM ***@***.***>
Sent: Tuesday, March 28, 2023 9:12 PM
To: WorldHealthOrganization/godataR ***@***.***>
Cc: Fuller, James (CDC/DDPHSIS/CGH/DGHP) ***@***.***>; Comment ***@***.***>
Subject: Re: [WorldHealthOrganization/godataR] Package review, updates and addition of translation and cleaning functions (PR #16)
Had meant to add some thoughts to this. Some great additions here, but on the subject of questionnaire fields: I thought the nested fields issue had been mostly taken care of? That is, if you use the correct API, they are not nested any more? Or maybe I am just thinking of the core nested fields, which are now unpacked...
Either way, difficult as they are to deal with, I think there needs to be a level-agnostic un-nesting method. I've done it a few times in different ways, and more recently tried a dplyr-friendly approach. I think I might have included that in some of the lab2godata code, so have a look there in case there's anything generalisable. When I have some more time I will put a code snippet with a suggestion here.
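A level-agnostic un-nesting method of the kind suggested above can be sketched in base R. This is a swapped-in illustration, not the dplyr-friendly approach mentioned or the lab2godata code. It assumes each case arrives as a nested list, for example parsed with `jsonlite::fromJSON(..., simplifyVector = FALSE)`. `flatten_nested()` and `records_to_df()` are hypothetical helpers. The function recurses to any depth:

- Named children become `parent.child` columns.
- Unnamed array items are labelled by position.

```r
# Hypothetical helper: recursively flatten one nested record into a flat,
# named list. Named elements become "parent.child"; unnamed (array)
# elements are labelled by position, e.g. "addresses.1.city".
flatten_nested <- function(x, prefix = "") {
  # Atomic vectors longer than 1 are treated like arrays
  if (is.atomic(x) && length(x) > 1) x <- as.list(x)
  if (!is.list(x)) {
    # Leaf value: NULL or zero-length becomes NA
    value <- if (length(x) == 0) NA else x
    return(stats::setNames(list(value), prefix))
  }
  keys <- names(x)
  if (is.null(keys)) keys <- rep("", length(x))
  unnamed <- is.na(keys) | keys == ""
  keys[unnamed] <- as.character(seq_along(x))[unnamed]
  if (nzchar(prefix)) keys <- paste(prefix, keys, sep = ".")
  out <- list()
  for (i in seq_along(x)) {
    out <- c(out, flatten_nested(x[[i]], keys[i]))
  }
  out
}

# Hypothetical helper: flatten every record and bind them column-wise,
# filling fields that are absent from a record with NA.
records_to_df <- function(records) {
  flat <- lapply(records, flatten_nested)
  cols <- unique(unlist(lapply(flat, names), use.names = FALSE))
  columns <- lapply(cols, function(col) {
    values <- lapply(flat, function(r) if (is.null(r[[col]])) NA else r[[col]])
    unlist(values, use.names = FALSE)
  })
  names(columns) <- cols
  as.data.frame(columns, stringsAsFactors = FALSE, check.names = FALSE)
}

records <- list(
  list(id = "a", age = list(years = 30, months = 0),
       addresses = list(list(city = "X"), list(city = "Y"))),
  list(id = "b", age = list(years = 5))
)
cases_flat <- records_to_df(records)
```

Because each column is built with `unlist()`, a field that mixes numbers and strings across records ends up as character. Any type cleaning would still be needed afterwards.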
On Tue, 28 Mar 2023 at 09:18, James Fuller ***@***.*** wrote:
***@***.**** commented on this pull request.
------------------------------
On R/clean_cases.R:
In general, I think we can remove a lot of the data cleaning steps in the current version of clean_cases. These may also apply to other cleaning functions. @sarahollis, anything else to add?
*Keep as-is:*
1. clean date fields
2. clean field names
3. current address location
*Modify:*
4. clean age_years & age_months; generate a numeric age variable, but no need to create the age category field
5. can we use translate categories as part of the clean_cases function?
*Remove:*
6. vaccine data, hospitalization/isolation/icu data
7. remove the final step that only keeps certain fields.
*For Discussion:*
8. What to do with questionnaire fields? Should we keep them? Should we use get_cases(file_type="csv") to get them as a flat version?
9. Should we remove all nested fields? Yes, they cause problems when exporting to a flat file, but it could be frustrating if the data are needed.
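Item 4 proposes cleaning age_years and age_months into a single numeric age. Below is a minimal sketch, assuming two columns with exactly those names. The combination rule is an assumption, not something agreed in this thread: use years when present, otherwise months / 12, and treat negative or non-numeric entries as missing. `make_numeric_age()` is a hypothetical helper.

```r
# Hypothetical helper: combine age_years and age_months into one numeric
# age in years. Assumes years take priority when both are recorded.
make_numeric_age <- function(age_years, age_months) {
  # Non-numeric entries become NA instead of stopping the cleaning step
  years <- suppressWarnings(as.numeric(age_years))
  months <- suppressWarnings(as.numeric(age_months))
  # Negative ages are treated as data-entry errors
  years[!is.na(years) & years < 0] <- NA
  months[!is.na(months) & months < 0] <- NA
  # Use years where available, otherwise convert months to years
  ifelse(!is.na(years), years, months / 12)
}

cases <- data.frame(
  age_years = c("30", NA, "abc", "-1"),
  age_months = c(NA, "6", NA, "3"),
  stringsAsFactors = FALSE
)
cases$age_numeric <- make_numeric_age(cases$age_years, cases$age_months)
```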
--
*Amy Mikhail*
@jamesfuller-cdc @sarahollis I've added the first draft of a … Some design considerations:
Update: This is because your NAMESPACE is not populated correctly.
Update: After populating NAMESPACE with …
Update: Here's a reprex @joshwlambert
Previously: "not sure whether some functions have been removed in the interim, because the example from …
Previously: "Now, see error in …
suppressWarnings({library(godataR)})
devtools::package_info("godataR", dependencies = FALSE)
#> package * version date (UTC) lib source
#> godataR * 2.0.0 2023-04-03 [1] local
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
# Your Go.Data URL
url <- "https://godata-r19.who.int/"
# Your email address to log in to Go.Data
username <- getPass::getPass(msg = "Enter your Go.Data username (email address):")
#> Please enter password in TK window (Alt+Tab)
# Your password to log in to Go.Data
password <- getPass::getPass(msg = "Enter your Go.Data password:")
#> Please enter password in TK window (Alt+Tab)
# Get ID for active outbreak:
outbreak_id <- godataR::get_active_outbreak(url = url,
username = username,
password = password)
# get cases from current outbreak
cases <- get_cases(
url = url,
username = username,
password = password,
outbreak_id = outbreak_id
)
#> ...beginning download
#> ...download complete!
locations <- get_locations(
url = url,
username = username,
password = password
)
locations_clean <- clean_locations(locations = locations)
case_address_history <- clean_case_address_history(
cases = cases,
locations_clean = locations_clean
)
# from example in `clean_cases()` documentation
# other cleaned data required for `clean_cases()`
cases_vacc_history_clean <- clean_case_vax_history(cases = cases)
cases_address_history_clean <- clean_case_address_history(
cases = cases,
locations_clean = locations_clean
)
cases_dateranges_history_clean <- clean_case_med_history(cases = cases)
cases_clean <- clean_cases(
cases = cases,
cases_address_history_clean = cases_address_history_clean,
cases_vacc_history_clean = cases_vacc_history_clean,
cases_dateranges_history_clean = cases_dateranges_history_clean
)
cases_from_contacts <- cases_from_contacts(cases_clean)
Created on 2023-04-03 by the reprex package (v2.0.1)
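The reprex passes `url`, `username` and `password` explicitly to every call. A minimal base-R sketch shows why relying on a default of the form `url = url` fails when the argument is omitted. The function names here are hypothetical, not {godataR} code. The default expression `url` is evaluated inside the function, where it refers to the argument itself rather than to the global `url`. R therefore stops with the "promise already under evaluation" error:

```r
# Hypothetical function with a recursive default: the default for `url`
# refers to the argument `url` itself, not to a global variable.
fetch_bad <- function(url = url) {
  sprintf("Connecting to %s", url)
}

# Same function without the recursive default: `url` must be supplied.
fetch_good <- function(url) {
  sprintf("Connecting to %s", url)
}

url <- "https://godata-r19.who.int/"

# Omitting `url` forces the default, which tries to evaluate itself;
# the global `url` defined above is never consulted.
err <- tryCatch(fetch_bad(), error = function(e) conditionMessage(e))
print(err)

# Passing the argument explicitly works.
fetch_good(url = url)
```

Without a default, omitting `url` gives the plainer error `argument "url" is missing, with no default`.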
Hi @joshwlambert, just adding results from another look at this code; some issues from the test suite are listed below. These issues primarily relate to the accessed data not having the expected columns, either because some columns are missing or because some have been added. It may be useful to determine whether there is a data schema that could be tested against, rather than hardcoding the column name and type expectations. Hope this helps!
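Testing against a schema rather than hardcoded expectations could look roughly like the sketch below. It assumes the schema is stored as a named character vector mapping each expected column to its class. That format, `check_schema()` and the example column names are all hypothetical. The check reports three things instead of failing on the first difference:

- missing columns
- unexpected extra columns
- columns with the wrong class

```r
# Hypothetical helper: compare a data frame with an expected schema,
# given as a named character vector of column classes.
check_schema <- function(df, schema) {
  shared <- intersect(names(schema), names(df))
  # First class of each shared column, e.g. "character" or "numeric"
  actual <- vapply(df[shared], function(col) class(col)[1], character(1))
  list(
    missing = setdiff(names(schema), names(df)),
    extra = setdiff(names(df), names(schema)),
    wrong_type = shared[actual != schema[shared]]
  )
}

cases <- data.frame(
  id = c("a", "b"),
  age_years = c("30", "5"),
  extra_field = c("x", "y"),
  stringsAsFactors = FALSE
)
expected <- c(id = "character", age_years = "numeric", outcome = "character")
result <- check_schema(cases, expected)
```

In a test file, each element could then be checked separately, for example with `testthat::expect_equal(result$missing, character(0))`. Extra columns could be allowed while missing ones are still flagged. Note that `class()` distinguishes `"integer"` from `"numeric"`, so the schema would need to say which one is expected.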
This PR contains several updates to the package. These are grouped into sections under headings, and the reasoning for each change is stated.
Package maintenance
- Recursive default arguments removed. Previously, omitting such an argument gave the error `promise already under evaluation: recursive default argument reference or earlier problems?`, which is not very intuitive, especially for new R users. Removing the recursive argument defaults resolves this issue.
- The pipe (`%>%`) removed from functions. This makes the code more modular and can improve debugging.
- Explicit namespacing used instead of `@importFrom` in function documentation. The choice of explicit namespacing over importing a package in the documentation is subjective, but the benefit of namespacing is that it makes clear which functions come from other packages and which are in {godataR}.
- Code styled to the tidyverse style guide (checked with `devtools::lint()`). This is a style guide set out by the tidyverse team. A consistent, widely used code style makes the code easier to read, and likely makes it easier for people to contribute, given they are probably familiar with the style. One future addition could be a style check within the package that ensures future changes also conform to this style.
New functionality
- Added the `translate_categories()` function (exported), as well as the `translate_token()` and `any_tokens()` functions (internal). These take the data returned by the API and use the language tokens (returned by `get_language_tokens()`) to translate hard-to-read strings into the simpler form given by the language tokens.
Testing
- Tests can be run with `devtools::test()` and `devtools::check()`.
- Tests for the functions that access the API (the `get_*()` functions) are skipped by default, as they require credentials to connect to the API. However, they can be run locally if a user has credentials.
Please do not merge this pull request until it has been fully reviewed.
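The token translation described under New functionality can be illustrated with a base-R sketch. This is not the package's `translate_categories()`. The token table layout (`token` and `translation` columns) and the assumption that tokens start with `"LNG_"` are both hypothetical, as are the helper names. The structure loosely mirrors the split described in the PR:

- One helper detects whether a column contains tokens (compare `any_tokens()`).
- One helper maps individual values (compare `translate_token()`).
- A wrapper applies both to every column.

```r
# Illustrative token table; the real output of get_language_tokens()
# may use different column names.
tokens <- data.frame(
  token = c("LNG_REFERENCE_DATA_CATEGORY_GENDER_MALE",
            "LNG_REFERENCE_DATA_CATEGORY_GENDER_FEMALE"),
  translation = c("Male", "Female"),
  stringsAsFactors = FALSE
)

# TRUE if a column is character and contains at least one value that
# looks like a language token (assumed to start with "LNG_").
has_tokens <- function(x) {
  is.character(x) && any(startsWith(x, "LNG_"), na.rm = TRUE)
}

# Replace each token with its translation; values without a matching
# token (including NA) are returned unchanged.
translate_values <- function(x, tokens) {
  idx <- match(x, tokens$token)
  ifelse(is.na(idx), x, tokens$translation[idx])
}

# Apply the translation to every column that contains tokens.
translate_df <- function(df, tokens) {
  for (col in names(df)) {
    if (has_tokens(df[[col]])) {
      df[[col]] <- translate_values(df[[col]], tokens)
    }
  }
  df
}

cases <- data.frame(
  id = c("a", "b", "c"),
  gender = c("LNG_REFERENCE_DATA_CATEGORY_GENDER_MALE",
             "LNG_REFERENCE_DATA_CATEGORY_GENDER_FEMALE", NA),
  stringsAsFactors = FALSE
)
cases_translated <- translate_df(cases, tokens)
```

Leaving unmatched values unchanged means a token missing from the table stays visible in the output rather than silently becoming NA.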