Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-logical datasets shouldn't start with "is_" #2

Open
alistaire47 opened this issue Jul 12, 2017 · 0 comments
Open

non-logical datasets shouldn't start with "is_" #2

alistaire47 opened this issue Jul 12, 2017 · 0 comments

Comments

@alistaire47
Copy link
Owner

alistaire47 commented Jul 12, 2017

Currently is_developed and is_independent return character vectors:

unique(passport:::countries$is_developed)
#> [1] NA           "Developed"  "Developing"

unique(passport:::countries$is_independent)
#>  [1] NA                       "Yes"                   
#>  [3] "Territory of GB"        "International"         
#>  [5] "Territory of US"        "Part of NL"            
#>  [7] "Part of FI"             "Part of FR"            
#>  [9] "Territory of NO"        "Territory of AU"       
#> [11] "Associated with NZ"     "In contention"         
#> [13] "Part of DK"             "Crown dependency of GB"
#> [15] "Part of CN"             "Commonwealth of US"    
#> [17] "Territory of FR"        "Territory of NZ"       
#> [19] "Territories of US"

If they're going to start with is_, they should really return logical vectors. To address the issue, they could

  • drop information to actually return a logical
  • get renamed
  • be split in two, e.g. is_independent and dependency_status

None of these options is really ideal, as the expectation of as_country_code and as_country_name is usually to return a character vector or factor. They are not the only exceptions:

code_types <- sapply(passport:::countries, typeof) 

code_types[code_types != 'character']
#>                        gaul              un_region_code 
#>                    "double"                   "integer" 
#>           un_subregion_code un_intermediate_region_code 
#>                   "integer"                   "integer" 
#>                         m49                         ldc 
#>                   "integer"                   "logical" 
#>                        lldc                        sids 
#>                   "logical"                   "logical"

Numeric country codes (gaul, un_*_code, m49) are a different issue. Perhaps they should be strings, as they should not be operated upon, but converting them to factors is potentially very confusing and may merit a warning or message.

Country groupings (ldc, lldc, sids, un_*_code) will be addressed by #1 (though they face the same type issue).

These two (plus a lot more) should be split into a separate set of country attributes (#3), but the issue will still have to be addressed within that dataset.

This will be a breaking change, but integrating the change with #3 will minimize disruption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant