Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate vars_dict.csv to focus exclusively on renaming/recoding #6

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Sep 8, 2023

This PR consolidates the vars_dict.csv data file so that it can be used exclusively for the purposes of renaming and recoding variables, rather than documenting them (which is now owned by our dbt setup in https://github.com/ccao-data/data-architecture/). This consolidation involves two steps:

  1. Removing all columns that are related to data lineage or documentation
  2. Removing all rows for derived data that are no longer used anymore

Here's the command I ran to select the new set of columns and accomplish step 1 above:

csvcut -c "var_name_hie,var_name_iasworld,var_name_athena,var_name_model,var_name_publish,var_name_pretty,var_type,var_data_type,var_code,var_value,var_value_short" data-raw/vars_dict.csv > data-raw/vars_dict_edit.csv
mv data-raw/vars_dict_edit.csv data-raw/vars_dict.csv

These are the columns that got deleted by this command:

  • var_from_source
  • var_from_table
  • var_from_ctas
  • var_from_view
  • var_model_type
  • var_is_published
  • var_is_predictor
  • var_notes

And here are the derived rows I deleted to accomplish step 2, approved by Billy:

  • char_ot_impr
  • char_volume
  • strata_1 and strata_2
  • lag_*(e.g. lag_meta_sale_price, lag_char_bldg_sf)
  • hie_num_active
  • hie_num_expired

Closes #2.

@jeancochrane jeancochrane requested a review from a team as a code owner September 8, 2023 20:56
@jeancochrane jeancochrane marked this pull request as draft September 8, 2023 20:57
@codecov
Copy link

codecov bot commented Sep 8, 2023

Codecov Report

Merging #6 (4214b31) into master (1f56003) will not change coverage.
The diff coverage is n/a.

❗ Current head 4214b31 differs from pull request most recent head 9ad9528. Consider uploading reports for the commit 9ad9528 to get more accurate results

@@            Coverage Diff            @@
##            master        #6   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          382       382           
=========================================
  Hits           382       382           

@jeancochrane jeancochrane marked this pull request as ready for review September 8, 2023 22:03
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeancochrane A couple changes to make here:

data-raw/vars_dict.csv is the raw data that creates the vars_dict data.frame stored in the R package. This PR edits the raw data, but still needs to do the following:

  • Run data-raw/vars_dict.R. This will save the CSV as an R data file (data/vars_dict.Rda). This file is what the unit tests and package run against.
  • Update the docstring in R/data.R. This is where help docs for the packaged data.frame are kept. It needs to be updated manually to reflect the changes to the raw data.
  • Run devtools::test() to make sure all unit tests pass with the new dictionary.
  • Run devtools::document() to generate new man files from the changes to R/data.R.
  • Run devtools::check() to run a test package build.
  • Run pkgdown::build_site() to make sure the changes are reflected on the docs website.

I'm happy to do this if you'd like, just let me know!

@jeancochrane
Copy link
Contributor Author

jeancochrane commented Sep 12, 2023

@dfsnow I regenerated data and docs in 3226075 and added an issue template documenting these steps in 9ad9528! Pasting the output from the commands below since I'm still getting my bearings with R:

Click for R command history
> devtools::test()
? Testing ccao
? | F W S  OK | Context
? |        16 | test ccao_cod()
? |        19 | test ccao_prd() [1.3s]
? |        18 | test ccao_prb()
? |        13 | test ccao_generate_id()
? |        15 | test chars_288_active()
? |         2 | test chars_sparsify()
? |         1 | test chars_update()
? |         9 | test chars_fix_age()
? |         5 | test pin_clean()
? |         8 | test pin_format_pretty()
? |         7 | test convert_town()
? |         6 | test town_get_triad()
? |        12 | test town_get_assmnt_year()
? |         6 | test val_limit_ratios()
? |        13 | test val_round_fmv()
? |        15 | test vars_check_class()
? |        16 | test vars_rename()
? |        11 | test vars_recode()

══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 4.2 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 192 ]

> devtools::document()
? Updating ccao documentation
? Loading ccao
Writing vars_dict.Rd

> devtools::check()
══ Documenting ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════
? Updating ccao documentation
? Loading ccao

══ Building ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Setting env vars:CFLAGS    : -Wall -pedantic -fdiagnostics-color=alwaysCXXFLAGS  : -Wall -pedantic -fdiagnostics-color=alwaysCXX11FLAGS: -Wall -pedantic -fdiagnostics-color=alwaysCXX14FLAGS: -Wall -pedantic -fdiagnostics-color=alwaysCXX17FLAGS: -Wall -pedantic -fdiagnostics-color=alwaysCXX20FLAGS: -Wall -pedantic -fdiagnostics-color=always
── R CMD build ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
?  checking for file `/home/jecochr/ccao/DESCRIPTION' ...
─  preparing `ccao':
?  checking DESCRIPTION meta-information ...checking for LF line-endings in source and make files and shell scriptschecking for empty or unneeded directoriesbuilding `ccao_1.2.2.tar.gz'

══ Checking ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Setting env vars:
� _R_CHECK_CRAN_INCOMING_REMOTE_               : FALSE
� _R_CHECK_CRAN_INCOMING_                      : FALSE
� _R_CHECK_FORCE_SUGGESTS_                     : FALSE
� _R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_: FALSE
� NOT_CRAN                                     : true
── R CMD check ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─  using log directory `/tmp/RtmpPKvSxu/file31bbbd277f7eb1/ccao.Rcheck'using R version 4.2.2 Patched (2022-11-10 r83330)
─  using platform: x86_64-pc-linux-gnu (64-bit)
─  using session charset: UTF-8using options `--no-manual --as-cran'
?  checking for file `ccao/DESCRIPTION' ...checking extension type ... Packagethis is package `ccao' version `1.2.2''package encoding: UTF-8
?  checking package namespace informationchecking package dependencies ...Warning in url(sprintf("%s/%s", cran, path), open = "rb") : (1.8s)
     cannot open URL 'https://packagemanager.posit.co/cran/__linux__/jammy/latest/web/packages/packages.rds': HTTP status was '404 Not Found'
   Error in url(sprintf("%s/%s", cran, path), open = "rb") :
     cannot open the connection to 'https://packagemanager.posit.co/cran/__linux__/jammy/latest/web/packages/packages.rds'
   Execution halted

── R CMD check results ───────────────────────────────────────────────────────────────────────────────────────── ccao 1.2.2 ────
Duration: 2.6s

0 errors ? | 0 warnings ? | 0 notes ?

> pkgdown::build_site()
-- Installing package into temporary library -----------------------------------------------------------------------------------
Warning message:
renv took longer than expected (10 seconds) to activate the sandbox.

The sandbox can be disabled by setting:

    RENV_CONFIG_SANDBOX_ENABLED = FALSE

within an appropriate start-up .Renviron file.

See `?renv::config` for more details.
== Building pkgdown site =======================================================
Reading from: '/home/jecochr/ccao'
Writing to:   '/home/jecochr/ccao/public'
-- Initialising site -----------------------------------------------------------
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/bootstrap-toc.css' to 'bootstrap-toc.css'
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/bootstrap-toc.js' to 'bootstrap-toc.js'
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/docsearch.css' to 'docsearch.css'
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/docsearch.js' to 'docsearch.js'
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/link.svg' to 'link.svg'
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/pkgdown.css' to 'pkgdown.css'
Copying '../../../mnt/shared/renv/cache/v5/R-4.2/x86_64-pc-linux-gnu/pkgdown/2.0.7/16fa15449c930bf3a7761d3c68f8abf9/pkgdown/BS3/assets/pkgdown.js' to 'pkgdown.js'
Copying 'pkgdown/extra.css' to 'extra.css'
Copying 'pkgdown/favicon/apple-touch-icon-120x120.png' to 'apple-touch-icon-120x120.png'
Copying 'pkgdown/favicon/apple-touch-icon-60x60.png' to 'apple-touch-icon-60x60.png'
Copying 'pkgdown/favicon/apple-touch-icon-76x76.png' to 'apple-touch-icon-76x76.png'
Copying 'pkgdown/favicon/apple-touch-icon.png' to 'apple-touch-icon.png'
Copying 'pkgdown/favicon/favicon-16x16.png' to 'favicon-16x16.png'
Copying 'pkgdown/favicon/favicon-32x32.png' to 'favicon-32x32.png'
Copying 'pkgdown/favicon/favicon.ico' to 'favicon.ico'
Copying 'man/figures/logo.png' to 'logo.png'
-- Building home ---------------------------------------------------------------
Writing 'authors.html'
Writing 'LICENSE-text.html'
Copying 'man/figures/README-colors-1.png' to 'reference/figures/README-colors-1.png'
Copying 'man/figures/logo-large.png' to 'reference/figures/logo-large.png'
Copying 'man/figures/logo.png' to 'reference/figures/logo.png'
Writing '404.html'
-- Building function reference -------------------------------------------------
Writing 'reference/index.html'
Reading 'man/ccao_colors.Rd'
Writing 'reference/ccao_colors.html'
Reading 'man/ccao_funs.Rd'
Writing 'reference/ccao_funs.html'
Reading 'man/ccao_generate_id.Rd'
Writing 'reference/ccao_generate_id.html'
Reading 'man/cdu_dict.Rd'
Writing 'reference/cdu_dict.html'
Reading 'man/chars_288_active.Rd'
Writing 'reference/chars_288_active.html'
Reading 'man/chars_cols.Rd'
Writing 'reference/chars_cols.html'
Reading 'man/chars_fix_age.Rd'
Writing 'reference/chars_fix_age.html'
Reading 'man/chars_sample_athena.Rd'
Writing 'reference/chars_sample_athena.html'
Reading 'man/chars_sample_hie.Rd'
Writing 'reference/chars_sample_hie.html'
Reading 'man/chars_sample_universe.Rd'
Writing 'reference/chars_sample_universe.html'
Reading 'man/chars_sparsify.Rd'
Writing 'reference/chars_sparsify.html'
Reading 'man/chars_update.Rd'
Writing 'reference/chars_update.html'
Reading 'man/class_dict.Rd'
Writing 'reference/class_dict.html'
Reading 'man/coe_dict.Rd'
Writing 'reference/coe_dict.html'
Reading 'man/nbhd_shp.Rd'
Writing 'reference/nbhd_shp.html'
Reading 'man/pin_clean.Rd'
Writing 'reference/pin_clean.html'
Reading 'man/pin_format_pretty.Rd'
Writing 'reference/pin_format_pretty.html'
Reading 'man/town_convert.Rd'
Writing 'reference/town_convert.html'
Reading 'man/town_dict.Rd'
Writing 'reference/town_dict.html'
Reading 'man/town_get_assmnt_year.Rd'
Writing 'reference/town_get_assmnt_year.html'
Reading 'man/town_get_triad.Rd'
Writing 'reference/town_get_triad.html'
Reading 'man/town_shp.Rd'
Writing 'reference/town_shp.html'
Reading 'man/val_limit_ratios.Rd'
Writing 'reference/val_limit_ratios.html'
Reading 'man/val_round_fmv.Rd'
Writing 'reference/val_round_fmv.html'
Reading 'man/vars_check_class.Rd'
Writing 'reference/vars_check_class.html'
Reading 'man/vars_dict.Rd'
Writing 'reference/vars_dict.html'
Reading 'man/vars_dict_legacy.Rd'
Writing 'reference/vars_dict_legacy.html'
Reading 'man/vars_recode.Rd'
Writing 'reference/vars_recode.html'
Reading 'man/vars_rename.Rd'
Writing 'reference/vars_rename.html'
Writing 'sitemap.xml'
== DONE ========================================================================
-- Previewing site -------------------------------------------------------------------------------------------------------------

The only part of this that looks worrisome to me is the end of the devtools::check() output:

─  checking package dependencies ...Warning in url(sprintf("%s/%s", cran, path), open = "rb") : (1.8s)
     cannot open URL 'https://packagemanager.posit.co/cran/__linux__/jammy/latest/web/packages/packages.rds': HTTP status was '404 Not Found'
   Error in url(sprintf("%s/%s", cran, path), open = "rb") :
     cannot open the connection to 'https://packagemanager.posit.co/cran/__linux__/jammy/latest/web/packages/packages.rds'
   Execution halted

Does this seem concerning to you? On the one hand, it's labeled as a warning and didn't cause the check to fail. On the other hand, it does seem like an error that caused some piece of code to halt execution. Let me know if you think it's a problem and I can try to resolve it.

@dfsnow
Copy link
Member

dfsnow commented Sep 13, 2023

Does this seem concerning to you? On the one hand, it's labeled as a warning and didn't cause the check to fail. On the other hand, it does seem like an error that caused some piece of code to halt execution. Let me know if you think it's a problem and I can try to resolve it.

I'm guessing this is/was just a momentary problem with posit's servers, since I can't reproduce it on my end. Let me re-review and then we can merge. If it comes up in the CI we can re-open.

@jeancochrane jeancochrane merged commit 8b4ed84 into master Sep 14, 2023
10 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/consolidate-vars-dict branch September 14, 2023 14:37
@jeancochrane jeancochrane mentioned this pull request Sep 14, 2023
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move vars_dict out of this package to a shared location
2 participants