version 1.0.1

cran · Dec 22, 2023 · 5883369 · 5883369
commit 5883369
Showing 63 changed files with 13,317 additions and 0 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -0,0 +1,45 @@
+Package: Rmonize
+Type: Package
+Title: Support Retrospective Harmonization of Data
+Version: 1.0.1
+Authors@R: 
+    c(person(given = "Guillaume",
+             family = "Fabre",
+             role = c("aut", "cre"),
+             email = "guijoseph.fabre@gmail.com",
+             comment = c(ORCID = "0000-0002-0124-9970")),
+      person("Maelstrom-research group",
+             role=c("fnd")))
+Maintainer: Guillaume Fabre <guijoseph.fabre@gmail.com>
+Description: Functions to support rigorous retrospective data harmonization 
+    processing, evaluation, and documentation across datasets from different 
+    studies based on Maelstrom Research guidelines. The package includes the 
+    core functions to evaluate and format the main inputs that define the 
+    harmonization process, apply specified processing rules to generate 
+    harmonized data, diagnose processing errors, and summarize and evaluate 
+    harmonized outputs. The main inputs that define the processing are a 
+    DataSchema (list and definitions of harmonized variables to be generated) 
+    and Data Processing Elements (processing rules to be applied to generate 
+    harmonized variables from study-specific variables). The main outputs of 
+    processing are harmonized datasets, associated metadata, and tabular and 
+    visual summary reports. As described in 
+    Maelstrom Research guidelines for rigorous retrospective data 
+    harmonization (Fortier I et al. (2017) <doi:10.1093/ije/dyw075>).
+License: GPL-3
+LazyData: true
+Depends: R (>= 3.4)
+Imports: dplyr (>= 1.1.0), rlang, stringr, tidyr, crayon, haven, utils,
+        fs, fabR (>= 2.0.0), madshapR
+Suggests: janitor, car, knitr
+URL: https://github.com/maelstrom-research/Rmonize/
+BugReports: https://github.com/maelstrom-research/Rmonize/issues
+RoxygenNote: 7.2.3
+VignetteBuilder: knitr
+Encoding: UTF-8
+Language: en-US
+NeedsCompilation: no
+Packaged: 2023-12-20 16:52:38 UTC; guill
+Author: Guillaume Fabre [aut, cre] (<https://orcid.org/0000-0002-0124-9970>),
+  Maelstrom-research group [fnd]
+Repository: CRAN
+Date/Publication: 2023-12-21 16:30:04 UTC
diff --git a/MD5 b/MD5
@@ -0,0 +1,62 @@
+bf50bea9d1cc9129fb706d303b689576 *DESCRIPTION
+0ab457c3e5eec24233535e292f660a49 *NAMESPACE
+fad0853e7a8f13c7f4672c15ccfda2b3 *NEWS.md
+41991d11778dd8e6ae86217d81b48ba3 *R/00-import-from-madshapR.R
+53c13717949302761598bdcdb2bb32a8 *R/01-utils.R
+3a365a666412a54f452bfdfde1b89902 *R/02-harmo_process_harmonization.R
+5da1cd4bc06bbcec23a45962d41680c7 *R/03-harmonized_data_evaluate.R
+0c4fbf469dab718bd61dd3ed475a4078 *R/04-harmonized_data_summarize.R
+258b557a71aa07abc81d5fb7fe916492 *R/05-harmonized_data_visualize.R
+1b54137d4ae82b40a266122de6d1201b *R/Rmonize-package.R
+30473863dc61a6763ab0e3fc9a4414a1 *README.md
+2836b2c5db38c9819380bd093383d89d *build/partial.rdb
+ff70bbda47a299185435e70394cd80f0 *build/vignette.rds
+c55d2d23abff5dca755002de2fd59a6e *data/Rmonize_DEMO.rda
+c1b9c0d0eac32d433fc090109ef1cabe *inst/WORDLIST
+cf0d88805ee1cb048eb11ff1bd88956a *inst/doc/a-Glossary-and-templates.R
+651e86fc34c5a870430596950bac697e *inst/doc/a-Glossary-and-templates.Rmd
+215a19ff1d1ac9ca2e399b2da3df585e *inst/doc/a-Glossary-and-templates.html
+cf0d88805ee1cb048eb11ff1bd88956a *inst/doc/b-Data-processing-elements.R
+dd565779c4eed1cb36ed7bfd60a720e5 *inst/doc/b-Data-processing-elements.Rmd
+0c53ffa75047c9c01a885dbda5da3a19 *inst/doc/b-Data-processing-elements.html
+a4b22c9cec9c2b67460ed40ef757f5af *inst/doc/c-Example-with-Rmonize_DEMO.R
+8c0f211c628d343e5a8f7008cd30491d *inst/doc/c-Example-with-Rmonize_DEMO.Rmd
+06c705ce33467d188b6221e8d77a61db *inst/doc/c-Example-with-Rmonize_DEMO.html
+564666a2a16494478d205ec1f55f31c7 *man/Rmonize-package.Rd
+7fda5bfa7a3d1183f1d8212f421320e9 *man/Rmonize_DEMO.Rd
+c8f7a2640b2e456dcb742a5213a32bd9 *man/Rmonize_help.Rd
+7727ce6a976e9679ed88af24c5d07828 *man/Rmonize_templates.Rd
+d259ad678473874c4306ab68b0872d02 *man/as_data_dict.Rd
+d88ef7e030e090e5df435a0e3672964f *man/as_data_proc_elem.Rd
+b70d6c025e6d08337f6c39a9ab27abba *man/as_dataschema.Rd
+f63f4dec4d027dc32e0fe3c521319737 *man/as_dataschema_mlstr.Rd
+1ab90b88583c61e96914c75fdfb7053e *man/as_dataset.Rd
+4bf4167df0feec0728f471ecfd97e42f *man/as_dossier.Rd
+e157b56ffbab1279572c9b92dda3db94 *man/as_harmonized_dossier.Rd
+5478e2862ba5ad3a8f0f858b6eb5957a *man/bookdown_open.Rd
+553ddc568949babb77aca23a6ae9dd51 *man/data_dict_apply.Rd
+8cc40b23ce5830c623bf02ac4dbdcc98 *man/data_dict_evaluate.Rd
+945e834191fdae7aa32c07fe1797602e *man/data_dict_extract.Rd
+10f1594c21afd01fa8dc8173ef1e5ae5 *man/dataschema_evaluate.Rd
+c740bef912c858af98bf0a2abb5e37df *man/dataschema_extract.Rd
+04b898c19d70dfc9cdfe4fbee31f8f75 *man/dataset_evaluate.Rd
+74c32e28ebd360e8e8c6b1d04c2557f2 *man/dataset_summarize.Rd
+5624151e4884ae4824b7baaf992fd42b *man/dataset_visualize.Rd
+2fad6730b0858d3791518a4a9b862425 *man/dossier_create.Rd
+fbb667e37efc99716e8d9289c81e28d6 *man/dossier_evaluate.Rd
+847635a7d77f99a9a4c055ad20d6a6df *man/dossier_summarize.Rd
+5a7d2982c964337268173b8fd137dd80 *man/figures/fig_readme.png
+ce3dc6dab597db8cc0b7aa7891eed2b1 *man/harmo_process.Rd
+f84312d6c9b24d1c763c178a7b76bb1c *man/harmonized_dossier_evaluate.Rd
+fff6cc8f04ecadf214ed21de9a4cb620 *man/harmonized_dossier_summarize.Rd
+807a1748f5e3c74e8826745bba8f7f67 *man/harmonized_dossier_visualize.Rd
+0d9c7f49ca27b0350761f55ecf5bd6cd *man/is_data_proc_elem.Rd
+400c8b84468856d30218f9421463ad65 *man/is_dataschema.Rd
+eda8fb22078d83d58e0622cd34e4c470 *man/is_dataschema_mlstr.Rd
+c6f092b26dcf223382e55fa8801cf3b7 *man/pooled_harmonized_dataset_create.Rd
+0d4d180b940b2f09ff608528e539a9fa *man/reexports.Rd
+e5455210434ca6b0035b1ddb29c135b6 *man/show_harmo_error.Rd
+651e86fc34c5a870430596950bac697e *vignettes/a-Glossary-and-templates.Rmd
+dd565779c4eed1cb36ed7bfd60a720e5 *vignettes/b-Data-processing-elements.Rmd
+8c0f211c628d343e5a8f7008cd30491d *vignettes/c-Example-with-Rmonize_DEMO.Rmd
+df77331e292754e7d823c16b5c16cf9e *vignettes/datatables.R
diff --git a/NAMESPACE b/NAMESPACE
@@ -0,0 +1,74 @@
+# Generated by roxygen2: do not edit by hand
+
+export(Rmonize_help)
+export(Rmonize_templates)
+export(as_data_dict)
+export(as_data_proc_elem)
+export(as_dataschema)
+export(as_dataschema_mlstr)
+export(as_dataset)
+export(as_dossier)
+export(as_harmonized_dossier)
+export(bookdown_open)
+export(data_dict_apply)
+export(data_dict_evaluate)
+export(data_dict_extract)
+export(dataschema_evaluate)
+export(dataschema_extract)
+export(dataset_evaluate)
+export(dataset_summarize)
+export(dataset_visualize)
+export(dossier_create)
+export(dossier_evaluate)
+export(dossier_summarize)
+export(harmo_process)
+export(harmonized_dossier_evaluate)
+export(harmonized_dossier_summarize)
+export(harmonized_dossier_visualize)
+export(is_data_proc_elem)
+export(is_dataschema)
+export(is_dataschema_mlstr)
+export(pooled_harmonized_dataset_create)
+export(show_harmo_error)
+import(dplyr)
+import(fabR)
+import(fs)
+import(haven)
+import(stringr)
+import(tidyr)
+importFrom(crayon,bold)
+importFrom(crayon,green)
+importFrom(madshapR,as_category)
+importFrom(madshapR,as_data_dict)
+importFrom(madshapR,as_data_dict_mlstr)
+importFrom(madshapR,as_dataset)
+importFrom(madshapR,as_dossier)
+importFrom(madshapR,as_taxonomy)
+importFrom(madshapR,as_valueType)
+importFrom(madshapR,bookdown_open)
+importFrom(madshapR,col_id)
+importFrom(madshapR,data_dict_apply)
+importFrom(madshapR,data_dict_evaluate)
+importFrom(madshapR,data_dict_extract)
+importFrom(madshapR,data_dict_filter)
+importFrom(madshapR,data_extract)
+importFrom(madshapR,dataset_evaluate)
+importFrom(madshapR,dataset_summarize)
+importFrom(madshapR,dataset_visualize)
+importFrom(madshapR,dataset_zap_data_dict)
+importFrom(madshapR,dossier_create)
+importFrom(madshapR,dossier_evaluate)
+importFrom(madshapR,dossier_summarize)
+importFrom(madshapR,is_category)
+importFrom(madshapR,is_data_dict)
+importFrom(madshapR,is_data_dict_mlstr)
+importFrom(madshapR,is_dataset)
+importFrom(madshapR,is_dossier)
+importFrom(madshapR,is_taxonomy)
+importFrom(madshapR,valueType_adjust)
+importFrom(rlang,":=")
+importFrom(rlang,.data)
+importFrom(rlang,is_error)
+importFrom(rlang,is_warning)
+importFrom(utils,browseURL)
+importFrom(utils,capture.output)
diff --git a/NEWS.md b/NEWS.md
@@ -0,0 +1,152 @@
+
+# Rmonize 1.0.1
+
+Bug fixes and enhancements after testing with real data.
+
+## Bug fixes and improvements
+
+### Improvement in handling pooled data
+
+The functions `harmo_process()`, `pooled_harmonized_dataset_create()`,
+`harmonized_dossier_create()`, `harmonized_dossier_evaluate()`,
+`harmonized_dossier_summarize()`, and `harmonized_dossier_visualize()`
+share the same parameter `harmonized_col_dataset`, which is the name of
+the column (if it exists) that identifies the input dataset names. If
+this column exists and is declared by the user, it is used across the
+pipeline as a grouping/separating variable. By default, the name of each
+dataset is used instead.
+
+Rename `DEMO_file_harmo` to `Rmonize_DEMO` and update examples.
+
+Remove the parameter `overwrite = TRUE` from the `xxx_visualize()` functions.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/38>
+
+In visual reports, avoid confusing changes in the color
+scheme.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/37>
+
+Histograms for date variables display valid ranges.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/31>
+
+In reports, % NA is now shown as a proportion.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/29>
+
+`harmonized_dossier_visualize()` report shows variable labels in the
+same language.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/28>
+
+Put `id_creation` in the script and in the rule in the DPE (as in `direct_mapping`).
+
+- <https://github.com/maelstrom-research/Rmonize/issues/27>
+
+Allow special characters in names of datasets and data_dicts
+
+- <https://github.com/maelstrom-research/Rmonize/issues/23>
+
+In visual reports, the bar plot only appears when there are multiple
+missing value types, otherwise only the pie chart is shown.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/22>
+
+Enhance `harmonized_dossier_visualize()` output.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/17>
+
+Enhance `show_harmo_error()` output.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/5>
+
+In reports, all of the percentages are now included under “Other values
+(non categorical)”, which gives a single value.
+
+- <https://github.com/maelstrom-research/Rmonize/issues/4>
+
+Recoding with special characters is now possible.
+
+# Rmonize 1.0.0
+
+Functions to support rigorous retrospective data harmonization
+processing, evaluation, and documentation across datasets in a dossier
+based on Maelstrom Research guidelines. The package includes the core
+functions to evaluate and format the main inputs that define the
+harmonization process, apply specified processing rules to generate
+harmonized data, diagnose processing errors, and summarize and evaluate
+harmonized outputs.
+
+This is still a work in progress, so please let us know if a function
+you used before no longer works.
+
+## Helper functions and objects
+
+- `Rmonize_help()` Call the help center for full documentation
+- `dowload_templates()` Open the template download page of the help
+  center
+- `Rmonize_DEMO` Built-in material allowing the user to test the package
+  with demo data
+
+## Assess and manipulate input files
+
+- `as_data_proc_elem()` Validate and coerce any object as Data
+  Processing Elements
+- `as_dataschema()`, `as_dataschema_mlstr()` Validate and coerce any
+  object as the DataSchema
+- `as_harmonized_dossier()` Validate and coerce any object as a
+  harmonized dossier
+- `dataschema_extract()` Extract and create the DataSchema from
+  Data Processing Elements
+
+## Data processing
+
+- `harmo_process()` Generate harmonized dataset(s) and annotated Data
+  Processing Elements. This function internally runs the following
+  functions:
+
+- `harmo_parse_process_rule()`,
+  `harmo_process_add_variable()`,`harmo_process_case_when()`,
+  `harmo_process_direct_mapping()`,`harmo_process_id_creation()`,
+  `harmo_process_impossible()`,`harmo_process_merge_variable()`,
+  `harmo_process_operation()`,`harmo_process_other()`,
+  `harmo_process_paste()`,`harmo_process_recode()`,
+  `harmo_process_rename()`,`harmo_process_undetermined()`
+
+- `pooled_harmonized_dataset_create()` Generate the pooled dataset from
+  harmonized datasets in a dossier
+
+## Evaluation of the harmonization process
+
+- `show_harmo_error()` Generate a summary of the annotated Data
+  Processing Elements
+- `data_proc_elem_evaluate()`,`dataschema_evaluate()`,
+  `harmonized_dossier_evaluate()`,`harmonized_dossier_summarize()`,
+  `harmonized_dossier_visualize()` Generate quality assessment reports
+  and summary statistics of inputs and outputs.
+
+## Imported from the madshapR package
+
+- Shape and prepare input (datasets and data dictionaries) :
+
+`as_data_dict()`,`is_data_dict()`,
+`as_data_dict_mlstr()`,`is_data_dict_mlstr()`,
+`as_dataset()`,`is_dataset()`, `as_dossier()`,`is_dossier()`,
+`as_taxonomy()`
+
+- Extract and manipulate information from input :
+
+`data_extract()`,`data_dict_extract()`,
+`data_dict_apply()`,`dataset_zap_data_dict()`,`dossier_create()`,
+`valueType_adjust()`
+
+- Assess input data :
+
+`dataset_evaluate()`, `data_dict_evaluate()`,`dossier_evaluate()`,
+`dataset_summarize()`,`dossier_summarize()`
+
+- Visualize input data :
+
+`bookdown_template()`,`bookdown_render()`,`bookdown_open()`,
+`dataset_visualize()`