autostudy - clean data for swift analysis in R
The autostudy R package validates files provided in the expected format and helps with their analysis.
As an epidemiologist in a French public hospital, part of my work is to help physicians with their studies. This can be quite time-consuming, as the data is almost always in the worst possible shape and data management is a pain. The analyses themselves, on the other hand, are often simple.
This package is based on the assumption that you asked the data to be handed to you in a predefined format. The package will automatically validate the data and output logs to help the clinician to put the data in the right shape.
A template for data gathering and an explanatory document have to be provided to the clinician.
The convert_to_ functions are wrappers around common data management functions. They maximise the amount of data imported and remove some unauthorized values.
The converted df is then checked against the original one to find the new NAs, i.e. the values lost during conversion.
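The conversion-then-comparison idea can be sketched as follows. This is a minimal illustration, not the package's actual code: the function names `convert_to_numeric_sketch` and `find_new_nas`, the decimal-comma handling, and the `"conversion_failed"` label are all assumptions made for the example. The output columns follow the "table", "column", "line", "error_type" layout mentioned in the notes below.

```r
# Hypothetical helper: name and behaviour are assumptions, not the package API.
convert_to_numeric_sketch <- function(x) {
  x <- trimws(as.character(x))
  x <- gsub(",", ".", x, fixed = TRUE)  # assumption: accept decimal commas
  suppressWarnings(as.numeric(x))       # values that cannot be parsed become NA
}

# Report the lines where conversion created an NA that was not already
# missing (NA or empty string) in the original data.
find_new_nas <- function(original, converted, table, column) {
  was_missing <- is.na(original) | trimws(as.character(original)) == ""
  new_na <- which(is.na(converted) & !was_missing)
  data.frame(
    table = rep(table, length(new_na)),
    column = rep(column, length(new_na)),
    line = new_na,
    error_type = rep("conversion_failed", length(new_na)),
    stringsAsFactors = FALSE
  )
}

raw <- c("12,5", "13", "abc", NA, " 7 ")
converted <- convert_to_numeric_sketch(raw)
log <- find_new_nas(raw, converted, table = "patients", column = "weight")
print(log)
```

Only the third value ("abc") is reported here: the fourth was already missing in the raw data, so it does not count as a new NA.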
- a df with "table", "column", "line", "error_type"
- try to make it as clear and concise as possible
- maybe output whole line with errors and outline the errors themselves
- Make some faulty files to add the error handling
- in the template:
- impose the data types
- same for the date format: separate the info into categories and date_format
- see if I can protect the rows of the var dict
- add a "pretty name variable" (for plot labels and tables)
- make a mapping table of "var_type" vs "r type" (many to one)
- just say where it went wrong in the data import
- afterwards, look for outliers / produce a report for validate (descriptor)
- document functions
- when the data format is finished, provide the template and the explanatory document
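As an illustration of the planned "var_type" vs "r type" mapping, here is one possible sketch in base R. Everything in it is hypothetical: the `var_type` labels, the column names of the variable dictionary, and the example variables are assumptions chosen for the example, since the final data format is not yet settled.

```r
# Hypothetical many-to-one mapping: several var_type labels share one R type.
type_map <- data.frame(
  var_type = c("integer", "decimal", "category", "short_text", "long_text", "date"),
  r_type   = c("numeric", "numeric", "factor", "character", "character", "Date"),
  stringsAsFactors = FALSE
)

# Hypothetical variable dictionary, including a "pretty name" for plot labels.
var_dict <- data.frame(
  variable    = c("age", "weight", "sex", "inclusion_date"),
  var_type    = c("integer", "decimal", "category", "date"),
  pretty_name = c("Age (years)", "Weight (kg)", "Sex", "Inclusion date"),
  stringsAsFactors = FALSE
)

# Look up the R type of each variable; unknown var_type labels yield NA.
var_dict$r_type <- type_map$r_type[match(var_dict$var_type, type_map$var_type)]

# Variables whose declared var_type is not in the mapping table.
unknown <- var_dict$variable[is.na(var_dict$r_type)]

print(var_dict)
```

Keeping the mapping in a table rather than in code means a new `var_type` label can be added without touching the conversion functions, and a non-empty `unknown` gives a direct message to report back to the clinician.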