remice

Have your collaborators been remiss in their data collection?

Have you found stray characters in your numeric columns?

Do rows ever get incorrectly entered or swapped?

The overarching goal of the remice package is to account for remiss data by providing ways to identify and resolve these problems.

Installation

You can install the development version of remice like so:

# install.packages("devtools")
devtools::install_github("erictong2/remice")

If you prefer, you can also go to the GitHub website (https://github.com/EricTong2/remice/) and download the package from there.

Function List

These are the functions included in the package:

  • analyze_type() takes a column of characters and determines whether any values in that column are not of a specified type. It does not fix the types itself. Instead, it creates a table highlighting the rows that could cause errors in other functions, so that those rows can be fixed manually.

  • table_different() takes a column of characters and creates a table of strings that appear only rarely in the dataset. It is aimed at catching typos introduced during data entry or in survey responses. For example, if all of the column values are “True” except for a single “Ture”, a typo has likely occurred. This function does not fix any problems, but it highlights the row numbers where the data should be checked.

  • separate_list() takes a column of characters and creates a table of the most common responses when multiple responses could exist in one row. For example, if two of the responses are “Apple, Banana, and Orange” and “Banana, Orange”, this would output a table of “Apple: 1, Banana: 2, Orange: 2.” This can be used, for example, on a survey about comfort foods.

  • plot_outliers() takes multiple columns of numeric data that were collected longitudinally and checks for abnormally large changes in the data. This can be used to check for potential outliers or for values that were entered incorrectly: for example, recording “11” grams instead of “21” grams. The user supplies a percent change, and any points whose change between any two points exceeds that value are plotted in a graph.
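The text checks described above can be sketched in base R without the package. The helper names below (find_non_numeric, find_rare_strings, count_list_items) are hypothetical and are not remice functions. They only illustrate the ideas behind analyze_type(), table_different(), and separate_list() under simple assumptions, such as splitting lists on commas and the word "and".

```r
# Hypothetical helpers (NOT part of remice) sketching the text checks in base R.

# Row numbers of non-missing values that cannot be parsed as numbers.
find_non_numeric <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  which(is.na(parsed) & !is.na(x))
}

# Rows whose string occurs at most `max_count` times in the column.
find_rare_strings <- function(x, max_count = 1) {
  counts <- table(x)
  rare <- names(counts)[counts <= max_count]
  idx <- which(x %in% rare)
  data.frame(row = idx, string = x[idx], stringsAsFactors = FALSE)
}

# Split multi-item responses on commas and "and", then count each item.
count_list_items <- function(x) {
  pieces <- unlist(strsplit(x, ",\\s*(and\\s+)?|\\s+and\\s+", perl = TRUE))
  pieces <- trimws(pieces)
  pieces <- pieces[nzchar(pieces)]
  sort(table(pieces), decreasing = TRUE)
}

# Made-up inputs mirroring the examples in the function list.
weights <- c("21", "19.5", "abc", "22")
find_non_numeric(weights)      # row 3 holds "abc"

answers <- c("True", "True", "Ture", "True")
find_rare_strings(answers)     # row 3 holds "Ture"

foods <- c("Apple, Banana, and Orange", "Banana, Orange")
count_list_items(foods)        # Apple 1, Banana 2, Orange 2
```

Like the package functions, these sketches only report where problems are; they do not modify the data.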

Data List

  • food_data is a survey of 125 individuals from Mercyhurst University. There are three columns, all with 125 character entries:
    • comfort_food is a list of students’ comfort foods
    • food_childhood is a list of students’ childhood foods
    • meals_dinner_friend is a list of foods eaten when friends came over
  • mouse_data_birth is a dataset from 32 mice that has four columns, describing the mouse ID, Sex, Number, and the Treatment type
  • mouse_data_bw_dirty is a dataset from the same 32 mice that has seven columns. The first is the same ID as in mouse_data_birth. The other six cover three collection dates, alternating between the Body Weight of the mouse and the Date on which that Body Weight was collected
  • mouse_data_bw_clean provides the same information as mouse_data_bw_dirty, but with the data cleaned, for use in function examples.

Example

This is an example of separate_list() using the comfort_food column from food_data:

library(remice)

separate_list(food_data, "comfort_food", 5)
Separated Word Frequencies

Word          Frequency
ice cream     45
pizza         37
chips         26
chocolate     25
cookies       17
cheese        16
mac           11
pasta         9
cake          7
candy         7
french fries  7
popcorn       7
pretzels      5

This is an example of table_different() using the Treatment column from the mouse_data_birth dataset:

library(remice)

table_different(mouse_data_birth, "Treatment", 5)
Inconsistent Strings

Row Number  String
11          Placobo
20          Trmt
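The percent-change check described for plot_outliers() can also be illustrated in base R. The helper below, flag_large_changes, is hypothetical and is not the package's implementation. It assumes that rows are subjects, that columns are measurements in time order, and that change is measured between consecutive columns relative to the earlier value. The body weights are made up for illustration.

```r
# Hypothetical helper (NOT part of remice): flag large changes between
# consecutive measurements. Rows are subjects, columns are time points in order.
flag_large_changes <- function(values, threshold_pct) {
  values <- as.matrix(values)
  n_time <- ncol(values)
  before <- values[, -n_time, drop = FALSE]  # every column except the last
  after <- values[, -1, drop = FALSE]        # every column except the first
  pct_change <- abs(after - before) / abs(before) * 100
  hits <- which(pct_change > threshold_pct, arr.ind = TRUE)
  data.frame(
    row = unname(hits[, "row"]),
    from_col = unname(hits[, "col"]),
    to_col = unname(hits[, "col"]) + 1L,
    pct_change = round(pct_change[hits], 1)
  )
}

# Made-up body weights in grams for three animals at three visits.
# The second animal has "11" recorded where "21" was likely intended.
bw <- data.frame(
  visit1 = c(20, 21, 19),
  visit2 = c(21, 11, 20),
  visit3 = c(22, 22, 21)
)
flag_large_changes(bw, threshold_pct = 25)
```

With a 25 percent threshold, only the second animal is flagged, once for the drop into the suspicious value and once for the rebound out of it.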
