This repository was archived by the owner on Mar 28, 2025. It is now read-only.
Home
echang97 edited this page Jul 25, 2019
This wiki page maps out the Data Quality scripts, which:
- Compare two Excel files to determine if data was added, changed, or deleted
- Determine if an Excel file follows its predefined format
- Check if numbers are in line with older records
cd Documents/GitHub/Data-Quality-Checker/
- Credit to Matthew Kudija for the Source Code
- Highlights differences between two Excel files
- Exports file with highlighted differences
Running Excel Diff through Terminal: python diff.py old.xlsx new.xlsx
python diff.py ../files/monthly_production_05-2019.xlsx ../files/monthly_production_06-2019.xlsx
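The comparison described above (added, changed, or deleted data) can be sketched in plain Python. This is a minimal illustration, not the code in diff.py: it assumes the two sheets have already been loaded as lists of dictionaries (for example with `pandas.read_excel`), and the function name `diff_rows`, the `id` key column, and the sample values are all hypothetical.

```python
def diff_rows(old_rows, new_rows, key):
    """Classify rows as added, deleted, or changed, matching rows on `key`."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    # Keys present on only one side are additions or deletions.
    added = [k for k in new_by_key if k not in old_by_key]
    deleted = [k for k in old_by_key if k not in new_by_key]

    # For keys present on both sides, record each cell whose value differs.
    changed = {}
    for k in old_by_key.keys() & new_by_key.keys():
        old, new = old_by_key[k], new_by_key[k]
        cell_diffs = {
            col: (old.get(col), new.get(col))
            for col in sorted(set(old) | set(new))
            if old.get(col) != new.get(col)
        }
        if cell_diffs:
            changed[k] = cell_diffs
    return {"added": added, "deleted": deleted, "changed": changed}


# Hypothetical sample data standing in for two monthly files.
old_rows = [
    {"id": 1, "Volume": 100},
    {"id": 2, "Volume": 200},
    {"id": 3, "Volume": 300},
]
new_rows = [
    {"id": 1, "Volume": 100},
    {"id": 2, "Volume": 250},
    {"id": 4, "Volume": 400},
]

result = diff_rows(old_rows, new_rows, key="id")
print(result)
# {'added': [4], 'deleted': [3], 'changed': {2: {'Volume': (200, 250)}}}
```

The `changed` entries keep both the old and new value of each cell, which is the information a highlighted export would need.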
- A Python script that accepts an Excel file
- Creates formats based on sample Excel files
- Checks given Excel file for:
  - New or missing field names
  - Unexpected or missing field entries
  - Unexpected units of measurement or new items
  - Number of withheld rows
Setup only needs to be run once per file type (e.g. Monthly Production, Federal Production CY). When adding new items later, edit the JSON file.
Running Setup through Terminal: python formatcheck.py setup filename.xlsx
# Note: The following path is quoted because it contains whitespace and parentheses
python formatcheck.py setup '../files/monthly_revenue_05-2019 (1).xlsx'
Running Format Check through Terminal: python formatcheck.py filename.xlsx
python formatcheck.py ../files/monthly_revenue_05-2019.xlsx
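The setup-then-check flow above can be sketched as follows. This is an illustration under stated assumptions rather than the logic in formatcheck.py: rows are assumed to be already loaded as dictionaries, the format is assumed to be a mapping from field name to allowed entries, and the names `build_format`, `check_format`, and the `"Withheld"` marker are hypothetical.

```python
import json


def build_format(sample_rows):
    """Setup step: record every field name and the entries seen in the sample."""
    fmt = {}
    for row in sample_rows:
        for field, value in row.items():
            fmt.setdefault(field, set()).add(value)
    # Lists (not sets) so the format can be stored as JSON and edited by hand.
    return {field: sorted(values) for field, values in fmt.items()}


def check_format(rows, fmt, withheld_marker="Withheld"):
    """Check step: compare a file's rows against the stored format."""
    seen_fields = set()
    for row in rows:
        seen_fields.update(row)

    unexpected = {}
    for field in sorted(seen_fields & set(fmt)):
        extra = sorted({row[field] for row in rows if field in row} - set(fmt[field]))
        if extra:
            unexpected[field] = extra

    # Assumption: a withheld row carries the marker string in some cell.
    withheld = sum(1 for row in rows if withheld_marker in row.values())

    return {
        "new_fields": sorted(seen_fields - set(fmt)),
        "missing_fields": sorted(set(fmt) - seen_fields),
        "unexpected_entries": unexpected,
        "withheld_rows": withheld,
    }


# Hypothetical sample file used for setup.
sample_rows = [
    {"Commodity": "Oil", "Unit": "bbl"},
    {"Commodity": "Gas", "Unit": "mcf"},
]
# Round-trip through JSON to mimic saving and reloading the format file.
fmt = json.loads(json.dumps(build_format(sample_rows)))

# Hypothetical file to check: "Unit" is gone, "State" is new, "Coal" is unknown.
rows = [
    {"Commodity": "Oil", "State": "TX"},
    {"Commodity": "Coal", "State": "Withheld"},
]
report = check_format(rows, fmt)
print(json.dumps(report, indent=2))
```

Storing the format as plain lists keeps the JSON file easy to edit by hand when a new entry becomes valid.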
- Grouping determines default thresholds
- Editable JSON files for thresholds
Running Setup through Terminal: python numberchecker.py setup filename.xlsx
python numberchecker.py setup '../files/monthly_revenue_05-2019 (1).xlsx'
Running Number Check through Terminal: python numberchecker.py filename.xlsx
python numberchecker.py ../files/monthly_revenue_05-2019.xlsx
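One way to picture "grouping determines default thresholds" is sketched below. The rule used here (group mean plus or minus three population standard deviations) is an assumption chosen for illustration, not necessarily what numberchecker.py computes. The names `build_thresholds`, `check_numbers`, and the sample fields are hypothetical.

```python
import json
import statistics


def build_thresholds(history, group_field, value_field, n_sigma=3.0):
    """Setup step: derive default min/max limits for each group of older records."""
    groups = {}
    for row in history:
        groups.setdefault(row[group_field], []).append(row[value_field])

    thresholds = {}
    for name, values in groups.items():
        mean = statistics.mean(values)
        spread = statistics.pstdev(values)  # population standard deviation
        thresholds[name] = {
            "min": mean - n_sigma * spread,
            "max": mean + n_sigma * spread,
        }
    return thresholds


def check_numbers(rows, thresholds, group_field, value_field):
    """Check step: return the rows whose value falls outside its group's limits."""
    flagged = []
    for row in rows:
        limits = thresholds.get(row[group_field])
        if limits is None:
            continue  # unknown group: no older records to compare against
        if not limits["min"] <= row[value_field] <= limits["max"]:
            flagged.append(row)
    return flagged


# Hypothetical older records.
history = [
    {"Commodity": "Oil", "Volume": 100},
    {"Commodity": "Oil", "Volume": 110},
    {"Commodity": "Oil", "Volume": 90},
]
# Round-trip through JSON to mimic an editable thresholds file.
thresholds = json.loads(json.dumps(build_thresholds(history, "Commodity", "Volume")))

new_rows = [
    {"Commodity": "Oil", "Volume": 105},
    {"Commodity": "Oil", "Volume": 500},
]
flagged = check_numbers(new_rows, thresholds, "Commodity", "Volume")
print(flagged)
# [{'Commodity': 'Oil', 'Volume': 500}]
```

Because the limits live in a plain dictionary, a default that proves too strict or too loose can be overridden by editing the stored `min` and `max` values.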