This repository was archived by the owner on Mar 28, 2025. It is now read-only.
Home
echang97 edited this page Jul 18, 2019
This is the wiki page for mapping out the Data Quality scripts
- Compare two Excel files to determine if data was added, changed, or deleted
- Determine if an Excel file follows its predefined format
- Check if numbers are in line with older records
- Press Tab to auto-complete file and folder names and reduce the amount of typing
- "../" refers to the parent folder (goes up one level)
- Put the path in quotes if it contains white space
cd Documents/GitHub/Data-Quality-Checker/scripts
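To see why quoting matters, here is a small sketch using Python's standard `shlex` module, which splits a command line roughly the way a POSIX shell does. The command strings reuse the setup example from this page. A real shell may additionally reject the unquoted parentheses, so treat this as an illustration of argument splitting only.

```python
import shlex

# Unquoted: the path is split at the space, so the script would
# receive two separate (and wrong) file arguments.
unquoted = shlex.split(
    "python formatcheck.py setup ../files/monthly_revenue_05-2019 (1).xlsx"
)

# Quoted: the whole path arrives as a single argument.
quoted = shlex.split(
    "python formatcheck.py setup '../files/monthly_revenue_05-2019 (1).xlsx'"
)

print(len(unquoted), unquoted[-1])  # 5 (1).xlsx
print(len(quoted), quoted[-1])      # 4 ../files/monthly_revenue_05-2019 (1).xlsx
```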
- Credit to Matthew Kudija for the Source Code
- Highlights differences between two Excel files
- Exports file with highlighted differences
Running Excel Diff through Terminal: python diff.py old.xlsx new.xlsx
python diff.py ../files/monthly_production_05-2019.xlsx ../files/monthly_production_06-2019.xlsx
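The idea behind the comparison can be sketched in plain Python. This is not the code in diff.py: it assumes each sheet has already been read into a list of rows, and that the first column is a unique key. The function name `diff_rows` and the sample data are hypothetical.

```python
from itertools import zip_longest

def diff_rows(old_rows, new_rows, key_index=0):
    """Classify rows as added, deleted, or changed, keyed on one column."""
    old = {row[key_index]: row for row in old_rows}
    new = {row[key_index]: row for row in new_rows}

    added = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]

    changed = {}
    for k in old:
        if k in new and old[k] != new[k]:
            # Record (column index, old value, new value) for each differing cell.
            changed[k] = [
                (i, a, b)
                for i, (a, b) in enumerate(zip_longest(old[k], new[k]))
                if a != b
            ]
    return added, deleted, changed

# Hypothetical sample data standing in for two monthly files.
old_rows = [["W-001", "Oil", 120], ["W-002", "Gas", 75], ["W-003", "Oil", 40]]
new_rows = [["W-001", "Oil", 125], ["W-003", "Oil", 40], ["W-004", "Gas", 60]]

added, deleted, changed = diff_rows(old_rows, new_rows)
print(added)    # [['W-004', 'Gas', 60]]
print(deleted)  # [['W-002', 'Gas', 75]]
print(changed)  # {'W-001': [(2, 120, 125)]}
```

The `changed` entries are the cells a diff tool would highlight in the exported file.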
- A Python script that accepts an Excel file
- Creates formats based on sample Excel files
- Checks the given Excel file for:
  - New or missing field names
  - Unexpected or missing field entries
  - Unexpected units of measurement or new items
  - Number of withheld rows
Setup only needs to be run once per format. After that, edit the JSON file by hand when new fields or entries need to be added.
Running Setup through Terminal: python formatcheck.py setup filename.xlsx
python formatcheck.py setup '../files/monthly_revenue_05-2019 (1).xlsx'
Running Format Check through Terminal: python formatcheck.py filename.xlsx
python formatcheck.py ../files/monthly_revenue_05-2019.xlsx
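The setup-then-check workflow can be sketched as follows. This is an illustration under assumptions, not the logic of formatcheck.py: the JSON layout (field name mapped to a list of allowed entries), the function names `build_format` and `check_format`, and the sample data are all hypothetical. It covers only new or missing field names and unexpected entries.

```python
import json

def build_format(header, rows):
    """Setup step: record each field name and the entries seen under it."""
    return {
        name: sorted({str(row[i]) for row in rows})
        for i, name in enumerate(header)
    }

def check_format(fmt, header, rows):
    """Check step: compare a file's header and entries against a stored format."""
    report = {
        "new_fields": [h for h in header if h not in fmt],
        "missing_fields": [f for f in fmt if f not in header],
        "unexpected_entries": {},
    }
    for i, name in enumerate(header):
        if name not in fmt:
            continue
        seen = {str(row[i]) for row in rows}
        extra = sorted(seen - set(fmt[name]))
        if extra:
            report["unexpected_entries"][name] = extra
    return report

# Setup from a hypothetical sample file, serialized the way a JSON config would be.
config_text = json.dumps(
    build_format(["Product", "Unit"], [["Oil", "bbl"], ["Gas", "mcf"]])
)

# Later: load the config and check a new file against it.
fmt = json.loads(config_text)
report = check_format(
    fmt,
    ["Product", "Unit", "Region"],
    [["Oil", "bbl", "North"], ["Gas", "m3", "South"]],
)
print(report)
```

Here the report flags `Region` as a new field and `m3` as an unexpected entry under `Unit`. Adding `"m3"` to the JSON by hand would make it an accepted unit on the next run.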
- Similar to Format Checker
- Exports an Excel File highlighting anomalies
- Can use same config files as formatcheck.py
Running Highlighter through Terminal: python highlighter.py "filename.xlsx"
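As a rough sketch of what highlighting anomalies involves, the snippet below finds the Excel-style references of cells whose value is not in an allowed list. It is not the code in highlighter.py: the `allowed` mapping, the function names, and the sample rows are hypothetical, and writing the highlighted workbook is left out.

```python
def column_letter(index):
    """Convert a 0-based column index to an Excel column letter (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def cells_to_highlight(header, rows, allowed):
    """Return references of cells whose value is not allowed for its field."""
    refs = []
    for r, row in enumerate(rows, start=2):  # row 1 holds the header
        for c, name in enumerate(header):
            if name in allowed and str(row[c]) not in allowed[name]:
                refs.append(f"{column_letter(c)}{r}")
    return refs

# Hypothetical data: "m3" is not an allowed unit, so cell B3 is flagged.
header = ["Product", "Unit"]
rows = [["Oil", "bbl"], ["Gas", "m3"]]
allowed = {"Unit": ["bbl", "mcf"]}

refs = cells_to_highlight(header, rows, allowed)
print(refs)  # ['B3']
```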