This repository was archived by the owner on Mar 28, 2025. It is now read-only.
Home
echang97 edited this page Jun 27, 2019
This is the wiki page for mapping out the Data Quality scripts
- Compare two Excel files to determine if data was added and/or deleted
- Determine if an Excel file follows its predefined format
- Check if numbers are in line with older records
Maybe make a GUI to make it more user-friendly?
cd Documents/GitHub/Data-Quality-Checker/scripts
- Credit to Matthew Kudija for the Source Code
- Highlights differences between two Excel files
- Exports file with highlighted differences
Running Excel Diff through Terminal: python diff.py old.xlsx new.xlsx
python diff.py ../files/monthly_production_05-2019.xlsx ../files/monthly_production_06-2019.xlsx
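To illustrate the kind of comparison Excel Diff performs, here is a minimal sketch of a row-by-row diff. It is not the code in diff.py: the function name `diff_tables`, the `key_index` parameter, and the sample rows are all assumptions made for illustration, and the rows are plain Python lists rather than sheets loaded from an .xlsx file.

```python
def diff_tables(old_rows, new_rows, key_index=0):
    """Compare two tables (lists of row lists), matching rows on one key column.

    Hypothetical helper for illustration; not taken from diff.py.
    """
    old = {row[key_index]: row for row in old_rows}
    new = {row[key_index]: row for row in new_rows}

    # Keys present in only one of the two tables.
    added = [key for key in new if key not in old]
    removed = [key for key in old if key not in new]

    # For keys present in both, record each cell whose value differs.
    changed = []
    for key in old:
        if key not in new:
            continue
        for col, (before, after) in enumerate(zip(old[key], new[key])):
            if before != after:
                changed.append((key, col, before, after))

    return {"added": added, "removed": removed, "changed": changed}


# Made-up sample data standing in for the old and new sheets.
old_rows = [["Well A", 100, "bbl"], ["Well B", 250, "bbl"]]
new_rows = [["Well A", 120, "bbl"], ["Well C", 90, "bbl"]]
result = diff_tables(old_rows, new_rows)
print(result)
```

The `changed` entries are the cells a diff tool would highlight in the exported file; `added` and `removed` correspond to rows that appear in only one of the two files.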
- A Python script that accepts an Excel file
- Creates formats based on sample Excel files
- Checks given Excel file for:
  - New or Missing Fields
  - Unexpected Units of measurement or New items
  - Number of Withheld rows
Running Setup through Terminal: python formatcheck.py setup filename.xlsx
python formatcheck.py setup '../files/monthly_revenue_05-2019 (1).xlsx'
Running Format Check through Terminal: python formatcheck.py filename.xlsx
python formatcheck.py ../files/monthly_revenue_05-2019.xlsx
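The setup and check steps can be sketched roughly as follows. This is an assumption-laden illustration, not the logic of formatcheck.py: the functions `build_format` and `check_format`, the format dictionary layout, the `"W"` marker for withheld values, and the sample data are all hypothetical, and the sheets are represented as plain lists instead of being read from Excel.

```python
def build_format(header, rows, unit_column):
    """Setup step: record the expected fields and units from a sample table.

    Hypothetical helper; the real config format may differ.
    """
    unit_idx = header.index(unit_column)
    return {
        "fields": list(header),
        "unit_column": unit_column,
        "units": sorted({row[unit_idx] for row in rows}),
    }


def check_format(fmt, header, rows, withheld_marker="W"):
    """Check step: compare a table against a previously built format."""
    new_fields = [f for f in header if f not in fmt["fields"]]
    missing_fields = [f for f in fmt["fields"] if f not in header]

    # Units are only checked when the unit column is still present.
    unexpected_units = []
    if fmt["unit_column"] in header:
        unit_idx = header.index(fmt["unit_column"])
        seen = {row[unit_idx] for row in rows}
        unexpected_units = sorted(seen - set(fmt["units"]))

    # Assumption: a withheld row contains the marker in one of its cells.
    withheld_rows = sum(1 for row in rows if withheld_marker in row)

    return {
        "new_fields": new_fields,
        "missing_fields": missing_fields,
        "unexpected_units": unexpected_units,
        "withheld_rows": withheld_rows,
    }


# Made-up sample sheet used for setup.
fmt = build_format(
    ["Commodity", "Volume", "Unit"],
    [["Oil", 100, "bbl"], ["Gas", 500, "mcf"]],
    unit_column="Unit",
)

# Made-up sheet to check against the format.
report = check_format(
    fmt,
    ["Commodity", "Volume", "Unit", "Region"],
    [["Oil", 110, "bbl", "West"], ["Coal", "W", "tons", "East"]],
)
print(report)
```

Keeping the format as a plain dictionary means it could be written to and read from a config file between the setup run and later checks.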
- Similar to Format Checker
- Exports an Excel File highlighting anomalies
- Can use same config files as formatcheck.py
Running Highlighter through Terminal: python highlighter.py "filename.xlsx"
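One possible way to decide which numbers count as anomalies is to compare each new value with the mean and standard deviation of older records. The sketch below assumes that approach; highlighter.py may use a different rule. The function `flag_anomalies`, the `threshold` parameter, and the sample figures are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(history, current, threshold=3.0):
    """Return the names whose current value is far from their older records.

    Hypothetical helper: flags a value when it lies more than `threshold`
    sample standard deviations from the mean of its history.
    """
    flagged = []
    for name, value in current.items():
        past = history.get(name, [])
        if len(past) < 2:
            # Too little history to estimate a spread; skip this item.
            continue
        mu = mean(past)
        sigma = stdev(past)
        if sigma == 0:
            # All older values are identical, so any change stands out.
            if value != mu:
                flagged.append(name)
            continue
        if abs(value - mu) / sigma > threshold:
            flagged.append(name)
    return flagged


# Made-up older records and a new month of values.
history = {"Oil": [100, 102, 98, 101], "Gas": [500, 510, 495, 505]}
current = {"Oil": 99, "Gas": 900}
flagged = flag_anomalies(history, current)
print(flagged)
```

The flagged names are the cells a highlighter would mark in the exported file.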