Skip to content
This repository was archived by the owner on Mar 28, 2025. It is now read-only.
echang97 edited this page Jul 25, 2019 · 60 revisions

This is the wiki page for mapping out the Data Quality scripts

Goals

  • Compare two Excel files to determine if data was added, changed, or deleted
  • Determine if an Excel file follows its predefined format
  • Check if numbers are in line with older records

What Things Do and How to Run

Notes for Running:

  • Use tab for auto-completing words to reduce the amount of typing
  • "../" goes back a folder
  • Put the path in quotes if white space is present

Be sure to change the directory to where the script is located using cd

cd Documents/GitHub/Data-Quality-Checker/scripts

Excel Diff

  • Credit to Matthew Kudija for the Source Code
  • Highlights differences between two Excel files
  • Exports file with highlighted differences

Running Excel Diff through Terminal: python diff.py old.xlsx new.xlsx

python diff.py ../files/monthly_production_05-2019 ../files/monthly_production_06-2019.xlsx

Format Checker

  • A Python script that accepts an Excel file
  • Creates formats based on sample Excel files
  • Checks given Excel file for:
    • New or Missing Field Names
    • Unexpected or missing Field Entries
    • Unexpected Units of measurement or New items
    • Number of Withheld rows

Setup only needs to be run once per format, edit the json file when adding new things

Running Setup through Terminal: python formatcheck.py setup filename.xlsx

python formatcheck.py setup '../files/monthly_revenue_05-2019 (1).xlsx'

Running Format Check through Terminal: python formatcheck.py filename.xlsx

python formatcheck.py ../files/monthly_revenue_05-2019.xlsx

Number Checker

  • Grouping determines default thresholds
  • Editable json files for thresholds

Running Setup through Terminal: python numberchecker.py setup filename.xlsx

python numberchecker.py setup '../files/monthly_revenue_05-2019 (1).xlsx'

Running Number Check through Terminal: python numberchecker.py filename.xlsx

python numberchecker.py ../files/monthly_revenue_05-2019.xlsx

Clone this wiki locally