A simple script that checks CSV files before they are loaded into a database.
The project uses no external libraries, which makes it easy to deploy.
Its main purpose is to check CSV files as efficiently as possible.
The script has been tested with individual files of 30 MB and larger. 300 MB is processed in about 90 seconds in the worst-case scenario. The intended use case is CSV files smaller than 6 MB. 30 files are processed in about 10 seconds.
- Date format check
- Begin (Sunday) and End (Saturday) date checks
- Figures format check (whole and decimal)
- Retailer name check against provided list (Parameters/Accounts.csv)
- Column header check against provided list (Parameters/Headers.csv)
- EAN duplicate check
- EAN check against provided DB EAN list (Parameters/Ean_list.csv)
- Product description check (must always be "ABC")
- Column count check for each row, to ensure that stray commas or other symbols can't break the load.
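
A few of the checks above can be sketched with the standard library only. This is a minimal illustration, not the script's actual code: the date format `%Y-%m-%d`, the function names `check_week_dates` and `check_rows`, and the `ean_index` parameter are all assumptions, since this README does not state them.

```python
import csv
from datetime import datetime

# Assumption: the real date format is not stated in this README.
DATE_FORMAT = "%Y-%m-%d"


def check_week_dates(begin_text, end_text):
    """Return a list of error messages for one row's begin and end dates."""
    errors = []
    try:
        begin = datetime.strptime(begin_text, DATE_FORMAT).date()
        end = datetime.strptime(end_text, DATE_FORMAT).date()
    except ValueError:
        return ["bad date format"]
    # date.weekday(): Monday is 0, so Saturday is 5 and Sunday is 6.
    if begin.weekday() != 6:
        errors.append("begin date is not a Sunday")
    if end.weekday() != 5:
        errors.append("end date is not a Saturday")
    return errors


def check_rows(lines, expected_columns, ean_index):
    """Yield (row, comment) for rows failing the column-count or duplicate-EAN check.

    lines is any iterable of CSV text lines (for example an open file);
    ean_index is the hypothetical position of the EAN column.
    """
    seen = set()
    for row in csv.reader(lines):
        if len(row) != expected_columns:
            yield row, "wrong number of columns"
            continue
        ean = row[ean_index]
        if ean in seen:
            yield row, "duplicate EAN"
        seen.add(ean)
```

Because `check_rows` is a generator over `csv.reader`, it reads one row at a time and keeps only the set of EANs seen so far in memory. The sketch does not treat the header row specially.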
- CSV files in the "Check" folder are run through all the checks.
- Files that pass all the checks are moved to the "Correct" folder.
- Files that don't pass the checks stay in the "Check" folder.
- All rows that fail a check are written to new CSV files in the "Error" folder. Every such row has a comment at the end stating what failed it.
- The terminal shows which files failed and which passed, and gives totals for all figure columns for the files that pass the checks.
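
The folder workflow above could look roughly like the following. This is a hedged sketch under stated assumptions, not the script's real implementation: `route_files` and the `validate` callable are hypothetical names, and the column totals printed for passing files are left out.

```python
import csv
import shutil
from pathlib import Path


def route_files(check_dir, correct_dir, error_dir, validate):
    """Move passing CSV files to correct_dir; write failing rows to error_dir.

    validate is a hypothetical callable: it takes a list of rows and returns
    a list of (row, comment) pairs, which is empty when the file passes.
    """
    check_dir = Path(check_dir)
    correct_dir = Path(correct_dir)
    error_dir = Path(error_dir)
    correct_dir.mkdir(parents=True, exist_ok=True)
    error_dir.mkdir(parents=True, exist_ok=True)

    results = {}
    # sorted() materializes the file list before any file is moved.
    for path in sorted(check_dir.glob("*.csv")):
        with path.open(newline="") as handle:
            rows = list(csv.reader(handle))
        failures = validate(rows)
        if failures:
            # The failing file stays in check_dir; only its bad rows are
            # copied out, each with the failure comment appended.
            with (error_dir / path.name).open("w", newline="") as handle:
                writer = csv.writer(handle)
                for row, comment in failures:
                    writer.writerow(row + [comment])
            results[path.name] = "failed"
        else:
            shutil.move(str(path), str(correct_dir / path.name))
            results[path.name] = "passed"
    return results
```

The returned dictionary maps each file name to "passed" or "failed", which is enough to print a per-file summary in the terminal.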
- Windows: add the path to python.exe and the path to the script to a batch file. The script can then be run simply by running the batch file.
- The script can also be run directly from CMD or PowerShell on Windows, or from a terminal on other systems. Please search online for the specifics of your system.