
Based on shell scripts authored by Jeff Severns Guntzel to help data journalists use Christopher Groskopf's 'csvkit' utility library to audit a csv file without opening it, and then back up and move the csv file to a project directory before working on it.


chrislkeller/csv_audit_and_backup


csv_audit_and_backup

This shell script is based on a pair of shell scripts authored by Jeff Severns Guntzel.

Jeff's first script helped data journalists use Christopher Groskopf's 'csvkit' library to audit a csv file without opening it.

Jeff's second script created a backup of a csv file, and then moved the backup and the original to a project directory structure.

Initial setup & Usage

  • Install csvkit from the Terminal on Linux or macOS. If pip is already installed, skip the first command.

      easy_install pip
      pip install csvkit
    
  • Download the csv_audit_and_backup shell script repo to your desktop

  • Unzip the repo and drag the folder to a directory in your home folder, for instance $HOME/Documents. Feel free to rename the folder to something shorter like csv_audit. The script's BASEDIR variable assumes this location, so if you use a different directory or folder name, update BASEDIR to match.

  • The script also assumes Jeff's directory structure:

      data_files
      	/DataInbox
      		/NewData
      	/DataFarm
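
If the folders above are missing, or you want the same layout somewhere else, they can be recreated in one step. This is a minimal sketch: `make_data_dirs` is a hypothetical helper, not part of the repo, and it only assumes the folder names listed above.

```shell
# Hypothetical helper: create the folder layout the script expects
# under whatever base directory is passed as the first argument.
make_data_dirs() {
    local base="$1"
    # -p creates parent folders as needed and is a no-op if they exist.
    mkdir -p "$base/data_files/DataInbox/NewData" \
             "$base/data_files/DataFarm"
}
```

For the location used in this README, that would be `make_data_dirs "$HOME/Documents/csv_audit"`.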
    

Before Script

  • Use the Terminal's cd (change directory) command to enter the csv_audit folder.

      cd $HOME/Documents/csv_audit/
    
  • Change into the NewData folder and list its files. You should see a sample csv file named failed_banks.csv, which we'll use to make sure everything works as expected.

      cd data_files/DataInbox/NewData
      ls
    
  • Now tell the script to act on failed_banks.csv by running it with the file name as a parameter:

       bash $HOME/Documents/csv_audit/push_audit.sh failed_banks.csv
    
  • The script audits the csv file with csvkit, writes the results to an audit file, makes a backup copy of the csv, and moves all three files to a new directory in DataFarm named after the csv file.
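
The steps in the last bullet can be sketched roughly as follows. This is an illustration, not the contents of push_audit.sh: the function name `audit_and_file`, the `_audit.txt` and `_backup.csv` file names, and the default BASEDIR value are all assumptions. The csvkit commands used are `csvcut -n`, which lists column names, and `csvstat`, which prints summary statistics.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the audit-and-file steps; not the actual push_audit.sh.
# BASEDIR mirrors the variable the README mentions; this default is an assumption.
BASEDIR="${BASEDIR:-$HOME/Documents/csv_audit}"

audit_and_file() {
    local csv="$1"
    local name dest
    name="$(basename "$csv" .csv)"              # failed_banks.csv -> failed_banks
    dest="$BASEDIR/data_files/DataFarm/$name"   # project folder named after the csv

    mkdir -p "$dest" || return 1

    # Audit without opening the file: column names, then summary statistics.
    {
        echo "Columns in $csv:"
        csvcut -n "$csv"
        echo
        echo "Summary statistics:"
        csvstat "$csv"
    } > "$dest/${name}_audit.txt" || return 1

    # Keep an untouched backup next to the working copy (file name is assumed).
    cp "$csv" "$dest/${name}_backup.csv" || return 1
    mv "$csv" "$dest/"
}
```

Run from the NewData folder, `audit_and_file failed_banks.csv` would leave the csv, its backup, and the audit file together under DataFarm/failed_banks.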

After Script

Notes & Resources


For this script to work, you must install Christopher Groskopf's csvkit.
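
A quick way to confirm the csvkit tools are on your PATH before running the script is a small check like the one below. `require_tools` is a hypothetical helper, not part of the repo, and which csvkit commands the script actually calls is an assumption here.

```shell
# Hypothetical helper: succeed only if every named command is on PATH.
require_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing: $tool (try: pip install csvkit)" >&2
            return 1
        fi
    done
}
```

For example, `require_tools csvcut csvstat` prints a message and returns a non-zero status if either command is not installed.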

Command Line Tutorial, via Jeff's blog

Walkthroughs of Jeff's original scripts are here and here.
