Deduplicate and parse a list of 'dirty names'

Clean Names


The script takes a CSV file with a column 'Name' containing 'dirty names', that is, names in inconsistent formats: lastname firstname, firstname lastname, middlename lastname firstname, etc. (see the sample input file, sample_input.csv). It produces a CSV file that has all the columns of the original file plus the following columns: 'uniqid', 'FirstName', 'MiddleInitial/Name', 'LastName', 'RomanNumeral', 'Title', 'Suffix'. By default, the script removes duplicate names (see the sample output file, sample_output.csv).
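The input/output contract described above can be illustrated with a small sketch. This is not the code in process_names.py: the helper `dedupe_rows`, the exact-string duplicate check, and the sequential integer used for 'uniqid' are all assumptions made for illustration, and the name-parsing step that fills 'FirstName', 'LastName', and the other columns is left out.

```python
import csv
import io


def dedupe_rows(rows, column="Name", keep_all=False):
    """Return copies of the input rows with a 'uniqid' column added.

    Hypothetical helper: rows whose name was already seen are skipped
    unless keep_all is True. Duplicates are detected by exact string
    match after stripping whitespace, which is an assumption.
    """
    seen = set()
    out = []
    for row in rows:
        name = row[column].strip()
        if not keep_all and name in seen:
            continue
        seen.add(name)
        new_row = dict(row)               # keep every original column
        new_row["uniqid"] = len(out) + 1  # assumed: sequential integer id
        out.append(new_row)
    return out


# In-memory stand-in for a CSV file with a 'Name' column.
sample = io.StringIO(
    "Name,State\n"
    "Smith John,CA\n"
    "John Smith,CA\n"
    "Smith John,CA\n"
)
rows = list(csv.DictReader(sample))
unique_rows = dedupe_rows(rows)
```

Here the third row repeats the first and is dropped, while "John Smith" survives because this sketch does not try to recognise reordered names.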

Application

The script was used to fix names in the CF-Scores from the Database on Ideology, Money in Politics, and Elections. The processed database with clean names is posted on Harvard DVN.

Installation

  1. Clone this repository

git clone https://github.com/soodoku/clean-names.git

  2. Navigate to clean-names

  3. Run python setup.py install

Using Clean Names

Usage: process_names.py [options]

Command Line Options

  -h, --help                   Show this help message and exit
  -o OUTFILE, --out=OUTFILE    Output file in CSV (default: sample_output.csv)
  -c COLUMN, --column=COLUMN   Column name in CSV that contains names (default: Name)
  -a, --all                    Export all names (do not take duplicate names out) (default: False)
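For readers who want to see how options like these fit together, here is a minimal sketch of an equivalent command-line interface. It uses Python's standard `argparse` module, which may not be what process_names.py uses internally. The function `build_parser`, the positional `input` argument (inferred from the example invocation), and the `export_all` attribute name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="process_names.py")
    # Assumed positional argument: the input CSV file.
    parser.add_argument("input", help="Input file in CSV")
    parser.add_argument("-o", "--out", default="sample_output.csv",
                        help="Output file in CSV (default: sample_output.csv)")
    parser.add_argument("-c", "--column", default="Name",
                        help="Column name in CSV that contains names (default: Name)")
    parser.add_argument("-a", "--all", dest="export_all", action="store_true",
                        default=False,
                        help="Export all names (do not take duplicate names out)")
    return parser


# Parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-a", "sample_input.csv"])
```

With this argument list, `args.export_all` is `True`, while `args.out` and `args.column` keep their documented defaults.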

Example

 python process_names.py -a sample_input.csv 

License

The scripts are released under the MIT License.