## Import packages

In [1]:
import pandas as pd

## Read in Antelope file
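Every field that looks numeric holds either a year or part of an address, so no arithmetic is needed. That makes it safe to read every column as a string by passing dtype=str to [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). Reading as text also stops pandas from converting ZIP codes to integers, which would drop any leading zeros that are still present.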

In [None]:
# filename is the path to the tab-separated Antelope export; set it before running this cell
antelope = pd.read_csv(filename, sep='\t', dtype=str)
antelope.head()
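A small sketch of why dtype=str matters, using a hypothetical two-row, tab-separated sample read from memory with io.StringIO (the values are made up for illustration). With default type inference pandas parses the ZIP column as integers and the leading zero disappears; with dtype=str the text is kept exactly as written.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; values are illustrative only
sample = "permitYear\tzip\n2021\t01234\n2022\t82001\n"

inferred = pd.read_csv(io.StringIO(sample), sep='\t')            # default type inference
as_text = pd.read_csv(io.StringIO(sample), sep='\t', dtype=str)  # every column kept as text

print(inferred['zip'].tolist())  # [1234, 82001] -> leading zero lost
print(as_text['zip'].tolist())   # ['01234', '82001']
```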

## Fix ZIP Codes
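Some ZIP codes in the source data have lost their leading zeros, as in the row at position 155. Rows can be selected by integer position with [iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html); passing a list ([155]) returns a one-row DataFrame instead of a Series, so it displays as a table.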

In [None]:
antelope.iloc[[155]]
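A minimal sketch on a hypothetical three-row frame showing the list-versus-scalar difference with iloc, plus a boolean mask that lists every ZIP shorter than five characters. The mask is an assumed alternative for finding affected rows without knowing their positions in advance; it is not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample; the second ZIP has lost its leading zero
df = pd.DataFrame({
    'lastName': ['SMITH', 'JONES', 'LEE'],
    'zip': ['82001', '1234', '59715'],
})

one_row_frame = df.iloc[[1]]   # list of positions -> DataFrame
one_row_series = df.iloc[1]    # single position -> Series

# Boolean mask: every row whose ZIP is shorter than five characters
short_zips = df[df['zip'].str.len() < 5]

print(type(one_row_frame).__name__, type(one_row_series).__name__)  # DataFrame Series
print(short_zips.index.tolist())  # [1]
```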

To fix this, pad each ZIP code on the left with zeros to a width of five characters using [zfill](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.zfill.html). Values that are already five or more characters long are left unchanged.

In [None]:
antelope['zip'] = antelope['zip'].str.zfill(5)
antelope.iloc[[155]]
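A quick sketch of how str.zfill(5) treats a few hypothetical ZIP values: short codes are padded, values of five or more characters (including a ZIP+4) pass through untouched, and a missing value stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical ZIP values, including a ZIP+4 and a missing entry
zips = pd.Series(['1234', '801', '82001', '12345-6789', np.nan])
padded = zips.str.zfill(5)

print(padded.tolist())  # ['01234', '00801', '82001', '12345-6789', nan]
```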

## Title Case name and address
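Convert the name and address fields to title case with [str.title](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.title.html) to make them more readable.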

In [None]:
# Apply the same title-case conversion to each name and address column
for col in ['firstName', 'middleName', 'lastName', 'city', 'street']:
    antelope[col] = antelope[col].str.title()
antelope.head()
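str.title capitalizes the first letter after any non-letter, so ordinals such as "1ST" become "1St" and names like "MCDONALD" become "Mcdonald". The sketch below shows this on hypothetical values. The regex follow-up that lowercases ordinal suffixes after a digit is an assumed optional fix, not something the original notebook does, and names like "Mcdonald" would still need manual handling.

```python
import pandas as pd

# Hypothetical all-caps values
df = pd.DataFrame({
    'lastName': ['MCDONALD', "O'BRIEN"],
    'street': ['12 1ST AVE', 'PO BOX 7'],
})

for col in ['lastName', 'street']:
    df[col] = df[col].str.title()

print(df['lastName'].tolist())  # ['Mcdonald', "O'Brien"]
print(df['street'].tolist())    # ['12 1St Ave', 'Po Box 7']

# Optional follow-up: lowercase ordinal suffixes that title() capitalized after a digit
df['street'] = df['street'].str.replace(
    r'(\d)(St|Nd|Rd|Th)\b',
    lambda m: m.group(1) + m.group(2).lower(),
    regex=True,
)
print(df['street'].tolist())    # ['12 1st Ave', 'Po Box 7']
```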

## Add hunt and fish columns
Every record in this file is a hunting record, so set hunt to 'Y' and leave fish as an empty string.

In [None]:
antelope['hunt'] = 'Y'
antelope['fish'] = ''
antelope.head()
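Assigning a scalar to a new column broadcasts it to every row. As a sketch on hypothetical rows, DataFrame.assign produces the same columns but returns a new frame instead of modifying the existing one; which style to use is a matter of preference.

```python
import pandas as pd

df = pd.DataFrame({'lastName': ['Smith', 'Jones']})  # hypothetical rows

# A scalar on the right-hand side is broadcast to every row
df['hunt'] = 'Y'
df['fish'] = ''

# Equivalent result, returned as a new DataFrame instead of modifying df in place
df2 = pd.DataFrame({'lastName': ['Smith', 'Jones']}).assign(hunt='Y', fish='')

print(df['hunt'].tolist(), df['fish'].tolist())  # ['Y', 'Y'] ['', '']
print(df.equals(df2))  # True
```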

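## Replace NaNs with empty strings
Also convert cells that hold only a single blank space (for example, an empty middle name stored as ' ') to empty strings. replace matches whole cell values, so spaces inside a value such as a two-word name are left alone.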

In [None]:
antelope = antelope.fillna('')
antelope = antelope.replace(' ','')
antelope.head()
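The sketch below, on hypothetical middle names, shows that replace(' ', '') only blanks cells whose entire value is exactly one space. If the data might also contain cells with several spaces or other whitespace, an anchored regular expression is one option; that broader pattern is an assumption about the data, not something the original notebook checks.

```python
import numpy as np
import pandas as pd

# Hypothetical middle names: a value, one space, two spaces, a NaN, a two-word name
df = pd.DataFrame({'middleName': ['A', ' ', '  ', np.nan, 'Lee Ann']}).fillna('')

exact = df.replace(' ', '')                   # whole-cell match on exactly one space
regex = df.replace(r'^\s+$', '', regex=True)  # any cell made up only of whitespace

print(exact['middleName'].tolist())  # ['A', '', '  ', '', 'Lee Ann']
print(regex['middleName'].tolist())  # ['A', '', '', '', 'Lee Ann']
```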

## Drop Duplicate Records
Compare the shape before and after removing exact duplicate rows to see how many were dropped.

In [None]:
antelope.shape

In [None]:
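# Remove exact duplicate rows, then renumber the index 0..n-1 so it has no gaps
antelope = antelope.drop_duplicates().reset_index(drop=True)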
antelope.shape
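Before dropping anything, it can help to look at the duplicates themselves. A minimal sketch on hypothetical rows: duplicated(keep=False) flags every copy of a repeated row, and the difference in length shows how many rows drop_duplicates removes (it keeps the first copy by default).

```python
import pandas as pd

# Hypothetical rows; the first two are exact duplicates
df = pd.DataFrame({
    'lastName': ['Smith', 'Smith', 'Jones'],
    'zip': ['82001', '82001', '59715'],
})

# keep=False flags every copy, so both members of a duplicate pair are shown
all_copies = df[df.duplicated(keep=False)]
print(all_copies.index.tolist())  # [0, 1]

# Count how many rows drop_duplicates would remove
removed = len(df) - len(df.drop_duplicates())
print(removed)  # 1
```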

## Assign Record ID
Number the remaining records sequentially, starting at 1.

In [None]:
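# drop_duplicates can leave gaps in the index, so number rows by position instead of index + 1
antelope['RecordID'] = range(1, len(antelope) + 1)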
antelope.head()
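A short sketch on hypothetical rows of why numbering by position is safer than index + 1: after a duplicate is dropped, the surviving index labels keep their original values, and the gap carries straight into the IDs.

```python
import pandas as pd

# Hypothetical rows; dropping the duplicate leaves index labels 0 and 2
df = pd.DataFrame({'lastName': ['Smith', 'Smith', 'Jones']}).drop_duplicates().copy()

print((df.index + 1).tolist())  # [1, 3] -> the gap carries into the IDs

# Number by position instead, so IDs run 1..n regardless of the index labels
df['RecordID'] = range(1, len(df) + 1)
print(df['RecordID'].tolist())  # [1, 2]
```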

## Add Address 2 and Suffix columns
Both columns are left empty; they are part of the layout fed to ExactTrack.

In [None]:
antelope['addr2'] = ''
antelope['Suffix'] = ''
antelope.head()

## Reorder columns
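Put the columns in the final output order with [reindex](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reindex.html). Any listed column that does not exist yet is created and filled with NaN, and any existing column left off the list is dropped.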

In [None]:
list(antelope.columns)

In [None]:
output_columns = ['RecordID', 'permitYear', 'Permit Type', 'FullName', 'firstName', 'middleName',
                  'lastName', 'Suffix', 'street', 'addr2', 'city', 'state', 'zip', 'hunt', 'fish']
antelope = antelope.reindex(columns=output_columns)
antelope.head()
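A sketch on a hypothetical one-row frame of how reindex differs from plain column selection: reindex creates any missing column (NaN by default, or the value given as fill_value) and drops unlisted ones, while df[cols] raises a KeyError for names that are not present. Whether fill_value='' suits the final output is an assumption about the downstream format.

```python
import pandas as pd

df = pd.DataFrame({'zip': ['82001'], 'lastName': ['Smith'], 'extra': ['x']})  # hypothetical

cols = ['lastName', 'zip', 'Suffix']
ordered = df.reindex(columns=cols)               # 'Suffix' created as NaN, 'extra' dropped
blank = df.reindex(columns=cols, fill_value='')  # missing columns filled with '' instead

print(ordered.columns.tolist())        # ['lastName', 'zip', 'Suffix']
print(ordered['Suffix'].isna().all())  # True
print(blank['Suffix'].tolist())        # ['']

try:
    df[cols]  # plain selection refuses unknown column names
except KeyError as err:
    print('KeyError:', err)
```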