# Kitsap Street List

## Goal 

Produce a list of every street suffix used in Kitsap County, together with the standard USPS abbreviation for each suffix. Present that list in a way that makes it easy for food bank volunteers to enter addresses consistently during intake.

## Sources

Name                 | Description                                                              | Filename               | Source                 | URL
---------------------|--------------------------------------------------------------------------|------------------------|------------------------|-------------------------------------------
List of addresses    | A list of every address in Kitsap County                                 | Property_addresses.txt | Kitsap County Assessor | https://www.kitsap.gov/assessor/Documents/Property_addresses.txt
Suffix abbreviations | A table indicating the USPS preferred abbreviations for address suffixes | suffix.htm             | USPS                   | https://pe.usps.com/text/pub28/28apc_002.htm

## Getting the suffixes

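Notice that the assessor's address table stores the street suffix (DR, PL, WAY) in a column named `identifier`, while its `suffix` column holds the trailing directional (for example NE). I will continue to use the USPS terminology, in which "suffix" means the street suffix.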

In [1]:
import warnings
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

addresses = pd.read_csv('Property_addresses.txt', sep='\t')
addresses.head()

Unnamed: 0,ID,rp_acct_id,st_no,prefix,street_name,identifier,suffix,suite_unit,city,state,zip_code,street_addr
0,1,2327807,8184.0,NE,STATE HWY 104,,,,KINGSTON,WA,98346.0,8184 NE STATE HWY 104
1,2,1578939,5811.0,NE,TIMBERLAND,DR,,,KINGSTON,WA,98346.0,5811 NE TIMBERLAND DR
2,3,1568401,36833.0,,PINE,PL,NE,,HANSVILLE,WA,98340.0,36833 PINE PL NE
3,4,1566306,36908.0,,ASPEN,WAY,NE,,HANSVILLE,WA,98340.0,36908 ASPEN WAY NE
4,5,1567296,5933.0,NE,SPRUCE,DR,,,HANSVILLE,WA,98340.0,5933 NE SPRUCE DR


In [2]:
suffixes = set(addresses['identifier'].dropna().unique())
print(suffixes)

{'TRL', 'LOOP', 'RUN', 'TER', 'CV', 'RD', 'HWY', 'BLF', 'PKWY', 'VW', 'HL', 'AVE', 'ST', 'LN', 'CIR', 'CT', 'BLVD', 'BCH', 'ALY', 'SPUR', 'LNDG', 'WALK', 'CRST', 'PL', 'PATH', 'WAY', 'DR'}


Notice that some addresses have an empty suffix (`identifier`) field, with the suffix appearing inside the street name instead, as in 'STATE HWY 104' or 'STAVIS DRIVE EAST'. These have to be handled manually.

In [3]:
missing = addresses['identifier'].isna()
misentered = addresses['street_name'][missing].unique()
misentered

array(['STATE HWY 104', 'STATE HWY 308', 'SEA VISTA', 'STATE HWY 303',
       'BIG BEEF CROSSING', 'HORIZON LANE WEST', 'HORIZON LANE EAST',
       'STATE HWY 3', 'STATE HWY 16', 'STATE HWY 305',
       'ANGELINE AVENUE SOUTH', 'THE CEDARS', 'KINGSWAY', 'RHODES END',
       'RUE VILLA', 'SOL VEI', 'STAVIS DRIVE EAST', 'STAVIS DRIVE WEST',
       'CLARK ISLAND'], dtype=object)

In [4]:
# Suffix words seen in the street names above, mapped to their USPS abbreviations
suffix_corrections = {'HWY':'HWY', 'VISTA':'VIS', 'LANE':'LN', 'CROSSING':'XING', 'AVENUE':'AVE', 'DRIVE':'DR', 'ISLAND':'IS',}

for street in misentered:
    words = street.split()
    for name, abbrev in suffix_corrections.items():
        if name in words:
            suffixes.add(abbrev)
            break
    else:
        # No break: none of the known suffix words appears in this street name
        warnings.warn(f'Street name "{street}" has no known suffix abbreviation')

print(suffixes)

{'TRL', 'LOOP', 'RUN', 'TER', 'CV', 'RD', 'HWY', 'BLF', 'VIS', 'PKWY', 'VW', 'HL', 'XING', 'AVE', 'ST', 'LN', 'CIR', 'CT', 'BLVD', 'BCH', 'ALY', 'IS', 'SPUR', 'LNDG', 'WALK', 'CRST', 'PL', 'PATH', 'WAY', 'DR'}
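
The matching step above can be factored into a small helper, which also makes it easy to see which street names still need a manual decision. This is only a sketch: the names `find_suffix`, `found` and `unmatched` are illustrative additions rather than part of the notebook, and the sample street names are copied from the `misentered` output above.

```python
def find_suffix(street, corrections):
    """Return the abbreviation for the first known suffix word in `street`, or None."""
    words = street.split()
    for name, abbrev in corrections.items():
        if name in words:
            return abbrev
    return None


# Same mapping as in the cell above.
suffix_corrections = {'HWY': 'HWY', 'VISTA': 'VIS', 'LANE': 'LN', 'CROSSING': 'XING',
                      'AVENUE': 'AVE', 'DRIVE': 'DR', 'ISLAND': 'IS'}

# A few of the street names whose suffix field was empty.
sample = ['STATE HWY 104', 'SEA VISTA', 'HORIZON LANE WEST', 'THE CEDARS', 'KINGSWAY']

found = {street: find_suffix(street, suffix_corrections) for street in sample}
unmatched = [street for street, abbrev in found.items() if abbrev is None]

print(found)
print(unmatched)
```

Names that end up in `unmatched` contain none of the words in `suffix_corrections`, so they would still have to be reviewed by hand.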




## Getting the standard abbreviations

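This consists of parsing the suffix table from the USPS website, saved locally as `suffix.htm`. In that table, the first row for each suffix has three cells (the primary name, one commonly used variant, and the standard abbreviation), and each following row for the same suffix has a single cell holding another variant.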

In [5]:
with open('suffix.htm', 'r') as f:
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning
    soup = BeautifulSoup(f, 'html.parser')

In [6]:
table = soup.find('table', attrs={'class':'Basic_no_title'})
rows = table.find_all('tr')

abbreviations = {}
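for row in rows[1:]:  # skip the first (header) row
    cols = [ele.text.strip() for ele in row.find_all('td')]
    if len(cols) == 3:
        # First row of a suffix: primary name, one common variant, standard abbreviation
        primary, common, standard = cols
        if standard in suffixes:
            abbreviations[standard] = {'name':primary, 'other names':set()}
    else:
        # Continuation row: a single cell with another variant of the same suffix
        common, = cols
    # Record the variant unless it just repeats the primary name or the abbreviation
    if standard in suffixes and common not in [primary, standard]:
        abbreviations[standard]['other names'].add(common)
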
abbreviations 

{'ALY': {'name': 'ALLEY', 'other names': {'ALLEE', 'ALLY'}},
 'AVE': {'name': 'AVENUE',
  'other names': {'AV', 'AVEN', 'AVENU', 'AVN', 'AVNUE'}},
 'BCH': {'name': 'BEACH', 'other names': set()},
 'BLF': {'name': 'BLUFF', 'other names': {'BLUF'}},
 'BLVD': {'name': 'BOULEVARD', 'other names': {'BOUL', 'BOULV'}},
 'CIR': {'name': 'CIRCLE', 'other names': {'CIRC', 'CIRCL', 'CRCL', 'CRCLE'}},
 'CT': {'name': 'COURT', 'other names': set()},
 'CV': {'name': 'COVE', 'other names': set()},
 'CRST': {'name': 'CREST', 'other names': set()},
 'XING': {'name': 'CROSSING', 'other names': {'CRSSNG'}},
 'DR': {'name': 'DRIVE', 'other names': {'DRIV', 'DRV'}},
 'HWY': {'name': 'HIGHWAY',
  'other names': {'HIGHWY', 'HIWAY', 'HIWY', 'HWAY'}},
 'HL': {'name': 'HILL', 'other names': set()},
 'IS': {'name': 'ISLAND', 'other names': {'ISLND'}},
 'LNDG': {'name': 'LANDING', 'other names': {'LNDNG'}},
 'LN': {'name': 'LANE', 'other names': set()},
 'LOOP': {'name': 'LOOP', 'other names': {'LOOPS'}},
 'PKWY'
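
The row-grouping logic in the parsing loop can be tried without the HTML file. The sketch below assumes the cell texts have already been extracted into lists. Its hand-written rows only mimic the layout the loop expects, with three cells on the first row of a suffix and one cell on each continuation row. `group_rows` and `sample_rows` are hypothetical names, and the sample is illustrative, not the full USPS table.

```python
def group_rows(rows, wanted):
    """Collect the name and variants for each standard abbreviation in `wanted`."""
    result = {}
    primary = standard = None
    for cols in rows:
        if len(cols) == 3:
            # First row of a suffix: primary name, one variant, standard abbreviation.
            primary, common, standard = cols
            if standard in wanted:
                result[standard] = {'name': primary, 'other names': set()}
        else:
            # Continuation row: one more variant of the current suffix.
            common, = cols
        # Skip variants that just repeat the primary name or the abbreviation.
        if standard in wanted and common not in (primary, standard):
            result[standard]['other names'].add(common)
    return result


# Illustrative rows only (assumed layout, not read from suffix.htm).
sample_rows = [
    ['ALLEY', 'ALLEE', 'ALY'],
    ['ALLEY'],
    ['ALLY'],
    ['ALY'],
    ['ANEX', 'ANEX', 'ANX'],
    ['ANNEX'],
    ['BEACH', 'BEACH', 'BCH'],
    ['BCH'],
]

grouped = group_rows(sample_rows, wanted={'ALY', 'BCH'})
print(grouped)
```

Because `primary` and `standard` carry over from the last three-cell row, each continuation row is attributed to the suffix that precedes it, and suffixes outside `wanted` are skipped entirely.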

## Output the results

Since the "other names" column contains commas, I will write the results as tab-separated values (TSV), which any spreadsheet software can read.

In [7]:
with open('kitsap_suffixes.tsv', 'w') as f:
    f.write('Suffix\tStandard abbreviation\tOther common abbreviations\n')
    for abbrev in sorted(abbreviations):
        name = abbreviations[abbrev]['name']
        others = ', '.join(x.capitalize() for x in sorted(abbreviations[abbrev]['other names']))
        f.write('\t'.join([name.capitalize(), abbrev.capitalize(), others])+'\n')
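
As a variation, the same rows could be written with Python's standard `csv` module, which handles the delimiter and any quoting. The sketch below is one way that might look, not the notebook's own method. It writes to an in-memory buffer instead of `kitsap_suffixes.tsv`, and the two entries in `abbreviations` are copied from the output shown earlier.

```python
import csv
import io

# Two entries copied from the parsed output, enough to exercise the format.
abbreviations = {
    'AVE': {'name': 'AVENUE', 'other names': {'AV', 'AVEN', 'AVENU', 'AVN', 'AVNUE'}},
    'BCH': {'name': 'BEACH', 'other names': set()},
}

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['Suffix', 'Standard abbreviation', 'Other common abbreviations'])
for abbrev in sorted(abbreviations):
    entry = abbreviations[abbrev]
    others = ', '.join(x.capitalize() for x in sorted(entry['other names']))
    writer.writerow([entry['name'].capitalize(), abbrev.capitalize(), others])

# Read the text back to confirm that the commas stay inside a single column.
buffer.seek(0)
rows = list(csv.reader(buffer, delimiter='\t'))
print(rows)
```

Reading the buffer back with the same delimiter gives three columns per row, so the comma-separated variants remain in one cell when the file is opened in a spreadsheet.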