### OpenStreetMap Case Study

#### Map Area

Cincinnati, OH, United States

(Specifically, the Norwood neighborhood and small surrounding areas.)

I have lived near Cincinnati for most of my life and have worked in the Norwood area for the past two years, so I wanted to explore the data for one of the neighborhoods just north of downtown.

In [1]:
### Import modules for this project
# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
import xml.etree.ElementTree as ET
from collections import defaultdict
import re
import pprint
import string

#### Problems with the data

- Inconsistent street type abbreviations
    - e.g. 'St.', 'St', and 'Str' all used for 'Street'
    - Some street names end in a cardinal direction (which is actually fine)
- Inconsistent capitalization
    - 'OH', 'Oh', and 'oh' all used for the state of Ohio
- Some addresses end with a unit number
    - e.g. '#26', 'A'

Using a mapping dictionary together with our audit functions, we can correct these issues programmatically.
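The capitalization problem is the simplest case of the mapping approach. A minimal sketch, assuming state values live in the `addr:state` tag; the `STATE_MAPPING` dictionary and `normalize_state` helper are hypothetical and not part of the original notebook:

```python
# Hypothetical mapping: lower-cased spellings of Ohio -> USPS code "OH"
STATE_MAPPING = {"oh": "OH", "ohio": "OH"}

def normalize_state(value):
    """Return a canonical state code, or the stripped original if it is unknown."""
    cleaned = value.strip()
    # Lower-case the lookup key so "OH", "Oh" and "oh" all hit the same entry
    return STATE_MAPPING.get(cleaned.lower(), cleaned)

for raw in ["OH", "Oh", "oh", "KY"]:
    print(repr(raw), "=>", normalize_state(raw))
```

Keying the dictionary on the lower-cased value means each spelling variant does not need its own entry.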

Let's start by opening the OSM file and setting up the collections used for auditing.

In [2]:
### Open the OSM XML file for auditing and cleaning
osm_file = open("map.osm", encoding="utf-8")

### Matches the last word of a street name (the street type), with an optional trailing period
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
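`street_type_re` captures the last whitespace-free token at the end of a value; the `\.?` is redundant because `\S` already matches a period, but it is harmless. A quick check on a few sample strings (the first three are made up for illustration; the last comes from this notebook's audit output):

```python
import re

# Same pattern as above: the final run of non-whitespace characters in the value
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

samples = ["Main St.", "Montgomery Road", "Smith Rd", "7703 Montgomery Road, Suite A"]
for s in samples:
    m = street_type_re.search(s)
    print(s, "->", m.group() if m else None)
```

Note that the last sample yields `A`: anything after the real street type, such as a suite, becomes the "street type" as far as this pattern is concerned.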

In [3]:
### List of street types that are considered correct as-is

expected = ["Street","Avenue","Boulevard","Drive","Court","Acres","Alley",
            "Place","Way","Circle","Square","Lane","Road","Trail","Parkway","Crescent","Terrace"]


### Mapping of common street abbreviations to their full names

mapping = {"St": "Street","Str": "Street","St.": "Street",
           "Ave": "Avenue", "Ave.": "Avenue",
           "Rd.": "Road", "Rd": "Road",
           "Cir": "Circle"}

The `expected` list defines which street types are already correct. If a street name ends in one of them, that element will not show up in our list of troublesome street names.

If it doesn't, the `mapping` dictionary comes in: it converts known abbreviations so that street names follow a single naming convention.

Entries were added to both collections as the audit was rerun, building up a more robust and complete set of street types.

In [4]:
### Functions for auditing street name data

def audit(osm_file):
    """Collect street names whose final word is not an expected street type."""
    street_types = defaultdict(set)
    # "end" events guarantee each element's child <tag> elements have been parsed
    for event, elem in ET.iterparse(osm_file, events=("end",)):
        if elem.tag == "node" or elem.tag == "way":
            for tag in elem.iter("tag"):
                if is_street_name(tag):
                    audit_street_type(street_types, tag.attrib['v'])
    return street_types
    
# Record street names whose street type is not in `expected`
def audit_street_type(street_types, street_name):
    m = street_type_re.search(street_name)
    if m:
        street_type = m.group()
        if street_type not in expected:
            street_types[street_type].add(street_name)
            
# Print suggested corrections for street names ending in a known abbreviation
def fix_street(osm_file):
    street_types = audit(osm_file)
    print(street_types)
    for street_type, ways in street_types.items():
        for name in ways:
            if street_type in mapping:
                # Replace only the trailing street type, not every occurrence in the name
                better_name = street_type_re.sub(mapping[street_type], name)
                print(name, "=>", better_name)
    pprint.pprint(dict(street_types))
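Replacing an abbreviation with `str.replace` rewrites every occurrence of it in the string, not just the street type at the end, while substituting only the regex match leaves the rest of the name alone. A small comparison using a hypothetical street name:

```python
import re

street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

name = "Stanley St"  # hypothetical street name

# str.replace rewrites every "St", including the one inside "Stanley"
print(name.replace("St", "Street"))

# Substituting the anchored regex match touches only the trailing street type
print(street_type_re.sub("Street", name))
```

The first line prints `Streetanley Street` and the second `Stanley Street`. The mapping values contain no backslashes, so they are safe to use directly as `re.sub` replacement strings.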

In [5]:
### Helper functions for finding and improving street names

def is_street_name(elem):
    return elem.attrib['k'] == "addr:street"


def update_street_name(name, mapping):
    """Capitalize a street name and expand a known street-type abbreviation."""
    name = string.capwords(name)
    m = street_type_re.search(name)
    
    if m:
        street_type = m.group()
        if street_type not in expected and street_type in mapping:
            name = street_type_re.sub(mapping[street_type], name)
    else:
        print("Odd Street Name: %s" % name)
    return name
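`string.capwords` fixes all-lowercase input, but it also lower-cases everything after the first letter of each word, so it can damage values that were already correct. A quick illustration with made-up inputs:

```python
import string

# capwords splits on whitespace, capitalizes each word, and lower-cases the rest of it
print(string.capwords("montgomery rd"))  # Montgomery Rd
print(string.capwords("McMillan St"))    # Mcmillan St
print(string.capwords("OH"))             # Oh
```

This suggests applying `capwords` only to street names and handling codes such as state abbreviations with an explicit mapping instead.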

In [6]:
### Creates a dictionary with each tag name as the key and its number of occurrences as the value

def count_tags(osm_file):
    tags = {}
        
    for event, elem in ET.iterparse(osm_file): 
        if elem.tag in tags:
            tags[elem.tag] += 1
        else:
            tags[elem.tag] = 1
            
    return tags
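An equivalent, slightly more compact variant uses `collections.Counter`, and also clears each element once it has been counted to reduce memory use on larger extracts. A minimal sketch; `count_tags_counter` is a hypothetical name, and the tiny inline document stands in for `map.osm`:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

def count_tags_counter(source):
    """Count elements by tag name; a Counter-based variant of count_tags."""
    counts = Counter()
    for _, elem in ET.iterparse(source):  # default: "end" events only
        counts[elem.tag] += 1
        # Children were already counted at their own end events,
        # so their contents can be dropped here
        elem.clear()
    return dict(counts)

sample = io.BytesIO(b'<osm><node id="1"><tag k="amenity" v="cafe"/></node>'
                    b'<way id="2"><nd ref="1"/></way></osm>')
print(count_tags_counter(sample))
```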

In [7]:
### Pretty print tags

def iter_parse():

    tags = count_tags(osm_file)
    pprint.pprint(tags)

In [8]:
### Running the audit

audit(osm_file)

defaultdict(set, {'A': {'7703 Montgomery Road, Suite A'}})

Running the audit, we see that only one additional straggler was found: a street value that includes a suite, "7703 Montgomery Road, Suite A".

I don't think this needs fixing: it is a legitimate address, and the flagged 'A' is a suite identifier rather than a misspelled street type.
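If a stricter schema were wanted later, the unit could be split off into its own field (OSM also documents an `addr:unit` key). A minimal sketch, assuming units are introduced by "Suite", "Ste", "Unit", "Apt" or "#"; the `UNIT_RE` pattern and `split_unit` helper are hypothetical and would need tuning against the full dataset:

```python
import re

# Hypothetical pattern: optional comma, then "Suite/Ste/Unit/Apt <id>" or "#<id>" at the end
UNIT_RE = re.compile(r',?\s*((?:Suite|Ste\.?|Unit|Apt\.?)\s+\w+|#\s*\w+)$', re.IGNORECASE)

def split_unit(value):
    """Return (street_part, unit); unit is None when no trailing unit is found."""
    m = UNIT_RE.search(value)
    if not m:
        return value, None
    return value[:m.start()].rstrip(), m.group(1)

print(split_unit("7703 Montgomery Road, Suite A"))
print(split_unit("Main Street #26"))
print(split_unit("Montgomery Road"))
```

A bare trailing letter without a label is deliberately left alone, since it is ambiguous without more context.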

In [9]:
osm_file.seek(0)  # rewind to the beginning of the file before parsing again
iter_parse()


{'bounds': 1,
 'member': 9855,
 'nd': 49725,
 'node': 41015,
 'osm': 1,
 'relation': 79,
 'tag': 19559,
 'way': 6553}


In [10]:
osm_file.seek(0)  # rewind to the beginning of the file before parsing again
fix_street(osm_file)

defaultdict(<class 'set'>, {'A': {'7703 Montgomery Road, Suite A'}})
{'A': {'7703 Montgomery Road, Suite A'}}


In [11]:
import cerberus
%run conversion-database-prep.py

NameError: name 'unicode' is not defined
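`unicode` is a Python 2 builtin that no longer exists in Python 3, where every `str` is already Unicode, so this error suggests `conversion-database-prep.py` was written for Python 2. The script itself isn't shown here, so this is only a sketch: assuming `unicode` is used just to coerce values to text, an alias at the top of the script is a low-effort workaround, while replacing the calls with `str` is the cleaner long-term fix.

```python
# Compatibility shim: on Python 3, fall back to str wherever the script expects unicode
try:
    unicode  # exists only on Python 2
except NameError:
    unicode = str

print(unicode(7703))
```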