# OpenStreetMap Case Study

## Introduction
<p>In this project I investigate an OpenStreetMap data set for a location of my choosing, identify problems in the data, clean it, and store it in SQL. I also propose ideas for improving the data. This investigation is a practice project required to complete the Data Analyst Nanodegree from Udacity.
</p>

### Location
<p>I chose Auckland, New Zealand, as the location for this investigation because I have been planning a trip there for some time. I would like to take the opportunity to familiarize myself with the place by using it as the example for this project.</p>
- [www.openstreetmap.org/node/292806332](https://www.openstreetmap.org/node/292806332)

In [2]:
import sqlite3
from collections import defaultdict
import xml.etree.cElementTree as ET
import re
import pprint

file_sample = "auckland_new-zealand-sample.osm"
file_actual = "auckland_new-zealand.osm"

street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

expected = ["Avenue", "Crescent", "Drive", "Highway", "Lane", "Place", "Road", "Street", "Way"]

mapping = {
    "street": "Street",
    "st": "Street",
    "st.": "Street",
    "rd": "Road",
    "road": "Road",
    "strret": "Street",  # keys must be lowercase: update_name() lowercases each word before lookup
    "cr": "Crescent",
    "cresent": "Crescent",
    "crest": "Crescent",
    "hwy": "Highway",
    "ave": "Avenue",
    "plc,": "Place",
    "beach": "Beach",
    "way": "Way",
    "ln": "Lane"
}

def update_name(name, mapping):
    # Replace every word found in mapping (compared in lowercase) with its full form.
    name_a = name.split(" ")

    for w in range(len(name_a)):
        if name_a[w].lower() in mapping:
            name_a[w] = mapping[name_a[w].lower()]
    name = " ".join(name_a)

    return name
            
def audit_street_type(street_types, street_name):
    m = street_type_re.search(street_name)
    if m:
        street_type = m.group()
        if street_type not in expected:
            new_name = update_name(street_name, mapping)
            street_types[street_type].add(new_name)
    
def is_street_name(elem):
    return (elem.attrib['k'] == 'addr:street')

def audit(osmfile):
    street_types = defaultdict(set)
    with open(osmfile, 'r') as osm_file:
        # Use 'end' events so the child <tag> elements of each node/way
        # are guaranteed to be parsed before they are inspected.
        for event, elem in ET.iterparse(osm_file, events=('end',)):
            if elem.tag == "way" or elem.tag == 'node':
                for tag in elem.iter("tag"):
                    if is_street_name(tag):
                        audit_street_type(street_types, tag.attrib['v'])
    return street_types

st_types = audit(file_actual)

pprint.pprint(dict(st_types))

{'0632': set(['15 Arrenway Dr, Rosedale, Auckland 0632']),
 u'1010\u65b0\u897f\u862d': set([u'38 Lorne St, Auckland, 1010\u65b0\u897f\u862d']),
 '16': set(['State Highway 16']),
 '2': set(['State Highway 2']),
 '22': set(['State Highway 22']),
 '26': set(['26']),
 'Auckland': set(['Exmouth Road, Northcote, Auckland']),
 'Ave': set(['Brennan Avenue',
             'Delta Avenue',
             'Erson Avenue',
             'Gillies Avenue',
             'Vitasovich Avenue',
             'Waverley Avenue']),
 'Broadway': set(['Broadway']),
 'Circle': set(['Leybourne Circle']),
 'Close': set(['Challen Close', 'Court Town Close', 'Regia Close']),
 'Coronation': set(['Coronation']),
 'Court': set(['Fantail Court', 'Palm Court', 'Palmgreen Court']),
 'Cove': set(['Clearwater Cove']),
 'Cr': set(['Marjorie Jayne Crescent']),
 'Cresent': set(['Tawa Crescent']),
 'Crest': set(['The Crescent']),
 'East': set(['Customs Street East',
              'Durham Street East',
              'Greenlane East',
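
To make the audit output above easier to read, here is a minimal sketch of how the last-word regex and the mapping interact on a few names. The sample names are inferred from the output (for example, 'Gillies Avenue' listed under the 'Ave' key suggests the raw value 'Gillies Ave'), and `mapping_demo`, `last_token`, and `update_name_demo` are hypothetical names introduced only for this illustration.

```python
import re

# Same pattern as in the audit code: the last whitespace-delimited token.
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

# Hypothetical subset of the mapping, for illustration only.
mapping_demo = {"ave": "Avenue", "st": "Street", "cr": "Crescent"}

def last_token(name):
    """Return the token that the audit treats as the street type."""
    m = street_type_re.search(name)
    return m.group() if m else None

def update_name_demo(name, mapping):
    """Replace each word whose lowercase form is a mapping key."""
    return " ".join(mapping.get(w.lower(), w) for w in name.split(" "))

samples = [
    "Gillies Ave",            # abbreviated street type
    "Marjorie Jayne Cr",      # abbreviated street type
    "State Highway 16",       # last token is a number, not a street type
    "38 Lorne St, Auckland",  # "St," keeps its comma, so it is not mapped
]

for raw in samples:
    print("%-24s type=%-10s cleaned=%s"
          % (raw, last_token(raw), update_name_demo(raw, mapping_demo)))
```

The last sample suggests why full addresses stored in `addr:street` survive the cleaning step unchanged: the comma stays attached to the word, so the lookup misses.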

## Problems Encountered in Your Map
<p>After running the data set through the audit code above, I noted the following problems in the map data for my chosen area:</p>
- Misspelled Names
- Incorrect Capitalization
- Abbreviated Names
- Problematic Format
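
As a rough illustration of these four categories, the sketch below sorts a few raw `addr:street` values into them. The rule sets and the `classify_problem` helper are assumptions made for demonstration and are not the audit logic used in this project. Some sample values are inferred from the audit output ('Tawa Cresent', 'Gillies Ave'), while 'Exmouth road' and 'Queen Street' are made-up inputs.

```python
# Hypothetical rule sets, chosen only to illustrate the four categories.
MISSPELLED = {"cresent", "strret"}
ABBREVIATED = {"ave", "st", "st.", "rd", "cr", "hwy", "ln"}
KNOWN_TYPES = {"Avenue", "Crescent", "Road", "Street"}

def classify_problem(street_name):
    """Return a coarse problem label for a raw addr:street value."""
    # A comma suggests a full address was stored in the street field.
    if "," in street_name:
        return "problematic format"
    last = street_name.split(" ")[-1]
    if last.lower() in MISSPELLED:
        return "misspelled name"
    if last.lower() in ABBREVIATED:
        return "abbreviated name"
    # A known street type written without its leading capital letter.
    if last != last.capitalize() and last.capitalize() in KNOWN_TYPES:
        return "incorrect capitalization"
    return "no problem detected"

examples = [
    "Tawa Cresent",
    "Gillies Ave",
    "Exmouth road",
    "Exmouth Road, Northcote, Auckland",
    "Queen Street",
]

for value in examples:
    print("%s -> %s" % (value, classify_problem(value)))
```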

In [10]:
lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

def key_type(element,keys):
    if element.tag == 'tag':
        query = element.attrib['k']
        if lower.search(query):
            keys['lower'] += 1
        elif lower_colon.search(query):
            keys['lower_colon'] += 1
        elif problemchars.search(query):
            keys['problemchars'] += 1
        else:
            keys['other'] += 1
    return keys

def process_map(filename):
    keys = {'lower': 0, 'lower_colon': 0, 'problemchars': 0, 'other': 0}
    for _, element in ET.iterparse(filename):
        keys = key_type(element, keys)
    return keys

# file_current is not defined in the cells shown; file_actual is assumed here.
keys = process_map(file_actual)
print keys

def count_tags(filename):
    tags = {}
    
    for i, elem in ET.iterparse(filename):
        if elem.tag not in tags.keys():
            tags[elem.tag] = 1
        else:
            tags[elem.tag] += 1
    return tags

# file_current is not defined in the cells shown; file_actual is assumed here.
tags = count_tags(file_actual)
print tags

{'problemchars': 0, 'lower': 88852, 'other': 76919, 'lower_colon': 1960}
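
To help interpret these counts, the following sketch applies the same three patterns to individual keys. The `classify_key` helper is a hypothetical restatement of the branching in `key_type`, and the sample keys are illustrative inputs rather than keys confirmed to be in the Auckland data.

```python
import re

# Same patterns as in the cell above.
lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')

def classify_key(key):
    """Return the bucket that key_type would increment for this key."""
    if lower.search(key):
        return 'lower'
    if lower_colon.search(key):
        return 'lower_colon'
    if problemchars.search(key):
        return 'problemchars'
    return 'other'

sample_keys = [
    "highway",             # lowercase letters only
    "addr:street",         # lowercase with exactly one colon
    "addr street",         # contains a space
    "name_1",              # contains a digit
    "FIXME",               # contains uppercase letters
    "addr:street:prefix",  # contains two colons
]

for key in sample_keys:
    print("%-20s %s" % (key, classify_key(key)))
```

Keys with digits, uppercase letters, or more than one colon all land in 'other', which may account for part of the large 'other' count above.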
