# Wrangling OpenStreetMap Data for Waterloo, Canada
## By @IanEdington



### Map Area: Region of Waterloo, Canada
    https://www.openstreetmap.org/relation/2062154
    https://www.openstreetmap.org/relation/2062153

### References used during this project
    https://www.udacity.com/course/viewer#!/c-ud032-nd/l-760758686/m-817328934
    https://docs.python.org/3/library/xml.etree.elementtree.html
    https://docs.python.org/2/library/re.html
    http://stackoverflow.com/questions/5029934/python-defaultdict-of-defaultdict


In [33]:
# cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
import xml.etree.ElementTree as ET
from pprint import pprint
import re
from importlib import reload

#-- show plots in notebook
%matplotlib inline

#-- Import wrangling functions using my lasso
import Lasso as l

In [34]:
# Reloading my lasso whenever it gets low
reload(l)

<module 'Lasso' from '/Users/ian/Dropbox/dev/udacity/project/3-wrangling-maps/Lasso.py'>

## Understanding the data
Getting an idea of what is going on inside the area chosen.
Looking at the possible values for each tag type and each 

In [35]:
atr_d, st_atr_d, s_st_d, tag_k_v_dict = l.summarizes_data_2_tags_deep('waterloo-OSM-data.osm')

In [36]:
# types of top level tags:
# Expected [node, way, relation]
pprint (sorted(list(atr_d.keys())))

['member', 'meta', 'nd', 'node', 'osm', 'relation', 'tag', 'way']


###### Why do member, nd & tag appear as top level tags?
It looks like all start tags were analysed, not just top-level tags. This won't change our analysis. It will actually be useful to see how tag is used as a child of different elements.
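
The difference between "every start tag" and "top-level tags only" can be sketched with `iterparse` and a depth counter. This is a minimal illustration, not the code in `Lasso.py`: the `count_tags` function and the `SAMPLE` data are hypothetical stand-ins for the real summarizer and the Waterloo extract.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for an OSM extract (hypothetical data, not from the Waterloo file).
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="43.46" lon="-80.52">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="2">
    <nd ref="1"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def count_tags(source):
    """Count every start tag and, separately, only direct children of the root."""
    all_tags = Counter()
    top_level = Counter()
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            all_tags[elem.tag] += 1
            if depth == 1:  # depth 0 is the <osm> root itself
                top_level[elem.tag] += 1
            depth += 1
        else:
            depth -= 1
            elem.clear()  # release memory, which matters for large extracts
    return all_tags, top_level


all_tags, top_level = count_tags(io.BytesIO(SAMPLE))
print(sorted(all_tags))   # ['nd', 'node', 'osm', 'tag', 'way']
print(sorted(top_level))  # ['node', 'way']
```

Counting every start tag is what produces `nd`, `tag` and `osm` in the list; restricting to depth 1 leaves only the elements that sit directly under `<osm>`.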

###### What about &lt;osm&gt;?
Only one osm element in set:<br/>
&lt;osm version="0.6" generator="Overpass API"&gt;

###### What about &lt;note&gt;?
Only one note element in set:<br/>
&lt;note&gt; The data included in this document is from www.openstreetmap.org. The data is made available under ODbL. &lt;/note&gt;

###### What about &lt;meta&gt;?
Only one meta element in set:<br/>
&lt;meta osm_base="2015-07-16T03:14:03Z"/&gt;

## Focus on node, way, relation
### nodes
#### node attributes
Attributes of node: Expected [id, lat, lon, ...]
Address keys should be here.
All keys should be indexable: check for problem keys.

In [37]:
#list of node attribute names (keys)
key_list = sorted(list(atr_d['node'].keys()))

# check for problem keys
print (l.check_keys_list(key_list))
pprint (key_list)

[]
['changeset', 'id', 'lat', 'lon', 'timestamp', 'uid', 'user', 'version']


No problem chars in node attribute keys
"addr:" fields seem to be limited (no extra ':')

Interesting keys:
*    Standalone: ['FIXME', 'tag', 'place', 'node', 'fixme', 'dbh_cm']
*    Address: ['addr:city', 'addr:country', 'addr:housename', 'addr:housenumber', 'addr:interpolation', 'addr:postcode', 'addr:province', 'addr:state', 'addr:street', 'addr:unit']
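
The problem-key check can be sketched as a regular-expression scan. The real `l.check_keys_list` is not shown in this notebook, so the `find_problem_keys` function and the `PROBLEM_CHARS` character set below are assumptions about what such a check might look like, not its actual implementation.

```python
import re

# Characters that would be awkward in a database key.
# This is an assumed set; Lasso.check_keys_list may use a different one.
PROBLEM_CHARS = re.compile(r"""[=+/&<>;'"?%#$@,. \t\r\n]""")


def find_problem_keys(keys):
    """Return the keys that contain at least one problem character."""
    return [key for key in keys if PROBLEM_CHARS.search(key)]


print(find_problem_keys(['id', 'lat', 'lon', 'addr:street']))
# []
print(find_problem_keys(['name.en', 'addr:street', 'opening hours']))
# ['name.en', 'opening hours']
```

A colon is deliberately not in the assumed set, so `addr:` style keys pass the check, which matches the empty list printed above.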

In [38]:
for attrib in ['FIXME', 'tag', 'place', 'fixme', 'note', 'dbh_cm']:
    print ('This is the contents of ' + attrib)
    pprint (atr_d['node'][attrib])


This is the contents of FIXME
set()
This is the contents of tag
set()
This is the contents of place
set()
This is the contents of fixme
set()
This is the contents of note
set()
This is the contents of dbh_cm
set()


    'FIXME': marks files that warrant a closer look
    'tag': Seems to be an empty set. This might be a
    'place': just another location descriptor
    'node': ref to the sub elements
    'fixme': marks files that warrant a closer look
    'note': lots of notes, one fixme starting with 'FIXME:'
    'dbh_cm': seems to be in reference to a tree
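
Every lookup in the cell above printed `set()`. One possible explanation, assuming `atr_d` is a `defaultdict` of `defaultdict(set)` (suggested by the defaultdict reference listed at the top, but not confirmed against `Lasso.py`), is that these names are tag keys rather than element attributes, so indexing them creates empty entries. The `atr_sketch` name below is hypothetical.

```python
from collections import defaultdict

# Assumed shape: element name -> attribute name -> set of observed values.
atr_sketch = defaultdict(lambda: defaultdict(set))
atr_sketch['node']['lat'].add('43.46')

# Indexing a missing key silently creates an empty set...
print(atr_sketch['node']['FIXME'])       # set()
# ...and the key now exists even though it was never observed in the data.
print('FIXME' in atr_sketch['node'])     # True

# .get() or a membership test avoids that side effect.
print(atr_sketch['node'].get('dbh_cm'))  # None
print('dbh_cm' in atr_sketch['node'])    # False
```

If that assumption holds, the values for keys such as `FIXME` or `dbh_cm` would be found under `tag_k_v_dict['node']` instead.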

#### Node Tag k:v pairs

k:v pairs of Tags on node
Expected ?
All keys should be indexable: check for problem keys.

Keys from tags shouldn't conflict with attribute keys from same node. How can we check this?
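
One way to answer that question is a set intersection between the attribute keys and the tag keys. A minimal sketch: `conflicting_keys` is a hypothetical helper, and `sample_tag_keys` is a small illustrative list rather than the full tag key list.

```python
def conflicting_keys(attribute_keys, tag_keys):
    """Return keys that appear both as element attributes and as tag keys."""
    return sorted(set(attribute_keys) & set(tag_keys))


# Node attribute keys as printed earlier in this notebook.
node_attribute_keys = ['changeset', 'id', 'lat', 'lon',
                       'timestamp', 'uid', 'user', 'version']

# Illustrative tag keys only; in the notebook this would be
# list(tag_k_v_dict['node'].keys()).
sample_tag_keys = ['FIXME', 'access', 'addr:city', 'amenity']

print(conflicting_keys(node_attribute_keys, sample_tag_keys))  # []
print(conflicting_keys(node_attribute_keys, ['name', 'id']))   # ['id']
```

An empty result means the tags can be merged into the same record as the attributes without overwriting anything.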

In [39]:
# list of node tag key names
tag_key_list = list(tag_k_v_dict['node'].keys())

# check for problem keys
print (l.check_keys_list(tag_key_list))
pprint (sorted(tag_key_list))

[]
['FIXME',
 'access',
 'addr:city',
 'addr:country',
 'addr:housename',
 'addr:housenumber',
 'addr:interpolation',
 'addr:postcode',
 'addr:province',
 'addr:state',
 'addr:street',
 'addr:unit',
 'administrative',
 'aerialway',
 'aeroway',
 'alcohol',
 'alt_name',
 'amenity',
 'artist',
 'artist_name',
 'artwork_type',
 'atm',
 'automated',
 'backrest',
 'barrier',
 'beauty',
 'bench',
 'bicycle',
 'bicycle_parking',
 'bin',
 'board_type',
 'books',
 'booth',
 'bottle',
 'brand',
 'building',
 'building:levels',
 'built',
 'bus',
 'button',
 'button_operated',
 'canvec:UUID',
 'capacity',
 'car',
 'clothes',
 'colour',
 'contact:phone',
 'contact:website',
 'content',
 'contents',
 'covered',
 'craft',
 'created_by',
 'crossing',
 'crossing:barrier',
 'crossing:bell',
 'crossing:light',
 'cuisine',
 'currency:CAD',
 'cycleway',
 'dbh_cm',
 'denomination',
 'description',
 'designation',
 'destination',
 'diet:vegetarian',
 'direction',
 'dispensing',
 'display',
 'drink',
 'drinkin

No problem chars in node tag keys 

Interesting keys:

    Standalone: ['FIXME', 'tag', 'place', 'node', 'fixme', 'dbh_cm']
    Address: ['addr:city', 'addr:country', 'addr:housename', 'addr:housenumber', 'addr:interpolation', 'addr:postcode', 'addr:province', 'addr:state', 'addr:street', 'addr:unit']



#### node children

#### node children attributes

#### node children children

### Ways

### relation

### Potential problem areas
#### address

## Make a plan for how to store the data



### 1. Problems Encountered in the Map
Student response describes the challenges encountered while auditing, fixing and processing the dataset for the area of their choice. Some of the problems encountered during data audit are cleaned programmatically.


Student response shows understanding of the process of auditing, and ways to correct or standardize the data, including dealing with problems specific to the location, e.g. related to language or traditional ways of formatting. Some of the problems encountered during data audit are cleaned programmatically.  

### 2. Data Overview
Student provides a statistical overview about their chosen dataset, like:

    size of the file
    number of unique users
    number of nodes and ways
    number of chosen type of nodes, like cafes, shops etc
    
Student response provides the statistics about their chosen map area.

Student response also includes the MongoDB queries used to obtain the statistics.
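
Before loading anything into MongoDB, some of these statistics can be cross-checked straight from the XML. The sketch below is only that cross-check, not the required MongoDB queries; the `overview` function and the `SAMPLE` data are hypothetical, and the real run would pass the path to `waterloo-OSM-data.osm` instead.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for an OSM extract (hypothetical data, not from the Waterloo file).
SAMPLE = b"""<osm version="0.6">
  <node id="1" uid="10" user="a" lat="43.46" lon="-80.52"/>
  <node id="2" uid="11" user="b" lat="43.47" lon="-80.53"/>
  <way id="3" uid="10" user="a">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
</osm>"""


def overview(source):
    """Count nodes, ways and relations, and collect the distinct contributor uids."""
    counts = Counter()
    users = set()
    for _, elem in ET.iterparse(source):  # default: 'end' events only
        if elem.tag in ("node", "way", "relation"):
            counts[elem.tag] += 1
            uid = elem.get("uid")
            if uid is not None:
                users.add(uid)
        elem.clear()  # release memory, which matters for large extracts
    return counts, users


counts, users = overview(io.BytesIO(SAMPLE))
print(counts["node"], counts["way"], counts["relation"])  # 2 1 0
print(len(users))                                         # 2
```

The file size itself can be read with `os.path.getsize` on the `.osm` file.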

### 3. Additional Ideas
Other ideas about the datasets

Student is able to analyze the dataset and recognize opportunities for using it in other projects

Student proposes one or more additional ways of improving and analyzing the data and gives thoughtful discussion about the benefits and anticipated problems in implementing the improvement.

### Code Review:
#### Code Functionality

All required Lesson 6 questions are correctly solved with the submitted code. Final project code functionality reflects the description in the project document.

#### Code Readability

Final project code is well structured.

Final project code is commented as necessary.
Final project code follows an intuitive, easy-to-follow logical structure.

Final project code that is not intuitively readable is well-documented with comments.

### Thoroughness and Succinctness of Submission

Student submission is long enough to thoroughly answer the questions asked without giving unnecessary detail.
A good general guideline is that your question responses should take about 3-6 pages.