# Project 3 - Open Street Map Data Wrangling

## Abstract
Open Street Map is a project to create an open source map of the world. It is most easily understood as the Wikipedia of maps, where anyone in the world can add to the mapping dataset. The result is a rich dataset, interesting both for learning about geographic areas around the world and for what the data says about itself, such as how many people contributed to the dataset for a particular area. As with any project reliant on human input, though, the data is sometimes inconsistent.

### Los Angeles, California
Los Angeles, California was chosen for two primary reasons. First, Los Angeles is one of the largest cities in America and the second most populous, so it likely has more records and more contributors than most other American cities. Second, I was born close to Los Angeles, which makes the dataset more interesting to me.

## Methodology
The map data was initially downloaded from MapZen's metro extracts.

## Data Munging

### Verify Tags

In [5]:
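# Relative path to the extract; a raw string keeps the backslash literal.
file_location = r"data\los-angeles_california.osm"
# Overridden by the absolute path, which is the value the later cells use.
file_location = r"C:\los-angeles_california.osm"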

In [6]:
from programs import tags
tag = tags.process_map(file_location)

In [7]:
tag

{'lower': 1804949, 'lower_colon': 2122407, 'other': 176260, 'problemchars': 0}

Luckily, none of the 4,103,616 tags contains any of the characters labeled as problem characters in the Lesson 6 example.
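
The `tags.process_map` function is not shown in this notebook. The following is a minimal sketch of how such a key audit could work, assuming the four categories in the output above are decided by regular expressions. The patterns, the `key_type` and `audit_keys` names, and the sample XML are assumptions for illustration and may differ from the actual module.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed patterns; the notebook's actual expressions may differ.
LOWER = re.compile(r'^([a-z]|_)*$')
LOWER_COLON = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
PROBLEMCHARS = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')


def key_type(key):
    """Classify a tag key into one of the four audit categories."""
    if LOWER.search(key):
        return "lower"
    if LOWER_COLON.search(key):
        return "lower_colon"
    if PROBLEMCHARS.search(key):
        return "problemchars"
    return "other"


def audit_keys(osm_file):
    """Stream the OSM XML and count tag keys per category."""
    counts = Counter({"lower": 0, "lower_colon": 0, "other": 0, "problemchars": 0})
    for _, element in ET.iterparse(osm_file):
        if element.tag == "tag":
            counts[key_type(element.get("k"))] += 1
        element.clear()  # free memory as we go; a city extract is large
    return dict(counts)


# Tiny in-memory sample (hypothetical data) with one key per category.
SAMPLE = b"""<osm>
  <node id="1" lat="34.05" lon="-118.24" user="alice" uid="10">
    <tag k="amenity" v="cafe"/>
    <tag k="addr:street" v="Main St"/>
    <tag k="FIXME" v="check name"/>
    <tag k="bad key" v="x"/>
  </node>
</osm>"""

sample_counts = audit_keys(io.BytesIO(SAMPLE))
print(sample_counts)
```

Streaming with `iterparse` and clearing each element avoids holding the whole file in memory, which matters for an extract with millions of elements.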

### Number of contributors and elements

In [8]:
from programs import users
users,ids = users.process_map(file_location)

In [13]:
"Number of Users: {0}     Number of Ids: {1}".format(len(users),len(ids))

'Number of Users: 3026     Number of Ids: 5961251'

In [10]:
list(ids)[:5]

['557893465', '2878269565', '54327174', '95222955', '3375815992']

Out of the millions of people who have visited or live in LA, it seems that only 3,026 are responsible for all the points in the LA Open Street Map project. Further analysis will be done after the MongoDB database has been created.
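
The `users.process_map` function is likewise not shown. Below is a minimal sketch of one way to gather both sets, assuming contributors are identified by the `uid` attribute and elements by the `id` attribute of nodes, ways, and relations. The function name and sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def collect_users_and_ids(osm_file):
    """Return the set of contributor uids and the set of element ids."""
    user_ids, element_ids = set(), set()
    for _, element in ET.iterparse(osm_file):
        if element.tag in ("node", "way", "relation"):
            uid = element.get("uid")
            if uid is not None:  # some elements may lack user information
                user_ids.add(uid)
            element_ids.add(element.get("id"))
        element.clear()  # free memory once the element has been inspected
    return user_ids, element_ids


# Hypothetical sample: two nodes by one contributor, one way by another.
SAMPLE = b"""<osm>
  <node id="1" lat="34.05" lon="-118.24" user="alice" uid="10"/>
  <node id="2" lat="34.06" lon="-118.25" user="alice" uid="10"/>
  <way id="3" user="bob" uid="20"><nd ref="1"/><nd ref="2"/></way>
</osm>"""

sample_users, sample_ids = collect_users_and_ids(io.BytesIO(SAMPLE))
print(len(sample_users), len(sample_ids))
```

Using sets means a contributor who edited many elements is counted only once.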

### Count Nodes and Ways
A function was added to the data module that counts the number of way and node elements in the original LA osm file. We'll use this count later to verify that the import into MongoDB was successful.

In [None]:
from programs import data
count = data.count_elements(file_location)
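
The body of `data.count_elements` is not included in this notebook. A minimal sketch of such a counter follows, assuming it returns a dictionary keyed by element name. The signature, the `wanted` parameter, and the sample data are assumptions.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(osm_file, wanted=("node", "way")):
    """Count how many times each wanted element appears in the OSM file."""
    counts = Counter()
    for _, element in ET.iterparse(osm_file):
        if element.tag in wanted:
            counts[element.tag] += 1
        element.clear()  # keep memory use flat on large files
    return dict(counts)


# Hypothetical sample: the relation is present but deliberately not counted.
SAMPLE = b"""<osm>
  <node id="1" lat="34.05" lon="-118.24"/>
  <node id="2" lat="34.06" lon="-118.25"/>
  <way id="3"><nd ref="1"/><nd ref="2"/></way>
  <relation id="4"><member type="way" ref="3" role="outer"/></relation>
</osm>"""

sample_count = count_elements(io.BytesIO(SAMPLE))
print(sample_count)
```

If only nodes and ways are converted to documents, `sum(sample_count.values())` is the number one would expect to see in the collection after import.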

### Create JSON
For import into MongoDB, the osm file, which is an XML file, is converted into JSON using Python. During the conversion, street names are checked and converted. The list of conversions was generated manually by reviewing a list of all street suffixes to check for repeats or typos. Processing the map takes significant resources and was precomputed outside of this notebook.

In [None]:
if False:  # Prevents execution
    data.process_map(file_location)
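
Since `data.process_map` is precomputed and not shown, here is a minimal sketch of the conversion step described above. It assumes one JSON document per line, tags stored as top-level fields, and a small suffix table. The `STREET_SUFFIXES` entries and the `clean_street_name`, `shape_element`, and `convert` names are hypothetical stand-ins for the manually built list and the real module.

```python
import io
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical conversion table; the notebook's manually built list is not shown.
STREET_SUFFIXES = {"St": "Street", "St.": "Street", "Ave": "Avenue", "Blvd": "Boulevard"}
LAST_WORD = re.compile(r'\S+$')


def clean_street_name(name):
    """Replace a known abbreviated suffix with its full form."""
    match = LAST_WORD.search(name)
    if match and match.group() in STREET_SUFFIXES:
        return name[:match.start()] + STREET_SUFFIXES[match.group()]
    return name


def shape_element(element):
    """Turn a node or way into a dict, or return None for other elements."""
    if element.tag not in ("node", "way"):
        return None
    doc = {"id": element.get("id"), "type": element.tag}
    for tag in element.iter("tag"):
        key, value = tag.get("k"), tag.get("v")
        if key == "addr:street":
            value = clean_street_name(value)
        doc.setdefault(key, value)  # never overwrite the structural fields
    return doc


def convert(osm_file, out):
    """Write one JSON document per line to the file-like object `out`."""
    for _, element in ET.iterparse(osm_file):
        doc = shape_element(element)
        if doc is not None:
            out.write(json.dumps(doc) + "\n")
            element.clear()  # clear only after the children have been read


# Hypothetical sample element with an abbreviated street suffix.
SAMPLE = b"""<osm>
  <node id="1" lat="34.05" lon="-118.24">
    <tag k="addr:street" v="Main St"/>
    <tag k="cuisine" v="coffee"/>
  </node>
</osm>"""

output = io.StringIO()
convert(io.BytesIO(SAMPLE), output)
documents = [json.loads(line) for line in output.getvalue().splitlines()]
print(documents)
```

Only the last word of the street name is inspected, so a name such as "St Andrews Place" is left untouched.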

# MongoDB
MongoDB is a popular NoSQL database that stores its data in collections of documents. Documents have a flexible schema, which works well for the Open Street Map data because not every node and way has the same "columns". If the Open Street Map data were stored in a tabular database, many fields would be null; for instance, "Outside Seating" would be irrelevant for most businesses. Adding extra data would also be burdensome, as new columns would have to be added to an ever-growing table.
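
To make the point about null fields concrete, here is a small sketch using two hypothetical documents. The field names and values are invented for illustration and are not taken from the dataset. Flattening them into one table forces every row to carry every column.

```python
# Hypothetical documents; field names are for illustration only.
cafe = {"id": "1", "type": "node", "amenity": "cafe",
        "cuisine": "coffee", "outdoor_seating": "yes"}
hydrant = {"id": "2", "type": "node", "emergency": "fire_hydrant"}

documents = [cafe, hydrant]

# A tabular layout needs the union of all fields as its columns.
columns = sorted(set().union(*documents))

# Each row gets None wherever the document has no such field.
table = [[doc.get(column) for column in columns] for doc in documents]
null_cells = sum(cell is None for row in table for cell in row)

print(columns)
print(null_cells)
```

As documents, the two records store eight fields in total; as a table, they occupy twelve cells, four of which are empty.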

### Loading Data
The JSON file generated by the previous Python step first needs to be loaded into MongoDB using the following command in the terminal:

>mongoimport --db test --collection la_map --file los-angeles_california.osm.json

Once imported, we can begin querying the local MongoDB database using a Python driver.

In [1]:
from pymongo import MongoClient
client = MongoClient('mongodb://localhost:27017/')
db = client.test
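
As an alternative to the `mongoimport` command, the load could also be sketched from Python. This is not what was done for this project. It assumes the JSON file holds one document per line, and that the collection passed in is a PyMongo collection such as `db.la_map`. The `read_batches` and `load_into_mongo` names are hypothetical.

```python
import io
import json
from itertools import islice


def read_batches(lines, batch_size=1000):
    """Yield lists of parsed documents from an iterable of JSON lines."""
    parsed = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(parsed, batch_size))
        if not batch:
            return
        yield batch


def load_into_mongo(path, collection, batch_size=1000):
    """Insert every document from a JSON-lines file into `collection`.

    `collection` is assumed to be a PyMongo collection, e.g. db.la_map.
    """
    inserted = 0
    with open(path, encoding="utf-8") as handle:
        for batch in read_batches(handle, batch_size):
            collection.insert_many(batch)
            inserted += len(batch)
    return inserted


# Batching shown on in-memory lines; this part needs no database.
demo = io.StringIO('{"id": "1"}\n{"id": "2"}\n{"id": "3"}\n')
batch_sizes = [len(batch) for batch in read_batches(demo, batch_size=2)]
print(batch_sizes)
```

Inserting in batches keeps memory use bounded, since the full file with millions of documents is never held in a single list.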

### Verifying all documents imported

In [2]:
# Count every document; the empty filter matches all (requires PyMongo 3.7 or later).
db.la_map.count_documents({})

5953758

### Counting Number of Elements in entire collection

In [16]:
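counter = 0
# find() iterates over every document in the collection. With find_one(), the
# loop would instead walk the field names of a single document and sum the
# lengths of those names, which is what the output recorded below reflects.
for document in db.la_map.find():
    counter += len(document)  # number of top-level fields in this document
print(counter)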

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
35


### See what types of cuisines are available

In [26]:
db.la_map.distinct("cuisine")

['burger',
 'japanese',
 'american',
 'thai',
 'korean',
 'vietnamese',
 'sushi',
 'mexican',
 'italian',
 'roast_beef',
 'coffee',
 'sandwich',
 'hawaiian',
 'ice_cream',
 'pizza',
 'donut',
 'fish_and_chips',
 'chinese',
 'steak_house',
 'chicken;mexican',
 'chicken',
 'Japanese Ramen',
 'Northern Chinese',
 'taiwanese',
 'indian',
 'mediterranean',
 'cantonese',
 'regional',
 'coffee_shop',
 'french',
 'gastropub',
 'noodle',
 'asian',
 'peruvian',
 'greek',
 'steak;seafood',
 'italian;mediterranean',
 'deli',
 'burger;american',
 'barbecue',
 'american;bakery',
 'seafood',
 'Californian',
 'burger;mexican',
 'breakfast',
 'american;brewpub',
 'garlic',
 'indonesian',
 'pizza;chicken',
 'catering',
 'juice',
 'sushi;steak;japanese',
 'seafood;steak',
 'mexican;pizza',
 'seafood;sushi;steak',
 'greek;burger',
 'seafood;california',
 'chinese;sushi',
 'seafood;steak;hawaiian',
 'sushi;japanese;steak',
 'mexican;steak;seafood',
 'sushi;california',
 'chicken;ice_cream',
 'seafood;brewp

### Count Number of Coffee Shops

In [28]:
# Count documents whose cuisine contains "coffee" (requires PyMongo 3.7 or later).
db.la_map.count_documents({"cuisine": {"$regex": ".*coffee.*"}})

174
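
The cuisine list above mixes several conventions: values joined with semicolons, capitalized names, and names with spaces. A minimal sketch of one way to normalize such values before counting follows. The `split_cuisines` and `count_cuisines` helpers are hypothetical, and the sample strings are a handful of the distinct values listed above rather than the per-document data.

```python
from collections import Counter


def split_cuisines(value):
    """Split a raw cuisine string into normalized individual cuisines."""
    parts = (part.strip().lower().replace(" ", "_") for part in value.split(";"))
    return [part for part in parts if part]


def count_cuisines(values):
    """Count each normalized cuisine across an iterable of raw strings."""
    counts = Counter()
    for value in values:
        counts.update(split_cuisines(value))
    return counts


# A few of the distinct values from the list above.
raw_values = ["coffee", "coffee_shop", "burger;american",
              "Japanese Ramen", "american;bakery", "sushi;california"]

cuisine_counts = count_cuisines(raw_values)
coffee_like = sum(n for name, n in cuisine_counts.items() if "coffee" in name)

print(cuisine_counts.most_common(3))
print(coffee_like)
```

In practice the helper would be fed the `cuisine` field of every document rather than the distinct values, so that the counts reflect how many places serve each cuisine.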

# Appendix

## Reference
Map Source - https://mapzen.com/data/metro-extracts
MongoDB Manual - https://docs.mongodb.org/manual/