# Map Area

Taylorsville City and the surrounding areas:

* https://www.openstreetmap.org/export#map=13/40.6581/-111.9497

I am interested in working on Taylorsville and the surrounding areas not only because it is where I currently live, but also because it is where both my children and my parents grew up. It is a place with a lot of memories and history for me.

# Problems Encountered in the Map
I first downloaded the data for my area and ran it through Sampler.py to create a sample for initial review and to identify candidates for cleanup. After running the sample through a modified version of data.py (sample_data.py) to generate CSV files, I reviewed the data and found the following points that could use improvement:

* The turn:lanes tags are less readable because they omit the none value for lanes that have no turn indication (|||right instead of none|none|none|right)

    * Reference: https://wiki.openstreetmap.org/wiki/Key:turn
    
* There are some instances of maxspeed:type = sign, which is an outdated format and should be updated to source:maxspeed = sign

    * Reference: https://wiki.openstreetmap.org/wiki/Key:maxspeed
    
* There are instances of hov = lane tags, which should be cleaned up as they are no longer valid

    * Reference: https://wiki.openstreetmap.org/wiki/Key:hov
    
* There are some instances of abbreviations in addr:street tags that should be cleaned up (Rd for Road and single-letter compass directions)

    * Reference: https://wiki.openstreetmap.org/wiki/Names#Abbreviation_.28don.27t_do_it.29
    
* There are some outdated name1 tags that should be converted to alt_name

    * Reference: https://wiki.openstreetmap.org/wiki/Names
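Three of the fixes listed above are simple key or value rewrites. The following is a minimal sketch of how they could be expressed, not the code from final_data.py; the function names `fill_turn_lanes` and `clean_tag` are hypothetical and chosen only for illustration.

```python
def fill_turn_lanes(value):
    """Replace every empty lane entry in a turn:lanes value with 'none'."""
    return "|".join(lane if lane else "none" for lane in value.split("|"))


def clean_tag(key, value):
    """Return the (key, value) pair after applying the simple tag fixes.

    Hypothetical helper: it assumes each tag can be fixed in isolation.
    """
    if key == "turn:lanes":
        return key, fill_turn_lanes(value)
    if key == "maxspeed:type" and value == "sign":
        return "source:maxspeed", value  # outdated key, same value
    if key == "name1":
        return "alt_name", value  # outdated key, same value
    return key, value


print(fill_turn_lanes("|||right"))  # none|none|none|right
print(clean_tag("name1", "Old Mill Road"))
```

Keeping each fix as a pure function of the key and value makes it easy to test against the sample data before running the full conversion.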
    

# Auditing Data
After determining which data to improve, I created auditing scripts to measure how much problem data needed cleaning and to form the basis of the cleanup process. The simplest of these handled the outdated or invalid tags: each one checks for the relevant tags and counts them. I tested each script against my sample data for manual verification before running it against the full data set.

In [2]:
%run name1_audit.py
%run speed_signs_audit.py
%run hov_audit.py

The count of outdated name1 tags is 3
The count of invalid maxspeed:type = sign tags is 15
The count of invalid HOV tags is 62
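The three audit scripts are not reproduced here, but the counting pattern they share can be sketched roughly as follows. This is an illustration under assumptions: the function name `count_tags` and the inline `SAMPLE` document are hypothetical, and the real scripts read the downloaded .osm file rather than an in-memory string.

```python
import io
import xml.etree.ElementTree as ET


def count_tags(osm_source, key, value=None):
    """Count <tag> elements on nodes and ways that match key (and value, if given)."""
    count = 0
    for _, elem in ET.iterparse(osm_source, events=("end",)):
        if elem.tag in ("node", "way"):
            for tag in elem.iter("tag"):
                if tag.get("k") == key and (value is None or tag.get("v") == value):
                    count += 1
            elem.clear()  # release children to keep memory use low on large files
    return count


# Tiny made-up sample standing in for the real OSM extract.
SAMPLE = b"""<osm>
  <way id="1"><tag k="name1" v="Old Road"/><tag k="hov" v="lane"/></way>
  <way id="2"><tag k="maxspeed:type" v="sign"/></way>
  <node id="3" lat="0" lon="0"><tag k="name1" v="Other"/></node>
</osm>"""

name1_count = count_tags(io.BytesIO(SAMPLE), "name1")
print("The count of outdated name1 tags is", name1_count)
```

Passing a file path instead of the `io.BytesIO` object would run the same count against a file on disk.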


Auditing the abbreviations in the addr:street tags and the missing values in the turn:lanes tags was handled by street_audit.py and lanes_audit.py respectively. These scripts collected and reported the values to be modified so I could determine the proper process to clean the data programmatically. While previous work provided the basis for auditing the addr:street tags, the turn:lanes audit required a new regex in order to catch all instances of turn:lanes.
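The regexes used in street_audit.py and lanes_audit.py are not shown in this write-up, so the patterns below are assumptions meant only to illustrate the approach: flag addr:street values with an unexpected final word or a single-letter compass direction, and flag turn:lanes values (including directional variants of the key) that contain an empty lane entry. The set `EXPECTED_ENDINGS` and the function `audit_tags` are likewise hypothetical.

```python
import re
from collections import defaultdict

# Assumed patterns, not the ones from the original scripts.
STREET_ENDING_RE = re.compile(r"\b\S+\.?$")      # last word of the street name
DIRECTION_RE = re.compile(r"\b[NSEW]\b")         # single-letter compass direction
TURN_LANES_KEY_RE = re.compile(r"^turn:lanes(:forward|:backward|:both_ways)?$")
EMPTY_LANE_RE = re.compile(r"^\||\|\||\|$")      # leading, doubled, or trailing pipe

# Hypothetical list of acceptable final words for this area.
EXPECTED_ENDINGS = {"Road", "Street", "Drive", "Boulevard", "Avenue", "Lane",
                    "Way", "North", "South", "East", "West"}


def audit_tags(tags):
    """Group suspicious values from an iterable of (key, value) pairs."""
    streets = defaultdict(set)
    lanes = set()
    for key, value in tags:
        if key == "addr:street":
            match = STREET_ENDING_RE.search(value)
            if match and match.group() not in EXPECTED_ENDINGS:
                streets[match.group()].add(value)
            for letter in set(DIRECTION_RE.findall(value)):
                streets[letter].add(value)
        elif TURN_LANES_KEY_RE.match(key) and EMPTY_LANE_RE.search(value):
            lanes.add(value)
    return dict(streets), lanes


sample_tags = [
    ("addr:street", "4700 S Redwood Rd"),
    ("addr:street", "Highland Drive"),
    ("turn:lanes", "|||right"),
    ("turn:lanes:forward", "left|"),
    ("turn:lanes", "left|none|right"),
]
streets, lanes = audit_tags(sample_tags)
print(streets)
print(sorted(lanes))
```

Grouping the flagged values by the offending token makes it easier to decide which replacements are safe to apply automatically.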

# Data Overview

After completing the auditing, scripting the conversion and cleanup (saved as final_data.py), and importing the resulting data into a SQL database, I ended up with the following data files:
```
Taylorsville.osm.....119 MB
nodes.csv.............44 MB
ways_nodes.csv........14 MB
ways_tags.csv........4.4 MB
ways.csv.............4.4 MB
nodes_tags.csv.......1.3 MB
DataWrangle.mdf......598 MB
```

# SQL Queries of Data

### Number of Nodes

```sql
SELECT count(*)
FROM [dbo].[nodes]
```
Result: 527,392

### Number of Ways

```sql
SELECT count(*)
FROM [dbo].[ways]
```
Result: 74,735

### Number of Unique User IDs

```sql
SELECT COUNT(DISTINCT(u.[uid]))          
FROM (SELECT [uid] FROM [dbo].[nodes] UNION ALL SELECT [uid] FROM [dbo].[ways]) u
```
Result: 933

### Counts of Types of Leisure Nodes

```sql
SELECT [value], count(*)
FROM dbo.nodes_tags
WHERE [key] = 'leisure'
GROUP BY [value]
ORDER BY 2 DESC
```
Results:
```
picnic_table........26
fitness_centre......10
park................6
playground..........6
sports_centre.......6
dance...............3
bleachers...........2
bowling_alley.......2
fishing.............1
amusement_arcade....1
fitness_station.....1
swimming_pool.......1
```

### Top 10 Named Ways With the Most Crossings

```sql
SELECT TOP 10 wt.[value], count(*)
FROM [dbo].[ways_tags] wt JOIN [dbo].[ways_nodes] wn ON wt.[id] = wn.[id]
WHERE wn.[node_id] IN (SELECT DISTINCT [id] FROM [dbo].[nodes_tags] WHERE [key] = 'crossing')
AND wt.[key] = 'name'
GROUP BY wt.[value]
ORDER BY 2 DESC
```
Results:
```
Highland Drive..........51
3300 South..............29
State Street............27
900 East................25
1300 East...............24
Main Street.............24
Fort Union Boulevard....23
3500 South..............21
4500 South..............16
2300 East...............15
```

# Additional Ideas
While analyzing and reviewing the data, I found a few other items that could use further cleanup to make the data more usable:

* A large number of TIGER-imported ways have not been reviewed, according to their tiger:reviewed tags

Querying the SQL database returns 6,372 tiger:reviewed = no tags, meaning roughly 8.5% of the 74,735 ways in the dataset I processed are still in need of review (https://wiki.openstreetmap.org/wiki/TIGER_fixup). Cleaning these up would be a significant effort, as it requires manual review, but I feel it would be beneficial not only for cleaning the old data but also as a chance to catch other errors or needed updates in adjacent data that has changed over the years. One example is a nearby business that is currently listed as "Shake Makers", but has since been completely demolished and rebuilt as a 7-11.
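As a rough check on that proportion, the share can be computed directly from the two counts reported in this document. The calculation assumes each tiger:reviewed = no tag belongs to a distinct way.

```python
# Counts reported earlier in this write-up.
tiger_unreviewed = 6372   # tiger:reviewed = no tags
total_ways = 74735        # rows in the ways table

# Assumption: one tiger:reviewed tag per way.
share = tiger_unreviewed / total_ways
print(f"{share:.1%} of ways still need TIGER review")
```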

* HOV lanes can only be entered or exited along specific stretches, so these access points should be modeled

The HOV lanes on the included stretch of Interstate 15 are the type with designated entry/exit areas. As such, adding them properly would require significant manual work to mark where those areas are. While this could provide some benefit for route planning in some situations, I think the work would be better focused on other areas of the Interstate outside of this dataset. The particular area I reviewed is fairly urban, which in my experience means there is less use of the HOV lanes. Further to the north and south are more suburban and rural stretches that may benefit more from accurate HOV mapping.

# Conclusion
I found this project to be a very interesting and informative experience. Not only did I develop my skills with Python, SQL, and data wrangling, but I was also introduced to a new project and tool in OpenStreetMap. There is definitely plenty of work that could be done to clean up and update this data in my neck of the woods, and as time permits I may very well return to contribute.