# Data Science at Scale, [Capstone Project](https://www.coursera.org/learn/datasci-capstone)
## Milestone 1. Create a dataset of locations of buildings
Create a file where each record represents a building. In most approaches, each building will be associated with some spatial extent so that you can determine which incidents will be assigned to it.
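
One simple way to give a building a spatial extent is a fixed-radius circle around its coordinates, with an incident assigned to the nearest building whose circle contains it. The sketch below only illustrates that idea: the 20-metre radius and the names `haversine_m` and `assign_incident` are assumptions, not part of the project data or the approach used later in this notebook.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def assign_incident(incident, buildings, radius_m=20.0):
    """Return the index of the nearest building within radius_m, or None.

    incident is a (lat, lon) tuple; buildings is a list of (lat, lon) tuples.
    radius_m is an assumed extent, chosen here purely for illustration.
    """
    best_idx, best_dist = None, radius_m
    for idx, (lat, lon) in enumerate(buildings):
        dist = haversine_m(incident[0], incident[1], lat, lon)
        if dist <= best_dist:
            best_idx, best_dist = idx, dist
    return best_idx
```

A linear scan like this is only practical for small samples; for the full set of buildings a spatial index or a coordinate grid would likely be needed.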

In [1]:
import folium
import numpy as np
import pandas as pd
from datetime import datetime, date
from geopy.geocoders import Nominatim

### Buildings File
Each record represents a location derived from a 311 call, a crime incident, or a blight violation in the following datasets:
* [Blight Violations](https://d18ky98rnyall9.cloudfront.net/_97bd1c1e5df9537bb13398c9898deed7_detroit-blight-violations.csv?Expires=1462320000&Signature=KjJzlAwVQBOONT-2ZJN7ixzhYeD~Cb1T5t4G5pIn1Alf3F7c0MTwnnYfstgr-hxGH12A9T4mayhz7uPl2zPYVk5VOIfHrkmTNiwudNJbZ0gtMjFXr~q7EFQSfi3nafc~W0sDZKezGVCVZCrPqN2RUddWIJfuli0erB1kvRNC75k_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)
* [311 Service Calls](https://d18ky98rnyall9.cloudfront.net/_dcebfb2135a2bf5a6392493bd61aba22_detroit-311.csv?Expires=1462320000&Signature=lfmBO8JTr0lHrxA-DYDkl~TfwaM6hEyPsqhhtnE1iKfEEoxKmHT62VwnJvnjccUcfrsdMfyz7YpFz-OvtXMVJBC4~d8mDPcLo~15nLr198gUHCpykWk2uV1nOln4kCQuSDvuusQDR9UMDSCAURf-I8lCM7LU3jy3IYOd73uY-HU_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)
* [Crime Incidents](https://d18ky98rnyall9.cloudfront.net/_dcebfb2135a2bf5a6392493bd61aba22_detroit-crime.csv?Expires=1462320000&Signature=FsI7KVjPOUR8ujpVKtMWouTgUmc0XY8RS2J5EjJa9Z-Yab61WBPBOroVrwoGa4UtAB9uDB2IJTVXzUx4LFz-zBEgGyd4BX4uZlbnnLkv82wW3FzJZcpMzKbpjfq0xtt4AY7DcRx69GzGl84EE4is~C5hoOIVThMcTKaALabpwW4_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)
* [Detroit Demolition](https://data.detroitmi.gov/api/views/rv44-e9di/rows.csv?accessType=DOWNLOAD)
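
As a rough illustration of how per-building counts can be derived from the incident files, the sketch below stacks three small made-up tables and counts incidents per address. The sample rows and the names `stacked` and `buildings_sketch` are hypothetical, and the incident tables are assumed to already carry cleaned `Address`, `Latitude` and `Longitude` columns; the building file used in this notebook was produced with other tools.

```python
import pandas as pd

# Hypothetical, already-cleaned incident tables (made-up rows for illustration)
crime = pd.DataFrame({
    'Address': ['100 MAIN ST', '100 MAIN ST', '7 ELM AVE'],
    'Latitude': [42.3310, 42.3310, 42.3400],
    'Longitude': [-83.0450, -83.0450, -83.0500],
})
calls_311 = pd.DataFrame({
    'Address': ['7 ELM AVE'],
    'Latitude': [42.3400],
    'Longitude': [-83.0500],
})
violations = pd.DataFrame({
    'Address': ['100 MAIN ST'],
    'Latitude': [42.3310],
    'Longitude': [-83.0450],
})

keys = ['Address', 'Latitude', 'Longitude']
count_cols = ['CrimeCount', '311Count', 'BlightViolationCount']

# Tag each row with the count column it contributes to, then stack the tables
stacked = pd.concat(
    [
        crime.assign(source='CrimeCount'),
        calls_311.assign(source='311Count'),
        violations.assign(source='BlightViolationCount'),
    ],
    ignore_index=True,
)

# One row per building, one column per incident type, zero where none occurred
counts = stacked.groupby(keys + ['source']).size().unstack('source', fill_value=0)
counts = counts.reindex(columns=count_cols, fill_value=0)
counts.columns.name = None
buildings_sketch = counts.reset_index()
```

Grouping on exact coordinates only works when every incident at a building shares the same standardized address and coordinates, which is why the cleanup step matters.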

### Data preparation & Tools
The greatest challenge in the data provided from Socrata lies in the Location column. It concatenates all of the fields used in the geocoding process, and when address fields are included, line breaks are embedded in the field and corrupt the records until they are removed from the data file.
Before the data could be analyzed, all files were first formatted with [Excel Power Query](https://support.office.com/en-us/article/Introduction-to-Microsoft-Power-Query-for-Excel-6E92E2F4-2079-4E1F-BAD5-89F6269CD605) to remove those line breaks and to standardize the street numbers and addresses. The data was then loaded into [FME](https://www.safe.com/fme/fme-desktop/) to validate and standardize the geographic coordinates and to output formatted incident files and a file of unique buildings.
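
The line-break cleanup was done outside Python, but the same idea can be sketched in a few lines. The example assumes a Location value shaped like an address followed by a parenthesised `(latitude, longitude)` pair on its own line; that layout, the sample values, and the helper name `split_location` are assumptions for illustration only.

```python
import re

# Matches a "(lat, lon)" pair such as "(42.3638, -83.0917)"
COORD_RE = re.compile(r'\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)')


def split_location(raw):
    """Collapse line breaks and split a Location string into (address, lat, lon).

    Latitude and longitude are None when no coordinate pair is found.
    """
    # str.split() with no arguments splits on any whitespace, including newlines
    flat = ' '.join(raw.split())
    match = COORD_RE.search(flat)
    if match is None:
        return flat, None, None
    address = flat[:match.start()].strip()
    return address, float(match.group(1)), float(match.group(2))
```

With pandas, a helper like this could be applied to a whole column with `Series.map`, although the raw file still has to be read with a parser that tolerates quoted multi-line fields.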

In [2]:
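# load the prepared building file and keep only the columns used below
file = 'building_blight_features.csv'
data = pd.read_csv(file)
buildings = pd.DataFrame(data, columns=['Address', 'Latitude', 'Longitude', 'blight', 'CrimeCount', '311Count', 'BlightViolationCount'])
# a single format string prints the same text under Python 2 and Python 3
print("Rowcount:  {:,}".format(len(buildings)))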

Rowcount:  160,985


In [3]:
# set to 500 for testing
MAX_RECORDS = 500
# MAX_RECORDS = len(buildings)

# center on Detroit
buildings_map = folium.Map(location=[42.3314, -83.0458],
                   tiles='Stamen Toner',
                   zoom_start=11)

# build the map: one gray "home" marker per building, with the address as popup
for _, row in buildings.head(MAX_RECORDS).iterrows():
    folium.Marker([row['Latitude'], row['Longitude']],
                  popup=row['Address'],
                  icon=folium.Icon(color='gray', icon='home')
                 ).add_to(buildings_map)
buildings_map

## Resources
* https://github.com/python-visualization/folium
* https://folium.readthedocs.io/en/latest/quickstart.html#markers
* http://www.w3schools.com/icons/bootstrap_icons_glyphicons.asp