# Segmenting and Clustering Neighborhoods in Irving
***

## Introduction/Business Problem

The City of Irving, Texas, sits in the Dallas-Fort Worth (DFW) area between DFW Airport and Dallas. It is surrounded by major highways and serves as a gateway to neighboring areas. 
Because I know the area well, I would like to draw conclusions from segmenting and clustering neighborhood data obtained from Foursquare. 
The goal is to identify the best locations for particular types of stores. 

Additionally, I speculate that high-end stores will be concentrated in the northern areas of Irving, while fast food will be more common in the central and southern areas. 

## Data

The data will be modeled similarly to the New York City and Toronto problems. However, Irving, like the rest of Texas, does not have boroughs, so we only need neighborhood names. 
After an initial search, Wikipedia does not appear to have a list of Irving neighborhoods, so I used a list from a different site: `http://www.city-data.com/nbmaps/neigh-Irving-Texas.html#N5`. In the next sections I will scrape and clean the data.

After getting the neighborhood names, the next step in the pipeline is to use `geocoder` to get the latitude and longitude of each neighborhood.

## Implementation

A list of neighborhoods in Irving is not available in a convenient format online, so we will have to scrape and clean it.

In [6]:
import importlib.util # replaces the deprecated imp module
import numpy as np # library to handle data in a vectorized manner
import pandas as pd
import re
from pandas import json_normalize # transform JSON file into a pandas dataframe
import requests
import json
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# install geocoder only if it is not already available
if importlib.util.find_spec('geocoder') is None:
    !conda install -c conda-forge geocoder --yes
from geopy.geocoders import Nominatim
import geocoder

# install folium only if it is not already available
if importlib.util.find_spec('folium') is None:
    !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

In [9]:
url = "http://www.city-data.com/nbmaps/neigh-Irving-Texas.html#N5"

body = requests.get(url)
body.raise_for_status() # stop early if the page could not be fetched
text = body.text
start = text.find(">Neighborhoods:</h2>") + len(">Neighborhoods:</h2>") + 1
end = text.find(">Woodhaven</a>") + len(">Woodhaven</a>")
text = text[start:end]

text = re.sub("<.*?>", " ", text) # remove all tag elements
text = text.split(',') # split by comma
text = [x.strip() for x in text] # trim leading and trailing whitespace

df = pd.DataFrame(data = {'Neighborhood': text})
df.head(10)

Unnamed: 0,Neighborhood
0,Arts District
1,Barton Estates
2,Beverly Oaks
3,Broadmoor Hills
4,Cardinal Family Village
5,Club Townhomes
6,Cottonwood Valley
7,Country Club Place
8,Del Paseo
9,Downtown Heritage District
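
The slicing in the cell above depends on the exact markup of the page. As a rough sketch only, the same cleanup can be wrapped in a function and tried on a small HTML fragment. The fragment below is made up for illustration and is only assumed to resemble the real page; `extract_neighborhoods` is a hypothetical helper name, not part of any library.

```python
import re

def extract_neighborhoods(html, heading=">Neighborhoods:</h2>"):
    """Return the link texts that follow the given heading (hypothetical helper)."""
    start = html.find(heading)
    if start == -1:
        return []  # heading not found: nothing to extract
    section = html[start + len(heading):]
    # capture the text of each <a>...</a> element after the heading
    names = re.findall(r"<a[^>]*>(.*?)</a>", section, flags=re.S)
    return [name.strip() for name in names if name.strip()]

# Made-up fragment, assumed to resemble the structure of the real page
sample_html = (
    '<h2>Neighborhoods:</h2> '
    '<a href="#N0">Arts District</a>, '
    '<a href="#N1">Barton Estates</a>, '
    '<a href="#N2">Beverly Oaks</a>'
)

names = extract_neighborhoods(sample_html)
print(names)
```

On a full page this would also pick up any links that come after the neighborhood list, so an end marker (such as the last neighborhood name) would still be needed.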


#### Adding Latitude and Longitude

In [10]:
df["Latitude"] = ""
df["Longitude"] = ""
df.head(10)

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Arts District,,
1,Barton Estates,,
2,Beverly Oaks,,
3,Broadmoor Hills,,
4,Cardinal Family Village,,
5,Club Townhomes,,
6,Cottonwood Valley,,
7,Country Club Place,,
8,Del Paseo,,
9,Downtown Heritage District,,


In [11]:
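MAX_ATTEMPTS = 5 # cap retries so a failed lookup cannot loop forever

for index, row in df.iterrows():
    neighborhood = row['Neighborhood']
    lat_lng_coords = None
    attempts = 0
    # retry until the geocoder returns coordinates or the cap is reached
    while not lat_lng_coords and attempts < MAX_ATTEMPTS:
        g = geocoder.google('{}, Irving, Texas'.format(neighborhood))
        lat_lng_coords = g.latlng
        attempts += 1

    if not lat_lng_coords:
        continue # leave this row's coordinates empty

    # write through df.at: assigning to `row` may only modify a copy
    df.at[index, "Latitude"] = lat_lng_coords[0]
    df.at[index, "Longitude"] = lat_lng_coords[1]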
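
Geocoding services can fail or return nothing for a query, so lookups are usually retried a bounded number of times. The sketch below separates that retry logic from any particular service. `geocode_with_retry` and `fake_lookup` are hypothetical names, and the lookup function is assumed to return a `[lat, lng]` pair on success or `None` (or an empty list) on failure.

```python
import time

def geocode_with_retry(query, lookup, max_attempts=3, delay=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable assumed to return [lat, lng], None, or [].
    """
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords:
            return coords[0], coords[1]
        time.sleep(delay)  # pause before the next attempt
    return None  # the caller decides how to handle a failed lookup

# Stand-in for a real service: fails once, then succeeds (illustration only)
calls = {"count": 0}

def fake_lookup(query):
    calls["count"] += 1
    if calls["count"] < 2:
        return None
    # values shown for Arts District in this notebook's output
    return [32.848551, -96.966548]

result = geocode_with_retry("Arts District, Irving, Texas", fake_lookup)
print(result)
```

With the `geocoder` package, `lookup` could be something like `lambda q: geocoder.google(q).latlng`, assuming the service is configured with whatever credentials it requires.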

In [12]:
pd.set_option('display.precision', 8) # full option name; the short form is ambiguous in newer pandas
df.head(10)

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Arts District,32.848551,-96.966548
1,Barton Estates,32.824903,-96.988034
2,Beverly Oaks,32.833081,-96.927017
3,Broadmoor Hills,32.876134,-97.000122
4,Cardinal Family Village,32.84782,-96.954465
5,Club Townhomes,32.823383,-96.939296
6,Cottonwood Valley,32.861251,-96.965205
7,Country Club Place,32.863181,-96.949431
8,Del Paseo,32.832752,-96.960506
9,Downtown Heritage District,32.814701,-96.947804
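
With coordinates in place, one possible next step is to pick a center point for a map and check that the values look plausible. The sketch below uses only the ten rows displayed above, copied by hand. Taking the mean as the map center is an assumption made here for illustration, not something the notebook prescribes.

```python
from statistics import mean

# The ten rows displayed above: (neighborhood, latitude, longitude)
rows = [
    ("Arts District", 32.848551, -96.966548),
    ("Barton Estates", 32.824903, -96.988034),
    ("Beverly Oaks", 32.833081, -96.927017),
    ("Broadmoor Hills", 32.876134, -97.000122),
    ("Cardinal Family Village", 32.84782, -96.954465),
    ("Club Townhomes", 32.823383, -96.939296),
    ("Cottonwood Valley", 32.861251, -96.965205),
    ("Country Club Place", 32.863181, -96.949431),
    ("Del Paseo", 32.832752, -96.960506),
    ("Downtown Heritage District", 32.814701, -96.947804),
]

latitudes = [lat for _, lat, _ in rows]
longitudes = [lng for _, _, lng in rows]

# Mean of the sample as a candidate map center (assumption, see above)
center_lat = mean(latitudes)
center_lng = mean(longitudes)

# Spread of the sample, useful for spotting a badly geocoded row
lat_range = (min(latitudes), max(latitudes))
lng_range = (min(longitudes), max(longitudes))

print(round(center_lat, 4), round(center_lng, 4))
print(lat_range, lng_range)
```

The center could then be passed to `folium.Map(location=[center_lat, center_lng], zoom_start=12)`, where the zoom level is a guess that would need adjusting by eye.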
