<h1>Battle of the Neighborhoods Capstone Project Final Submission</h1>

<h3>Download and import dependencies</h3>

In [1]:
!conda install -c conda-forge geopy --yes
import pandas as pd, numpy as np, requests, folium
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-1.22.0         | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ###############################

The list of Phoenix, AZ neighborhoods was obtained from https://en.wikipedia.org/wiki/Category:Neighborhoods_in_Phoenix,_Arizona.

In [2]:
neighborhood_list = [
'Arcadia (Phoenix)',
'Biltmore Area',
'Brentwood Historic District',
'Central Avenue Corridor',
'Chinatown, Phoenix',
'Desert Ridge',
'Downtown Phoenix',
'Golden Gate Barrio',
'Maryvale, Phoenix',
'Moon Valley, Phoenix',
'North/Northwest Phoenix',
'Sacred Heart Church (Phoenix, Arizona)',
'South Phoenix',
'F. Q. Story Neighborhood Historic District',
'Woodlea Historic District'
]

I then gathered the central coordinates for each neighborhood using geopy.

In [3]:
geolocator = Nominatim(user_agent='phoenix', timeout=3)

def get_coords(neighborhood):
    address = neighborhood
    if 'Phoenix' not in address:
        address = address + ', Phoenix, AZ'
    location = geolocator.geocode(address)
    if location is not None:
        return {'Neighborhood':neighborhood, 'Latitude':location.latitude, 'Longitude':location.longitude}
    else:
        return neighborhood
    
neighborhoods = [get_coords(n) for n in neighborhood_list]

In [4]:
neighborhoods[0:5]

[{'Neighborhood': 'Arcadia (Phoenix)',
  'Latitude': 33.4995258,
  'Longitude': -111.95811929829402},
 {'Neighborhood': 'Biltmore Area',
  'Latitude': 33.510792050000006,
  'Longitude': -112.02787750653704},
 {'Neighborhood': 'Brentwood Historic District',
  'Latitude': 33.4639866,
  'Longitude': -112.04319692516617},
 'Central Avenue Corridor',
 {'Neighborhood': 'Chinatown, Phoenix',
  'Latitude': 41.8535449,
  'Longitude': -87.6329633}]
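
The Chinatown entry above resolves to a latitude near 41.85, far from Phoenix. One possible contributor is that `get_coords` appends `', Phoenix, AZ'` only when the name does not already contain "Phoenix", so names such as 'Chinatown, Phoenix' reach the geocoder without a state. The sketch below shows one way to build a query that always names the state. The helper `build_query` is hypothetical, and whether Nominatim then returns the intended place was not tested here.

```python
def build_query(neighborhood, city="Phoenix", state="AZ"):
    """Return a geocoder query string that always names the city and state."""
    # Drop parenthetical qualifiers such as "(Phoenix)" or "(Phoenix, Arizona)".
    base = neighborhood.split(" (")[0].strip()
    # Drop a trailing ", Phoenix" so the city is not repeated.
    suffix = ", " + city
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    return "{}, {}, {}".format(base, city, state)


print(build_query("Chinatown, Phoenix"))  # Chinatown, Phoenix, AZ
print(build_query("Arcadia (Phoenix)"))   # Arcadia, Phoenix, AZ
print(build_query("Desert Ridge"))        # Desert Ridge, Phoenix, AZ
```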

Some of the neighborhoods did not return a latitude and longitude, so they needed troubleshooting.

In [5]:
no_coords = [(i,n) for i,n in enumerate(neighborhoods) if not isinstance(n, dict)]
no_coords

[(3, 'Central Avenue Corridor'),
 (7, 'Golden Gate Barrio'),
 (13, 'F. Q. Story Neighborhood Historic District')]

Central Avenue Corridor was not found in Geopy's database. Through a Google Maps search, I found the coordinates to be (33.5147764,-112.073993). 

According to Wikipedia, Golden Gate Barrio is no longer a neighborhood.

Through the manipulation of the query term, I was able to find results for F. Q. Story Neighborhood.

In [6]:
fq_story = geolocator.geocode('F.Q. Story, Phoenix, AZ')
fq_story

Location(F. Q. Story, Encanto, Phoenix, Maricopa County, Arizona, United States of America, (33.459864350000004, -112.08695829069015, 0.0))

With all of the coordinates gathered, I then updated the neighborhood list and turned it into a dataframe.

In [7]:
neighborhoods[no_coords[0][0]] = {'Neighborhood': 'Central Avenue Corridor', 'Latitude': 33.5147764, 'Longitude': -112.073993}
neighborhoods[no_coords[2][0]] = {'Neighborhood': 'F. Q. Story', 'Latitude': fq_story.latitude, 'Longitude': fq_story.longitude}
neighborhoods.remove(no_coords[1][1]);
neighborhood_df = pd.DataFrame(neighborhoods)
neighborhood_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Arcadia (Phoenix),33.499526,-111.958119
1,Biltmore Area,33.510792,-112.027878
2,Brentwood Historic District,33.463987,-112.043197
3,Central Avenue Corridor,33.514776,-112.073993
4,"Chinatown, Phoenix",41.853545,-87.632963
5,Desert Ridge,33.634208,-111.931534
6,Downtown Phoenix,40.798673,-77.853707
7,"Maryvale, Phoenix",33.492843,-112.223207
8,"Moon Valley, Phoenix",33.616529,-112.068815
9,North/Northwest Phoenix,43.231179,-76.300764


Before getting data from the Foursquare API, I wanted to map the neighborhoods to see how spread out they are and to verify that the data was correct.

In [8]:
phoenix_coords = geolocator.geocode('Phoenix, AZ')

phoenix_map = folium.Map(location=[phoenix_coords.latitude, phoenix_coords.longitude], zoom_start=11)

for lat, long, neighborhood in zip(neighborhood_df['Latitude'], neighborhood_df['Longitude'], neighborhood_df['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker([lat, long], radius=5, popup=label, fill=True, fill_opacity=0.7).add_to(phoenix_map)

phoenix_map

Mapping the neighborhoods out led me to find out that Chinatown, Downtown, and North/Northwest Phoenix had the wrong coordinates. They were in Illinois, Pennsylvania, and New York!

Chinatown was another defunct neighborhood according to Wikipedia so I removed it. I found Downtown and North/Northwest coordinates in Google Maps.

In [9]:
#Remove Chinatown (.copy() returns an independent DataFrame, so the .loc assignments below do not trigger SettingWithCopyWarning)
neighborhood_df = neighborhood_df[neighborhood_df.Neighborhood != 'Chinatown, Phoenix'].copy()

#Update Downtown / North/Northwest
neighborhood_df.loc[neighborhood_df.Neighborhood == 'Downtown Phoenix', ['Latitude','Longitude']] = [33.4514652,-112.0913905]
neighborhood_df.loc[neighborhood_df.Neighborhood == 'North/Northwest Phoenix', ['Latitude','Longitude']] = [33.6814518,-112.0715705]
neighborhood_df


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Arcadia (Phoenix),33.499526,-111.958119
1,Biltmore Area,33.510792,-112.027878
2,Brentwood Historic District,33.463987,-112.043197
3,Central Avenue Corridor,33.514776,-112.073993
5,Desert Ridge,33.634208,-111.931534
6,Downtown Phoenix,33.451465,-112.091391
7,"Maryvale, Phoenix",33.492843,-112.223207
8,"Moon Valley, Phoenix",33.616529,-112.068815
9,North/Northwest Phoenix,33.681452,-112.071571
10,"Sacred Heart Church (Phoenix, Arizona)",33.448437,-112.074142
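
The out-of-state coordinates were spotted by eye on the map. A coarse bounding-box test is one way to flag such rows before mapping. This is a minimal sketch: the box limits in `PHOENIX_BOUNDS` are rough assumed values rather than official city limits, and `outside_bounds` is a hypothetical helper. The sample rows reuse coordinates shown earlier in this notebook.

```python
# Rough bounding box around the Phoenix area (assumed values, in degrees).
PHOENIX_BOUNDS = {"lat_min": 33.0, "lat_max": 34.1, "lon_min": -113.0, "lon_max": -111.5}


def outside_bounds(records, bounds=PHOENIX_BOUNDS):
    """Return the names of records whose coordinates fall outside the box."""
    flagged = []
    for rec in records:
        lat_ok = bounds["lat_min"] <= rec["Latitude"] <= bounds["lat_max"]
        lon_ok = bounds["lon_min"] <= rec["Longitude"] <= bounds["lon_max"]
        if not (lat_ok and lon_ok):
            flagged.append(rec["Neighborhood"])
    return flagged


sample = [
    {"Neighborhood": "Arcadia (Phoenix)", "Latitude": 33.499526, "Longitude": -111.958119},
    {"Neighborhood": "Chinatown, Phoenix", "Latitude": 41.853545, "Longitude": -87.632963},
    {"Neighborhood": "Downtown Phoenix", "Latitude": 40.798673, "Longitude": -77.853707},
]
print(outside_bounds(sample))  # ['Chinatown, Phoenix', 'Downtown Phoenix']
```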


With the dataframe fixed, here is what the neighborhoods ended up looking like:

In [10]:
del(phoenix_map)
phoenix_map = folium.Map(location=[phoenix_coords.latitude, phoenix_coords.longitude], zoom_start=10)

for lat, long, neighborhood in zip(neighborhood_df['Latitude'], neighborhood_df['Longitude'], neighborhood_df['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker([lat, long], radius=5, popup=label, fill=True, fill_opacity=0.7).add_to(phoenix_map)

phoenix_map

With the cleaned data, we can see that the neighborhoods near downtown are quite concentrated while the suburban neighborhoods are distant from each other.

Now it is time to gather data from the Foursquare API.

According to Wikipedia, the average human walking speed is 3.1 mph, which is roughly 5,000 meters per hour. Because we are looking for convenient access to services and amenities, we search for venues within a 15-minute walk of each neighborhood's center, which gives a radius of 5,000 × 15 / 60 = 1,250 meters.
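
As a check on the arithmetic, the conversion from miles per hour to a walking radius can be sketched as follows. The helper name `walking_radius_m` is hypothetical. The unrounded figure is slightly below the 5,000 meters per hour used in this notebook.

```python
METERS_PER_MILE = 1609.344  # exact, by definition of the international mile


def walking_radius_m(speed_mph, minutes):
    """Distance in meters covered at speed_mph over the given number of minutes."""
    meters_per_hour = speed_mph * METERS_PER_MILE
    return meters_per_hour * minutes / 60


print(round(3.1 * METERS_PER_MILE))      # 4989
print(round(walking_radius_m(3.1, 15)))  # 1247
```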

In [11]:
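import os

radius = 5000 * 15 / 60  # meters covered in a 15-minute walk

#Set Foursquare API Variables (read from environment variables so the credentials are not stored in the notebook; the variable names are an assumed convention)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 500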

#Function to iterate through dataframe records and gather venues within defined radius. From prior Capstone assignment
def getNearbyVenues(names, latitudes, longitudes, radius=radius):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

phoenix_venues = getNearbyVenues(neighborhood_df['Neighborhood'],neighborhood_df['Latitude'],neighborhood_df['Longitude'])

Arcadia (Phoenix)
Biltmore Area
Brentwood Historic District
Central Avenue Corridor
Desert Ridge
Downtown Phoenix
Maryvale, Phoenix
Moon Valley, Phoenix
North/Northwest Phoenix
Sacred Heart Church (Phoenix, Arizona)
South Phoenix
F. Q. Story
Woodlea Historic District
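
`getNearbyVenues` indexes straight into each item, so one venue without a category would raise an error for the whole run. The sketch below shows one defensive way to flatten the items. It assumes only the fields the function above already reads (`venue`, `name`, `location.lat`, `location.lng`, `categories[0].name`). The helper `parse_items` and the mock data are hypothetical, and no request is sent.

```python
def parse_items(name, lat, lng, items):
    """Flatten 'explore' items into row tuples, skipping incomplete ones."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hypothetical items, shaped like the fields the notebook reads.
mock_items = [
    {"venue": {"name": "Example Park", "location": {"lat": 33.5, "lng": -112.0},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorized Spot", "location": {"lat": 33.5, "lng": -112.0},
               "categories": []}},
]
print(parse_items("Test", 33.5, -112.0, mock_items))
```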


No medical or hospital venues were found within 1,250 meters of the center of any neighborhood, so homebuyers would likely need a car for their medical needs. The other walking-distance services and amenities that ACME Realty was interested in were then examined.

In [12]:
#Find venue categories generated that pertain to healthcare (as defined in Foursquare's API docs, if any exist)
#Compare against the values of the 'Venue Category' column; phoenix_venues.columns holds only the column names
found_categories = set(phoenix_venues['Venue Category'])
health_venues = [x for x in ['Hospital','Doctor','Medical','Therapist','Rehab','Urgent Care','Chiropractor','Health'] if x in found_categories]
if len(health_venues) > 0:
    print(len(health_venues), 'types of healthcare venues exist within walking distance.')
else:
    print("No healthcare venues exist within walking distance.")

No healthcare venues exist within walking distance.
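
When checking for categories in a table shaped like `phoenix_venues`, note that a membership test on `.columns` and a membership test on a column's values answer different questions. A toy illustration follows; the venues and the `wanted` list are made up for the example.

```python
import pandas as pd

venues = pd.DataFrame({
    "Venue": ["A", "B", "C"],
    "Venue Category": ["Hospital", "Park", "Pharmacy"],
})
wanted = ["Hospital", "Doctor's Office", "Urgent Care"]

# Membership in .columns only looks at the column names.
in_columns = [c for c in wanted if c in venues.columns]

# Membership in the column's values looks at the categories themselves.
present = set(venues["Venue Category"])
in_values = [c for c in wanted if c in present]

print(in_columns)  # []
print(in_values)   # ['Hospital']
```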


Count the results to see how many venues were gathered for each neighborhood.

In [13]:
phoenix_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Arcadia (Phoenix),36,36,36,36,36,36
Biltmore Area,100,100,100,100,100,100
Brentwood Historic District,55,55,55,55,55,55
Central Avenue Corridor,100,100,100,100,100,100
Desert Ridge,100,100,100,100,100,100
Downtown Phoenix,57,57,57,57,57,57
F. Q. Story,81,81,81,81,81,81
"Maryvale, Phoenix",24,24,24,24,24,24
"Moon Valley, Phoenix",34,34,34,34,34,34
North/Northwest Phoenix,14,14,14,14,14,14


All neighborhoods generated results, so now we can one-hot encode the venue categories.

In [14]:
# Encode categories
phoenix_onehot = pd.get_dummies(phoenix_venues[['Venue Category']], prefix="", prefix_sep="")
phoenix_onehot['Neighborhood'] = phoenix_venues['Neighborhood'] 

# move neighborhood column to the first column
phoenix_onehot.insert(0, 'Neighborhood', phoenix_onehot.pop('Neighborhood'))
phoenix_onehot.head()


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport,Airport Service,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,...,Tour Provider,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Arcadia (Phoenix),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arcadia (Phoenix),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Arcadia (Phoenix),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Arcadia (Phoenix),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Arcadia (Phoenix),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Now we can find out which neighborhoods have the highest concentration of the venues that ACME Realty was interested in.

In [15]:
phoenix_walking_distance = phoenix_onehot.groupby('Neighborhood').sum().reset_index()
desired_categories = ['Trail','Park','Botanical Garden','Garden','Sculpture Garden','Pharmacy','Supermarket','Grocery Store','Health Food Store','Convenience Store']
phoenix_walking_distance = phoenix_walking_distance[['Neighborhood'] + desired_categories].copy()
# sum only the category counts, not the Neighborhood names
phoenix_walking_distance['Desired_Venues'] = phoenix_walking_distance[desired_categories].sum(axis=1)
phoenix_walking_distance = phoenix_walking_distance.sort_values(by='Desired_Venues',ascending=False)
phoenix_walking_distance['Rank'] = np.arange(len(phoenix_walking_distance))+1
phoenix_walking_distance

Unnamed: 0,Neighborhood,Trail,Park,Botanical Garden,Garden,Sculpture Garden,Pharmacy,Supermarket,Grocery Store,Health Food Store,Convenience Store,Desired_Venues,Rank
2,Brentwood Historic District,0,1,0,0,0,0,1,3,0,3,8,1
11,South Phoenix,0,2,0,0,0,0,0,0,0,5,7,2
3,Central Avenue Corridor,1,1,0,0,0,0,1,2,1,0,6,3
1,Biltmore Area,1,0,0,0,0,0,0,3,0,1,5,4
8,"Moon Valley, Phoenix",2,0,0,0,0,1,0,1,0,1,5,5
0,Arcadia (Phoenix),0,2,0,1,0,0,0,0,0,1,4,6
6,F. Q. Story,0,0,1,2,0,0,0,0,0,1,4,7
7,"Maryvale, Phoenix",0,2,0,0,0,0,0,0,0,2,4,8
10,"Sacred Heart Church (Phoenix, Arizona)",0,2,0,0,0,0,0,2,0,0,4,9
12,Woodlea Historic District,0,0,0,0,0,2,0,1,0,1,4,10
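
Selecting the desired categories by name raises a `KeyError` if one of them never appears in the Foursquare results. A minimal sketch of a more tolerant count is shown below, on toy data and assuming the same one-hot layout as `phoenix_onehot`. The neighborhood names and categories here are made up.

```python
import pandas as pd

onehot = pd.DataFrame({
    "Neighborhood": ["North", "North", "South"],
    "Park": [1, 0, 1],
    "Pharmacy": [0, 1, 0],
    "Bar": [1, 0, 0],
})
desired = ["Park", "Pharmacy", "Trail"]  # "Trail" is absent from the toy data

counts = onehot.groupby("Neighborhood").sum()
# reindex keeps only the desired categories and adds missing ones as zeros
counts = counts.reindex(columns=desired, fill_value=0)
counts["Desired_Venues"] = counts.sum(axis=1)
counts = counts.sort_values("Desired_Venues", ascending=False)
counts["Rank"] = range(1, len(counts) + 1)
print(counts)
```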


Now we can build an understanding of what the neighborhoods are like by gauging the frequency of venue occurrences (using functions from the previous capstone assignment).

In [16]:
phoenix_grouped = phoenix_onehot.groupby('Neighborhood').mean().reset_index()

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
phoenix_venues_sorted = pd.DataFrame(columns=columns)
phoenix_venues_sorted['Neighborhood'] = phoenix_grouped['Neighborhood']

for ind in np.arange(phoenix_grouped.shape[0]):
    phoenix_venues_sorted.iloc[ind, 1:] = return_most_common_venues(phoenix_grouped.iloc[ind, :], num_top_venues)

phoenix_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Arcadia (Phoenix),Hotel Pool,Resort,Gym / Fitness Center,Park,American Restaurant,Hotel,Intersection,Canal,Lounge,Latin American Restaurant
1,Biltmore Area,Clothing Store,American Restaurant,Coffee Shop,Hotel,New American Restaurant,Burger Joint,Bank,Cosmetics Shop,Breakfast Spot,Pizza Place
2,Brentwood Historic District,Mexican Restaurant,Taco Place,Convenience Store,Grocery Store,Coffee Shop,Intersection,Seafood Restaurant,Clothing Store,Sandwich Place,Café
3,Central Avenue Corridor,Pizza Place,Coffee Shop,Ice Cream Shop,Breakfast Spot,American Restaurant,Gastropub,Bar,Mexican Restaurant,Taco Place,New American Restaurant
4,Desert Ridge,Furniture / Home Store,Italian Restaurant,Coffee Shop,Mexican Restaurant,Clothing Store,Cosmetics Shop,American Restaurant,Women's Store,Seafood Restaurant,Restaurant
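
The column labels above rely on a three-item suffix list with a fallback to "th". That is enough for ten columns, but it would label an 11th column correctly only by accident and a 21st as "21th". A small helper that covers the general case could look like this sketch; `ordinal` is a hypothetical name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[1], columns[4], columns[10], sep=" | ")
```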


With the neighborhoods' top venue categories defined, we can move on to clustering with the k-means algorithm to find neighborhoods with similar offerings.

In [18]:
from sklearn.cluster import KMeans

kclusters = 5
phoenix_grouped_clustering = phoenix_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(phoenix_grouped_clustering)

# add the cluster label for each neighborhood (rows are in the same order as phoenix_grouped)
phoenix_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from a copy of the coordinates table so neighborhood_df itself is left unchanged
phoenix_merged = neighborhood_df.copy()

# merge phoenix_venues_sorted and the desired-venue ranking into the latitude/longitude table for each neighborhood
phoenix_merged = phoenix_merged.join(phoenix_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
phoenix_merged = phoenix_merged.join(phoenix_walking_distance[['Neighborhood','Desired_Venues','Rank']].set_index('Neighborhood'), on='Neighborhood')

phoenix_merged = phoenix_merged.sort_values(by='Rank')
phoenix_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Desired_Venues,Rank
2,Brentwood Historic District,33.463987,-112.043197,1,Mexican Restaurant,Taco Place,Convenience Store,Grocery Store,Coffee Shop,Intersection,Seafood Restaurant,Clothing Store,Sandwich Place,Café,8,1
11,South Phoenix,33.406456,-112.072943,4,Convenience Store,Park,Seafood Restaurant,Mexican Restaurant,Marijuana Dispensary,Gas Station,Pet Store,Rental Car Location,Discount Store,Chinese Restaurant,7,2
3,Central Avenue Corridor,33.514776,-112.073993,3,Pizza Place,Coffee Shop,Ice Cream Shop,Breakfast Spot,American Restaurant,Gastropub,Bar,Mexican Restaurant,Taco Place,New American Restaurant,6,3
1,Biltmore Area,33.510792,-112.027878,3,Clothing Store,American Restaurant,Coffee Shop,Hotel,New American Restaurant,Burger Joint,Bank,Cosmetics Shop,Breakfast Spot,Pizza Place,5,4
8,"Moon Valley, Phoenix",33.616529,-112.068815,3,Pizza Place,Bar,Bank,Video Store,Italian Restaurant,Trail,Burger Joint,Flower Shop,Shipping Store,Farm,5,5
0,Arcadia (Phoenix),33.499526,-111.958119,0,Hotel Pool,Resort,Gym / Fitness Center,Park,American Restaurant,Hotel,Intersection,Canal,Lounge,Latin American Restaurant,4,6
12,F. Q. Story,33.459864,-112.086958,1,Art Gallery,Mexican Restaurant,Coffee Shop,American Restaurant,Hotel,Theater,Pizza Place,Breakfast Spot,Burger Joint,Intersection,4,7
7,"Maryvale, Phoenix",33.492843,-112.223207,4,Fast Food Restaurant,Convenience Store,Pizza Place,Park,Mexican Restaurant,Fried Chicken Joint,Bowling Alley,Discount Store,Ice Cream Shop,Insurance Office,4,8
10,"Sacred Heart Church (Phoenix, Arizona)",33.448437,-112.074142,3,Coffee Shop,Pizza Place,Hotel,Music Venue,American Restaurant,Bar,Lounge,Breakfast Spot,Salon / Barbershop,Theater,4,9
13,Woodlea Historic District,33.497139,-112.087684,1,Mexican Restaurant,Coffee Shop,Thrift / Vintage Store,American Restaurant,Pizza Place,Thai Restaurant,Gym,Pharmacy,Burger Joint,Gay Bar,4,10
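
To read cluster membership off a table like this without scanning it by eye, the neighborhoods can be grouped by their label. This is a minimal sketch on four rows copied from the table above, assuming the same column names as `phoenix_merged`.

```python
import pandas as pd

merged = pd.DataFrame({
    "Neighborhood": ["Arcadia (Phoenix)", "Brentwood Historic District",
                     "Woodlea Historic District", "South Phoenix"],
    "Cluster Labels": [0, 1, 1, 4],
})

# One list of neighborhood names per cluster label.
members = merged.groupby("Cluster Labels")["Neighborhood"].apply(list)
for label, names in members.items():
    print("Cluster {}: {}".format(label, ", ".join(names)))
```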


Based on these results, and using the cluster labels shown in the table above, we can conclude that:

Cluster 0: Arcadia (Phoenix)
This cluster has only one neighborhood. It is very hotel/resort focused, but has parks as well.

Cluster 1: Brentwood Historic District, F. Q. Story, and Woodlea Historic District
There is an emphasis on Mexican food and coffee shops or cafés in these neighborhoods.

Cluster 2: North/Northwest Phoenix
This cluster also has only one neighborhood. It has a hardware store, a shipping store, and Construction & Landscaping services in its top 10, which suggests this area is better for do-it-yourselfers. It also has the fewest venues of the kinds that ACME Realty desired.

Cluster 3: Central Avenue Corridor, Biltmore Area, Moon Valley, Phoenix, Sacred Heart Church (Phoenix, Arizona), Desert Ridge, and Downtown Phoenix
This is the largest cluster by far. These neighborhoods mostly have Pizza Places, American Restaurants, and Coffee Shops.

Cluster 4: South Phoenix and Maryvale
This cluster has Mexican Food, Convenience Stores, and Fried Chicken Joints in common.

Finally, let's map the data to show where the neighborhood clusters are in relation to one another.

In [19]:
map_phoenix_clusters = folium.Map(location=[phoenix_coords.latitude, phoenix_coords.longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.prism(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(phoenix_merged['Latitude'], phoenix_merged['Longitude'], phoenix_merged['Neighborhood'], phoenix_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_phoenix_clusters)
       
map_phoenix_clusters