<h1 align=center><font size = 8>Data Science Capstone Project</font></h1>
<h1 align=center><font size = 8>A Comparison of Dallas and Fort Worth</font></h1>

## Introduction
<blockquote>It is often said that Dallas is where the East ends and Fort Worth is where the West begins. Certainly, when you think of Fort Worth, you think of pickup trucks, cowboy boots, and Billy Bob's at The Stockyards, and when you think of Dallas, you think of BMWs, banking, and Highland Park. The question for someone whose company has recently moved to the DFW area, though, is: is there a difference between the neighborhoods in Fort Worth and those in Dallas?</blockquote>
<blockquote>This project uses Zillow neighborhood data and k-means clustering to see whether there are any systematic differences between Dallas and Fort Worth neighborhoods.</blockquote>

We will start by loading the libraries that we need for this project.

In [1]:
try:
    import pandas as pd
    import numpy as np
except ImportError:
    !conda install -c anaconda pandas numpy --yes 
    import pandas as pd
    import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
try:
    from geopy.geocoders import Nominatim # to convert address into latitude and longitude
except ImportError:
    !conda install -c conda-forge geopy --yes 
    from geopy.geocoders import Nominatim
try:
    import folium # plotting library
except ImportError:
    !conda install -c conda-forge folium --yes 
    import folium
from bs4 import BeautifulSoup
import urllib.request
import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import geojson
# shapely must be installed in the environment the notebook kernel runs in
from shapely.geometry import Point, Polygon


## Data
The geodata used for this project is the Zillow US Neighborhoods dataset. Zillow makes this data available for free under a Creative Commons license. It can be downloaded from "https://data.opendatasoft.com/explore/dataset/zillow-neighborhoods%40public/export/".
<br>
<br>
We used the export filters to download one GeoJSON file for Fort Worth and a second for Dallas. Each file contains the full geoshape data plus a "properties" section for every neighborhood that holds the city, neighborhood name, regionid, geo_point_2d, county, and state. We parse the files, load each properties section into a dataframe, and keep the polygon coordinates alongside it. We then split the geo_point_2d entry into Latitude and Longitude columns, dropping any rows where geo_point_2d is NA.
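
As a minimal sketch of the parsing step described above, the snippet below uses only the standard-library `json` module on a tiny inline GeoJSON string. The first feature reuses values from the Briercliff entry (with a shortened polygon); the second feature, "Example Heights", is invented to show the missing-value case. The helper name `extract_rows` is hypothetical and is not part of the notebook.

```python
import json

# Tiny stand-in for a downloaded file. The second feature is invented
# (assumption for illustration) and has no geo_point_2d.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "geometry": {"type": "Polygon", "coordinates": [[[-97.428271, 32.645946],
       [-97.432542, 32.646125], [-97.428821, 32.648459], [-97.428271, 32.645946]]]},
   "properties": {"city": "Fort Worth", "county": "Tarrant", "name": "Briercliff",
                  "geo_point_2d": [32.64714146486756, -97.43035652827258], "state": "TX"}},
  {"type": "Feature",
   "geometry": null,
   "properties": {"city": "Dallas", "county": "Dallas", "name": "Example Heights",
                  "geo_point_2d": null, "state": "TX"}}
]}
"""

def extract_rows(collection):
    """Return one dict per feature that has a usable geo_point_2d."""
    rows = []
    for feature in collection["features"]:
        props = feature["properties"]
        point = props.get("geo_point_2d")
        if point is None:
            continue  # same effect as dropping rows where geo_point_2d is NA
        latitude, longitude = point  # geo_point_2d is stored as [lat, lon]
        rows.append({
            "city": props["city"],
            "Neighborhood": props["name"],
            "county": props["county"],
            "state": props["state"],
            "Latitude": latitude,
            "Longitude": longitude,
        })
    return rows

rows = extract_rows(json.loads(SAMPLE))
print(rows)
```

Note the ordering difference in the data: `geo_point_2d` is `[latitude, longitude]`, while the polygon coordinates are `[longitude, latitude]` pairs.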


In [3]:
with open('e:/downloads/zillow-neighborhoods-FortWorth.geojson') as json_data:
    fortworth_neighborhood_data = geojson.load(json_data)
with open('e:/downloads/zillow-neighborhoods-Dallas.geojson') as json_data:
    dallas_neighborhood_data = geojson.load(json_data)
print(type(fortworth_neighborhood_data), type(dallas_neighborhood_data))

<class 'geojson.feature.FeatureCollection'> <class 'geojson.feature.FeatureCollection'>


In [4]:
print(type(fortworth_neighborhood_data[0]['geometry']['coordinates'][0]))
print(fortworth_neighborhood_data[0]['geometry']['coordinates'][0])
print(fortworth_neighborhood_data['features'][0])

<class 'list'>
[[-97.428271, 32.645946], [-97.428288, 32.645946], [-97.428367, 32.645944], [-97.428685, 32.645933], [-97.429647, 32.645984], [-97.4302, 32.646044], [-97.430962, 32.646061], [-97.432542, 32.646125], [-97.43244, 32.648147], [-97.428821, 32.648459], [-97.428535, 32.647886], [-97.428396, 32.647557], [-97.428318, 32.647253], [-97.428249, 32.646923], [-97.42824, 32.646541], [-97.428271, 32.645946]]
{"geometry": {"coordinates": [[[-97.428271, 32.645946], [-97.428288, 32.645946], [-97.428367, 32.645944], [-97.428685, 32.645933], [-97.429647, 32.645984], [-97.4302, 32.646044], [-97.430962, 32.646061], [-97.432542, 32.646125], [-97.43244, 32.648147], [-97.428821, 32.648459], [-97.428535, 32.647886], [-97.428396, 32.647557], [-97.428318, 32.647253], [-97.428249, 32.646923], [-97.42824, 32.646541], [-97.428271, 32.645946]]], "type": "Polygon"}, "properties": {"city": "Fort Worth", "county": "Tarrant", "geo_point_2d": [32.64714146486756, -97.43035652827258], "name": "Briercliff", "r

In [5]:
# Combine the features of both files. DataFrame.append was removed in pandas 2.0,
# so the dataframe is built from lists instead of being appended to row by row.
features = fortworth_neighborhood_data['features'] + dallas_neighborhood_data['features']
# one row per neighborhood, taken from the "properties" section of each feature
df = pd.DataFrame([entry['properties'] for entry in features],
                  columns=['city','name','regionid','geo_point_2d','county','state'])
df = df.rename(columns={'name': 'Neighborhood'})
# keep the polygon coordinates of each neighborhood next to its properties
df['coordinates'] = [entry['geometry']['coordinates'] for entry in features]
df.head()

Unnamed: 0,city,Neighborhood,regionid,geo_point_2d,county,state,coordinates
0,Fort Worth,Briercliff,422763,"[32.64714146486756, -97.43035652827258]",Tarrant,TX,"[[[-97.428271, 32.645946], [-97.428288, 32.645..."
1,Fort Worth,Fairmount,233172,"[32.724372978687654, -97.33766189387364]",Tarrant,TX,"[[[-97.34377, 32.718042], [-97.34377, 32.71804..."
2,Fort Worth,Willow Creek,207722,"[32.63110091743264, -97.3434603767844]",Tarrant,TX,"[[[-97.348532, 32.634888], [-97.34777, 32.6348..."
3,Fort Worth,Shaw Clarke,422871,"[32.70111450603923, -97.33719612060821]",Tarrant,TX,"[[[-97.33749, 32.70601], [-97.33749, 32.705724..."
4,Fort Worth,Carver Heights,422820,"[32.72505076405124, -97.22932078449055]",Tarrant,TX,"[[[-97.228473, 32.725718], [-97.228222, 32.725..."


Split up geo_point_2d into Latitude and Longitude

In [6]:
print(df.shape)
df = df.dropna(subset=['geo_point_2d'])
print(df.shape)
# reuse df's index so the concat below stays row-aligned even when dropna removed rows
df2 = pd.DataFrame(df["geo_point_2d"].tolist(), columns=['Latitude','Longitude'], index=df.index)
df = pd.concat([df, df2], axis=1)
df.drop(['geo_point_2d','regionid'],axis=1,inplace=True)
df.head()

(425, 7)
(425, 7)


Unnamed: 0,city,Neighborhood,county,state,coordinates,Latitude,Longitude
0,Fort Worth,Briercliff,Tarrant,TX,"[[[-97.428271, 32.645946], [-97.428288, 32.645...",32.647141,-97.430357
1,Fort Worth,Fairmount,Tarrant,TX,"[[[-97.34377, 32.718042], [-97.34377, 32.71804...",32.724373,-97.337662
2,Fort Worth,Willow Creek,Tarrant,TX,"[[[-97.348532, 32.634888], [-97.34777, 32.6348...",32.631101,-97.34346
3,Fort Worth,Shaw Clarke,Tarrant,TX,"[[[-97.33749, 32.70601], [-97.33749, 32.705724...",32.701115,-97.337196
4,Fort Worth,Carver Heights,Tarrant,TX,"[[[-97.228473, 32.725718], [-97.228222, 32.725...",32.725051,-97.229321


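Extract Dallas-Fort Worth. The downloaded files also include neighborhoods in nearby cities (Carrollton, Garland, Richardson, and others), so we keep every row rather than filtering on Dallas and Fort Worth alone.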

In [7]:
#dfw = df[df['city'].isin(['Dallas', 'Fort Worth'])]
dfw = df
dfw.sort_values(['city','Neighborhood'], inplace=True, ignore_index=True)
print(dfw.shape)
dfw.groupby("city").count()

(425, 7)


city,Neighborhood,county,state,coordinates,Latitude,Longitude
Carrollton,21,21,21,21,21,21
Dallas,34,34,34,34,34,34
Desoto,6,6,6,6,6,6
Fort Worth,225,225,225,225,225,225
Garland,85,85,85,85,85,85
Irving,18,18,18,18,18,18
Mesquite,4,4,4,4,4,4
Plano,1,1,1,1,1,1
Richardson,31,31,31,31,31,31


# Methodology
## Visual Inspection: Create a map of DFW Neighborhoods

In [8]:
# create map - use Arlington for map lat/long
address = 'Arlington, TX'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Arlington are {}, {}.'.format(latitude, longitude))
# use a point slightly north of Arlington to get the map centering right
latitude = 32.791825
longitude = -97.03

The geographical coordinates of Arlington are 32.701938999999996, -97.10562379033699.


In [9]:
# create map of DFW, centered on the adjusted latitude and longitude values
map_dfw = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city, neighborhood in zip(dfw['Latitude'], dfw['Longitude'], dfw['city'], dfw['Neighborhood']):
    label = '{}, {}, {}, {}'.format(neighborhood, city, lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dfw)  
    
map_dfw

# Get Neighborhood Venues

In [10]:
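import os
# read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret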
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1200 #1600 approx 1 mile, 500 approx .3 mile
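
The venue functions in this notebook build the Foursquare explore URL with string formatting. As a hedged alternative sketch, the same query string can be assembled with the standard-library `urlencode`, which escapes the parameter values. The endpoint and parameter names are the ones that appear in the notebook's URL; the helper name `build_explore_url` and the placeholder credentials `MY_ID` / `MY_SECRET` are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the search center
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# placeholder credentials, not real keys
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 32.647141, -97.430357, 3200, 100)

# parse the URL back to check that every parameter survived the round trip
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```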

In [11]:
def getVenuesInPolygon(names, latitudes, longitudes, polygons):
    file_handle=open('coordinates.csv','w')
    radius=3200
    venues_list=[]
    for name, lat, lng, poly in zip(names, latitudes, longitudes, polygons):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        #try:    
        # make the GET request
        #results = requests.get(url).json()["response"]['groups'][0]['items']
        results = requests.get(url).json()["response"]
        if len(results)>0:
            results = results['groups'][0]['items']
            # unwrap extra nesting levels until we reach a ring of [lng, lat] pairs
            while len(poly)>0 and len(poly)<3:
                poly = poly[0]
            coordinates = None # reset so a failed parse never reuses the previous neighborhood's polygon
            try:
                coordinates = Polygon(poly)
            except Exception:
                try:
                    coordinates = Polygon(poly[0])
                except Exception:
                    try:
                        coordinates = Polygon(poly[0][0])
                    except Exception:
                        print(name," Invalid coordinates: length: ",len(poly),' coordinates: ', poly)
                        # file.write() accepts a single string, so format the message first
                        file_handle.write('{} Invalid coordinates: length: {} coordinates: {}\n'.format(name, len(poly), poly))
            # return only relevant information for each nearby venue
            #print(results)
            venue_count = 0
            for v in results:
                pt = Point([v['venue']['location']['lng'],v['venue']['location']['lat']])
                # keep venues inside the neighborhood polygon, or within 800 m of the neighborhood center
                if (coordinates is not None and coordinates.length>0 and pt.within(coordinates)) or v['venue']['location']['distance']<800:
                    venue_count = venue_count + 1
                    #print(pt,' inside ',coordinates)
                    venues_list.append([name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'],  
                        v['venue']['categories'][0]['name']])
                #else:
                #    print(pt,' outside ',coordinates)
            print(name, lat, lng, ' venue count ', venue_count)            
            if venue_count==0:
                print('\tno venues')
                venues_list.append([
                    name, 
                lat, 
                    lng, 
                    'Residential', 
                    lat, 
                    lng,  
                    'Residential'
                    ])
            
        else:
            print(name, lat, lng,' No groups in results')
            venues_list.append([
                    name, 
                    lat, 
                    lng, 
                    'Residential', 
                    lat, 
                    lng,  
                    'Residential'
                    ])
                  
        #except:
        #    print('\tProblem getting info')
    #print('venues_list:  ',venues_list)  
    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 
        'Venue Longitude', 'Venue Category']) 
    #nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    #nearby_venues = pd.DataFrame(columns = ['Neighborhood', 
    #              'Neighborhood Latitude', 
    #              'Neighborhood Longitude', 
    #              'Venue', 
    #              'Venue Latitude', 
    #              'Venue Longitude', 
    #              'Venue Category'])
    file_handle.close()
    return(nearby_venues)
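
The polygon filter in `getVenuesInPolygon` relies on shapely's `pt.within(coordinates)`. For intuition only, here is a sketch of the classical ray-casting test written without shapely. It is not a description of how shapely is implemented, it ignores holes, and points exactly on the boundary may fall either way. The function name `point_in_ring` is hypothetical. The ring is the Briercliff polygon printed earlier, as `[lng, lat]` pairs, and the two test points are the `geo_point_2d` values of Briercliff and Fairmount.

```python
# Briercliff outer ring as [lng, lat] pairs (first point repeated at the end)
BRIERCLIFF = [
    [-97.428271, 32.645946], [-97.428288, 32.645946], [-97.428367, 32.645944],
    [-97.428685, 32.645933], [-97.429647, 32.645984], [-97.4302, 32.646044],
    [-97.430962, 32.646061], [-97.432542, 32.646125], [-97.43244, 32.648147],
    [-97.428821, 32.648459], [-97.428535, 32.647886], [-97.428396, 32.647557],
    [-97.428318, 32.647253], [-97.428249, 32.646923], [-97.42824, 32.646541],
    [-97.428271, 32.645946],
]

def point_in_ring(lng, lat, ring):
    """Ray-casting test: True if (lng, lat) lies inside the closed ring."""
    inside = False
    # walk over consecutive vertex pairs (x1, y1) -> (x2, y2)
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside

# Briercliff's own center point versus Fairmount's center point
in_briercliff = point_in_ring(-97.43035652827258, 32.64714146486756, BRIERCLIFF)
in_fairmount = point_in_ring(-97.33766189387364, 32.724372978687654, BRIERCLIFF)
print(in_briercliff, in_fairmount)
```

A point is inside when a ray cast from it toward increasing longitude crosses the ring an odd number of times.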

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        try:    
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
            
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        except:
            print('\tProblem getting info')
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [13]:
#dfw_venues = getNearbyVenues(names=dfw['Neighborhood'],latitudes=dfw['Latitude'],longitudes=dfw['Longitude'],radius)
#print(dfw['coordinates'].iloc[:5])
radius=3200
#dfw_venues = getVenuesInPolygon(names=dfw['Neighborhood'].iloc[:5],latitudes=dfw['Latitude'].iloc[:5],longitudes=dfw['Longitude'].iloc[:5],
#    polygons=dfw['coordinates'].iloc[:5])
dfw_venues = getVenuesInPolygon(names=dfw['Neighborhood'],latitudes=dfw['Latitude'],longitudes=dfw['Longitude'],polygons=dfw['coordinates'])
#print (dfw_venues.head())
print('There are {} unique categories.'.format(len(dfw_venues['Venue Category'].unique())))

Bel Air of Josey Ranch 32.969355914257726 -96.88168631227494  venue count  3
Cambridge Estates 32.988927644291394 -96.91617820375673  venue count  15
Carrollton Heights 32.956817736977484 -96.90127455361676  venue count  8
Carrollton Summertree 32.97278590625278 -96.86834792754553  venue count  1
Carrolton Highlands 32.951421190068444 -96.89745971865148  venue count  5
Hill'n Dale 32.9652324227805 -96.89314077922661  venue count  1
Jackson Arms 32.97657242364724 -96.89296428247056  venue count  6
Mcoy Estates 32.98105765762574 -96.89554645640186  venue count  4
Morningside 32.98840829897419 -96.87152296182454  venue count  1
Nob Hill 32.98963477794778 -96.89570633349281  venue count  6
Oak Tree North 32.98817469682512 -96.86008121492931  venue count  6
Park Terrace 32.96294828979307 -96.9049947817715  venue count  0
	no venues
Parks of Carrollton 32.97653456989962 -96.9022608953078  venue count  7
Parkside Estates 32.97768723189802 -96.87202308761685  venue count  1
Rohton Park 32.9500

In [14]:
dfw_venues.loc[dfw_venues['Venue Category']=='Zoo Exhibit']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1480,Berkeley Place,32.72111,-97.349059,The Texas Wild,32.717984,-97.353705,Zoo Exhibit
1482,Berkeley Place,32.72111,-97.349059,Penguins,32.719308,-97.354746,Zoo Exhibit
1806,Frisco Heights,32.711151,-97.354355,The Texas Wild,32.717984,-97.353705,Zoo Exhibit
2110,Park Hill,32.718465,-97.358118,The Texas Wild,32.717984,-97.353705,Zoo Exhibit
2363,University Place,32.713866,-97.358543,The Texas Wild,32.717984,-97.353705,Zoo Exhibit


In [15]:
# load and merge Venue Groups
venue_groups = pd.read_csv('VenueGroups.csv')
venue_groups.head()
if 'Venue Group' in dfw_venues.columns:
    dfw_venues.drop('Venue Group',axis=1, inplace=True)
dfw_venues = dfw_venues.join(venue_groups.set_index('Venue Category'), on='Venue Category')
dfw_venues.head()


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Group
0,Bel Air of Josey Ranch,32.969356,-96.881686,Braum's Ice Cream & Dairy Store,32.972006,-96.88948,Ice Cream Shop,Fast-Food Venue
1,Bel Air of Josey Ranch,32.969356,-96.881686,The Home Depot,32.973511,-96.886881,Hardware Store,Home Shop
2,Bel Air of Josey Ranch,32.969356,-96.881686,Pizza Hut,32.97159,-96.877311,Pizza Place,Fast-Food Venue
3,Cambridge Estates,32.988928,-96.916178,Super H-Mart,32.98499,-96.911966,Supermarket,Grocery Store
4,Cambridge Estates,32.988928,-96.916178,Chick-fil-A,32.98617,-96.909283,Fast Food Restaurant,Fast-Food Venue


In [16]:
dfw_venues[dfw_venues['Venue Category']=='Tea Room']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Group


In [17]:
# make sure all venue categories have been assigned to venue groups
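# categories missing from VenueGroups.csv come out of the join as NaN, so check for missing values as well as empty strings
dfw_venues[dfw_venues['Venue Group'].isna() | (dfw_venues['Venue Group']=='')]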

In [18]:
print(dfw_venues.groupby('Neighborhood').count())

Neighborhood Latitude  \
Neighborhood                                               
Alamo Heights                                          9   
Alexandra Meadows                                      1   
Almeta,Bonita, Bella Vista                             1   
Altemesa East                                          2   
Apollo Arapaho & Camelot                               6   
Arapaho                                                7   
Arbor Creek                                            1   
Arcadia Park                                           5   
Arlington Heights                                     10   
Arts District                                         31   
Avondale                                              12   
Bal Harbour                                            1   
Basswood Park                                         11   
Basswood Village                                       1   
Bear Creek                                            35   
Beechwood Creek

# Analyze Neighborhoods

In [19]:
# one hot encoding
dfw_onehot = pd.get_dummies(dfw_venues[['Venue Category']], prefix="", prefix_sep="")
# A venue category can itself be called "Neighborhood"; if it is present, drop that column so it does not clash with the neighborhood name column
try:
    del dfw_onehot['Neighborhood']
except KeyError:
    print('No venues called Neighborhood')
# add neighborhood column back to dataframe
dfw_onehot['Neighborhood'] = dfw_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dfw_onehot.columns[-1]] + list(dfw_onehot.columns[:-1])
dfw_onehot = dfw_onehot[fixed_columns]
dfw_onehot.head()

No venues called Neighborhood


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Big Box Store,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Business Service,Butcher,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Country Dance Club,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Ethiopian Restaurant,Fabric Shop,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Service,Food Truck,Football Stadium,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Himalayan Restaurant,History Museum,Hobby Shop,Hockey Rink,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Internet Cafe,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Leather Goods Store,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mongolian Restaurant,Monument / Landmark,Motorcycle Shop,Motorsports Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Supply Store,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential,Residential Building (Apartment / Condo),Resort,Restaurant,River,Rock Club,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Truck Stop,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Bel Air of Josey Ranch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bel Air of Josey Ranch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bel Air of Josey Ranch,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Cambridge Estates,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Cambridge Estates,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
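
The one-hot table has one row per venue. Averaging those rows within a neighborhood turns them into category frequencies, because the mean of a 0/1 column is the share of venues in that category. A minimal sketch of that arithmetic with the standard library follows. The five (neighborhood, category) pairs are the first venue rows shown earlier; since only this small sample is used, the numbers differ from the full dataset.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs: a five-row sample, not the full dataset
venues = [
    ("Bel Air of Josey Ranch", "Ice Cream Shop"),
    ("Bel Air of Josey Ranch", "Hardware Store"),
    ("Bel Air of Josey Ranch", "Pizza Place"),
    ("Cambridge Estates", "Supermarket"),
    ("Cambridge Estates", "Fast Food Restaurant"),
]

# column order of the one-hot table: all categories, sorted
categories = sorted({category for _, category in venues})

# summing one-hot rows per neighborhood is the same as counting categories
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

frequencies = {}
for neighborhood, counter in counts.items():
    total = sum(counter.values())  # number of venues in this neighborhood
    # mean of the one-hot rows = count / number of venues
    frequencies[neighborhood] = [counter[c] / total for c in categories]

print(categories)
for neighborhood, row in frequencies.items():
    print(neighborhood, [round(x, 3) for x in row])
```

Each row of frequencies sums to 1, so neighborhoods with very different venue counts become comparable.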


In [20]:
dfw_grouped = dfw_onehot.groupby('Neighborhood').mean().reset_index()
dfw_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,Airport,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Big Box Store,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Business Service,Butcher,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Country Dance Club,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Electronics Store,Ethiopian Restaurant,Fabric Shop,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Service,Food Truck,Football Stadium,Frame Store,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Himalayan Restaurant,History Museum,Hobby Shop,Hockey Rink,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Internet Cafe,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Leather Goods Store,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mongolian Restaurant,Monument / Landmark,Motorcycle Shop,Motorsports Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Supply Store,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential,Residential Building (Apartment / Condo),Resort,Restaurant,River,Rock Club,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Truck Stop,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Weight Loss Center,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Alamo Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alexandra Meadows,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Almeta,Bonita, Bella Vista",0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Altemesa East,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Apollo Arapaho & Camelot,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Group the most common venues in each neighborhood

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
top_venue_columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        top_venue_columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks past 3rd have no entry in `indicators`
        top_venue_columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=top_venue_columns)
neighborhoods_venues_sorted['Neighborhood'] = dfw_grouped['Neighborhood']

for ind in np.arange(dfw_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dfw_grouped.iloc[ind, :], num_top_venues)
    for col in range(num_top_venues):
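        # blank out ranked venues whose frequency is zero (the neighborhood has fewer venues than num_top_venues)
        if dfw_grouped.set_index('Neighborhood').loc[[neighborhoods_venues_sorted.iloc[ind, 0]],[neighborhoods_venues_sorted.iloc[ind, 1+col]]].iloc[0, 0] == 0: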
            neighborhoods_venues_sorted.iloc[ind, 1+col] = 'NA'
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alamo Heights,Dessert Shop,Coffee Shop,Fast Food Restaurant,Grocery Store,Burger Joint,Sushi Restaurant,Diner,Vietnamese Restaurant,BBQ Joint,
1,Alexandra Meadows,Liquor Store,,,,,,,,,
2,"Almeta,Bonita, Bella Vista",American Restaurant,,,,,,,,,
3,Altemesa East,Cosmetics Shop,Discount Store,,,,,,,,
4,Apollo Arapaho & Camelot,Pizza Place,Coffee Shop,Chinese Restaurant,Video Store,Butcher,Sports Bar,,,,
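
The ranking step above sorts each neighborhood's venue frequencies and then blanks out any rank whose frequency is zero. A minimal sketch of the same idea in a single pass is shown below. The frame `toy_grouped` and the helper `top_venues` are hypothetical names made up for illustration; they are not part of the notebook, and the toy frequencies are invented.

```python
import pandas as pd

# Hypothetical toy frequencies; the notebook itself works on dfw_grouped.
toy_grouped = pd.DataFrame(
    {
        "Neighborhood": ["A", "B"],
        "Coffee Shop": [0.5, 0.0],
        "Park": [0.3, 1.0],
        "Pizza Place": [0.2, 0.0],
    }
)


def top_venues(row, n):
    """Return the n most frequent categories, padded with 'NA' when fewer are present."""
    freqs = row.drop("Neighborhood").astype(float)
    # Keep only categories that actually occur, most frequent first.
    present = freqs[freqs > 0].sort_values(ascending=False)
    names = list(present.index[:n])
    return names + ["NA"] * (n - len(names))


toy_top = {row["Neighborhood"]: top_venues(row, 3) for _, row in toy_grouped.iterrows()}
print(toy_top)
```

Filtering out zero frequencies before sorting avoids the second lookup per cell, which may matter when the number of neighborhoods grows.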


In [22]:

dfw_grouped.set_index('Neighborhood').loc[['Alamo Heights'],['Grocery Store']]
neighborhoods_venues_sorted.iloc[0, :]
ind=0
col=0
print(neighborhoods_venues_sorted.iloc[ind, 1],neighborhoods_venues_sorted.iloc[ind, 2+col])
dfw_grouped.set_index('Neighborhood').loc[[neighborhoods_venues_sorted.iloc[ind, 0]],[neighborhoods_venues_sorted.iloc[ind, 1+col]]].iloc[0, 0]

Dessert Shop Coffee Shop


0.1111111111111111

In [53]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 8

dfw_grouped_clustering = dfw_grouped.drop('Neighborhood', axis=1)
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.drop('Cluster Labels',axis=1, inplace=True)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dfw_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
dfw_merged = dfw

# join the ranked venues and cluster labels onto dfw, which holds latitude/longitude for each neighborhood
dfw_merged = dfw_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

dfw_merged.head()

Unnamed: 0,city,Neighborhood,county,state,coordinates,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Carrollton,Bel Air of Josey Ranch,Dallas,TX,"[[[-96.884675, 32.967425], [-96.884703, 32.969...",32.969356,-96.881686,2,Fast-Food Venue,Home Shop,,,,,,,,
1,Carrollton,Cambridge Estates,Dallas,TX,"[[[-96.913294, 32.990296], [-96.910004, 32.990...",32.988928,-96.916178,2,Fast-Food Venue,Asian Cuisine,Grocery Store,Bubble Tea Shop,Seafood Restaurant,,,,,
2,Carrollton,Carrollton Heights,Dallas,TX,"[[[-96.906652, 32.955414], [-96.906635, 32.956...",32.956818,-96.901275,2,American Cuisine,Fast-Food Venue,Museum,Southern / Soul Food Restaurant,Drinking Establishment,,,,,
3,Carrollton,Carrollton Summertree,Dallas,TX,"[[[-96.864348, 32.97025], [-96.867828, 32.9702...",32.972786,-96.868348,7,Park,,,,,,,,,
4,Carrollton,Carrolton Highlands,Dallas,TX,"[[[-96.890407, 32.95367], [-96.890393, 32.9496...",32.951421,-96.89746,2,Fast-Food Venue,Athletic Venue,Museum,,,,,,,
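
The number of clusters is fixed by hand in the cell above. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic, well-separated points, not on the DFW data; `toy_X`, `silhouette_by_k`, and `best_k` are illustrative names, and the result says nothing about which k suits the neighborhoods in this project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption for illustration): three tight blobs in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
toy_X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score.
silhouette_by_k = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy_X)
    silhouette_by_k[k] = silhouette_score(toy_X, labels)

# The k with the highest score is the best-separated partition on this toy data.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(best_k, silhouette_by_k)
```

On the real frequency matrix the scores are likely to be much closer together, so this would serve as a rough guide rather than a decision rule.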


## Check for invalid cluster labels

In [54]:
idx = pd.to_numeric(dfw_merged['Cluster Labels'], errors='coerce').isna()
print('Number of neighborhoods with invalid Cluster Labels: ',len(dfw_merged[idx]), ' out of ',len(dfw_merged.index))
#neighborhoods_venues_sorted[idx]

dfw_merged[idx]

Number of neighborhoods with invalid Cluster Labels:  0  out of  425


Unnamed: 0,city,Neighborhood,county,state,coordinates,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
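
The check above matters because a left join leaves a missing label for any neighborhood that has no row in the venue table. A small sketch of that behavior follows, using made-up frames (`toy_dfw`, `toy_labels`) rather than the notebook's data.

```python
import pandas as pd

# Hypothetical inputs: neighborhood "B" returned no venues, so it has no cluster label.
toy_dfw = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"], "city": ["Dallas", "Fort Worth", "Dallas"]}
)
toy_labels = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [0, 1]})

# DataFrame.join is a left join by default, so "B" is kept with a NaN label.
toy_merged = toy_dfw.join(toy_labels.set_index("Neighborhood"), on="Neighborhood")

# Flag rows whose label is missing or non-numeric.
toy_invalid = pd.to_numeric(toy_merged["Cluster Labels"], errors="coerce").isna()

# Drop them and restore an integer dtype; .copy() avoids writing into a slice.
toy_clean = toy_merged[~toy_invalid].copy()
toy_clean["Cluster Labels"] = toy_clean["Cluster Labels"].astype(int)
print(toy_clean)
```

The NaN forces the label column to float after the join, which is why the labels are cast back to int before being used as list indices for marker colors.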


## Map clusters: remove rows with invalid cluster labels and convert the labels to int before mapping

In [57]:

# first, we have to drop non-numeric cluster labels and change them to int
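# .copy() makes the filtered frame independent, so the assignment below does not trigger SettingWithCopyWarning
dfw_merged = dfw_merged[pd.to_numeric(dfw_merged['Cluster Labels'], errors='coerce').notnull()].copy()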
dfw_merged['Cluster Labels'] = dfw_merged['Cluster Labels'].astype('int')
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters (the markers below use the fixed palette rainbow2)
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
rainbow2=['#0000ff','#800080','#ff964f','#ff0000','#008000','#ffd700','#1996f3','#663300','#ffc0cb','#daa520'] 
rainbow_text=['blue','purple','orange','red','green','yellow','sky blue','brown','pink','gold']
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dfw_merged['Latitude'], dfw_merged['Longitude'], dfw_merged['Neighborhood'], dfw_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow2[cluster],
        fill=True,
        fill_color=rainbow2[cluster],
        fill_opacity=0.7).add_to(map_clusters)
title_html = '''
            <h3 align="center" style="font-size:20px"><b>DFW Neighborhoods by Venue Category</b></h3>
            '''
map_clusters.get_root().html.add_child(folium.Element(title_html))
map_clusters.save('DFW_by_VenueCategory.html') 
map_clusters

## Examine cluster composition

In [52]:
dfw_merged.groupby('Cluster Labels').count()

Unnamed: 0_level_0,city,Neighborhood,county,state,coordinates,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
0,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11
1,72,72,72,72,72,72,72,72,72,72,72,72,72,72,72,72,72
2,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,26
3,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6
4,245,245,245,245,245,245,245,245,245,245,245,245,245,245,245,245,245
5,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
6,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14
7,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
8,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
9,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25,25
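
The counts above give cluster sizes for the whole area. Since the question is whether Dallas and Fort Worth differ, a per-city breakdown may be more informative. The sketch below uses `pd.crosstab` on an invented frame (`toy`); the column names match the notebook, but the values are made up.

```python
import pandas as pd

# Invented cluster assignments, for illustration only.
toy = pd.DataFrame(
    {
        "city": ["Dallas", "Dallas", "Dallas",
                 "Fort Worth", "Fort Worth", "Fort Worth", "Fort Worth"],
        "Cluster Labels": [4, 4, 1, 0, 0, 4, 1],
    }
)

# Neighborhood counts per city and cluster.
counts = pd.crosstab(toy["city"], toy["Cluster Labels"])

# Share of each city's neighborhoods in each cluster (each row sums to 1).
shares = pd.crosstab(toy["city"], toy["Cluster Labels"], normalize="index")

print(counts)
print(shares.round(2))
```

Comparing shares rather than raw counts keeps the comparison fair when one city contributes many more neighborhoods than the other.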


In [27]:
dfw_merged.loc[dfw_merged['Neighborhood']=='Wolf Creek']

Unnamed: 0,city,Neighborhood,county,state,coordinates,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Dallas,Wolf Creek,Dallas,TX,"[[[-96.823164, 32.654073], [-96.823078, 32.641...",32.6584,-96.846397,4,Fast Food Restaurant,Fried Chicken Joint,Discount Store,BBQ Joint,Pharmacy,Mexican Restaurant,Big Box Store,Grocery Store,Smoothie Shop,Convenience Store


In [28]:
fst_venue_col = 7
dfw_merged.loc[dfw_merged['Cluster Labels'] == 0, dfw_merged.columns[[0,1] + list(range(fst_venue_col, dfw_merged.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Fort Worth,Altemesa East,0,Cosmetics Shop,Discount Store,,,,,,,,
101,Fort Worth,Cobblestone,0,Gas Station,Discount Store,,,,,,,,
107,Fort Worth,Crawford Farms,0,Discount Store,,,,,,,,,
115,Fort Worth,Eastern Hills,0,Discount Store,Lounge,,,,,,,,
116,Fort Worth,Eastland,0,Discount Store,Fried Chicken Joint,Pharmacy,,,,,,,
117,Fort Worth,Eastwood Pleasant Glade,0,Discount Store,Fried Chicken Joint,Burger Joint,,,,,,,
141,Fort Worth,Hamlet,0,Discount Store,,,,,,,,,
186,Fort Worth,North Beverly Hills,0,Discount Store,,,,,,,,,
193,Fort Worth,Oakridge Terrace,0,Discount Store,,,,,,,,,
232,Fort Worth,South Edgewood,0,Grocery Store,Discount Store,,,,,,,,


In [29]:
dfw_merged.loc[dfw_merged['Cluster Labels'] == 1, dfw_merged.columns[[0,1] + list(range(fst_venue_col, dfw_merged.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Carrollton,Park Terrace,1,Residential,,,,,,,,,
56,Desoto,Candel Meadow,1,Residential,,,,,,,,,
57,Desoto,Frost farms,1,Residential,,,,,,,,,
59,Desoto,Meadowbrook Estates,1,Residential,,,,,,,,,
66,Fort Worth,Bal Harbour,1,Residential,,,,,,,,,
79,Fort Worth,Briercliff,1,Residential,,,,,,,,,
80,Fort Worth,Brittany Place,1,Residential,,,,,,,,,
84,Fort Worth,Burton Hill Trinity Trails,1,Residential,,,,,,,,,
85,Fort Worth,Butler,1,Residential,,,,,,,,,
93,Fort Worth,Caville,1,Residential,,,,,,,,,


In [30]:
dfw_merged.loc[dfw_merged['Cluster Labels'] == 2, dfw_merged.columns[[0,1] + list(range(fst_venue_col, dfw_merged.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Carrollton,Oak Tree North,2,Pool,South American Restaurant,Wings Joint,Mexican Restaurant,Bar,Gym,,,,
14,Carrollton,Rohton Park,2,Mexican Restaurant,Caribbean Restaurant,Thai Restaurant,Restaurant,Greek Restaurant,Fast Food Restaurant,Dessert Shop,Vietnamese Restaurant,,
77,Fort Worth,Brentmoor,2,Mexican Restaurant,,,,,,,,,
145,Fort Worth,Harmony,2,Convenience Store,Storage Facility,Gym,Mexican Restaurant,,,,,,
149,Fort Worth,Heritage,2,Mexican Restaurant,Convenience Store,Deli / Bodega,Tanning Salon,Greek Restaurant,Grocery Store,Gym,Recreation Center,Chinese Restaurant,American Restaurant
150,Fort Worth,Heritage Glen,2,Convenience Store,Recreation Center,Mexican Restaurant,,,,,,,
156,Fort Worth,Hubbard Heights,2,Pharmacy,Mexican Restaurant,Tennis Court,,,,,,,
174,Fort Worth,Marine Park,2,Mexican Restaurant,,,,,,,,,
190,Fort Worth,North Side,2,Bar,Mexican Restaurant,Café,Fried Chicken Joint,American Restaurant,Nightclub,Shoe Store,,,
213,Fort Worth,Ridglea Hills,2,Mexican Restaurant,Golf Course,Bank,Liquor Store,Nail Salon,Department Store,Vietnamese Restaurant,Ice Cream Shop,,


In [31]:
dfw_merged.loc[dfw_merged['Cluster Labels'] == 3, dfw_merged.columns[[0,1] + list(range(fst_venue_col, dfw_merged.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
69,Fort Worth,Beechwood Creek,3,Golf Course,,,,,,,,,
71,Fort Worth,Bently Village,3,Hardware Store,Golf Course,,,,,,,,
102,Fort Worth,Colonial Hills,3,Golf Course,,,,,,,,,
170,Fort Worth,Lost Creek,3,Golf Course,,,,,,,,,
179,Fort Worth,Mira Vista,3,Golf Course,,,,,,,,,
321,Garland,Greens,3,Golf Course,,,,,,,,,
324,Garland,Hills of Firewheel,3,Golf Course,,,,,,,,,
339,Garland,Oakridge,3,Golf Course,,,,,,,,,
350,Garland,Retreat at Firewheel,3,Golf Course,,,,,,,,,


In [32]:
dfw_merged.loc[dfw_merged['Cluster Labels'] == 4, dfw_merged.columns[[0,1] + list(range(6, dfw_merged.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Carrollton,Bel Air of Josey Ranch,-96.881686,4,Hardware Store,Pizza Place,Ice Cream Shop,,,,,,,
1,Carrollton,Cambridge Estates,-96.916178,4,Korean Restaurant,Coffee Shop,Bakery,Supermarket,Bubble Tea Shop,Ice Cream Shop,Indian Restaurant,Fast Food Restaurant,Dessert Shop,Seafood Restaurant
2,Carrollton,Carrollton Heights,-96.901275,4,Southern / Soul Food Restaurant,History Museum,Donut Shop,Brewery,American Restaurant,Café,Diner,Burger Joint,,
3,Carrollton,Carrollton Summertree,-96.868348,4,Park,,,,,,,,,
4,Carrollton,Carrolton Highlands,-96.89746,4,Burger Joint,Greek Restaurant,History Museum,Recreation Center,Donut Shop,,,,,
5,Carrollton,Hill'n Dale,-96.893141,4,Bookstore,,,,,,,,,
6,Carrollton,Jackson Arms,-96.892964,4,Mexican Restaurant,Pizza Place,Sandwich Place,Ice Cream Shop,Hardware Store,Juice Bar,,,,
7,Carrollton,Mcoy Estates,-96.895546,4,Donut Shop,Trail,Sandwich Place,Coffee Shop,,,,,,
8,Carrollton,Morningside,-96.871523,4,Grocery Store,Ice Cream Shop,,,,,,,,
9,Carrollton,Nob Hill,-96.895706,4,Cosmetics Shop,Tex-Mex Restaurant,Trail,Cuban Restaurant,Sandwich Place,Supermarket,,,,


In [33]:
dfw_merged.loc[dfw_merged['Cluster Labels'] == 5, dfw_merged.columns[[0,1] + list(range(fst_venue_col, dfw_merged.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Fort Worth,Basswood Village,5,Convenience Store,,,,,,,,,
138,Fort Worth,Greenfield Acres,5,Convenience Store,Gym / Fitness Center,,,,,,,,
151,Fort Worth,Highland Hills,5,Convenience Store,Warehouse Store,,,,,,,,
172,Fort Worth,Marine Creek Hills,5,Convenience Store,,,,,,,,,
173,Fort Worth,Marine Creek Ranch,5,Convenience Store,,,,,,,,,
191,Fort Worth,Northbrook,5,Convenience Store,,,,,,,,,
203,Fort Worth,Parkview Hills,5,Convenience Store,Video Store,,,,,,,,
211,Fort Worth,Ridglea,5,Convenience Store,,,,,,,,,
280,Fort Worth,Westpoint,5,Convenience Store,,,,,,,,,
282,Fort Worth,White Lake Hills,5,Convenience Store,,,,,,,,,


## Analyze Neighborhoods by Venue Group

In [34]:
# one hot encoding
dfw_onehot = pd.get_dummies(dfw_venues[['Venue Group']], prefix="", prefix_sep="")
# 'Neighborhood' can also appear as a venue group, so drop that column if it exists
try:
    del dfw_onehot['Neighborhood']
except KeyError:
    print('No venues called Neighborhood')
# add neighborhood column back to dataframe
dfw_onehot['Neighborhood'] = dfw_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dfw_onehot.columns[-1]] + list(dfw_onehot.columns[:-1])
dfw_onehot = dfw_onehot[fixed_columns]
dfw_grouped2 = dfw_onehot.groupby('Neighborhood').mean().reset_index()
neighborhoods_venues_sorted = pd.DataFrame(columns=top_venue_columns)
neighborhoods_venues_sorted['Neighborhood'] = dfw_grouped2['Neighborhood']

for ind in np.arange(dfw_grouped2.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dfw_grouped2.iloc[ind, :], num_top_venues)
    for col in range(num_top_venues):
        # blank out ranked venue groups whose frequency is zero
        if dfw_grouped2.set_index('Neighborhood').loc[[neighborhoods_venues_sorted.iloc[ind, 0]],[neighborhoods_venues_sorted.iloc[ind, 1+col]]].iloc[0, 0] == 0:
            neighborhoods_venues_sorted.iloc[ind, 1+col] = 'NA'
# cluster analysis
dfw_grouped_clustering = dfw_grouped2.drop('Neighborhood', axis=1)
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.drop('Cluster Labels',axis=1, inplace=True)
# run k-means clustering
kclusters2 = 10
kmeans = KMeans(n_clusters=kclusters2, random_state=0).fit(dfw_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
dfw_merged_bygroup = dfw

# join the ranked venue groups and cluster labels onto dfw, which holds latitude/longitude for each neighborhood
dfw_merged_bygroup = dfw_merged_bygroup.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

dfw_merged_bygroup.head() # check the last columns!
idx = pd.to_numeric(dfw_merged_bygroup['Cluster Labels'], errors='coerce').isna()
print('Number of neighborhoods with invalid Cluster Labels: ',len(dfw_merged_bygroup[idx]), ' out of ',len(dfw_merged_bygroup.index))
print(neighborhoods_venues_sorted[idx])



No venues called Neighborhood
Number of neighborhoods with invalid Cluster Labels:  0  out of  425
Empty DataFrame
Columns: [Cluster Labels, Neighborhood, 1st Most Common Venue, 2nd Most Common Venue, 3rd Most Common Venue, 4th Most Common Venue, 5th Most Common Venue, 6th Most Common Venue, 7th Most Common Venue, 8th Most Common Venue, 9th Most Common Venue, 10th Most Common Venue]
Index: []
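
The cell above one-hot encodes the venue groups and averages them per neighborhood, so each row of `dfw_grouped2` holds the share of a neighborhood's venues that fall in each group. A toy illustration of that transformation is sketched below; `toy_venues`, `toy_onehot`, and `toy_freq` are hypothetical names with invented values.

```python
import pandas as pd

# Invented venue list: four venues in neighborhood "A", one in "B".
toy_venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B"],
        "Venue Group": ["Park", "Fast-Food Venue", "Fast-Food Venue",
                        "Grocery Store", "Park"],
    }
)

# One indicator column per venue group (float so the mean is a plain fraction).
toy_onehot = pd.get_dummies(toy_venues["Venue Group"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of the indicators is the fraction of venues in each group.
toy_freq = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_freq)
```

Because every row sums to 1, k-means on this matrix groups neighborhoods by their mix of venue groups rather than by how many venues they have.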


In [35]:
# venue groups to html file
fname = "VenueGroups.html"
with open(fname, "w") as file_handle:
    file_handle.write(venue_groups[['Venue Group', 'Venue Category']].sort_values(['Venue Group', 'Venue Category']).to_html())
#venue_groups.head()
#venue_groups[['Venue Group', 'Venue Category']].sort_values(['Venue Group', 'Venue Category']).head()

In [60]:
# Create the map
# first, we have to drop non-numeric cluster labels and change them to int
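# .copy() makes the filtered frame independent, so the assignment below does not trigger SettingWithCopyWarning
dfw_merged_bygroup = dfw_merged_bygroup[pd.to_numeric(dfw_merged_bygroup['Cluster Labels'], errors='coerce').notnull()].copy()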
dfw_merged_bygroup['Cluster Labels'] = dfw_merged_bygroup['Cluster Labels'].astype('int')
map_clusters2 = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dfw_merged_bygroup['Latitude'], dfw_merged_bygroup['Longitude'], dfw_merged_bygroup['Neighborhood'], dfw_merged_bygroup['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow2[cluster],
        fill=True,
        fill_color=rainbow2[cluster],
        fill_opacity=0.7).add_to(map_clusters2)
title_html = '''
            <h3 align="center" style="font-size:20px"><b>DFW Neighborhoods by Venue Group - ''' + str(kclusters2) + ''' Clusters</b></h3>
            '''
map_clusters2.get_root().html.add_child(folium.Element(title_html))

map_clusters2.save('DFW_by_VenueGroup-' + str(kclusters2) + '.html')      
map_clusters2

## Examine cluster composition

In [37]:
dfw_merged_bygroup.groupby('Cluster Labels').count()

Unnamed: 0_level_0,city,Neighborhood,county,state,coordinates,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
0,72,72,72,72,72,72,72,72,72,72,72,72,72,72,72,72,72
1,148,148,148,148,148,148,148,148,148,148,148,148,148,148,148,148,148
2,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15
3,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9
4,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17
5,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16
6,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17
7,131,131,131,131,131,131,131,131,131,131,131,131,131,131,131,131,131


In [38]:
# write clusters to html files
for f in range(kclusters2):
    fname = "Cluster-" + str(f) + "-" + str(kclusters2) + ".html"
    fst_venue_col = 7
    cluster_df = dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == f, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])
    with open(fname, "w") as file_handle:
        file_handle.write(cluster_df.to_html())


In [39]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 0, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Carrollton,Park Terrace,0,Residential,,,,,,,,,
56,Desoto,Candel Meadow,0,Residential,,,,,,,,,
57,Desoto,Frost farms,0,Residential,,,,,,,,,
59,Desoto,Meadowbrook Estates,0,Residential,,,,,,,,,
66,Fort Worth,Bal Harbour,0,Residential,,,,,,,,,
79,Fort Worth,Briercliff,0,Residential,,,,,,,,,
80,Fort Worth,Brittany Place,0,Residential,,,,,,,,,
84,Fort Worth,Burton Hill Trinity Trails,0,Residential,,,,,,,,,
85,Fort Worth,Butler,0,Residential,,,,,,,,,
93,Fort Worth,Caville,0,Residential,,,,,,,,,


In [40]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 1, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Carrollton,Carrollton Heights,1,American Cuisine,Fast-Food Venue,Museum,Southern / Soul Food Restaurant,Drinking Establishment,,,,,
5,Carrollton,Hill'n Dale,1,Bookstore,,,,,,,,,
9,Carrollton,Nob Hill,1,Shopping Venues,Grocery Store,Fast-Food Venue,Mexican Cuisine,Outdoor Destination,,,,,
10,Carrollton,Oak Tree North,1,Athletic Venue,Mexican Cuisine,Gym,Fast-Food Venue,Drinking Establishment,,,,,
12,Carrollton,Parks of Carrollton,1,Bookstore,Fast-Food Venue,Grocery Store,Asian Cuisine,Bubble Tea Shop,,,,,
14,Carrollton,Rohton Park,1,Fast-Food Venue,Asian Cuisine,Mexican Cuisine,American Cuisine,Caribbean Restaurant,,,,,
17,Carrollton,Trinity Mills,1,Fast-Food Venue,Asian Cuisine,Shopping Venues,Outdoor Destination,Grocery Store,Bookstore,Health Food Store,Video Game Store,Financial Services,Mexican Cuisine
18,Carrollton,Trinity Mills,1,Fast-Food Venue,Asian Cuisine,Shopping Venues,Outdoor Destination,Grocery Store,Bookstore,Health Food Store,Video Game Store,Financial Services,Mexican Cuisine
19,Carrollton,Whitlock Warriors,1,Pet Service,Garden Center,,,,,,,,
21,Dallas,Arts District,1,American Cuisine,Arts/Entertainment,Fast-Food Venue,Museum,Grocery Store,Mexican Cuisine,Seafood Restaurant,Lodging,Park,Yoga Studio


In [41]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 2, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Fort Worth,Basswood Village,2,Convenience Store,,,,,,,,,
138,Fort Worth,Greenfield Acres,2,Convenience Store,Gym,,,,,,,,
151,Fort Worth,Highland Hills,2,Convenience Store,,,,,,,,,
172,Fort Worth,Marine Creek Hills,2,Convenience Store,,,,,,,,,
173,Fort Worth,Marine Creek Ranch,2,Convenience Store,,,,,,,,,
191,Fort Worth,Northbrook,2,Convenience Store,,,,,,,,,
203,Fort Worth,Parkview Hills,2,Convenience Store,Video Store,,,,,,,,
211,Fort Worth,Ridglea,2,Convenience Store,,,,,,,,,
280,Fort Worth,Westpoint,2,Convenience Store,,,,,,,,,
282,Fort Worth,White Lake Hills,2,Convenience Store,,,,,,,,,


In [42]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 3, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
69,Fort Worth,Beechwood Creek,3,Golf Course,,,,,,,,,
71,Fort Worth,Bently Village,3,Home Shop,Golf Course,,,,,,,,
102,Fort Worth,Colonial Hills,3,Golf Course,,,,,,,,,
170,Fort Worth,Lost Creek,3,Golf Course,,,,,,,,,
179,Fort Worth,Mira Vista,3,Golf Course,,,,,,,,,
321,Garland,Greens,3,Golf Course,,,,,,,,,
324,Garland,Hills of Firewheel,3,Golf Course,,,,,,,,,
339,Garland,Oakridge,3,Golf Course,,,,,,,,,
350,Garland,Retreat at Firewheel,3,Golf Course,,,,,,,,,


In [43]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 4, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Fort Worth,Altemesa East,4,Shopping Venues,,,,,,,,,
101,Fort Worth,Cobblestone,4,Shopping Venues,Gas Station,,,,,,,,
107,Fort Worth,Crawford Farms,4,Shopping Venues,,,,,,,,,
115,Fort Worth,Eastern Hills,4,Shopping Venues,Drinking Establishment,,,,,,,,
116,Fort Worth,Eastland,4,Shopping Venues,Fast-Food Venue,Health Services,,,,,,,
125,Fort Worth,Falcon Ridge,4,Shopping Venues,Convenience Store,Grocery Store,,,,,,,
141,Fort Worth,Hamlet,4,Shopping Venues,,,,,,,,,
186,Fort Worth,North Beverly Hills,4,Shopping Venues,,,,,,,,,
193,Fort Worth,Oakridge Terrace,4,Shopping Venues,,,,,,,,,
208,Fort Worth,Quail Run,4,Shopping Venues,,,,,,,,,


In [44]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 5, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,Fort Worth,Bonnie Brae,5,Asian Cuisine,,,,,,,,,
77,Fort Worth,Brentmoor,5,Mexican Cuisine,,,,,,,,,
174,Fort Worth,Marine Park,5,Mexican Cuisine,,,,,,,,,
184,Fort Worth,Morningside Park,5,Asian Cuisine,,,,,,,,,
216,Fort Worth,River Trails,5,Asian Cuisine,,,,,,,,,
230,Fort Worth,Shaw Clarke,5,Mexican Cuisine,,,,,,,,,
273,Fort Worth,West Byers,5,Asian Cuisine,Mexican Cuisine,Gym,Fast-Food Venue,Drinking Establishment,,,,,
298,Garland,Charleston Commons,5,Asian Cuisine,,,,,,,,,
300,Garland,Coomer Creek,5,Asian Cuisine,Mexican Cuisine,,,,,,,,
317,Garland,Forest Crest,5,Asian Cuisine,Shopping Venue,Convenience Store,Grocery Store,Fast-Food Venue,,,,,


In [45]:
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 6, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Carrollton,Carrollton Summertree,6,Park,,,,,,,,,
13,Carrollton,Parkside Estates,6,Park,,,,,,,,,
20,Carrollton,Woodcreek,6,Park,,,,,,,,,
146,Fort Worth,Harriet Creek Ranch,6,Park,,,,,,,,,
171,Fort Worth,Marine Creek,6,Athletic Venue,,,,,,,,,
227,Fort Worth,Sendera Ranch,6,Health Services,Park,,,,,,,,
228,Fort Worth,Sendera Ranch,6,Health Services,Park,,,,,,,,
229,Fort Worth,Sendera Ranch,6,Health Services,Park,,,,,,,,
249,Fort Worth,Sunset Terrace,6,Park,,,,,,,,,
297,Garland,Chandler Heights,6,Athletic Venue,Grocery Store,Park,,,,,,,


In [46]:
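# Cluster 7: keep the city, neighborhood, and ranked venue columns, sorted by city and then neighborhood.
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 7, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])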

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Carrollton,Bel Air of Josey Ranch,7,Fast-Food Venue,Home Shop,,,,,,,,
1,Carrollton,Cambridge Estates,7,Fast-Food Venue,Asian Cuisine,Grocery Store,Bubble Tea Shop,Seafood Restaurant,,,,,
4,Carrollton,Carrolton Highlands,7,Fast-Food Venue,Athletic Venue,Museum,,,,,,,
6,Carrollton,Jackson Arms,7,Fast-Food Venue,Home Shop,Mexican Cuisine,,,,,,,
7,Carrollton,Mcoy Estates,7,Fast-Food Venue,Outdoor Destination,,,,,,,,
8,Carrollton,Morningside,7,Grocery Store,Fast-Food Venue,,,,,,,,
15,Carrollton,Rollingwood Estates,7,Fast-Food Venue,,,,,,,,,
16,Carrollton,Savoy of Josey Ranch,7,Fast-Food Venue,Mexican Cuisine,,,,,,,,
24,Dallas,Cedar Crest,7,Fast-Food Venue,Shopping Venues,Convenience Store,Health Services,Light Rail Station,Mexican Cuisine,Athletic Venue,Grocery Store,Gas Station,Golf Course
27,Dallas,Coppell,7,Drinking Establishment,Fast-Food Venue,Seafood Restaurant,,,,,,,


In [47]:
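# Cluster 8: same selection as above. No rows carry this label, so only the column header is returned.
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 8, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])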

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [48]:
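# Cluster 9: same selection as above. No rows carry this label, so only the column header is returned.
dfw_merged_bygroup.loc[dfw_merged_bygroup['Cluster Labels'] == 9, dfw_merged_bygroup.columns[[0,1] + list(range(fst_venue_col, dfw_merged_bygroup.shape[1]))]].sort_values(['city','Neighborhood'])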

Unnamed: 0,city,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
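
The last two queries return only a header row, which suggests that no neighborhood was assigned label 8 or 9. One possible explanation (an assumption, not something confirmed here) is that the k-means model was fit with fewer than ten clusters. A minimal sketch of how we could check which labels are populated before querying each one is shown below. The helper `show_cluster` and the `toy` frame are hypothetical stand-ins for the repeated selection above and for `dfw_merged_bygroup`.

```python
import pandas as pd


def show_cluster(df, label, first_venue_col):
    """Return the first two columns plus the venue columns for one cluster label."""
    # Same column selection as the cells above: positions 0 and 1, then
    # everything from first_venue_col to the last column.
    keep = [0, 1] + list(range(first_venue_col, df.shape[1]))
    rows = df["Cluster Labels"] == label
    return df.loc[rows, df.columns[keep]].sort_values(["city", "Neighborhood"])


# Hypothetical toy data standing in for dfw_merged_bygroup.
toy = pd.DataFrame({
    "city": ["Garland", "Fort Worth", "Carrollton"],
    "Neighborhood": ["Forest Crest", "Hamlet", "Woodcreek"],
    "latitude": [32.9, 32.7, 33.0],  # placeholder column that is skipped
    "Cluster Labels": [5, 4, 6],
    "1st Most Common Venue": ["Asian Cuisine", "Shopping Venues", "Park"],
})

# Count the rows per label; labels absent from this index have no members.
label_counts = toy["Cluster Labels"].value_counts().sort_index()
populated = label_counts.index.tolist()

cluster_5 = show_cluster(toy, 5, first_venue_col=3)
cluster_8 = show_cluster(toy, 8, first_venue_col=3)  # empty: no row has label 8
```

On the real data, `dfw_merged_bygroup['Cluster Labels'].value_counts().sort_index()` would list the populated labels in one step, so we would only need to display the clusters that actually contain neighborhoods.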
