# Segmenting and clustering the neighborhoods in Houston
The following is the final assignment for the Applied Data Science Capstone on Coursera.

## Part 1
Create a dataframe from the Wikipedia table of Canadian postal codes beginning with M:

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

The dataframe must meet the following requirements:
1. The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
2. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
3. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
5. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
6. In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
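
The cleaning rules above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's own code: the sample rows are hypothetical stand-ins shaped like the Wikipedia table, and the names `raw`, `assigned`, and `postal_df` are made up for the example.

```python
import pandas as pd

# Hypothetical sample rows shaped like the Wikipedia table (not scraped data)
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Harbourfront",
                     "Regent Park", "Not assigned"],
})

# Rule 2: keep only rows that have an assigned borough
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 4: an unassigned neighborhood falls back to the borough name
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

# Rule 3: one row per postal code, neighborhoods joined with a comma
postal_df = (
    assigned.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

# Rule 6: .shape is a (rows, columns) tuple, so index 0 is the row count
print(postal_df.shape[0])
```

Grouping on both `PostalCode` and `Borough` keeps the borough column in the result without a separate aggregation for it.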

In [1]:
# import libraries

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
#pip install geopy # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform a JSON file into a pandas dataframe (pandas >= 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs # the samples_generator submodule was removed in newer scikit-learn releases
#pip install folium # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from bs4 import BeautifulSoup
import lxml
from IPython.display import Image 
from IPython.core.display import HTML 
import random # library for random number generation

print('Libraries imported.')

Libraries imported.


In [2]:
# fetch the page and parse it with BeautifulSoup
html = requests.get('https://www.houstoniamag.com/home-and-real-estate/2019/03/neighborhoods-by-the-numbers-2019').text
soup = BeautifulSoup(html, 'lxml')
#print(soup.prettify())
table = soup.find('table')

# extract the column headers as plain strings
headers = [str(th).replace("<th>", "").replace("</th>", "") for th in table.find_all('th')]

# collect all table rows and skip the header row
rows = table.find_all('tr')
rows = rows[1:]

# strip the row/cell markup and line feeds so that cells are separated only by "</td><td>"
rows = [str(row).replace("\n</td></tr>", "").replace("<tr>\n<td>", "").replace("\n<", "<") for row in rows]

df=pd.DataFrame(rows)
df[headers] = df[0].str.split("</td><td>", n = 12, expand = True) 
df.drop(columns=[0],inplace=True)
df = df.drop(144)
df = df.drop(['2018 Median Home Price',  '5-Year Percent Growth (2013-2018)',  '1-Year Percent Growth (2017-2018)', 'Average Days on Market (2018)', 'Average Square Feet', 'Owner vs Renter Ratio', 'Median Household Income', 'Median Age', 'Percent Enrolled in Public Schools', 'Average Commute Time', 'Walkability'], axis = 1)

df2 = df.astype(str)
df2.rename(columns={'ZIP Code':'Zip'}, inplace=True)
df2.head()

Unnamed: 0,Neighborhood,Zip
0,1960/Cypress,77065
1,Aldine Area,77039
2,Alief,77072
3,Alvin North,77511
4,Alvin South,77511


## Part 2
Now that you have built a dataframe with the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

In an older version of this course, we used the Google Maps Geocoding API to get the latitude and longitude coordinates of each neighborhood. However, Google has since started charging for the API (http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/), so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code: a call may return None, and the same call made again may return the coordinates. To make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:


Because this package can be very unreliable, here is a link to a CSV file with the geographical coordinates of each postal code, in case you cannot get them using the Geocoder package: http://cocl.us/Geospatial_data

Use the Geocoder package or the csv file to create the following dataframe:


Important Note: There is a limit on how many times you can call the geocoder.google function: 2500 times per day. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.

Once you are able to create the above dataframe, submit a link to the new Notebook on your Github repository. (2 marks)

In [3]:
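# add geospatial data: latitude and longitude for each ZIP code
dfll = pd.read_csv("https://raw.githubusercontent.com/TbEeDaDrY/Coursera_Capstone/master/us-zip-code-latitude-and-longitude.csv")
# cast to str so that the Zip column matches the string Zip values in df2
dfll = dfll.astype(str)
# join the two dataframes on their shared Zip column
houston_data = pd.merge(df2, dfll, on="Zip")
houston_data.head()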

Unnamed: 0,Neighborhood,Zip,Latitude,Longitude
0,1960/Cypress,77065,29.927675,-95.60547
1,Aldine Area,77039,29.909123,-95.33683
2,Alief,77072,29.700898,-95.59002
3,Alvin North,77511,29.41148,-95.24475
4,Alvin South,77511,29.41148,-95.24475


## Part 3
Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

1. to add enough Markdown cells to explain what you decided to do and to report any observations you make.
2. to generate maps to visualize your neighborhoods and how they cluster together.

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

Use the geopy library to get the latitude and longitude values of Houston.
In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent ho_explorer, as shown below.

In [4]:
address = 'Houston, TX, USA'

geolocator = Nominatim(user_agent="ho_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Houston, TX, USA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Houston, TX, USA are 29.7589382, -95.3676974.


Create a map of Houston with neighborhoods superimposed on top.

In [5]:
# create map of Houston using latitude and longitude values
map_houston = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(houston_data['Latitude'], houston_data['Longitude'], houston_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_houston)

map_houston

Define Foursquare Credentials and Version

In [6]:
CLIENT_ID = 'Q1IWU1ZORZHX1V4HVPWA3YAAQPHVQCFHHBWOOTKF5OQ4G02O' # your Foursquare ID
CLIENT_SECRET = '23K5MT3BO1WRTT52FW0UKINIHPHV0JIKWHXJLIPP2JT2BEPK' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: Q1IWU1ZORZHX1V4HVPWA3YAAQPHVQCFHHBWOOTKF5OQ4G02O
CLIENT_SECRET:23K5MT3BO1WRTT52FW0UKINIHPHV0JIKWHXJLIPP2JT2BEPK
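
An alternative to hard-coding and printing the credentials is to read them from environment variables. The sketch below is only one way to do it: the helper `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, not names defined by Foursquare or by this notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        # hypothetical variable names chosen for this sketch
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

# Demonstration with a stand-in mapping; real use would rely on os.environ
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", client_id)  # the secret is deliberately not printed
```

With this approach the notebook can be shared without exposing the secret in a code cell or in its output.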


Get the neighborhood's name

In [7]:
houston_data.loc[0, 'Neighborhood']

'1960/Cypress'

In [8]:
# Get the neighborhood's latitude and longitude
neighborhood_latitude = houston_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = houston_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = houston_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of 1960/Cypress are 29.927675, -95.60547.


Now, let's get the top 100 venues within a radius of 500 meters

In [9]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=Q1IWU1ZORZHX1V4HVPWA3YAAQPHVQCFHHBWOOTKF5OQ4G02O&client_secret=23K5MT3BO1WRTT52FW0UKINIHPHV0JIKWHXJLIPP2JT2BEPK&v=20180604&ll=29.927675,-95.60547&radius=500&limit=100'
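
The same request URL can also be assembled from a dictionary of query parameters, which avoids matching positional `{}` placeholders by hand. This is a sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credentials passed to it are dummy values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude as one parameter
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Dummy credentials; coordinates are those of 1960/Cypress from the cell above
example_url = build_explore_url("demo-id", "demo-secret", "20180604",
                                29.927675, -95.60547, 500, 100)

# Parse the query string back to confirm each parameter round-trips
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

`urlencode` percent-encodes the comma in `ll`; standard query-string decoding, shown here with `parse_qs`, turns it back into `29.927675,-95.60547`.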

Send the GET request and examine the results.

In [10]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eaddef90f5968001b296fc4'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Houston',
  'headerFullLocation': 'Houston',
  'headerLocationGranularity': 'city',
  'totalResults': 12,
  'suggestedBounds': {'ne': {'lat': 29.932175004500007,
    'lng': -95.60028731718344},
   'sw': {'lat': 29.923174995499995, 'lng': -95.61065268281655}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ad4c6a8f964a520b2fb20e3',
       'name': 'Petco',
       'location': {'address': '12310 FM 1960 Rd W',
        'lat': 29.924461715080675,
        'lng': -95.60274423444666,
        'labeledLatLngs': [{'label': 'display',
          'lat': 29.924461715080675,
          

From the Foursquare lab in the previous module, we know that all the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [11]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the JSON and structure it into a pandas dataframe.

In [12]:
# assign relevant part of JSON to venues
venues = results['response']['groups'][0]['items']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
dataframe = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe['venue.categories'] = dataframe.apply(get_category_type, axis=1)

# clean columns
dataframe.columns = [col.split(".")[-1] for col in dataframe.columns]

dataframe.head()

Unnamed: 0,name,categories,lat,lng
0,Petco,Pet Store,29.924462,-95.602744
1,GameStop,Video Game Store,29.923513,-95.604873
2,Massage Envy - FM 1960 Eldridge,Spa,29.924638,-95.60183
3,SUBWAY,Sandwich Place,29.923205,-95.604962
4,GNC,Supplement Shop,29.924037,-95.604463


In [13]:
print('{} venues were returned by Foursquare.'.format(dataframe.shape[0]))

12 venues were returned by Foursquare.
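
As a plausibility check on the 500 meter radius, the great-circle distance between the neighborhood coordinates and a returned venue can be computed with the haversine formula. This is a sketch under a spherical-Earth assumption; `haversine_m` is a hypothetical helper, and how Foursquare measures the radius internally is not stated in this notebook.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# 1960/Cypress coordinates and the first venue (Petco) from the dataframe above
distance = haversine_m(29.927675, -95.60547, 29.924462, -95.602744)
print(round(distance), "meters")
```

For this pair the distance comes out under 500 meters, consistent with the radius used in the request.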


In [14]:
# Create a function to repeat the same process for all the neighborhoods in Houston

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [15]:
houston_venues = getNearbyVenues(names=houston_data['Neighborhood'],
                                   latitudes=houston_data['Latitude'],
                                   longitudes=houston_data['Longitude']
                                  )

1960/Cypress
Aldine Area
Alief
Alvin North
Alvin South
Atascocita North
Atascocita South
Fall Creek Area
Bacliff/San Leon
Bayou Vista
Hitchcock
Omega Bay
Baytown/Chambers County
Baytown/Harris County
Bear Creek
Katy - North
Bellaire
Braeswood Place
Knollwood/Woodside Area
Brays Oaks
Briargrove
Briargrove Park/Walnut Bend
Rivercrest
Westchase Area
Briarmeadow/Tanglewilde
Charnwood/Briarbend
Brookshire
Chambers County East
Champions Area
Clear Lake Area
Cleveland Area
Coldspring/South San Jacinto County
Conroe Northeast
Conroe Southeast
Conroe Southwest
Copperfield Area
Cottage Grove
Memorial Park
Rice Military/Washington Corridor
Washington East/Sabine
Crosby Area
Crystal Beach
Cypress North
Cypress South
Dayton
Deer Park
Denver Harbor
Dickinson
East End - Galveston
Midtown - Galveston
East End Revitalized
Eldridge North
Energy Corridor
Five Corners
Fort Bend County North/Richmond
Fort Bend Southeast
Friendswood
Fulshear/South Brookshire/Simonton
Galleria
Tanglewood Area
Garden Oaks
Nor

In [16]:
print(houston_venues.shape)
houston_venues.head()

(1313, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,1960/Cypress,29.927675,-95.60547,Petco,29.924462,-95.602744,Pet Store
1,1960/Cypress,29.927675,-95.60547,GameStop,29.923513,-95.604873,Video Game Store
2,1960/Cypress,29.927675,-95.60547,Massage Envy - FM 1960 Eldridge,29.924638,-95.60183,Spa
3,1960/Cypress,29.927675,-95.60547,SUBWAY,29.923205,-95.604962,Sandwich Place
4,1960/Cypress,29.927675,-95.60547,GNC,29.924037,-95.604463,Supplement Shop


In [17]:
houston_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1960/Cypress,12,12,12,12,12,12
Alief,7,7,7,7,7,7
Alvin North,11,11,11,11,11,11
Alvin South,11,11,11,11,11,11
Atascocita North,24,24,24,24,24,24
Atascocita South,4,4,4,4,4,4
Bacliff/San Leon,4,4,4,4,4,4
Baytown/Chambers County,9,9,9,9,9,9
Baytown/Harris County,5,5,5,5,5,5
Bear Creek,2,2,2,2,2,2


In [18]:
print('There are {} unique categories.'.format(len(houston_venues['Venue Category'].unique())))

There are 206 unique categories.


In [19]:
# Analyze each neighborhood

# one hot encoding
houston_onehot = pd.get_dummies(houston_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
houston_onehot['Neighborhood'] = houston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [houston_onehot.columns[-1]] + list(houston_onehot.columns[:-1])
houston_onehot = houston_onehot[fixed_columns]

houston_onehot.head()

Unnamed: 0,Neighborhood,ATM,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Assisted Living,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Casino,Chinese Restaurant,Clothing Store,Coffee Shop,College Football Field,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Event Service,Eye Doctor,Fabric Shop,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gas Station,Gastropub,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Medical Supply Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mongolian Restaurant,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Non-Profit,Noodle House,Office,Other Repair Shop,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pawn Shop,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Pool,Print Shop,Professional & Other Places,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Resort,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Tanning Salon,Taxi,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Trail,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waste Facility,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,1960/Cypress,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1960/Cypress,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
2,1960/Cypress,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1960/Cypress,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1960/Cypress,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [20]:
houston_onehot.shape

(1313, 207)

In [21]:
houston_grouped = houston_onehot.groupby('Neighborhood').mean().reset_index()
houston_grouped

Unnamed: 0,Neighborhood,ATM,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Assisted Living,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Garden,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Casino,Chinese Restaurant,Clothing Store,Coffee Shop,College Football Field,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Donut Shop,Electronics Store,Event Service,Eye Doctor,Fabric Shop,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden Center,Gas Station,Gastropub,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Herbs & Spices Store,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Medical Supply Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mongolian Restaurant,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Non-Profit,Noodle House,Office,Other Repair Shop,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pawn Shop,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Pool,Print Shop,Professional & Other Places,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Resort,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Tanning Salon,Taxi,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Trail,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waste Facility,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,1960/Cypress,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alief,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0
2,Alvin North,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alvin South,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Atascocita North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Atascocita South,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bacliff/San Leon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Baytown/Chambers County,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Baytown/Harris County,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bear Creek,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [22]:
houston_grouped.shape

(120, 207)

In [23]:
num_top_venues = 5

for hood in houston_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = houston_grouped[houston_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1960/Cypress----
                 venue  freq
0                  Spa  0.17
1      Supplement Shop  0.08
2    Mobile Phone Shop  0.08
3  Hawaiian Restaurant  0.08
4    Convenience Store  0.08


----Alief----
                   venue  freq
0           Noodle House  0.29
1            Pizza Place  0.14
2            Gas Station  0.14
3  Vietnamese Restaurant  0.14
4         Sandwich Place  0.14


----Alvin North----
                  venue  freq
0   American Restaurant  0.18
1  Fast Food Restaurant  0.18
2           Pizza Place  0.18
3         Grocery Store  0.09
4              Pharmacy  0.09


----Alvin South----
                  venue  freq
0   American Restaurant  0.18
1  Fast Food Restaurant  0.18
2           Pizza Place  0.18
3         Grocery Store  0.09
4              Pharmacy  0.09


----Atascocita North----
                  venue  freq
0           Pizza Place  0.08
1  Fast Food Restaurant  0.08
2      Department Store  0.08
3  Gym / Fitness Center  0.04
4            Shoe Stor

                venue  freq
0  Mexican Restaurant   1.0
1                 ATM   0.0
2       Movie Theater   0.0
3              Museum   0.0
4         Music Store   0.0


----Greenway Plaza----
                venue  freq
0         Coffee Shop  0.12
1  Seafood Restaurant  0.12
2                 Bar  0.06
3  Mexican Restaurant  0.06
4                Bank  0.06


----Gulfton----
                       venue  freq
0                 Taco Place  0.12
1  Latin American Restaurant  0.12
2        Fried Chicken Joint  0.12
3         Spanish Restaurant  0.06
4                 Nail Salon  0.06


----Heights/Greater Heights----
                  venue  freq
0           Art Gallery  0.11
1           Yoga Studio  0.05
2            Restaurant  0.05
3  Gym / Fitness Center  0.05
4      Sushi Restaurant  0.05


----Hempstead----
                  venue  freq
0         Grocery Store  0.33
1        Discount Store  0.33
2            Donut Shop  0.33
3                   ATM  0.00
4  Pakistani Restaurant  0.

               venue  freq
0         Playground   0.4
1          Pet Store   0.4
2     Scenic Lookout   0.2
3                ATM   0.0
4  Other Repair Shop   0.0


----Porter/New Caney West----
                   venue  freq
0           Home Service   1.0
1                    ATM   0.0
2  Outdoors & Recreation   0.0
3                 Museum   0.0
4            Music Store   0.0


----Rice Military/Washington Corridor----
                      venue  freq
0                       Bar  0.07
1  Mediterranean Restaurant  0.07
2                Taco Place  0.04
3               Wings Joint  0.04
4                Food Truck  0.04


----Rice/Museum District----
                   venue  freq
0                 Bakery  0.33
1  Outdoors & Recreation  0.33
2                   Food  0.33
3                    ATM  0.00
4            Music Store  0.00


----River Oaks Area----
               venue  freq
0                Pub  0.12
1        Pizza Place  0.08
2   Department Store  0.08
3  Health Food Store 

In [24]:
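def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent venue categories in a row."""
    # skip the first entry, which holds the neighborhood name rather than a frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]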
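
As a quick sanity check, the helper can be exercised on a single hand-made row. The sketch below repeats the function so it runs on its own; the `toy` frame and its venue columns are made up for illustration and are not taken from the Foursquare data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # drop the leading neighborhood name, then rank categories by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-row frame shaped like houston_grouped:
# a 'Neighborhood' column followed by one frequency column per venue category
toy = pd.DataFrame({
    'Neighborhood': ['Toy Hood'],
    'Bakery': [0.1],
    'Spa': [0.6],
    'Pub': [0.3],
})

top = return_most_common_venues(toy.iloc[0, :], 2)
print(list(top))
```

Categories with equal frequency may come back in either order, so the ranking among ties should not be read as meaningful.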

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the 3rd have no entry in indicators, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = houston_grouped['Neighborhood']

for ind in np.arange(houston_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(houston_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,Spa,Mobile Phone Shop,Supermarket,Sandwich Place,Hawaiian Restaurant,Baseball Field,Video Game Store,Supplement Shop,Convenience Store,Pet Store
1,Alief,Noodle House,Gas Station,Pizza Place,Vietnamese Restaurant,Massage Studio,Sandwich Place,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
2,Alvin North,Fast Food Restaurant,Pizza Place,American Restaurant,Pharmacy,Fried Chicken Joint,Grocery Store,Taco Place,Sandwich Place,Yoga Studio,Garden Center
3,Alvin South,Fast Food Restaurant,Pizza Place,American Restaurant,Pharmacy,Fried Chicken Joint,Grocery Store,Taco Place,Sandwich Place,Yoga Studio,Garden Center
4,Atascocita North,Department Store,Pizza Place,Fast Food Restaurant,Fried Chicken Joint,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant,Pet Store,Breakfast Spot,Shoe Store
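
The column names in the cell above come from a `try`/`except` over the three-item `indicators` list, which is fine for ten venues but would label a 21st column "21th". If more columns were ever needed, a small helper along these lines could build the names instead. This is only a sketch; `ordinal` is a hypothetical name, not something the notebook defines.

```python
def ordinal(n):
    """Format a positive integer with its English ordinal suffix (1st, 2nd, 11th, 22nd)."""
    if 10 <= n % 100 <= 20:
        # 10th through 20th (and 110th through 120th, ...) always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```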


Cluster Neighborhoods

In [26]:
# set number of clusters
kclusters = 5

# drop the name column so only the venue frequencies are used as features
houston_grouped_clustering = houston_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(houston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
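
The first ten labels are all 0, and the cluster tables further down list only two or three neighborhoods for clusters 2 to 5, so the clusters look very uneven in size. Counting the labels makes that explicit. The sketch below uses a made-up `labels` array in place of `kmeans.labels_`, and `cluster_sizes` is a hypothetical helper name.

```python
import numpy as np

def cluster_sizes(labels, n_clusters):
    """Count how many rows were assigned to each cluster label 0..n_clusters-1."""
    return np.bincount(np.asarray(labels), minlength=n_clusters)

# hypothetical labels standing in for kmeans.labels_
labels = np.array([0, 0, 0, 1, 0, 2, 0, 0, 3, 4])
sizes = cluster_sizes(labels, 5)

for cluster, size in enumerate(sizes):
    print('cluster {}: {} neighborhoods'.format(cluster, size))
```

In the notebook itself the equivalent call would be `cluster_sizes(kmeans.labels_, kclusters)`.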

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

houston_merged = houston_data

# merge neighborhoods_venues_sorted with houston_data to add latitude/longitude for each neighborhood
houston_merged = houston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')



# check the last columns!
houston_merged.head()

Unnamed: 0,Neighborhood,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,77065,29.927675,-95.60547,0.0,Spa,Mobile Phone Shop,Supermarket,Sandwich Place,Hawaiian Restaurant,Baseball Field,Video Game Store,Supplement Shop,Convenience Store,Pet Store
1,Aldine Area,77039,29.909123,-95.33683,,,,,,,,,,,
2,Alief,77072,29.700898,-95.59002,0.0,Noodle House,Gas Station,Pizza Place,Vietnamese Restaurant,Massage Studio,Sandwich Place,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
3,Alvin North,77511,29.41148,-95.24475,0.0,Fast Food Restaurant,Pizza Place,American Restaurant,Pharmacy,Fried Chicken Joint,Grocery Store,Taco Place,Sandwich Place,Yoga Studio,Garden Center
4,Alvin South,77511,29.41148,-95.24475,0.0,Fast Food Restaurant,Pizza Place,American Restaurant,Pharmacy,Fried Chicken Joint,Grocery Store,Taco Place,Sandwich Place,Yoga Studio,Garden Center
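
Row 1 (Aldine Area) has no cluster label or venues. `DataFrame.join` performs a left join by default, so any neighborhood in `houston_data` that is missing from `neighborhoods_venues_sorted` ends up with NaN, and the NaN in turn forces the integer labels to floats (0.0 rather than 0). The sketch below reproduces this on two made-up frames and shows one way to list the unlabeled rows and restore integer labels; the frame contents and the variable names are assumptions for illustration.

```python
import pandas as pd

# hypothetical stand-ins for houston_data and neighborhoods_venues_sorted
data = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'], 'Zip': [1, 2, 3]})
venues_sorted = pd.DataFrame({'Cluster Labels': [0, 1], 'Neighborhood': ['A', 'C']})

# left join: 'B' has no match, so its cluster label becomes NaN
merged = data.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)

# neighborhoods that did not receive a cluster label
missing = merged.loc[merged['Cluster Labels'].isna(), 'Neighborhood'].tolist()
print(missing)

# keep only labeled rows and convert the labels back to integers
labeled = merged.dropna(subset=['Cluster Labels']).copy()
labeled['Cluster Labels'] = labeled['Cluster Labels'].astype(int)
print(labeled)
```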


In [28]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster
for lat, lon, poi, cluster in zip(houston_merged['Latitude'], houston_merged['Longitude'], houston_merged['Neighborhood'], houston_merged['Cluster Labels']):
    # neighborhoods missing from houston_grouped have no cluster label (NaN) after the join; skip them
    if pd.isna(cluster):
        continue
    cluster = int(cluster)  # labels became floats because of the NaN rows
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Analyze Clusters

## Cluster 1

In [29]:
houston_merged.loc[houston_merged['Cluster Labels'] == 0, houston_merged.columns[[0] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1960/Cypress,Spa,Mobile Phone Shop,Supermarket,Sandwich Place,Hawaiian Restaurant,Baseball Field,Video Game Store,Supplement Shop,Convenience Store,Pet Store
2,Alief,Noodle House,Gas Station,Pizza Place,Vietnamese Restaurant,Massage Studio,Sandwich Place,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
3,Alvin North,Fast Food Restaurant,Pizza Place,American Restaurant,Pharmacy,Fried Chicken Joint,Grocery Store,Taco Place,Sandwich Place,Yoga Studio,Garden Center
4,Alvin South,Fast Food Restaurant,Pizza Place,American Restaurant,Pharmacy,Fried Chicken Joint,Grocery Store,Taco Place,Sandwich Place,Yoga Studio,Garden Center
5,Atascocita North,Department Store,Pizza Place,Fast Food Restaurant,Fried Chicken Joint,Mobile Phone Shop,Miscellaneous Shop,Mexican Restaurant,Pet Store,Breakfast Spot,Shoe Store
6,Atascocita South,Breakfast Spot,Electronics Store,Moving Target,Mobile Phone Shop,Dive Bar,Donut Shop,General Travel,General Entertainment,Diner,Gastropub
7,Fall Creek Area,Breakfast Spot,Electronics Store,Moving Target,Mobile Phone Shop,Dive Bar,Donut Shop,General Travel,General Entertainment,Diner,Gastropub
8,Bacliff/San Leon,Thrift / Vintage Store,Dessert Shop,Mexican Restaurant,Casino,Yoga Studio,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop
12,Baytown/Chambers County,Mexican Restaurant,Fast Food Restaurant,Mobile Phone Shop,American Restaurant,Hookah Bar,Electronics Store,Grocery Store,Taco Place,Food,Gas Station
13,Baytown/Harris County,Pawn Shop,Sandwich Place,Bar,Sports Bar,Bakery,Yoga Studio,Flea Market,Gastropub,Gas Station,Garden Center


## Cluster 2

In [30]:
houston_merged.loc[houston_merged['Cluster Labels'] == 1, houston_merged.columns[[0] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
103,Near West End - Galveston,Vacation Rental,Yoga Studio,Fast Food Restaurant,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
104,Tiki Island,Vacation Rental,Yoga Studio,Fast Food Restaurant,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
105,West End - Galveston,Vacation Rental,Yoga Studio,Fast Food Restaurant,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck


## Cluster 3

In [31]:
houston_merged.loc[houston_merged['Cluster Labels'] == 2, houston_merged.columns[[0] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
93,Memorial Close In,Clothing Store,Deli / Bodega,General Entertainment,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
94,Memorial Villages,Clothing Store,Deli / Bodega,General Entertainment,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck


## Cluster 4

In [32]:
houston_merged.loc[houston_merged['Cluster Labels'] == 3, houston_merged.columns[[0] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
107,Northside,Moving Target,Yoga Studio,Flea Market,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
108,Oak Forest West Area,Moving Target,Yoga Studio,Flea Market,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck
139,Waller,Moving Target,Yoga Studio,Flea Market,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,Food Truck


## Cluster 5

In [33]:
houston_merged.loc[houston_merged['Cluster Labels'] == 4, houston_merged.columns[[0] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
77,Katy - Old Towne,Construction & Landscaping,Home Service,Yoga Studio,Fast Food Restaurant,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
80,Kingwood East,Construction & Landscaping,Yoga Studio,Flea Market,General Entertainment,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint
116,Porter/New Caney West,Home Service,Yoga Studio,Flea Market,General Entertainment,Gastropub,Gas Station,Garden Center,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint


In [34]:
houston_merged.to_csv('houston.csv')
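
By default `to_csv` also writes the dataframe index as a first column with an empty header. If the file is read back later with a plain `pd.read_csv`, that column appears as `Unnamed: 0`. The sketch below shows the round trip on a small made-up frame, using an in-memory buffer in place of `houston.csv`; passing `index_col=0` on read is one way to restore the original layout.

```python
import io
import pandas as pd

# hypothetical frame standing in for houston_merged
df = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster Labels': [0.0, None]})

# write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf)

# a plain read turns the saved index into an 'Unnamed: 0' column
buf.seek(0)
naive = pd.read_csv(buf)
print(list(naive.columns))

# treating the first column as the index restores the original columns
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(list(restored.columns))
```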