# Battle of the Neighborhoods! 
## Houston, Texas Edition
***

## Description of the problem:
I am going to analyze neighborhoods in Houston, Texas, and find the hottest areas for socializing and great food.
Houston is the largest city in Texas, so I will limit the analysis to neighborhoods within 15 km of Downtown Houston, TX.

Interested parties for this analysis would be people looking to move into the area, or just to explore Houston's finest!
As one of the most diverse cities in Texas, Houston is a mecca for exploring and experiencing different cultures through their eateries.
***

## The data used, and how we will use it to solve our problem
Using Google and our GitHub community, we can locate a web page of Houston real estate data that lists neighborhoods, ZIP codes, and housing information.

Once we scrape the table, we can clean the data and merge it with a master USA ZIP code/latitude/longitude CSV found on GitHub.

After a quick map with Folium, we can use Foursquare to gather venue data for our locations of choice: neighborhoods within 15 km of Downtown. That keeps every venue a quick Uber ride from one central location!

In [1]:
# import dependencies 


import pandas as pd # library to process data as dataframes
import requests
from bs4 import BeautifulSoup
import numpy as np

## Webscrape Neighborhoods and Postal Codes in Houston

In [2]:
# define a variable with the URL of the website I used
html_data = 'https://www.houstoniamag.com/home-and-real-estate/2017/03/neighborhoods-by-the-numbers-real-estate-data-2017'

Parse the HTML data using `BeautifulSoup`

In [3]:
data = requests.get(html_data).text

# creates a BeautifulSoup object
soup = BeautifulSoup(data, 'html5lib')

What is the content of the title tag?

In [4]:
tag_object = soup.title
print("the tag object is", tag_object)

the tag object is <title>Neighborhoods by the Numbers 2017 | Houstonia Magazine</title>


How many tables are on this webpage?

In [5]:
tables = soup.find_all('table')
print('there are', len(tables) ,'tables on the webpage')

there are 4 tables on the webpage


We can see the table we want is the first one on the webpage.

In [6]:
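from io import StringIO

# wrap the HTML in StringIO; newer pandas versions deprecate passing literal HTML strings
houston_data_read = pd.read_html(StringIO(str(tables[0])), flavor='bs4')[0]

# replace NaN values with 0
houston_data_read.fillna(0, inplace = True)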

houston_data_read

Unnamed: 0.1,Unnamed: 0,ZIP Code,2016 Median Home Price,% Growth 2010-2016,% Growth 2015-2016,Avg. Days on Market in 2016,% Owner Occupied
0,1960/Cypress,77065,"$179,000",45.50%,8.50%,32.0,47%
1,Aldine Area,77039,"$133,500",57.10%,7.70%,35.1,61%
2,Alief,77072,"$164,000",80.20%,14.70%,31.3,47%
3,Alvin North,77511,"$227,000",43.40%,5.60%,58.3,71%
4,Alvin South,77511,"$163,900",46.30%,6.20%,35.1,71%
...,...,...,...,...,...,...,...
142,West University/Southside Area,77005,"$1,192,000",55.80%,-2.60%,49.6,73%
143,Westchase Area,77042,"$605,750",-3.10%,-14.00%,92.2,28%
144,Willis/New Waverly,77318/77378,"$146,250",68.10%,4.50%,49.5,74%
145,Willow Meadows Area,77035,"$310,000",41.60%,-1.70%,52.9,43%


Drop unnecessary columns

In [7]:
# drop unnecessary columns
houston_data_read = houston_data_read.drop(columns = 
                                           ['% Growth 2010-2016', 
                                            '% Growth 2015-2016', 
                                            'Avg. Days on Market in 2016', 
                                            '% Owner Occupied'] )

# rename columns

houston_data_read = houston_data_read.rename(columns = {'Unnamed: 0': 'Neighborhood', 'ZIP Code': 'ZIP'})


In [8]:
# one row (Willis/New Waverly) lists two ZIP codes, '77318/77378', which can't be converted to a number, so keep only the first listed ZIP
houston_data_read.at[144, 'ZIP'] = 77318
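Hard-coding row 144 works for this snapshot of the page, but would silently target the wrong row if the table changed. A more general sketch, assuming every multi-ZIP cell separates its codes with `/` the way `77318/77378` does, keeps the first code in each row; the two-row frame is made up for illustration:

```python
import pandas as pd

# illustrative sample; the real table comes from the scraped page
df = pd.DataFrame({
    'Neighborhood': ['Alief', 'Willis/New Waverly'],
    'ZIP': ['77072', '77318/77378'],
})

# keep only the first ZIP when a cell lists several codes separated by '/'
df['ZIP'] = df['ZIP'].astype(str).str.split('/').str[0].astype(int)

print(df)
```

Applied to `houston_data_read`, this would make the `.at[144, 'ZIP']` assignment unnecessary.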

Check the data

In [9]:
houston_data_read.head(150)

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price
0,1960/Cypress,77065,"$179,000"
1,Aldine Area,77039,"$133,500"
2,Alief,77072,"$164,000"
3,Alvin North,77511,"$227,000"
4,Alvin South,77511,"$163,900"
...,...,...,...
142,West University/Southside Area,77005,"$1,192,000"
143,Westchase Area,77042,"$605,750"
144,Willis/New Waverly,77318,"$146,250"
145,Willow Meadows Area,77035,"$310,000"


In [10]:
# convert ZIP to int (via float) so it matches the numeric ZIP column in the coordinates CSV when merging

houston_data_read['ZIP'] = houston_data_read['ZIP'].astype(float).astype(int)

In [11]:
print('the size of the Houston Zip Code dataframe is' , houston_data_read.shape)

the size of the Houston Zip Code dataframe is (147, 3)


## Loading Geospatial Data and Merging the Dataframes

In [12]:
# load the US ZIP code latitude/longitude dataset (a local copy of the CSV from GitHub) into a pandas dataframe; adjust the path to your own copy

coordinates = pd.read_csv(r'C:\Users\Jacob Smith\Data_Science_Workbooks\US_Zip_Codes_lat_long.csv')

coordinates.head()

Unnamed: 0,ZIP,LAT,LNG
0,601,18.180555,-66.749961
1,602,18.361945,-67.175597
2,603,18.455183,-67.119887
3,606,18.158345,-66.932911
4,610,18.295366,-67.125135


Merge the `houston_data_read` dataframe and the `coordinates` dataframe based on their shared column, `ZIP`

In [13]:
houston_df = pd.merge(houston_data_read, coordinates, on = 'ZIP')

# check that it worked
houston_df.tail(25)

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG
122,Sharpstown Area,77036,"$180,000",29.701847,-95.534537
123,South Houston,77587,"$105,000",29.661032,-95.229784
124,Southbelt/Ellington,77034,"$167,000",29.61951,-95.191644
125,Spring Branch,77055,"$325,000",29.796871,-95.49165
126,Spring East,77373,"$149,700",30.062169,-95.383966
127,Spring Northeast,77386,"$238,000",30.100255,-95.356479
128,Spring/Klein,77388,"$207,500",30.057802,-95.470985
129,Spring/Klein/Tomball,77375,"$215,000",30.094886,-95.58583
130,Tomball,77375,"$241,000",30.094886,-95.58583
131,Stafford Area,77477,"$165,000",29.624261,-95.568033
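An inner merge silently drops any neighborhood whose ZIP is missing from the coordinates file. Here the row count stayed at 147 before and after merging, so nothing was lost, but a quick check is cheap. A sketch with made-up frames, where 99999 is a deliberately unknown ZIP:

```python
import pandas as pd

# illustrative frames; 99999 stands in for a ZIP missing from the coordinates file
houston = pd.DataFrame({'Neighborhood': ['Alief', 'Made-up Area'], 'ZIP': [77072, 99999]})
coords = pd.DataFrame({'ZIP': [77072], 'LAT': [29.699688], 'LNG': [-95.584817]})

# a left merge keeps every neighborhood; indicator=True adds a '_merge' column,
# and validate='many_to_one' requires each ZIP to appear only once in coords
check = pd.merge(houston, coords, on='ZIP', how='left',
                 indicator=True, validate='many_to_one')

# rows marked 'left_only' found no coordinates and would be dropped by an inner merge
unmatched = check.loc[check['_merge'] == 'left_only', ['Neighborhood', 'ZIP']]
print(unmatched)
```

If the coordinates file ever listed the same ZIP twice, `validate='many_to_one'` would raise a `MergeError` instead of quietly duplicating neighborhoods.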


In [14]:
# combine LAT and LNG into one 'lat , lng' string, used later to compute distances with geopy

houston_df['LAT_LNG'] = houston_df['LAT'].map(str) +' , '+ houston_df['LNG'].map(str)
houston_df.tail(5)

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG,LAT_LNG
142,Waller,77484,"$195,000",30.079431,-95.932255,"30.079431 , -95.932255"
143,Webster,77598,"$315,000",29.539422,-95.134995,"29.539422 , -95.13499499999999"
144,Willis/New Waverly,77318,"$146,250",30.438807,-95.533229,"30.438807 , -95.533229"
145,Willow Meadows Area,77035,"$310,000",29.655503,-95.471663,"29.655503000000003 , -95.471663"
146,Willowbrook,77064,"$153,500",29.918045,-95.535685,"29.918045000000003 , -95.535685"
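Converting floats with `str` exposes binary floating-point noise such as `29.926472999999998`. The coordinates in the CSV appear to carry at most six decimals, so formatting them explicitly keeps `LAT_LNG` tidy. A one-row sketch:

```python
import pandas as pd

df = pd.DataFrame({'LAT': [29.926472999999998], 'LNG': [-95.60379]})

# format each coordinate with 6 decimals instead of relying on str(float)
df['LAT_LNG'] = df['LAT'].map('{:.6f}'.format) + ' , ' + df['LNG'].map('{:.6f}'.format)
print(df['LAT_LNG'].iloc[0])
```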


In [15]:
# check the size of the dataframe
houston_df.shape

(147, 6)

## Explore and Cluster the Neighborhoods in Houston

In [16]:
# import dependencies

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [17]:
address = 'Houston, TX'

# the 4 commented-out lines below didn't return coordinates exactly at downtown,
# so a quick Google search gave me the correct ones instead

#geolocator = Nominatim(user_agent="Houston_explorer")
#location = geolocator.geocode(address)
#latitude = location.latitude
#longitude = location.longitude

# coordinates for downtown Houston from a Google search
latitude = 29.7559698
longitude = -95.3573194

print('The geographical coordinates of Houston, TX are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Houston, TX are 29.7559698, -95.3573194.


In [18]:
# create map of Houston using latitude and longitude values

houston_map = folium.Map(location=[latitude, longitude], zoom_start=9)
label = 'Downtown Houston'
folium.CircleMarker(
        [latitude, longitude],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(houston_map)  
    

houston_map

In [19]:
# add markers to map
for Zip, Lat, Lng, Neighborhood in zip(houston_df['ZIP'], houston_df['LAT'], houston_df['LNG'], houston_df['Neighborhood']):
    label = '{}  ({})'.format(Neighborhood, Zip)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Lat, Lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(houston_map)  
    
houston_map    

## Select Data Points within 15 km of Downtown Houston

In [20]:
from math import sin, cos, sqrt, atan2, radians

# approximate radius of earth in km
R = 6373.0

# Downtown Houston lat and long
dt_lat = radians(29.7559698)
dt_lon = radians(-95.3573194)

# test with a sample point south of downtown
lat2 = radians(29.698430)
lon2 = radians(-95.356900)

dlon = lon2 - dt_lon
dlat = lat2 - dt_lat

a = sin(dlat / 2)**2 + cos(dt_lat) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

distance = R * c

print("Result:", str(round(distance, 2)), 'km from Downtown Houston')


Result: 6.4 km from Downtown Houston
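The cell above checks a single point. The same haversine formula can be applied to whole columns at once with NumPy, which avoids the square matrix built in the next cell. A minimal sketch, assuming the spherical-Earth approximation (R = 6373 km) is accurate enough for a 15 km cutoff; results will differ slightly from geopy's ellipsoidal distances. The helper name `haversine_km` and the constants are illustrative:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6373.0  # same approximate radius as the cell above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy/pandas arrays in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

# two sample rows taken from the dataframe above
df = pd.DataFrame({
    'Neighborhood': ['East End Revitalized', '1960/Cypress'],
    'LAT': [29.749778, 29.926473],
    'LNG': [-95.345885, -95.60379],
})

DT_LAT, DT_LNG = 29.7559698, -95.3573194  # downtown Houston

# one vectorized call computes every row's distance to downtown
df['Distance from Downtown'] = haversine_km(df['LAT'], df['LNG'], DT_LAT, DT_LNG).round(2)

# boolean mask keeps only rows closer than 15 km
within_15 = df[df['Distance from Downtown'] < 15]
print(df)
```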


In [21]:
import geopy.distance

# downtown Houston coordinates
latitude = 29.7559698
longitude = -95.3573194

# create a square N x N DataFrame (think of it as a matrix); only its rows matter,
# because get_distance ignores its argument and returns every neighborhood's distance
square = pd.DataFrame(
    np.zeros(len(houston_df) ** 2).reshape(len(houston_df), len(houston_df)),
    index=houston_df.index, columns=houston_df.index)

# geodesic distance on the WGS-84 ellipsoid; geopy.distance.distance replaces
# distance.vincenty, which was removed in geopy 2.0
def get_distance(col):
    end = latitude, longitude
    return houston_df['LAT_LNG'].apply(geopy.distance.distance,
                                       args=(end,),
                                       ellipsoid='WGS-84')

distances = square.apply(get_distance, axis=1).T

# every column now holds the same distances, so keep the first one as a Series
distances = distances[0]
distances.head()

0    30.406465720630806 km
1    17.274546967486714 km
2     22.87714014491653 km
3     43.05835380454846 km
4     43.05835380454846 km
Name: 0, dtype: object

In [22]:
# merge the data to our main dataframe, `houston_df` 
houston_df = houston_df.merge(distances, how = 'outer', left_index=True, right_index=True)

In [23]:
houston_df.head()

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG,LAT_LNG,0
0,1960/Cypress,77065,"$179,000",29.926473,-95.60379,"29.926472999999998 , -95.60379",30.406465720630806 km
1,Aldine Area,77039,"$133,500",29.911171,-95.341182,"29.911171000000003 , -95.34118199999999",17.274546967486714 km
2,Alief,77072,"$164,000",29.699688,-95.584817,"29.699688000000002 , -95.584817",22.87714014491653 km
3,Alvin North,77511,"$227,000",29.380858,-95.241857,"29.380858 , -95.241857",43.05835380454846 km
4,Alvin South,77511,"$163,900",29.380858,-95.241857,"29.380858 , -95.241857",43.05835380454846 km


In [24]:
# rename the column
houston_df.rename(columns = {0: 'Distance from Downtown'}, inplace = True)

In [25]:
# what are the datatypes?
houston_df.head().dtypes

Neighborhood               object
ZIP                         int32
2016 Median Home Price     object
LAT                       float64
LNG                       float64
LAT_LNG                    object
Distance from Downtown     object
dtype: object

In [26]:
# the distances are geopy Distance objects whose string form ends in ' km';
# convert to string, strip those last 3 characters, then convert to float rounded to 2 decimals
houston_df['Distance from Downtown'] = houston_df['Distance from Downtown'].astype(str).str[:-3].astype(float).round(decimals = 2)

# check the data
houston_df.head()

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG,LAT_LNG,Distance from Downtown
0,1960/Cypress,77065,"$179,000",29.926473,-95.60379,"29.926472999999998 , -95.60379",30.41
1,Aldine Area,77039,"$133,500",29.911171,-95.341182,"29.911171000000003 , -95.34118199999999",17.27
2,Alief,77072,"$164,000",29.699688,-95.584817,"29.699688000000002 , -95.584817",22.88
3,Alvin North,77511,"$227,000",29.380858,-95.241857,"29.380858 , -95.241857",43.06
4,Alvin South,77511,"$163,900",29.380858,-95.241857,"29.380858 , -95.241857",43.06


In [27]:
# add new column that specifies if the location is within 15 km of Downtown Houston
houston_df['Within 15 km of Downtown'] = houston_df['Distance from Downtown'] < 15

In [28]:
houston_df.head(25)

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG,LAT_LNG,Distance from Downtown,Within 15 km of Downtown
0,1960/Cypress,77065,"$179,000",29.926473,-95.60379,"29.926472999999998 , -95.60379",30.41,False
1,Aldine Area,77039,"$133,500",29.911171,-95.341182,"29.911171000000003 , -95.34118199999999",17.27,False
2,Alief,77072,"$164,000",29.699688,-95.584817,"29.699688000000002 , -95.584817",22.88,False
3,Alvin North,77511,"$227,000",29.380858,-95.241857,"29.380858 , -95.241857",43.06,False
4,Alvin South,77511,"$163,900",29.380858,-95.241857,"29.380858 , -95.241857",43.06,False
5,Atascocita North,77346,"$189,900",29.994499,-95.177499,"29.994498999999998 , -95.177499",31.64,False
6,Atascocita South,77396,"$199,000",29.945205,-95.259778,"29.945204999999998 , -95.259778",23.0,False
7,Fall Creek Area,77396,"$302,000",29.945205,-95.259778,"29.945204999999998 , -95.259778",23.0,False
8,Bacliff/San Leon,77518,"$165,941",29.507162,-94.987247,"29.507162 , -94.987247",45.22,False
9,Bayou Vista,77563,"$240,000",29.303629,-95.032416,"29.303628999999997 , -95.032416",59.21,False


In [29]:
# index of rows that are 15 km or more from downtown
index = houston_df[~houston_df['Within 15 km of Downtown']].index

# drop those rows, keeping only locations within 15 km of Downtown Houston
houston_df.drop(index, inplace = True)

In [30]:
# reset the index for our newly refined dataframe
houston_df.reset_index(inplace = True, drop = True)
houston_df

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG,LAT_LNG,Distance from Downtown,Within 15 km of Downtown
0,Bellaire,77401,"$933,000",29.704019,-95.460905,"29.704019 , -95.46090500000001",11.56,True
1,Braeswood Place,77025,"$715,000",29.685706,-95.434764,"29.685706 , -95.434764",10.81,True
2,Knollwood/Woodside Area,77025,"$430,000",29.685706,-95.434764,"29.685706 , -95.434764",10.81,True
3,Briargrove,77057,"$824,000",29.744081,-95.487974,"29.744081 , -95.487974",12.71,True
4,Cottage Grove,77007,"$392,500",29.771545,-95.411083,"29.771545 , -95.41108299999999",5.48,True
5,Memorial Park,77007,"$1,130,000",29.771545,-95.411083,"29.771545 , -95.41108299999999",5.48,True
6,Rice Military/Washington Corridor,77007,"$449,900",29.771545,-95.411083,"29.771545 , -95.41108299999999",5.48,True
7,Washington East/Sabine,77007,"$409,000",29.771545,-95.411083,"29.771545 , -95.41108299999999",5.48,True
8,Denver Harbor,77020,"$105,000",29.773179,-95.314327,"29.773179 , -95.314327",4.57,True
9,East End Revitalized,77003,"$253,000",29.749778,-95.345885,"29.749778000000003 , -95.345885",1.3,True


In [31]:
# create another map of Houston using latitude and longitude values

# coordinates for downtown Houston from a Google search
dt_latitude = 29.7559698
dt_longitude = -95.3573194

dt_houston_map = folium.Map(location=[dt_latitude, dt_longitude], zoom_start=11)
label = 'Downtown Houston'
folium.CircleMarker(
        [dt_latitude, dt_longitude],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(dt_houston_map)  
    

dt_houston_map

## The map below is our final visualization of the neighborhoods we will analyze

In [32]:
# add markers to map
for Zip, Lat, Lng, Neighborhood, Distance in zip(houston_df['ZIP'],
                                                houston_df['LAT'],
                                                houston_df['LNG'],
                                                houston_df['Neighborhood'],
                                                houston_df['Distance from Downtown']):
    
    label = '{} {} km ({})'.format(Neighborhood, Distance, Zip)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Lat, Lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(dt_houston_map)  
    
dt_houston_map    

### We will define our Foursquare credentials and API version

In [33]:
# @hidden_cell
# replace the placeholders with your own Foursquare developer credentials; never publish real keys
CLIENT_ID = 'YOUR_CLIENT_ID' # Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # Foursquare Secret
VERSION = '20210621' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


#### Let's explore a single neighborhood in the dataframe

Get the first neighborhood name 

In [34]:
houston_df.loc[0, 'Neighborhood']

'Bellaire'

Get the neighborhood's lat and long values

In [35]:
neighborhood_latitude = houston_df.loc[0, 'LAT'] # neighborhood latitude value
neighborhood_longitude = houston_df.loc[0, 'LNG'] # neighborhood longitude value

neighborhood_name = houston_df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bellaire are 29.704019, -95.46090500000001.


#### Now we can grab the top 100 venues within 1000 m of the Bellaire neighborhood

Let's first create the `GET` request `URL`

In [36]:
LIMIT = 100 # limit of number of venues returned by Foursquare API to the top 100

radius = 1000 # define radius in meters

explore_url_prefix = 'https://api.foursquare.com/v2/venues/explore'

url = '{}?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    explore_url_prefix,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20210621&ll=29.704019,-95.46090500000001&radius=1000&limit=100'
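Building the query string by hand works, but the standard library can assemble and percent-encode it from a dictionary. A sketch using `urllib.parse.urlencode`, with placeholder credentials standing in for real ones:

```python
from urllib.parse import urlencode

# placeholder credentials; substitute your own
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20210621',
    'll': '{},{}'.format(29.704019, -95.460905),
    'radius': 1000,
    'limit': 100,
}

# urlencode escapes each value (the comma in 'll' becomes %2C)
url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
print(url)
```

Passing the same dictionary as `requests.get(explore_url_prefix, params=params)` lets requests build the query string itself. The `%2C` is standard percent-encoding, which the server decodes back to a comma.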

In [37]:
# request the top 100 venues and parse the JSON response
results = requests.get(url).json()

results

{'meta': {'code': 200, 'requestId': '60da1cb8e87a100d582d0371'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Southwest Houston',
  'headerFullLocation': 'Southwest Houston, Houston',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 34,
  'suggestedBounds': {'ne': {'lat': 29.713019009000007,
    'lng': -95.45056279590374},
   'sw': {'lat': 29.69501899099999, 'lng': -95.47124720409629}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e4d8f5e22711b81ba75e46b',
       'name': 'CVS pharmacy',
       'location': {'address': 'west bellfort',
        'crossStreet': 'dairy ashford',
        'lat': 29.70537080522144,
        'lng'

In [38]:
# function that extracts the first category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

We're ready to clean the JSON response and structure it into a _pandas_ dataframe

In [39]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(15)



Unnamed: 0,name,categories,lat,lng
0,CVS pharmacy,Pharmacy,29.705371,-95.459175
1,Mod Pizza,Pizza Place,29.705214,-95.469172
2,Menchie's,Frozen Yogurt Shop,29.706825,-95.4699
3,Charlie's BBQ & Hamburgers,Restaurant,29.709577,-95.462957
4,Jersey Mike's Subs,Sandwich Place,29.706619,-95.468529
5,Starbucks,Coffee Shop,29.707271,-95.468296
6,Lemongrass Cafe,Asian Restaurant,29.705291,-95.468277
7,Costa Brava Bistro,Spanish Restaurant,29.70537,-95.468397
8,Wells Fargo,Bank,29.706213,-95.46837
9,Jimmy John's,Sandwich Place,29.70743,-95.468624
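`pd.json_normalize` flattens each nested item into dotted columns such as `venue.location.lat`, while list values such as `venue.categories` stay as Python lists, which is why `get_category_type` is needed. A small sketch on made-up items shaped like the response above (not real Foursquare data), including a venue with no category; `first_category` is an illustrative helper:

```python
import pandas as pd

# made-up items shaped like the 'items' entries in the explore response
items = [
    {'venue': {'name': 'CVS pharmacy',
               'categories': [{'name': 'Pharmacy'}],
               'location': {'lat': 29.705371, 'lng': -95.459175}}},
    {'venue': {'name': 'Unnamed spot',
               'categories': [],
               'location': {'lat': 29.70, 'lng': -95.46}}},
]

# nested dict keys become dotted column names such as 'venue.location.lat'
flat = pd.json_normalize(items)

def first_category(categories):
    """Return the first category name, or None when the list is empty."""
    return categories[0]['name'] if categories else None

flat['category'] = flat['venue.categories'].apply(first_category)
print(flat[['venue.name', 'category', 'venue.location.lat', 'venue.location.lng']])
```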


In [40]:
print('{} venues were returned by Foursquare that are in a {}m radius of the {} neighborhood.'.format(nearby_venues.shape[0], radius, neighborhood_name))

34 venues were returned by Foursquare that are in a 1000m radius of the Bellaire neighborhood.


## Explore Neighborhoods in Houston

Let's create a function to repeat this process for all of the neighborhoods in our dataframe

In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Query the Foursquare explore endpoint around each neighborhood
    and return one row per venue found."""
    explore_url_prefix = 'https://api.foursquare.com/v2/venues/explore'

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Retrieving venues for', name)

        # create the API request URL using the radius and limit arguments
        url = '{}?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            explore_url_prefix,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant fields for each nearby venue;
        # venues without a category get None instead of raising an IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues

In [42]:
houston_venues = getNearbyVenues(names=houston_df['Neighborhood'],
                                   latitudes=houston_df['LAT'],
                                   longitudes=houston_df['LNG']
                                  )

Retrieving venues for Bellaire
Retrieving venues for Braeswood Place
Retrieving venues for Knollwood/Woodside Area
Retrieving venues for Briargrove
Retrieving venues for Cottage Grove
Retrieving venues for Memorial Park
Retrieving venues for Rice Military/Washington Corridor
Retrieving venues for Washington East/Sabine
Retrieving venues for Denver Harbor
Retrieving venues for East End Revitalized
Retrieving venues for Galleria
Retrieving venues for Tanglewood Area
Retrieving venues for Greenway Plaza
Retrieving venues for Gulfton
Retrieving venues for Heights/Greater Heights
Retrieving venues for Timbergrove/Lazybrook
Retrieving venues for Highland Village/Midlane
Retrieving venues for Royden Oaks/Afton Oaks
Retrieving venues for Hobby Area
Retrieving venues for Medical Center Area
Retrieving venues for Medical Center South
Retrieving venues for Meyerland Area
Retrieving venues for Midtown-Houston
Retrieving venues for Riverside
Retrieving venues for Montrose
Retrieving venues for Nort

In [43]:
print(houston_venues.shape)
houston_venues.head(15)

(687, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bellaire,29.704019,-95.460905,CVS pharmacy,29.705371,-95.459175,Pharmacy
1,Braeswood Place,29.685706,-95.434764,Linkwood Park,29.6845,-95.436425,Park
2,Braeswood Place,29.685706,-95.434764,Edwards Theater,29.68274,-95.434295,Multiplex
3,Braeswood Place,29.685706,-95.434764,The House That Drinks Built,29.683983,-95.439151,Cocktail Bar
4,Braeswood Place,29.685706,-95.434764,The Bayou,29.689585,-95.436823,Park
5,Knollwood/Woodside Area,29.685706,-95.434764,Linkwood Park,29.6845,-95.436425,Park
6,Knollwood/Woodside Area,29.685706,-95.434764,Edwards Theater,29.68274,-95.434295,Multiplex
7,Knollwood/Woodside Area,29.685706,-95.434764,The House That Drinks Built,29.683983,-95.439151,Cocktail Bar
8,Knollwood/Woodside Area,29.685706,-95.434764,The Bayou,29.689585,-95.436823,Park
9,Briargrove,29.744081,-95.487974,Rice Epicurean,29.746557,-95.485899,Fruit & Vegetable Store


Let's check to see how many venues were returned for each of the neighborhoods

In [44]:
houston_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bellaire,1,1,1,1,1,1
Braeswood Place,4,4,4,4,4,4
Briargrove,6,6,6,6,6,6
Cottage Grove,64,64,64,64,64,64
Denver Harbor,14,14,14,14,14,14
East End Revitalized,10,10,10,10,10,10
Galleria,9,9,9,9,9,9
Greenway Plaza,16,16,16,16,16,16
Heights/Greater Heights,7,7,7,7,7,7
Highland Village/Midlane,45,45,45,45,45,45
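`count()` repeats the same number in every column. `groupby(...).size()` returns one count per neighborhood, which is easier to sort and read. A sketch on a few rows taken from the output above:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Bellaire', 'Braeswood Place', 'Braeswood Place'],
    'Venue': ['CVS pharmacy', 'Linkwood Park', 'Edwards Theater'],
})

# size() returns a single count per group instead of one per column
venue_counts = venues.groupby('Neighborhood').size().sort_values(ascending=False)
print(venue_counts)
```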


How many unique categories are there across all of the returned venues?

In [45]:
print('There are {} unique categories.'.format(len(houston_venues['Venue Category'].unique())))

There are 145 unique categories.


## Analyze Each Neighborhood

In [46]:
# one hot encoding
houston_onehot = pd.get_dummies(houston_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
houston_onehot['Neighborhood'] = houston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [houston_onehot.columns[-1]] + list(houston_onehot.columns[:-1])
houston_onehot = houston_onehot[fixed_columns]

houston_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport Service,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Garden,Bistro,Board Shop,Bookstore,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Electronics Store,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Kitchen Supply Store,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Multiplex,New American Restaurant,Nightclub,Non-Profit,Office,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Print Shop,Pub,Record Shop,Rest Area,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shopping Mall,Smoothie Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Bellaire,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Braeswood Place,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Braeswood Place,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Braeswood Place,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Braeswood Place,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
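Two notes on the one-hot step: `DataFrame.insert` can place the neighborhood column first without rebuilding the column list, and pandas 2.0 and later return boolean dummy columns by default, so `dtype=int` keeps 0/1 values like those shown above. A toy sketch:

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Braeswood Place', 'Braeswood Place', 'Bellaire'],
    'Venue Category': ['Park', 'Multiplex', 'Pharmacy'],
})

# one column per category named after the category itself; dtype=int gives 0/1
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# put the neighborhood column first in a single step
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```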


Let's check the size of the new dataframe

In [47]:
houston_onehot.shape

(687, 146)

#### Next, let's group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category

In [48]:
houston_grouped = houston_onehot.groupby('Neighborhood').mean().reset_index()
houston_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport Service,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Beer Garden,Bistro,Board Shop,Bookstore,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Casino,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Electronics Store,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Kitchen Supply Store,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Multiplex,New American Restaurant,Nightclub,Non-Profit,Office,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Print Shop,Pub,Record Shop,Rest Area,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shopping Mall,Smoothie Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Bellaire,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Braeswood Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Briargrove,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667
3,Cottage Grove,0.0,0.0,0.0,0.046875,0.0,0.0,0.0,0.015625,0.015625,0.0,0.015625,0.0,0.15625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0,0.015625,0.015625,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625,0.015625,0.03125,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.015625,0.0,0.0,0.046875,0.0,0.0,0.046875,0.0,0.0,0.015625,0.0,0.015625,0.09375,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.015625,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.015625,0.0,0.015625,0.0,0.0
4,Denver Harbor,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0
5,East End Revitalized,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Galleria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Greenway Plaza,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.1875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Heights/Greater Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Highland Village/Midlane,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.066667,0.0,0.022222,0.0,0.022222,0.0,0.0,0.066667,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.111111,0.0,0.0,0.022222,0.022222,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0
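Each row of the table above gives, for one neighborhood, the share of its Foursquare venues that fall into each category. A minimal sketch of how such a frequency table can be built, assuming a long-format dataframe with one row per venue and the hypothetical columns `Neighborhood` and `Venue Category` (the toy rows are made up for illustration):

```python
import pandas as pd

# toy long-format venue list: one row per venue (made-up data for illustration)
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Park', 'Coffee Shop', 'Park', 'Pharmacy'],
})

# one-hot encode the category: one 0/1 column per venue category
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# averaging the 0/1 columns per neighborhood gives each category's share of venues
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row sums to 1, which is why Bellaire's row contains a single 1.0: every venue returned for it was a pharmacy.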


Confirm the new size

In [49]:
houston_grouped.shape

(35, 146)

We will print each neighborhood along with the top 5 most common venues

In [50]:
num_top_venues = 5

for hood in houston_grouped['Neighborhood']:
    print("----"+hood+"----")
    # transpose this neighborhood's row into a (venue, frequency) table
    temp = houston_grouped[houston_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]  # drop the 'Neighborhood' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    # show the highest-frequency venues first
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bellaire----
                       venue  freq
0                   Pharmacy   1.0
1                        ATM   0.0
2                  Nightclub   0.0
3         Mexican Restaurant   0.0
4  Middle Eastern Restaurant   0.0


----Braeswood Place----
          venue  freq
0          Park  0.50
1  Cocktail Bar  0.25
2     Multiplex  0.25
3           ATM  0.00
4    Non-Profit  0.00


----Briargrove----
                     venue  freq
0              Yoga Studio  0.17
1             Dessert Shop  0.17
2  Fruit & Vegetable Store  0.17
3      Lebanese Restaurant  0.17
4                      Spa  0.17


----Cottage Grove----
                 venue  freq
0                  Bar  0.16
1            Nightclub  0.09
2  American Restaurant  0.05
3               Lounge  0.05
4   Mexican Restaurant  0.05


----Denver Harbor----
                  venue  freq
0  Fast Food Restaurant  0.14
1        Sandwich Place  0.14
2           Pizza Place  0.07
3        Discount Store  0.07
4           Bus Station 
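The loop above transposes one row at a time to rank the venues. A more compact alternative (a sketch, not the original code) indexes the grouped frame by neighborhood and calls `Series.nlargest` on each row; the toy frame below stands in for `houston_grouped` and reuses a few of the Braeswood Place and Bellaire frequencies from the table above:

```python
import pandas as pd

# toy stand-in for houston_grouped: one row per neighborhood, one column per category
grouped = pd.DataFrame({
    'Neighborhood': ['Braeswood Place', 'Bellaire'],
    'Cocktail Bar': [0.25, 0.0],
    'Multiplex': [0.25, 0.0],
    'Park': [0.5, 0.0],
    'Pharmacy': [0.0, 1.0],
})

def top_venues(grouped, num_top_venues=5):
    """Return {neighborhood: Series of its top venue frequencies, rounded to 2 places}."""
    freqs = grouped.set_index('Neighborhood')
    return {hood: row.nlargest(num_top_venues).round(2) for hood, row in freqs.iterrows()}

for hood, top in top_venues(grouped, num_top_venues=3).items():
    print('----' + hood + '----')
    print(top.to_string())
    print()
```

With its default `keep='first'`, `nlargest` breaks ties by column order, so the listing is reproducible; `sort_values` with its default quicksort does not guarantee any order among ties. Neither approach avoids the underlying issue that Briargrove has six venues tied at 0.17, so any top-five list for it drops one of them.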

Put the data into a pandas dataframe, starting with a function that sorts each neighborhood's venues in descending order of frequency

In [51]:
# sort a neighborhood's venue frequencies in descending order
# and return the names of the top num_top_venues categories
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' column
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now we will create the new dataframe and display the top 10 venues for each neighborhood

In [52]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues (1st, 2nd, 3rd, then 4th, 5th, ...)
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe with one row per neighborhood
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = houston_grouped['Neighborhood']

# fill each row with that neighborhood's top venue names
for ind in np.arange(houston_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(houston_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bellaire,Pharmacy,College Rec Center,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop,Deli / Bodega
1,Braeswood Place,Park,Cocktail Bar,Multiplex,Yoga Studio,Deli / Bodega,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop
2,Briargrove,Yoga Studio,Spa,Lebanese Restaurant,Fruit & Vegetable Store,Dessert Shop,American Restaurant,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio
3,Cottage Grove,Bar,Nightclub,Lounge,Mexican Restaurant,American Restaurant,Italian Restaurant,Southern / Soul Food Restaurant,Spa,Gym / Fitness Center,Sushi Restaurant
4,Denver Harbor,Fast Food Restaurant,Sandwich Place,Park,Intersection,Grocery Store,Pharmacy,Video Store,Pizza Place,Gas Station,Discount Store
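The `indicators` list only covers 1st to 3rd and falls back to "th" for every later position, which would produce labels such as "21th" if `num_top_venues` were raised above 20. A small sketch of a general ordinal helper (hypothetical name `ordinal`) that also handles 11th to 13th and 21st, 22nd, and so on, together with a hypothetical `most_common_table` that builds the same "Nth Most Common Venue" table with `DataFrame.apply`:

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def most_common_table(grouped, num_top_venues):
    """One row per neighborhood with its top venue categories, most frequent first."""
    freqs = grouped.set_index('Neighborhood')
    columns = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(num_top_venues)]
    # stable (mergesort) descending sort keeps column order among tied frequencies
    top = freqs.apply(
        lambda row: pd.Series(
            row.sort_values(ascending=False, kind='mergesort').index[:num_top_venues],
            index=columns),
        axis=1)
    return top.reset_index()

# toy stand-in for houston_grouped (a few frequencies from the table above)
grouped = pd.DataFrame({
    'Neighborhood': ['Braeswood Place', 'Bellaire'],
    'Cocktail Bar': [0.25, 0.0],
    'Multiplex': [0.25, 0.0],
    'Park': [0.5, 0.0],
    'Pharmacy': [0.0, 1.0],
})
print(most_common_table(grouped, 3))
```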


### Cluster the Neighborhoods

In [53]:
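from sklearn.cluster import KMeans

# set number of clusters
kclusters = 8

# cluster on the venue-frequency columns only
# (drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts)
houston_grouped_clustering = houston_grouped.drop(columns='Neighborhood')

# run k-means clustering; n_init=10 matches scikit-learn's default before version 1.4
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(houston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]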

array([4, 6, 1, 1, 1, 1, 1, 1, 1, 1])
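k-means assigns each neighborhood to the cluster whose center is closest (in Euclidean distance) to its row of venue frequencies. A toy sketch, using made-up rows rather than the Houston data, showing that neighborhoods with similar venue mixes receive the same label:

```python
import numpy as np
from sklearn.cluster import KMeans

# toy frequency rows: columns are [Bar, Park, Pharmacy] shares (made-up values)
X = np.array([
    [0.8, 0.2, 0.0],   # nightlife-heavy
    [0.7, 0.3, 0.0],   # nightlife-heavy
    [0.0, 0.1, 0.9],   # pharmacy-heavy
    [0.0, 0.0, 1.0],   # pharmacy-heavy
])

# n_init is set explicitly because its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```

With 35 neighborhoods and `kclusters = 8`, four of the eight clusters in the tables below contain a single row, so a smaller k may be worth trying.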

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood


In [54]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

houston_merged = houston_df

# join the cluster labels and top venues onto houston_df, which already holds the
# ZIP, median home price, latitude/longitude and distance for each neighborhood
houston_merged = houston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# neighborhoods with no venue data come back as NaN; fillna(0) also assigns them cluster label 0
houston_merged.fillna(0, inplace = True)
houston_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,ZIP,2016 Median Home Price,LAT,LNG,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bellaire,77401,"$933,000",29.704019,-95.460905,"29.704019 , -95.46090500000001",11.56,True,4.0,Pharmacy,College Rec Center,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop,Deli / Bodega
1,Braeswood Place,77025,"$715,000",29.685706,-95.434764,"29.685706 , -95.434764",10.81,True,6.0,Park,Cocktail Bar,Multiplex,Yoga Studio,Deli / Bodega,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop
2,Knollwood/Woodside Area,77025,"$430,000",29.685706,-95.434764,"29.685706 , -95.434764",10.81,True,6.0,Park,Cocktail Bar,Multiplex,Yoga Studio,Deli / Bodega,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop
3,Briargrove,77057,"$824,000",29.744081,-95.487974,"29.744081 , -95.487974",12.71,True,1.0,Yoga Studio,Spa,Lebanese Restaurant,Fruit & Vegetable Store,Dessert Shop,American Restaurant,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio
4,Cottage Grove,77007,"$392,500",29.771545,-95.411083,"29.771545 , -95.41108299999999",5.48,True,1.0,Bar,Nightclub,Lounge,Mexican Restaurant,American Restaurant,Italian Restaurant,Southern / Soul Food Restaurant,Spa,Gym / Fitness Center,Sushi Restaurant
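Neighborhoods for which Foursquare returned no venues have no row in `neighborhoods_venues_sorted`, so the join leaves NaN in their columns, and `fillna(0)` then gives them cluster label 0 (row 13, ZIP 77081, shows up with label 0 and all-zero venues in the cluster tables below). A sketch of an alternative, assuming the same column names and using made-up toy rows, that flags those neighborhoods with a hypothetical label of -1 instead:

```python
import pandas as pd

# toy stand-ins: neighborhood 'C' has no venue data, so it has no cluster label
hoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'], 'ZIP': ['Z1', 'Z2', 'Z3']})
venues_sorted = pd.DataFrame({'Cluster Labels': [0, 1],
                              'Neighborhood': ['A', 'B'],
                              '1st Most Common Venue': ['Bar', 'Park']})

merged = hoods.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# rows without venue data come back as NaN; flag them with -1 instead of reusing label 0
merged['Cluster Labels'] = merged['Cluster Labels'].fillna(-1).astype(int)
print(merged)
```

Rows labelled -1 would then need to be skipped (or given their own color) when drawing the cluster map.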


Let's visualize the resulting clusters!

In [55]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# create a map centered on the latitude/longitude defined earlier in the notebook
map_clusters = folium.Map(
    location=[latitude, longitude],
    zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(houston_merged['LAT'],
                                  houston_merged['LNG'],
                                  houston_merged['Neighborhood'],
                                  houston_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

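### Examine the clusters

The cluster numbers below start at 1, while the `Cluster Labels` values start at 0: Cluster 1 holds label 0, Cluster 2 holds label 1, and so on.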

Cluster 1

In [56]:
houston_merged.loc[houston_merged['Cluster Labels'] == 0, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,77081,"29.712099 , -95.480935",12.91,True,0.0,0,0,0,0,0,0,0,0,0,0
30,77005,"29.718434999999996 , -95.42355500000001",7.64,True,0.0,Outdoors & Recreation,Beach,Bakery,Yoga Studio,Diner,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store
31,77005,"29.718434999999996 , -95.42355500000001",7.64,True,0.0,Outdoors & Recreation,Beach,Bakery,Yoga Studio,Diner,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store


Cluster 2

In [57]:
houston_merged.loc[houston_merged['Cluster Labels'] == 1, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,77057,"29.744081 , -95.487974",12.71,True,1.0,Yoga Studio,Spa,Lebanese Restaurant,Fruit & Vegetable Store,Dessert Shop,American Restaurant,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio
4,77007,"29.771545 , -95.41108299999999",5.48,True,1.0,Bar,Nightclub,Lounge,Mexican Restaurant,American Restaurant,Italian Restaurant,Southern / Soul Food Restaurant,Spa,Gym / Fitness Center,Sushi Restaurant
5,77007,"29.771545 , -95.41108299999999",5.48,True,1.0,Bar,Nightclub,Lounge,Mexican Restaurant,American Restaurant,Italian Restaurant,Southern / Soul Food Restaurant,Spa,Gym / Fitness Center,Sushi Restaurant
6,77007,"29.771545 , -95.41108299999999",5.48,True,1.0,Bar,Nightclub,Lounge,Mexican Restaurant,American Restaurant,Italian Restaurant,Southern / Soul Food Restaurant,Spa,Gym / Fitness Center,Sushi Restaurant
7,77007,"29.771545 , -95.41108299999999",5.48,True,1.0,Bar,Nightclub,Lounge,Mexican Restaurant,American Restaurant,Italian Restaurant,Southern / Soul Food Restaurant,Spa,Gym / Fitness Center,Sushi Restaurant
8,77020,"29.773179 , -95.314327",4.57,True,1.0,Fast Food Restaurant,Sandwich Place,Park,Intersection,Grocery Store,Pharmacy,Video Store,Pizza Place,Gas Station,Discount Store
9,77003,"29.749778000000003 , -95.345885",1.3,True,1.0,Gym / Fitness Center,Park,Trail,Italian Restaurant,Insurance Office,Café,General Entertainment,Brewery,Arts & Crafts Store,Cuban Restaurant
10,77056,"29.748202000000003 , -95.468948",10.83,True,1.0,Coffee Shop,New American Restaurant,Smoothie Shop,Supermarket,Bridal Shop,Sandwich Place,Bank,Deli / Bodega,Cupcake Shop,Dance Studio
11,77056,"29.748202000000003 , -95.468948",10.83,True,1.0,Coffee Shop,New American Restaurant,Smoothie Shop,Supermarket,Bridal Shop,Sandwich Place,Bank,Deli / Bodega,Cupcake Shop,Dance Studio
12,77046,"29.733777000000003 , -95.433346",7.75,True,1.0,Coffee Shop,Seafood Restaurant,Gym / Fitness Center,Sushi Restaurant,Clothing Store,Record Shop,Café,Liquor Store,Food Court,Italian Restaurant


Cluster 3

In [58]:
houston_merged.loc[houston_merged['Cluster Labels'] == 2, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,77092,"29.830023999999998 , -95.474409",13.98,True,2.0,Hotel,Arts & Crafts Store,Yoga Studio,Dessert Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner
27,77092,"29.830023999999998 , -95.474409",13.98,True,2.0,Hotel,Arts & Crafts Store,Yoga Studio,Dessert Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner


Cluster 4

In [59]:
houston_merged.loc[houston_merged['Cluster Labels'] == 3, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,77061,"29.647022999999997 , -95.276656",14.38,True,3.0,Airport Service,Yoga Studio,Dessert Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner,Deli / Bodega


Cluster 5

In [60]:
houston_merged.loc[houston_merged['Cluster Labels'] == 4, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,77401,"29.704019 , -95.46090500000001",11.56,True,4.0,Pharmacy,College Rec Center,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop,Deli / Bodega


Cluster 6

In [61]:
houston_merged.loc[houston_merged['Cluster Labels'] == 5, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,77051,"29.656113 , -95.37869599999999",11.26,True,5.0,Health & Beauty Service,Health Food Store,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop,Deli / Bodega


Cluster 7

In [62]:
houston_merged.loc[houston_merged['Cluster Labels'] == 6, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,77025,"29.685706 , -95.434764",10.81,True,6.0,Park,Cocktail Bar,Multiplex,Yoga Studio,Deli / Bodega,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop
2,77025,"29.685706 , -95.434764",10.81,True,6.0,Park,Cocktail Bar,Multiplex,Yoga Studio,Deli / Bodega,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop
33,77055,"29.796871000000003 , -95.49165",13.76,True,6.0,Coffee Shop,Park,Bowling Alley,Dessert Shop,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner


Cluster 8

In [63]:
houston_merged.loc[houston_merged['Cluster Labels'] == 7, houston_merged.columns[[1] + list(range(5, houston_merged.shape[1]))]]

Unnamed: 0,ZIP,LAT_LNG,Distance from Downtown,Within 15 km of Downtown,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,77028,"29.827869 , -95.287485",10.45,True,7.0,Burger Joint,Convenience Store,Yoga Studio,Flower Shop,Fast Food Restaurant,Farmers Market,Electronics Store,Discount Store,Diner,Dessert Shop
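Each cell above repeats the same `.loc` filter for one label, and because `houston_merged.columns[1]` is `ZIP`, the neighborhood names themselves are not listed. A compact sketch (hypothetical helper `cluster_members`, assuming the `Neighborhood` and `Cluster Labels` columns shown earlier) that lists every cluster's members in one pass with `groupby`; the toy frame copies a few rows from the `houston_merged` output above and is deliberately not named `houston_merged`, so running it does not overwrite the real dataframe:

```python
import pandas as pd

# a few rows copied from the houston_merged output above
sample_merged = pd.DataFrame({
    'Neighborhood': ['Bellaire', 'Braeswood Place', 'Knollwood/Woodside Area',
                     'Briargrove', 'Cottage Grove'],
    'ZIP': [77401, 77025, 77025, 77057, 77007],
    'Cluster Labels': [4.0, 6.0, 6.0, 1.0, 1.0],
})

def cluster_members(df):
    """Map each cluster label to the list of neighborhoods assigned to it."""
    return {int(label): group['Neighborhood'].tolist()
            for label, group in df.groupby('Cluster Labels')}

for label, hoods in cluster_members(sample_merged).items():
    print('Cluster label {} ({} neighborhoods): {}'.format(label, len(hoods), ', '.join(hoods)))
```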


***
## Results, Discussion and Conclusion
Based on the results above, we can see JUST how much there is to do near Downtown Houston! Bars and nightlife (especially in Cottage Grove), coffee shops, parks, and health and fitness venues such as gyms, yoga studios and spas show up again and again among the most common venues. A plausible explanation is that the population near downtown skews toward young professionals, while more established professionals often move to the suburbs. Outdoor activities and socializing seem to drive the scene in Houston, as people want to look good and have fun!

Depending on your preferences and lifestyle, you can decide where you would want to live or explore. One of the reasons I retained the median home price in the `houston_df` dataframe is so that it could help inform that decision. Keep in mind that these prices date from 2016 and are likely quite outdated, but they can still be helpful for comparing neighborhoods relative to one another.
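To use the 2016 median home prices as a comparison tool, the `$933,000`-style strings first need to become numbers. A minimal sketch, assuming the `2016 Median Home Price` and `Cluster Labels` columns shown in `houston_merged` and reusing a few of its rows:

```python
import pandas as pd

# a few rows from the houston_merged output above (2016 prices from the source table)
sample = pd.DataFrame({
    'Neighborhood': ['Bellaire', 'Braeswood Place', 'Knollwood/Woodside Area',
                     'Briargrove', 'Cottage Grove'],
    '2016 Median Home Price': ['$933,000', '$715,000', '$430,000', '$824,000', '$392,500'],
    'Cluster Labels': [4.0, 6.0, 6.0, 1.0, 1.0],
})

# strip '$' and ',' and convert the result to numbers
sample['Price (USD)'] = pd.to_numeric(
    sample['2016 Median Home Price'].str.replace(r'[$,]', '', regex=True))

# median of the neighborhood medians within each cluster
by_cluster = sample.groupby('Cluster Labels')['Price (USD)'].median()
print(by_cluster)
```

A median of neighborhood medians is only a rough summary, but it is enough for a relative comparison between clusters.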

I hope that you can glean some valuable insights from this exercise, and feel free to drop a note with any questions!

Thanks for reading along!