In this article, I examine how to look up nearby venues by retrieving data from the Foursquare API.

I will scan different neighborhoods in Munich, Germany, for amenities. We define the acceptable distance as within 1.5 km (walking distance).

Based on the definition of our problem, the main factor to consider is the type and number of venues in the area surrounding each selected neighborhood.

It is better to use a regularly spaced grid of locations, centered on each neighborhood's center, as the starting points for scraping the amenities in the selected neighborhood.

The following data sources are needed to extract or generate the required information:

1. The centers of the selected neighborhoods are generated algorithmically, and approximate addresses for the centers of those areas are obtained using OpenStreetMap (Nominatim) reverse geocoding.
2. The type, number, and location of amenities in every neighborhood are obtained using the Foursquare API.


In [1]:
# Import libraries (duplicates removed, grouped by standard library and third party)
import math
import re
import time
import warnings

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import pyproj
import requests
import shapely
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

warnings.filterwarnings('ignore')

In [6]:
##### Define the location of interest:
within_km = 1.5 # Search range around each location, in km (walking distance)
bubble_detail_factor = 7 # Variable factor determining how big each bubble will be; higher value = smaller bubble

# Read the addresses of the locations of interest into a dataframe
d = {'listed_location': ['Altstadt-Lehel, Munich', 
                         'Ludwigsvorstadt and Isarvorstadt, Munich',
                         'Maxvorstadt, Munich',
                         'Schwabing West, Munich',
                         'Au-Haidhausen, Munich',
                         'Sendling, Munich',
                         'Sendling – Westpark, Munich',
                         'Schwanthalerhöhe, Munich', 
                         'Neuhausen Nymphenburg, Munich',
                         'Moosach, Munich',
                         'Milbertshofen und Am Hart, Munich',
                         'Schwabing-Freimann, Munich',
                         'Bogenhausen, Munich',
                         'Berg am Laim, Munich',
                         'Trudering – Riem, Munich',
                         'Ramersdorf und Perlach, Munich',
                         'Obergiesing, Munich',
                         'Untergiesing und Harlaching, Munich',
                         'Thalkirchen, Munich',
                         'Obersendling, Munich',
                         'Forstenried, Munich',
                         'Fürstenried-Solln, Munich',
                         'Hadern, Munich',
                         'Pasing – Obermenzing, Munich',
                         'Aubing-Lochhausen-Langwied, Munich',
                         'Allach Untermenzing, Munich',
                         'Feldmoching-Hasenbergl, Munich',
                         'Laim, Munich',                       
                        ]}
dfListing = pd.DataFrame(data=d)
dfListing

Unnamed: 0,listed_location
0,"Altstadt-Lehel, Munich"
1,"Ludwigsvorstadt and Isarvorstadt, Munich"
2,"Maxvorstadt, Munich"
3,"Schwabing West, Munich"
4,"Au-Haidhausen, Munich"
5,"Sendling, Munich"
6,"Sendling – Westpark, Munich"
7,"Schwanthalerhöhe, Munich"
8,"Neuhausen Nymphenburg, Munich"
9,"Moosach, Munich"


Munich is located in Upper Bavaria, one of the 7 administrative districts of Bavaria, and is the capital of that district. Munich is also the capital of the Free State of Bavaria. It is the third-largest city in Germany after Berlin and Hamburg. The population of Munich exceeded 1.5 million for the first time in 2015, and at the beginning of 2017 the Bavarian metropolis had 1.54 million inhabitants. The population is growing rapidly, which leads to a housing shortage, and the city has the highest rents and property prices in Germany.
The city of Munich is politically and administratively divided into 25 districts (Stadtbezirke). The district names are largely familiar to Munich's residents, as they usually correspond to the traditional names of the city's quarters. The first district is the Altstadt (Old Town); the districts around the city centre have low numbers, while the districts with the highest numbers are the suburbs on the outskirts of the city. The district numbers themselves are not widely known in Munich.

Read the Foursquare client ID and client secret into variables:

In [63]:
# Read the API keys from a separate text file and define the version of the Foursquare API
with open("behdad.txt", "r") as key_file:
    key_text = key_file.read()

# Each key follows its label directly, e.g. "Foursquare CLIENT_ID:<value>"
foursquare_client_id = re.search(r'(?<=Foursquare CLIENT_ID:)\S+', key_text)[0]
foursquare_client_secret = re.search(r'(?<=Foursquare CLIENT_SECRET:)\S+', key_text)[0]

VERSION = '20180323' # Foursquare API version
LIMIT = 7 # Maximum number of venues requested per Foursquare API call
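
The lookup above can be tried without a real key file. The following is a minimal sketch, assuming the file holds one `label:value` pair per line, as the regular expressions above imply. The helper name `read_credential` and the sample values are hypothetical and not part of the original notebook.

```python
import re

def read_credential(text, label):
    """Return the non-space token that follows 'label:' in text, or None."""
    pattern = r'^\s*' + re.escape(label) + r':\s*(\S+)'
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None

# Made-up file contents, for illustration only
sample = "Foursquare CLIENT_ID:abc123\nFoursquare CLIENT_SECRET:xyz789\n"

client_id = read_credential(sample, 'Foursquare CLIENT_ID')
client_secret = read_credential(sample, 'Foursquare CLIENT_SECRET')
print(client_id, client_secret)
```

Returning `None` for a missing label makes a misconfigured key file easier to spot than an exception raised deep inside a regular expression call.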

## Neighborhood Candidates
For this project, we need to select centroids, given as latitude and longitude coordinates, from which to scan nearby areas. The method creates a grid of cells within 1.5 km of each selected address, covering an area of approximately 3 x 3 km centered on that address.

Let's first find the latitude and longitude of each selected neighborhood, using the OpenStreetMap (Nominatim) geocoding API.

In [8]:
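geolocator = Nominatim(user_agent="foursqure_agent")

def get_coordinates(listing):
    """Return (latitude, longitude) for an address, or None if geocoding fails."""
    try:
        geocode_result = geolocator.geocode(listing)
    except Exception as e:
        print("Unexpected error occurred.", e)
        return None
    if geocode_result is None:
        # geocode() returns None when the address cannot be resolved
        print("Address was not found:", listing)
        return None
    return geocode_result[1]  # (latitude, longitude) tuple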

Now let's create a grid of equally spaced area candidates, centered on each neighborhood's location and within 1.5 km of it. Each grid "bubble" is a circular area with a radius of ~0.2143 km (1.5 km divided by the bubble detail factor of 7), so the centers of adjacent bubbles are ~0.4286 km apart (twice the radius).
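
The bubble geometry follows directly from the two parameters defined earlier. This short sketch reproduces the arithmetic as the grid and mapping cells of this notebook use it; the names `bubble_radius` and `center_spacing` are introduced here for illustration only.

```python
within_km = 1.5            # search range around each neighborhood center, in km
bubble_detail_factor = 7   # higher value = smaller bubbles

m = within_km * 1000                       # search range in meters
bubble_radius = m / bubble_detail_factor   # radius of each drawn bubble, in meters
center_spacing = 2 * bubble_radius         # equals m / (bubble_detail_factor / 2)

print(round(bubble_radius, 1))   # 214.3
print(round(center_spacing, 1))  # 428.6
```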

Following an approach already available online (see the reference at the end), we create our grid of locations in a 2D Cartesian coordinate system, which lets us calculate distances accurately in meters rather than in latitude/longitude degrees. We then project those coordinates back to latitude/longitude degrees so they can be shown on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

UTM stands for "Universal Transverse Mercator".
Reference: [Cartesian/Projected Coordinate Systems, UTM](https://openpress.usask.ca/introgeomatics/chapter/cartesianprojected-coordinate-systems-utm/#:~:text=Universal%20Transverse%20Mercator%20(UTM)%20is,in%20metric%20units%20(metres).&text=Conversions%20from%20one%20coordinate%20system,a%20mathematical%20process%20called%20projection.)

In [7]:
# Build the transformers once; always_xy=True keeps the (lon, lat) / (x, y) argument order.
# EPSG:32633 is WGS84 / UTM zone 33N, the zone used throughout this notebook. Munich itself
# lies in zone 32N (EPSG:32632); zone 33 is kept so the X/Y values match the outputs shown.
# The Transformer class replaces the deprecated module-level pyproj transform() function.
to_xy = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:32633", always_xy=True)
to_lonlat = pyproj.Transformer.from_crs("EPSG:32633", "EPSG:4326", always_xy=True)

# longitude & latitude to UTM xy
def lonlat_to_xy(lon, lat):
    x, y = to_xy.transform(lon, lat)
    return x, y

# UTM xy to longitude & latitude
def xy_to_lonlat(x, y):
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

#calculate distance between the two points (x1, y1) and (x2, y2)
def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

The next step is to create a **hexagonal grid of cells**: we offset every other row and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.

For more details, please refer to: [Hexagonal Grids](https://www.redblobgames.com/grids/hexagons/#:~:text=Size%20and%20Spacing%23&text=In%20the%20pointy%20orientation%2C%20a,from%20sin(60%C2%B0).&text=The%20horizontal%20distance%20between%20adjacent,is%20h%20*%203%2F4%20.)
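
To see why the row spacing is scaled by sqrt(3)/2, here is a small self-contained sketch of the same construction. It is an illustration under simplified assumptions (a tiny 3 x 3 grid starting at the origin), and the function name `hex_grid` is hypothetical rather than taken from the notebook.

```python
import math

def hex_grid(x_min, y_min, step, n_rows, n_cols):
    """Return (x, y) centers of a hexagonal grid whose neighbor distance is `step`."""
    k = math.sqrt(3) / 2   # row spacing as a fraction of the step
    points = []
    for i in range(n_rows):
        y = y_min + i * step * k
        x_offset = step / 2 if i % 2 == 0 else 0   # shift every other row by half a step
        for j in range(n_cols):
            points.append((x_min + j * step + x_offset, y))
    return points

step = 1500 / (7 / 2)   # center spacing used in this notebook, in meters
grid = hex_grid(0.0, 0.0, step, n_rows=3, n_cols=3)

p = grid[3]             # row 1, column 0
q_same_row = grid[4]    # row 1, column 1
q_other_row = grid[0]   # row 0, column 0 (shifted right by step / 2)

same_row = math.hypot(q_same_row[0] - p[0], q_same_row[1] - p[1])
other_row = math.hypot(q_other_row[0] - p[0], q_other_row[1] - p[1])
print(round(same_row, 3), round(other_row, 3))
```

Both distances come out equal to `step`, because a horizontal offset of `step / 2` combined with a vertical offset of `step * sqrt(3) / 2` has length `step`.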

Apply the get_coordinates() function and add the results to the dataframe:

In [10]:
dfListing['location_corr'] = dfListing['listed_location'].apply(get_coordinates)
dfListing

Unnamed: 0,listed_location,location_corr
0,"Altstadt-Lehel, Munich","(48.1378285, 11.5745823)"
1,"Ludwigsvorstadt and Isarvorstadt, Munich","(48.1303398, 11.5733658)"
2,"Maxvorstadt, Munich","(48.1510916, 11.5624179)"
3,"Schwabing West, Munich","(48.1682709, 11.5698727)"
4,"Au-Haidhausen, Munich","(48.130273849999995, 11.59833361534854)"
5,"Sendling, Munich","(48.1180125, 11.5390832)"
6,"Sendling – Westpark, Munich","(48.11803085, 11.519332770284128)"
7,"Schwanthalerhöhe, Munich","(48.1337822, 11.5410566)"
8,"Neuhausen Nymphenburg, Munich","(48.1542217, 11.5315172)"
9,"Moosach, Munich","(48.1798949, 11.5105712)"


In [11]:
# Mark each listed location on the map and build a hexagonal grid of candidate points around it
neighborhood=[]
latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
map_city = folium.Map(location=dfListing['location_corr'][1], zoom_start=(11))

for (index, row) in (dfListing.iterrows()):
    listed_location_x, listed_location_y = lonlat_to_xy(row['location_corr'][1], row['location_corr'][0])
    folium.Marker(row['location_corr'], popup=row['listed_location']).add_to(map_city)
    m = within_km * 1000
    k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
    x_min = listed_location_x - m
    x_step = m/(bubble_detail_factor/2)
    y_min = listed_location_y - m - (int(21/k)*k*m/bubble_detail_factor - 2*m)/2
    y_step = m/(bubble_detail_factor/2) * k
    for i in range(0, int(21/k)):
        y = y_min + i * y_step
        x_offset = (m/bubble_detail_factor) if i%2==0 else 0
        for j in range(0, 21):
            x = x_min + j * x_step + x_offset
            distance_from_center = calc_xy_distance(listed_location_x, listed_location_y, x, y)
            if (distance_from_center <= (m+1)):
                lon, lat = xy_to_lonlat(x, y)
                neighborhood.append(row['listed_location'])
                latitudes.append(lat)
                longitudes.append(lon)
                distances_from_center.append(distance_from_center)
                xs.append(x)
                ys.append(y)    

In [12]:
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_city) 
    folium.Circle([lat, lon], radius=m/bubble_detail_factor, color='blue', fill=False).add_to(map_city)
    #folium.Marker([lat, lon]).add_to(map_city)
map_city

In [22]:
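geolocator = Nominatim(user_agent="foursqure_agent")

def get_address(latlong):
    """Return the address string for a (latitude, longitude) pair, or None on failure."""
    try:
        geocode_result = geolocator.reverse(latlong)
    except Exception as e:
        print("Unexpected error occurred.", e)
        return None
    if geocode_result is None:
        # reverse() returns None when no address is found for the coordinates
        return None
    return geocode_result[0]  # address string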

In [23]:
print('Obtaining location addresses: ', end='')
addresses = []
for latlon in zip(latitudes, longitudes):
    address = get_address(latlon)
    if address is None:
        address = 'NO ADDRESS'
    addresses.append(address)
    print(' .', end='')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
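
The loop above sends one reverse-geocoding request per grid point. The public Nominatim server asks clients to stay at or below one request per second, and geopy provides `geopy.extra.rate_limiter.RateLimiter` for that purpose. The sketch below shows the underlying idea with the standard library only; `throttled` and `fake_reverse` are hypothetical names, and the stand-in function replaces `geolocator.reverse` so the example runs without network access.

```python
import time

def throttled(func, min_delay_seconds):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]   # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

# Stand-in for geolocator.reverse (assumption: any callable can be wrapped the same way)
def fake_reverse(latlon):
    return "address for {}".format(latlon)

# A short delay keeps the demo fast; for Nominatim the delay would be 1 second
slow_reverse = throttled(fake_reverse, min_delay_seconds=0.05)

start = time.monotonic()
results = [slow_reverse((48.13, 11.57)) for _ in range(3)]
elapsed = time.monotonic() - start
print(len(results), round(elapsed, 2))
```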

In [24]:
df_locations = pd.DataFrame({'Neighborhood' : neighborhood,
                             'Address': addresses, #address of each grid bubble
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations

Unnamed: 0,Neighborhood,Address,Latitude,Longitude,X,Y,Distance from center
0,"Altstadt-Lehel, Munich","24, Eduard-Schmid-Straße, Untere Au, Bezirkste...",48.124495,11.575470,245177.072223,5.335811e+06,1484.614978
1,"Altstadt-Lehel, Munich","58, Pestalozzistraße, Bezirksteil Glockenbach,...",48.127571,11.566624,244534.215080,5.336182e+06,1285.714286
2,"Altstadt-Lehel, Munich","35, Westermühlstraße, Bezirksteil Glockenbach,...",48.127743,11.572374,244962.786509,5.336182e+06,1133.893419
3,"Altstadt-Lehel, Munich","Erhardtstraße, Bezirksteil Deutsches Museum, L...",48.127914,11.578123,245391.357937,5.336182e+06,1133.893419
4,"Altstadt-Lehel, Munich","45, Zeppelinstraße, Untere Au, Bezirksteil Unt...",48.128086,11.583872,245819.929366,5.336182e+06,1285.714286
...,...,...,...,...,...,...,...
1199,"Laim, Munich","An der Schloßmauer, Bezirksteil Nymphenburg, N...",48.149286,11.492859,239156.457330,5.338843e+06,1285.714286
1200,"Laim, Munich","Zuccalistraße, Bezirksteil Nymphenburg, Neuhau...",48.149462,11.498610,239585.028758,5.338843e+06,1133.893419
1201,"Laim, Munich","30, Wotanstraße, Bezirksteil Nymphenburg, Neuh...",48.149637,11.504361,240013.600187,5.338843e+06,1133.893419
1202,"Laim, Munich","Königlicher Hirschgarten, Hirschgarten, Birket...",48.149812,11.510112,240442.171616,5.338843e+06,1285.714286


So, we have 1204 grid bubbles to look at. Next, we use the Foursquare API to retrieve the venues within each grid bubble (if any), so that we can explore the addresses and segment them.

In [64]:
def getNearbyVenues(names, latitudes, longitudes, neighborhood, radius=m/20):
    """Query Foursquare around every grid bubble and return the venues as a dataframe.

    The default radius is m/20, where m is the search range in meters set in the grid cell above.
    """
    venues_list = []
    for name, lat, lng, hood in zip(names, latitudes, longitudes, neighborhood):
        print(hood, '->', name) # progress indicator: shows which grid bubble is being queried
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                foursquare_client_id, 
                foursquare_client_secret, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                LIMIT)
        # make the GET request; skip this grid bubble if the request or the parsing fails
        # (specific exceptions come first, because RequestException is their common base class)
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            results = response.json()["response"]['groups'][0]['items']
        except requests.exceptions.HTTPError as errh:
            print("Http Error:", errh)
            continue
        except requests.exceptions.ConnectionError as errc:
            print("Error Connecting:", errc)
            continue
        except requests.exceptions.Timeout as errt:
            print("Timeout Error:", errt)
            continue
        except requests.exceptions.RequestException as err:
            print("Request failed:", err)
            continue
        except (KeyError, IndexError, ValueError) as errp:
            print("Unexpected response format:", errp)
            continue

        # keep only the relevant information for each nearby venue
        venues_list.append([(
                hood,
                name, 
                lat, 
                lng, 
                v['venue']['id'],
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['location']['formattedAddress'][0],
                v['venue']['categories'][0]['name']) for v in results])

    columns = ['Neighborhood', 'District', 'District Latitude', 'District Longitude', 'id',
               'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Address', 'Venue Category']
    # flatten the per-bubble lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=columns)
    return nearby_venues

city_venues = getNearbyVenues(names=df_locations['Address'],
                              latitudes=df_locations['Latitude'],
                              longitudes=df_locations['Longitude'],
                              neighborhood=df_locations['Neighborhood'])

Altstadt-Lehel, Munich -> 24, Eduard-Schmid-Straße, Untere Au, Bezirksteil Untere Au, Au, Au-Haidhausen, München, Bayern, 81541, Deutschland
Altstadt-Lehel, Munich -> 58, Pestalozzistraße, Bezirksteil Glockenbach, Ludwigsvorstadt-Isarvorstadt, München, Bayern, 80469, Deutschland
Altstadt-Lehel, Munich -> 35, Westermühlstraße, Bezirksteil Glockenbach, Ludwigsvorstadt-Isarvorstadt, München, Bayern, 80469, Deutschland
Altstadt-Lehel, Munich -> Erhardtstraße, Bezirksteil Deutsches Museum, Ludwigsvorstadt-Isarvorstadt, München, Bayern, 80469, Deutschland
Altstadt-Lehel, Munich -> 45, Zeppelinstraße, Untere Au, Bezirksteil Untere Au, Au, Au-Haidhausen, München, Bayern, 81669, Deutschland
Altstadt-Lehel, Munich -> 2, Haydnstraße, Klinikviertel, Bezirksteil Ludwigsvorstadt-Kliniken, Ludwigsvorstadt-Isarvorstadt, München, Bayern, 80336, Deutschland
Altstadt-Lehel, Munich -> 7, Maistraße, Bezirksteil Am alten südlichen Friedhof, Ludwigsvorstadt-Isarvorstadt, München, Bayern, 80337, Deutschland
...
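
The function above reads each venue from a nested structure: `response`, then the first entry of `groups`, then `items`, then `venue`. The sketch below flattens such a structure defensively. The payload is hand-built for illustration (the first venue reuses values from a row of the results table, the second is made up) and is not a captured API response; `flatten_items` is a hypothetical helper.

```python
def flatten_items(payload):
    """Turn an explore-style payload into a list of flat venue dictionaries."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        location = venue.get('location', {})
        address = location.get('formattedAddress') or ['']         # tolerate a missing address
        categories = venue.get('categories') or [{'name': 'Unknown'}]  # tolerate missing categories
        rows.append({
            'id': venue['id'],
            'Venue': venue['name'],
            'Venue Latitude': location.get('lat'),
            'Venue Longitude': location.get('lng'),
            'Venue Address': address[0],
            'Venue Category': categories[0]['name'],
        })
    return rows

# Hand-built sample payload (assumption: mirrors only the keys the notebook code reads)
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'id': '4ba3ee9df964a520dd6e38e3', 'name': 'Maroto Bar',
               'location': {'lat': 48.127438, 'lng': 11.571582,
                            'formattedAddress': ['Westermühlstr. 31']},
               'categories': [{'name': 'Bar'}]}},
    {'venue': {'id': 'made-up-id', 'name': 'Venue without category',
               'location': {'lat': 48.13, 'lng': 11.57, 'formattedAddress': []},
               'categories': []}},
]}]}}

rows = flatten_items(sample_payload)
print(rows[0]['Venue'], '|', rows[0]['Venue Category'], '|', rows[1]['Venue Category'])
```

Falling back to placeholder values keeps one incomplete venue from aborting the whole scan, at the cost of a few "Unknown" categories to filter out later.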

So, in total we queried all 1204 grid bubbles. Let's check how many venues were returned from all of these addresses.

In [65]:
city_venues

Unnamed: 0,Neighborhood,District,District Latitude,District Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Address,Venue Category
0,"Altstadt-Lehel, Munich","24, Eduard-Schmid-Straße, Untere Au, Bezirkste...",48.124495,11.575470,4e3d0f8f45dd68e32733f731,Isarfest,48.124145,11.575537,München,Arts & Entertainment
1,"Altstadt-Lehel, Munich","35, Westermühlstraße, Bezirksteil Glockenbach,...",48.127743,11.572374,4b1589b7f964a52063ae23e3,Kirschbluete,48.127920,11.573120,Ickstattstr. 26,Asian Restaurant
2,"Altstadt-Lehel, Munich","35, Westermühlstraße, Bezirksteil Glockenbach,...",48.127743,11.572374,4ba3ee9df964a520dd6e38e3,Maroto Bar,48.127438,11.571582,Westermühlstr. 31,Bar
3,"Altstadt-Lehel, Munich","35, Westermühlstraße, Bezirksteil Glockenbach,...",48.127743,11.572374,51210c87e4b07e643351484f,Vintage Selection,48.127172,11.571875,Westermühlstraße 39,Wine Bar
4,"Altstadt-Lehel, Munich","2, Haydnstraße, Klinikviertel, Bezirksteil Lud...",48.130646,11.557778,523b5140498e98204df77776,Hans im Glück - Burgergrill,48.130126,11.557854,Goetheplatz 2 (Mozartstr.),Burger Joint
...,...,...,...,...,...,...,...,...,...,...
776,"Laim, Munich","17, Agricolastraße, Bezirksteil St. Ulrich, La...",48.142620,11.493313,502155cee4b0f7aa0b19ae21,Agricolaplatz,48.142236,11.492911,Agricolaplatz,Plaza
777,"Laim, Munich","Edeka, 390, Landsberger Straße, Am Westbad, Be...",48.145690,11.484460,4e188808b0fb8567c66d6ed3,EDEKA,48.145099,11.484187,Landsberger Str. 390,Supermarket
778,"Laim, Munich","Edeka, 390, Landsberger Straße, Am Westbad, Be...",48.145690,11.484460,4beeb11ce24d20a13a327314,ALDI SÜD,48.145265,11.485066,Landsberger Str. 388,Supermarket
779,"Laim, Munich","5, Schloßschmidstraße, Siedlung Neuhausen, Bez...",48.146741,11.518965,5565fe7d498e124cb64a7ef5,Juli Restaurant,48.146195,11.518886,Friesenheimer Brücke,Szechuan Restaurant


In [66]:
city_venues['Venue Category'].unique()

array(['Arts & Entertainment', 'Asian Restaurant', 'Bar', 'Wine Bar',
       'Burger Joint', 'Movie Theater', 'Cocktail Bar',
       'Doner Restaurant', 'Theater', 'Currywurst Joint',
       'Food & Drink Shop', 'Bistro', 'Hotel',
       'Vegetarian / Vegan Restaurant', 'Nightclub',
       'College Academic Building', 'Camera Store', 'Turkish Restaurant',
       'Bank', 'Museum', 'Jewish Restaurant', 'Plaza',
       'Falafel Restaurant', 'Fish Market', 'Seafood Restaurant', 'Café',
       'English Restaurant', 'Argentinian Restaurant',
       'German Restaurant', 'Bathing Area', 'Indie Movie Theater',
       'Cupcake Shop', 'Bavarian Restaurant', 'Clothing Store',
       'Japanese Restaurant', 'Park', 'Bakery', 'Trattoria/Osteria',
       'Coffee Shop', 'Gym', 'Bookstore', 'Pizza Place', 'Supermarket',
       'Palatine Restaurant', 'Deli / Bodega', 'Electronics Store',
       'Tunnel', 'Tea Room', 'Athletics & Sports', 'Lake',
       'Miscellaneous Shop', 'Pub', 'Dumpling Restaurant',


In [67]:
city_venues.to_csv('city_venues_munich.csv', index = False)

In [68]:
city_venues.groupby('Venue Category').count()

Venue Category,Neighborhood,District,District Latitude,District Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Address
Accessories Store,1,1,1,1,1,1,1,1,1
Afghan Restaurant,2,2,2,2,2,2,2,2,2
American Restaurant,2,2,2,2,2,2,2,2,2
Arcade,1,1,1,1,1,1,1,1,1
Argentinian Restaurant,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...
Vietnamese Restaurant,6,6,6,6,6,6,6,6,6
Vineyard,1,1,1,1,1,1,1,1,1
Wine Bar,1,1,1,1,1,1,1,1,1
Wine Shop,1,1,1,1,1,1,1,1,1


In [69]:
print('There are {} unique categories.'.format(len(city_venues['Venue Category'].unique())))

There are 184 unique categories.


Analyze each neighborhood's surrounding area. Using one-hot encoding, each venue category becomes a column in the dataframe:

In [70]:
# one hot encoding
city_venues_onehot = pd.get_dummies(city_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
city_venues_onehot['Neighborhood'] = city_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [city_venues_onehot.columns[-1]] + list(city_venues_onehot.columns[:-1])
city_venues_onehot = city_venues_onehot[fixed_columns]

city_venues_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Entertainment,Asian Restaurant,...,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Zoo Exhibit
0,"Altstadt-Lehel, Munich",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,"Altstadt-Lehel, Munich",0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,"Altstadt-Lehel, Munich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Altstadt-Lehel, Munich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
4,"Altstadt-Lehel, Munich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [71]:
city_venues_onehot.shape

(781, 185)

Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category:

In [72]:
city_venues_grouped = city_venues_onehot.groupby('Neighborhood').mean().reset_index()
city_venues_grouped

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Entertainment,Asian Restaurant,...,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Zoo Exhibit
0,"Allach Untermenzing, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Altstadt-Lehel, Munich",0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.043956,...,0.0,0.010989,0.010989,0.010989,0.010989,0.0,0.0,0.010989,0.0,0.0
2,"Au-Haidhausen, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,...,0.014925,0.0,0.0,0.0,0.0,0.044776,0.0,0.0,0.0,0.0
3,"Aubing-Lochhausen-Langwied, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Berg am Laim, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Bogenhausen, Munich",0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Feldmoching-Hasenbergl, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Forstenried, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Fürstenried-Solln, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Hadern, Munich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [73]:
city_venues_grouped.shape

(28, 185)
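
The mean of a one-hot column within a group is simply the share of that category among the group's venues. For example, the value 0.010989 in the Altstadt-Lehel row is consistent with 1 venue out of 91. The following sketch shows the equivalence with plain Python and made-up data; the notebook itself does this with `pd.get_dummies` and `groupby().mean()`.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, category) pairs, for illustration only
venues = [
    ('A', 'Café'), ('A', 'Café'), ('A', 'Bar'), ('A', 'Hotel'),
    ('B', 'Bus Stop'), ('B', 'Supermarket'),
]

# Count the categories per neighborhood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Mean of a 0/1 column within a group = count of that category / venues in the group
frequencies = {
    hood: {category: n / sum(counter.values()) for category, n in counter.items()}
    for hood, counter in counts.items()
}

print(frequencies['A']['Café'])      # 0.5
print(frequencies['B']['Bus Stop'])  # 0.5
```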

In [74]:
num_top_venues = 7

for hood in city_venues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = city_venues_grouped[city_venues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allach Untermenzing, Munich----
                        venue  freq
0                  Playground  0.25
1         Sporting Goods Shop  0.25
2                      Forest  0.25
3                 Snack Place  0.25
4          Mexican Restaurant  0.00
5          Miscellaneous Shop  0.00
6  Modern European Restaurant  0.00


----Altstadt-Lehel, Munich----
                 venue  freq
0                 Café  0.07
1                Hotel  0.05
2                  Bar  0.05
3     Asian Restaurant  0.04
4          Supermarket  0.03
5         Burger Joint  0.03
6  Indie Movie Theater  0.03


----Au-Haidhausen, Munich----
                   venue  freq
0                   Café  0.09
1     Italian Restaurant  0.06
2                  Hotel  0.04
3      Indian Restaurant  0.04
4           Burger Joint  0.04
5  Vietnamese Restaurant  0.04
6       Doner Restaurant  0.04


----Aubing-Lochhausen-Langwied, Munich----
                        venue  freq
0           Recreation Center  0.33
...

In [75]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [76]:
num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['Neighborhood'] = city_venues_grouped['Neighborhood']

for ind in np.arange(city_venues_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(city_venues_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,"Allach Untermenzing, Munich",Playground,Sporting Goods Shop,Snack Place,Forest,Dumpling Restaurant,Drugstore,Doner Restaurant
1,"Altstadt-Lehel, Munich",Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
2,"Au-Haidhausen, Munich",Café,Italian Restaurant,Indian Restaurant,Doner Restaurant,Vietnamese Restaurant,Burger Joint,Hotel
3,"Aubing-Lochhausen-Langwied, Munich",Clothing Store,Recreation Center,Bus Stop,Dance Studio,English Restaurant,Electronics Store,Dumpling Restaurant
4,"Berg am Laim, Munich",Bus Stop,Supermarket,Hotel,Construction & Landscaping,Tram Station,Restaurant,Theme Park
5,"Bogenhausen, Munich",Bus Stop,Drugstore,Bakery,Chinese Restaurant,Dog Run,Supermarket,Middle Eastern Restaurant
6,"Feldmoching-Hasenbergl, Munich",Cocktail Bar,IT Services,Motorcycle Shop,Playground,College Academic Building,Comedy Club,English Restaurant
7,"Forstenried, Munich",Bus Stop,Bakery,BBQ Joint,Castle,Dog Run,Supermarket,Asian Restaurant
8,"Fürstenried-Solln, Munich",Bakery,Farmers Market,Supermarket,Pizza Place,Bowling Alley,Plaza,Mexican Restaurant
9,"Hadern, Munich",Bus Stop,Trattoria/Osteria,Supermarket,Salon / Barbershop,Bus Line,Forest,Soccer Field
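
The column labels above rely on a three-element suffix list plus an exception for everything else, which is enough for 7 columns. As a hedged alternative, a small helper can produce the suffix for any positive integer, including cases such as 11th or 21st; `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'   # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 7
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[1], '|', columns[4], '|', columns[7])
```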


In [77]:
# set number of clusters
kclusters = 5

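# drop the label column by keyword; the positional axis argument is not accepted by newer pandas versions
city_venues_grouped_clustering = city_venues_grouped.drop(columns='Neighborhood')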

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(city_venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 1, 2, 0, 0, 3, 0, 0, 0])

In [78]:
# add clustering labels
city_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

city_venues_merged = df_locations

# join the sorted venue table onto the grid locations so that every grid bubble carries its neighborhood's cluster label and top venues
city_venues_merged = city_venues_merged.join(city_venues_sorted.set_index('Neighborhood'), on='Neighborhood')


city_venues_merged # check the last columns!

Unnamed: 0,Neighborhood,Address,Latitude,Longitude,X,Y,Distance from center,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,"Altstadt-Lehel, Munich","24, Eduard-Schmid-Straße, Untere Au, Bezirkste...",48.124495,11.575470,245177.072223,5.335811e+06,1484.614978,1,Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
1,"Altstadt-Lehel, Munich","58, Pestalozzistraße, Bezirksteil Glockenbach,...",48.127571,11.566624,244534.215080,5.336182e+06,1285.714286,1,Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
2,"Altstadt-Lehel, Munich","35, Westermühlstraße, Bezirksteil Glockenbach,...",48.127743,11.572374,244962.786509,5.336182e+06,1133.893419,1,Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
3,"Altstadt-Lehel, Munich","Erhardtstraße, Bezirksteil Deutsches Museum, L...",48.127914,11.578123,245391.357937,5.336182e+06,1133.893419,1,Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
4,"Altstadt-Lehel, Munich","45, Zeppelinstraße, Untere Au, Bezirksteil Unt...",48.128086,11.583872,245819.929366,5.336182e+06,1285.714286,1,Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1199,"Laim, Munich","An der Schloßmauer, Bezirksteil Nymphenburg, N...",48.149286,11.492859,239156.457330,5.338843e+06,1285.714286,0,Supermarket,Doner Restaurant,Plaza,Light Rail Station,Tram Station,Szechuan Restaurant,Bus Stop
1200,"Laim, Munich","Zuccalistraße, Bezirksteil Nymphenburg, Neuhau...",48.149462,11.498610,239585.028758,5.338843e+06,1133.893419,0,Supermarket,Doner Restaurant,Plaza,Light Rail Station,Tram Station,Szechuan Restaurant,Bus Stop
1201,"Laim, Munich","30, Wotanstraße, Bezirksteil Nymphenburg, Neuh...",48.149637,11.504361,240013.600187,5.338843e+06,1133.893419,0,Supermarket,Doner Restaurant,Plaza,Light Rail Station,Tram Station,Szechuan Restaurant,Bus Stop
1202,"Laim, Munich","Königlicher Hirschgarten, Hirschgarten, Birket...",48.149812,11.510112,240442.171616,5.338843e+06,1285.714286,0,Supermarket,Doner Restaurant,Plaza,Light Rail Station,Tram Station,Szechuan Restaurant,Bus Stop


In [79]:
# create map
map_clusters = folium.Map(location=dfListing['location_corr'][1], zoom_start=10.4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(city_venues_merged['Latitude'], city_venues_merged['Longitude'], city_venues_merged['Neighborhood'], city_venues_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.Circle(
        [lat, lon],
        radius=m/bubble_detail_factor,
        popup=label,
        color=rainbow[cluster-1],
        fill=False,
        fill_color=rainbow[cluster-1]).add_to(map_clusters)
       
map_clusters

In [80]:
examine_clusters = city_venues_merged.drop(columns=['Address','Latitude','Longitude','X','Y','Distance from center']).drop_duplicates()
examine_clusters

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,"Altstadt-Lehel, Munich",1,Café,Hotel,Bar,Asian Restaurant,Indie Movie Theater,Supermarket,Burger Joint
43,"Ludwigsvorstadt and Isarvorstadt, Munich",1,Hotel,Café,Bar,Italian Restaurant,Restaurant,Bavarian Restaurant,Shopping Mall
86,"Maxvorstadt, Munich",1,Bar,Café,Italian Restaurant,Asian Restaurant,Supermarket,Tram Station,Coffee Shop
129,"Schwabing West, Munich",1,Supermarket,Bakery,Breakfast Spot,Tram Station,Movie Theater,Beer Garden,Bus Station
172,"Au-Haidhausen, Munich",1,Café,Italian Restaurant,Indian Restaurant,Doner Restaurant,Vietnamese Restaurant,Burger Joint,Hotel
215,"Sendling, Munich",1,Italian Restaurant,Playground,Café,Caucasian Restaurant,Sporting Goods Shop,Soccer Field,Seafood Restaurant
258,"Sendling – Westpark, Munich",1,Café,Greek Restaurant,Supermarket,Flower Shop,Bus Stop,Business Service,Bakery
301,"Schwanthalerhöhe, Munich",1,Italian Restaurant,Hotel,Playground,Pizza Place,Bakery,Café,Doner Restaurant
344,"Neuhausen Nymphenburg, Munich",1,Greek Restaurant,Trattoria/Osteria,Italian Restaurant,Cocktail Bar,Austrian Restaurant,Electronics Store,Metro Station
387,"Moosach, Munich",1,Bakery,American Restaurant,Chinese Restaurant,Greek Restaurant,Beach,Supermarket,German Restaurant


In [81]:
examine_clusters.loc[examine_clusters['Cluster Labels'] == 0, examine_clusters.columns[[0] + list(range(2, examine_clusters.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
430,"Milbertshofen und Am Hart, Munich",Supermarket,Bus Stop,Tram Station,Restaurant,Bank,Bakery,Bus Station
516,"Bogenhausen, Munich",Bus Stop,Drugstore,Bakery,Chinese Restaurant,Dog Run,Supermarket,Middle Eastern Restaurant
559,"Berg am Laim, Munich",Bus Stop,Supermarket,Hotel,Construction & Landscaping,Tram Station,Restaurant,Theme Park
602,"Trudering – Riem, Munich",Bus Stop,Hostel,Organic Grocery,Rental Car Location,Café,Food & Drink Shop,Italian Restaurant
645,"Ramersdorf und Perlach, Munich",Bus Stop,Supermarket,Garden Center,Drugstore,Dog Run,Shopping Mall,Clothing Store
860,"Forstenried, Munich",Bus Stop,Bakery,BBQ Joint,Castle,Dog Run,Supermarket,Asian Restaurant
903,"Fürstenried-Solln, Munich",Bakery,Farmers Market,Supermarket,Pizza Place,Bowling Alley,Plaza,Mexican Restaurant
946,"Hadern, Munich",Bus Stop,Trattoria/Osteria,Supermarket,Salon / Barbershop,Bus Line,Forest,Soccer Field
1161,"Laim, Munich",Supermarket,Doner Restaurant,Plaza,Light Rail Station,Tram Station,Szechuan Restaurant,Bus Stop


#### Reference:

https://www.linkedin.com/pulse/finding-surrounding-venues-before-buying-home-tony-chow/