# IBM Data Science

# Applied Data Science Capstone

## Final Project "The Battle of Neighborhoods"

This notebook contains the Capstone project, built with Python and several data science tools.

Notebook created by Carlos Granados, 2020

### Step 1:

Load the required libraries, starting with NumPy, pandas, scikit-learn, requests, matplotlib, geocoder, folium...

In [1]:
# Import basic libraries
import numpy as np
import re

# Libraries for data manipulation
import pandas as pd
import json
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json path is deprecated

# Libraries to read information from web pages
import requests
import urllib.request
from bs4 import BeautifulSoup

# Libraries for geographical data and its visualization
import geocoder
from geopy.geocoders import Nominatim
import folium

# Libraries for Machine Learning
from sklearn.cluster import KMeans

# Libraries for visualization
import matplotlib.cm as cm
import matplotlib.colors as colors

Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL) solvers for sklearn enabled: https://intelpython.github.io/daal4py/sklearn.html


### Step 2:

Some initial configurations, including the city coordinates and webpage source.

The API keys are included here initially, but they are not saved in the final version, for security reasons.

Some helper functions used to obtain geographical coordinates are also defined here.

In [2]:
# Google API:
API_KEY_GOOGLE = ''

# Foursquare API:
CLIENT_ID = ''
CLIENT_SECRET = ''

In [3]:
# Function to get the geographical coordinates for different addresses

def getCoords(adr):
    ll_coords = None
    # Loop until the geocoder returns the coordinates
    while ll_coords is None:
        g = geocoder.google(adr, key=API_KEY_GOOGLE)
        if g.status == 'ZERO_RESULTS':
            # The address was not found, maybe because it is
            #  not a place in the city: return NaNs instead
            return [np.nan, np.nan]
        ll_coords = g.latlng
    return ll_coords
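
The `while` loop in `getCoords` keeps retrying until the geocoder answers, so it can keep looping if the service never returns coordinates. A minimal sketch of a bounded variant follows. The names `get_coords_bounded`, `lookup`, `max_tries` and `pause` are hypothetical, and `lookup` stands in for any callable that returns a `(status, latlng)` pair, so the example runs without network access or an API key.

```python
import math
import time


def get_coords_bounded(address, lookup, max_tries=3, pause=0.0):
    """Return [lat, lon] for `address`, or [nan, nan] if nothing is found.

    `lookup` is a hypothetical callable returning (status, latlng), where
    latlng is None when the service gave no usable answer.
    """
    for _ in range(max_tries):
        status, latlng = lookup(address)
        if status == 'ZERO_RESULTS':
            # The address is unknown: no point in retrying
            return [math.nan, math.nan]
        if latlng is not None:
            return list(latlng)
        # Possibly a transient failure: wait a little and try again
        time.sleep(pause)
    # Give up after max_tries attempts instead of looping forever
    return [math.nan, math.nan]


# Fake lookup that fails once and then succeeds
answers = iter([('OVER_QUERY_LIMIT', None), ('OK', (51.34, 12.37))])
coords = get_coords_bounded('Leipzig', lambda adr: next(answers))
print(coords)
```

Returning NaNs on failure keeps the sketch compatible with the later `dropna` step, which discards places without coordinates.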

In [4]:
# Obtain Leipzig Coordinates

lpzCoords = getCoords('Leipzig')
lpzLat = lpzCoords[0]
lpzLon = lpzCoords[1]

print('Leipzig coordinates are : ' + str(lpzCoords))

Leipzig coordinates are : [51.3396955, 12.3730747]


### Step 3:

Read the corresponding URL in order to import several points of interest.

The data is transformed later into a pandas Data Frame, and the corresponding coordinates must be obtained. In the event of no answer for a given place, it will be discarded.

1.: Read the data from the URL, parse it, and then store the place names in the list `lpzPlaces`

In [5]:
# We will use BeautifulSoup here...

# URL for the wikipedia article
url = 'https://en.wikipedia.org/wiki/Leipzig'

# Import resources from URLs
res = requests.get(url).text

# Soup parser
soup = BeautifulSoup(res, 'html.parser')

names0 = soup.find_all('td', {'align':'left'})
names1 = soup.find_all('a')

In [6]:
# Empty list to save all interesting points...

lpzPlaces = []

# A function to obtain a given number of titles,
#  after a specific title found in the name list...

def scrapSomeTitles(numTitles, secRefTitle, names):
    i = -1
    for n in names:
        s = n.get('title')
        if s == secRefTitle:
            i += 1
        if 0 <= i <= numTitles:
            if s is not None and s != secRefTitle:
                if "not exist" in s:
                    # Sometimes the Wikipedia page does not exist,
                    #  and this is noted in the title...
                    temp = re.split(r'\(', s)
                    s = temp[0]
                i += 1
                lpzPlaces.append(s)
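
To make the counting in `scrapSomeTitles` easier to follow, here is a simplified sketch of the same idea on a plain list of strings: skip everything up to a reference title, then keep the next few non-empty titles. The function name `titles_after` and the sample data are made up for illustration, and unlike the original it returns a new list instead of appending to a global one.

```python
import re


def titles_after(titles, ref_title, num_titles):
    """Collect up to `num_titles` titles that follow `ref_title`."""
    found = []
    seen_ref = False
    for t in titles:
        if t == ref_title:
            seen_ref = True
            continue
        if not seen_ref or t is None:
            continue
        if len(found) >= num_titles:
            break
        # Drop the "(page does not exist)" suffix of missing wiki pages
        found.append(re.split(r'\s*\(page does not exist\)', t)[0])
    return found


sample = ['Intro', 'Edit section: Museums', None,
          'Grassi Museum', 'Some Gallery (page does not exist)', 'Zoo']
print(titles_after(sample, 'Edit section: Museums', 2))
# ['Grassi Museum', 'Some Gallery']
```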

In [7]:
# Extract names. There are many names, we are interested only in the
#  most important ones...

lpzPlaces = []

# Places from the tallest buildings table
for n in names0:
    s = n.get_text()
    sList = re.split(r'\n', s)
    if len(sList) > 1:
        lpzPlaces.append(sList[0])

# Places from museum names
scrapSomeTitles(numTitles=19,
                secRefTitle="Edit section: Museums and arts",
                names=names1)

# Places from Main Sights
scrapSomeTitles(numTitles=24,
                secRefTitle="Edit section: Main sights",
                names=names1)

# Places from Churches
scrapSomeTitles(numTitles=10,
                secRefTitle="Edit section: Churches",
                names=names1)

#lpzPlaces

2.: Transform the list into a data frame and process the data: eliminate duplicates, as well as names of historical figures or countries...

In [10]:
# Transform list to a dataframe
cols = ['Place Name']
lpz_df = pd.DataFrame(lpzPlaces, columns=cols)
lpz_df.sort_values(by=['Place Name'], inplace=True)
print(lpz_df.head())
print(lpz_df.shape)
num0 = lpz_df.shape[0]

             Place Name
51     Auerbachs Keller
45  Augusteum (Leipzig)
47        Augustusplatz
33         Bach Archive
37    Battle of Leipzig
(70, 1)


Now check for duplicates, and then drop some entries...

In [11]:
# Let's start with the duplicates
lpz_df.drop_duplicates(inplace=True)
print(lpz_df.shape)
num1 = lpz_df.shape[0]
print(str(num0 - num1) + " duplicates eliminated")

# And then some famous names associated with the city,
#  such as Goethe or Bach, or some historical events...
names = ['Felix Mendelssohn', 'Martin Luther', 'Johann Eck',
         'Johann Sebastian Bach', 'Johann Wolfgang von Goethe',
         'Friedrich Schiller', "Goethe's Faust",
         'Monday demonstrations in East Germany',
         'DDR', 'East Germany']
lpz_df = lpz_df[~lpz_df['Place Name'].isin(names)]
num2 = lpz_df.shape[0]
print(str(num1 - num2) + " names eliminated")

# Restart index
lpz_df.reset_index(drop=True, inplace=True)
lpz_df.head(20)

(63, 1)
7 duplicates eliminated
9 names eliminated


Unnamed: 0,Place Name
0,Auerbachs Keller
1,Augusteum (Leipzig)
2,Augustusplatz
3,Bach Archive
4,Battle of Leipzig
5,Cantor (church)
6,Center Torgauer Platz
7,Chimney of Stahl- und Hartgusswerk Bösdorf GmbH
8,City-Hochhaus Leipzig
9,DVB-T-Sendeturm


Now we are ready to obtain the geographical coordinates of each place of interest. We use the `getCoords` function, defined some cells above.

In [12]:
# Empty lists to save all latitude and longitude coordinates
latAll = []
lonAll = []

# Loop through each place, to find its geographical coordinates

for plc in lpz_df['Place Name']:
    #print(plc)
    address = "{}, Leipzig".format(plc)
    coords = getCoords(address)
    latAll.append(coords[0])
    lonAll.append(coords[1])

lpz_df['Latitude'] = latAll
lpz_df['Longitude'] = lonAll

# In the preprocessing maybe some names were not excluded...
#  For such cases, the coords are nans...
lpz_df.dropna(axis=0, inplace=True)

num3 = lpz_df.shape[0]
print(str(num2 - num3) + " names eliminated")

# Restart index
lpz_df.reset_index(drop=True, inplace=True)
lpz_df.head(20)

2 names eliminated


Unnamed: 0,Place Name,Latitude,Longitude
0,Auerbachs Keller,51.339685,12.375492
1,Augusteum (Leipzig),51.338351,12.379219
2,Augustusplatz,51.339543,12.381827
3,Bach Archive,51.338805,12.372245
4,Battle of Leipzig,51.312367,12.413267
5,Cantor (church),51.339335,12.372544
6,Center Torgauer Platz,51.345696,12.414419
7,Chimney of Stahl- und Hartgusswerk Bösdorf GmbH,51.246862,12.263808
8,City-Hochhaus Leipzig,51.337936,12.378978
9,Europahaus,51.337451,12.38157


### Step 4:

Use Foursquare to get information about each place.

Let's start by seeing which place has the most visited venues nearby on different days of the week...

In [13]:
# Set radius, limit and version
radius = 100
LIMIT = 20
VERSION = '20192106'

# Function to get the trending venues near a set of locations

def getTrendingVenues(names, latAll, lonAll, radius=500):
    for name, lat, lon in zip(names, latAll, lonAll):
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lon,
            radius,
            LIMIT)
        # Make the GET request once and reuse the parsed answer
        answer = requests.get(url).json()
        print(answer)
        results = answer["response"]
        # NOTE: the function returns inside the loop, so only
        #  the first location is actually queried
        return results


def getNearbyVenues(names, latAll, lonAll, radius=500):
    venue_list = []
    for name, lat, lon in zip(names, latAll, lonAll):
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lon,
            radius,
            LIMIT)
        # Make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # Keep only the relevant information for each nearby venue
        venue_list.append([(
            name,
            lat,
            lon,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-place lists into a single data frame,
    #  once all places have been queried
    nearby_venues = pd.DataFrame(
        [item for place_venues in venue_list for item in place_venues],
        columns=['Place Name',
                 'Place Latitude',
                 'Place Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])
    return nearby_venues
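
The request URL above is assembled with `str.format`. As an alternative sketch, the query string can be built from a dictionary with the standard library, which takes care of escaping. The helper name `build_explore_url`, the placeholder credentials and the version string `'20200101'` are assumptions for illustration; only the endpoint and the parameter names come from the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lon, client_id, client_secret,
                      version, radius=500, limit=20):
    """Build the explore request URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, not real keys
url = build_explore_url(51.339685, 12.375492, 'ID', 'SECRET', '20200101')
print(url)
```

With `requests`, the same dictionary can also be passed directly through the `params` argument of `requests.get`, which builds the query string itself.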

Unfortunately, Foursquare apparently does not support trending information for the selected city...

In [14]:
venues = getTrendingVenues(names=lpz_df['Place Name'],
                         latAll=lpz_df['Latitude'],
                         lonAll=lpz_df['Longitude'])

print('Nearby venues done')

venues

{'meta': {'code': 200, 'requestId': '5ebaed8502a172001be0d4fb'}, 'response': {'venues': []}}
Nearby venues done


{'venues': []}

But it does return the venues around the given places...

In [15]:
venues = getNearbyVenues(names=lpz_df['Place Name'],
                         latAll=lpz_df['Latitude'],
                         lonAll=lpz_df['Longitude'])

print('Nearby venues done')

venues.head(10)

Nearby venues done


Unnamed: 0,Place Name,Place Latitude,Place Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Auerbachs Keller,51.339685,12.375492,Marktplatz,51.340338,12.374609,Plaza
1,Auerbachs Keller,51.339685,12.375492,Steigenberger Grandhotel Handelshof,51.340656,12.376341,Hotel
2,Auerbachs Keller,51.339685,12.375492,Zeitgeschichtliches Forum,51.339881,12.376043,History Museum
3,Auerbachs Keller,51.339685,12.375492,Motel One Nikolaikirche,51.340666,12.37793,Hotel
4,Auerbachs Keller,51.339685,12.375492,Nikolaikirche,51.340532,12.378227,Church
5,Auerbachs Keller,51.339685,12.375492,Mephisto,51.339588,12.375477,Lounge
6,Auerbachs Keller,51.339685,12.375492,Motel One Augustusplatz,51.339761,12.379088,Hotel
7,Auerbachs Keller,51.339685,12.375492,Breuninger,51.34009,12.374108,Clothing Store
8,Auerbachs Keller,51.339685,12.375492,Handbrotzeit,51.341338,12.378071,Bistro
9,Auerbachs Keller,51.339685,12.375492,Gourmetage,51.339212,12.376137,Deli / Bodega


Analysis of each place. The final data frame is grouped by place name, using the mean frequency of each venue category.
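
The grouped mean of one-hot columns is simply the share of a place's venues that fall in each category. Here is a small pure-Python sketch of that arithmetic, using the categories of the ten venues listed above for Auerbachs Keller. The full data holds up to 20 venues per place, so the real frequencies differ.

```python
from collections import Counter

# Categories of the first ten venues returned for Auerbachs Keller
categories = ['Plaza', 'Hotel', 'History Museum', 'Hotel', 'Church',
              'Lounge', 'Hotel', 'Clothing Store', 'Bistro', 'Deli / Bodega']

counts = Counter(categories)
# Mean of a 0/1 column = (number of ones) / (number of rows)
freq = {cat: n / len(categories) for cat, n in counts.items()}

print(freq['Hotel'])  # 3 hotels out of 10 venues -> 0.3
```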

In [16]:
# First, count the venues found for each place
venues.groupby('Place Name').count()

# One-hot encoding, and adding the place name back to the data frame
L_oneHot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
L_oneHot['Place Name'] = venues['Place Name']

# Move the Place Name column to the first position
fixed_columns = [L_oneHot.columns[-1]] + list(L_oneHot.columns[:-1])
L_oneHot = L_oneHot[fixed_columns]

L_oneHot.head(10)

Unnamed: 0,Place Name,African Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Auto Dealership,Automotive Shop,Bagel Shop,Bakery,...,Theater,Theme Restaurant,Toy / Game Store,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Vietnamese Restaurant,Zoo,Zoo Exhibit
0,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Auerbachs Keller,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
L_group = L_oneHot.groupby('Place Name').mean().reset_index()
L_group

Unnamed: 0,Place Name,African Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Auto Dealership,Automotive Shop,Bagel Shop,Bakery,...,Theater,Theme Restaurant,Toy / Game Store,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Vietnamese Restaurant,Zoo,Zoo Exhibit
0,Auerbachs Keller,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Augusteum (Leipzig),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Augustusplatz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bach Archive,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0
4,Battle of Leipzig,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0
5,Cantor (church),0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,...,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0
6,Center Torgauer Platz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
7,Chimney of Stahl- und Hartgusswerk Bösdorf GmbH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
8,City-Hochhaus Leipzig,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Europahaus,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0


In [76]:
# See the top 10 venue categories for each place

numMax = 10
for pn in L_group['Place Name']:
    print('-----'+pn+'-----')
    temp = L_group[L_group['Place Name'] == pn].T.reset_index()
    temp.columns = ['Venue', 'Freq.']
    temp = temp.iloc[1:]
    temp['Freq.'] = temp['Freq.'].astype(float)
    temp = temp.round({'Freq.':2})
    print(temp.sort_values('Freq.', ascending=False).reset_index(drop=True).head(numMax))
    print('\n')


-----Auerbachs Keller-----
                  Venue  Freq.
0                 Hotel   0.15
1                Bistro   0.05
2                Lounge   0.05
3        Ice Cream Shop   0.05
4        History Museum   0.05
5  Gym / Fitness Center   0.05
6                 Plaza   0.05
7         Deli / Bodega   0.05
8           Coffee Shop   0.05
9        Clothing Store   0.05


-----Augusteum (Leipzig)-----
            Venue  Freq.
0           Hotel   0.20
1           Plaza   0.10
2          Bistro   0.05
3          Lounge   0.05
4  History Museum   0.05
5          Museum   0.05
6       Nightclub   0.05
7     Opera House   0.05
8       Drugstore   0.05
9   Deli / Bodega   0.05


-----Augustusplatz-----
            Venue  Freq.
0           Hotel   0.25
1          Bistro   0.10
2           Plaza   0.10
3  History Museum   0.05
4          Museum   0.05
5       Nightclub   0.05
6     Opera House   0.05
7       Drugstore   0.05
8     Coffee Shop   0.05
9  Scenic Lookout   0.05


-----Bach Archive-----

                  Venue  Freq.
0                  Café   0.15
1  Gym / Fitness Center   0.05
2         Grocery Store   0.05
3             Nightclub   0.05
4            Soup Place   0.05
5      Sushi Restaurant   0.05
6     German Restaurant   0.05
7            Beer Store   0.05
8           Coffee Shop   0.05
9               Theater   0.05


-----Reichsgericht-----
                Venue  Freq.
0                Café   0.20
1        Tech Startup   0.05
2          Beer Store   0.05
3                Park   0.05
4        Cupcake Shop   0.05
5  Italian Restaurant   0.05
6   German Restaurant   0.05
7          Soup Place   0.05
8  Chinese Restaurant   0.05
9    Sushi Restaurant   0.05


-----St. Nicholas Church, Leipzig-----
                Venue  Freq.
0               Hotel   0.25
1              Bistro   0.10
2               Plaza   0.10
3              Lounge   0.05
4      History Museum   0.05
5         Opera House   0.05
6       Deli / Bodega   0.05
7      Scenic Lookout   0.05
8  Italian R

### Step 5: Clustering!

Now that the data is sorted, we can start with the clustering!
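
As a reminder of what k-means does with these frequency vectors, here is a minimal pure-Python sketch of Lloyd's algorithm on toy 2-D points. It is not scikit-learn's implementation: the function `tiny_kmeans`, the naive initialization (the first `k` points) and the data are assumptions chosen to keep the example short and deterministic.

```python
def tiny_kmeans(points, k, n_iter=10):
    """Very small Lloyd's algorithm; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # naive initialization
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest center
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c))
                     for c in centers]
            labels[i] = dists.index(min(dists))
        # Update step: each center moves to the mean of its points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels, centers


points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3)]
labels, centers = tiny_kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

In the notebook each point is a place and each dimension is the frequency of one venue category, so places with a similar mix of venues end up with the same label.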

In [33]:
# Number of clusters (10 here; the NYC example started with 5)
Kmax = 10

L_group_clustering = L_group.drop(columns='Place Name')

# k-Means Clustering
kmM = KMeans(n_clusters=Kmax, random_state=0).fit(L_group_clustering)

# Check the cluster labels
kmM.labels_[0:10]

array([0, 0, 0, 4, 1, 4, 1, 6, 0, 0], dtype=int32)
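
A quick way to see how balanced the clusters are is to count the labels. The sketch below uses only the ten labels printed above. In the notebook the same call could be applied to the full `kmM.labels_` array.

```python
from collections import Counter

# The first ten labels shown in the output above
first_labels = [0, 0, 0, 4, 1, 4, 1, 6, 0, 0]

sizes = Counter(first_labels)
print(sizes.most_common())  # [(0, 5), (4, 2), (1, 2), (6, 1)]
```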

Merge the cluster labels with the top 10 venues for each place.

First create the top 10 data frame, and then merge it with the cluster labels.

In [34]:
numTop = 10

indicators = ['st', 'nd', 'rd']

# Function to sort venues in descending order
def mostCommon(row, num_top):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top]

# Create columns according to number of top venues
columns = ['Place Name']
for indx in np.arange(numTop):
    try:
        columns.append('{}{} Most Common Venue'.format(indx+1, indicators[indx]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(indx+1))

# Create a new dataframe
venuesSort = pd.DataFrame(columns=columns)
venuesSort['Place Name'] = L_group['Place Name']

for indx in np.arange(L_group.shape[0]):
    venuesSort.iloc[indx, 1:] = mostCommon(L_group.iloc[indx, :], numTop)
venuesSort.head(10)

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Auerbachs Keller,Hotel,Gym / Fitness Center,Ice Cream Shop,Boutique,Bistro,Spanish Restaurant,Café,Italian Restaurant,Sushi Restaurant,Plaza
1,Augusteum (Leipzig),Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Museum,Café,Scenic Lookout,Deli / Bodega,Lounge
2,Augustusplatz,Hotel,Plaza,Bistro,Church,Coffee Shop,Nightclub,Concert Hall,Café,Museum,Scenic Lookout
3,Bach Archive,Sushi Restaurant,Theater,Gym / Fitness Center,Coffee Shop,Plaza,Deli / Bodega,Burger Joint,Restaurant,Lounge,Shopping Mall
4,Battle of Leipzig,Tram Station,Historic Site,Business Service,Monument / Landmark,Event Space,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega
5,Cantor (church),Sushi Restaurant,Theater,Clothing Store,Restaurant,Deli / Bodega,Shopping Mall,Plaza,Spanish Restaurant,Indie Movie Theater,Ice Cream Shop
6,Center Torgauer Platz,Tram Station,Supermarket,Plaza,Pet Store,Grocery Store,Donut Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
7,Chimney of Stahl- und Hartgusswerk Bösdorf GmbH,Train Station,Zoo Exhibit,Exhibit,Coffee Shop,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega,Dessert Shop
8,City-Hochhaus Leipzig,Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Gym / Fitness Center,Café,Museum,Scenic Lookout,Deli / Bodega
9,Europahaus,Hotel,Museum,Plaza,Nightclub,Burger Joint,Drugstore,Bed & Breakfast,Bar,Café,Scenic Lookout
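
The `try`/`except` above produces the right suffixes for 1 to 10, but the rule "use `th` from the fourth item on" breaks for values such as 21 or 22. Here is a small sketch of a more general helper, in case `numTop` grows. The name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Place Name'] + ['{} Most Common Venue'.format(ordinal(i))
                            for i in range(1, 11)]
print(columns[1])   # 1st Most Common Venue
print(columns[10])  # 10th Most Common Venue
```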


In [35]:
venuesSort.insert(0, 'Cluster Labels', kmM.labels_)

# Merge the original data frame with the cluster data and venues
L_merged = lpz_df
L_merged = L_merged.join(venuesSort.set_index('Place Name'),
                         on='Place Name')

# Places without any venue data have no cluster label, so they
#  show up as NaNs after the join: drop them
L_merged.dropna(axis=0, inplace=True)

# The NaNs introduced by the join force the Cluster Labels column
#  to float. Note that astype returns a new Series: without an
#  assignment the column itself stays float64
L_merged['Cluster Labels'].astype('int32')

L_merged.head(10)

Unnamed: 0,Place Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Auerbachs Keller,51.339685,12.375492,0.0,Hotel,Gym / Fitness Center,Ice Cream Shop,Boutique,Bistro,Spanish Restaurant,Café,Italian Restaurant,Sushi Restaurant,Plaza
1,Augusteum (Leipzig),51.338351,12.379219,0.0,Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Museum,Café,Scenic Lookout,Deli / Bodega,Lounge
2,Augustusplatz,51.339543,12.381827,0.0,Hotel,Plaza,Bistro,Church,Coffee Shop,Nightclub,Concert Hall,Café,Museum,Scenic Lookout
3,Bach Archive,51.338805,12.372245,4.0,Sushi Restaurant,Theater,Gym / Fitness Center,Coffee Shop,Plaza,Deli / Bodega,Burger Joint,Restaurant,Lounge,Shopping Mall
4,Battle of Leipzig,51.312367,12.413267,1.0,Tram Station,Historic Site,Business Service,Monument / Landmark,Event Space,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega
5,Cantor (church),51.339335,12.372544,4.0,Sushi Restaurant,Theater,Clothing Store,Restaurant,Deli / Bodega,Shopping Mall,Plaza,Spanish Restaurant,Indie Movie Theater,Ice Cream Shop
6,Center Torgauer Platz,51.345696,12.414419,1.0,Tram Station,Supermarket,Plaza,Pet Store,Grocery Store,Donut Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
7,Chimney of Stahl- und Hartgusswerk Bösdorf GmbH,51.246862,12.263808,6.0,Train Station,Zoo Exhibit,Exhibit,Coffee Shop,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega,Dessert Shop
8,City-Hochhaus Leipzig,51.337936,12.378978,0.0,Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Gym / Fitness Center,Café,Museum,Scenic Lookout,Deli / Bodega
9,Europahaus,51.337451,12.38157,0.0,Hotel,Museum,Plaza,Nightclub,Burger Joint,Drugstore,Bed & Breakfast,Bar,Café,Scenic Lookout


In [36]:
L_merged.dtypes

Place Name                 object
Latitude                  float64
Longitude                 float64
Cluster Labels            float64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object
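
The `float64` type of `Cluster Labels` comes from the join: places without a matching row receive `NaN`, and pandas stores an integer column containing `NaN` as float. Here is a minimal sketch with toy data (the frames and values are made up) showing that the cast only sticks when its result is kept:

```python
import pandas as pd

places = pd.DataFrame({'Place Name': ['A', 'B', 'C']})
labels = pd.DataFrame({'Place Name': ['A', 'C'],
                       'Cluster Labels': [0, 1]})

# 'B' has no label, so the join fills it with NaN -> float64 column
merged = places.join(labels.set_index('Place Name'), on='Place Name')
dtype_after_join = merged['Cluster Labels'].dtype
print(dtype_after_join)

# Drop the unlabeled rows and keep the result of the cast
merged = merged.dropna(axis=0).astype({'Cluster Labels': 'int32'})
print(merged['Cluster Labels'].dtype)
```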

The final step is to visualize it on a map!

In [53]:
# Constructor for the map
mapClus = folium.Map(location=[lpzLat, lpzLon],
                     zoom_start=10)

# Map colors
x = np.arange(Kmax)
ys = [i + x + (i*x)**2 for i in range(Kmax)]
colorsArray = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colorsArray]

# Add markers to the map
markers_col = []
for lat, lon, pc, clus in zip(L_merged['Latitude'],
                              L_merged['Longitude'],
                              L_merged['Place Name'],
                              L_merged['Cluster Labels']):
    label = folium.Popup(str(pc) + ' Cluster ' + str(clus), parse_html=True)
    j = int(clus)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[j-1],
        fill=True,
        fill_color=rainbow[j-1],
        fill_opacity=0.7    
    ).add_to(mapClus)

mapClus
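
The color list above is indexed with `j-1`, so cluster 0 wraps around to the last color. Each cluster still gets its own color, but indexing by the label directly is easier to read. Here is a sketch using only the standard library (`colorsys` instead of matplotlib). The helper name `cluster_colors` is hypothetical.

```python
import colorsys


def cluster_colors(k):
    """Return k evenly spaced hex colors, one per cluster label."""
    out = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        out.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return out


palette = cluster_colors(10)
label = 3
print(palette[label])  # color used for every marker in cluster 3
```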

Let's explore each cluster!

#### Cluster 1:

In [40]:
L_merged.loc[L_merged['Cluster Labels'] == 0, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Auerbachs Keller,Hotel,Gym / Fitness Center,Ice Cream Shop,Boutique,Bistro,Spanish Restaurant,Café,Italian Restaurant,Sushi Restaurant,Plaza
1,Augusteum (Leipzig),Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Museum,Café,Scenic Lookout,Deli / Bodega,Lounge
2,Augustusplatz,Hotel,Plaza,Bistro,Church,Coffee Shop,Nightclub,Concert Hall,Café,Museum,Scenic Lookout
8,City-Hochhaus Leipzig,Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Gym / Fitness Center,Café,Museum,Scenic Lookout,Deli / Bodega
9,Europahaus,Hotel,Museum,Plaza,Nightclub,Burger Joint,Drugstore,Bed & Breakfast,Bar,Café,Scenic Lookout
18,Hotel The Westin Leipzig,Hotel,Zoo Exhibit,Steakhouse,Art Museum,Bagel Shop,Coffee Shop,Hotel Bar,Modern European Restaurant,Restaurant,Shopping Mall
21,Leipzig Hauptbahnhof,Hotel,Sports Bar,Irish Pub,Newsstand,Modern European Restaurant,Restaurant,Donut Shop,Bistro,Shopping Mall,Japanese Restaurant
29,MDR-Hochhaus,Hotel,Plaza,Church,Bed & Breakfast,Concert Hall,Gym / Fitness Center,Café,Museum,Scenic Lookout,Deli / Bodega
31,Museum der bildenden Künste,Hotel,Bistro,Bagel Shop,Steakhouse,Plaza,Sushi Restaurant,Indie Movie Theater,Ice Cream Shop,Lounge,Italian Restaurant
32,Museum of Antiquities of the University of Lei...,Hotel,Bistro,Plaza,Concert Hall,Deli / Bodega,Lounge,Opera House,Café,Italian Restaurant,Church


This cluster is one of the most populated, and includes several museums, schools, universities... It is basically the city center, and it is full of hotels, plazas, churches... It may be an ideal location, but the competition can be tough!

#### Cluster 2:

In [42]:
L_merged.loc[L_merged['Cluster Labels'] == 1, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Battle of Leipzig,Tram Station,Historic Site,Business Service,Monument / Landmark,Event Space,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega
6,Center Torgauer Platz,Tram Station,Supermarket,Plaza,Pet Store,Grocery Store,Donut Shop,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
14,German Museum of Books and Writing,Tram Station,Automotive Shop,Plaza,Event Space,Gym,Nightclub,Bar,Concert Hall,Construction & Landscaping,Cupcake Shop
15,German National Library,Tram Station,Plaza,Automotive Shop,Nightclub,Mediterranean Restaurant,Bar,Furniture / Home Store,Supermarket,Hockey Rink,Gym
30,Monument to the Battle of the Nations,Tram Station,Historic Site,Business Service,Monument / Landmark,Event Space,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega
47,Südfriedhof (Leipzig),Tram Station,Historic Site,Monument / Landmark,Event Space,Coffee Shop,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega


This cluster is composed of several city landmarks, most of them located outside the city center. Most locations are close to tram stops, and there are not many restaurants nearby, only some bars and nightclubs.

#### Cluster 3:

In [43]:
L_merged.loc[L_merged['Cluster Labels'] == 2, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Grassi Museum,Hotel,Museum,Supermarket,Coffee Shop,Plaza,Gym / Fitness Center,Art Gallery,Trattoria/Osteria,Gym,Drugstore
22,Leipzig Museum of Applied Arts,Hotel,Museum,Supermarket,Plaza,Karaoke Bar,Coffee Shop,Gym / Fitness Center,Art Gallery,Trattoria/Osteria,Exhibit
23,Leipzig Museum of Ethnography,Hotel,Museum,Supermarket,Coffee Shop,Plaza,Gym / Fitness Center,Art Gallery,Trattoria/Osteria,Gym,Drugstore
33,Museum of Musical Instruments of the Universit...,Hotel,Museum,Supermarket,Plaza,Karaoke Bar,Coffee Shop,Gym / Fitness Center,Art Gallery,Trattoria/Osteria,Exhibit


A small cluster composed of museums, surrounded mainly by hotels, supermarkets and coffee shops. Maybe it is the second-best location!

#### Cluster 4:

In [44]:
L_merged.loc[L_merged['Cluster Labels'] == 3, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Federal Administrative Court of Germany,Café,Chinese Restaurant,Bar,Bank,Sushi Restaurant,Bakery,Beer Store,Bagel Shop,Tech Startup,Soup Place
19,Leipzig Bayerischer Bahnhof,Tram Station,Hookah Bar,Bike Shop,German Restaurant,Italian Restaurant,Drugstore,Dessert Shop,Nightclub,Chinese Restaurant,Café
37,New Town Hall (Leipzig),Café,Bar,Nightclub,Clothing Store,Park,Chinese Restaurant,Gym / Fitness Center,Plaza,Cupcake Shop,Deli / Bodega
40,"Propsteikirche, Leipzig",Café,Beer Store,Grocery Store,German Restaurant,Dessert Shop,Deli / Bodega,Cupcake Shop,Coffee Shop,Nightclub,Park
41,Reichsgericht,Café,Tech Startup,Soup Place,Beer Store,Bar,Sushi Restaurant,Bakery,Italian Restaurant,Bagel Shop,Park
43,"St. Peter, Leipzig",Chinese Restaurant,Café,Sushi Restaurant,Hotel,German Restaurant,Drugstore,Dessert Shop,Cupcake Shop,Organic Grocery,Cocktail Bar
44,St. Peters',Chinese Restaurant,Café,Sushi Restaurant,Hotel,German Restaurant,Drugstore,Dessert Shop,Cupcake Shop,Organic Grocery,Cocktail Bar
48,Tower of New Town Hall,Café,Bar,Nightclub,Clothing Store,Park,Chinese Restaurant,Gym / Fitness Center,Plaza,Cupcake Shop,Deli / Bodega


Another interesting cluster, composed of several notable landmarks. The most common venues are cafés and some restaurants. This one is also a good option!

#### Cluster 5:

In [45]:
L_merged.loc[L_merged['Cluster Labels'] == 4, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Bach Archive,Sushi Restaurant,Theater,Gym / Fitness Center,Coffee Shop,Plaza,Deli / Bodega,Burger Joint,Restaurant,Lounge,Shopping Mall
5,Cantor (church),Sushi Restaurant,Theater,Clothing Store,Restaurant,Deli / Bodega,Shopping Mall,Plaza,Spanish Restaurant,Indie Movie Theater,Ice Cream Shop
12,Funkturm Leipzig,Exhibit,Supermarket,Restaurant,Auto Dealership,Coffee Shop,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega
13,G2 Kunsthalle,Sushi Restaurant,Spanish Restaurant,Tapas Restaurant,Lounge,Plaza,Burger Joint,Restaurant,Indie Movie Theater,Coffee Shop,Bar
17,Hochhaus Löhr's Carree,Hotel,Zoo Exhibit,Supermarket,Art Museum,Currywurst Joint,Hotel Bar,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Modern European Restaurant
20,Leipzig Debate,Hotel,Sushi Restaurant,Restaurant,Deli / Bodega,Shopping Mall,Plaza,Spanish Restaurant,Coffee Shop,Indie Movie Theater,Ice Cream Shop
25,Leipzig Synagogue,Hotel,Restaurant,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Italian Restaurant,Bagel Shop,Currywurst Joint,Ice Cream Shop
27,Leipzig Zoological Garden,Zoo Exhibit,Hotel,Science Museum,Greek Restaurant,Zoo,Hotel Bar,Italian Restaurant,Latin American Restaurant,Modern European Restaurant,Park
34,Naturkundemuseum Leipzig,Hotel,Zoo Exhibit,Hotel Bar,Currywurst Joint,Pub,Burger Joint,Science Museum,Italian Restaurant,Indie Movie Theater,Ice Cream Shop
39,Primate,Hotel,Sushi Restaurant,Restaurant,Deli / Bodega,Shopping Mall,Plaza,Spanish Restaurant,Coffee Shop,Indie Movie Theater,Ice Cream Shop


A quite mixed cluster. The most common venues are restaurants, exhibits, museums and hotels. Also a good candidate! The restaurants look quite diverse!

#### Cluster 6:

In [46]:
L_merged.loc[L_merged['Cluster Labels'] == 5, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,New Leipzig School,Market,American Restaurant,Construction & Landscaping,Fast Food Restaurant,Concert Hall,Cupcake Shop,Currywurst Joint,Deli / Bodega,Dessert Shop,Donut Shop


#### Cluster 7:

In [47]:
L_merged.loc[L_merged['Cluster Labels'] == 6, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Chimney of Stahl- und Hartgusswerk Bösdorf GmbH,Train Station,Zoo Exhibit,Exhibit,Coffee Shop,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega,Dessert Shop


#### Cluster 8:

In [48]:
L_merged.loc[L_merged['Cluster Labels'] == 7, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Leipzig Panometer,Exhibit,Concert Hall,Cafeteria,Bus Stop,Cocktail Bar,Construction & Landscaping,Cupcake Shop,Currywurst Joint,Deli / Bodega,Dessert Shop


#### Cluster 9:

In [49]:
L_merged.loc[L_merged['Cluster Labels'] == 8, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Leipzig Trade Fair,Fast Food Restaurant,Hardware Store,Tram Station,Bus Stop,Fountain,Event Space,Concert Hall,Construction & Landscaping,Cupcake Shop,Currywurst Joint


#### Cluster 10:

In [50]:
L_merged.loc[L_merged['Cluster Labels'] == 9, L_merged.columns[[0] + list(range(4, L_merged.shape[1]))]]

Unnamed: 0,Place Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Leipziger Baumwollspinnerei,Art Gallery,Bistro,Arts & Crafts Store,Pub,Rental Car Location,Café,Music Venue,Park,Zoo Exhibit,Concert Hall
35,Neo Rauch,Art Gallery,Park,Arts & Crafts Store,Café,Music Venue,Bistro,Event Space,Construction & Landscaping,Cupcake Shop,Currywurst Joint


In [51]:
L_merged.shape

(51, 14)