# TABLE OF CONTENT
+ THE BUSINESS CASE
+ THE DATA
+ THE METHODOLOGY

# THE BUSINESS PROBLEM
According to [Wikipedia](https://en.wikipedia.org/wiki/Types_of_restaurants), you can classify restaurants into seven categories that in turn can be broken down into eleven variations. Some of these combinations might be a little odd (e.g. "Fine Dining" and "Pub") but let us assume for a minute that this list is complete and all combinations are possible. In result, we would have 77 unique combinations of how a restaurant could look like. [Wikipedia](https://en.wikipedia.org/wiki/List_of_cuisines) also hosts a list of cuisines and cuisine types. According to Wikipedia, there are five major cusine styles and 123 distinguished ethnic and religious cuisines, giving us 615  combinations for different types of food that we could eat. If we trust these figures, we are all in all looking at a possible variety of over 43,000 unique restaurants - great, right?
<br>
<br>
![But why](https://media.giphy.com/media/s239QJIh56sRW/giphy.gif)
<br>
<br>
Glad you asked! These figures do not really matter for our problem. I only wanted to show you one thing: The food industry is extremely diverse and each type of restaurant brings its own unique challenges. However, they all have one common issue: Location, location, location! No matter what type of restaurant you own, you need a specific location that matches your profile. Of course you could now say: "That's easy to fix - just go where a lot of people are!" 
<br>
<br>
But is it really so easy? Depending on your restaurant, you need parking space. Maybe you want a quite neighborhood because you own a fine dining restaurant. Maybe you don't want to be where a lot of other restaurants are that are similiar to the cuisine you offer. Or maybe you have a super trendy concept that requires your restaurant to be placed in the most hipster district of your city. We could make this list longer and longer but the point is: It ain't easy. 
<br>
<br>
Choosing the right location also becomes increasingly difficult when you don't know the city you want to open your restaurant in. Finding the right spot will require you to do some extensive research and location scouting. This process takes a lot of time and eats up ressources. While this approach is feasible for small businesses, it becomes unhandy for larger corporations who want to expand their restaurants to multiple cities at the same time. In result, corporations need to spend more money to keep up quality and speed. So how could we help larger restaurant corporations to make this process more efficient? 
<br>
<br>
Well, with data science of course! Larger restaurant chains usually already have a couple of restaurants in place and know which locations run well in their business model and which not. We can use this information to train a clustering algorithm to identify neighborhoods that are similar to the locations with profitable restaurants. This way we can build a recommender system that tells us which neighborhoods in a new town are more likely to host a successful restaurant. Consequently, we would not have to research all areas but could focus our scouting activity on the most promising neighborhoods. This way, we would be able to improve speed and quality as well as cut the related costs significantly.

<br>
<br>

# THE DATA

## L'OSTERIA
As our business problem aims to build a recommender system for a fast growing restaurant chain, we need an example that we can work with. Ladies and gentlemen, please welcome my favorite pizza chain: [L'Osteria](https://losteria.net/en/)!
<br>
<br>
L'Osteria is an italian restaurant chain that was founded in Germany in 1999. The first restaurant was quickly becomming a hit as it combined tasty food, a trendy atmosphere and reasonable prices. Today, the chain has expanded to basically every larger city in Germany and starts to go international with places in Switzerland, Austria, Netherlands and the UK. That gives us a lot of possibilities to train our algorithm. Unfortunately, we do not have access to L'Osteria's database which is why we need to construct our own workaround. Luckily for us, the chain publishes the addresses of all their restaurants on their [website](https://losteria.net/en/restaurants/view/list/?tx_losteriarestaurants_restaurantsreopening%5Bfilter%5D%5Bcountry%5D=de&tx_losteriarestaurants_restaurantsreopening%5Bfilter%5D%5BsearchTerm%5D=&&&tx_losteriarestaurants_restaurantsreopening%5Bfilter%5D%5Btype%5D=reservations&).
<br>
<br>
To use this information, we need to scrape it from their website. When you take a closer look at the L'Osteria website, you will notice that it uses an event button which hides most of the restaurants. We could use a library called [Selenium](https://www.selenium.dev/), if we want a fully automized script that clicks the button for us. However, that would be an overkill as the button only needs to be clicked ~10 times. In result, it is easier to do so by hand and download the website that now shows all restaurants and store it as a html file in our working directory. 
<br>
<br>
That enables us to use the infamous html scraping library [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/). The following code scrapes the data we need from the downloaded html file (also available in the repository) and transforms it as the file contains e.g German characters that otherwise result in jibberish. 

In [1]:
# Import working libraries
import pandas as pd # Data wrangling
from tqdm import tqdm # Progress visualization
import requests # Request handling
from bs4 import BeautifulSoup # HTML search
import re # String handling
import ast # String handling

# Set path to html file
path = r"C:\Users\maurice.buettgenbach\OneDrive - Aquila Capital Management GmbH\Desktop\Desktop\Private\IBM\10_Capstone project\IBM_The_battle_of_neighborhoods_v2\LOSTERIA_DATA\Restaurants - L'Osteria.htm"

# Open document and transform to BS object
with open(path) as page:
    soup = BeautifulSoup(page, 'html.parser')

# Define results
results = soup.find(id='losteria-restaurants-list-wrapper')

# Find elements
page_elements = results.find_all('div', class_='address')

# Create variable to store data in
restaurants = pd.DataFrame()

# Iterate through found elements
for page_element in tqdm(page_elements):
    # Isolate address
    element = str(page_element)
    # Get rid of unnecessary characters
    element = element.replace("\n", "")
    element = element.replace("\r", "")
    # Get rid of spaces
    element = element.replace(" ", "")
    # Search for address in between <div>
    search_result = re.search(">(.*)<", element)
    # Store search result in new variable
    address = search_result.group(1)
    # Get street
    street = address.split(",")[0]
    # Add space between street name and house number
    street = re.sub(r"([0-9]+(\.[0-9]+)?)",r" \1 ", street).strip()
    # Add space before Capital letter
    street = re.sub(r"(?<=\w)([A-Z])", r" \1", street).strip()
    # Get ZIP code and city
    address = address.split(",")[1]
    # Isolate ZIP code
    zip_code = re.split('(\d+)', address)[1]
    # Isolate city
    city = re.split('(\d+)', address)[2]
    # Create temporary df
    temp_df = pd.DataFrame()
    # Add data to df
    temp_df['STREET'] = [street]
    temp_df['ZIP_CODE'] = zip_code
    temp_df['CITY'] = city
    # Append to variable
    restaurants = restaurants.append(temp_df)
# Reset index
restaurants.reset_index(drop=True, inplace=True)

# Replace cryptic html for German characters
restaurants = restaurants.replace('Ã¤', "ae", regex=True) # For German Umlaut "ä"
restaurants = restaurants.replace('Ã¼', "ue", regex=True) # For German Umlaut "ü"
restaurants = restaurants.replace('Ã¶', "oe", regex=True) # For German Umlaut "ö"
restaurants = restaurants.replace('ÃŸ', "ss", regex=True) # For German Umlaut "ß"

# Add spacer for output
print()
print()

# Visualize variable
print("df shape:", restaurants.shape)
restaurants.head(10)

100%|██████████| 121/121 [00:00<00:00, 420.92it/s]



df shape: (121, 3)





Unnamed: 0,STREET,ZIP_CODE,CITY
0,Gut-Daemme-Strasse 1,52070,Aachen
1,Franziskanergasse 5,92224,Amberg
2,Pearl-S.-Buck-Strasse 12,86156,Augsburg
3,Albert-Schenavsky-Str. 2,86165,Augsburg
4,Maximilianstrasse 83,95444,Bayreuth
5,Hilda-Geiringer-Weg 4,10557,Berlin
6,Alt Mahlsdorf 88,12623,Berlin
7,Memhardstrasse 3,10178,Berlin
8,Mildred-Harnack-Str. 11,10243,Berlin
9,Savignyplatz 5,10623,Berlin


Alright, we have our first dataframe ready! It contains the address of each L'Osteria restaurant from the website, stating their address, ZIP code and city. However, if we want to visualize the restaurants or download information of their surrounding areas, we also need their GPS locations. 
<br>
<br>
Lucky for us, we can reverse engineer the address by using the service of Nominatim that we can access via the [Geopy](https://geopy.readthedocs.io/en/stable/) library. The following code takes our existing addresses and fetches their coordinates from the Nominatim API:

In [2]:
# Import working library
from geopy.geocoders import Nominatim

# Create new variable
lats = []
longs = []

# Set geolocator and set user agent
geolocator = Nominatim(user_agent="german_italian_restaurant_analysis")

# Get temporary df
temp_df = restaurants.copy()

# Loop through df and get addresses for coordinates
for i in tqdm(range(len(temp_df))):
    # Get address
    address = temp_df.iloc[i, 0:1].values[0]
    # Get ZIP code
    zip_code = temp_df.iloc[i, 1:2].values[0]
    # Get city
    city = temp_df.iloc[i, 2:3].values[0]
    # Define full address
    full_address = "{Address}, {ZIPCode}, {City}".format(Address=address, ZIPCode=zip_code, City=city)
    # Get full location details
    location = geolocator.geocode(full_address)
    # Check if location is empty
    if location is not None:
        # Get latitude
        latitude = location.latitude
        # Append latitude
        lats.append(latitude)
    # If empty, add 0
    else:
        # Ste lats to 0
        latitude = 0
        # Append latitude
        lats.append(latitude)
    # Check if location is empty
    if location is not None:
        # Get longitude
        longitude = location.longitude
        # Append longitude
        longs.append(longitude)
    # If empty, add 0
    else:
        longitude = 0
        # Append finding to df
        longs.append(longitude)


# Add longs and lats to restaurant df
restaurants['LATITUDE'] = lats
restaurants['LONGITUDE'] = longs

# Visualize
restaurants.head()

100%|██████████| 121/121 [01:02<00:00,  1.93it/s]


Unnamed: 0,STREET,ZIP_CODE,CITY,LATITUDE,LONGITUDE
0,Gut-Daemme-Strasse 1,52070,Aachen,50.797335,6.10745
1,Franziskanergasse 5,92224,Amberg,49.446234,11.855468
2,Pearl-S.-Buck-Strasse 12,86156,Augsburg,48.371837,10.865182
3,Albert-Schenavsky-Str. 2,86165,Augsburg,48.381857,10.931547
4,Maximilianstrasse 83,95444,Bayreuth,49.944196,11.571159


As we can see, downloading all GPS coordinates took us ~1 minute. Usually, this would be much faster. However, Nominatim sometimes cannot find the addresses provided which is why we have added a simple if loop in the code above to intercept these cases. Our loop adds a "0" to latitude and longitude when Nominatim cannot find the address. So let us check how many addresses are affected:

In [3]:
# Start counter
counter = 0

# Loop through dataset
for i in tqdm(range(len(restaurants.index))):
    # If 0 in latitude, print street name
    if restaurants.iloc[i, 3:4].values[0] == 0:
        # Print address
        print(restaurants.iloc[i, 0:1].values[0])
        # Add 1 to counter
        counter += 1
    # If large 0
    else:
        # Do nothing
        continue

# Spacer
print() 

# Show counter
print("We are missing coordinates for {counter} restaurants.".format(counter=counter))

# Calculate share of dataset
print("That is {:.2f}% of our dataset.".format(counter/121*100))

100%|██████████| 121/121 [00:00<00:00, 5511.21it/s]

Inder Suerst 3
Bra Wo-Allee 1
Otto-Lillienthal-Strasse 19
Speicherstrasse 1
Hanauer Landstrasse 110
Bleichenbruecke 9
Bruesseler Strasse 14
Ander Untertrave 111
Anden Reeperbahnen 2
Enzian Hoefe
Duesseldorferstrasse 162
Muehlstrasse 17
Am Hafendeck 6 - 8
Ander Vorburg 1

We are missing coordinates for 14 restaurants.
That is 11.57% of our dataset.





We can observe that 14 restaurants have no coordinates. A closer look at the respective addresses shows us that some of them are written wrongly (e.g. "Ander Vorburg 1" should be "An der Vorburg 1"). We could fix these addresses and run our code section again, however, the amount of remaining restaurants with correct coordinates should be more than enough for our analysis. We will therefore exclude restaurants with missing coordinates from our data set:

In [4]:
# Get indexed with lat == 0
index_names = restaurants[(restaurants['LATITUDE'] == 0 )].index

# Kick out entries with lat == 0
restaurants.drop(index_names, inplace=True) 

# Reset index
restaurants.reset_index(drop=True, inplace=True)

# Visualize
print(restaurants.shape)
restaurants.head()

(107, 5)


Unnamed: 0,STREET,ZIP_CODE,CITY,LATITUDE,LONGITUDE
0,Gut-Daemme-Strasse 1,52070,Aachen,50.797335,6.10745
1,Franziskanergasse 5,92224,Amberg,49.446234,11.855468
2,Pearl-S.-Buck-Strasse 12,86156,Augsburg,48.371837,10.865182
3,Albert-Schenavsky-Str. 2,86165,Augsburg,48.381857,10.931547
4,Maximilianstrasse 83,95444,Bayreuth,49.944196,11.571159


Great! Our L'Osteria data seems to be ready for further analysis. However, before moving on, we will conduct a last check and visualize the GPS coordinates on a map. There are a lot of different ways to visualize geospatial data on a map via Python. For this project, we will use the [Kepler.gl](https://kepler.gl/) tool. It is build on [Mapbox](https://www.mapbox.com/) and can visualize millions of data points as well as has an aggregation function that can be handy for analysis. 

In [5]:
# Load an empty map
from keplergl import KeplerGl
losteria_map = KeplerGl(height=600)

#Load config
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': '73uc1wa', 'type': 'point', 'config': {'dataId': 'losteria_restaurants', 'label': 'LOSTERIA_RESTAURANTS', 'color': [181, 18, 65], 'columns': {'lat': 'LATITUDE', 'lng': 'LONGITUDE', 'altitude': None}, 'isVisible': True, 'visConfig': {'radius': 10, 'fixedRadius': False, 'opacity': 0.5, 'outline': False, 'thickness': 2, 'strokeColor': None, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radiusRange': [0, 50], 'filled': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'losteria_restaurants': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 50.489182321984785, 'longitude': 11.880129742841573, 'pitch': 0, 'zoom': 4.289649820829549, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}

# Add data
losteria_map.add_data(data=restaurants, name='losteria_restaurants')

# Add config
losteria_map.config = config

# Execute
losteria_map

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': '73uc1wa', 'type': …

The first thing that catches our eye is that all restaurants seem to be located in Germany. That is odd as we know that the chain has expanded to other European countries already. So what went wrong? After checking the L'Osteria website again, we can see that international restaurants are shown on a different list that we have not downloaded yet. However, as the amount of international restaurants is still small and we should have sufficient data already, we can choose to ignore them for now in order to keep the data set simpler. 
<br>
<br>
Second, we can observe that the majority of restaurants is located in the west and south of Germany. It is especially interesting to see that there is a very high concentration of restaurants in North Rhine Westphalia (the state next to The Netherlands and Belgium). We should keep that in mind during our further analysis as there seems to be something interesting about that region.

## UBER H3 
I was lucky enough to travel a lot in my youth but it is still impossible for me to tell whether a restaurant is located in a trendy neighborhood or not. To draw any conclusions on the neighborhoods the restaurants are located in, I would have to google their locations and research the respective city as a whole. However, L'Osteria has over 120 restaurants in Europe - it would therefore take me days or even weeks to conduct a respective analysis. 
<br>
<br>
Since this is anything else but ressource efficient, we have to find a way of speeding things up. In order to analyse the location of each restaurant, we need to define a radius from which we want to draw conclusions from. One approach would be to get shapefiles for each neighborhood in each city L'Osteria has a restaurant in and then download information related to these neigborhoods. But good luck trying to get these shapefile for Germany. Here, every state has its own data service - one more confusing than the other. Collecting the necessary data would take weeks and provide us with a minimal granularity.
<br>
<br>
This is where Uber's H3 hexagonial mapping framework comes in play. At its core, H3 is a geospatial analysis tool that provides a global index of hirachically ordered hexagons. Just imagine that you would map the entire earth (pole to pole) in hexagonial polygons. Now, each hexagon would receive a unique ID so that you could exactly tell what place on earth belongs to which hexagon. And finally, think of different layers of hexagons. Each layer is much smaller and more precise, enabling you to drill up and down on your map. That is exactly what H3 does - it provides you unique hexagons with different resolutions, all the way down to a precision of 0.0000009 km² area. 
<br>
<br>
So how is this helping us? By knowing the GPS coordinates of the L'Osteria restaurants, we can assign them to an unique hexagon. This hexagon will then serve as our designated radius for which we will download further data for our analysis. The H3 framework also helps us to map any city on earth with the same size of hexagons, making it easier for us to compare our results. 
<br>
<br>
But enough of the chit-chat! Let us begin with mapping the coordinates of our L'Osteria restaurants against the H3 index. To do so, we can simply use the [H3 library](https://github.com/uber/h3-py). The library provides a function that does the mapping for us. However, besides providing the coordinates, we also need to define the resolution of the hexagon we want to map with. The H3 index has 16 unique [resolutions](https://h3geo.org/docs/core-library/restable/) that we can choose from:

In [6]:
# Fetch table from H3 geo website
h3_resolution_table = pd.read_html('https://h3geo.org/docs/core-library/restable/')

# Visualize
h3_resolution_table[0]

Unnamed: 0,H3 Resolution,Average Hexagon Area (km2),Average Hexagon Edge Length (km),Number of unique indexes
0,0,4250547.0,1107.712591,122
1,1,607221.0,418.676005,842
2,2,86745.85,158.244656,5882
3,3,12392.26,59.810858,41162
4,4,1770.324,22.606379,288122
5,5,252.9034,8.544408,2016842
6,6,36.12905,3.229483,14117882
7,7,5.161293,1.22063,98825162
8,8,0.7373276,0.461355,691776122
9,9,0.1053325,0.174376,4842432842


As described above, the resolution goes all the way down to 0.0000009 km² or 0.9 m² respectively. That is extremely small and would not serve our purpose well as we aim to analyze larger areas around our restaurants. By looking at a map, we can visually see that the H3 resolution 7 with an area of roughly 5km² could be a good fit as it should cover the restaurants' general catchment area. 
<br>
<br>
In the following step, we will download the unique hexagons for each restaurant and visualize them on our kepler map:

In [7]:
# Import working directories
import h3

# Creat empty variable
h3_indexes = []

# Download H3 indexes
for i in tqdm(range(len(restaurants.index))):
    # Get lat
    lat = restaurants.iloc[i, 3:4][0]
    # Get lng
    lng = restaurants.iloc[i, 4:5][0]
    # Get H3 index
    h3_index = h3.geo_to_h3(
        lat=lat,
        lng=lng,
        resolution=7
    )
    # Store in variable
    h3_indexes.append(h3_index)

# Store in restaurants variable
restaurants['H3_INDEX'] = h3_indexes

# Visualize
from keplergl import KeplerGl
losteria_map_v2 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 've9o5ju', 'type': 'hexagonId', 'config': {'dataId': 'losteria_restaurants', 'label': 'LOSTERIA_RESTAURANTS_HEXAGONS', 'color': [249, 196, 196], 'columns': {'hex_id': 'H3_INDEX'}, 'isVisible': True, 'visConfig': {'opacity': 0.5, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'coverage': 1, 'enable3d': False, 'sizeRange': [0, 500], 'coverageRange': [0, 1], 'elevationScale': 5, 'enableElevationZoomFactor': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'coverageField': None, 'coverageScale': 'linear'}}, {'id': '73uc1wa', 'type': 'point', 'config': {'dataId': 'losteria_restaurants', 'label': 'LOSTERIA_RESTAURANTS', 'color': [181, 18, 65], 'columns': {'lat': 'LATITUDE', 'lng': 'LONGITUDE', 'altitude': None}, 'isVisible': True, 'visConfig': {'radius': 10, 'fixedRadius': False, 'opacity': 0.5, 'outline': False, 'thickness': 2, 'strokeColor': None, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radiusRange': [0, 50], 'filled': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'losteria_restaurants': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 53.56581943999632, 'longitude': 9.954395594799491, 'pitch': 0, 'zoom': 11.223547206164303, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
losteria_map_v2.add_data(data=restaurants, name='losteria_restaurants') # refresh data
losteria_map_v2.config = config # refresh config
losteria_map_v2 # execute

100%|██████████| 107/107 [00:00<00:00, 2680.64it/s]

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter





KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 've9o5ju', 'type': …

When we zoom in (e.g. to the city of Hamburg), we can observe that we have successfully added the hexagons to our restaurants' locations. However, we can see that the hexagons are not centered around our restaurants' locations and thus cover large areas that probably do not account as the restaurants't catchment areas. To fix this issue, we will step down in our resolution and check again:

In [8]:
# Import working directories
import h3

# Creat empty variable
h3_indexes = []

# Download H3 indexes
for i in tqdm(range(len(restaurants.index))):
    # Get lat
    lat = restaurants.iloc[i, 3:4][0]
    # Get lng
    lng = restaurants.iloc[i, 4:5][0]
    # Get H3 index
    h3_index = h3.geo_to_h3(
        lat=lat,
        lng=lng,
        resolution=9
    )
    # Store in variable
    h3_indexes.append(h3_index)

# Store in restaurants variable
restaurants['H3_INDEX'] = h3_indexes

# Visualize
from keplergl import KeplerGl
losteria_map_v3 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 've9o5ju', 'type': 'hexagonId', 'config': {'dataId': 'losteria_restaurants', 'label': 'LOSTERIA_RESTAURANTS_HEXAGONS', 'color': [249, 196, 196], 'columns': {'hex_id': 'H3_INDEX'}, 'isVisible': True, 'visConfig': {'opacity': 0.5, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'coverage': 1, 'enable3d': False, 'sizeRange': [0, 500], 'coverageRange': [0, 1], 'elevationScale': 5, 'enableElevationZoomFactor': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'coverageField': None, 'coverageScale': 'linear'}}, {'id': '73uc1wa', 'type': 'point', 'config': {'dataId': 'losteria_restaurants', 'label': 'LOSTERIA_RESTAURANTS', 'color': [181, 18, 65], 'columns': {'lat': 'LATITUDE', 'lng': 'LONGITUDE', 'altitude': None}, 'isVisible': True, 'visConfig': {'radius': 10, 'fixedRadius': False, 'opacity': 0.5, 'outline': False, 'thickness': 2, 'strokeColor': None, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radiusRange': [0, 50], 'filled': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'losteria_restaurants': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 53.56581943999632, 'longitude': 9.954395594799491, 'pitch': 0, 'zoom': 11.223547206164303, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
losteria_map_v3.add_data(data=restaurants, name='losteria_restaurants') # refresh data
losteria_map_v3.config = config # refresh config
losteria_map_v3 # execute

100%|██████████| 107/107 [00:00<00:00, 2614.95it/s]

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter





KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 've9o5ju', 'type': …

We have now used resolution 9 which is covering an area of a little bit more than 0.1km². We can observe that the hexagons are now lying much closer to our locations. However, we still have the problem that the hexagons' centers are far away from our restaurants' locations. We now also have the problem that they cover a much smaller area that would give us less information and might poorly reflect the catchment area of our restaurants. To counter these issues, we will download the hexagons' neighbours and assign them to our data frame. This way we will expand our search radius.

In [9]:
# Degree of neighbouring hexagons
k_neighbours = 2

# Creat empty variable
h3_indexes = pd.DataFrame()

# Get neighbours per hexagon
for i in tqdm(range(len(restaurants.index))):
    # Get H3 index
    h3_index = restaurants.iloc[i, 5:6][0]
    # Get neighbouring hexagon indexes
    h3_neighbours = h3.k_ring(h3_index, k_neighbours)
    # Convert to df
    h3_neighbours = pd.DataFrame(h3_neighbours)
    # Get restaurant information
    street = restaurants.iloc[i, 0:1][0]
    zip_code = restaurants.iloc[i, 1:2][0]
    city = restaurants.iloc[i, 2:3][0]
    lat = restaurants.iloc[i, 3:4][0]
    lng = restaurants.iloc[i, 4:5][0]
    # Store in df
    h3_neighbours['STREET'] = street
    h3_neighbours['ZIP_CODE'] = zip_code
    h3_neighbours['CITY'] = city
    h3_neighbours['LATITUDE'] = lat
    h3_neighbours['LONGITUDE'] = lng
    # Append empty variable
    h3_indexes = h3_indexes.append(h3_neighbours)

# Rename column
h3_indexes.rename(columns={0:'H3_INDEX'}, inplace=True)

# Reset index
h3_indexes.reset_index(drop=True, inplace=True)

# Merge with restaurants variable
restaurants_hexagons = restaurants.merge(h3_indexes, how='right', on=['STREET','ZIP_CODE', 'CITY', 'LATITUDE', 'LONGITUDE', 'H3_INDEX'])

# Visualize
from keplergl import KeplerGl
losteria_map_v4 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': '73uc1wa', 'type': 'point', 'config': {'dataId': 'losteria_restaurants', 'label': 'LOSTERIA_RESTAURANTS', 'color': [181, 18, 65], 'columns': {'lat': 'LATITUDE', 'lng': 'LONGITUDE', 'altitude': None}, 'isVisible': True, 'visConfig': {'radius': 10, 'fixedRadius': False, 'opacity': 0.5, 'outline': False, 'thickness': 2, 'strokeColor': None, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radiusRange': [0, 50], 'filled': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear'}}, {'id': '1fxg53d', 'type': 'hexagonId', 'config': {'dataId': 'losteria_restaurants_hexagons', 'label': 'LOSTERIA_RESTAURANTS_HEXAGONS', 'color': [249, 196, 196], 'columns': {'hex_id': 'H3_INDEX'}, 'isVisible': True, 'visConfig': {'opacity': 0.5, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'coverage': 1, 'enable3d': False, 'sizeRange': [0, 500], 'coverageRange': [0, 1], 'elevationScale': 5, 'enableElevationZoomFactor': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'coverageField': None, 'coverageScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'losteria_restaurants': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}], 'losteria_restaurants_hexagons': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 53.56581943999632, 'longitude': 9.954395594799491, 'pitch': 0, 'zoom': 11.223547206164303, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
losteria_map_v4.add_data(data=restaurants, name='losteria_restaurants') # refresh data
losteria_map_v4.add_data(data=restaurants_hexagons, name='losteria_restaurants_hexagons') # new data
losteria_map_v4.config = config # refresh config
losteria_map_v4 # execute

100%|██████████| 107/107 [00:00<00:00, 136.26it/s]


User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': '73uc1wa', 'type': …

So what happened now? We added neighbouring hexagons of the degree 2 or in other words, we added our initial hexagons' neighbours and their neighbours. In result, we receive a much larger area that a) moves our restaurants'locations more towards the centre of our hexagonial area and b) expands the hexagons to reflect a larger catchment area. With this our H3 data set should be ready and we can move to the next step!

## OPEN STREET MAP
We can now analyze the area of our L'Osteria locations by downloading the information linked to the initial as well as neighbouring hexagons. However, if we want to analyze a specific area such as a city, we cannot use this approach straight forward since we are limited to borders that are not considered under H3. The closer we get to the borders of a city, the more hexagons we would observe that are located outside of the city's borders. Hence, our data could be distorted. So what do we do?
<br>
<br>
We need shape files. A shape file is a specific data format that allows you to handle geospatial data. In our case, we are specifically interested in shapefiles that contain border information of European cities. There are many data providers that offer this information in form of a shape file. Since we are on a budget, we will use the shape files provided by [OpenStreetMap](https://www.openstreetmap.de/). OpenStreetMap is an international project founded in 2004 with the goal of creating a free map of the world. For this purpose they collect data about roads, railroads, rivers, forests, houses and much more worldwide. However, the shapefiles we are interested in are provided by [Geofabrik](https://www.geofabrik.de/) which is a German provider who built its service based on OpenStreetMap. Here, you can download shapefiles of various countries, regions and cities for free.
<br>
<br>
L'Osteria has recently started adding locations in the UK to their portfolio. However, so far there has been no new restaurant in London itself. So guess what we will do - right, we are going to analyse where in London would be a potentially good location for a new L'Osteria restaurant! 
<br>
<br>
So in the first step, we are going to download the necessary shape file from Geofabrik. You can do so via this [link](https://download.geofabrik.de/europe/great-britain/england/greater-london.html). After the download, the file needs to be extracted within the working directory. It then can be read via the [Geopandas](https://geopandas.org/) library. The shapefile we are interested in is called "gis_osm_places_a_free_1.shp" and contains multiple shapes of the city of London in form of polygons. Since we are not interested in multiple different shapes that cluster London, we need to select the right shape with the name "London":

In [10]:
# Import working libraries
import geopandas as gpd

# Set path
path = r'C:\Users\maurice.buettgenbach\OneDrive - Aquila Capital Management GmbH\Desktop\Desktop\Private\IBM\10_Capstone project\IBM_The_battle_of_neighborhoods_v2\LONDON_SHAPE_FILES\gis_osm_places_a_free_1.shp'

# Read file
london_shape = gpd.read_file(path)

# Isolate city polygon
london_shape = london_shape[(london_shape['name'] == 'London')]

# Visualize
from keplergl import KeplerGl
london_map_v1 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': 'geojson', 'config': {'dataId': 'london_shape_file', 'label': 'London', 'color': [179, 173, 158], 'columns': {'geojson': 'geometry'}, 'isVisible': True, 'visConfig': {'opacity': 0.8, 'strokeOpacity': 0.8, 'thickness': 2, 'strokeColor': [192, 192, 192], 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radius': 10, 'sizeRange': [0, 10], 'radiusRange': [0, 50], 'heightRange': [0, 500], 'elevationScale': 1, 'enableElevationZoomFactor': True, 'stroked': True, 'filled': False, 'enable3d': False, 'wireframe': False}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'heightField': None, 'heightScale': 'linear', 'radiusField': None, 'radiusScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'losteria_restaurants': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}], 'losteria_restaurants_hexagons': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}], 'london_shape_file': [{'name': 'osm_id', 'format': None}, {'name': 'code', 'format': None}, {'name': 'fclass', 'format': None}, {'name': 'name', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 51.54868482103512, 'longitude': -0.054559532218202485, 'pitch': 0, 'zoom': 9.579370455413446, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
london_map_v1.add_data(data=london_shape, name='london_shape_file')
london_map_v1.config = config # refresh config
london_map_v1 # execute

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': …

We can observe that our approach was successful - with the exception that the city centre is missing a piece. The shape seems to be excluding the district "City of London". The documentation of Geofabrik says that all "city" shapes follow official boundaries. A quick research shows that the City of London has a different politcal status than the rest of London and hence is treated as its own "city in the city". It is therefore not covered by the shapefile. 
<br>
<br>
Since we want to cover the entire city, we need to find a solution on how to eliminate the donut hole in our polygon. Good thing that there is stackoverflow! Someone apparently already had this problem, so we will use [this code](https://stackoverflow.com/questions/61427797/filling-a-hole-in-a-multipolygon-with-shapely-netherlands-2-digit-postal-codes) and amend it to our variables:

In [11]:
# Import working libraries
from shapely.geometry import Polygon

# Define function
def close_holes(poly: Polygon) -> Polygon:
        """
        Close polygon holes by limitation to the exterior ring.
        Args:
            poly: Input shapely Polygon
        Example:
            df.geometry.apply(lambda p: close_holes(p))
        """
        if poly.interiors:
            return Polygon(list(poly.exterior.coords))
        else:
            return poly

# Apply function
london_shape = london_shape.geometry.apply(lambda p: close_holes(p))

# Convert to geopandas df
london_shape = gpd.GeoDataFrame(london_shape)

# Copy for later mapping
london_border = london_shape.copy() # copy

# Visualize
from keplergl import KeplerGl
london_map_v2 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': 'geojson', 'config': {'dataId': 'london_shape_file', 'label': 'London', 'color': [179, 173, 158], 'columns': {'geojson': 'geometry'}, 'isVisible': True, 'visConfig': {'opacity': 0.8, 'strokeOpacity': 0.8, 'thickness': 2, 'strokeColor': [192, 192, 192], 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radius': 10, 'sizeRange': [0, 10], 'radiusRange': [0, 50], 'heightRange': [0, 500], 'elevationScale': 1, 'enableElevationZoomFactor': True, 'stroked': True, 'filled': False, 'enable3d': False, 'wireframe': False}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'heightField': None, 'heightScale': 'linear', 'radiusField': None, 'radiusScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'losteria_restaurants': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}], 'losteria_restaurants_hexagons': [{'name': 'STREET', 'format': None}, {'name': 'ZIP_CODE', 'format': None}, {'name': 'CITY', 'format': None}, {'name': 'LATITUDE', 'format': None}, {'name': 'LONGITUDE', 'format': None}], 'london_shape_file': [{'name': 'osm_id', 'format': None}, {'name': 'code', 'format': None}, {'name': 'fclass', 'format': None}, {'name': 'name', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 51.54868482103512, 'longitude': -0.054559532218202485, 'pitch': 0, 'zoom': 9.579370455413446, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
london_map_v2.add_data(data=london_shape, name='london_shape_file')
london_map_v2.config = config # refresh config
london_map_v2 # execute

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': …

Now that we have one shape covering entire London, we can begin the download the H3 hexagons for London. We will do so by using the "polyfill" function of the H3 python library. This function will "fill up" our shape with unique hexagons until the shape is completely covered:

In [12]:
# Convert to json
london_geojson = london_shape.to_json()

# Convert do dict
london_geojson = pd.read_json(london_geojson)

# Isolate polygon
london_poly = london_geojson['features'][0]['geometry'].copy() 

# Set resolution
resolution = 9

# Get h3 indexes
london_h3_hex = h3.polyfill_geojson(london_poly, resolution) 

# Convert to pandas df
london_h3_hex = pd.DataFrame(london_h3_hex)

# Show df shape
print(london_h3_hex.shape)

# Visualize
from keplergl import KeplerGl
london_map_v3 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': 'geojson', 'config': {'dataId': 'london_shape_file', 'label': 'LONDON_BORDER', 'color': [179, 173, 158], 'columns': {'geojson': 'geometry'}, 'isVisible': True, 'visConfig': {'opacity': 0.8, 'strokeOpacity': 0.8, 'thickness': 2, 'strokeColor': [192, 192, 192], 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radius': 10, 'sizeRange': [0, 10], 'radiusRange': [0, 50], 'heightRange': [0, 500], 'elevationScale': 1, 'enableElevationZoomFactor': True, 'stroked': True, 'filled': False, 'enable3d': False, 'wireframe': False}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'heightField': None, 'heightScale': 'linear', 'radiusField': None, 'radiusScale': 'linear'}}, {'id': 'akq9on', 'type': 'hexagonId', 'config': {'dataId': 'london_h3_hexagons', 'label': 'LONDON_H3_RES_9', 'color': [47, 197, 204], 'columns': {'hex_id': 'column_0'}, 'isVisible': True, 'visConfig': {'opacity': 0.05, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'coverage': 1, 'enable3d': False, 'sizeRange': [0, 500], 'coverageRange': [0, 1], 'elevationScale': 5, 'enableElevationZoomFactor': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'coverageField': None, 'coverageScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'london_shape_file': [], 'london_h3_hexagons': [{'name': 'column_0', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 51.49025098481283, 'longitude': -0.11004399497071785, 'pitch': 0, 'zoom': 9.031311538496494, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
london_map_v3.add_data(data=london_border, name='london_shape_file')
london_map_v3.add_data(data=london_h3_hex, name='london_h3_hexagons')
london_map_v3.config = config # refresh config
london_map_v3 # execute

(16890, 1)
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': …

In result, our London shape has been filled up with a total of 16,890 unique hexagons with the resolution 9. When you scroll in and have a closer look at the shape's border, you will see that the hexagons are sometimes overlapping the border or that some hexagons are missing. This is correct since the polyfill function excludes hexagons whose majority does not lie within the given shape. In result, the H3 mapping approach is getting imprecise around a shape's border. However, for our purpose this is negligible as we are going to add a few more hexagons to simulate catchment areas. To do that, we can simply recycle our code from above:

In [13]:
# Degree of neighbouring hexagons
k_neighbours = 2

# Creat empty variable
h3_indexes = pd.DataFrame()

# Get neighbours per hexagon
for i in tqdm(range(len(london_h3_hex.index))):
    # Get H3 index
    h3_index = london_h3_hex.iloc[i, 0:1][0]
    # Get neighbouring hexagon indexes
    h3_neighbours = h3.k_ring(h3_index, k_neighbours)
    # Convert to df
    h3_neighbours = pd.DataFrame(h3_neighbours)
    # Store initial hexagon in df
    h3_neighbours['INITIAL_HEXAGON'] = h3_index
    # Append empty variable
    h3_indexes = h3_indexes.append(h3_neighbours)

# Rename column
h3_indexes.rename(columns={0:'H3_INDEX'}, inplace=True)

# Reset index
h3_indexes.reset_index(drop=True, inplace=True)

# Update variable
london_h3_hex = h3_indexes

# Isolate unique hexagons for download and mapping
london_h3_hex_unique = pd.DataFrame(london_h3_hex['H3_INDEX'].unique())

# Refresh border data
london_border = pd.DataFrame(london_border) # convert to df

# Visualize shape
print(london_h3_hex.shape)
# Visualize kepler map
from keplergl import KeplerGl
london_map_v4 = KeplerGl(height=600)
config = {'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': 'geojson', 'config': {'dataId': 'london_shape_file', 'label': 'LONDON_BORDER', 'color': [179, 173, 158], 'columns': {'geojson': 'geometry'}, 'isVisible': True, 'visConfig': {'opacity': 0.8, 'strokeOpacity': 0.8, 'thickness': 2, 'strokeColor': [192, 192, 192], 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'strokeColorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'radius': 10, 'sizeRange': [0, 10], 'radiusRange': [0, 50], 'heightRange': [0, 500], 'elevationScale': 1, 'enableElevationZoomFactor': True, 'stroked': True, 'filled': False, 'enable3d': False, 'wireframe': False}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'strokeColorField': None, 'strokeColorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'heightField': None, 'heightScale': 'linear', 'radiusField': None, 'radiusScale': 'linear'}}, {'id': 'akq9on', 'type': 'hexagonId', 'config': {'dataId': 'london_h3_hexagons', 'label': 'LONDON_H3_RES_9', 'color': [47, 197, 204], 'columns': {'hex_id': 'column_0'}, 'isVisible': True, 'visConfig': {'opacity': 0.05, 'colorRange': {'name': 'Global Warming', 'type': 'sequential', 'category': 'Uber', 'colors': ['#5A1846', '#900C3F', '#C70039', '#E3611C', '#F1920E', '#FFC300']}, 'coverage': 1, 'enable3d': False, 'sizeRange': [0, 500], 'coverageRange': [0, 1], 'elevationScale': 5, 'enableElevationZoomFactor': True}, 'hidden': False, 'textLabel': [{'field': None, 'color': [255, 255, 255], 'size': 18, 'offset': [0, 0], 'anchor': 'start', 'alignment': 'center'}]}, 'visualChannels': {'colorField': None, 'colorScale': 'quantile', 'sizeField': None, 'sizeScale': 'linear', 'coverageField': None, 'coverageScale': 'linear'}}], 'interactionConfig': {'tooltip': {'fieldsToShow': {'london_shape_file': [], 'london_h3_hexagons': [{'name': 'column_0', 'format': None}]}, 'compareMode': False, 'compareType': 'absolute', 'enabled': True}, 'brush': {'size': 0, 'enabled': False}, 'geocoder': {'enabled': False}, 'coordinate': {'enabled': False}}, 'layerBlending': 'normal', 'splitMaps': [], 'animationConfig': {'currentTime': None, 'speed': 1}}, 'mapState': {'bearing': 0, 'dragRotate': False, 'latitude': 51.49025098481283, 'longitude': -0.11004399497071785, 'pitch': 0, 'zoom': 9.031311538496494, 'isSplit': False}, 'mapStyle': {'styleType': 'light', 'topLayerGroups': {}, 'visibleLayerGroups': {'label': True, 'road': True, 'border': True, 'building': True, 'water': True, 'land': True, '3d building': False}, 'threeDBuildingColor': [218.82023004728686, 223.47597962276103, 223.47597962276103], 'mapStyles': {}}}}
london_map_v4.add_data(data=london_border, name='london_shape_file')
london_map_v4.add_data(data=london_h3_hex_unique, name='london_h3_hexagons')
london_map_v4.config = config # refresh config
london_map_v4 # execute

100%|██████████| 16890/16890 [01:30<00:00, 187.26it/s]


(320910, 2)
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [{'id': 'ci1qt6n', 'type': …

Alright, what happened? In the first step, we have assigned all of the hexagons in our London shape their neighbouring hexagons with a degree of two. In result, we received a much larger df. Is that right? Yes, it is because each unique hexagon is also included as a neighbour of another unique hexagon. We therefore have a lot of overlapping catchment areas and hence a much larger dataframe. 
<br>
<br>
In the second step, we have isolated the unique hexagons from our larger df. The result can be seen on the map above: We now have hexagons that are outside of our shape. These hexagons have been added because they are degree two neighbours of the hexagons located close to the shape's borders. This way we were able to solve the issue that some hexagons were excluded by the polyfill function without loosing the initial shape's appearance. 
<br>
<br>
We are now ready to make the next step: Adding actual data!

## FOURSQUARE
To add data to our hexagons, we will use [Foursquare](https://foursquare.com/) which is an app that lets its users rank all different kinds of venues all over the world. By doing so, Foursquare has build one of the most comprehensive geospatial data sets in the world. The data set covers an extremely large amount of venue and user data that allows you to access everything from a venue's name, location or menu up to user ratings, comments or pictures of each venue. And the best part: Foursquare provides developers an [API](https://developer.foursquare.com/) that is (in limits) free to use! 
<br>
<br>
Even though Foursquare provides an incredible variety of data, we will use the venue categories (e.g. sushi restaurant, park, sports club, italian restaurant, etc.) for our analysis only. While it would most likely increase the success of our algorithm to include more information such as rankings or types of user data, we will not include this data due to ressource limitations.
<br>
<br>
We will use the Foursquare data to analyze the area of the hexagons (or neighbouring hexagons) in which L'Osteria has existing restaurants as well as the hexagons of the new city we are interested in. This way we will be able to cluster the hexagons according to their characteristics.
<br>
<br>
Ok, let's start with the basics: Adding our user data! However, I will not include my user credentials in the code due to privacy issues. Instead, I will load them from a txt file I have prepared. The textfile is not stored on the repository. So if you want to replicate my steps, you would have to do create your own txt files and load your credentials seperately.

In [14]:
# Set credentials
client_id = list(pd.read_csv(r'C:\Users\maurice.buettgenbach\OneDrive - Aquila Capital Management GmbH\Desktop\Desktop\Private\IBM\10_Capstone project\IBM_The_battle_of_neighborhoods_v2\FOURSQUARE_LOGIN_DETAILS\FOURSQUARE_CLIENT_ID.txt').columns)[0]
client_secret = list(pd.read_csv(r'C:\Users\maurice.buettgenbach\OneDrive - Aquila Capital Management GmbH\Desktop\Desktop\Private\IBM\10_Capstone project\IBM_The_battle_of_neighborhoods_v2\FOURSQUARE_LOGIN_DETAILS\FOURSQUARE_CLIENT_SECRET.txt').columns)[0]
access_token = list(pd.read_csv(r'C:\Users\maurice.buettgenbach\OneDrive - Aquila Capital Management GmbH\Desktop\Desktop\Private\IBM\10_Capstone project\IBM_The_battle_of_neighborhoods_v2\FOURSQUARE_LOGIN_DETAILS\FOURSQUARE_ACCESS_TOKEN.txt').columns)[0]

# Check out
print("Foursquare credentials loaded.")

Foursquare credentials loaded.


With our login details ready, we are able to download data now. Foursquare downloads the information based on GPS coordinates. At the moment we have only coordinates of the existing restaurants themselves. Since we want to base our analysis on the defined hexagons, this information is not enough. Before we can start downloading data, we need to get the correct GPS coordinates.
<br>
<br>
Depending on the size of the hexagon, it can contain more than a million different GPS coordinates. So choosing the right one is crucial. The Foursquare API lets you download data based on a radius. So if we would choose a random set of GPS coordinates within each hexagon, we will have the risk that the different radiuses are overlapping and most likely leave vast areas uncovered. We will therefore extract the GPS coordinates of the centroids for each hexagon. This way we will be able to adjust the radius to a maximum that covers most of the hexagons' areas. 
<br>
<br>
To do so we will first merge our restaurand and london variables. Then we will use the H3 library to retrieve the hexagons' centroids:

In [15]:
# Standardize columns
london_h3_hex_unique.rename(columns={0:'H3_INDEX'}, inplace=True)
restaurants_hexagons.rename(columns={
    'STREET':'RESTAURANT_STREET',
    'ZIP_CODE':'RESTAURANT_ZIP_CODE',
    'CITY':'RESTAURANT_CITY',
    'LATITUDE':'RESTAURANT_LATITUDE',
    'LONGITUTDE':'RESTAURANT_LONGITUDE'
    }, inplace=True)


# Set frames
frames = [restaurants_hexagons, london_h3_hex_unique]

# Concatenate data sets
venue_data = pd.concat(frames)

# Get centroids' latitudes
venue_data['H3_CENTROID_LATITUDE'] = venue_data.apply(lambda row: h3.h3_to_geo(row['H3_INDEX'])[0], axis=1)

# Get centroids' longitude
venue_data['H3_CENTROID_LONGITUDE'] = venue_data.apply(lambda row: h3.h3_to_geo(row['H3_INDEX'])[1], axis=1)

# Reset index
venue_data.reset_index(drop=True, inplace=True)

# Viz
print(venue_data.shape)
venue_data.tail()

(20391, 8)


Unnamed: 0,RESTAURANT_STREET,RESTAURANT_ZIP_CODE,RESTAURANT_CITY,RESTAURANT_LATITUDE,LONGITUDE,H3_INDEX,H3_CENTROID_LATITUDE,H3_CENTROID_LONGITUDE
20386,,,,,,89194ac8437ffff,51.325372,-0.334556
20387,,,,,,89194ac8433ffff,51.327884,-0.33752
20388,,,,,,89194ac1dd3ffff,51.296847,-0.155008
20389,,,,,,89195db41abffff,51.695144,-0.083397
20390,,,,,,89194e61463ffff,51.629929,0.177963


Alrighty! We are ready to proceed and start downloading our data from Foursquare. To download data, we can use Foursquare's API that is based on an URl request. To make a successful request, we have to provide a number of information:
<br>
<br>
- Client credientials:
<br>
We have to provide the client id, secret and access token that we have created earlier. Foursquare requires this information to match the request with an existing account. 
<br>
<br>
- Latitude & longitude:
<br>
We need to provide the latitude and longitude of the GPS coordinates where we want to download data from. 
<br>
- Version:
<br>
We need to specify the version of the API we want to use. In our case, we will stick to the versioning that was available within the IBM course. This verison was '20180605'.
<br>
<br>
- Radius:
<br>
As already mentioned, the API requires us to state a raduis in which we want to download data from. Picking the right radius is crucial as a small radius will miss data and a too large radius can potentially double it. In result, our algorithm would be biased. So what's the correct radius? According to the H3 documentation, a hexagon with a resolution of 9 has a radius of 0.174375668km. Since the API requires the input in meters, we need to convert this figure to 174.375668m.
<br>
<br>
- Limit:
<br>
The API downloads all venues within the radio, starting with the most popular one. The limit setting allows us to state a maximum of vanues we want to download. As the London dataset includes more than 20k hexagons, a high limit (e.g. 100) would result in very large amounts of data. As I only have a rather old laptop to work with, we will set the limit to the top 5 venues in our radius. That would give us approx. more than 100k datapoints to work with which should be enough for our algorithm and still handable for my vintage laptop.

Finally, we need to talk about another limit - the API call limit. Foursquare data is generally for free but the API limits the calls for a regular user to 950 calls per day. A premium user with a personal subcription plan is allowed to make up to 99,500 calls per day. Both, regular and premium users, are limited to a 500 calls per hour limit.
<br>
<br>
Our venue data set has 20,391 unique hexagons. With a regular user's call limit, we would need 21 days to extract the information. Since that is way too long, I have upgraded my account to a personal non-commercial account. This plan is for free as well, however, you need to verify yourself and provide a credit card. 
<br>
<br>
While this solves our problem with the daily limit, we still have the issue with the hourly limit. To overcome this, we will partition our dataset so that it contains a maximum of 500 rows each. We then will install a timer that pauses our program for 1 hour before the next partition is downloaded. Since we have to wait one hour in between each download process, the programm will run over 40 hours in total. To avoid any data losses during the process, we will save the information for each partition as a CSV file. After all partitions have been downloaded, we will merge the datasets and continue. 
<br>
<br>
We will begin with partitioning the data. While there is most likely a more elegant way to do it, I chose to go for the brute force method and code the partitions myself. Mea culpa.

In [16]:
# Partitioning
part_1 = venue_data[:499].copy()
part_2 = venue_data[499:999].copy()
part_3 = venue_data[1000:1499].copy()
part_4 = venue_data[1500:1999].copy()
part_5 = venue_data[2000:2499].copy()
part_6 = venue_data[2500:2999].copy()
part_7 = venue_data[3000:3499].copy()
part_8 = venue_data[3500:3999].copy()
part_9 = venue_data[4000:4499].copy()
part_10 = venue_data[4500:4999].copy()
part_11 = venue_data[5000:5499].copy()
part_12 = venue_data[5500:5999].copy()
part_13 = venue_data[6000:6499].copy()
part_14 = venue_data[6500:6999].copy()
part_15 = venue_data[7000:7499].copy()
part_16 = venue_data[7500:7999].copy()
part_17 = venue_data[8000:8499].copy()
part_18 = venue_data[8500:8999].copy()
part_19 = venue_data[9000:9499].copy()
part_20 = venue_data[9500:9999].copy()
part_21 = venue_data[10000:10499].copy()
part_22 = venue_data[10500:10999].copy()
part_23 = venue_data[11000:11499].copy()
part_24 = venue_data[11500:11999].copy()
part_25 = venue_data[12000:12499].copy()
part_26 = venue_data[12500:12999].copy()
part_27 = venue_data[13000:13499].copy()
part_28 = venue_data[13500:13999].copy()
part_29 = venue_data[14000:14499].copy()
part_30 = venue_data[14500:14999].copy()
part_31 = venue_data[15000:15499].copy()
part_32 = venue_data[15500:15999].copy()
part_33 = venue_data[16000:16499].copy()
part_34 = venue_data[16500:16999].copy()
part_35 = venue_data[17000:17499].copy()
part_36 = venue_data[17500:17999].copy()
part_37 = venue_data[18000:18499].copy()
part_38 = venue_data[18500:18999].copy()
part_39 = venue_data[19000:19499].copy()
part_40 = venue_data[19500:19999].copy()
part_41 = venue_data[20000:20391].copy()

# Store in list for iteration
partitions = [
    part_1,
    part_2,
    part_3,
    part_4,
    part_5,
    part_6,
    part_7,
    part_8,
    part_9,
    part_10,
    part_11,
    part_12,
    part_13,
    part_14,
    part_15,
    part_16, 
    part_17,
    part_18,
    part_19,
    part_20,
    part_21,
    part_22,
    part_23,
    part_24,
    part_25,
    part_26,
    part_27,
    part_28,
    part_29,
    part_30,
    part_31,
    part_32,
    part_33,
    part_34,
    part_35,
    part_36,
    part_37,
    part_38,
    part_39,
    part_40,
    part_41]

Alright. We now have a list with multiple partitions of our venue dataset. Each dataframe has no more than 500 rows, making it impossible to breach the hourly download limit. The next code sections will download the data for each partition automatically. 
<br>
<br>
However, there is a catch. Normally, one section would be enough. We could set the timer to 1 hour waiting period and let it iterate over our partitions autonomously. In this case the program would run for a little more than 41 hours. In this case, however, I have decided to split the run time into 8 hour intervals. Reason for this is that I need to commute to work everyday during which the computer will loose internet connection, resulting in an error which would break the loop. Of course that is something we could implement in our code as well but as this way works too, we should try to make it not more complicated than it is already. ;-)
<br>
<br>
Alrighty! See you in 41 hours.

In [17]:
# Import working libraries
import json
import requests
import time

# Define Foursqaure settings
version = '20180605'
radius = 174.375668
limit = 5

# Set snooze for timer
snooze = 3600 # 1 hour in seconds

# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# Create empty variable
top_nearby_venues_1 = pd.DataFrame()

# Loop through data set
for partition in partitions[:8]:
    print("Starting download...")
    for i in tqdm(range(len(partition.index))):
        # Get hex id
        hex_id = partition.iloc[i, 5:6][0]
        # Get hex lat
        lat = partition.iloc[i, 6:7][0]
        # Get hex lng
        lng = partition.iloc[i, 7:8][0]
        # Construct URL    
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            lat, 
            lng, 
            access_token, 
            version, 
            radius, 
            limit)
        # Get results
        results = requests.get(url).json()
        # Get venues
        venues = results['response']['venues']
        # Convert venues dict to df
        nearby_venues = pd.json_normalize(venues)
        # Filter columns
        filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
        nearby_venues = nearby_venues.loc[:, filtered_columns]
        # Filter the category for each row
        nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)
        # Clean columns
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
        # Store hex id
        nearby_venues['H3_INDEX'] = hex_id
        # Store data in variable
        top_nearby_venues_1 = nearby_venues.append(top_nearby_venues_1)
    # Check out
    print("Download complete...")
    # Set start time
    start_time = time.time()
    print("Timer started...")
    # Start timer
    while True:
        # Set current time
        current_time = time.time()
        # Calc elapsed time
        elapsed_time = current_time - start_time                        
        # Check snooze
        if elapsed_time > snooze: # End loop after 60 min
            print("... snooze over!")
            print()
            break

# Update
print()
print("All data downloaded!")

# Save data
top_nearby_venues_1.to_csv('TOP_NEARBY_VENUES_1.csv')
# Check out
print()
print("Data saved.")
print()

# Viz
print(top_nearby_venues_1.shape)
top_nearby_venues_1.head(10)

  0%|          | 0/499 [00:00<?, ?it/s]

Starting download...


100%|██████████| 499/499 [02:01<00:00,  4.09it/s]


Download complete...
Timer started...


  0%|          | 0/500 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 500/500 [02:02<00:00,  4.07it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:04<00:00,  4.02it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:00<00:00,  4.14it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:03<00:00,  4.04it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:24<00:00,  3.45it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [03:01<00:00,  2.76it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:04<00:00,  4.01it/s]


Download complete...
Timer started...
... snooze over!


All data downloaded!

Data saved.

(19965, 5)


Unnamed: 0,name,categories,lat,lng,H3_INDEX
0,Saturn,Electronics Store,51.222585,6.780338,891fa52b2a7ffff
1,Mr. Phung,Chinese Restaurant,51.222596,6.779392,891fa52b2a7ffff
2,SEVENS,Shopping Mall,51.222573,6.779397,891fa52b2a7ffff
3,Moleskine,Stationery Store,51.222266,6.779591,891fa52b2a7ffff
4,Gran Caffè Leonardo,Café,51.22257,6.77937,891fa52b2a7ffff
0,Takumi,Ramen Restaurant,51.223429,6.788531,891fa52b3d7ffff
1,Tonkatsu Gonta,Japanese Restaurant,51.22349,6.788785,891fa52b3d7ffff
2,Modern Times,Nightclub,51.223359,6.788862,891fa52b3d7ffff
3,Kyoto - Japan Art Deco,Arts & Crafts Store,51.223554,6.788404,891fa52b3d7ffff
4,What's Beef,Burger Joint,51.223753,6.788566,891fa52b3d7ffff


In [18]:
# Import working libraries
import json
import requests
import time

# Define Foursqaure settings
version = '20180605'
radius = 174.375668
limit = 5

# Set snooze for timer
snooze = 3600 # 1 hour in seconds

# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# Create empty variable
top_nearby_venues_2 = pd.DataFrame()

# Loop through data set
for partition in partitions[8:16]:
    print("Starting download...")
    for i in tqdm(range(len(partition.index))):
        # Get hex id
        hex_id = partition.iloc[i, 5:6][0]
        # Get hex lat
        lat = partition.iloc[i, 6:7][0]
        # Get hex lng
        lng = partition.iloc[i, 7:8][0]
        # Construct URL    
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            lat, 
            lng, 
            access_token, 
            version, 
            radius, 
            limit)
        # Get results
        results = requests.get(url).json()
        # Get venues
        venues = results['response']['venues']
        # Convert venues dict to df
        nearby_venues = pd.json_normalize(venues)
        # Filter columns
        filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
        nearby_venues = nearby_venues.loc[:, filtered_columns]
        # Filter the category for each row
        nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)
        # Clean columns
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
        # Store hex id
        nearby_venues['H3_INDEX'] = hex_id
        # Store data in variable
        top_nearby_venues_2 = nearby_venues.append(top_nearby_venues_2)
    # Check out
    print("Download complete...")
    # Set start time
    start_time = time.time()
    print("Timer started...")
    # Start timer
    while True:
        # Set current time
        current_time = time.time()
        # Calc elapsed time
        elapsed_time = current_time - start_time                        
        # Check snooze
        if elapsed_time > snooze: # End loop after 60 min
            print("... snooze over!")
            print()
            break

# Update
print()
print("All data downloaded!")

# Save data
top_nearby_venues_2.to_csv('TOP_NEARBY_VENUES_2.csv')
# Check out
print()
print("Data saved.")
print()

# Viz
print(top_nearby_venues_2.shape)
top_nearby_venues_2.head(10)

  0%|          | 0/499 [00:00<?, ?it/s]

Starting download...


100%|██████████| 499/499 [03:09<00:00,  2.63it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [03:04<00:00,  2.70it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:17<00:00,  3.62it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:16<00:00,  3.67it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:12<00:00,  3.78it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:09<00:00,  3.86it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:13<00:00,  3.74it/s]


Download complete...
Timer started...


  0%|          | 0/499 [00:00<?, ?it/s]

... snooze over!

Starting download...


100%|██████████| 499/499 [02:15<00:00,  3.67it/s]


Download complete...
Timer started...
... snooze over!


All data downloaded!

Data saved.

(19960, 5)


Unnamed: 0,name,categories,lat,lng,H3_INDEX
0,Saturn,Electronics Store,51.222585,6.780338,891fa52b2a7ffff
1,Mr. Phung,Chinese Restaurant,51.222596,6.779392,891fa52b2a7ffff
2,Napapijri,Clothing Store,51.222666,6.779353,891fa52b2a7ffff
3,Gran Caffè Leonardo,Café,51.22257,6.77937,891fa52b2a7ffff
4,SEVENS,Shopping Mall,51.222573,6.779397,891fa52b2a7ffff
0,Takumi,Ramen Restaurant,51.223429,6.788531,891fa52b3d7ffff
1,Tonkatsu Gonta,Japanese Restaurant,51.22349,6.788785,891fa52b3d7ffff
2,Modern Times,Nightclub,51.223359,6.788862,891fa52b3d7ffff
3,Kyoto - Japan Art Deco,Arts & Crafts Store,51.223554,6.788404,891fa52b3d7ffff
4,What's Beef,Burger Joint,51.223753,6.788566,891fa52b3d7ffff
