# Index

[Introduction](#Introduction)<br>
[Data](#Data)<br>
[Methodology](#Methodology)<br>
[Results](#Results)<br>
[Discussion](#Discussion)<br>
[Conclusion](#Conclusion)<br>

# Introduction

I am planning a beer trip to Prague, Czech Republic.

For the trip I would like a map of all Prague districts with the beer restaurants and bars marked as points.

Each place should be marked according to its Foursquare rating: Gold (highest rating), Silver (medium rating), Bronze (lowest rating) or Green (not rated yet).

This classification will be very handy during the trip.

#### Prepare the notebook

In [1]:
!pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [2]:
import pandas as pd
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import numpy as np
import requests
from bs4 import BeautifulSoup
from pandas import json_normalize
import sys

%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

import geocoder

In [3]:
!pip install shapely
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

Collecting shapely
  Downloading Shapely-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (1.0 MB)
Installing collected packages: shapely
Successfully installed shapely-1.7.1


In [4]:
!pip install geopandas
import geopandas as gpd

Collecting geopandas
  Downloading geopandas-0.9.0-py2.py3-none-any.whl (994 kB)
Collecting fiona>=1.8
  Downloading Fiona-1.8.19-cp37-cp37m-manylinux1_x86_64.whl (15.3 MB)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.0.1-cp37-cp37m-manylinux2010_x86_64.whl (6.5 MB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.1-py3-none-any.whl (7.1 kB)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: click-plugins, cligj, munch, fiona, pyproj, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.1 fiona-1.8.19 geopandas-0.9.0 mu

#### Helper functions

In [5]:
# function that extracts the category of the venue:
# returns the name of the first category, or None if the venue has none
def get_category_type(row):
    categories_list = row['categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# Coordinates retrieve function

def get_coordinates(dataFrame, index_row):
    dict_coordinates = {}
    total_count = len(dataFrame.index)
    current = 0
    errors = 0
    for index, row in dataFrame.iterrows():
        try:
            g = geocoder.arcgis(row[index_row])
            lat = g.json['lat']
            lng = g.json['lng']
            dict_coordinates[index] = [lat, lng]
            current+=1
        except:
            errors+=1
            print ('Failed to get coordinates for {}: {}'.format(index_row, sys.exc_info()[0]))
    
    dataFrame['latitude'] = 0.0
    dataFrame['longitude'] = 0.0
    
    for k, v in dict_coordinates.items():
        dataFrame.loc[k,'latitude']=v[0]
        dataFrame.loc[k,'longitude']=v[1]
        
    print('Done: Total: {} Success: {} Error {}'.format(total_count, current, errors))

#function populate given dataframe with new attributes - rating and url    
def get_ratings(dataFrame):
    total_count = len(dataFrame.index)
    current = 0
    errors = 0
    dict_venues = {}
    for index, row in dataFrame.iterrows():
        try:
            venue_id = row['id']
            url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)
            results = requests.get(url).json()
            venues = results['response']['venue']
            
            rating = venues['rating'] if 'rating' in venues else -1
            venue_url = venues['url'] if 'url' in venues else ''
            dict_venues[index] = [rating, venue_url]
            current+=1
        except:
            errors+=1
            print ('Failed to get data')
    
    dataFrame['rating'] = 0.0
    dataFrame['url'] = ''  # url is text, so default to an empty string
    
    for k, v in dict_venues.items():
        dataFrame.loc[k,'rating']=v[0]
        dataFrame.loc[k,'url']=v[1]
        
    print('Done: Total: {} Success: {} Error {}'.format(total_count, current, errors))

#function to map rating to color
def get_color(rating):
    if rating == 'high':
        color = 'gold'
    elif rating == 'medium':
        color = 'silver'
    elif rating == 'low':
        color = '#cd7f32'  # bronze as a hex code; 'bronze' is not a CSS color name
    else:
        color = 'green'
    return color

#function check if given dataFrame object is inside of district
def check_district(dataFrame, cityDataFrame):
    total_count = len(dataFrame.index)
    current = 0
    errors = 0
    dict_venues = {}
    for index, row in dataFrame.iterrows():
        try:
       
            point = Point(row['lng'], row['lat'])
            
            pol = cityDataFrame.loc[cityDataFrame['Name'] == row['district'],'Geometry'].values[0]
            polygon = Polygon(pol)

            dict_venues[index] = polygon.contains(point)
            current+=1
        except:
            errors+=1
            print ('Failed to get data ')
    
        dataFrame.loc[index,'inDistrict'] = True
    for k, v in dict_venues.items():
        if not v:
            dataFrame.loc[k,'inDistrict'] = False
        
    print('Done: Total: {} Success: {} Error {}'.format(total_count, current, errors))

# create a GeoJSON FeatureCollection from the districts dataframe;
# each row holds a district name and the list of its border coordinates
# and becomes one Polygon feature
def create_geojson(data):    
    geojson = {
        "type": "FeatureCollection",
        "features": [
        {
            "type": "Feature",
            "geometry" : {
                "type": "Polygon",
                "name": row['Name'],
                "coordinates": [row['Geometry']]
                },
            "properties" : {'name': row['Name']},
            
         } for index, row in data.iterrows()]
    }
    
    return geojson

In [6]:
# The code was removed by Watson Studio for sharing.

# Data

To query Foursquare district by district we need the coordinates of Prague's districts, including their borders: a "central" point of each district is the origin of a search for beer-related venues within a 10 km radius.

This data comes from the Prague open data portal: http://opendata.iprpraha.cz/CUR/DTMP/TMMESTSKECASTI_P/WGS_84/TMMESTSKECASTI_P.json

For each district the data set provides the list of border coordinates (geometry.coordinates), the area (properties.PLOCHA) and the name (properties.NAZEV_MC). They are used to obtain the "central" point and to filter out venues that Foursquare returns within the radius but that lie outside the particular district.

The final list of beer places includes the Foursquare rating, on which the classification is based. Venues without a rating get the value -1 (minus one) in the data set to avoid NaN values.
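
The notebook obtains each district's "central" point by geocoding the district name. As a possible alternative, the point could be derived from the border ring itself. The sketch below assumes a ring given as `[lng, lat]` pairs, the order used in `geometry.coordinates`, and runs on a made-up square rather than on real district data; `ring_center` is a hypothetical helper, not part of the notebook.

```python
from shapely.geometry import Point, Polygon

def ring_center(ring):
    """Return (lat, lng) of a point that lies inside the ring.

    `ring` is a list of [lng, lat] pairs, the order used by GeoJSON.
    """
    polygon = Polygon(ring)
    # representative_point() always falls inside the polygon, unlike
    # centroid, which can land outside a concave shape.
    point = polygon.representative_point()
    return point.y, point.x

# Made-up square near Prague, for illustration only.
square = [[14.40, 50.07], [14.44, 50.07], [14.44, 50.09],
          [14.40, 50.09], [14.40, 50.07]]
lat, lng = ring_center(square)
print(lat, lng)
```

A point computed this way is guaranteed to lie inside the district, which a geocoded name does not promise.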

In [7]:
#Foursquare related variables
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
RADIUS = 10000

## Getting geo data

#### getting Prague coordinates

In [8]:
address = 'Prague, Czech Republic'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
prague_latitude = location.latitude
prague_longitude = location.longitude
print('The geographical coordinates of Prague are {}, {}.'.format(prague_latitude, prague_longitude))

The geographical coordinates of Prague are 50.0874654, 14.4212535.


#### getting districts of Prague

In [9]:
districts_url = 'http://opendata.iprpraha.cz/CUR/DTMP/TMMESTSKECASTI_P/WGS_84/TMMESTSKECASTI_P.json'
dist_results = requests.get(districts_url).json()  # newer Python versions reject the encoding argument
prague_districts = json_normalize(dist_results['features']) 
print('Loaded: {} districts'.format(prague_districts.shape[0]))

Loaded: 57 districts


#### converting district data to Dataframe

In [10]:
# convert the GeoJSON features into rows: name, outer border ring, area
rows = [
    [v['properties']['NAZEV_MC'].lower(),
     v['geometry']['coordinates'][0],
     v['properties']['PLOCHA']]
    for v in dist_results['features']]

df_prague_districts = pd.DataFrame(rows, columns=['Name', 'Geometry', 'Area'])

#### generating Prague dataframe

In [11]:
df_prague = df_prague_districts.set_index('Name').copy()
df_prague.sort_values('Name', inplace = True)
df_prague.reset_index(inplace=True)
df_prague.shape

(57, 3)

#### get a central point for each district (geocoded by district name)

In [12]:
get_coordinates(df_prague, 'Name')
df_prague.head()

Done: Total: 57 Success: 57 Error 0


Unnamed: 0,Name,Geometry,Area,latitude,longitude
0,praha 1,"[[14.410891049000043, 50.078674687000046], [14...",5538443.86,50.08728,14.41742
1,praha 10,"[[14.502644277000059, 50.04445519500007], [14....",18599769.32,50.06762,14.46016
2,praha 11,"[[14.54355294800007, 50.03618763800006], [14.5...",9793679.76,50.03178,14.50719
3,praha 12,"[[14.450632163000023, 50.01452735600003], [14....",23317909.03,50.00564,14.40462
4,praha 13,"[[14.320621949000042, 50.04010680700003], [14....",13196802.35,50.05163,14.34231


#### generate geojson for future visualization

In [13]:
district_geo = create_geojson(df_prague)

# Methodology

The collected venues are classified into 4 bins by rating:

highest rated - from the 75th percentile to the maximum rating

medium - from the mean to the 75th percentile

lowest - from 0 to the mean rating

not rated - rating equal to -1
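
The four bins above can be sketched as a plain function. This is only an illustration: `classify_rating` is a hypothetical helper, the thresholds passed to it are placeholders (the notebook derives the mean and the 75th percentile from the collected ratings), and upper edges are treated as inclusive, which mirrors the default behaviour of `pd.cut` used later.

```python
def classify_rating(rating, mean, q75):
    """Map a Foursquare rating to one of the four bins described above."""
    if rating < 0:        # -1 marks a venue without a rating
        return 'not rated'
    if rating <= mean:    # from 0 up to the mean
        return 'low'
    if rating <= q75:     # from the mean up to the 75th percentile
        return 'medium'
    return 'high'         # above the 75th percentile

# Placeholder thresholds, for illustration only.
for r in (-1, 5.8, 8.0, 9.1):
    print(r, classify_rating(r, mean=7.5, q75=8.5))
```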

### Process one district - 'Praha 1' as a pilot

In [14]:
query = 'beer' #search query
query_lat = df_prague.loc[df_prague['Name'] == 'praha 1','latitude'].values[0]
query_long = df_prague.loc[df_prague['Name'] == 'praha 1','longitude'].values[0]
name = df_prague.loc[df_prague['Name'] == 'praha 1','Name'].values[0]
print(name, query_lat, query_long)

praha 1 50.08728000000008 14.41742000000005


#### query Foursquare

In [15]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, query_lat, query_long,ACCESS_TOKEN, VERSION, query, RADIUS, LIMIT)
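
The URL above is assembled with string formatting, which works for a simple query such as `beer` but does not escape spaces or special characters. A minimal sketch of the same request built with `urllib.parse.urlencode` follows; `build_search_url` is a hypothetical helper, and the credentials are placeholders, not real keys.

```python
from urllib.parse import urlencode

def build_search_url(lat, lng, query, radius, limit, credentials,
                     version='20180605'):
    """Build a venues/search URL with escaped query parameters."""
    params = {
        'll': '{},{}'.format(lat, lng),
        'query': query,
        'radius': radius,
        'limit': limit,
        'v': version,
    }
    # credentials carries client_id, client_secret and oauth_token
    params.update(credentials)
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials, for illustration only.
fake_credentials = {'client_id': 'ID', 'client_secret': 'SECRET',
                    'oauth_token': 'TOKEN'}
search_url = build_search_url(50.08728, 14.41742, 'craft beer',
                              10000, 100, fake_credentials)
print(search_url)
```

Keeping the parameters in a dictionary also makes it easier to change the radius or the query per district without editing a long format string.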

#### request data

In [16]:
results = requests.get(url).json()

#### get venues

In [17]:
venues = results['response']['venues']

In [18]:
nearby_venues = pd.json_normalize(venues)  #create venues dataframe

In [19]:
nearby_venues.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,venuePage.id,location.neighborhood,location.crossStreet
0,4d7b811e86cfa1432e7fb5a0,Letná Beer Garden,"[{'id': '4bf58dd8d48988d117941735', 'name': 'B...",v-1618338717,False,Kostelní,50.096031,14.425448,"[{'label': 'display', 'lat': 50.09603117382258...",1130,170 00,CZ,Praha,Hlavní město Praha,Česká republika,"[Kostelní, 170 00 Praha, Česká republika]",,,
1,533830af498e2e6499e78d4e,Czech Beer Museum Prague,"[{'id': '4bf58dd8d48988d190941735', 'name': 'H...",v-1618338717,False,Husova 7,50.084961,14.417916,"[{'label': 'display', 'lat': 50.08496051229967...",260,110 00,CZ,Praha,Hlavní město Praha,Česká republika,"[Husova 7, 110 00 Praha, Česká republika]",90452131.0,,
2,5b950734121384002c00e478,Craft Beer Spot,"[{'id': '56aa371ce4b08b9a8d57356c', 'name': 'B...",v-1618338717,False,Plaská 623/5,50.080581,14.405875,"[{'label': 'display', 'lat': 50.080581, 'lng':...",1111,150 00,CZ,Praha,Hlavní město Praha,Česká republika,"[Plaská 623/5, 150 00 Praha, Česká republika]",514632749.0,Malá Strana,
3,50d9dc09e4b0260d1c11aefe,Beer Spa Bernard Prague,"[{'id': '4bf58dd8d48988d1ed941735', 'name': 'S...",v-1618338717,False,Týn 10,50.088085,14.424127,"[{'label': 'display', 'lat': 50.08808503986637...",487,,CZ,Praha,Hlavní město Praha,Česká republika,"[Týn 10, Praha, Česká republika]",,,
4,5ef22ddd8d0fc400085053e1,Meat Beer,"[{'id': '4bf58dd8d48988d16c941735', 'name': 'B...",v-1618338717,False,Na Příkopě 852/10,50.08429,14.425851,"[{'label': 'display', 'lat': 50.08429, 'lng': ...",688,110 00,CZ,Praha,Hlavní město Praha,Česká republika,"[Na Příkopě 852/10, 110 00 Praha, Česká republ...",,,


#### create dataframe for nearby venues with filtered columns

In [20]:
# filter columns
filtered_columns = ['id','name', 'categories', 'location.lat', 'location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,id,name,categories,lat,lng
0,4d7b811e86cfa1432e7fb5a0,Letná Beer Garden,Beer Garden,50.096031,14.425448
1,533830af498e2e6499e78d4e,Czech Beer Museum Prague,History Museum,50.084961,14.417916
2,5b950734121384002c00e478,Craft Beer Spot,Beer Bar,50.080581,14.405875
3,50d9dc09e4b0260d1c11aefe,Beer Spa Bernard Prague,Spa,50.088085,14.424127
4,5ef22ddd8d0fc400085053e1,Meat Beer,Burger Joint,50.08429,14.425851


#### get only Beer Bar, Bar, Beer Garden, Restaurant categories

In [21]:
beer_venues = nearby_venues[(nearby_venues['categories'].isin(['Beer Bar', 'Bar', 'Beer Garden', 'Restaurant']))].copy()

In [22]:
beer_venues['district'] = name

In [23]:
beer_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district
0,4d7b811e86cfa1432e7fb5a0,Letná Beer Garden,Beer Garden,50.096031,14.425448,praha 1
2,5b950734121384002c00e478,Craft Beer Spot,Beer Bar,50.080581,14.405875,praha 1
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1
6,52c1ce5911d2d1e4fd8950bb,Prague Beer Museum,Beer Bar,50.074536,14.43661,praha 1
8,5ebc209c3c19fe0008678505,Beer Garden Karlín,Beer Garden,50.093918,14.439669,praha 1


#### check if venue is inside district

In [24]:
check_district(beer_venues, df_prague)

Done: Total: 29 Success: 29 Error 0
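
Because the search radius is 10 km, many of the returned venues lie outside the district being processed, which is why this check is needed. The core of `check_district` is a point-in-polygon test; the sketch below shows it on a made-up square "district" and two made-up venues, so the coordinates are assumptions and not notebook data.

```python
from shapely.geometry import Point, Polygon

# Made-up square "district"; coordinates are [lng, lat], as in GeoJSON.
district = Polygon([[14.40, 50.07], [14.44, 50.07],
                    [14.44, 50.09], [14.40, 50.09]])

# Made-up venues given as (lat, lng), the order used in the venue columns.
venues = {
    'inside venue': (50.08, 14.42),
    'outside venue': (50.10, 14.42),
}

# Shapely expects x = longitude and y = latitude, so the pair is swapped.
in_district = {name: district.contains(Point(lng, lat))
               for name, (lat, lng) in venues.items()}
print(in_district)
```

Swapping latitude and longitude is the most common mistake in this test: with the arguments reversed every venue would fall outside every district.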


In [25]:
beer_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict
0,4d7b811e86cfa1432e7fb5a0,Letná Beer Garden,Beer Garden,50.096031,14.425448,praha 1,False
2,5b950734121384002c00e478,Craft Beer Spot,Beer Bar,50.080581,14.405875,praha 1,False
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True
6,52c1ce5911d2d1e4fd8950bb,Prague Beer Museum,Beer Bar,50.074536,14.43661,praha 1,False
8,5ebc209c3c19fe0008678505,Beer Garden Karlín,Beer Garden,50.093918,14.439669,praha 1,False


#### delete outlying venues

In [26]:
beer_venues.drop(beer_venues[beer_venues['inDistrict'] == False].index, inplace = True)
beer_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True
10,5c18ea09efa82a002c26e77d,Beer Story,Beer Bar,50.08923,14.422927,praha 1,True
11,58dd50378cfe546addd9e2a9,Prague Beer Museum,Beer Bar,50.084971,14.413799,praha 1,True
12,52fe5f8111d2a858a202b3ca,Beer challenge,Beer Garden,50.081594,14.416927,praha 1,True
16,5de032da2401ec00086a78e9,Beer Point,Bar,50.078776,14.429829,praha 1,True


#### get rating and url for each venue
if rating is empty it will be set as -1<br>
if url is empty it will be set as '' (an empty string)

In [27]:
get_ratings(beer_venues)

Done: Total: 13 Success: 13 Error 0


In [28]:
beer_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict,rating,url
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True,-1.0,https://beerandbeer.cz
10,5c18ea09efa82a002c26e77d,Beer Story,Beer Bar,50.08923,14.422927,praha 1,True,5.8,
11,58dd50378cfe546addd9e2a9,Prague Beer Museum,Beer Bar,50.084971,14.413799,praha 1,True,7.2,http://www.praguebeermuseum.cz/cz/
12,52fe5f8111d2a858a202b3ca,Beer challenge,Beer Garden,50.081594,14.416927,praha 1,True,-1.0,
16,5de032da2401ec00086a78e9,Beer Point,Bar,50.078776,14.429829,praha 1,True,-1.0,


#### split the rated venues into 3 bins: from 0 to mean, from mean to 75%, from 75% to 10

In [29]:
rated_venues = beer_venues.loc[beer_venues['rating'] != -1]

In [30]:
rated_venues.describe()['rating']

count    5.000000
mean     7.420000
std      1.114451
min      5.800000
25%      7.200000
50%      7.200000
75%      8.200000
max      8.700000
Name: rating, dtype: float64

#### split data into 4 bins: from -1.1 till 0, from 0 till mean, from mean till 75%, from 75% till 10

In [31]:
# np.histogram returns 2 values
count, bin_edges = np.histogram(beer_venues['rating'],[-1.1, 0, rated_venues.describe()['rating']['mean'], rated_venues.describe()['rating']['75%'], 10 ] )

print(count) # frequency count
print(bin_edges) # bin ranges

[8 3 0 2]
[-1.1   0.    7.42  8.2  10.  ]


#### generate labels

In [32]:
labels = ['not rated','low','medium','high']
bins = [-1.1,0, rated_venues.describe()['rating']['mean'], rated_venues.describe()['rating']['75%'], rated_venues.describe()['rating']['max']+1 ]
beer_venues['divided'] = pd.cut(beer_venues['rating'], bins = bins, labels = labels)
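
One detail worth checking here: `np.histogram` and `pd.cut` treat bin edges differently. `np.histogram` uses half-open bins `[a, b)` with only the last bin closed on both sides, while `pd.cut` by default uses `(a, b]`. A rating that sits exactly on an edge can therefore be counted in one bin and labelled as another. The sketch below uses the bin edges printed above and a small made-up set of ratings in which 8.2 falls exactly on an edge.

```python
import numpy as np
import pandas as pd

# Illustrative ratings (not the notebook's data); 8.2 sits exactly on an edge.
ratings = pd.Series([-1, -1, 5.8, 7.2, 7.2, 8.2, 8.7])
edges = [-1.1, 0, 7.42, 8.2, 10]
labels = ['not rated', 'low', 'medium', 'high']

def label_counts(cut_result):
    """Count how many values received each label, in label order."""
    return [int((cut_result == label).sum()) for label in labels]

hist_counts, _ = np.histogram(ratings, edges)

# Default pd.cut: right-closed intervals (a, b]
right_closed = label_counts(pd.cut(ratings, bins=edges, labels=labels))
# right=False: left-closed intervals [a, b), closer to np.histogram
left_closed = label_counts(pd.cut(ratings, bins=edges, labels=labels,
                                  right=False))

print(hist_counts.tolist())
print(right_closed)
print(left_closed)
```

With these values the histogram counts 8.2 in the top bin, while the default `pd.cut` labels it 'medium'; passing `right=False` makes the labels agree with the counts.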

In [33]:
beer_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict,rating,url,divided
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True,-1.0,https://beerandbeer.cz,not rated
10,5c18ea09efa82a002c26e77d,Beer Story,Beer Bar,50.08923,14.422927,praha 1,True,5.8,,low
11,58dd50378cfe546addd9e2a9,Prague Beer Museum,Beer Bar,50.084971,14.413799,praha 1,True,7.2,http://www.praguebeermuseum.cz/cz/,low
12,52fe5f8111d2a858a202b3ca,Beer challenge,Beer Garden,50.081594,14.416927,praha 1,True,-1.0,,not rated
16,5de032da2401ec00086a78e9,Beer Point,Bar,50.078776,14.429829,praha 1,True,-1.0,,not rated


# Results

### Going through all districts

#### getting all venues from Foursquare

#### check if venue is located inside of the district and mark it as False otherwise

In [34]:
all_venues = pd.DataFrame()
for index, row in df_prague.iterrows():
    query_lat = row['latitude']
    query_long = row['longitude']
    name = row['Name']
    
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, query_lat, query_long,ACCESS_TOKEN, VERSION, query, RADIUS, LIMIT)
     
    #request data
    tmp_results = requests.get(url).json()
    #get venues
    tmp_venues = tmp_results['response']['venues']
    
    #create venues dataframe
    tmp_nearby_venues = pd.json_normalize(tmp_venues)
    
    # filter columns
    filtered_columns = ['id','name','categories', 'location.lat', 'location.lng']
    tmp_nearby_venues =tmp_nearby_venues.loc[:, filtered_columns]
    
    # filter the category for each row
    tmp_nearby_venues['categories'] = tmp_nearby_venues.apply(get_category_type, axis=1)

    # clean columns
    tmp_nearby_venues.columns = [col.split(".")[-1] for col in tmp_nearby_venues.columns]
    
    tmp_beer_venues = tmp_nearby_venues[(tmp_nearby_venues['categories'].isin(['Beer Bar', 'Bar', 'Beer Garden', 'Restaurant']))].copy()
    tmp_beer_venues['district'] = name
    
    if not tmp_beer_venues.empty:
        check_district(tmp_beer_venues, df_prague)
        all_venues = pd.concat([all_venues, tmp_beer_venues])

Done: Total: 29 Success: 29 Error 0
Done: Total: 27 Success: 27 Error 0
Done: Total: 30 Success: 30 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 29 Success: 29 Error 0
Done: Total: 29 Success: 29 Error 0
Done: Total: 29 Success: 29 Error 0
Done: Total: 4 Success: 4 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 30 Success: 30 Error 0
Done: Total: 30 Success: 30 Error 0
Done: Total: 9 Success: 9 Error 0
Done: Total: 2 Success: 2 Error 0
Done: Total: 10 Success: 10 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 30 Success: 30 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 29 Success: 29 Error 0
Done: Total: 30 Success: 30 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 27 Success: 27 Error 0
Done: Total: 9 Success: 9 Error 0
Done: Total: 9 Success: 9 Error 0
Done: Total: 29 Success: 29 Error 0
Done: Total: 28 Success: 28 Error 0
Done: Total: 29 Success: 29 Error 0
Done: Total: 29 Success: 29 Error 0
Do

In [35]:
all_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict
0,4d7b811e86cfa1432e7fb5a0,Letná Beer Garden,Beer Garden,50.096031,14.425448,praha 1,False
2,5b950734121384002c00e478,Craft Beer Spot,Beer Bar,50.080581,14.405875,praha 1,False
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True
6,52c1ce5911d2d1e4fd8950bb,Prague Beer Museum,Beer Bar,50.074536,14.43661,praha 1,False
8,5ebc209c3c19fe0008678505,Beer Garden Karlín,Beer Garden,50.093918,14.439669,praha 1,False


In [54]:
all_beer_venues = all_venues.copy()
all_beer_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict
0,4d7b811e86cfa1432e7fb5a0,Letná Beer Garden,Beer Garden,50.096031,14.425448,praha 1,False
2,5b950734121384002c00e478,Craft Beer Spot,Beer Bar,50.080581,14.405875,praha 1,False
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True
6,52c1ce5911d2d1e4fd8950bb,Prague Beer Museum,Beer Bar,50.074536,14.43661,praha 1,False
8,5ebc209c3c19fe0008678505,Beer Garden Karlín,Beer Garden,50.093918,14.439669,praha 1,False


#### exclude venues that aren't in district

In [70]:
processed_venues = all_beer_venues[all_beer_venues['inDistrict'] == True].copy()

In [71]:
processed_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True
10,5c18ea09efa82a002c26e77d,Beer Story,Beer Bar,50.08923,14.422927,praha 1,True
11,58dd50378cfe546addd9e2a9,Prague Beer Museum,Beer Bar,50.084971,14.413799,praha 1,True
12,52fe5f8111d2a858a202b3ca,Beer challenge,Beer Garden,50.081594,14.416927,praha 1,True
16,5de032da2401ec00086a78e9,Beer Point,Bar,50.078776,14.429829,praha 1,True


#### get rating and url for each venue
if rating is empty it will be set as -1 <br>
if url is empty it will be set as '' (an empty string)

In [73]:
get_ratings(processed_venues)

Done: Total: 39 Success: 39 Error 0
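
One thing to watch in this step: `all_venues` is built with `pd.concat` from per-district frames, so index labels repeat across districts, and `get_ratings` writes its results back with `dataFrame.loc[k, ...]`. With repeated labels such an assignment updates every row that shares the label, which may be why a venue such as "Beer & Beer" shows a different rating and URL here than in the pilot run. The sketch below reproduces the effect on two made-up one-row frames and shows `ignore_index=True` as one possible remedy.

```python
import pandas as pd

# Two per-district frames that both use index label 0 (made-up venues).
district_a = pd.DataFrame({'name': ['Pub A'], 'rating': [-1.0]})
district_b = pd.DataFrame({'name': ['Pub B'], 'rating': [-1.0]})

# Plain concat keeps the original labels, so the index is 0, 0.
stacked = pd.concat([district_a, district_b])
stacked.loc[0, 'rating'] = 6.2      # updates BOTH rows
print(stacked)

# ignore_index=True renumbers the rows, so the index is 0, 1.
unique = pd.concat([district_a, district_b], ignore_index=True)
unique.loc[0, 'rating'] = 6.2       # updates only Pub A
print(unique)
```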


In [74]:
processed_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict,rating,url
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True,6.2,http://beertime.pub
10,5c18ea09efa82a002c26e77d,Beer Story,Beer Bar,50.08923,14.422927,praha 1,True,5.8,
11,58dd50378cfe546addd9e2a9,Prague Beer Museum,Beer Bar,50.084971,14.413799,praha 1,True,7.2,http://www.praguebeermuseum.cz/cz/
12,52fe5f8111d2a858a202b3ca,Beer challenge,Beer Garden,50.081594,14.416927,praha 1,True,-1.0,
16,5de032da2401ec00086a78e9,Beer Point,Bar,50.078776,14.429829,praha 1,True,-1.0,


#### split the rated venues into 3 bins: from 0 to mean, from mean to 75%, from 75% to 10

In [75]:
all_rated_venues = processed_venues.loc[processed_venues['rating'] != -1]

In [76]:
all_rated_venues.describe()['rating']

count    25.000000
mean      7.864000
std       1.100333
min       5.800000
25%       7.200000
50%       8.200000
75%       8.800000
max       9.100000
Name: rating, dtype: float64

#### split data into 4 bins: from -1.1 till 0, from 0 till mean, from mean till 75%, from 75% till 10

In [77]:
# np.histogram returns 2 values
count, bin_edges = np.histogram(all_rated_venues['rating'],[-1.1, 0, all_rated_venues.describe()['rating']['mean'], all_rated_venues.describe()['rating']['75%'], 10 ] )

print(count) # frequency count
print(bin_edges) # bin ranges

[ 0 11  7  7]
[-1.1    0.     7.864  8.8   10.   ]


### Data visualisation

#### generate labels

In [78]:
labels = ['not rated','low','medium','high']
bins = [-1.1,0, all_rated_venues.describe()['rating']['mean'], all_rated_venues.describe()['rating']['75%'], all_rated_venues.describe()['rating']['max']+1 ]

In [79]:
processed_venues['divided'] = pd.cut(processed_venues['rating'], bins = bins, labels = labels)

In [80]:
processed_venues.head()

Unnamed: 0,id,name,categories,lat,lng,district,inDistrict,rating,url,divided
5,5f691897501fe94186d60c6e,Beer & Beer,Beer Bar,50.078335,14.426868,praha 1,True,6.2,http://beertime.pub,low
10,5c18ea09efa82a002c26e77d,Beer Story,Beer Bar,50.08923,14.422927,praha 1,True,5.8,,low
11,58dd50378cfe546addd9e2a9,Prague Beer Museum,Beer Bar,50.084971,14.413799,praha 1,True,7.2,http://www.praguebeermuseum.cz/cz/,low
12,52fe5f8111d2a858a202b3ca,Beer challenge,Beer Garden,50.081594,14.416927,praha 1,True,-1.0,,not rated
16,5de032da2401ec00086a78e9,Beer Point,Bar,50.078776,14.429829,praha 1,True,-1.0,,not rated


In [81]:
!pip install folium
import folium

print('Folium installed and imported!')

Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
Folium installed and imported!


In [135]:
map_prague = folium.Map(tiles='{}'.format(FOLIUM_TILES), 
                        attr='Mapbox Bright',
                        width=1200,height=700,
                        location=[prague_latitude, prague_longitude], zoom_start=12)

In [132]:
fgv = folium.FeatureGroup(name="Beer")

for lat, lng, name, district, divided in zip(processed_venues['lat'], processed_venues['lng'], processed_venues['name'], processed_venues['district'], processed_venues['divided']):
    beer_label = '{}, {} \n rating: {}'.format(name, district, divided)
    beer_label = folium.Popup(beer_label, parse_html=True)
    fgv.add_child(folium.CircleMarker(
            [lat, lng],
            radius=7,
            popup=beer_label,
            color=get_color(divided),
            fill=True,
            fill_color=get_color(divided),
            fill_opacity=0.7,
            parse_html=False))

In [136]:
fgp = folium.FeatureGroup(name="District")

style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.3, 
                            'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}
fgp.add_child(folium.GeoJson(
    district_geo,
    style_function = style_function,
    highlight_function=highlight_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['name']),
    name='District'))
            

<folium.map.FeatureGroup at 0x7ff5ef61d1d0>

In [134]:
#map_prague.fit_bounds(map_prague.get_bounds())

map_prague.add_child(fgp)
map_prague.add_child(fgv)
map_prague.add_child(folium.LayerControl())

map_prague

# Discussion

The interactive map above shows all beer-related venues in Prague according to Foursquare data.
As we can see, a large number of venues are not rated yet, which gives any visitor a chance to rate them.

Visitors to Prague probably don't use Foursquare actively.


As expected, the beer venues concentrate near the city center, but there are some interesting places in residential districts 
quite far from the center.

According to the collected data and the chosen bins, a low rating is anything below 7.864, and the minimum rating is 5.8.
Such a rating would be high in many cities, so classifying all beer venues properly needs more bins.
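
One way to get finer bins is to split the rated venues by quantiles instead of by fixed thresholds. The sketch below is only an illustration of that idea: the ratings are made up, the five tier labels are hypothetical, and `pd.qcut` is a swapped-in technique rather than what the notebook does.

```python
import pandas as pd

# Made-up ratings for illustration; real values would come from the
# rated venues collected above.
ratings = pd.Series([5.8, 6.2, 7.2, 7.9, 8.2, 8.5, 8.8, 8.9, 9.0, 9.1])

# Five equal-sized quantile bins instead of three threshold-based ones.
tier_labels = ['1 star', '2 stars', '3 stars', '4 stars', '5 stars']
tiers = pd.qcut(ratings, q=5, labels=tier_labels)

print(pd.DataFrame({'rating': ratings, 'tier': tiers}))
```

Quantile bins always hold roughly the same number of venues, so even a city where every rating is high still gets a usable spread of tiers. `pd.qcut` raises an error when repeated ratings produce duplicate edges, unless `duplicates='drop'` is passed.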

Anyway, the map above gives a starting point for exploring Prague's beer venues.

# Conclusion

The main goal - collect all be