# Introduction/Business Problem

Where to go for pizza?

Suppose a tourist or a local resident wants some pizza. A good strategy is to head for an area where several pizza places are located close together.
The goal of this project is to analyze pizza restaurants in Moscow and Saint-Petersburg, the two biggest cities in Russia, and suggest where pizza-seekers should go.

Target audience: people who want to find an area with several different pizza places nearby.

# Data

The Foursquare API will be used to collect location data for pizza restaurants in the two biggest cities in Russia.

# Methodology

The main goal of this analysis is to identify where pizza restaurants are most densely concentrated.
The Foursquare API will be used to get venues in each city.

The category ID (4bf58dd8d48988d1ca941735) restricts the results to pizza places. Unfortunately, Foursquare limits each query to a maximum of 100 venues. The request is repeated for both cities. The top 100 venues for each city are then saved and plotted on a map for visual inspection.
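As an illustration of the query described above, the sketch below assembles an explore request URL with the standard library so that every parameter is URL-encoded. The helper name `build_explore_url` and the placeholder credentials are assumptions made for this example; the endpoint, parameter names and category ID are the ones used in this notebook. No request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"
PIZZA_CATEGORY_ID = "4bf58dd8d48988d1ca941735"  # pizza place category used in this notebook


def build_explore_url(city, client_id, client_secret, version, limit=100):
    """Return an explore URL for pizza places near `city` (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": city,
        "limit": limit,
        "categoryId": PIZZA_CATEGORY_ID,
    }
    # urlencode escapes each value, so city names with spaces stay valid
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials, for illustration only
url = build_explore_url("Saint-Petersburg", "YOUR_ID", "YOUR_SECRET", "20200331")

# Parse the query string back to check what would be sent
query = parse_qs(urlsplit(url).query)
print(query["near"], query["limit"], query["categoryId"])
```

Building the query from a dictionary keeps the parameter list readable and avoids hand-formatting mistakes in the URL.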

Next, to get an indicator of the density of pizza places, I calculate the centroid of the venues, that is, their mean latitude and longitude. The mean Euclidean distance from each venue to this centroid is then calculated. This indicator is used for evaluation: the smaller the value, the more tightly the venues are clustered.
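To make the indicator concrete, here is a minimal sketch of the centroid and mean-distance calculation on toy coordinates. The function name `mean_distance_to_centroid` and the sample points are assumptions for illustration only; they are not real venues, and the notebook itself performs the calculation with NumPy further down.

```python
import math
from statistics import fmean


def mean_distance_to_centroid(points):
    """Return (centroid, mean Euclidean distance of the points to it)."""
    lat_c = fmean(lat for lat, _ in points)
    lng_c = fmean(lng for _, lng in points)
    # Straight-line distance of each point to the centroid
    distances = [math.hypot(lat - lat_c, lng - lng_c) for lat, lng in points]
    return (lat_c, lng_c), fmean(distances)


# Toy coordinates (not real venues): the four corners of a 2 x 2 square
tight = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
# The same layout stretched to 4 x 4, i.e. more spread out
spread = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]

centroid, tight_score = mean_distance_to_centroid(tight)
_, spread_score = mean_distance_to_centroid(spread)
print(centroid, tight_score, spread_score)
```

The tighter layout yields the smaller score, which is the property the indicator relies on when the two cities are compared.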


In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (top-level import, pandas >= 1.0)
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
# Definition of Foursquare Credentials and Version
CLIENT_ID = 'N25ZLNNGVAH2LZA52PCTPL3EYS05J0GQV2AVUTYJM4UPLYR0' # Foursquare ID
CLIENT_SECRET = 'XVMD5MHQA5BADPKHC0LR2CBDDER4X4QN2BTJ20RPAI4NUTLY' # Foursquare Secret
VERSION = '20200331' # Foursquare API version

In [3]:
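LIMIT = 100
PIZZA_CATEGORY_ID = '4bf58dd8d48988d1ca941735' # Foursquare category ID for pizza places
cities = ['Moscow', 'Saint-Petersburg']
results = {}
for city in cities:
    # Pass the query as params so that requests URL-encodes each value
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': PIZZA_CATEGORY_ID,
    }
    response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
    response.raise_for_status() # fail early on an HTTP error
    results[city] = response.json()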

In [7]:
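df_venues = {}
for city in cities:
    # Flatten the nested JSON items of the first result group into a table
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # Take an explicit copy of the columns of interest before renaming them
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']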

In [11]:
# The Foursquare API returns at most 100 places per query. First, let's check the total number of pizza places

maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of pizza places in {city} = ", results[city]['response']['totalResults'])
    print("Showing Top 100")

Total number of pizza places in Moscow =  184
Showing Top 100
Total number of pizza places in Saint-Petersburg =  148
Showing Top 100


In [12]:
maps[cities[0]]   # Moscow

In [13]:
maps[cities[1]]   # Saint-Petersburg

It is not obvious from the maps alone where pizza places are most densely concentrated.

To measure the density, let's use some statistics. If the pizza places are close to one another, each of them lies close to their mean location; if they are spread out, they lie far from it.

So next, take the average distance from the venues to the mean coordinates.

In [14]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
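    # Euclidean norm of each venue's (lat, lng) offset from the centroid, averaged over all venues (in degrees)
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[city][['Lat','Lng']].values)))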

Moscow
Mean Distance from Mean coordinates
0.09067859920593385
Saint-Petersburg
Mean Distance from Mean coordinates
0.06437860499704157
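
Note that the values above are measured in degrees of latitude and longitude, not in kilometres, and away from the equator a degree of longitude is shorter than a degree of latitude (roughly by a factor of the cosine of the latitude). As a possible refinement, which is not part of the original analysis, the sketch below computes the same kind of indicator in kilometres using the haversine formula. The function names and the mean Earth radius of 6371 km are assumptions chosen for this example.

```python
import math
from statistics import fmean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_km(points):
    """Mean haversine distance from each (lat, lng) point to the mean coordinates."""
    # The arithmetic mean of the coordinates is used as the centroid,
    # a simplification that is reasonable at the scale of a single city
    lat_c = fmean(lat for lat, _ in points)
    lng_c = fmean(lng for _, lng in points)
    return fmean(haversine_km(lat, lng, lat_c, lng_c) for lat, lng in points)


# One degree of longitude at the equator and at 60 degrees north
deg_at_equator = haversine_km(0, 0, 0, 1)
deg_at_60_north = haversine_km(60, 0, 60, 1)
print(deg_at_equator, deg_at_60_north)
```

With the data frames from this notebook, the input could be built as `list(zip(df_venues[city]['Lat'], df_venues[city]['Lng']))`. Because both cities lie at high latitudes, a result in kilometres may rank them differently from the degree-based value, so the two indicators are best read side by side.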


In [15]:
maps[cities[0]]

In [16]:
maps[cities[1]]