## Tracking the most attractive spots for a food truck fast-food business in Skopje, North Macedonia

### Business Problem Introduction

In developing countries, where the service industry is still underdeveloped, there is great potential in tapping into underserved sectors and capturing additional profits as a first entrant to the market. Skopje, the capital city of North Macedonia, is one such territory: most of its fast food industry is based on traditional outlets such as restaurants, fast food chain joints and bakeries.
To stimulate the launch of different types of fast food ventures, this report aims to identify the potential of food truck ventures by tracking the spots where such trucks could serve their customers. To the author's knowledge, there are currently no fast food trucks in Skopje, so this analysis is exploratory in nature, i.e. it aims to motivate and provide a marketing rationale for this particular industry.
The author believes that freely available geolocation data makes market research easier and more effective: by leveraging existing venue mappings, food truck owners are no longer required to browse through detailed maps of every neighborhood or to test locations by trial and error. Instead, a script can, based on specified criteria, access, analyze and visualize the spots that would be attractive to the business at hand.


### Methodology & Data

The task requires some industry-specific knowledge, which is used to compile the set of criteria for analyzing candidate spots. First, fast food trucks are not bound to fixed business hours; they could virtually be open 24/7, as long as they are close and easily accessible to a potential customer base. However, as the hours of the day change, so do the venues in which people spend their time. For example, during the first shift parks and offices may be the most attractive, while later in the day (third shift) bars and clubs may be more suitable. For this purpose, this report analyzes and draws conclusions on spots for each of the following three shifts:
1.	09:00 – 16:00 (offices and parks)
2.	16:00 – 20:00 (residential areas and schools)
3.	20:00 – 02:00 (bars and dance clubs)
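The shift schedule above can be expressed as a small lookup that assigns an hour of the day to a shift and filters venues by the categories relevant to that shift. This is a minimal sketch under stated assumptions: hours use the 24-hour clock with an exclusive end hour (so 16:00 belongs to the second shift), and the Foursquare-style category names attached to each shift are hypothetical placeholders, not the author's final criteria. The only non-obvious case is the third shift, which wraps past midnight.

```python
from typing import Optional

# Hypothetical shift definitions: "end" is exclusive, and the category
# names are illustrative assumptions, not taken from the original analysis.
SHIFTS = {
    1: {"start": 9, "end": 16, "categories": {"Office", "Park"}},
    2: {"start": 16, "end": 20, "categories": {"Residential Building (Apartment / Condo)", "School"}},
    3: {"start": 20, "end": 2, "categories": {"Bar", "Nightclub"}},
}


def shift_for_hour(hour: int) -> Optional[int]:
    """Return the shift that covers `hour` (0-23), or None outside all shifts."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    for shift, spec in SHIFTS.items():
        start, end = spec["start"], spec["end"]
        if start < end:
            in_shift = start <= hour < end
        else:  # the window wraps past midnight, e.g. 20:00-02:00
            in_shift = hour >= start or hour < end
        if in_shift:
            return shift
    return None


def venues_for_shift(venues, shift):
    """Keep the names of (name, category) pairs whose category fits `shift`."""
    wanted = SHIFTS[shift]["categories"]
    return [name for name, category in venues if category in wanted]


print(shift_for_hour(10), shift_for_hour(23), shift_for_hour(1), shift_for_hour(5))
# prints: 1 3 3 None
```

Keeping the category sets in one dictionary means the criteria can be tuned per shift without touching the lookup logic.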

This task requires mapping all the spots, then segmenting them based on their venue tags, and lastly clustering the areas in which the spots are more prevalent. For the last part, a density-based clustering algorithm called DBSCAN (Density-Based Spatial Clustering of Applications with Noise) will be used.
As for the data, the author will mainly use the Foursquare API to extract the venues from specific neighborhoods in Skopje, together with an additional dataset of neighborhood names and their coordinates. This dataset is created using a script that extracts the information from the Wikipedia page listing Skopje neighborhoods (in Macedonian, available [here](https://mk.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA_%D0%BD%D0%B0_%D1%81%D0%BA%D0%BE%D0%BF%D1%81%D0%BA%D0%B8_%D0%BD%D0%B0%D1%81%D0%B5%D0%BB%D0%B1%D0%B8_%D0%B8_%D0%BC%D0%B0%D0%B0%D0%BB%D0%B0)), supplemented where needed by manual searches and input from other sources.
The data will then be used to first visualize the neighborhoods, then select the appropriate venues (according to the criteria explained above), and finally determine the specific spots where a truck could be placed in each shift.
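DBSCAN groups points that have at least `min_samples` neighbours within a radius `eps`, grows clusters outward from those "core" points, and labels isolated points as noise, so the number of clusters does not have to be fixed in advance. A minimal pure-Python sketch applied to venue coordinates is shown below. It assumes great-circle (haversine) distance in metres, counts a point as its own neighbour (as scikit-learn does), and the coordinates and `eps_m`/`min_samples` values in the usage line are illustrative, not taken from the analysis.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_samples):
    """Label each (lat, lon) point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # brute-force O(n^2) search; fine for a few hundred venues
        return [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Illustrative coordinates: three nearby venues and one far-away venue
venues = [(41.9960, 21.4316), (41.9965, 21.4320), (41.9962, 21.4310), (42.0500, 21.3000)]
print(dbscan(venues, eps_m=200, min_samples=2))  # [0, 0, 0, -1]
```

For larger datasets, scikit-learn's `sklearn.cluster.DBSCAN` with `metric='haversine'`, coordinates converted to radians, and `eps` expressed as metres divided by the Earth's radius performs the same clustering more efficiently.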


## Importing libraries & Data

```python
import pandas as pd
import numpy as np
import requests

import json  # library to handle JSON files

from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

from pandas import json_normalize  # transform JSON into a pandas DataFrame

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means for the clustering stage
from sklearn.cluster import KMeans
import folium  # map rendering library
from bs4 import BeautifulSoup

import geocoder
```

```python
# Data: neighborhood names, boroughs and coordinates
df = pd.read_excel('skopje_hoods.xlsx')
df.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Gazi Baba | Avtokomanda | 42.004167 | 21.465278 |
| 1 | Aerodrom | Aerodrom | 41.981881 | 21.46839 |
| 2 | Gazi Baba | Butel | 42.030133 | 21.442405 |
| 3 | Gjorche Petrov | Vlae | 42.0077 | 21.3755 |
| 4 | Centar | Vodno | 41.989722 | 21.413333 |


### Map of Skopje Neighborhoods

```python
address = 'Skopje, North Macedonia'
geolocator = Nominatim(user_agent="sk_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Skopje are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Skopje are 41.9960924, 21.4316495.


```python
# create a map of Skopje using the latitude and longitude values
Skopje_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker for each neighborhood
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(Skopje_map)

Skopje_map
```

### Define Foursquare credentials and create functions for extracting data

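```python
import os

# Foursquare credentials, read from environment variables instead of being hard-coded
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are placeholders)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
VERSION = '20180605'  # API version date
Limit = 100  # maximum number of venues returned per request
```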

```python
# Harvest nearby venues for each neighborhood from the Foursquare "explore" endpoint

def getNearbyVenues(names, latitudes, longitudes, radius=750):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the API request parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': Limit,
        }

        # make the GET request and keep the list of recommended venues
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```
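`getNearbyVenues` reads `categories[0]` directly, so a venue returned with an empty category list would raise an `IndexError` and abort the whole loop. One way to guard against this, and to test the parsing offline without calling the API, is to pull the flattening step into its own function. A minimal sketch: `parse_explore_items` is a hypothetical helper, not part of the original notebook, and it assumes the same `response` / `groups[0]` / `items` layout that the code above reads.

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one parsed 'explore' JSON payload into row tuples.

    Assumes the layout read by getNearbyVenues: response -> groups[0] -> items,
    each item holding venue{name, location{lat, lng}, categories[]}.
    Venues without a category get None instead of raising IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-made sample payload (not real Foursquare data)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe X", "location": {"lat": 42.0, "lng": 21.4},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "No Category", "location": {"lat": 42.1, "lng": 21.5},
               "categories": []}},
]}]}}
print(parse_explore_items("Centar", 41.99, 21.43, sample))
```

Inside the loop, `venues_list.append(parse_explore_items(name, lat, lng, response.json()))` could then replace the inline list comprehension.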

```python
# Get venues for all listed neighborhoods
skopje_venues = getNearbyVenues(names=df['Neighborhood'], latitudes=df['Latitude'], longitudes=df['Longitude'])
print('There are {} unique categories.'.format(skopje_venues['Venue Category'].nunique()))
```

There are 187 unique categories.


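```python
# Count the venues returned for each neighborhood
venues_by_hood = skopje_venues.groupby(["Neighborhood"]).count().reset_index()
venues_by_hood[['Neighborhood', 'Venue Category']].sort_values('Venue Category', ascending=False).set_index('Neighborhood')
```

| Neighborhood | Venue Category |
|---|---|
| Debar Maalo | 100 |
| Prolet | 100 |
| Kapishtec | 100 |
| Karposh 1 | 100 |
| Karposh 3 | 100 |
| Jane Sandanski | 81 |
| Karposh 2 | 81 |
| Karposh 4 | 79 |
| Aerodrom | 65 |
| Taftalidze | 65 |

The `Venue Category` column here holds the number of venues returned for each neighborhood. Neighborhoods with 100 venues have reached the request `Limit` of 100, so their actual venue counts may be higher.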


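```python
# one-hot encode the venue categories
skopje_onehot = pd.get_dummies(skopje_venues[['Venue Category']], prefix="", prefix_sep="")

# add the neighborhood column back to the dataframe
skopje_onehot['Neighborhood'] = skopje_venues['Neighborhood']

# count the venues of each category per neighborhood
venues_grouped = skopje_onehot.groupby('Neighborhood').sum().reset_index()

venues_grouped.head()
```

| | Neighborhood | Accessories Store | Airport Lounge | American Restaurant | Art Gallery | Arts & Crafts Store | Athletics & Sports | Auto Garage | Automotive Shop | BBQ Joint | ... | Tennis Court | Theater | Toy / Game Store | Track | Trail | Tree | Turkish Restaurant | Water Park | Women's Store | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aerodrom | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Avtokomanda | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | Butel | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Chair | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| 4 | Crnice | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |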


### Extract the top 10 venues in each neighborhood

```python
def return_most_common_venues(row, num_top_venues):
    # skip the Neighborhood column and rank the category counts
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names according to the number of top venues (1st, 2nd, 3rd, 4th, ...)
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighborhood'] = venues_grouped['Neighborhood']

# fill each row with that neighborhood's most common venue categories
for ind in np.arange(venues_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aerodrom | Café | Park | Cosmetics Shop | Supermarket | Grocery Store | Gym / Fitness Center | BBQ Joint | Bakery | Bar | Restaurant |
| 1 | Avtokomanda | Pizza Place | Athletics & Sports | Park | Market | Bakery | Forest | Nature Preserve | Department Store | Plaza | Cocktail Bar |
| 2 | Butel | Lounge | Bakery | Restaurant | Furniture / Home Store | Convenience Store | Gun Range | Market | Shopping Mall | Casino | Dive Spot |
| 3 | Chair | Comfort Food Restaurant | Turkish Restaurant | Market | Furniture / Home Store | Café | Park | Eastern European Restaurant | Historic Site | Dessert Shop | Soccer Field |
| 4 | Crnice | Market | Hotel | Café | Gym / Fitness Center | Fast Food Restaurant | Diner | Clothing Store | Bakery | Hostel | Food Court |
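`return_most_common_venues` sorts every one-hot column, so when a neighborhood has fewer than ten distinct categories, the remaining slots are filled with categories whose count is zero, in no meaningful order among the ties. A stdlib sketch that ranks only the categories that actually occur is shown below. `top_categories` and `top_categories_padded` are hypothetical helpers, not part of the original notebook; they would take the list of `Venue Category` values for one neighborhood.

```python
from collections import Counter


def top_categories(categories, n=10):
    """Rank one neighborhood's venue categories by frequency.

    Only categories that actually occur are returned; ties keep the order
    in which the categories were first encountered (Counter.most_common).
    """
    return [category for category, _ in Counter(categories).most_common(n)]


def top_categories_padded(categories, n=10, fill=""):
    """Same ranking, padded with `fill` so it fits n fixed-width columns."""
    top = top_categories(categories, n)
    return top + [fill] * (n - len(top))


print(top_categories(["Café", "Park", "Café", "Bakery", "Park", "Café"]))
# prints: ['Café', 'Park', 'Bakery']
```

Padding with an empty string keeps the ten "Most Common Venue" columns while making it visible that a neighborhood simply has fewer categories.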


### Clustering 

```python
# Make the dataframe ready for clustering
venues_grouped_clus = venues_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(venues_grouped_clus)

# add clustering labels to the dataframe
venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge the original dataframe with the sorted venues dataset
df_full = df.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')
```

### Visualize the clusters

```python
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set one color per cluster
kclusters = kmeans.n_clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (neighborhoods without venues have no cluster label and are skipped)
for lat, lon, poi, cluster in zip(df_full['Latitude'], df_full['Longitude'], df_full['Neighborhood'], df_full['Cluster Labels']):
    if pd.isna(cluster):
        continue
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

### Examine clusters

```python
# Cluster 1 (label 0): neighborhood name and its top venues
df_full.loc[df_full['Cluster Labels'] == 0, df_full.columns[[1] + list(range(5, df_full.shape[1]))]]
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avtokomanda | Pizza Place | Athletics & Sports | Park | Market | Bakery | Forest | Nature Preserve | Department Store | Plaza | Cocktail Bar |
| 2 | Butel | Lounge | Bakery | Restaurant | Furniture / Home Store | Convenience Store | Gun Range | Market | Shopping Mall | Casino | Dive Spot |
| 3 | Vlae | Grocery Store | Café | Auto Garage | Restaurant | Soccer Field | Optical Shop | Pizza Place | Department Store | Dessert Shop | Dive Spot |
| 6 | Drachevo | Pizza Place | Food & Drink Shop | Farmers Market | Pharmacy | Dessert Shop | Basketball Court | Bar | Camera Store | BBQ Joint | Café |
| 7 | Gjorche Petrov | Bakery | Market | Cosmetics Shop | Recreation Center | Ice Cream Shop | Bus Station | Dessert Shop | Soccer Field | Indoor Play Area | Automotive Shop |
| 8 | Zelezara | Forest | Market | Auto Garage | Bistro | Eastern European Restaurant | Zoo | Doner Restaurant | Farmers Market | Fair | Event Space |
| 9 | Zlokukjani | Bar | National Park | Bus Station | Park | Bus Stop | Multiplex | Supermarket | Event Space | English Restaurant | Electronics Store |
| 16 | Kisela Voda | Café | Restaurant | Market | Park | Grocery Store | Supermarket | Fast Food Restaurant | Bus Station | Bookstore | Laser Tag |
| 17 | Kisela Jabuka | Gaming Cafe | Bakery | Food & Drink Shop | Market | Betting Shop | Bus Station | Cupcake Shop | Dance Studio | Farmers Market | Costume Shop |
| 18 | Kozle | Park | Bakery | Gym / Fitness Center | Fast Food Restaurant | Café | BBQ Joint | Burger Joint | Electronics Store | Food & Drink Shop | Restaurant |


```python
# Cluster 2 (label 1)
df_full.loc[df_full['Cluster Labels'] == 1, df_full.columns[[1] + list(range(5, df_full.shape[1]))]]
```

```python
# Cluster 3 (label 2)
df_full.loc[df_full['Cluster Labels'] == 2, df_full.columns[[1] + list(range(5, df_full.shape[1]))]]
```
