### Project: opening new catering distribution centers in Minsk, Belarus

### Introduction/Business Problem

A distribution company has decided to open a representative office in Minsk to sell a range of catering products: kitchen and restaurant equipment, and reusable and disposable tableware. It also imports and sells premium coffee, tea and various snacks. About 5 warehouses are expected to open in the city, and goods will be delivered from them to the nearest points of sale.

The objective of this study is to identify an area for each of the 5 warehouses that best serves these logistics needs.

### Data

- Foursquare API - to get venue data around a location
- geopy library - to fill in postal codes for venues that lack them in Foursquare
- Folium library - to plot the geo data
- scikit-learn (sklearn) - to cluster the venues
- pandas, numpy - Python libraries to manipulate the data

Together, these services and libraries are used to fetch and analyze the existing food-service venues in Minsk, i.e. the potential points of sale that the client's company could supply.

In [1]:
MINSK_LATITUDE = '53.893009'
MINSK_LONGITUDE = '27.567444'
print('The geographical coordinates of MINSK are {}, {}.'.format(MINSK_LATITUDE, MINSK_LONGITUDE))

The geographical coordinates of MINSK are 53.893009, 27.567444.


##### Install all needed libraries

In [None]:
!conda install -c conda-forge folium geopy --yes

##### Import the libraries we use

In [2]:
import folium

Let's check that the coordinates are correct and that we can visualize the city:

In [3]:
minsk_map = folium.Map(location = [MINSK_LATITUDE, MINSK_LONGITUDE], zoom_start = 13)
folium.Marker([MINSK_LATITUDE, MINSK_LONGITUDE]).add_to(minsk_map)
minsk_map.save("Minsk Map.html")
minsk_map

Specify the main constants for the Foursquare requests and define the get_category_type function, which extracts a venue's first category name:

In [8]:
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare client secret (keep it out of shared notebooks)
RADIUS = 15000 # 15 Km
NO_OF_VENUES = 100
VERSION = '20180605' # '20190612'

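def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']        # raw venue records
    except KeyError:
        categories_list = row['venue.categories']  # flattened explore results

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']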
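
To make the two row shapes concrete, the sketch below applies a self-contained copy of get_category_type to hand-made rows: one shaped like a raw Foursquare venue (`categories`) and one like a flattened explore item (`venue.categories`). The sample rows are hypothetical, and plain dicts stand in for DataFrame rows because both raise KeyError on a missing key.

```python
def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']        # raw venue shape
    except KeyError:
        categories_list = row['venue.categories']  # flattened explore shape
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical sample rows (plain dicts stand in for DataFrame rows)
raw_venue = {'name': 'Some Café', 'categories': [{'name': 'Café'}, {'name': 'Bakery'}]}
flat_item = {'venue.name': 'Some Bar', 'venue.categories': [{'name': 'Wine Bar'}]}
no_category = {'venue.name': 'Unknown', 'venue.categories': []}

print(get_category_type(raw_venue))    # Café
print(get_category_type(flat_item))    # Wine Bar
print(get_category_type(no_category))  # None
```

Only the first (primary) category is kept, which is why a venue listed as both a café and a bakery ends up as "Café".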

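Fetch all venues within 15 km of the Minsk city center. Results are paged: each request returns up to NO_OF_VENUES venues, and we advance the offset until a short page comes back.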

In [10]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

import requests

pd.set_option('display.max_rows', None)

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

# use only the columns we need from each flattened venue item
filtered_columns = ['venue.name', 'venue.location.address',
                    'venue.location.lat', 'venue.location.lng',
                    'venue.location.distance',
                    'venue.location.formattedAddress', 'venue.categories',
                    'venue.location.postalCode']

offset = 0
total_venues = 0
foursquare_venues = pd.DataFrame()

while True:
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(MINSK_LATITUDE, MINSK_LONGITUDE),
        'radius': RADIUS,
        'limit': NO_OF_VENUES,
        'offset': offset,
    }
    result = requests.get(EXPLORE_URL, params=params).json()
    items = result['response']['groups'][0]['items']
    venues_fetched = len(items)
    total_venues += venues_fetched
    print("Total {} venues fetched within a total radius of {} Km".format(venues_fetched, RADIUS / 1000))

    if venues_fetched == 0:  # the previous page was the last full one
        break

    # flatten the nested JSON; reindex keeps absent columns (e.g. postalCode) as NaN
    venues = pd.json_normalize(items).reindex(columns=filtered_columns)

    # keep only the name of the first category for each venue
    venues['venue.categories'] = venues.apply(get_category_type, axis=1)

    foursquare_venues = pd.concat([foursquare_venues, venues], axis=0, sort=False)

    if venues_fetched < NO_OF_VENUES:  # a short page means there are no more results
        break
    offset += NO_OF_VENUES

foursquare_venues = foursquare_venues.reset_index(drop=True)
print("\nTotal {} venues fetched".format(total_venues))

Total 100 venues fetched within a total radius of 15.0 Km
Total 100 venues fetched within a total radius of 15.0 Km
Total 22 venues fetched within a total radius of 15.0 Km

Total 222 venues fetched
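
The fetch loop stops on the first page shorter than the page size. The sketch below isolates that paging logic with a hypothetical stub (make_stub, standing in for the Foursquare request) so it can be checked offline. With 222 items it reproduces the 100 + 100 + 22 pattern above; with exactly 200 items the last request returns an empty page, so the per-page processing must cope with zero items.

```python
def fetch_all(fetch_page, limit=100):
    """Collect items page by page until a page shorter than `limit` arrives."""
    items, page_sizes, offset = [], [], 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        page_sizes.append(len(page))
        items.extend(page)
        if len(page) < limit:  # short or empty page: no more results
            break
        offset += limit
    return items, page_sizes


def make_stub(n_items):
    """Hypothetical stand-in for the API: serves slices of a fixed list."""
    data = list(range(n_items))
    return lambda offset, limit: data[offset:offset + limit]


items, sizes = fetch_all(make_stub(222))
print(len(items), sizes)  # 222 [100, 100, 22]

items, sizes = fetch_all(make_stub(200))
print(len(items), sizes)  # 200 [100, 100, 0]
```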


Let's look at the result (first rows shown):

In [11]:
foursquare_venues

|  | venue.name | venue.location.address | venue.location.lat | venue.location.lng | venue.location.distance | venue.location.formattedAddress | venue.categories | venue.location.postalCode |
|---|---|---|---|---|---|---|---|---|
| 0 | Yoga Place | Октябрьская, 16 | 53.88989 | 27.574825 | 595 | [Октябрьская, 16, Мінск, Беларусь] | Yoga Studio |  |
| 1 | Moby Dick Gym | Октябрьская ул., 16 | 53.891044 | 27.57211 | 376 | [Октябрьская ул., 16, Мінск, Беларусь] | Gym / Fitness Center |  |
| 2 | NEWTONLABS | ул. Кирова, 19 | 53.896936 | 27.558829 | 714 | [ул. Кирова, 19, Мінск, Беларусь] | IT Services |  |
| 3 | Silver Screen Cinemas | Бобруйская ул., 6 | 53.890475 | 27.55381 | 937 | [Бобруйская ул., 6, Мінск, 220030, Беларусь] | Multiplex | 220030.0 |
| 4 | Kew London Bar | проспект Независимости 18 | 53.89958 | 27.557199 | 993 | [проспект Независимости 18, Мінск, Беларусь] | Bar |  |
| 5 | Grand Café | ул. Ленина, 2 | 53.902347 | 27.556641 | 1258 | [ул. Ленина, 2 (Интернациональная ул.), Мінск,... | French Restaurant |  |
| 6 | Spa Beijing Hotel | Гостиница «Пекин» | 53.892471 | 27.577848 | 685 | [Гостиница «Пекин» (Красноармейская ул., 36), ... | Spa |  |
| 7 | Svobody 4 | пл. Свободы, 4 | 53.904034 | 27.554967 | 1475 | [пл. Свободы, 4, Мінск, Беларусь] | Wine Bar |  |
| 8 | Burgerlab | ул. Октябрьская 19к4 | 53.891083 | 27.573041 | 425 | [ул. Октябрьская 19к4, Мінск, Беларусь] | Burger Joint |  |
| 9 | Vetka-Kvetka Flower Bar | ул. Янки Купалы, 25 | 53.905385 | 27.561948 | 1424 | [ул. Янки Купалы, 25 (Интернациональная ул.), ... | Flower Shop |  |


As we can see, not all venues have a postal code. Let's fill in the missing ones with the geopy library by reverse-geocoding each venue's latitude and longitude:

In [13]:
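from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import pandas as pd

geolocator = Nominatim(user_agent='minsk-catering-study')
# Nominatim's usage policy allows at most 1 request per second
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)


def update_zipcode(lat, lng, current_code):
    """Keep an existing postal code; otherwise look it up from the coordinates."""
    if pd.notna(current_code):
        return current_code
    location = reverse((lat, lng))
    if location is None:  # nothing found at these coordinates
        return current_code
    return location.raw.get('address', {}).get('postcode', current_code)


foursquare_venues['venue.location.postalCode'] = foursquare_venues.apply(
    lambda row: update_zipcode(row['venue.location.lat'],
                               row['venue.location.lng'],
                               row['venue.location.postalCode']),
    axis=1)
foursquare_venues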


|  | venue.name | venue.location.address | venue.location.lat | venue.location.lng | venue.location.distance | venue.location.formattedAddress | venue.categories | venue.location.postalCode |
|---|---|---|---|---|---|---|---|---|
| 0 | Yoga Place | Октябрьская, 16 | 53.88989 | 27.574825 | 595 | [Октябрьская, 16, Мінск, Беларусь] | Yoga Studio | 220030.0 |
| 1 | Moby Dick Gym | Октябрьская ул., 16 | 53.891044 | 27.57211 | 376 | [Октябрьская ул., 16, Мінск, Беларусь] | Gym / Fitness Center | 220030.0 |
| 2 | NEWTONLABS | ул. Кирова, 19 | 53.896936 | 27.558829 | 714 | [ул. Кирова, 19, Мінск, Беларусь] | IT Services | 220030.0 |
| 3 | Silver Screen Cinemas | Бобруйская ул., 6 | 53.890475 | 27.55381 | 937 | [Бобруйская ул., 6, Мінск, 220030, Беларусь] | Multiplex | 220030.0 |
| 4 | Kew London Bar | проспект Независимости 18 | 53.89958 | 27.557199 | 993 | [проспект Независимости 18, Мінск, Беларусь] | Bar | 220030.0 |
| 5 | Grand Café | ул. Ленина, 2 | 53.902347 | 27.556641 | 1258 | [ул. Ленина, 2 (Интернациональная ул.), Мінск,... | French Restaurant | 220030.0 |
| 6 | Spa Beijing Hotel | Гостиница «Пекин» | 53.892471 | 27.577848 | 685 | [Гостиница «Пекин» (Красноармейская ул., 36), ... | Spa | 220050.0 |
| 7 | Svobody 4 | пл. Свободы, 4 | 53.904034 | 27.554967 | 1475 | [пл. Свободы, 4, Мінск, Беларусь] | Wine Bar | 220030.0 |
| 8 | Burgerlab | ул. Октябрьская 19к4 | 53.891083 | 27.573041 | 425 | [ул. Октябрьская 19к4, Мінск, Беларусь] | Burger Joint | 220030.0 |
| 9 | Vetka-Kvetka Flower Bar | ул. Янки Купалы, 25 | 53.905385 | 27.561948 | 1424 | [ул. Янки Купалы, 25 (Интернациональная ул.), ... | Flower Shop | 220030.0 |
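
Reverse geocoding is the slow step: Nominatim's public usage policy allows at most one request per second, so it pays to query it only for rows that actually lack a postal code and to reuse answers for repeated coordinates. The sketch below shows that pattern with functools.lru_cache and a hypothetical stub, reverse_lookup, standing in for the geopy call. Rounding coordinates to 4 decimal places (roughly 11 m or less) before caching is an assumed tolerance, not something the notebook does.

```python
from functools import lru_cache

calls = []  # records every (lat, lng) actually sent to the stub geocoder


def reverse_lookup(lat, lng):
    """Hypothetical stand-in for geolocator.reverse(...) returning a postcode."""
    calls.append((lat, lng))
    return '220030'


@lru_cache(maxsize=None)
def cached_postcode(lat_r, lng_r):
    # Only reached on a cache miss, so each rounded coordinate is geocoded once
    return reverse_lookup(lat_r, lng_r)


def fill_postcode(lat, lng, current_code, ndigits=4):
    """Keep an existing code; otherwise look it up, cached by rounded coordinates."""
    if current_code is not None:
        return current_code
    return cached_postcode(round(lat, ndigits), round(lng, ndigits))


rows = [
    (53.88989, 27.574825, None),
    (53.890475, 27.55381, '220030'),  # already has a code: no lookup
    (53.88989, 27.574825, None),      # same point again: served from the cache
]
codes = [fill_postcode(lat, lng, code) for lat, lng, code in rows]
print(codes)       # ['220030', '220030', '220030']
print(len(calls))  # 1
```

In the notebook itself, missing values are NaN rather than None, so the check would be pd.notna(current_code) as in update_zipcode.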


Some records still have a NaN postal code, so we drop them:

In [14]:
foursquare_venues = foursquare_venues.dropna()
foursquare_venues.shape

(199, 8)
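
Note that dropna() without arguments removes a row if any column is missing, so a venue without a street address (venue.location.address) is dropped along with those lacking a postal code. The small example below, on made-up rows, shows the difference; the subset= argument restricts the check to the postal-code column.

```python
import numpy as np
import pandas as pd

# Made-up rows: B is missing an address, C is missing a postal code
df = pd.DataFrame({
    'venue.name': ['A', 'B', 'C'],
    'venue.location.address': ['ул. Кирова, 19', np.nan, 'пл. Свободы, 4'],
    'venue.location.postalCode': ['220030', '220050', np.nan],
})

any_missing = df.dropna()                                        # drops B and C
code_missing = df.dropna(subset=['venue.location.postalCode'])  # drops only C

print(any_missing['venue.name'].tolist())   # ['A']
print(code_missing['venue.name'].tolist())  # ['A', 'B']
```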

How many unique postal codes do we have?

In [17]:
kclusters = foursquare_venues['venue.location.postalCode'].unique()
num_kclusters = len(kclusters)
print(f"Number of postalCodes in Minsk: {num_kclusters}" )

Number of postalCodes in Minsk: 58


Now we can plot all venues on the map, colored by postal code:

In [88]:
# create the map
map_clusters = folium.Map(location=[MINSK_LATITUDE, MINSK_LONGITUDE], zoom_start=12)

# set a color scheme: one color per postal code
colors_array = cm.rainbow(np.linspace(0, 1, num_kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# map each postal code to its position in `kclusters` (the array of unique codes)
code_index = {code: i for i, code in enumerate(kclusters)}

# add markers to the map
for lat, lon, post in zip(foursquare_venues['venue.location.lat'],
                          foursquare_venues['venue.location.lng'],
                          foursquare_venues['venue.location.postalCode']):
    label = folium.Popup('Postal code {}'.format(post), parse_html=True)
    color = rainbow[code_index[post]]
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7).add_to(map_clusters)

map_clusters.save("MINSK_ZIP_clusters.html")

The following function helps us display our map directly inside the notebook:

In [18]:
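def inline_map(m):
    """Embed a folium Map object, or a saved HTML map file, in the notebook."""
    import html
    from folium import Map
    from IPython.display import HTML, IFrame
    if isinstance(m, Map):
        # render the full page and embed it via the iframe's srcdoc attribute
        srcdoc = html.escape(m.get_root().render(), quote=True)
        embed = HTML('<iframe srcdoc="{srcdoc}" '
                     'style="width: 100%; height: 500px; '
                     'border: none"></iframe>'.format(srcdoc=srcdoc))
    elif isinstance(m, str):
        # a path to a saved HTML file
        embed = IFrame(m, width=1200, height=600)
    else:
        raise TypeError('expected a folium.Map or a path to an HTML file')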
    return embed
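
Embedding a rendered page through an iframe's srcdoc attribute requires escaping the HTML as an attribute value. A common shortcut is to replace only the double quotes with `&quot;`, but that leaves `&` untouched, so an entity already in the page (such as `&amp;`) is decoded one level too far by the browser. The standard library's html.escape(..., quote=True) escapes `&`, `<`, `>`, `"` and `'`. The sample page below is made up, and html.unescape stands in for the browser's attribute decoding.

```python
import html

page = '<p title="menu">Fish &amp; Chips</p>'  # made-up rendered HTML

naive = page.replace('"', '&quot;')   # escapes double quotes only
safe = html.escape(page, quote=True)  # escapes &, <, >, " and '

# The browser decodes the attribute value once before rendering it as a document
print(html.unescape(naive))         # <p title="menu">Fish & Chips</p>
print(html.unescape(safe) == page)  # True
```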


In [19]:
# and here is the current state
inline_map("MINSK_ZIP_clusters.html")

Let's make sure we haven't included venues that cannot be catering customers, such as government agencies or transportation facilities. First, the unique categories:

In [20]:
categories_unique = foursquare_venues['venue.categories'].unique()
categories_unique

array(['Yoga Studio', 'Gym / Fitness Center', 'IT Services', 'Multiplex',
       'Bar', 'French Restaurant', 'Spa', 'Wine Bar', 'Burger Joint',
       'Flower Shop', 'Hotel', 'Bookstore', 'Restaurant', 'Theater',
       'Arcade', 'Coffee Shop', 'Waterfront', 'Music Venue', 'Art Museum',
       'Bakery', 'Street Art', 'Tennis Court', 'Park', 'Cocktail Bar',
       'Tea Room', 'Art Gallery', 'City Hall', 'Hostel', 'Opera House',
       'Shoe Store', 'Gastropub', 'Café', 'Blini House', 'Hookah Bar',
       'Italian Restaurant', 'Dance Studio', 'Seafood Restaurant', 'Pub',
       'Beer Bar', 'Church', 'Plaza', 'History Museum',
       'Tapas Restaurant', 'Gym', 'Brewery', 'Museum',
       'Salon / Barbershop', 'Modern European Restaurant', 'Squash Court',
       'Cupcake Shop', 'Aquarium', 'Bistro', 'Karaoke Bar',
       'Auto Workshop', 'Dessert Shop', 'Japanese Restaurant',
       'Botanical Garden', 'Public Art', 'Indian Restaurant',
       'Lingerie Store', 'Pelmeni House', 'Kebab Rest

In [21]:
categories_counts = foursquare_venues['venue.categories'].value_counts()
categories_counts

Coffee Shop                   14
Park                          10
Gym                            9
Gym / Fitness Center           9
Bookstore                      7
Hotel                          7
Café                           6
Multiplex                      5
Cocktail Bar                   5
Spa                            5
Wine Bar                       4
Arcade                         4
Bakery                         4
Bar                            4
Flower Shop                    4
Beer Bar                       4
Big Box Store                  3
Tennis Court                   3
Bistro                         3
Furniture / Home Store         3
French Restaurant              3
Hookah Bar                     3
Italian Restaurant             3
Restaurant                     3
Shopping Mall                  2
Dessert Shop                   2
Beer Store                     2
Lingerie Store                 2
Seafood Restaurant             2
Salad Place                    2
Water Park
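
Should any of these categories need to be left out after all, a boolean mask built with isin does it. The sketch below is a minimal illustration: the exclusion set is hypothetical, not a list from this analysis, and the sample frame is made up in the same shape as foursquare_venues.

```python
import pandas as pd

# Made-up sample in the same shape as foursquare_venues
venues = pd.DataFrame({
    'venue.name': ['Cafe A', 'City Hall B', 'Bar C', 'Workshop D'],
    'venue.categories': ['Café', 'City Hall', 'Bar', 'Auto Workshop'],
})

# Hypothetical exclusion set; adjust to the business's actual criteria
EXCLUDED = {'City Hall', 'Auto Workshop', 'Church', 'IT Services'}

mask = ~venues['venue.categories'].isin(EXCLUDED)  # True for rows to keep
kept = venues[mask].reset_index(drop=True)
print(kept['venue.name'].tolist())  # ['Cafe A', 'Bar C']
```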

Okay, it seems that all of them meet our requirements; even parks could later be served with catering. Now we need to solve the main problem: the customer needs only 5 warehouses in the city, from which goods will be delivered to the nearest points of sale. We will divide all venues into 5 areas of nearby venues using k-means clustering on their coordinates. Note that k-means groups venues by proximity; it does not guarantee that the areas contain equal numbers of venues.

In [24]:
# import k-means for the clustering stage
from sklearn.cluster import KMeans

# set the number of clusters (one per warehouse)
kclusters = 5

# cluster on the venue coordinates only
venues_clustering = foursquare_venues[['venue.location.lat', 'venue.location.lng']]

# run k-means clustering (n_init set explicitly, since its default changed in newer scikit-learn)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(venues_clustering)

# add the cluster label generated for each row
venues_merged = venues_clustering.copy()
venues_merged["Cluster"] = kmeans.labels_
venues_merged


|  | venue.location.lat | venue.location.lng | Cluster |
|---|---|---|---|
| 0 | 53.88989 | 27.574825 | 1 |
| 1 | 53.891044 | 27.57211 | 1 |
| 2 | 53.896936 | 27.558829 | 1 |
| 3 | 53.890475 | 27.55381 | 1 |
| 4 | 53.89958 | 27.557199 | 1 |
| 5 | 53.902347 | 27.556641 | 1 |
| 6 | 53.892471 | 27.577848 | 1 |
| 7 | 53.904034 | 27.554967 | 1 |
| 8 | 53.891083 | 27.573041 | 1 |
| 9 | 53.905385 | 27.561948 | 1 |
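
K-means measures plain Euclidean distance, but on raw latitude/longitude a degree of longitude is shorter than a degree of latitude: at Minsk's latitude (about 53.9°) the ratio is cos(53.9°) ≈ 0.59, so clustering raw degrees makes east-west gaps look larger than they are. A minimal sketch of one remedy, assuming an equirectangular approximation is accurate enough at city scale, converts coordinates to kilometres around the city center; the projected (x, y) pairs could then be passed to KMeans instead of raw lat/lng. The helper name to_local_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
MINSK_LAT, MINSK_LNG = 53.893009, 27.567444  # city center used in the notebook


def to_local_km(lat, lng, lat0=MINSK_LAT, lng0=MINSK_LNG):
    """Equirectangular projection: degrees -> (x east, y north) in km from (lat0, lng0)."""
    x = math.radians(lng - lng0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_KM
    y = math.radians(lat - lat0) * EARTH_RADIUS_KM
    return x, y


# Two points 0.01 degrees from the center: one to the east, one to the north
east = to_local_km(MINSK_LAT, MINSK_LNG + 0.01)
north = to_local_km(MINSK_LAT + 0.01, MINSK_LNG)
print(round(east[0], 3), round(north[1], 3))  # 0.655 1.112

# Hypothetical usage with the notebook's data:
# xy = [to_local_km(a, b) for a, b in zip(venues_clustering['venue.location.lat'],
#                                         venues_clustering['venue.location.lng'])]
```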


And here are the resulting 5 areas:

In [25]:
# create the map
map_clusters = folium.Map(location=[MINSK_LATITUDE, MINSK_LONGITUDE], zoom_start=12)

# set a color scheme: one color per cluster label (0 .. kclusters-1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, cluster in zip(venues_merged['venue.location.lat'],
                             venues_merged['venue.location.lng'],
                             venues_merged['Cluster']):
    label = folium.Popup('Cluster {}'.format(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters.save("MINSK_ZIP_by_clusters.html")

inline_map("MINSK_ZIP_by_clusters.html")
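
Each area's centroid, the mean latitude and longitude of its venues, is a natural first place to look for a warehouse site, since k-means positions its centers to minimize the squared distance to the venues assigned to them. After fitting, scikit-learn exposes the centers as kmeans.cluster_centers_, which at convergence coincide with these per-cluster means. The sketch below computes the means in plain Python on made-up points so the idea can be checked without the notebook's data.

```python
from collections import defaultdict


def cluster_centroids(points, labels):
    """Return {label: (mean_lat, mean_lng)} for the points in each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_lat, sum_lng, count]
    for (lat, lng), label in zip(points, labels):
        acc = sums[label]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {label: (s_lat / n, s_lng / n)
            for label, (s_lat, s_lng, n) in sorted(sums.items())}


# Made-up points and labels (not the notebook's data)
points = [(53.90, 27.50), (53.92, 27.52), (53.85, 27.60), (53.87, 27.62)]
labels = [0, 0, 1, 1]

for label, (lat, lng) in cluster_centroids(points, labels).items():
    print(label, round(lat, 3), round(lng, 3))
# 0 53.91 27.51
# 1 53.86 27.61
```

In the notebook, the fitted centers could be added to map_clusters as folium.Marker objects before saving, to show one candidate site per area.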

##### Conclusion:
As a result, we obtained 5 geographically compact areas in which the customer can place its distribution centers. Taking into account each area's specifics (development, property prices, etc.), the customer now needs to find a suitable site within each area for its warehouse/office.